You’ve probably seen latency metrics on your dashboards: P95, response time, or network delay. But what exactly is latency, and how does it affect your app's performance and user experience?
🚦 What is latency?
Latency is the time delay between a request and its corresponding response.
In simple terms: How long does it take for something to happen after I ask for it?
It’s usually measured in milliseconds (ms) and applies to things like:
- ⌛ HTTP request/response time
- ⌛ DB query execution time
- ⌛ Message propagation across services
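As a concrete illustration, here's a minimal TypeScript sketch that measures the round-trip latency of a single HTTP request. The URL is a placeholder, and `performance.now()` simply gives a high-resolution timestamp in milliseconds:

```typescript
// Minimal sketch: time a single HTTP request/response round trip.
// The URL is a placeholder; swap in any endpoint you want to measure.
async function measureRequestLatency(url: string): Promise<number> {
  const start = performance.now();   // timestamp just before the request
  await fetch(url);                  // request goes out, response comes back
  return performance.now() - start;  // elapsed time in milliseconds
}

measureRequestLatency("https://example.com/api/health")
  .then((ms) => console.log(`Request latency: ${ms.toFixed(1)}ms`));
```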
🆚 Latency vs throughput
These terms are often confused, but they’re very different:
| Metric | Meaning | Analogy |
| --- | --- | --- |
| Latency | Time per request | How long one customer waits |
| Throughput | Requests per second (RPS) | How many customers per minute |
You can have low latency but low throughput, or high throughput but high latency. They measure different aspects of performance.
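To make the distinction concrete, here's a small sketch that measures both for the same batch of calls. The `doRequest` function is a hypothetical stand-in for whatever operation you're measuring:

```typescript
// Sketch: average latency (ms per request) vs throughput (requests per second).
// doRequest is a hypothetical stand-in that simulates ~100ms of work.
async function doRequest(): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 100));
}

async function measure(n: number): Promise<void> {
  const latencies: number[] = [];
  const batchStart = performance.now();

  for (let i = 0; i < n; i++) {
    const start = performance.now();
    await doRequest();
    latencies.push(performance.now() - start);
  }

  const totalSeconds = (performance.now() - batchStart) / 1000;
  const avgLatency = latencies.reduce((a, b) => a + b, 0) / n; // time per request
  const throughput = n / totalSeconds;                         // requests per second

  console.log(`avg latency: ${avgLatency.toFixed(1)}ms, throughput: ${throughput.toFixed(1)} RPS`);
}

measure(10);
```

Because these calls run one after another, throughput here is roughly 1 / latency. Run them concurrently (e.g. with Promise.all) and you can push throughput up while each individual request stays just as slow, which is exactly why the two metrics need to be tracked separately.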
🧪 Types of latency
- Network latency: Time to send a request over the wire
- Server latency: Time your server takes to process the request
- Application latency: Time to fetch data, run logic, etc.
- End-to-end latency: Total time from user action to response
In distributed systems, these latencies add up—across services, regions, and queues.
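For example, with purely illustrative numbers, a single user action might break down like this along the critical path:

```typescript
// Hypothetical latency budget for one user action (numbers are illustrative only).
const latencyBudgetMs = {
  network: 40,      // request and response over the wire
  server: 30,       // routing, auth, request handling
  application: 120, // DB queries, business logic
};

// End-to-end latency is at least the sum of everything on the critical path.
const endToEnd = Object.values(latencyBudgetMs).reduce((a, b) => a + b, 0);
console.log(`End-to-end: ~${endToEnd}ms`); // ~190ms
```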
📏 How to measure latency
Use metrics and observability tools to track latency at different levels:
- Frontend: Use the Performance API or RUM tools (like Sentry, New Relic Browser)
- Backend: Log start/end times or use tracing tools (see the sketch below)
- APM tools: Datadog, OpenTelemetry, Grafana, etc.
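On the backend, the simplest starting point is logging start/end times around each handler. Here's a framework-agnostic sketch; the route name and handler are hypothetical, and in production you'd emit the number as a metric rather than a log line:

```typescript
// Sketch: wrap any async handler and log how long it took.
type Handler<T> = () => Promise<T>;

async function withLatencyLogging<T>(route: string, handler: Handler<T>): Promise<T> {
  const start = performance.now();
  try {
    return await handler();
  } finally {
    const ms = performance.now() - start;
    console.log(`${route} took ${ms.toFixed(1)}ms`); // send to your metrics backend in production
  }
}

// Usage: time a hypothetical endpoint that does ~50ms of work.
withLatencyLogging("GET /users", async () => {
  await new Promise((resolve) => setTimeout(resolve, 50)); // stand-in for real work
  return { users: [] };
});
```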
Common metrics (the sketch after this list shows how to compute avg, P95, and max from raw samples):
- `avg_latency`: average delay (e.g. 120ms)
- `P95_latency`: 95% of requests completed under X ms
- `max_latency`: worst-case time
- `latency_by_route`: latency per endpoint or service
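If you collect raw latency samples yourself, the math is straightforward. This sketch uses the simple nearest-rank method for P95, and the sample values are made up:

```typescript
// Compute avg, P95, and max from a list of latency samples (in ms).
function summarize(samples: number[]) {
  const sorted = [...samples].sort((a, b) => a - b);
  const avg = sorted.reduce((a, b) => a + b, 0) / sorted.length;
  const p95 = sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
  const max = sorted[sorted.length - 1];
  return { avg, p95, max };
}

// Made-up samples: mostly fast requests plus one slow outlier.
const samples = [80, 95, 110, 120, 105, 90, 450, 130, 100, 115];
console.log(summarize(samples)); // { avg: 139.5, p95: 450, max: 450 }
```

Note how a single 450ms outlier pulls the average well above what a typical request experiences; this is why percentiles tell you more about real user experience than the mean.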
🧠 Why latency matters
- 😡 User Experience: High latency feels slow. 100ms = smooth; 500ms+ = lag.
- 🔁 Cascading delays: In microservices, one slow service can block others.
- 💰 Costs: Longer latency can mean more server time, higher bills.
🚀 How to reduce latency
Here are some practical strategies:
- ✅ Cache aggressively (e.g. Redis, CDN; see the cache-aside sketch after this list)
- ✅ Optimize DB queries (indexes, batching)
- ✅ Use async and streaming for slow tasks
- ✅ Reduce network hops (co-locate services)
- ✅ Compress responses (e.g. GZIP)
- ✅ Profile hot paths (CPU/memory usage)
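To make the caching bullet concrete, here's a minimal cache-aside sketch using an in-memory Map with a TTL. In production the cache would typically be Redis or a CDN, and fetchUserFromDb is a hypothetical slow lookup:

```typescript
// Sketch: cache-aside with an in-memory Map and a TTL.
type CacheEntry<T> = { value: T; expiresAt: number };
const cache = new Map<string, CacheEntry<unknown>>();

async function getCached<T>(key: string, ttlMs: number, load: () => Promise<T>): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T;                                  // cache hit: skip the slow path
  }
  const value = await load();                               // cache miss: pay the latency once
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Usage: a hypothetical 200ms database lookup we only want to pay for occasionally.
async function fetchUserFromDb(id: string) {
  await new Promise((resolve) => setTimeout(resolve, 200)); // simulate a slow query
  return { id, name: "Ada" };
}

getCached("user:42", 60_000, () => fetchUserFromDb("42")).then(console.log);
```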
✅ Summary checklist
- ✅ Latency = time from request to response
- ✅ Measure avg, P95, max latency per endpoint
- ✅ Monitor frontend, backend, and network latencies
- ✅ Optimize with caching, DB tuning, async processing
🧠 Conclusion
Latency is one of the most important performance metrics you can track. It’s not just about speed—it’s about user trust, system reliability, and cost efficiency.
Modern observability stacks make it easier than ever to measure and reduce latency. And once you understand it, you can start building systems that feel fast, even when they’re doing a lot under the hood.