You’ve probably seen latency metrics on your dashboards: P95, response time, or network delay. But what exactly is latency, and how does it affect your app's performance and user experience?
🚦 What is latency?
Latency is the time delay between a request and its corresponding response.
In simple terms: How long does it take for something to happen after I ask for it?
It’s usually measured in milliseconds (ms) and applies to things like:
- ⌛ HTTP request/response time
- ⌛ DB query execution time
- ⌛ Message propagation across services
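As a concrete illustration, here's a minimal TypeScript sketch that measures the round-trip latency of a single HTTP request. The URL is a placeholder, and `performance.now()` simply gives a high-resolution timestamp in milliseconds:

```typescript
// Minimal sketch: time a single HTTP request/response round trip.
// The URL is a placeholder; swap in any endpoint you want to measure.
async function measureRequestLatency(url: string): Promise<number> {
  const start = performance.now();   // timestamp just before the request
  await fetch(url);                  // request goes out, response comes back
  return performance.now() - start;  // elapsed time in milliseconds
}

measureRequestLatency("https://example.com/api/health")
  .then((ms) => console.log(`Request latency: ${ms.toFixed(1)}ms`));
```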
🆚 Latency vs throughput
These terms are often confused, but they’re very different:
| Metric | Meaning | Analogy |
| --- | --- | --- |
| Latency | Time per request | How long one customer waits |
| Throughput | Requests per second (RPS) | How many customers per minute |
You can have low latency but low throughput, or high throughput but high latency. They measure different aspects of performance.
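To make the distinction concrete, here's a small sketch that measures both for the same batch of calls. The `doRequest` function is a hypothetical stand-in for whatever operation you're measuring:

```typescript
// Sketch: average latency (ms per request) vs throughput (requests per second).
// doRequest is a hypothetical stand-in that simulates ~100ms of work.
async function doRequest(): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 100));
}

async function measure(n: number): Promise<void> {
  const latencies: number[] = [];
  const batchStart = performance.now();

  for (let i = 0; i < n; i++) {
    const start = performance.now();
    await doRequest();
    latencies.push(performance.now() - start);
  }

  const totalSeconds = (performance.now() - batchStart) / 1000;
  const avgLatency = latencies.reduce((a, b) => a + b, 0) / n; // time per request
  const throughput = n / totalSeconds;                         // requests per second

  console.log(`avg latency: ${avgLatency.toFixed(1)}ms, throughput: ${throughput.toFixed(1)} RPS`);
}

measure(10);
```

Because these calls run one after another, throughput here is roughly 1 / latency. Run them concurrently (e.g. with Promise.all) and you can push throughput up while each individual request stays just as slow, which is exactly why the two metrics need to be tracked separately.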
🧪 Types of latency
- Network latency: Time to send a request over the wire
- Server latency: Time your server takes to process the request
- Application latency: Time to fetch data, run logic, etc.
- End-to-end latency: Total time from user action to response
In distributed systems, these latencies add up—across services, regions, and queues.
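For example, with purely illustrative numbers, a single user action might break down like this along the critical path:

```typescript
// Hypothetical latency budget for one user action (numbers are illustrative only).
const latencyBudgetMs = {
  network: 40,      // request and response over the wire
  server: 30,       // routing, auth, request handling
  application: 120, // DB queries, business logic
};

// End-to-end latency is at least the sum of everything on the critical path.
const endToEnd = Object.values(latencyBudgetMs).reduce((a, b) => a + b, 0);
console.log(`End-to-end: ~${endToEnd}ms`); // ~190ms
```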
📏 How to measure latency
Use metrics and observability tools to track latency at different levels:
- Frontend: Use the Performance API or RUM tools (like Sentry, New Relic Browser)
- Backend: Log start/end times or use tracing tools (see the sketch below)
- APM tools: Datadog, OpenTelemetry, Grafana, etc.
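On the backend, the simplest starting point is logging start/end times around each handler. Here's a framework-agnostic sketch; the route name and handler are hypothetical, and in production you'd emit the number as a metric rather than a log line:

```typescript
// Sketch: wrap any async handler and log how long it took.
type Handler<T> = () => Promise<T>;

async function withLatencyLogging<T>(route: string, handler: Handler<T>): Promise<T> {
  const start = performance.now();
  try {
    return await handler();
  } finally {
    const ms = performance.now() - start;
    console.log(`${route} took ${ms.toFixed(1)}ms`); // send to your metrics backend in production
  }
}

// Usage: time a hypothetical endpoint that does ~50ms of work.
withLatencyLogging("GET /users", async () => {
  await new Promise((resolve) => setTimeout(resolve, 50)); // stand-in for real work
  return { users: [] };
});
```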
Common metrics (the sketch after this list shows how to compute avg, P95, and max from raw samples):
- `avg_latency`: average delay (e.g. 120ms)
- `P95_latency`: 95% of requests completed under X ms
- `max_latency`: worst-case time
- `latency_by_route`: latency per endpoint or service
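If you collect raw latency samples yourself, the math is straightforward. This sketch uses the simple nearest-rank method for P95, and the sample values are made up:

```typescript
// Compute avg, P95, and max from a list of latency samples (in ms).
function summarize(samples: number[]) {
  const sorted = [...samples].sort((a, b) => a - b);
  const avg = sorted.reduce((a, b) => a + b, 0) / sorted.length;
  const p95 = sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
  const max = sorted[sorted.length - 1];
  return { avg, p95, max };
}

// Made-up samples: mostly fast requests plus one slow outlier.
const samples = [80, 95, 110, 120, 105, 90, 450, 130, 100, 115];
console.log(summarize(samples)); // { avg: 139.5, p95: 450, max: 450 }
```

Note how a single 450ms outlier pulls the average well above what a typical request experiences; this is why percentiles tell you more about real user experience than the mean.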
🧠 Why latency matters
- 😡 User Experience: High latency feels slow. 100ms = smooth; 500ms+ = lag.
- 🔁 Cascading delays: In microservices, one slow service can block others.
- 💰 Costs: Longer latency can mean more server time, higher bills.
🚀 How to reduce latency
Here are some practical strategies:
- ✅ Cache aggressively (e.g. Redis, CDN; see the cache-aside sketch after this list)
- ✅ Optimize DB queries (indexes, batching)
- ✅ Use async and streaming for slow tasks
- ✅ Reduce network hops (co-locate services)
- ✅ Compress responses (e.g. GZIP)
- ✅ Profile hot paths (CPU/memory usage)
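To make the caching bullet concrete, here's a minimal cache-aside sketch using an in-memory Map with a TTL. In production the cache would typically be Redis or a CDN, and fetchUserFromDb is a hypothetical slow lookup:

```typescript
// Sketch: cache-aside with an in-memory Map and a TTL.
type CacheEntry<T> = { value: T; expiresAt: number };
const cache = new Map<string, CacheEntry<unknown>>();

async function getCached<T>(key: string, ttlMs: number, load: () => Promise<T>): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T;                                  // cache hit: skip the slow path
  }
  const value = await load();                               // cache miss: pay the latency once
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Usage: a hypothetical 200ms database lookup we only want to pay for occasionally.
async function fetchUserFromDb(id: string) {
  await new Promise((resolve) => setTimeout(resolve, 200)); // simulate a slow query
  return { id, name: "Ada" };
}

getCached("user:42", 60_000, () => fetchUserFromDb("42")).then(console.log);
```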
✅ Summary checklist
- ✅ Latency = time from request to response
- ✅ Measure avg, P95, max latency per endpoint
- ✅ Monitor frontend, backend, and network latencies
- ✅ Optimize with caching, DB tuning, async processing
🧠 Conclusion
Latency is one of the most important performance metrics you can track. It’s not just about speed—it’s about user trust, system reliability, and cost efficiency.
Modern observability stacks make it easier than ever to measure and reduce latency. And once you understand it, you can start building systems that feel fast, even when they’re doing a lot under the hood.