Caching Strategies
Why caching exists, where caches live in a system, and what changes once the cache is shared across many instances.
A cache is a fast, temporary store that sits in front of a slower or more expensive source of truth (a database, an external API, an expensive computation) and keeps copies of recently or frequently used data close to where it's needed. The goal is simple: serve the same answer without paying the full cost to compute it twice.
The difference is not marginal. A read served from an in-memory cache lands in the microsecond-to-low-millisecond range; the same read against a relational database under load is often tens of milliseconds, and a call to a third-party API can be hundreds. Caching trades a small amount of memory and some consistency risk for large wins in latency, throughput, and cost.
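To make the size of that win concrete, here is a back-of-the-envelope sketch in Python. The function name is hypothetical, and the 1 ms cache read and 30 ms database read are illustrative assumptions rather than measurements. It also assumes a miss pays for the failed cache lookup plus the origin read.

```python
def expected_read_latency_ms(hit_ratio: float, cache_ms: float, origin_ms: float) -> float:
    """Average read latency when a miss costs the cache lookup plus the origin read."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * cache_ms + miss_ratio * (cache_ms + origin_ms)


# Assumed figures for illustration only: 1 ms cache read, 30 ms database read.
for ratio in (0.0, 0.5, 0.9, 0.99):
    latency = expected_read_latency_ms(ratio, cache_ms=1.0, origin_ms=30.0)
    print(f"hit ratio {ratio:.0%}: {latency:.1f} ms average")
```

Under these assumed numbers, a 90% hit ratio brings the average read from 31 ms down to about 4 ms.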
Why Cache at All
Caching shows up everywhere in scalable systems because it attacks several problems at once:
- Latency — memory is orders of magnitude faster than disk or network. The closer the copy, the faster the response.
- Throughput — a cache absorbs read traffic that would otherwise hammer the database, letting the same backend serve far more requests per second.
- Cost — fewer database queries means smaller (or fewer) database instances, and fewer billed calls to metered external APIs.
- Resilience — a cache can keep serving recent data even when the origin is slow or briefly unavailable, smoothing over spikes and partial failures.
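The resilience point can be sketched as a cache that falls back to an expired entry when the origin call fails. This is a minimal illustration, not a production design: the class name `StaleTolerantCache`, the `load` callback, and the injectable `clock` are all hypothetical names introduced for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class StaleTolerantCache:
    """Hypothetical in-memory cache that serves an expired entry if the origin fails."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)

    def get(self, key: str, load: Callable[[str], Any]) -> Any:
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # fresh hit: the origin is not touched
        try:
            value = load(key)  # miss or expired: ask the origin
        except Exception:
            if entry is not None:
                return entry[0]  # origin failed: serve the stale copy instead
            raise  # nothing cached, so the failure has to surface
        self._entries[key] = (value, now)
        return value
```

The design choice here is to prefer a slightly old answer over an error; whether that is acceptable depends on how stale the data is allowed to be.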
The catch is captured by the old joke that there are only two hard things in computer science: cache invalidation and naming things. A cache is a second copy of the truth, and the moment the original changes, your copy is wrong. Most of the work in caching is not "how do I store a value" — it's "how do I keep the copy honest, and what do I do when it isn't." That tension runs through every page in this section.
Where Caches Live
Caching is not a single thing in one place; it's a series of layers, listed here from closest to the user to closest to the data, each catching what the layer behind it would otherwise have to serve.
- Browser / client cache — the client keeps responses locally based on HTTP headers (Cache-Control, ETag). The cheapest possible cache, because the request never leaves the device.
- CDN / edge cache — geographically distributed nodes cache static assets and cacheable responses close to users. Great for images, scripts, and public, slow-changing content.
- Reverse proxy cache — Nginx, Varnish, or the load balancer (LB) itself can cache full HTTP responses in front of your application tier.
- In-process (local) cache — an in-memory map inside the application process (Caffeine, Guava, a plain dictionary). Nanosecond access, but private to one instance.
- Distributed cache — a shared, networked cache such as Redis or Memcached that every application instance reads from and writes to. This is the layer this section focuses on.
Each layer should only need to handle the misses of the layer in front of it. A request that the browser, CDN, and proxy all decline still has two cheap stops — the local cache and the distributed cache — before it's allowed to touch the database.
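The browser and CDN layers are driven by the Cache-Control and ETag headers mentioned above. As a rough sketch of how a server might emit them and answer a revalidation request, consider the Python below. The function names `make_etag` and `respond` are hypothetical, the 60-second `max_age` is an arbitrary default, and the comparison is simplified: a real If-None-Match header may carry several validators or `*`.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a validator from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(
    body: bytes, if_none_match: Optional[str], max_age: int = 60
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a cacheable GET."""
    etag = make_etag(body)
    headers = {"Cache-Control": f"public, max-age={max_age}", "ETag": etag}
    if if_none_match == etag:
        # The client's copy is still valid: send 304 with no body.
        return 304, headers, b""
    return 200, headers, body


status, headers, payload = respond(b"hello", if_none_match=None)
print(status, headers["Cache-Control"])
revalidated_status, _, _ = respond(b"hello", if_none_match=headers["ETag"])
print(revalidated_status)
```

Within `max-age` the client does not contact the server at all; after that it revalidates with the ETag and, if the content is unchanged, receives a 304 with an empty body.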
Local vs Distributed Caching
The two application-tier caches deserve a direct comparison, because picking the wrong one is a common scaling mistake.
An in-process cache lives inside a single application instance. It's blindingly fast and dead simple, but it has two problems the moment you run more than one instance behind a load balancer:
- It doesn't scale with the fleet — each instance has its own copy, so a 10-instance fleet caches the same hot key 10 times, wasting memory and lowering the effective hit ratio.
- It drifts — when the data changes, instance A can invalidate its copy while instances B–J keep serving stale data until their own TTLs expire. Different users get different answers depending on which instance the LB happened to pick.
A distributed cache solves both: one logical store, shared by every instance, so the cache is consistent across the fleet and a single entry serves all of them.
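The drift problem is easy to reproduce in a toy model. In the sketch below, plain dictionaries stand in for both the database and the distributed cache (an assumption made so the example runs without Redis), and the class names are hypothetical.

```python
from typing import Dict

KEY = "user:1:name"
database: Dict[str, str] = {KEY: "Ada"}  # stand-in for the source of truth
shared_cache: Dict[str, str] = {}        # stand-in for a distributed cache


class LocalCacheInstance:
    """One application instance with a private in-process cache."""

    def __init__(self) -> None:
        self.local_cache: Dict[str, str] = {}

    def read(self, key: str) -> str:
        if key not in self.local_cache:
            self.local_cache[key] = database[key]
        return self.local_cache[key]

    def write(self, key: str, value: str) -> None:
        database[key] = value
        self.local_cache.pop(key, None)  # only this instance's copy is invalidated


class SharedCacheInstance:
    """One application instance that reads and invalidates a shared cache."""

    def read(self, key: str) -> str:
        if key not in shared_cache:
            shared_cache[key] = database[key]
        return shared_cache[key]

    def write(self, key: str, value: str) -> None:
        database[key] = value
        shared_cache.pop(key, None)  # every instance sees the invalidation


instance_a, instance_b = LocalCacheInstance(), LocalCacheInstance()
instance_a.read(KEY)
instance_b.read(KEY)
instance_a.write(KEY, "Grace")
print(instance_a.read(KEY))  # Grace
print(instance_b.read(KEY))  # Ada (stale until instance B's copy expires)

instance_c, instance_d = SharedCacheInstance(), SharedCacheInstance()
instance_c.read(KEY)
instance_d.read(KEY)
instance_c.write(KEY, "Linus")
print(instance_d.read(KEY))  # Linus
```

This model has no TTLs, so instance B would stay stale forever; with TTLs the stale window is bounded but still present.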
| Aspect | In-process (local) cache | Distributed cache (Redis) |
|---|---|---|
| Access latency | Nanoseconds (no network) | Sub-millisecond (one network hop) |
| Shared across fleet | No — one copy per instance | Yes — single logical store |
| Consistency | Drifts between instances | Consistent for all instances |
| Capacity | Bounded by one instance's RAM | Scales out across many nodes |
| Survives restart | No — lost with the process | Yes — independent of app lifecycle |
| Best for | Tiny, hot, read-mostly data | Shared state, session data, hot reads |
In practice the strongest setups combine both: a small local cache (L1) for the very hottest keys to avoid even the network hop, backed by a distributed cache (L2) as the shared source. The local layer absorbs the worst hot-key traffic; the distributed layer keeps the fleet coherent. The trade-off — local copies can briefly go stale — is covered in Invalidation and Consistency.
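A two-tier read path might look roughly like the sketch below. The class `TwoTierCache` is hypothetical, a plain dictionary stands in for the distributed cache (a real deployment would use a Redis or Memcached client here), and the one-second L1 TTL is an arbitrary assumption.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TwoTierCache:
    """Small per-instance L1 with a short TTL, backed by a shared L2 store."""

    def __init__(
        self,
        l2: Dict[str, Any],
        l1_ttl_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._l1: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._l2 = l2  # dict stand-in for the distributed cache
        self._l1_ttl = l1_ttl_seconds
        self._clock = clock

    def get(self, key: str, load_from_origin: Callable[[str], Any]) -> Any:
        now = self._clock()
        cached = self._l1.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]  # L1 hit: no network hop
        if key in self._l2:
            value = self._l2[key]  # L2 hit: one hop, shared by the fleet
        else:
            value = load_from_origin(key)  # miss in both tiers
            self._l2[key] = value
        self._l1[key] = (value, now + self._l1_ttl)
        return value
```

The short L1 TTL bounds how long a local copy can lag behind the shared layer: a shorter TTL means fresher data but more L2 traffic.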
Measuring a Cache
You can't tune what you don't measure. Three numbers tell you almost everything about a cache's health:
- Hit ratio — hits / (hits + misses). The single most important metric. A cache with a 30% hit ratio is barely earning its keep; a well-targeted cache is often 90%+.

  Example: Suppose your application receives 1,000 requests:

  - Cache hits: 850
  - Cache misses: 150
  - Hit ratio: 850 / (850 + 150) = 0.85 = 85%

  This means 85% of requests were served directly from the cache, and only 15% required fetching data from the original source.

  Why it matters: a higher hit ratio usually means:

  - Faster response times
  - Lower database load
  - Lower infrastructure costs
  - Better scalability

  | Hit ratio | Interpretation |
  |---|---|
  | 30% | Cache is not very effective |
  | 60% | Moderate |
  | 80% | Good |
  | 95%+ | Excellent for many workloads |

- Latency (p50/p99) — the whole point is speed. Watch the tail (p99), not just the average — a cache that's fast on average but slow at the tail can still wreck your latency budget.
- Eviction rate — how often entries are pushed out to make room. A high eviction rate means the cache is too small for its working set, and your hit ratio will suffer.
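As a rough illustration of how hit ratio and evictions can be tracked, here is a small instrumented cache in Python. The class `MeasuredLRUCache` is hypothetical, and least-recently-used eviction is an assumption chosen for the example; the text above does not prescribe an eviction policy. Latency percentiles are left out to keep the sketch short.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class MeasuredLRUCache:
    """Fixed-capacity LRU cache that counts hits, misses, and evictions."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._entries: OrderedDict = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, key: Hashable, load: Callable[[Hashable], Any]) -> Any:
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = load(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1
        return value

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# A working set of three keys does not fit in a cache of two.
cache = MeasuredLRUCache(capacity=2)
for key in ["a", "b", "a", "c", "b", "a"]:
    cache.get(key, lambda k: k.upper())
print(f"hits={cache.hits} misses={cache.misses} evictions={cache.evictions}")
# hits=1 misses=5 evictions=3
print(f"hit ratio={cache.hit_ratio:.2f}")
```

The toy run shows the link described above: the cache is smaller than its working set, so entries are evicted just before they are needed again and the hit ratio collapses.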
A useful rule of thumb: cache data that is read far more often than it is written and that is expensive to produce. A user profile read thousands of times between rare edits is an ideal candidate; a value that changes on every read is not worth caching at all.
| Data | Changes Often? | Read Often? | Good for Cache? |
|---|---|---|---|
| User profile | No | Yes | ✅ |
| Product catalog | No | Yes | ✅ |
| Exchange rates (updated hourly) | Sometimes | Yes | ✅ |
| Live auction highest bid | Yes | Yes | ❌ |
| Current timestamp (Date.now()) | Every read | Yes | ❌ |
What's Next
The rest of this section moves from patterns to the concrete machinery of a distributed cache:
- Caching Patterns — cache-aside, read-through, write-through, write-behind, and refresh-ahead: who writes to the cache, when, and what each pattern costs you.
- Redis as a Cache — the data structures, TTLs, eviction policies, and persistence options that make Redis the default distributed cache.
- Distributed Caching — replication, sharding, consistent hashing, and Redis Cluster: how the cache scales past a single node.
- Invalidation and Consistency — TTL strategies, keeping the copy honest, and surviving stampedes, hot keys, and avalanches.