Key Concepts When Designing a Cache

9 min read

What is caching?

Caching is the process of storing copies of data or results of computations in a temporary storage area, called a cache, to reduce the time it takes to access that data or recompute the results. By keeping frequently accessed data in a cache, the system can reduce the time taken to retrieve the data, improving overall performance and efficiency.

Why caching is important

Caching is crucial in modern computing systems for several reasons:

Reduced latency: Caching reduces the time it takes to access data, as the data is fetched from a location closer to the consumer, or is in a form that is easier to access.
Decreased load on resources: By serving data from the cache, the load on the original data source is reduced, which can help improve its performance.
Lower bandwidth consumption: Caching can significantly reduce bandwidth consumption, especially when the data is fetched from remote servers.
Enhanced user experience: In the case of web applications, caching can lead to faster page loads and a better user experience.

Caching methods

Various caching methods can be employed depending on the requirements and architecture of your application. Three common caching methods include:

In-memory application cache: This method involves storing the cache within the application's memory. In-memory application caches are easy to implement and offer low latency access to cached data. However, they have some limitations, such as being local to the application instance and consuming the application's memory resources. Additionally, the cache is lost when the application is restarted or crashes.
Distributed in-memory cache: A distributed in-memory cache is a separate caching layer that runs on multiple servers, allowing for horizontal scaling and high availability. This method overcomes the limitations of in-memory application caches by providing a shared cache accessible by multiple application instances. It also allows for better control over cache size and memory usage. Distributed in-memory caches typically use a consistent hashing mechanism to distribute the data evenly among cache nodes. Popular distributed in-memory cache solutions include Redis and Memcached.
Distributed file system cache: This caching method leverages a distributed file system to store cache data. Cached data is saved as files on multiple servers, providing redundancy and fault tolerance. Distributed file system caches can handle larger amounts of data compared to in-memory caches and are better suited for scenarios where the cache data is too large to fit into memory. However, they usually have higher latency compared to in-memory caches due to the need to access the file system. Content Delivery Networks (CDNs) are an example of a distributed file system.

Caching policy

A caching policy is a set of rules that determine how a cache is managed. The policy defines when and how data is added, updated, or evicted from the cache.

The concept of cache coherence and the various strategies will determine how we add / update cache data (more below).

An eviction policy determines which data is removed from the cache when space is needed for new data. Common eviction policies include:

Least Recently Used (LRU): This policy removes the least recently accessed item from the cache. LRU works on the assumption that if an item hasn't been accessed recently, it's less likely to be accessed in the near future. This policy is effective in most scenarios where data access patterns have temporal locality.
First In, First Out (FIFO): FIFO eviction policy removes the oldest item in the cache, based on the order of insertion. This policy can be simple to implement but may not always be the most efficient, as it doesn't consider the actual usage pattern of the data.
Least Frequently Used (LFU): LFU eviction policy removes the item with the lowest access frequency from the cache. It's based on the assumption that infrequently accessed items are less likely to be accessed again. This policy can be effective in situations where data access patterns have frequency locality, but it may require more bookkeeping compared to LRU.
Random Replacement (RR): This policy selects a random item for eviction when the cache is full. It's easy to implement and has lower overhead compared to LRU and LFU. However, it doesn't take into account any access patterns and may lead to suboptimal cache performance.
Time To Live (TTL): TTL eviction policy assigns an expiration time to each item in the cache. Once the expiration time is reached, the item is removed from the cache. This policy is particularly useful for caching data that changes over time or has a known expiration, such as API tokens, session data, or data with a defined period of validity.

Cache coherence

Cache coherence refers to the consistency of data between the cache and the underlying data source. Ensuring cache coherence is essential to prevent data corruption and maintain data integrity. There are different strategies for managing cache coherence based on read and write operations:

Read strategies

Cache-Aside (lazy loading): In this approach, the application first checks the cache for the requested data. If the data is not present in the cache (cache miss), the application retrieves the data from the data source and then stores it in the cache for future requests. Cache-aside works well when the cache hit rate is high but may result in increased latency for cache misses.
Read-Through: With the read-through strategy, the cache itself is responsible for retrieving data from the underlying data source in case of a cache miss. This approach simplifies the application logic and ensures that data is always read from the cache, maintaining coherence. However, it may lead to increased latency for cache misses as the cache has to retrieve the data before returning it to the application.

Write strategies

Write-Around: In the write-around strategy, write operations are performed directly to the data source, bypassing the cache. This approach reduces cache pollution caused by infrequently accessed data but may increase read latency for newly written data, as it must be fetched from the data source on the first read.
Write-Through: With write-through caching, write operations are performed both to the cache and the data source. This ensures that the cache always contains the most up-to-date data and provides low read latency for recently written data. However, this approach may lead to increased write latency, as both the cache and the data source must be updated.
Write-Back (or Write-Behind): In the write-back strategy, write operations are performed only to the cache, which then asynchronously updates the data source. This approach provides low write latency but may result in temporary data inconsistency between the cache and the data source until the update is propagated. Additionally, in case of cache failures, there is a risk of data loss if the cache has not yet written the data back to the data source.

By understanding these different cache coherence strategies and selecting the appropriate one based on your application's read and write patterns, you can maintain data consistency and improve the overall performance of your caching system.

Cache efficiency

Cache efficiency refers to the effectiveness of a caching system in serving requests from the cache, thereby reducing the load on the primary data source. The two primary metrics for evaluating cache efficiency are the hit ratio and the miss ratio.

The hit ratio is the percentage of requests that are successfully served by the cache, while the miss ratio is the percentage of requests that need to be fetched from the primary data source. A higher hit ratio indicates better cache efficiency, as it means that the cache is effectively serving more requests without accessing the primary data source. Optimising cache efficiency can be achieved by carefully selecting the cache size, eviction policy, and caching algorithms.

How caches can fail (and mitigators)

Caches can fail or underperform due to various reasons. Here are some common issues and their mitigation strategies:

Cache pollution: This occurs when less frequently accessed or irrelevant data replaces more frequently accessed data in the cache, reducing cache efficiency. To mitigate, monitor your cache hit rate and ensure the eviction policy to best suit your application's access patterns.
Cache avalanche: A cache avalanche happens when a large number of cache items expire simultaneously, causing a sudden spike in requests to the underlying data source. To prevent cache avalanches, implement a staggered expiration strategy by adding a random time offset to each item's time-to-live (TTL).
Cache stampede (thundering herd): Cache stampedes occur when multiple requests for the same data hit the cache simultaneously, causing a high load on the data source as it tries to serve the requests. To prevent cache stampedes, use techniques such as request throttling, implementing a short negative cache for unsuccessful requests, or using a distributed lock to serialise cache filling.
- Implementing a short negative cache for unsuccessful requests is to temporarily cache the unsuccessful requests or errors (e.g. not found or timeout) for a short duration. This prevents repeated requests for the same non-existent or erroneous data from hitting the backend system and reduces the overall load.
- Using a distributed lock ensures that only one request can query the cache for a specific item, at any one time (i.e. a sole request acquires a lock and fetches the data from the backend, while other requests wait for the lock to be released).
Cache penetration: Cache penetration happens when requests bypass the cache and directly access the underlying data source, reducing the effectiveness of the cache. To mitigate cache penetration, ensure that your caching strategy accounts for all possible data access patterns and implement a consistent hashing algorithm to distribute keys evenly across cache nodes.
Data sensitivity: Be cautious when caching sensitive data, as cache storage may be less secure than the primary data source. Consider employing encryption, access control mechanisms, or avoiding caching sensitive data altogether.

Hola, I'm Julia.