
API Rate Limiter Design

Allow a client to send a fixed number of requests in a given time window. If the number of requests in the window exceeds the configured limit, deny the current request.

Requirements

Non Functional Requirements

  • Highly available.
  • The system should add minimal latency per request and must not noticeably increase wait or loading time for the user.

Functional Requirements

  • Should be a plug-and-play component that any client can adopt with minimal integration effort.
  • Should accept the request limit and the time window as configuration.

Algorithms

  • Fixed Window - Time is divided into fixed buckets (0-100ms, 100-200ms, etc.) and requests are counted per bucket (see the sketch after this list).
  • Sliding Window - The window trails the current time; a request is allowed only if the count of requests within the trailing window is under the limit.
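
To make the contrast concrete, here is a minimal fixed-window sketch (the class name, window size, and limit are illustrative; the RateLimiter interface it would plug into is defined in the next section):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedWindowRateLimiter {

    private static final long WINDOW_SIZE_MS = 60_000;
    private static final int REQUEST_LIMIT = 5;

    // One counter per (clientId, window bucket); stale buckets would need
    // periodic cleanup in a long-running process.
    private final ConcurrentHashMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    public boolean canAllow(String clientId) {
        // All requests inside the same bucket share one counter.
        long window = System.currentTimeMillis() / WINDOW_SIZE_MS;
        String key = clientId + ":" + window;
        return counters.computeIfAbsent(key, k -> new AtomicInteger())
                       .incrementAndGet() <= REQUEST_LIMIT;
    }
}

The fixed window is cheap but permits bursts at bucket boundaries (up to 2x the limit across two adjacent buckets), which is the main reason to prefer the sliding window developed below.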

Low-Level Design - In-Memory Solution

A basic interface RateLimiter is defined with a single method that accepts a clientId:

public interface RateLimiter {

    boolean canAllow(String clientId);
}

This interface can then be implemented to provide the required capability, using either the Fixed Window or the Sliding Window algorithm to evict stale entries.

A concrete implementation using the Sliding Window algorithm would look like this.

import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

public class BasicSlidingWindowRateLimiter implements RateLimiter {

    private static final long TIME_WINDOW = 60_000; // 1 minute
    private static final int REQUEST_LIMIT = 5;

    private final ConcurrentHashMap<String, Queue<Long>> map = new ConcurrentHashMap<>();

    @Override
    public boolean canAllow(String clientId) {
        long currentTimeInMillis = System.currentTimeMillis();

        // One timestamp queue per client, created atomically on first use.
        Queue<Long> queue = map.computeIfAbsent(clientId, k -> new ConcurrentLinkedDeque<>());

        // Evict timestamps that have slid out of the trailing window.
        while (!queue.isEmpty() && queue.peek() < currentTimeInMillis - TIME_WINDOW) {
            queue.poll();
        }

        // Deny once the window already holds REQUEST_LIMIT requests.
        if (queue.size() >= REQUEST_LIMIT) {
            return false;
        }
        queue.add(currentTimeInMillis);
        return true;
    }
}
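
A quick usage sketch (the client ID is illustrative); to satisfy the configurability requirement, TIME_WINDOW and REQUEST_LIMIT could instead be passed in through the constructor:

public class RateLimiterDemo {

    public static void main(String[] args) {
        RateLimiter limiter = new BasicSlidingWindowRateLimiter();
        for (int i = 1; i <= 7; i++) {
            // With REQUEST_LIMIT = 5, the first five calls pass and the rest are denied.
            System.out.println("request " + i + " allowed? " + limiter.canAllow("client-1"));
        }
    }
}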

Memory Estimation

Queue Memory Approximate Estimation -

Variable      Space Assumed
size          4 bytes
Node first    4 bytes
Node last     4 bytes
Total         12 bytes

HashMap Memory Approximate Estimation for 1 ClientId -

Variable                                                      Space Assumed
key (String Reference)                                        4 bytes
Queue Reference - Assuming 10 Entries at peak for 1 Bucket    4 bytes
hashCode for Map                                              4 bytes
next - Map Property                                           4 bytes
loadFactor, initialCapacity and size (Map Properties)         8 bytes
header (Map Property)                                         8 bytes
Total                                                         32 bytes

Assuming a scale of about 100K active clients, we end up with (Key Reference 4 bytes + Queue Reference 4 bytes + Header for each record 8 bytes + a Queue of 10 timestamps at approximately 8 bytes each) * 100K = 9,600,000 bytes ≈ 9.6 MB, broken down below.
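
Per client: 4 (key reference) + 4 (queue reference) + 8 (record header) + 10 * 8 (timestamps) = 96 bytes
Total: 96 bytes * 100,000 clients = 9,600,000 bytes ≈ 9.6 MB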

Scaling

The above in-memory solution looks great on paper, but in real-world scenarios (for example, 100K requests per minute with varying traffic and request patterns) a single process can run out of memory.

One solution is to keep the HashMap<String, Queue<Long>> structure in Redis instead, as sketched below.
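
A minimal sketch of that idea, assuming the Jedis client and one Redis sorted set of timestamps per client (the key prefix, TTL, and limits here are illustrative, not prescribed):

import redis.clients.jedis.Jedis;

public class RedisSlidingWindowRateLimiter implements RateLimiter {

    private static final long TIME_WINDOW = 60_000; // 1 minute
    private static final int REQUEST_LIMIT = 5;

    private final Jedis jedis;

    public RedisSlidingWindowRateLimiter(Jedis jedis) {
        this.jedis = jedis;
    }

    @Override
    public boolean canAllow(String clientId) {
        long now = System.currentTimeMillis();
        String key = "rate:" + clientId;

        // Drop timestamps that have slid out of the window, then count the rest.
        jedis.zremrangeByScore(key, 0, now - TIME_WINDOW);
        if (jedis.zcard(key) >= REQUEST_LIMIT) {
            return false;
        }

        // Record this request; the member must be unique, so add a nano-time suffix.
        jedis.zadd(key, now, now + "-" + System.nanoTime());
        jedis.expire(key, 120); // idle clients' keys expire on their own
        return true;
    }
}

Note that the remove/count/add sequence is not atomic; in production these commands would typically be wrapped in a Lua script or a MULTI/EXEC transaction so that concurrent requests cannot slip past the limit.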

Caching and Sharding

The above solution can also be cached and sharded by clientId, using Consistent Hashing to assign each client to a shard and LRU as the cache eviction policy, as sketched below.
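
A minimal consistent-hash ring for routing clients to shards could look like this (the class name, virtual-node count, and MD5-based hash are illustrative assumptions, not a prescribed design):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRouter {

    private final SortedMap<Long, String> ring = new TreeMap<>();
    private static final int VIRTUAL_NODES = 100;

    public ConsistentHashRouter(List<String> shardIds) {
        // Place each shard on the ring many times to even out the distribution.
        for (String shard : shardIds) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(shard + "#" + i), shard);
            }
        }
    }

    // The shard that owns this clientId; all of its rate-limit state lives there.
    public String shardFor(String clientId) {
        long h = hash(clientId);
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // Fold the first 8 bytes of the digest into a long.
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

Virtual nodes keep keys spread evenly across shards, and when a shard is added or removed only the keys on the neighbouring arc of the ring move, so most clients keep their rate-limit state in place.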