API Rate Limiter Design
Allow a client to send a fixed number of requests in a given time window. If the number of requests in the window exceeds that limit, deny the current request.
Requirements
Non-Functional Requirements
- Highly available.
- Low latency: the rate-limiting check should add minimal processing time and must not noticeably increase wait or loading time for the user.
Functional Requirements
- Should be plug-and-play so any client can adopt it with minimal integration work.
- Should accept the request limit and the time window as configuration.
Algorithms
- Fixed Window - Time is divided into fixed, non-overlapping windows (e.g., 0-100ms, 100-200ms) with one counter per window; once the counter for the current window reaches the limit, further requests are denied (a minimal sketch follows this list).
- Sliding Window - The window is anchored at the current time; timestamps older than the window are evicted, and the request is allowed only if the remaining count is below the limit.
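To make the fixed-window idea concrete, here is a minimal sketch in Java. The class name, the hardcoded limit, and the window length are illustrative assumptions, not part of the original design.

import java.util.concurrent.ConcurrentHashMap;

public class FixedWindowRateLimiter {

    private static final long WINDOW_MILLIS = 60_000; // assumed: 1-minute window
    private static final int REQUEST_LIMIT = 5;       // assumed limit

    // Per-client state: which window the counter belongs to, plus the count.
    private static class Window {
        final long windowStart;
        int count;
        Window(long windowStart) { this.windowStart = windowStart; }
    }

    private final ConcurrentHashMap<String, Window> windows = new ConcurrentHashMap<>();

    public boolean canAllow(String clientId) {
        long now = System.currentTimeMillis();
        long windowStart = now - (now % WINDOW_MILLIS); // start of the current fixed window
        boolean[] allowed = new boolean[1];

        // compute() runs atomically per key, so check-and-increment is thread-safe.
        windows.compute(clientId, (id, w) -> {
            if (w == null || w.windowStart != windowStart) {
                w = new Window(windowStart); // new window: counter resets to 0
            }
            if (w.count < REQUEST_LIMIT) {
                w.count++;
                allowed[0] = true;
            }
            return w;
        });
        return allowed[0];
    }
}

This keeps only one counter per client, which is why fixed window is cheaper on memory than sliding window, at the cost of allowing bursts across a window boundary.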
Low-Level Design - In-Memory Solution
A basic RateLimiter interface exposes a single method that accepts a clientId and returns whether the request can be allowed.
public interface RateLimiter {
    boolean canAllow(String clientId);
}
Implementations of this interface can use the Fixed Window or the Sliding Window algorithm to evict expired entries. A concrete Sliding Window implementation looks like this:
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

public class BasicSlidingWindowRateLimiter implements RateLimiter {

    private static final long TIME_WINDOW = 60_000; // 1 minute
    private static final int REQUEST_LIMIT = 5;

    // One queue of request timestamps per client.
    private final ConcurrentHashMap<String, Queue<Long>> map = new ConcurrentHashMap<>();

    @Override
    public boolean canAllow(String clientId) {
        long currentTimeInMillis = System.currentTimeMillis();

        // Create the per-client queue atomically on first use.
        Queue<Long> queue = map.computeIfAbsent(clientId, k -> new ConcurrentLinkedDeque<>());

        // Evict timestamps that have fallen out of the sliding window.
        while (!queue.isEmpty() && queue.peek() < currentTimeInMillis - TIME_WINDOW) {
            queue.poll();
        }

        // Deny once the window already holds REQUEST_LIMIT requests.
        if (queue.size() >= REQUEST_LIMIT) {
            return false;
        }
        queue.add(currentTimeInMillis);
        return true;
    }
}
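A quick usage sketch (the client ID and the loop are illustrative):

public class Demo {
    public static void main(String[] args) {
        RateLimiter limiter = new BasicSlidingWindowRateLimiter();
        for (int i = 1; i <= 7; i++) {
            // With REQUEST_LIMIT = 5, the first 5 calls pass; the remaining 2 are denied.
            System.out.println("request " + i + " allowed = " + limiter.canAllow("client-42"));
        }
    }
}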
Memory Estimation
Queue Memory Approximate Estimation -
Variable | Space Assumed |
---|---|
size | 4 bytes |
Node first | 4 bytes |
Node last | 4 bytes |
Total | 12 bytes |
HashMap Memory Approximate Estimation for 1 ClientId -
Variable | Space Assumed |
---|---|
key (String reference) | 4 bytes |
Queue reference - assuming 10 entries at peak for 1 bucket | 4 bytes |
hashCode for Map | 4 bytes |
next (Map property) | 4 bytes |
loadFactor, initialCapacity and size (Map properties) | 8 bytes |
header (Map property) | 8 bytes |
Total | 32 bytes |
Assuming a scale of about 100K active clients, we end up with (key reference 4 bytes + queue reference 4 bytes + per-record header 8 bytes + a 10-entry queue at ~8 bytes per entry = 96 bytes per client) × 100K = 9,600,000 bytes ≈ 9.6 MB.
Scaling
The above in-memory solution looks great on paper, but under real-world conditions - for example, 100K requests per minute with varying traffic patterns - a single process can run out of memory. One solution is to move the HashMap<String, Queue<Long>> structure into Redis.
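A minimal sketch of the sliding window kept in Redis, assuming the Jedis client. The key scheme, the use of a sorted set (timestamps as scores), and the connection details are illustrative choices. Note that the evict-count-add sequence below is not atomic; production code would wrap it in a Lua script or a MULTI/EXEC transaction.

import redis.clients.jedis.Jedis;

public class RedisSlidingWindowRateLimiter implements RateLimiter {

    private static final long TIME_WINDOW = 60_000; // 1 minute
    private static final int REQUEST_LIMIT = 5;

    private final Jedis jedis = new Jedis("localhost", 6379); // assumed Redis endpoint

    @Override
    public boolean canAllow(String clientId) {
        long now = System.currentTimeMillis();
        String key = "ratelimit:" + clientId; // illustrative key scheme

        // Evict timestamps that fell out of the window, then count what remains.
        jedis.zremrangeByScore(key, 0, now - TIME_WINDOW);
        if (jedis.zcard(key) >= REQUEST_LIMIT) {
            return false;
        }

        // Record this request; sorted-set members must be unique, so add a nano suffix.
        jedis.zadd(key, now, now + "-" + System.nanoTime());
        jedis.expire(key, TIME_WINDOW / 1000); // let idle clients' keys expire
        return true;
    }
}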
Caching and Sharding
The above solution can also be cached and sharded by clientId. Consistent Hashing can map each clientId to a shard, and LRU works as the cache eviction policy. A sketch of such a hash ring follows.
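To illustrate the sharding idea, here is a minimal consistent-hash ring; the shard names and the virtual-node count are made-up examples.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {

    private static final int VIRTUAL_NODES = 100; // smooths the key distribution
    private final SortedMap<Long, String> ring = new TreeMap<>();

    public void addShard(String shardName) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(shardName + "#" + i), shardName);
        }
    }

    // Walk clockwise on the ring: the first node at or after the key's hash owns it.
    public String shardFor(String clientId) {
        SortedMap<Long, String> tail = ring.tailMap(hash(clientId));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF); // fold the first 8 digest bytes into a long
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

With, say, shards redis-1 through redis-3 added, shardFor("client-42") always routes the same client to the same shard, and removing a shard only remaps the keys that lived on it.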