What is Rate Limiting?
Rate limiting is a technique used to control the amount of incoming or outgoing traffic to or from a network, application, or resource over a specific period. It's crucial for:
- Preventing Abuse: Stopping malicious actors (bots) from overwhelming a service with requests (Denial-of-Service attacks).
- Ensuring Fair Usage: Preventing individual users or services from consuming disproportionate resources.
- Managing Costs: Controlling usage of paid third-party APIs or resource-intensive operations.
- Maintaining Stability: Preventing system overload and ensuring reliability for all users.
Rate limiters typically track requests based on identifiers like user ID, IP address, API key, etc., and enforce predefined limits.
Common Rate Limiting Algorithms
1. Token Bucket
Imagine a bucket with a fixed capacity that holds tokens. Tokens are added to the bucket at a fixed refill rate.
- How it works (sketched in code below):
- Tokens are added periodically (e.g., 10 tokens per second) up to the bucket's capacity.
- When a request arrives, it checks if at least one token is available in the bucket.
- If yes, one token is removed (consumed), and the request is allowed.
- If no, the request is rejected (or queued).
- Pros: Allows bursts of traffic up to the bucket capacity, smooths out traffic over time, relatively simple to implement.
- Cons: Requires storing state (token count, last refill time). Choosing appropriate capacity and refill rate can be tricky.
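To make this concrete, here is a minimal single-process sketch in Python (the TokenBucket class and its parameter names are illustrative, not a production implementation; a real limiter would also need thread safety and, for distributed services, shared state such as Redis):

```python
import time

class TokenBucket:
    """Minimal token bucket (single process, in memory, not thread-safe)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full so an initial burst passes
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # consume one token for this request
            return True
        return False

bucket = TokenBucket(capacity=10, refill_rate=10)  # 10 tokens/s, bursts up to 10
print(bucket.allow())  # True while tokens remain
```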
2. Leaky Bucket
This algorithm uses a queue (the bucket) with a fixed capacity. Requests are added to the queue as they arrive. The bucket "leaks" requests out at a fixed leak rate, processing them.
- How it works (sketched in code below):
- Incoming requests are added to the queue if there's space (queue size < capacity). If the queue is full, requests are rejected.
- Requests are processed (removed from the front of the queue) at a constant leak rate (e.g., 5 requests per second), regardless of the burstiness of arrivals.
- Pros: Provides a very smooth output rate, good for scenarios where downstream services require a steady flow.
- Cons: Bursts are delayed (queued) rather than accommodated immediately, as they would be with Token Bucket. Doesn't allow short bursts exceeding the leak rate even if the system was previously idle. Requires storing queue state.
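A minimal sketch of this idea in Python, assuming a single process and an in-memory deque as the queue (the LeakyBucket class and its names are illustrative; a real system would typically drain the queue from a background worker rather than on each call):

```python
import time
from collections import deque

class LeakyBucket:
    """Minimal leaky bucket: a bounded queue drained at a fixed rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity      # maximum queued requests
        self.leak_rate = leak_rate    # requests processed per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self):
        # Remove as many requests as the elapsed time allows.
        now = time.monotonic()
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()  # request "processed"
            self.last_leak += leaked / self.leak_rate

    def offer(self, request) -> bool:
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request)  # accepted: processed later at leak rate
            return True
        return False                    # bucket full: reject

bucket = LeakyBucket(capacity=5, leak_rate=5)
print(bucket.offer("req-1"))  # True until the queue fills
```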
3. Fixed Window Counter
This is a simpler approach. Time is divided into fixed windows (e.g., 60 seconds). A counter tracks requests within the current window.
- How it works (sketched in code below):
- A counter is maintained for the current time window (e.g., minute 0:00-0:59).
- When a request arrives, if the counter is below the limit, the counter is incremented, and the request is allowed.
- If the counter is at or above the limit, the request is rejected.
- At the start of a new window (e.g., minute 1:00), the counter resets to 0.
- Pros: Very easy to implement, memory efficient (only needs one counter and window start time).
- Cons: Can allow double the rate at window boundaries. E.g., if the limit is 100/minute, a user could send 100 requests at 0:59 and another 100 at 1:00, effectively sending 200 requests in a very short interval spanning the boundary.
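A minimal sketch in Python, assuming clock-aligned windows and a single process (the FixedWindowCounter class and its names are illustrative):

```python
import time

class FixedWindowCounter:
    """Minimal fixed window counter with clock-aligned windows."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.current_window = int(time.time() // window_seconds)
        self.count = 0

    def allow(self) -> bool:
        window = int(time.time() // self.window)  # index of the current window
        if window != self.current_window:
            self.current_window = window  # new window started: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

limiter = FixedWindowCounter(limit=100, window_seconds=60)
print(limiter.allow())  # True for the first 100 requests in each minute
```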
4. Sliding Window Log
This algorithm addresses the boundary issue of Fixed Window by keeping a log of request timestamps within the current window duration.
- How it works (sketched in code below):
- A log (e.g., a list or sorted set) stores the timestamps of requests received within the past window size (e.g., the last 60 seconds).
- When a request arrives:
- Remove all timestamps from the log that are older than now - window size.
- Count the remaining timestamps in the log.
- If the count is less than the limit, add the current timestamp to the log and allow the request.
- Otherwise, reject the request.
- Pros: Accurate rate limiting over a rolling window, avoids the fixed window boundary problem.
- Cons: Can consume significant memory if the limit and window size allow for many timestamps to be stored per user/key. Potentially higher computation cost for removing old entries and counting. (Sliding Window Counter is an optimization using less memory but more complex logic, not visualized here).
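A minimal single-process sketch in Python (the SlidingWindowLog class is illustrative; a distributed implementation commonly stores the log in a Redis sorted set keyed per user):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Minimal sliding window log limiter (single process, in memory)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of allowed requests, oldest first

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have fallen out of the rolling window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)  # record this request and allow it
            return True
        return False

limiter = SlidingWindowLog(limit=5, window_seconds=10)
print(limiter.allow())  # True for the first 5 requests in any rolling 10s span
```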
Visualize and Play
Select an algorithm, configure its parameters, and send requests to see how rate limiting works.