API Management - Rate limiting

This document explains how to configure rate limiting using Azure API Management (APIM) to protect your APIs from excessive usage. Given that APIs may start up other processes that incur costs per run, rate limiting helps prevent unexpected high costs by limiting excessive usage, while also providing a safeguard against abuse.

Rate limiting restricts the number of API calls a client can make within a specified period. This approach not only prevents overloading backend systems and spamming but also helps control costs associated with triggering expensive backend processes.

While the recommendations in this document are generally applicable, it focuses on how to implement rate limiting using APIM.

Implementing Rate Limiting with APIM Policies

APIM provides built-in policies to implement rate limiting, which can be applied globally, at the API level, or per a specific operation. The two most commonly used policies are rate limiting by key and quota by key.

Rate Limiting Policy Options

The two policies, rate-limit-by-key and quota-by-key, serve slightly different purposes and can be used individually or in combination. For in-depth documentation on the differences between the two, refer to the docs at https://learn.microsoft.com/en-us/azure/api-management/api-management-sample-flexible-throttling#rate-limits-and-quotas.

1. Rate Limit by Key

The rate-limit-by-key policy counts the number of requests made by a client over a defined renewal period. When the number of requests exceeds the specified limit, subsequent requests are blocked until the period resets.

Purpose: Controls the rate of API calls over a short period (e.g., per second or per minute).
Most relevant attributes:
- calls: Maximum number of allowed requests.
- renewal-period: Time window (in seconds) after which the counter resets.
- counter-key: The key used to identify the client.
- retry-after-header-name (optional, defaults to Retry-After): The HTTP response will contain a header with this name that specifies how long a client must wait before it can send the next request. This header is only sent when the rate limit has been hit.
- remaining-calls-header-name (optional): If specified, the HTTP response will contain a header with this name that specifies the number of calls that remain before the rate limit is hit. This header is only sent while the rate limit has not yet been hit.
- total-calls-header-name (optional): If specified, the HTTP response will contain a header with this name that specifies the total number of calls that can be made per period, so it equals the value of the calls attribute. This header is always sent.

For more in-depth documentation, refer to the documentation at https://learn.microsoft.com/en-us/azure/api-management/rate-limit-by-key-policy.
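
To make the attribute semantics above concrete, the following is a minimal conceptual sketch in Python. It models the behaviour as described in this section (one counter per key that resets after the renewal period) and is not APIM's actual throttling algorithm. The class name `FixedWindowLimiter` and the header names `Remaining-Calls` and `Total-Calls` are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FixedWindowLimiter:
    """Conceptual model only; not how APIM is implemented internally."""

    calls: int           # mirrors the `calls` attribute
    renewal_period: int  # mirrors `renewal-period`, in seconds
    # counter key -> (start of current period, requests counted in it)
    _windows: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def check(self, counter_key: str, now: float) -> dict:
        start, used = self._windows.get(counter_key, (now, 0))
        if now - start >= self.renewal_period:
            start, used = now, 0  # period elapsed: the counter resets
        if used >= self.calls:
            # Limit hit: report how long the client still has to wait.
            retry_after = math.ceil(start + self.renewal_period - now)
            self._windows[counter_key] = (start, used)
            return {
                "status": 429,
                "Retry-After": max(retry_after, 1),
                "Total-Calls": self.calls,
            }
        used += 1
        self._windows[counter_key] = (start, used)
        return {
            "status": 200,
            "Remaining-Calls": self.calls - used,
            "Total-Calls": self.calls,
        }
```

With `FixedWindowLimiter(calls=2, renewal_period=60)`, the third request from the same key within the period is rejected with status 429, while requests from a different key are unaffected.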

Example:

This configuration limits each client, identified by their IP address, to 100 calls per 60-second period.

<rate-limit-by-key calls="100" 
                   renewal-period="60"
                   counter-key="@(context.Request.IpAddress)" 
                   remaining-calls-header-name="Remaining-Calls" 
                   total-calls-header-name="Total-Calls" />

Note that in, for example, a shared office environment, many end users may share the same IP address. Take this into account when setting up API limits, or if possible choose a different key.

As per the documentation at https://learn.microsoft.com/en-us/azure/api-management/rate-limit-by-key-policy:

Due to the distributed nature of throttling architecture, rate limiting is never completely accurate. The difference between the configured and the actual number of allowed requests varies based on request volume and rate, backend latency, and other factors.

2. Quota by Key

The quota-by-key policy tracks the cumulative number of calls made by a client during a specified period. Once the quota is reached, further requests are blocked until the period expires.

Purpose: Limits the total number of API calls over a longer period (e.g., per day, week, or month).
Most relevant attributes:
- calls: Maximum number of allowed requests for the period.
- renewal-period: Duration of the period (in seconds) during which the quota applies.
- counter-key: The key used to identify the client.
- increment-condition (optional): A condition that controls when the quota counter is incremented, allowing more fine-tuned control. For example, a usage-based billed API may only count a request towards the quota if the response was an HTTP 200 OK.
- increment-count (optional): The amount by which the quota usage is increased per request. For example, in a usage-based billed API, a bulk request that performs 100 paid calculations at once may need to increase the count by 100 instead of the default of 1.

For more in-depth documentation, refer to the documentation at https://learn.microsoft.com/en-us/azure/api-management/quota-by-key-policy
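
The interplay of increment-condition and increment-count can be illustrated with a small conceptual model in Python. This is a sketch of the idea only, not of APIM's implementation. The class `QuotaCounter` and the "only count HTTP 200 responses" condition are assumptions chosen to match the rationale given above.

```python
from dataclasses import dataclass


@dataclass
class QuotaCounter:
    """Conceptual model of a single quota counter; not APIM's implementation."""

    calls: int     # mirrors the `calls` attribute
    used: int = 0  # quota consumed so far in the current period

    def exhausted(self) -> bool:
        return self.used >= self.calls

    def record(self, status_code: int, increment_count: int = 1) -> None:
        # Hypothetical increment condition: only 200 OK responses count.
        if status_code == 200:
            self.used += increment_count


quota = QuotaCounter(calls=10000)
quota.record(200)                       # regular call: counts as 1
quota.record(500)                       # failed call: not counted
quota.record(200, increment_count=100)  # bulk call: counts as 100
```

After these three requests the counter stands at 101, even though only three HTTP requests were made.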

Example:

This configuration enforces a quota of 10,000 calls per week (7 × 86,400 = 604,800 seconds) for each client, identified by their IP address.

<quota-by-key calls="10000" 
                  renewal-period="604800"
                  counter-key="@(context.Request.IpAddress)" />

Note that in, for example, a shared office environment, many end users may share the same IP address. Take this into account when setting up API limits, or if possible choose a different key.
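
Because renewal-period is expressed in seconds, longer periods are easy to get wrong by hand. As a quick sanity check, the value can be derived in Python. The helper name `renewal_period_seconds` is an illustrative assumption.

```python
from datetime import timedelta


def renewal_period_seconds(period: timedelta) -> int:
    """Convert a period to the whole number of seconds used by renewal-period."""
    return int(period.total_seconds())


ONE_DAY = renewal_period_seconds(timedelta(days=1))   # 86400
ONE_WEEK = renewal_period_seconds(timedelta(days=7))  # 604800
```

The value 604800 used in the example therefore corresponds to a period of seven days.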

Combining Rate Limit and Quota Policies

Rate limiting and quota limiting serve slightly different purposes that you need to take into account when setting a policy. You can apply both policies to the same API or operation for a layered approach:

Rate Limit:
Protects against short, intense bursts of traffic by limiting the number of requests per short period, such as per minute. For example, if you know that your backend service has a bottleneck at its database under high call volume, you could set a rate-limit-by-key policy to keep the call volume below that level.
Quota:
Controls overall usage over a longer period, which is especially useful for managing costs associated with triggering cost-incurring backend processes. Quotas are used for controlling call rates over a longer period of time. For example, they can set the total number of calls that a particular subscriber can make within a given month.

Within Azure API Management, rate limits are typically propagated faster across the nodes to protect against spikes. In contrast, usage quota information is used over a longer term and hence its implementation is different.
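
As a simplified illustration of the layered approach, the sketch below shows the outcome for a single request when both checks are applied, with the rate limit evaluated first. The function name `evaluate` and its parameters are hypothetical. The status codes are the ones described under Policy Error Handling.

```python
def evaluate(rate_used: int, rate_limit: int, quota_used: int, quota_limit: int) -> int:
    """Return the HTTP status a layered setup would produce in this simplified model."""
    if rate_used >= rate_limit:
        return 429  # short-term rate limit exceeded
    if quota_used >= quota_limit:
        return 403  # long-term quota exhausted
    return 200      # request is passed on to the backend
```

A client can therefore be blocked by the rate limit while still having quota left, and the other way around.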

Trusted Client exceptions

For scenarios where certain clients, such as trusted server-to-server callers, need to be excluded from these limits, you can add conditional logic to bypass rate limiting. For example:

<choose>
  <!-- Bypass for trusted clients -->
  <when condition="SOME_CONDITION">
    <!-- No rate limiting for trusted clients -->
  </when>
  <otherwise>
    <rate-limit-by-key calls="100" 
                       renewal-period="60"
                       counter-key="@(context.Request.IpAddress)" 
                       remaining-calls-header-name="Remaining-Calls" 
                       total-calls-header-name="Total-Calls" />
    <quota-by-key calls="10000" 
                  renewal-period="604800"
                  counter-key="@(context.Request.IpAddress)" />
  </otherwise>
</choose>

Policy Error Handling

Rate limit by Key

When a request exceeds limits, APIM returns a 429 (Too Many Requests) error. This section describes how this works and how to interpret and react to the response.

The response body looks as follows:

{ 
  "statusCode": 429, 
  "message": "Rate limit is exceeded. Try again in 299 seconds." 
}

Response Headers:

If using the provided example policy fragment, the following headers may be present in your response.

Retry-After:
Indicates the number of seconds the client should wait before sending a new request. To avoid sending repeated requests that will be blocked, application logic should read the value of this header and wait at least that long before retrying the failed request.
Total-Calls:
The maximum number of allowed requests in the time window.
Remaining-Calls:
The number of requests remaining in the current time window.
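
The Retry-After handling described above could be sketched on the client side as follows. This is a minimal sketch, not a definitive implementation. The function `call_with_retry`, its parameters, and the `(status, headers)` response shape are assumptions made for illustration, and `send` stands in for whatever HTTP client call your application uses.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on HTTP 429, wait for Retry-After seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            return status, headers
        # Assumes header names are already normalized to this casing and
        # that the value is a number of seconds; otherwise fall back.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(max(wait, 0.0))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real delays. If the last attempt is still rate limited, the 429 response is returned to the caller so it can provide user feedback.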

Quota by Key

When a request exceeds the quota, APIM returns a 403 (Forbidden) error. This section describes how this works and how to interpret and react to the response.

The response body looks as follows:

{
    "statusCode": 403,
    "message": "Out of call volume quota. Quota will be replenished in 10:36:46."
}

Response Headers:

If using the provided example policy fragment, the following headers may be present in your response.

Retry-After:
Indicates the number of seconds the client should wait before sending a new request. To avoid sending repeated requests that will be blocked, application logic should read the value of this header and wait at least that long before retrying the failed request.
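
When the Retry-After header is not readable by the client, the waiting time can also be derived from the message in the response body shown above. The following is a hedged sketch. Parsing message text is brittle because the wording may change, so prefer the header where available. The function name `quota_wait_seconds` is hypothetical, and the optional leading day count ("d.hh:mm:ss") is an assumption, since the example above only shows the "hh:mm:ss" form.

```python
import re
from typing import Optional

# Matches "hh:mm:ss" with an optional leading "d." day count (assumption).
_REPLENISH = re.compile(r"replenished in (?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})")


def quota_wait_seconds(message: str) -> Optional[int]:
    """Return the seconds until the quota is replenished, or None if not found."""
    match = _REPLENISH.search(message)
    if match is None:
        return None
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

For the example message above, the function yields 38206 seconds (10 hours, 36 minutes and 46 seconds). A 403 response whose message does not match, such as a regular authorization failure, yields `None` and should not be retried on this basis.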

Pros and Cons of Using APIM for Rate Limiting

Pros

Centralized management:
Apply consistent rate limiting policies across all APIs, simplifying management.
Ease of configuration:
Use policies in APIM without modifying backend code.
Flexibility:
Fine-tune policies for different APIs or individual API operations, and include exceptions for trusted clients.
Applied before hitting your API backend:
A major advantage of configuring rate limiting at the APIM gateway level is that the limits are enforced before requests reach the backend API at all. This greatly reduces the risk of backend downtime when the API is being spammed.

Cons

Granularity:
While APIM provides robust options, very granular, code-dependent, or otherwise dynamic forms of rate limiting may sometimes require additional custom logic at the backend.
Dependency:
Relying solely on APIM for rate limiting requires thorough monitoring to ensure that limits are appropriately set and adjusted in response to evolving usage patterns. This applies to rate limiting in general rather than to APIM specifically.

Best Practices

Define appropriate limits:
Set limits that protect your backend and control costs while maintaining a positive user experience. As application characteristics are unique, there is no “silver bullet” configuration that will always work.
Monitor and adjust:
Regularly review API usage metrics and adjust rate limits as necessary. Also monitor HTTP 429 and quota-related HTTP 403 responses, or collect user feedback, to detect whether limits are being applied in undesirable ways.
Implement Trusted Client exceptions:
Consider configuring rate limiting bypass rules for trusted clients, such as server-to-server calls, so that critical operations are never hindered. If you do so, keep in mind that you need to take additional care that these bypassing clients cannot be (accidentally) abused as a workaround to overload the API, potentially incurring high costs.
Document limits:
Make sure that rate limiting rules are communicated to clients that implement the API, so proper handling of rate limits can be implemented.
Implement in clients:
Make sure all clients handle rate limit responses properly, for example by providing user feedback or, in server-to-server scenarios, by waiting and retrying. Note that the client needs access to the HTTP response headers to be able to react to them; take this into account when creating an API Management - CORS policy.

Conclusion

Implementing rate limiting through Azure API Management provides a centralized and effective way to control API usage, protect backend systems, and manage costs associated with resource-intensive operations.

The approach also allows for trusted client exceptions, ensuring that essential server-to-server communications are not impeded. By carefully setting, monitoring, and adjusting your rate limits - and by allowing trusted client bypasses where necessary - you can achieve a balance between performance, cost control, and overall API security.
