Implementing AI Gateway capabilities in API Management
Today, we are going to delve into some of the Generative AI Gateway capabilities (commonly referred to as AI Gateway) that are available in the API Management service.
These capabilities are designed to help you secure and monitor the Azure OpenAI endpoints in your applications as you head toward production.
Overview
Common scenarios that the AI Gateway capabilities can help with include:
- How can we track token usage across multiple applications? How can we cross-charge the multiple applications/teams that use Azure OpenAI models?
- How can we make sure that a single app does not consume the whole TPM quota, leaving other apps with no option to use Azure OpenAI models?
- How can we make sure that the API key is securely distributed across multiple applications?
- How can we distribute load across multiple Azure OpenAI endpoints? How can we make sure that PTUs (Provisioned Throughput Units) are used first before falling back to Pay-as-you-go instances?
In this post, we will explore how to use the AI Gateway capabilities in API Management to address these scenarios.
The first step is to import the Azure OpenAI service and definition into API Management. This will allow you to create a new API in API Management that can act as a gateway to the Azure OpenAI service.
You can import an Azure OpenAI API directly from the Azure OpenAI Service to API Management. When you import the API, API Management automatically configures:
- Operations for each of the Azure OpenAI REST API endpoints.
- A system-assigned identity with the necessary permissions to access the Azure OpenAI resource.
- A backend resource and set-backend-service policy that directs API requests to the Azure OpenAI Service endpoint.
- An authentication-managed-identity policy that can authenticate to the Azure OpenAI resource using the instance's system-assigned identity.
- (Optionally) Policies to help you monitor and manage token consumption of the Azure OpenAI API.
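To make this concrete, here is a rough sketch of what the generated inbound policy section looks like; the backend-id value below is illustrative, as the actual ID is set for you at import time:

```xml
<inbound>
    <base />
    <!-- Route requests to the backend created for the Azure OpenAI resource (backend-id is illustrative) -->
    <set-backend-service backend-id="openai-ca-res-eastus-backend" />
    <!-- Authenticate to Azure OpenAI with the instance's system-assigned managed identity -->
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```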
This functionality became generally available in May 2024: GA: Import Azure OpenAI endpoints as an API in Azure API Management.
So, let us take a look at how to import the Azure OpenAI service and definition into API Management.
Request Forwarding
Let us take a look at forwarding the request.
For this article, I have the following resources pre-deployed already:
Name | Type | Region |
---|---|---|
openai-ca-res-eastus | Azure OpenAI | East US |
openai-ca-res-uksouth | Azure OpenAI | UK South |
apim-lmv01-dev-eastus-aka | API Management service | East US |
We will start by adding openai-ca-res-eastus to the Azure API Management service first before looking at adding the other resources.
This will automatically enable the system-assigned managed identity of API Management, grant it access to the Azure OpenAI service with the Cognitive Services OpenAI User role, and create a new backend instance and operations.
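The role assignment is created for you on import, but if you ever need to grant it manually (for example, for an additional Azure OpenAI resource), a sketch using the Az PowerShell module would look like the following; the principal ID, subscription ID, and resource group below are placeholders:

```powershell
# Placeholders: the APIM system-assigned identity's principal ID and the Azure OpenAI resource scope
New-AzRoleAssignment -ObjectId "<apim-principal-id>" `
    -RoleDefinitionName "Cognitive Services OpenAI User" `
    -Scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/openai-ca-res-eastus"
```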
We can see the policy changes applied, pointing to the backend and system-assigned managed identity.
The Azure Portal integration also lets you enable a couple of out-of-the-box API policies to help you manage the token consumption of the Azure OpenAI API.
These are:
- Manage token consumption: Use the Azure OpenAI token limit policy to protect the API from overuse and to control Azure OpenAI cost. If selected, API Management will add the policy with the configured TPM value. You can add, edit, or remove this policy after the API is created.
- Track token usage: Use the Azure OpenAI emit token policy to log the consumed total, completion, and prompt tokens. If selected, API Management will add the policy with the specified configuration. You can add, edit, or remove this policy after the API is created.
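As a rough sketch, the two policies end up in the API's inbound section looking something like this; the counter key, TPM value, and namespace are illustrative defaults, not the only options:

```xml
<inbound>
    <base />
    <!-- Manage token consumption: cap tokens-per-minute per subscription (TPM value is illustrative) -->
    <azure-openai-token-limit counter-key="@(context.Subscription.Id)" tokens-per-minute="500" estimate-prompt-tokens="true" />
    <!-- Track token usage: emit total/completion/prompt token metrics, dimensioned by subscription -->
    <azure-openai-emit-token-metric namespace="openai">
        <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
    </azure-openai-emit-token-metric>
</inbound>
```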
Now that we have a frontend for our Azure OpenAI service, it's time to test the Azure OpenAI endpoint through Azure API Management.
To do this, we will need a:
- deployment-id (the name of the model deployment in Azure OpenAI)
- api-version (the version of the Azure OpenAI API that we want to use)
We can get this information from the Deployments blade in Azure AI Studio.
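These two values plug into the gateway URL when calling the chat completions operation through API Management:

```
POST https://{apim-name}.azure-api.net/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}
```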
Now, we can test the Azure OpenAI endpoint, through Azure API Management. In Azure API Management, navigate to the openai API that you created, and select the Creates a completion for the chat message operation and the Test tab.
Add in your deployment ID and API version, and in the request body, enter a sample prompt like the one below:
```json
{
  "messages": [
    {"role": "system", "content": "You are a knowledgeable assistant in an ice cream parlor."},
    {"role": "user", "content": "What are the ingredients typically used to make a classic vanilla ice cream?"}
  ],
  "max_tokens": 100
}
```
This confirms that we can now communicate with the Azure OpenAI service, through Azure API Management.
So let us test request forwarding from a client application, such as Postman.
First, associate the openai API with a Product so that a subscription key is generated; we can then take the subscription key from the Azure API Management instance and add it to the request header as 'api-key'.
Here is a sample PowerShell script for invoking the Azure OpenAI endpoint through Azure API Management as well:
```powershell
# Build the request headers, including the API Management subscription key
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("api-key", "YOURSUBSCRIPTIONKEY")
$headers.Add("Content-Type", "application/json")

# Chat completion request body
$body = @"
{
  "messages": [
    {
      "role": "system",
      "content": "You are a knowledgeable assistant in an ice cream parlor."
    },
    {
      "role": "user",
      "content": "What are the ingredients typically used to make a classic vanilla ice cream?"
    }
  ],
  "max_tokens": 100
}
"@

# Call the chat completions operation through the API Management gateway
$response = Invoke-RestMethod 'https://apim-lmv01-dev-eastus-aka.azure-api.net/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-15-preview' -Method 'POST' -Headers $headers -Body $body
$response | ConvertTo-Json
```
We have successfully implemented request forwarding through API Management to the Azure OpenAI endpoint.
Backend circuit breaking
Now, let us take a look at backend circuit breaking. The Circuit Breaker pattern follows the same principles as an electrical circuit breaker: it is used to detect failures and encapsulate the logic of preventing a failure from constantly recurring during maintenance, temporary overloads, or unexpected spikes in traffic. When the circuit breaker trips, it can return an error immediately, without trying to execute the operation.
The Circuit Breaker pattern is mentioned in my Cloud Design Patterns blog post.
A circuit breaker acts as a proxy for operations that might fail. The proxy monitors the number of recent failures and uses this information to decide whether to allow an operation to proceed or simply return an exception immediately. When the circuit is Closed, requests are allowed to pass through; when the circuit is Open, requests are blocked.
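To make the Closed/Open behavior concrete, here is a minimal conceptual sketch in PowerShell; this illustrates the pattern only, not how API Management implements it internally:

```powershell
# Minimal circuit breaker sketch: fail fast once recent failures hit a threshold
$script:failureCount = 0
$script:circuitOpen  = $false
$script:threshold    = 3

function Invoke-WithCircuitBreaker([scriptblock] $Operation) {
    if ($script:circuitOpen) {
        # Open circuit: block the call immediately instead of hammering a failing backend
        throw "Circuit is open - request blocked."
    }
    try {
        & $Operation
        $script:failureCount = 0   # a success resets the failure count
    }
    catch {
        $script:failureCount++
        if ($script:failureCount -ge $script:threshold) {
            $script:circuitOpen = $true   # trip the breaker
        }
        throw
    }
}
```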
To implement a circuit breaker in API Management, we configure a circuit breaker rule on the backend resource. The rule monitors the number of recent failures and uses this information to decide whether to let requests through to the backend or reject them immediately.
Unlike request forwarding, configuring the circuit breaker cannot currently be done in the Azure Portal; it requires configuration through Infrastructure as Code (e.g., Bicep) or the API Management REST API.
Before you configure a circuit breaker rule, make sure you have a backup of your backend configuration, and test the change in a non-production environment.
Today, we will configure it using Bicep, referencing the backend of an already existing API Management resource and adding the circuitBreaker rule:
```bicep
param apimServiceName string // Name of the existing API Management instance
param backendName string // Name of the existing backend to update
param existingUrl string // Current backend URL, kept as-is

// Reference the existing API Management instance
resource apimService 'Microsoft.ApiManagement/service@2023-09-01-preview' existing = {
  name: apimServiceName
}

// Redeploy the backend with the circuit breaker rule added
resource updatedBackend 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apimService
  name: backendName
  properties: {
    url: existingUrl // Use the parameter to keep the existing URL
    protocol: 'http'
    circuitBreaker: {
      rules: [
        {
          name: 'myBreakerRule' // Name of the circuit breaker rule
          failureCondition: {
            count: 3 // Number of failures before tripping the circuit breaker
            errorReasons: [
              'Server errors' // Reason recorded for the failures
            ]
            interval: 'PT1H' // Time window in which the failures are counted
            statusCodeRanges: [
              {
                min: 500 // Minimum status code to consider as a failure
                max: 599 // Maximum status code to consider as a failure
              }
            ]
          }
          tripDuration: 'PT1H' // Duration for which the circuit breaker remains tripped
          acceptRetryAfter: true // Whether to accept retries after the trip duration
        }
      ]
    }
  }
}
```
In the Circuit Breaker rule above, the following behavior will occur:
Parameter | Description |
---|---|
Number of Failures | The circuit breaker will trip if there are 3 or more failures within the specified time interval. |
Error Reasons | Failures considered are those matching the reason 'Server errors'. The application or service should categorize errors with this reason. |
Status Code Ranges | Failures are identified based on HTTP status codes ranging from 500 to 599, covering server-side errors such as internal server errors (500), service unavailable (503), and similar issues. |
Time Interval | Failures are counted within a time interval of 1 hour ('PT1H'). If there are 3 or more failures with status codes in the 500-599 range within any 1-hour window, the circuit breaker will trip. |
Trip Duration | Once tripped, the circuit breaker remains in the tripped state for 1 hour ('PT1H'). During this period, requests will not pass through until the breaker is reset. |
Retry Behavior | After the trip duration of 1 hour, the circuit breaker is designed to allow retries (acceptRetryAfter: true). Requests will be accepted again after the trip duration, and the service will attempt to recover. |
Scenarios Where the Circuit Breaker Will Trip | If, within any 1-hour period, there are 3 or more server-side errors (HTTP status codes 500-599) with the reason 'Server errors', the circuit breaker will trip. Once tripped, the circuit breaker will block requests for 1 hour. After the 1-hour block period, the circuit breaker will reset and start accepting requests again. |
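If you prefer the REST API route mentioned earlier, a rough sketch using Invoke-AzRestMethod from the Az PowerShell module follows; the subscription ID, resource group, backend name, and backend URL are placeholders/assumptions you would replace with your own values:

```powershell
# Placeholders: replace <subscription-id>, <resource-group>, and the backend name/URL with your own values
$path = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>" +
        "/providers/Microsoft.ApiManagement/service/apim-lmv01-dev-eastus-aka" +
        "/backends/openai-ca-res-eastus?api-version=2023-09-01-preview"

# Same settings as the Bicep example: keep the existing URL and add the circuit breaker rule
$payload = @"
{
  "properties": {
    "url": "<existing-backend-url>",
    "protocol": "http",
    "circuitBreaker": {
      "rules": [
        {
          "name": "myBreakerRule",
          "failureCondition": {
            "count": 3,
            "errorReasons": [ "Server errors" ],
            "interval": "PT1H",
            "statusCodeRanges": [ { "min": 500, "max": 599 } ]
          },
          "tripDuration": "PT1H",
          "acceptRetryAfter": true
        }
      ]
    }
  }
}
"@

Invoke-AzRestMethod -Method PUT -Path $path -Payload $payload
```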
There is nothing shown in the Azure Portal for API Management, but if we take a look at the backend provider resource, we can confirm that the circuit breaker configuration for the backend has been set.
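One way to confirm this from the command line, reusing the $path value from the sketch above:

```powershell
# Read the backend back and inspect the circuitBreaker section of the returned JSON
$response = Invoke-AzRestMethod -Method GET -Path $path
($response.Content | ConvertFrom-Json).properties.circuitBreaker
```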