APIM ❀️ AI - This repo contains experiments on Azure API Management's AI capabilities, integrating with Azure OpenAI, AI Foundry, and much more πŸš€


<!-- markdownlint-disable MD033 -->

πŸ§ͺ AI Gateway Labs with Azure API Management


What's new ✨

βž• Model Context Protocol (MCP) βš™οΈ experiments with the client authorization flow
βž• the FinOps Framework lab to manage AI budgets effectively πŸ’°
βž• Agentic ✨ experiments with Model Context Protocol (MCP)
βž• Agentic ✨ experiments with OpenAI Agents SDK
βž• Agentic ✨ experiments with AI Agent Service from Azure AI Foundry
βž• the AI Foundry Deepseek lab with the Deepseek R1 model from Azure AI Foundry
βž• the Zero-to-Production lab with an iterative policy exploration to fine-tune the optimal production configuration
βž• the Terraform flavor of the backend pool load balancing lab
βž• the AI Foundry SDK lab
βž• the Content filtering and Prompt shielding labs
βž• the Model routing lab with OpenAI model based routing
βž• the Prompt flow lab to try the Azure AI Studio Prompt Flow with Azure API Management
βž• priority and weight parameters to the Backend pool load balancing lab
βž• the Streaming tool to test OpenAI streaming with Azure API Management
βž• the Tracing tool to debug and troubleshoot OpenAI APIs using the Azure API Management tracing capability
βž• image processing to the GPT-4o inferencing lab
βž• the Function calling lab with a sample API on Azure Functions

Contents

  1. 🧠 GenAI Gateway
  2. πŸ§ͺ Labs with AI Agents
  3. πŸ§ͺ Labs with the Inference API
  4. πŸ§ͺ Labs based on Azure OpenAI
  5. πŸš€ Getting started
  6. β›΅ Roll-out to production
  7. πŸ”¨ Supporting tools
  8. πŸ›οΈ Well-Architected Framework <!-- markdownlint-disable-line MD051 -->
  9. πŸŽ’ Show and tell
  10. πŸ₯‡ Other Resources

The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.

AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI services.

With the expanding horizons of AI services and their seamless integration with APIs, there is considerable demand for a comprehensive AI Gateway pattern that broadens the core principles of API management, accelerates the experimentation of advanced use cases, and paves the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provide a framework for the confident deployment of Intelligent Apps into production.

🧠 GenAI Gateway

AI-Gateway flow

This repo explores the AI Gateway pattern through a series of experimental labs. The GenAI Gateway capabilities of Azure API Management play a crucial role within these labs, handling AI service APIs with security, reliability, performance, overall operational efficiency, and cost controls. The primary focus is on Azure OpenAI, which sets the standard reference for Large Language Models (LLMs). However, the same principles and design patterns could potentially be applied to any LLM.

Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions backed by Python scripts, Bicep files, and Azure API Management policies:

πŸ§ͺ Labs with AI Agents

<!-- MCP Client Authorization -->

πŸ§ͺ MCP Client Authorization

Playground to experiment with the Model Context Protocol client authorization flow. In this flow, Azure API Management acts both as an OAuth client connecting to the Microsoft Entra ID authorization server and as an OAuth authorization server for the MCP client (the MCP Inspector in this lab).

<img src="images/mcp-client-authorization-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Model Context Protocol (MCP) -->

πŸ§ͺ Model Context Protocol (MCP)

Playground to experiment with the Model Context Protocol through Azure API Management to enable plug-and-play of tools for LLMs. Leverages the credential manager for managing OAuth 2.0 tokens to backend tools, and client token validation to ensure end-to-end authentication and authorization.

<img src="images/model-context-protocol-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- OpenAI Agents -->

πŸ§ͺ OpenAI Agents

Playground to try the OpenAI Agents SDK with Azure OpenAI models and API-based tools controlled by Azure API Management.

<img src="images/openai-agents-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- AI Agent Service -->

πŸ§ͺ AI Agent Service

Use this playground to explore the Azure AI Agent Service, leveraging Azure API Management to control multiple services, including Azure OpenAI models, Logic Apps Workflows, and OpenAPI-based APIs.

<img src="images/ai-agent-service-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Function calling -->

πŸ§ͺ Function calling

Playground to try the OpenAI function calling feature with an Azure Functions API that is also managed by Azure API Management.

<img src="images/function-calling-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

πŸ§ͺ Labs with the Inference API

<!-- AI Foundry Deepseek -->

πŸ§ͺ AI Foundry Deepseek

Playground to try the Deepseek R1 model via the AI Model Inference from Azure AI Foundry. This lab uses the Azure AI Model Inference API and two APIM LLM policies: llm-token-limit and llm-emit-token-metric.

<img src="images/ai-foundry-deepseek-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- SLM self-hosting -->

πŸ§ͺ SLM self-hosting (Phi-3)

Playground to try the self-hosted Phi-3 Small Language Model (SLM) through the Azure API Management self-hosted gateway with OpenAI API compatibility.

<img src="images/slm-self-hosting-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

πŸ§ͺ Labs based on Azure OpenAI

<!--FinOps framework -->

πŸ§ͺ FinOps Framework

This playground leverages the FinOps Framework and Azure API Management to control AI costs. It uses the token limit policy for each product and integrates Azure Monitor alerts with Logic Apps to automatically disable APIM subscriptions that exceed cost quotas.

<img src="images/finops-framework-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Backend pool load balancing -->

πŸ§ͺ Backend pool load balancing - Available with Bicep and Terraform

Playground to try the built-in load balancing backend pool functionality of Azure API Management, distributing requests across either a list of Azure OpenAI endpoints or mock servers.

<img src="images/backend-pool-load-balancing-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Token rate limiting -->

πŸ§ͺ Token rate limiting

Playground to try the token rate limiting policy with one or more Azure OpenAI endpoints. When the token limit is exceeded, the caller receives a 429 Too Many Requests response.

<img src="images/token-rate-limiting-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Token metrics emitting -->

πŸ§ͺ Token metrics emitting

Playground to try the emit token metric policy. The policy sends metrics to Application Insights about consumption of large language model tokens through Azure OpenAI Service APIs.

<img src="images/token-metrics-emitting-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Semantic caching -->

πŸ§ͺ Semantic caching

Playground to try the semantic caching policy. Returns cached completions based on the vector proximity of the prompt to previous requests and a specified similarity score threshold.

<img src="images/semantic-caching-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Access controlling -->

πŸ§ͺ Access controlling

Playground to try the OAuth 2.0 authorization feature with an identity provider to enable more fine-grained access to OpenAI APIs by particular users or clients.

<img src="images/access-controlling-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- zero-to-production -->

πŸ§ͺ Zero-to-Production

Playground to create a combination of several policies in an iterative approach. We start with load balancing, then progressively add token emitting, rate limiting, and, eventually, semantic caching. Each of these sets of policies is derived from other labs in this repo.

<img src="images/zero-to-production-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- GPT-4o inferencing -->

πŸ§ͺ GPT-4o inferencing

Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats.

<img src="images/GPT-4o-inferencing-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Model Routing -->

πŸ§ͺ Model Routing

Playground to try routing to a backend based on Azure OpenAI model and version.

<img src="images/model-routing-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Vector searching -->

πŸ§ͺ Vector searching

Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions.

<img src="images/vector-searching-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Built-in logging -->

πŸ§ͺ Built-in logging

Playground to try the built-in logging capabilities of Azure API Management. Logs requests into Application Insights to track details and token usage.

<img src="images/built-in-logging-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Message storing -->

πŸ§ͺ Message storing

Playground to test storing message details in Cosmos DB through the log-to-eventhub policy. With the policy, we can control which data is stored in the DB (prompt, completion, model, region, tokens, etc.).

<img src="images/message-storing-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Prompt flow -->

πŸ§ͺ Prompt flow

Playground to try the Azure AI Studio Prompt Flow with Azure API Management.

<img src="images/prompt-flow-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Content Filtering -->

πŸ§ͺ Content Filtering

Playground to try integrating Azure API Management with Azure AI Content Safety to filter potentially offensive, risky, or undesirable content.

<img src="images/content-filtering-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

<!-- Prompt Shielding -->

πŸ§ͺ Prompt Shielding

Playground to try Prompt Shields from the Azure AI Content Safety service, which analyzes LLM inputs and detects User Prompt attacks and Document attacks, two common types of adversarial inputs.

<img src="images/content-filtering-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Backlog of Labs

This is a list of potential future labs to be developed.

  • Real Time API
  • Semantic Kernel with Agents
  • Logic Apps RAG
  • PII handling
  • Gemini

[!TIP] Please use the feedback discussion so that we can continuously improve with your experiences, suggestions, ideas, or lab requests.

πŸš€ Getting Started

Prerequisites

Quickstart

  1. Clone this repo and configure your local machine with the prerequisites. Or just create a GitHub Codespace and run it in the browser or in VS Code.
  2. Navigate through the available labs and select one that best suits your needs. For starters, we recommend the token rate limiting lab.
  3. Open the notebook and run the provided steps.
  4. Tailor the experiment according to your requirements. If you wish to contribute to our collective work, we would appreciate your submission of a pull request.

[!NOTE] πŸͺ² Please feel free to open a new issue if you find something that should be fixed or enhanced.

β›΅ Roll-out to production

We recommend the guidelines and best practices from the AI Hub Gateway Landing Zone to implement a central AI API gateway to empower various line-of-business units in an organization to leverage Azure AI services.

πŸ”¨ Supporting Tools

  • The AI-Gateway Mock server is designed to mimic the behavior and responses of the OpenAI API, creating an efficient simulation environment for testing and developing integrations with Azure API Management and other use cases. The app.py can be customized to tailor the mock server to specific scenarios.
  • Tracing - Invokes the OpenAI API with tracing enabled and returns the trace information.
  • Streaming - Invokes the OpenAI API with streaming enabled and returns the response in chunks.

πŸ›οΈ Well-Architected Framework

The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps labs to the Well-Architected Framework pillars to set you up for success through architectural experimentation.

Lab Security Reliability Performance Operations Costs
Request forwarding ⭐
Backend circuit breaking ⭐ ⭐
Backend pool load balancing ⭐ ⭐ ⭐
Advanced load balancing ⭐ ⭐ ⭐
Response streaming ⭐ ⭐
Vector searching ⭐ ⭐ ⭐
Built-in logging ⭐ ⭐ ⭐ ⭐ ⭐
SLM self-hosting ⭐ ⭐

[!TIP] Check the Azure Well-Architected Framework perspective on Azure OpenAI Service for additional guidance.

πŸŽ’ Show and tell

[!TIP] Install the VS Code Reveal extension, open AI-GATEWAY.md and click on 'slides' at the bottom to present the AI Gateway without leaving VS Code. Or just open the AI-GATEWAY.pptx for a plain old PowerPoint experience.

πŸ₯‡ Other resources

Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.

We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.

🌐 WW GBB initiative

GBB

Disclaimer

[!IMPORTANT] This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.
