Azure AI Foundry
Microsoft's unified AI platform (formerly Azure AI Studio). Access GPT-4, GPT-4 Turbo, GPT-3.5 via Azure OpenAI Service. Custom models, prompt flow, evaluation tools. Enterprise-grade security and compliance. Integrated with Microsoft ecosystem.
2026 Update
GPT-4 Turbo with 128k context. DALL-E 3 for image generation. Whisper for speech-to-text. Custom GPTs in Azure. Content safety built-in. Private endpoints standard. On Your Data feature for RAG without code.
Quick Hits
# Install Azure SDK
pip install openai azure-identity
# Azure OpenAI with GPT-4
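from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
# The client expects a callable that returns a bearer token string,
# so wrap the credential instead of passing get_token directly
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = AzureOpenAI(
    api_version="2024-02-15-preview",
    azure_endpoint="https://myresource.openai.azure.com",
    azure_ad_token_provider=token_provider  # (1)!
)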
response = client.chat.completions.create(
model="gpt-4-turbo", # (2)!
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain containerization"}
],
max_tokens=800,
temperature=0.7
)
print(response.choices[0].message.content)
- Managed Identity auth (recommended), or use API key
- Model is deployment name (created in Azure portal)
# Streaming responses
response = client.chat.completions.create(
model="gpt-4-turbo",
messages=[{"role": "user", "content": "Write a poem"}],
stream=True # (1)!
)
for chunk in response:
    # Azure can send chunks with an empty choices list (content filter metadata)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
- Streaming recommended for better UX (token-by-token)
# Function calling (tools)
import json
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}]  # (1)!
response = client.chat.completions.create(
model="gpt-4-turbo",
messages=[{"role": "user", "content": "What's the weather in NYC?"}],
tools=tools,
tool_choice="auto" # (2)!
)
# Check if function called
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_args = json.loads(tool_call.function.arguments)
    print(f"Calling {tool_call.function.name} with {function_args}")  # (3)!
- Define functions GPT can call (JSON schema)
- `auto` lets the model decide, `none` disables tool calls, `required` forces one
- Your code executes the function and sends the result back to GPT
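The last annotation above (your code runs the function and sends the result back) could look roughly like the sketch below. It is an illustration under stated assumptions, not the SDK's own mechanism: `get_weather` is a stub returning made-up data, and `TOOL_REGISTRY` and `run_tool_call` are hypothetical names. The returned dict uses the `role: "tool"` message shape that the chat completions API accepts for tool results.

```python
import json

def get_weather(location, unit="celsius"):
    """Stub (hypothetical): a real version would query a weather service."""
    return {"location": location, "temperature": 22, "unit": unit}

# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_call(tool_call_id, name, arguments_json):
    """Run one requested tool and build the follow-up 'tool' message."""
    if name not in TOOL_REGISTRY:
        result = {"error": f"unknown tool: {name}"}
    else:
        # The model returns its arguments as a JSON string.
        args = json.loads(arguments_json)
        result = TOOL_REGISTRY[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }

message = run_tool_call("call_1", "get_weather", '{"location": "NYC"}')
print(message["content"])
```

To finish the round trip, append the assistant message that contained `tool_calls`, then this tool message, to `messages` and call `client.chat.completions.create` again so the model can write the final answer.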
# On Your Data (RAG without infrastructure)
response = client.chat.completions.create(
model="gpt-4-turbo",
messages=[{"role": "user", "content": "What are our Q4 results?"}],
    extra_body={
        "data_sources": [{
            "type": "azure_search",  # (1)!
            "parameters": {
                "endpoint": "https://mysearch.search.windows.net",
                "index_name": "financial-docs",
                "authentication": {
                    "type": "api_key",
                    "key": "search-key"
                }
            }
        }]
    }
)
print(response.choices[0].message.content)
print("\nCitations:", response.choices[0].message.context) # (2)!
- Azure AI Search integration for RAG (automatic)
- Citations show source documents
Real talk:
- Azure OpenAI is GPT-3.5/4 with enterprise features
- Pay per token (input + output), no minimum spend
- GPT-4 Turbo: $10/1M input, $30/1M output tokens
- Regional quotas (TPM - tokens per minute)
- Managed Identity better than API keys
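Per-token pricing is easy to turn into a budget estimate. The sketch below uses the GPT-4 Turbo prices quoted above; the workload (1,500 prompt tokens and 800 completion tokens per call) and the name `estimate_cost` are assumptions chosen only for illustration.

```python
# USD per 1M tokens, taken from the GPT-4 Turbo figures quoted above.
INPUT_PRICE_PER_M = 10.0
OUTPUT_PRICE_PER_M = 30.0

def estimate_cost(input_tokens, output_tokens):
    """Return the estimated USD cost of a single request."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000

# Hypothetical workload: 1,500 prompt tokens and 800 completion tokens per call.
per_call = estimate_cost(1_500, 800)
print(f"per call: ${per_call:.4f}")
print(f"per 10,000 calls: ${per_call * 10_000:.2f}")
```

With these numbers one call costs about $0.039, so output tokens dominate the bill even though there are fewer of them.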
# Prompt Flow (visual workflow designer)
# Define in Azure portal, execute via SDK
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient(
DefaultAzureCredential(),
subscription_id="sub-id",
resource_group_name="rg-name",
workspace_name="workspace-name"
)
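# Execute prompt flow
# MLClient has no prompt flow invoke method; this assumes the flow has been
# deployed as a managed online endpoint named "customer-support-flow"
import json
with open("request.json", "w") as f:
    json.dump({"question": "How do I reset my password?"}, f)
result = ml_client.online_endpoints.invoke(
    endpoint_name="customer-support-flow",  # (1)!
    request_file="request.json"
)
print(json.loads(result)["answer"])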
- Flows created in Azure AI Foundry UI (drag-and-drop)
# Content Safety (moderation)
from azure.ai.contentsafety import ContentSafetyClient
from azure.core.credentials import AzureKeyCredential
safety_client = ContentSafetyClient(
endpoint="https://mysafety.cognitiveservices.azure.com",
credential=AzureKeyCredential("key")
)
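from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
result = safety_client.analyze_text(
    AnalyzeTextOptions(text="User input text here")
)  # (1)!
# Check severity (0-6, higher = more severe)
# azure-ai-contentsafety 1.0.0+ returns one entry per category in categories_analysis
hate = next(item for item in result.categories_analysis if item.category == TextCategory.HATE)
if hate.severity > 4:
    print("Content blocked: hate speech")  # (2)!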
- Analyzes text for hate, violence, self-harm, sexual content
- Severity thresholds configurable per category
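Per-category thresholds can be kept in a small table and applied to whatever severities the service returns. The sketch below is one possible shape, not part of the Content Safety SDK: the threshold values, `DEFAULT_THRESHOLD`, and `blocked_categories` are all hypothetical, and the input is a plain dict of category name to severity so the logic can be tested offline.

```python
# Hypothetical limits: block when severity is at or above the listed value.
BLOCK_THRESHOLDS = {"Hate": 4, "Violence": 4, "SelfHarm": 2, "Sexual": 4}

# Categories missing from the table fall back to a strict limit.
DEFAULT_THRESHOLD = 2

def blocked_categories(severities, thresholds=BLOCK_THRESHOLDS):
    """Return the sorted category names whose severity meets its threshold."""
    return sorted(
        category
        for category, severity in severities.items()
        if severity >= thresholds.get(category, DEFAULT_THRESHOLD)
    )

# Made-up severities for illustration only.
sample = {"Hate": 0, "Violence": 4, "SelfHarm": 2, "Sexual": 0}
print(blocked_categories(sample))
```

Keeping the limits in data rather than in `if` statements makes it easier to loosen or tighten one category after testing for over-filtering.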
# Custom model fine-tuning
# 1. Prepare training data (JSONL format)
# 2. Upload training and validation files to the Azure OpenAI resource
# 3. Create fine-tuning job
# Uses the AzureOpenAI client created earlier; file names are placeholders
training = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
validation = client.files.create(file=open("val.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",  # base model; Azure lists it under names like gpt-35-turbo
    training_file=training.id,
    validation_file=validation.id,
    hyperparameters={
        "n_epochs": 3,
        "batch_size": 8,
        "learning_rate_multiplier": 0.1
    }
)  # (1)!
- Fine-tuning creates custom deployment (use like base model)
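Step 1 above (prepare training data in JSONL) means one JSON object per line, each holding a `messages` list in the same chat format used for inference. The helper below is a minimal sketch; `to_jsonl`, the system prompt, and the example pair are hypothetical.

```python
import json

def to_jsonl(examples, system_prompt):
    """Serialize (question, answer) pairs into chat-format JSONL text."""
    lines = []
    for question, answer in examples:
        record = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        lines.append(json.dumps(record))
    # One record per line, with a trailing newline.
    return "\n".join(lines) + "\n"

# Hypothetical training pair for illustration.
examples = [("How do I reset my password?", "Open Settings, then choose Reset password.")]
print(to_jsonl(examples, "You are a support assistant."), end="")
```

Writing the returned text to `train.jsonl` gives a file in the shape the fine-tuning job expects; a real dataset needs many more examples than this.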
# Batch processing (cost-effective for bulk)
# Upload JSONL with multiple requests
batch_input = [
{"custom_id": "req-1", "method": "POST", "url": "/chat/completions", "body": {...}},
{"custom_id": "req-2", "method": "POST", "url": "/chat/completions", "body": {...}}
]
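# Submit batch job (50% discount vs real-time)
# input_file_id is the ID returned by client.files.create(..., purpose="batch")
batch_job = client.batches.create(
    input_file_id="file-123",
    endpoint="/chat/completions",  # must match the "url" used in the JSONL requests
    completion_window="24h"  # (1)!
)
# Check status later; results live in the output file once the job completes
batch_job = client.batches.retrieve(batch_job.id)
if batch_job.status == "completed":
    results = client.files.content(batch_job.output_file_id).text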
- Batch jobs complete within 24 hours, much cheaper
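The batch input file is JSONL with one request per line, in the `custom_id` / `method` / `url` / `body` shape shown above. The sketch below builds those lines from a list of prompts; `build_batch_lines` is a hypothetical helper, and the body contents (deployment name in `model`, a single user message) are assumptions about what a minimal request needs.

```python
import json

def build_batch_lines(prompts, deployment):
    """Return one JSON string per request, ready to be joined into a JSONL file."""
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"req-{i}",  # used to match results back to requests
            "method": "POST",
            "url": "/chat/completions",
            "body": {
                "model": deployment,  # assumed: the deployment name
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines

lines = build_batch_lines(["Summarize doc A", "Summarize doc B"], "gpt-4-turbo")
print("\n".join(lines))
```

Results do not necessarily come back in submission order, which is why each request carries its own `custom_id`.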
// C# SDK (first-class Azure support)
// Uses the Azure.AI.OpenAI 1.0.0-beta API; 2.x replaces OpenAIClient with AzureOpenAIClient
using Azure.AI.OpenAI;
using Azure.Identity;
var client = new OpenAIClient(
new Uri("https://myresource.openai.azure.com"),
new DefaultAzureCredential() // (1)!
);
var chatOptions = new ChatCompletionsOptions
{
DeploymentName = "gpt-4-turbo",
Messages =
{
new ChatRequestSystemMessage("You are helpful"),
new ChatRequestUserMessage("Explain async/await")
},
MaxTokens = 800
};
var response = await client.GetChatCompletionsAsync(chatOptions);
Console.WriteLine(response.Value.Choices[0].Message.Content); // (2)!
- Managed Identity seamless in Azure environment
- C# SDK is excellent (Microsoft's primary language)
Why this works:
- Enterprise features out of box (compliance, security)
- Managed Identity eliminates credential management
- Content Safety prevents brand risk
- On Your Data simplifies RAG (no vector DB setup)
- Prompt Flow enables non-developers to build AI apps
Best Practices
- Managed Identity - Use instead of API keys (better security)
- Content Safety - Always filter user input/output in production
- Private endpoints - Keep traffic within Azure network
- Prompt Flow - Visual workflows easier to maintain than code
- Regional deployment - Deploy near users (latency)
- Quotas - Request TPM increases early (process takes days)
- Monitoring - Use Application Insights for observability
Security
- Private endpoints - Disable public access for production
- RBAC - Use Cognitive Services OpenAI User role (least privilege for inference)
- Network isolation - VNet integration for sensitive workloads
- Data residency - Choose region for compliance (EU, US)
- Audit logs - Enable diagnostics for compliance
- Content filtering - Required for responsible AI
Performance
- Streaming - Reduces time to first token significantly
- Batch API - 50% discount for non-real-time workloads
- Caching - Cache common responses (Redis, Cosmos DB)
- Regional selection - East US 2 has most capacity
- Model selection - GPT-3.5 Turbo for speed, GPT-4 for quality
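The caching tip above can be prototyped before bringing in Redis or Cosmos DB. In this sketch an in-memory dict stands in for the external store, and `cache_key`, `cached_completion`, and `fake_model` are hypothetical names. Caching identical requests mainly makes sense when repeated answers are acceptable, for example with a low temperature.

```python
import hashlib
import json

# In-memory stand-in for Redis or Cosmos DB.
_cache = {}

def cache_key(model, messages):
    """Derive a stable key from the deployment name and the full message list."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_completion(model, messages, call_model):
    """Return a cached answer, calling call_model(model, messages) only on a miss."""
    key = cache_key(model, messages)
    if key not in _cache:
        _cache[key] = call_model(model, messages)
    return _cache[key]

def fake_model(model, messages):
    """Stand-in for a real chat completion call (hypothetical)."""
    fake_model.calls += 1
    return f"answer to: {messages[-1]['content']}"

fake_model.calls = 0

msgs = [{"role": "user", "content": "Explain containerization"}]
first = cached_completion("gpt-4-turbo", msgs, fake_model)
second = cached_completion("gpt-4-turbo", msgs, fake_model)
print(first == second, fake_model.calls)
```

A production version would also set an expiry on each entry so stale answers age out.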
Gotchas
- Quotas (TPM) - Per-region, per-model limits (request increase)
- Model names - Deployment names, not OpenAI model names
- API versions - Preview features in specific API versions
- Rate limits - TPM can be hit under load (implement backoff)
- Fine-tuning - Only available for GPT-3.5 Turbo (not GPT-4)
- On Your Data - Requires Azure AI Search (additional cost)
- Content Safety - Can over-filter, test thresholds carefully
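The rate-limit gotcha above calls for retries with backoff. Below is a minimal sketch of exponential backoff with jitter. `RateLimited` is a stand-in exception so the example runs offline (with the `openai` package the error to catch would be `openai.RateLimitError`), and `with_backoff` and its defaults are hypothetical choices, not recommended values.

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for an HTTP 429 error from the service."""

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Double the wait each attempt and add jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)

# Demo: a call that is rate limited twice before succeeding.
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"

print(with_backoff(flaky_call, base_delay=0.01))
```

The `sleep` parameter exists so tests can replace the real delay; in production the default `time.sleep` is fine.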
Learning Resources
Official Docs
- Azure AI Foundry Documentation - Complete guide
- Azure OpenAI Service - API reference
- Pricing Calculator - Cost estimation
Key Features
- Azure OpenAI Service - GPT-4, GPT-3.5, DALL-E, Whisper
- Prompt Flow - Visual workflow designer for AI apps
- Content Safety - Moderation, PII detection
- On Your Data - RAG with Azure AI Search integration
- Custom models - Fine-tuning GPT-3.5
- Responsible AI - Built-in tools for ethical AI
Model Pricing (2026)
| Model | Input $ / 1M tokens | Output $ / 1M tokens | Context Window |
|---|---|---|---|
| GPT-4 Turbo | $10 | $30 | 128k |
| GPT-4 | $30 | $60 | 8k |
| GPT-3.5 Turbo | $0.50 | $1.50 | 16k |
| GPT-4o (omni) | $5 | $15 | 128k |
| DALL-E 3 | $0.040/image (1024x1024) | - | - |
Last Updated: 2026-02-02 | Vibe Check: Enterprise Standard - Best for Microsoft shops. Managed Identity is great. Content Safety essential. On Your Data simplifies RAG. Quotas can be limiting. Excellent C# support.
Tags: azure-ai-foundry, ai, llm, gpt-4, microsoft