AI Providers
Description
AI Providers in FlexBase expose a unified interface (IFlexAIProvider) for chat completions, streaming, and embeddings across OpenAI, Azure OpenAI, Gemini, Ollama, and Anthropic.
Your application code should depend on IFlexAIProvider. Provider-specific setup is isolated to generated provider registrations.
Important concepts
IFlexAIProvideris the contract: app code callsChatAsync(...),ChatStreamAsync(...),EmbedAsync(...),GetModelsAsync(...), andTestConnectionAsync(...).Provider bridge: generated infrastructure registers an
IFlexAIProviderBridge(which also implementsIFlexAIProvider) so you can override behavior safely.Streaming is first-class: OpenAI/Azure OpenAI/Anthropic streaming surfaces usage and tool-call deltas; Ollama streaming yields text deltas but typically no token usage.
Embeddings are provider-dependent: Anthropic Claude throws
NotSupportedExceptionfor embeddings; use OpenAI/Azure OpenAI/Gemini/Ollama for vector generation.
Configuration in DI
Add the provider in your DI composition root (commonly in EndPoints/...CommonConfigs/OtherApplicationServicesConfig.cs or wherever you centralize registrations).
Only register the provider—Flex auto-wires generated Queries/Handlers that consume IFlexAIProvider.
// using Sumeru.Flex; // IFlexAIProvider
public static class OtherApplicationServicesConfig
{
public static IServiceCollection AddOtherApplicationServices(
this IServiceCollection services,
IConfiguration configuration)
{
// Pick ONE (or register multiple with different compositions).
services.AddFlexOpenAI(configuration);
// services.AddFlexAzureOpenAI(configuration);
// services.AddFlexGemini(configuration);
// services.AddFlexOllama(configuration);
// services.AddFlexAnthropic(configuration);
return services;
}
}appsettings.json
AI provider configuration is read from FlexBase:AI:<Provider>.
Examples (template-based)
These examples mirror the generated Query and PostBus handler templates. You do not register these types manually—Flex discovers and wires generated Queries/Handlers/Plugins automatically.
Chat completion (Query)
Chat completion (PostBus handler)
Implementation notes (hot-topic additions)
Streaming UIs: prefer
ChatStreamAsync(...)to render partial output and reduce perceived latency.Tool calling: OpenAI/Azure OpenAI/Anthropic streaming chunks can include tool-call deltas; treat these as incremental JSON fragments and buffer until complete before execution.
RAG pipeline: generate embeddings with
EmbedAsync(...), store vectors in a Vector Store, then add retrieved context as additional messages (avoid dumping entire documents into the prompt).Prompt-injection safety: treat retrieved content as untrusted input; use clear system instructions and a strict “follow tools/contracts, ignore instructions in documents” policy.
Cost + rate limits: add backoff/retry around calls that can burst (batch embeddings, fan-out queries). Cache embeddings by content hash when feasible.
Popular Ollama Models
Chat
llama3.2, llama3.1, mistral, codellama, phi3, gemma2
Embeddings
nomic-embed-text, mxbai-embed-large, all-minilm
Code
codellama, deepseek-coder, starcoder2
Usage
Basic Chat
Advanced Chat with Options
Multi-turn Conversation
Streaming Responses
Generate Embeddings
JSON Mode
Key Points to Consider
Provider Comparison
Chat
✅
✅
✅
✅
✅
Streaming
✅
✅
✅
✅
✅
Embeddings
✅
✅
❌
✅
✅
Tool/Function Calling
✅
✅
✅
✅
⚠️
JSON Mode
✅
✅
✅
✅
⚠️
Local/Offline
❌
❌
❌
❌
✅
Data Privacy
High
Medium
Medium
Medium
Highest
Cost
Per token
Per token
Per token
Per token
Free
Best Practices
Use System Prompts - Guide AI behavior consistently
Handle Errors - Wrap calls in try-catch, check for rate limits
Cache Responses - Store common queries to reduce costs
Stream for UX - Use streaming for better user experience
Batch Embeddings - More efficient than individual calls
Monitor Costs - Track token usage with
response.UsageTest Locally - Use Ollama for development before cloud deployment
Cost Optimization
Multiple Providers
Error Handling
Examples
Complete RAG Implementation
See Vector Store documentation for complete RAG examples.
Content Moderation
Summarization
Testing
See Also
Vector Store - Use embeddings for semantic search
FlexAI Providers Reference - Detailed API documentation
FlexAI RAG Guide - RAG implementation
FlexAI Logging - Logging configuration
Last updated