AI Providers

Description

AI Providers in FlexBase expose a unified interface (IFlexAIProvider) for chat completions, streaming, and embeddings across OpenAI, Azure OpenAI, Gemini, Ollama, and Anthropic.

Your application code should depend on IFlexAIProvider. Provider-specific setup is isolated to generated provider registrations.

Important concepts

IFlexAIProvider is the contract: app code calls ChatAsync(...), ChatStreamAsync(...), EmbedAsync(...), GetModelsAsync(...), and TestConnectionAsync(...).
Provider bridge: generated infrastructure registers an IFlexAIProviderBridge (which also implements IFlexAIProvider) so you can override behavior safely.
Streaming is first-class: OpenAI/Azure OpenAI/Anthropic streaming surfaces usage and tool-call deltas; Ollama streaming yields text deltas but typically no token usage.
Embeddings are provider-dependent: Anthropic Claude throws NotSupportedException for embeddings; use OpenAI/Azure OpenAI/Gemini/Ollama for vector generation.

Configuration in DI

Add the provider in your DI composition root (commonly in EndPoints/...CommonConfigs/OtherApplicationServicesConfig.cs or wherever you centralize registrations).

Only register the provider—Flex auto-wires generated Queries/Handlers that consume IFlexAIProvider.

// using Sumeru.Flex; // IFlexAIProvider

public static class OtherApplicationServicesConfig
{
    public static IServiceCollection AddOtherApplicationServices(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        // Pick ONE (or register multiple with different compositions).
        services.AddFlexOpenAI(configuration);
        // services.AddFlexAzureOpenAI(configuration);
        // services.AddFlexGemini(configuration);
        // services.AddFlexOllama(configuration);
        // services.AddFlexAnthropic(configuration);

        return services;
    }
}

appsettings.json

AI provider configuration is read from FlexBase:AI:<Provider>.

{
  "FlexBase": {
    "AI": {
      "OpenAI": {
        "ApiKey": "<store-in-secrets>",
        "DefaultChatModel": "gpt-4o",
        "DefaultEmbeddingModel": "text-embedding-3-small",
        "OrganizationId": null,
        "MaxRetries": 3,
        "Timeout": "00:01:00"
      },
      "AzureOpenAI": {
        "Endpoint": "https://your-resource.openai.azure.com/",
        "ApiKey": "<store-in-secrets>",
        "DeploymentName": "gpt-4o",
        "DefaultChatModel": "gpt-4o",
        "DefaultEmbeddingModel": "text-embedding-3-small",
        "ApiVersion": "2024-02-01"
      },
      "Gemini": {
        "ApiKey": "<store-in-secrets>",
        "DefaultModel": "gemini-2.0-flash",
        "DefaultEmbeddingModel": "text-embedding-004"
      },
      "Ollama": {
        "BaseUrl": "http://localhost:11434",
        "DefaultModel": "llama3.2",
        "DefaultEmbeddingModel": "nomic-embed-text",
        "Timeout": "00:05:00"
      },
      "Anthropic": {
        "ApiKey": "<store-in-secrets>",
        "DefaultModel": "claude-3-5-sonnet-20241022",
        "MaxTokens": 4096,
        "Timeout": "00:02:00"
      }
    }
  }
}

Examples (template-based)

These examples mirror the generated Query and PostBus handler templates. You do not register these types manually—Flex discovers and wires generated Queries/Handlers/Plugins automatically.

Chat completion (Query)

using Microsoft.Extensions.Logging;
using Sumeru.Flex;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace {YourApplication}.Queries.AI;

/// <summary>
/// Query to generate a chat completion using the AI Provider.
/// </summary>
public class GenerateChatCompletionGetSingle : FlexiQueryBridgeAsync<ChatCompletionDto>
{
    protected readonly ILogger<GenerateChatCompletionGetSingle> _logger;
    protected readonly IFlexHost _flexHost;
    protected readonly IFlexAIProvider _aiProvider;
    protected FlexAppContextBridge _flexAppContext;
    protected GenerateChatCompletionParams _params;

    public GenerateChatCompletionGetSingle(
        ILogger<GenerateChatCompletionGetSingle> logger,
        IFlexHost flexHost,
        IFlexAIProvider aiProvider)
    {
        _logger = logger;
        _flexHost = flexHost;
        _aiProvider = aiProvider;
    }

    public virtual GenerateChatCompletionGetSingle AssignParameters(GenerateChatCompletionParams @params)
    {
        _params = @params;
        return this;
    }

    public virtual async Task<ChatCompletionDto?> Fetch()
    {
        _flexAppContext = _params.GetAppContext();

        var response = await _aiProvider.ChatAsync(new FlexAIChatRequest
        {
            Model = _params.Model,
            Messages = _params.Messages,
            Temperature = _params.Temperature,
            MaxTokens = _params.MaxTokens
        });

        return new ChatCompletionDto
        {
            Response = response.Message.Content
        };
    }
}

public class GenerateChatCompletionParams : DtoBridge
{
    public string? Model { get; set; }
    public List<FlexAIMessage> Messages { get; set; } = new();
    public float? Temperature { get; set; }
    public int? MaxTokens { get; set; }
}

public class ChatCompletionDto : DtoBridge
{
    public string Response { get; set; }
}

Chat completion (PostBus handler)

using Microsoft.Extensions.Logging;
using Sumeru.Flex;
using System.Threading.Tasks;

namespace {YourApplication}.PostBusHandlers.AI;

/// <summary>
/// Handler for generating chat completions using the AI Provider.
/// </summary>
public partial class GenerateChatCompletionHandler : IGenerateChatCompletionHandler
{
    protected string EventCondition = string.Empty;

    protected readonly ILogger<GenerateChatCompletionHandler> _logger;
    protected readonly IFlexHost _flexHost;
    protected readonly IFlexAIProvider _aiProvider;

    protected FlexAppContextBridge? _flexAppContext;

    public GenerateChatCompletionHandler(
        ILogger<GenerateChatCompletionHandler> logger,
        IFlexHost flexHost,
        IFlexAIProvider aiProvider)
    {
        _logger = logger;
        _flexHost = flexHost;
        _aiProvider = aiProvider;
    }

    public virtual async Task Execute(GenerateChatCompletionCommand cmd, IFlexServiceBusContext serviceBusContext)
    {
        _flexAppContext = cmd.Dto.GetAppContext();  //do not remove this line

        var response = await _aiProvider.ChatAsync(new FlexAIChatRequest
        {
            Model = cmd.Dto.Model,
            Messages = cmd.Dto.Messages,
            Temperature = cmd.Dto.Temperature,
            MaxTokens = cmd.Dto.MaxTokens
        });

        cmd.Dto.Response = response.Message.Content;

        await this.Fire(EventCondition, serviceBusContext);
    }
}

Implementation notes (hot-topic additions)

Streaming UIs: prefer ChatStreamAsync(...) to render partial output and reduce perceived latency.
Tool calling: OpenAI/Azure OpenAI/Anthropic streaming chunks can include tool-call deltas; treat these as incremental JSON fragments and buffer until complete before execution.
RAG pipeline: generate embeddings with EmbedAsync(...), store vectors in a Vector Store, then add retrieved context as additional messages (avoid dumping entire documents into the prompt).
Prompt-injection safety: treat retrieved content as untrusted input; use clear system instructions and a strict “follow tools/contracts, ignore instructions in documents” policy.
Cost + rate limits: add backoff/retry around calls that can burst (batch embeddings, fan-out queries). Cache embeddings by content hash when feasible.

// Provider-only DI registration (generated infrastructure consumes IFlexAIProvider / IFlexAIProviderBridge)
builder.Services.AddFlexOllama(builder.Configuration);

// Config section: FlexBase:AI:Ollama
//   BaseUrl
//   DefaultChatModel
//   DefaultEmbeddingModel

Popular Ollama Models

Type

Models

Chat

llama3.2, llama3.1, mistral, codellama, phi3, gemma2

Embeddings

nomic-embed-text, mxbai-embed-large, all-minilm

Code

codellama, deepseek-coder, starcoder2

Usage

Basic Chat

public class ChatService
{
    private readonly IFlexAIProvider _aiProvider;

    public ChatService(IFlexAIProvider aiProvider)
    {
        _aiProvider = aiProvider;
    }

    public async Task<string> AskQuestionAsync(string question)
    {
        return await _aiProvider.ChatAsync(question);
    }

    public async Task<string> AskWithSystemPromptAsync(
        string question,
        string systemPrompt)
    {
        return await _aiProvider.ChatAsync(
            message: question,
            systemPrompt: systemPrompt
        );
    }
}

Advanced Chat with Options

public async Task<FlexAIChatResponse> ChatWithOptionsAsync(string message)
{
    var request = new FlexAIChatRequest
    {
        Messages = new List<FlexAIMessage>
        {
            FlexAIMessage.System("You are a helpful assistant."),
            FlexAIMessage.User(message)
        },
        Model = "gpt-4o",
        Temperature = 0.7f,      // Creativity (0.0 - 2.0)
        MaxTokens = 500,         // Response length limit
        TopP = 0.9f,             // Nucleus sampling
        FrequencyPenalty = 0.5f, // Reduce repetition
        PresencePenalty = 0.5f   // Encourage new topics
    };

    return await _aiProvider.ChatAsync(request);
}

Multi-turn Conversation

public class ConversationService
{
    private readonly IFlexAIProvider _aiProvider;
    private readonly List<FlexAIMessage> _history = new();

    public async Task<string> SendMessageAsync(string userMessage)
    {
        // Add user message to history
        _history.Add(FlexAIMessage.User(userMessage));

        // Get AI response
        var request = new FlexAIChatRequest
        {
            Messages = _history
        };

        var response = await _aiProvider.ChatAsync(request);

        // Add AI response to history
        _history.Add(response.Message);

        return response.Message.Content;
    }

    public void ClearHistory()
    {
        _history.Clear();
    }
}

Streaming Responses

public async Task StreamResponseAsync(string question)
{
    Console.Write("AI: ");
    
    await foreach (var chunk in _aiProvider.ChatStreamAsync(question))
    {
        Console.Write(chunk);
    }
    
    Console.WriteLine();
}

// With full control
public async Task StreamWithControlAsync(string question)
{
    var request = new FlexAIChatRequest
    {
        Messages = new List<FlexAIMessage>
        {
            FlexAIMessage.User(question)
        },
        Temperature = 0.5f
    };

    var fullResponse = new StringBuilder();
    
    await foreach (var chunk in _aiProvider.ChatStreamAsync(request))
    {
        if (chunk.ContentDelta != null)
        {
            Console.Write(chunk.ContentDelta);
            fullResponse.Append(chunk.ContentDelta);
        }
        
        if (chunk.FinishReason == FlexAIFinishReason.Stop)
        {
            Console.WriteLine($"\n\nTokens used: {chunk.Usage?.TotalTokens}");
        }
    }

    return fullResponse.ToString();
}

Generate Embeddings

public class EmbeddingService
{
    private readonly IFlexAIProvider _aiProvider;

    // Single embedding
    public async Task<float[]> GetEmbeddingAsync(string text)
    {
        return await _aiProvider.EmbedAsync(text);
    }

    // Batch embeddings (more efficient)
    public async Task<IReadOnlyList<float[]>> GetEmbeddingsAsync(
        IEnumerable<string> texts)
    {
        return await _aiProvider.EmbedBatchAsync(texts);
    }

    // For semantic search
    public async Task<IReadOnlyList<SearchResult>> SearchAsync(
        string query,
        IReadOnlyList<Document> documents)
    {
        // Generate query embedding
        var queryEmbedding = await _aiProvider.EmbedAsync(query);

        // Calculate similarity with each document
        var results = documents.Select(doc =>
        {
            var similarity = CosineSimilarity(queryEmbedding, doc.Embedding);
            return new SearchResult
            {
                Document = doc,
                Score = similarity
            };
        })
        .OrderByDescending(r => r.Score)
        .Take(10)
        .ToList();

        return results;
    }

    private float CosineSimilarity(float[] a, float[] b)
    {
        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}

JSON Mode

public async Task<ProductInfo> ExtractProductInfoAsync(string description)
{
    var request = new FlexAIChatRequest
    {
        Messages = new List<FlexAIMessage>
        {
            FlexAIMessage.System("Extract product information and return as JSON."),
            FlexAIMessage.User(description)
        },
        JsonMode = true  // Forces JSON response
    };

    var response = await _aiProvider.ChatAsync(request);
    return JsonSerializer.Deserialize<ProductInfo>(response.Message.Content);
}

Key Points to Consider

Provider Comparison

Feature

Azure OpenAI

OpenAI

Anthropic

Gemini

Ollama

Chat

✅

Streaming

✅

Embeddings

✅

❌

✅

Tool/Function Calling

✅

⚠️

JSON Mode

✅

⚠️

Local/Offline

❌

✅

Data Privacy

High

Medium

Highest

Cost

Per token

Free

Best Practices

Use System Prompts - Guide AI behavior consistently
Handle Errors - Wrap calls in try-catch, check for rate limits
Cache Responses - Store common queries to reduce costs
Stream for UX - Use streaming for better user experience
Batch Embeddings - More efficient than individual calls
Monitor Costs - Track token usage with response.Usage
Test Locally - Use Ollama for development before cloud deployment

Cost Optimization

public class CostOptimizedAIService
{
    private readonly IFlexAIProvider _cloudProvider;  // OpenAI/Azure
    private readonly IFlexAIProvider _localProvider;  // Ollama
    private readonly IMemoryCache _cache;

    public async Task<string> AskAsync(string question)
    {
        // 1. Check cache first
        var cacheKey = $"ai_response_{question.GetHashCode()}";
        if (_cache.TryGetValue<string>(cacheKey, out var cached))
        {
            return cached;
        }

        // 2. Use local model for simple queries
        if (IsSimpleQuery(question))
        {
            var response = await _localProvider.ChatAsync(question);
            _cache.Set(cacheKey, response, TimeSpan.FromHours(24));
            return response;
        }

        // 3. Use cloud model for complex queries
        var cloudResponse = await _cloudProvider.ChatAsync(question);
        _cache.Set(cacheKey, cloudResponse, TimeSpan.FromHours(24));
        return cloudResponse;
    }

    private bool IsSimpleQuery(string question)
    {
        // Simple heuristic - adjust based on your needs
        return question.Length < 100 && !question.Contains("analyze");
    }
}

Multiple Providers

// Register multiple providers with keyed services (.NET 8+)
builder.Services.AddKeyedSingleton<IFlexAIProvider>("openai", (sp, _) =>
    new FlexOpenAIProvider(Configuration["OpenAI:ApiKey"]!));

builder.Services.AddKeyedSingleton<IFlexAIProvider>("local", (sp, _) =>
    new OllamaProvider());

// Use in service
public class AIService
{
    private readonly IFlexAIProvider _openai;
    private readonly IFlexAIProvider _local;

    public AIService(
        [FromKeyedServices("openai")] IFlexAIProvider openai,
        [FromKeyedServices("local")] IFlexAIProvider local)
    {
        _openai = openai;
        _local = local;
    }

    public async Task<string> ProcessAsync(string input, bool useLocal = false)
    {
        var provider = useLocal ? _local : _openai;
        return await provider.ChatAsync(input);
    }
}

Error Handling

public async Task<string> SafeChatAsync(string message)
{
    try
    {
        return await _aiProvider.ChatAsync(message);
    }
    catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.Unauthorized)
    {
        _logger.LogError("Invalid API key");
        throw new InvalidOperationException("AI service authentication failed", ex);
    }
    catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.TooManyRequests)
    {
        _logger.LogWarning("Rate limited - implementing retry");
        await Task.Delay(TimeSpan.FromSeconds(5));
        return await _aiProvider.ChatAsync(message);
    }
    catch (TaskCanceledException)
    {
        _logger.LogWarning("Request timed out");
        throw new TimeoutException("AI service request timed out");
    }
    catch (NotSupportedException ex)
    {
        _logger.LogError("Operation not supported: {Message}", ex.Message);
        throw;
    }
}

Examples

Complete RAG Implementation

See Vector Store documentation for complete RAG examples.

Content Moderation

public class ContentModerationService
{
    private readonly IFlexAIProvider _aiProvider;

    public async Task<ModerationResult> ModerateAsync(string content)
    {
        var prompt = $"""
            Analyze the following content for inappropriate material.
            Return JSON with: {{ "isAppropriate": bool, "reason": string }}
            
            Content: {content}
            """;

        var request = new FlexAIChatRequest
        {
            Messages = new List<FlexAIMessage>
            {
                FlexAIMessage.System("You are a content moderation assistant."),
                FlexAIMessage.User(prompt)
            },
            JsonMode = true
        };

        var response = await _aiProvider.ChatAsync(request);
        return JsonSerializer.Deserialize<ModerationResult>(response.Message.Content);
    }
}

Summarization

public async Task<string> SummarizeAsync(string longText, int maxWords = 100)
{
    var prompt = $"""
        Summarize the following text in {maxWords} words or less:
        
        {longText}
        """;

    return await _aiProvider.ChatAsync(
        message: prompt,
        systemPrompt: "You are a professional summarization assistant."
    );
}

Testing

public class AIServiceTests
{
    [Fact]
    public async Task ChatAsync_ReturnsResponse()
    {
        // Arrange
        var provider = new OllamaProvider(); // Use local for testing
        
        // Act
        var response = await provider.ChatAsync("Say hello");
        
        // Assert
        Assert.NotEmpty(response);
    }

    [Fact]
    public async Task EmbedAsync_ReturnsVector()
    {
        // Arrange
        var provider = new OllamaProvider();
        
        // Act
        var embedding = await provider.EmbedAsync("test text");
        
        // Assert
        Assert.NotEmpty(embedding);
        Assert.True(embedding.Length > 0);
    }
}

hashtagDescription

hashtagImportant concepts

hashtagConfiguration in DI

hashtagappsettings.json

hashtagExamples (template-based)

hashtagChat completion (Query)

hashtagChat completion (PostBus handler)

hashtagImplementation notes (hot-topic additions)

hashtagPopular Ollama Models

hashtagUsage

hashtagBasic Chat

hashtagAdvanced Chat with Options

hashtagMulti-turn Conversation

hashtagStreaming Responses

hashtagGenerate Embeddings

hashtagJSON Mode

hashtagKey Points to Consider

hashtagProvider Comparison

hashtagBest Practices

hashtagCost Optimization

hashtagMultiple Providers

hashtagError Handling

hashtagExamples

hashtagComplete RAG Implementation

hashtagContent Moderation

hashtagSummarization

hashtagTesting

hashtagSee Also