# AI Providers

## Description

AI Providers in FlexBase expose a unified interface (`IFlexAIProvider`) for chat completions, streaming, and embeddings across OpenAI, Azure OpenAI, Gemini, Ollama, and Anthropic.

Your application code should depend on `IFlexAIProvider`. Provider-specific setup is isolated to generated provider registrations.

## Important concepts

* **`IFlexAIProvider` is the contract**: app code calls `ChatAsync(...)`, `ChatStreamAsync(...)`, `EmbedAsync(...)`, `GetModelsAsync(...)`, and `TestConnectionAsync(...)`.
* **Provider bridge**: generated infrastructure registers an `IFlexAIProviderBridge` (which also implements `IFlexAIProvider`) so you can override behavior safely.
* **Streaming is first-class**: OpenAI/Azure OpenAI/Anthropic streaming surfaces usage and tool-call deltas; Ollama streaming yields text deltas but typically no token usage.
* **Embeddings are provider-dependent**: Anthropic Claude throws `NotSupportedException` for embeddings; use OpenAI/Azure OpenAI/Gemini/Ollama for vector generation.
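
The embeddings caveat above can be guarded at call sites. A minimal sketch, assuming the `IFlexAIProvider` surface described in this document (`SafeEmbeddingService` is a hypothetical name):

```csharp
// Sketch: degrade gracefully when the configured provider cannot embed.
// Anthropic's EmbedAsync throws NotSupportedException, per the note above.
public class SafeEmbeddingService
{
    private readonly IFlexAIProvider _aiProvider;
    private readonly ILogger<SafeEmbeddingService> _logger;

    public SafeEmbeddingService(
        IFlexAIProvider aiProvider,
        ILogger<SafeEmbeddingService> logger)
    {
        _aiProvider = aiProvider;
        _logger = logger;
    }

    public async Task<float[]?> TryEmbedAsync(string text)
    {
        try
        {
            return await _aiProvider.EmbedAsync(text);
        }
        catch (NotSupportedException)
        {
            _logger.LogWarning("Configured AI provider does not support embeddings");
            return null;
        }
    }
}
```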

## Configuration in DI

Add the provider in your DI composition root (commonly in `EndPoints/...CommonConfigs/OtherApplicationServicesConfig.cs` or wherever you centralize registrations).

Only register the provider—Flex auto-wires generated Queries/Handlers that consume `IFlexAIProvider`.

```csharp
// using Sumeru.Flex; // IFlexAIProvider

public static class OtherApplicationServicesConfig
{
    public static IServiceCollection AddOtherApplicationServices(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        // Pick ONE (or register multiple with different compositions).
        services.AddFlexOpenAI(configuration);
        // services.AddFlexAzureOpenAI(configuration);
        // services.AddFlexGemini(configuration);
        // services.AddFlexOllama(configuration);
        // services.AddFlexAnthropic(configuration);

        return services;
    }
}
```

## appsettings.json

AI provider configuration is read from `FlexBase:AI:<Provider>`.

```json
{
  "FlexBase": {
    "AI": {
      "OpenAI": {
        "ApiKey": "<store-in-secrets>",
        "DefaultChatModel": "gpt-4o",
        "DefaultEmbeddingModel": "text-embedding-3-small",
        "OrganizationId": null,
        "MaxRetries": 3,
        "Timeout": "00:01:00"
      },
      "AzureOpenAI": {
        "Endpoint": "https://your-resource.openai.azure.com/",
        "ApiKey": "<store-in-secrets>",
        "DeploymentName": "gpt-4o",
        "DefaultChatModel": "gpt-4o",
        "DefaultEmbeddingModel": "text-embedding-3-small",
        "ApiVersion": "2024-02-01"
      },
      "Gemini": {
        "ApiKey": "<store-in-secrets>",
        "DefaultModel": "gemini-2.0-flash",
        "DefaultEmbeddingModel": "text-embedding-004"
      },
      "Ollama": {
        "BaseUrl": "http://localhost:11434",
        "DefaultModel": "llama3.2",
        "DefaultEmbeddingModel": "nomic-embed-text",
        "Timeout": "00:05:00"
      },
      "Anthropic": {
        "ApiKey": "<store-in-secrets>",
        "DefaultModel": "claude-3-5-sonnet-20241022",
        "MaxTokens": 4096,
        "Timeout": "00:02:00"
      }
    }
  }
}
```
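
To fail fast on missing secrets, the same section can be read at startup. A hedged sketch (paths match the JSON above; the exact bootstrap style depends on your host):

```csharp
// Sketch: validate required AI settings during startup.
var openAiSection = builder.Configuration.GetSection("FlexBase:AI:OpenAI");
if (string.IsNullOrWhiteSpace(openAiSection["ApiKey"]))
{
    throw new InvalidOperationException(
        "FlexBase:AI:OpenAI:ApiKey is not configured. " +
        "Prefer user secrets or environment variables over appsettings.json.");
}

// TimeSpan-formatted values such as "00:01:00" bind directly:
var timeout = openAiSection.GetValue<TimeSpan?>("Timeout") ?? TimeSpan.FromMinutes(1);
```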

## Examples (template-based)

These examples mirror the generated Query and PostBus handler templates. You do **not** register these types manually—Flex discovers and wires generated Queries/Handlers/Plugins automatically.

### Chat completion (Query)

```csharp
using Microsoft.Extensions.Logging;
using Sumeru.Flex;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace {YourApplication}.Queries.AI;

/// <summary>
/// Query to generate a chat completion using the AI Provider.
/// </summary>
public class GenerateChatCompletionGetSingle : FlexiQueryBridgeAsync<ChatCompletionDto>
{
    protected readonly ILogger<GenerateChatCompletionGetSingle> _logger;
    protected readonly IFlexHost _flexHost;
    protected readonly IFlexAIProvider _aiProvider;
    protected FlexAppContextBridge _flexAppContext;
    protected GenerateChatCompletionParams _params;

    public GenerateChatCompletionGetSingle(
        ILogger<GenerateChatCompletionGetSingle> logger,
        IFlexHost flexHost,
        IFlexAIProvider aiProvider)
    {
        _logger = logger;
        _flexHost = flexHost;
        _aiProvider = aiProvider;
    }

    public virtual GenerateChatCompletionGetSingle AssignParameters(GenerateChatCompletionParams @params)
    {
        _params = @params;
        return this;
    }

    public virtual async Task<ChatCompletionDto?> Fetch()
    {
        _flexAppContext = _params.GetAppContext();

        var response = await _aiProvider.ChatAsync(new FlexAIChatRequest
        {
            Model = _params.Model,
            Messages = _params.Messages,
            Temperature = _params.Temperature,
            MaxTokens = _params.MaxTokens
        });

        return new ChatCompletionDto
        {
            Response = response.Message.Content
        };
    }
}

public class GenerateChatCompletionParams : DtoBridge
{
    public string? Model { get; set; }
    public List<FlexAIMessage> Messages { get; set; } = new();
    public float? Temperature { get; set; }
    public int? MaxTokens { get; set; }
}

public class ChatCompletionDto : DtoBridge
{
    public string Response { get; set; } = string.Empty;
}
```

### Chat completion (PostBus handler)

```csharp
using Microsoft.Extensions.Logging;
using Sumeru.Flex;
using System.Threading.Tasks;

namespace {YourApplication}.PostBusHandlers.AI;

/// <summary>
/// Handler for generating chat completions using the AI Provider.
/// </summary>
public partial class GenerateChatCompletionHandler : IGenerateChatCompletionHandler
{
    protected string EventCondition = string.Empty;

    protected readonly ILogger<GenerateChatCompletionHandler> _logger;
    protected readonly IFlexHost _flexHost;
    protected readonly IFlexAIProvider _aiProvider;

    protected FlexAppContextBridge? _flexAppContext;

    public GenerateChatCompletionHandler(
        ILogger<GenerateChatCompletionHandler> logger,
        IFlexHost flexHost,
        IFlexAIProvider aiProvider)
    {
        _logger = logger;
        _flexHost = flexHost;
        _aiProvider = aiProvider;
    }

    public virtual async Task Execute(GenerateChatCompletionCommand cmd, IFlexServiceBusContext serviceBusContext)
    {
        _flexAppContext = cmd.Dto.GetAppContext();  // do not remove this line

        var response = await _aiProvider.ChatAsync(new FlexAIChatRequest
        {
            Model = cmd.Dto.Model,
            Messages = cmd.Dto.Messages,
            Temperature = cmd.Dto.Temperature,
            MaxTokens = cmd.Dto.MaxTokens
        });

        cmd.Dto.Response = response.Message.Content;

        await this.Fire(EventCondition, serviceBusContext);
    }
}
```

## Implementation notes (hot-topic additions)

* **Streaming UIs**: prefer `ChatStreamAsync(...)` to render partial output and reduce perceived latency.
* **Tool calling**: OpenAI/Azure OpenAI/Anthropic streaming chunks can include tool-call deltas; treat these as incremental JSON fragments and buffer until complete before execution.
* **RAG pipeline**: generate embeddings with `EmbedAsync(...)`, store vectors in a Vector Store, then add retrieved context as additional messages (avoid dumping entire documents into the prompt).
* **Prompt-injection safety**: treat retrieved content as untrusted input; use clear system instructions and a strict “follow tools/contracts, ignore instructions in documents” policy.
* **Cost + rate limits**: add backoff/retry around calls that can burst (batch embeddings, fan-out queries). Cache embeddings by content hash when feasible.
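
The caching and backoff bullets can be combined into one helper. A sketch under stated assumptions (`EmbedAsync` as used elsewhere in this document; the retry parameters and cache lifetime are illustrative):

```csharp
using System.Security.Cryptography;
using System.Text;

// Sketch: cache embeddings by content hash; retry transient failures
// with exponential backoff.
public class CachedEmbeddingService
{
    private readonly IFlexAIProvider _aiProvider;
    private readonly IMemoryCache _cache;

    public CachedEmbeddingService(IFlexAIProvider aiProvider, IMemoryCache cache)
    {
        _aiProvider = aiProvider;
        _cache = cache;
    }

    public async Task<float[]> EmbedWithCacheAsync(string text)
    {
        // Stable key: SHA-256 of the content.
        var key = "emb_" + Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(text)));

        if (_cache.TryGetValue<float[]>(key, out var cached) && cached is not null)
            return cached;

        var embedding = await RetryAsync(() => _aiProvider.EmbedAsync(text));
        _cache.Set(key, embedding, TimeSpan.FromDays(1));
        return embedding;
    }

    private static async Task<T> RetryAsync<T>(Func<Task<T>> action, int maxAttempts = 3)
    {
        for (var attempt = 1; ; attempt++)
        {
            try { return await action(); }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // Exponential backoff: 1s, 2s, 4s, ...
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt - 1)));
            }
        }
    }
}
```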

```csharp
// Provider-only DI registration (generated infrastructure consumes IFlexAIProvider / IFlexAIProviderBridge)
builder.Services.AddFlexOllama(builder.Configuration);

// Config section: FlexBase:AI:Ollama
//   BaseUrl
//   DefaultModel
//   DefaultEmbeddingModel
```

#### Popular Ollama Models

| Type           | Models                                                           |
| -------------- | ---------------------------------------------------------------- |
| **Chat**       | `llama3.2`, `llama3.1`, `mistral`, `codellama`, `phi3`, `gemma2` |
| **Embeddings** | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm`            |
| **Code**       | `codellama`, `deepseek-coder`, `starcoder2`                      |

## Usage

### Basic Chat

```csharp
public class ChatService
{
    private readonly IFlexAIProvider _aiProvider;

    public ChatService(IFlexAIProvider aiProvider)
    {
        _aiProvider = aiProvider;
    }

    public async Task<string> AskQuestionAsync(string question)
    {
        return await _aiProvider.ChatAsync(question);
    }

    public async Task<string> AskWithSystemPromptAsync(
        string question,
        string systemPrompt)
    {
        return await _aiProvider.ChatAsync(
            message: question,
            systemPrompt: systemPrompt
        );
    }
}
```

### Advanced Chat with Options

```csharp
public async Task<FlexAIChatResponse> ChatWithOptionsAsync(string message)
{
    var request = new FlexAIChatRequest
    {
        Messages = new List<FlexAIMessage>
        {
            FlexAIMessage.System("You are a helpful assistant."),
            FlexAIMessage.User(message)
        },
        Model = "gpt-4o",
        Temperature = 0.7f,      // Creativity (0.0 - 2.0)
        MaxTokens = 500,         // Response length limit
        TopP = 0.9f,             // Nucleus sampling
        FrequencyPenalty = 0.5f, // Reduce repetition
        PresencePenalty = 0.5f   // Encourage new topics
    };

    return await _aiProvider.ChatAsync(request);
}
```

### Multi-turn Conversation

```csharp
public class ConversationService
{
    private readonly IFlexAIProvider _aiProvider;
    private readonly List<FlexAIMessage> _history = new();

    public async Task<string> SendMessageAsync(string userMessage)
    {
        // Add user message to history
        _history.Add(FlexAIMessage.User(userMessage));

        // Get AI response
        var request = new FlexAIChatRequest
        {
            Messages = _history
        };

        var response = await _aiProvider.ChatAsync(request);

        // Add AI response to history
        _history.Add(response.Message);

        return response.Message.Content;
    }

    public void ClearHistory()
    {
        _history.Clear();
    }
}
```
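
`_history` grows without bound, so long conversations will eventually exceed the model's context window. One hedged approach (the threshold is illustrative; adjust the role check to the actual `FlexAIMessage` shape):

```csharp
// Sketch: keep a leading system message and the most recent turns.
private void TrimHistory(int maxMessages = 20)
{
    if (_history.Count <= maxMessages) return;

    // Assumes FlexAIMessage exposes a string Role; adapt as needed.
    var system = _history.FirstOrDefault(m => m.Role == "system");
    var recent = _history.Skip(_history.Count - maxMessages).ToList();

    _history.Clear();
    if (system != null) _history.Add(system);
    _history.AddRange(recent);
}
```

Call `TrimHistory()` in `SendMessageAsync` before building the request.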

### Streaming Responses

```csharp
public async Task StreamResponseAsync(string question)
{
    Console.Write("AI: ");
    
    await foreach (var chunk in _aiProvider.ChatStreamAsync(question))
    {
        Console.Write(chunk);
    }
    
    Console.WriteLine();
}

// With full control
public async Task<string> StreamWithControlAsync(string question)
{
    var request = new FlexAIChatRequest
    {
        Messages = new List<FlexAIMessage>
        {
            FlexAIMessage.User(question)
        },
        Temperature = 0.5f
    };

    var fullResponse = new StringBuilder();
    
    await foreach (var chunk in _aiProvider.ChatStreamAsync(request))
    {
        if (chunk.ContentDelta != null)
        {
            Console.Write(chunk.ContentDelta);
            fullResponse.Append(chunk.ContentDelta);
        }
        
        if (chunk.FinishReason == FlexAIFinishReason.Stop)
        {
            Console.WriteLine($"\n\nTokens used: {chunk.Usage?.TotalTokens}");
        }
    }

    return fullResponse.ToString();
}
```

### Generate Embeddings

```csharp
public class EmbeddingService
{
    private readonly IFlexAIProvider _aiProvider;

    // Single embedding
    public async Task<float[]> GetEmbeddingAsync(string text)
    {
        return await _aiProvider.EmbedAsync(text);
    }

    // Batch embeddings (more efficient)
    public async Task<IReadOnlyList<float[]>> GetEmbeddingsAsync(
        IEnumerable<string> texts)
    {
        return await _aiProvider.EmbedBatchAsync(texts);
    }

    // For semantic search
    public async Task<IReadOnlyList<SearchResult>> SearchAsync(
        string query,
        IReadOnlyList<Document> documents)
    {
        // Generate query embedding
        var queryEmbedding = await _aiProvider.EmbedAsync(query);

        // Calculate similarity with each document
        var results = documents.Select(doc =>
        {
            var similarity = CosineSimilarity(queryEmbedding, doc.Embedding);
            return new SearchResult
            {
                Document = doc,
                Score = similarity
            };
        })
        .OrderByDescending(r => r.Score)
        .Take(10)
        .ToList();

        return results;
    }

    private float CosineSimilarity(float[] a, float[] b)
    {
        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
```

### JSON Mode

```csharp
public async Task<ProductInfo> ExtractProductInfoAsync(string description)
{
    var request = new FlexAIChatRequest
    {
        Messages = new List<FlexAIMessage>
        {
            FlexAIMessage.System("Extract product information and return as JSON."),
            FlexAIMessage.User(description)
        },
        JsonMode = true  // Forces JSON response
    };

    var response = await _aiProvider.ChatAsync(request);
    return JsonSerializer.Deserialize<ProductInfo>(response.Message.Content)
        ?? throw new InvalidOperationException("Model did not return valid JSON.");
}
```
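
Even with `JsonMode`, some models occasionally wrap output in Markdown fences. A defensive parsing sketch (`ParseModelJson` is a hypothetical helper, not part of FlexBase):

```csharp
// Sketch: strip an optional ```json fence before deserializing.
private static T? ParseModelJson<T>(string content)
{
    var trimmed = content.Trim();
    if (trimmed.StartsWith("```"))
    {
        var firstNewline = trimmed.IndexOf('\n');
        if (firstNewline >= 0)
            trimmed = trimmed[(firstNewline + 1)..].TrimEnd();
        if (trimmed.EndsWith("```"))
            trimmed = trimmed[..^3].TrimEnd();
    }
    return JsonSerializer.Deserialize<T>(trimmed);
}
```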

## Key Points to Consider

### Provider Comparison

| Feature                   | Azure OpenAI | OpenAI    | Anthropic | Gemini    | Ollama  |
| ------------------------- | ------------ | --------- | --------- | --------- | ------- |
| **Chat**                  | ✅            | ✅         | ✅         | ✅         | ✅       |
| **Streaming**             | ✅            | ✅         | ✅         | ✅         | ✅       |
| **Embeddings**            | ✅            | ✅         | ❌         | ✅         | ✅       |
| **Tool/Function Calling** | ✅            | ✅         | ✅         | ✅         | ⚠️      |
| **JSON Mode**             | ✅            | ✅         | ✅         | ✅         | ⚠️      |
| **Local/Offline**         | ❌            | ❌         | ❌         | ❌         | ✅       |
| **Data Privacy**          | High         | Medium    | Medium    | Medium    | Highest |
| **Cost**                  | Per token    | Per token | Per token | Per token | Free    |

⚠️ = support varies by model; check the specific Ollama model's capabilities.

### Best Practices

1. **Use System Prompts** - Guide AI behavior consistently
2. **Handle Errors** - Wrap calls in try-catch, check for rate limits
3. **Cache Responses** - Store common queries to reduce costs
4. **Stream for UX** - Use streaming for better user experience
5. **Batch Embeddings** - More efficient than individual calls
6. **Monitor Costs** - Track token usage with `response.Usage`
7. **Test Locally** - Use Ollama for development before cloud deployment
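
Point 6 can be as lightweight as logging `response.Usage` after each call. A sketch (the `Usage` property names are assumed from the streaming example above):

```csharp
// Sketch: log per-call token usage for cost monitoring.
public async Task<string> ChatWithUsageLoggingAsync(FlexAIChatRequest request)
{
    var response = await _aiProvider.ChatAsync(request);

    _logger.LogInformation(
        "AI call: model={Model} totalTokens={Total}",
        request.Model,
        response.Usage?.TotalTokens);

    return response.Message.Content;
}
```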

### Cost Optimization

```csharp
public class CostOptimizedAIService
{
    private readonly IFlexAIProvider _cloudProvider;  // OpenAI/Azure
    private readonly IFlexAIProvider _localProvider;  // Ollama
    private readonly IMemoryCache _cache;

    public async Task<string> AskAsync(string question)
    {
        // 1. Check cache first. Key by a content hash: string.GetHashCode can
        //    collide, which would return a cached answer for a different question.
        //    (Requires System.Security.Cryptography and System.Text.)
        var cacheKey = "ai_response_" + Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(question)));
        if (_cache.TryGetValue<string>(cacheKey, out var cached))
        {
            return cached;
        }

        // 2. Use local model for simple queries
        if (IsSimpleQuery(question))
        {
            var response = await _localProvider.ChatAsync(question);
            _cache.Set(cacheKey, response, TimeSpan.FromHours(24));
            return response;
        }

        // 3. Use cloud model for complex queries
        var cloudResponse = await _cloudProvider.ChatAsync(question);
        _cache.Set(cacheKey, cloudResponse, TimeSpan.FromHours(24));
        return cloudResponse;
    }

    private bool IsSimpleQuery(string question)
    {
        // Simple heuristic - adjust based on your needs
        return question.Length < 100 && !question.Contains("analyze");
    }
}
```

### Multiple Providers

```csharp
// Register multiple providers with keyed services (.NET 8+)
builder.Services.AddKeyedSingleton<IFlexAIProvider>("openai", (sp, _) =>
    new FlexOpenAIProvider(builder.Configuration["FlexBase:AI:OpenAI:ApiKey"]!));

builder.Services.AddKeyedSingleton<IFlexAIProvider>("local", (sp, _) =>
    new OllamaProvider());

// Use in service
public class AIService
{
    private readonly IFlexAIProvider _openai;
    private readonly IFlexAIProvider _local;

    public AIService(
        [FromKeyedServices("openai")] IFlexAIProvider openai,
        [FromKeyedServices("local")] IFlexAIProvider local)
    {
        _openai = openai;
        _local = local;
    }

    public async Task<string> ProcessAsync(string input, bool useLocal = false)
    {
        var provider = useLocal ? _local : _openai;
        return await provider.ChatAsync(input);
    }
}
```

### Error Handling

```csharp
public async Task<string> SafeChatAsync(string message)
{
    try
    {
        return await _aiProvider.ChatAsync(message);
    }
    catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.Unauthorized)
    {
        _logger.LogError("Invalid API key");
        throw new InvalidOperationException("AI service authentication failed", ex);
    }
    catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.TooManyRequests)
    {
        _logger.LogWarning("Rate limited - implementing retry");
        await Task.Delay(TimeSpan.FromSeconds(5));
        return await _aiProvider.ChatAsync(message);
    }
    catch (TaskCanceledException)
    {
        _logger.LogWarning("Request timed out");
        throw new TimeoutException("AI service request timed out");
    }
    catch (NotSupportedException ex)
    {
        _logger.LogError("Operation not supported: {Message}", ex.Message);
        throw;
    }
}
```
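
The rate-limit branch above retries exactly once with a fixed delay. For production traffic, a bounded exponential backoff with jitter is safer. A hedged sketch (attempt counts and delays are illustrative):

```csharp
// Sketch: bounded retries with exponential backoff and jitter for 429s.
public async Task<string> ChatWithBackoffAsync(string message, int maxAttempts = 4)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await _aiProvider.ChatAsync(message);
        }
        catch (HttpRequestException ex)
            when (ex.StatusCode == HttpStatusCode.TooManyRequests && attempt < maxAttempts)
        {
            // 1s, 2s, 4s, ... plus up to 500 ms of jitter to avoid thundering herds.
            var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt - 1))
                      + TimeSpan.FromMilliseconds(Random.Shared.Next(500));
            _logger.LogWarning("Rate limited (attempt {Attempt}); retrying in {Delay}",
                attempt, delay);
            await Task.Delay(delay);
        }
    }
}
```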

## Examples

### Complete RAG Implementation

See [Vector Store documentation](https://docs.flexbase.in/data-stores/vector-store#rag-retrieval-augmented-generation) for complete RAG examples.

### Content Moderation

```csharp
public class ContentModerationService
{
    private readonly IFlexAIProvider _aiProvider;

    public async Task<ModerationResult> ModerateAsync(string content)
    {
        var prompt = $$"""
            Analyze the following content for inappropriate material.
            Return JSON with: { "isAppropriate": bool, "reason": string }

            Content: {{content}}
            """;

        var request = new FlexAIChatRequest
        {
            Messages = new List<FlexAIMessage>
            {
                FlexAIMessage.System("You are a content moderation assistant."),
                FlexAIMessage.User(prompt)
            },
            JsonMode = true
        };

        var response = await _aiProvider.ChatAsync(request);
        return JsonSerializer.Deserialize<ModerationResult>(response.Message.Content)
            ?? throw new InvalidOperationException("Model did not return valid JSON.");
    }
}
```

### Summarization

```csharp
public async Task<string> SummarizeAsync(string longText, int maxWords = 100)
{
    var prompt = $"""
        Summarize the following text in {maxWords} words or less:
        
        {longText}
        """;

    return await _aiProvider.ChatAsync(
        message: prompt,
        systemPrompt: "You are a professional summarization assistant."
    );
}
```
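
For inputs larger than the model's context window, a common pattern is map-reduce summarization: summarize chunks, then summarize the summaries. A sketch building on `SummarizeAsync` above (the chunk size is illustrative and character-based; token-aware chunking is more precise):

```csharp
// Sketch: map-reduce summarization for very long inputs.
public async Task<string> SummarizeLongAsync(string text, int chunkChars = 8000)
{
    if (text.Length <= chunkChars)
        return await SummarizeAsync(text);

    // Map: summarize each chunk independently.
    var partials = new List<string>();
    for (var i = 0; i < text.Length; i += chunkChars)
    {
        var chunk = text.Substring(i, Math.Min(chunkChars, text.Length - i));
        partials.Add(await SummarizeAsync(chunk, maxWords: 80));
    }

    // Reduce: summarize the concatenated partial summaries.
    return await SummarizeAsync(string.Join("\n\n", partials));
}
```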

## Testing

```csharp
public class AIServiceTests
{
    [Fact]
    public async Task ChatAsync_ReturnsResponse()
    {
        // Arrange
        var provider = new OllamaProvider(); // Use local for testing
        
        // Act
        var response = await provider.ChatAsync("Say hello");
        
        // Assert
        Assert.NotEmpty(response);
    }

    [Fact]
    public async Task EmbedAsync_ReturnsVector()
    {
        // Arrange
        var provider = new OllamaProvider();
        
        // Act
        var embedding = await provider.EmbedAsync("test text");
        
        // Assert
        Assert.NotEmpty(embedding);
        Assert.True(embedding.Length > 0);
    }
}
```
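
The tests above require a live Ollama instance. For fast, deterministic unit tests, substitute a fake `IFlexAIProvider` (a hypothetical hand-rolled stub; implement only the members your code under test calls):

```csharp
// Sketch: minimal fake for unit-testing consumers of IFlexAIProvider.
public class FakeAIProvider : IFlexAIProvider
{
    public string CannedReply { get; set; } = "hello";

    public Task<string> ChatAsync(string message) => Task.FromResult(CannedReply);

    public Task<float[]> EmbedAsync(string text) =>
        Task.FromResult(new[] { 0.1f, 0.2f, 0.3f });

    // Remaining IFlexAIProvider members can throw NotImplementedException
    // until a test needs them.
}

[Fact]
public async Task ChatService_ReturnsProviderReply()
{
    var service = new ChatService(new FakeAIProvider { CannedReply = "pong" });

    var answer = await service.AskQuestionAsync("ping");

    Assert.Equal("pong", answer);
}
```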

## See Also

* [Vector Store](https://docs.flexbase.in/data-and-providers/data-stores/vector-store) - Use embeddings for semantic search
* [FlexAI Providers Reference](https://github.com/sumeru-flexbase/flexbase-docs/blob/main/ReferenceDocs/README_FlexAI_Providers.md) - Detailed API documentation
* [FlexAI RAG Guide](https://github.com/sumeru-flexbase/flexbase-docs/blob/main/ReferenceDocs/README_FlexAI_RAG_PrivateData.md) - RAG implementation
* [FlexAI Logging](https://github.com/sumeru-flexbase/flexbase-docs/blob/main/ReferenceDocs/README_FlexAI_Logging.md) - Logging configuration
