# AI Providers

## Description

AI Providers in FlexBase expose a unified interface (`IFlexAIProvider`) for chat completions, streaming, and embeddings across OpenAI, Azure OpenAI, Gemini, Ollama, and Anthropic.

Your application code should depend on `IFlexAIProvider`. Provider-specific setup is isolated to generated provider registrations.

## Important concepts

* **`IFlexAIProvider` is the contract**: app code calls `ChatAsync(...)`, `ChatStreamAsync(...)`, `EmbedAsync(...)`, `GetModelsAsync(...)`, and `TestConnectionAsync(...)`.
* **Provider bridge**: generated infrastructure registers an `IFlexAIProviderBridge` (which also implements `IFlexAIProvider`) so you can override behavior safely.
* **Streaming is first-class**: OpenAI/Azure OpenAI/Anthropic streaming surfaces usage and tool-call deltas; Ollama streaming yields text deltas but typically no token usage.
* **Embeddings are provider-dependent**: Anthropic Claude throws `NotSupportedException` for embeddings; use OpenAI/Azure OpenAI/Gemini/Ollama for vector generation.

## Configuration in DI

Add the provider in your DI composition root (commonly in `EndPoints/...CommonConfigs/OtherApplicationServicesConfig.cs` or wherever you centralize registrations).

Only register the provider—Flex auto-wires generated Queries/Handlers that consume `IFlexAIProvider`.

```csharp
// using Sumeru.Flex; // IFlexAIProvider

public static class OtherApplicationServicesConfig
{
    public static IServiceCollection AddOtherApplicationServices(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        // Pick ONE (or register multiple with different compositions).
        services.AddFlexOpenAI(configuration);
        // services.AddFlexAzureOpenAI(configuration);
        // services.AddFlexGemini(configuration);
        // services.AddFlexOllama(configuration);
        // services.AddFlexAnthropic(configuration);

        return services;
    }
}
```

## appsettings.json

AI provider configuration is read from `FlexBase:AI:<Provider>`.

```json
{
  "FlexBase": {
    "AI": {
      "OpenAI": {
        "ApiKey": "<store-in-secrets>",
        "DefaultChatModel": "gpt-4o",
        "DefaultEmbeddingModel": "text-embedding-3-small",
        "OrganizationId": null,
        "MaxRetries": 3,
        "Timeout": "00:01:00"
      },
      "AzureOpenAI": {
        "Endpoint": "https://your-resource.openai.azure.com/",
        "ApiKey": "<store-in-secrets>",
        "DeploymentName": "gpt-4o",
        "DefaultChatModel": "gpt-4o",
        "DefaultEmbeddingModel": "text-embedding-3-small",
        "ApiVersion": "2024-02-01"
      },
      "Gemini": {
        "ApiKey": "<store-in-secrets>",
        "DefaultModel": "gemini-2.0-flash",
        "DefaultEmbeddingModel": "text-embedding-004"
      },
      "Ollama": {
        "BaseUrl": "http://localhost:11434",
        "DefaultModel": "llama3.2",
        "DefaultEmbeddingModel": "nomic-embed-text",
        "Timeout": "00:05:00"
      },
      "Anthropic": {
        "ApiKey": "<store-in-secrets>",
        "DefaultModel": "claude-3-5-sonnet-20241022",
        "MaxTokens": 4096,
        "Timeout": "00:02:00"
      }
    }
  }
}
```

## Examples (template-based)

These examples mirror the generated Query and PostBus handler templates. You do **not** register these types manually—Flex discovers and wires generated Queries/Handlers/Plugins automatically.

### Chat completion (Query)

```csharp
using Microsoft.Extensions.Logging;
using Sumeru.Flex;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace {YourApplication}.Queries.AI;

/// <summary>
/// Query to generate a chat completion using the AI Provider.
/// </summary>
public class GenerateChatCompletionGetSingle : FlexiQueryBridgeAsync<ChatCompletionDto>
{
    protected readonly ILogger<GenerateChatCompletionGetSingle> _logger;
    protected readonly IFlexHost _flexHost;
    protected readonly IFlexAIProvider _aiProvider;
    protected FlexAppContextBridge _flexAppContext;
    protected GenerateChatCompletionParams _params;

    public GenerateChatCompletionGetSingle(
        ILogger<GenerateChatCompletionGetSingle> logger,
        IFlexHost flexHost,
        IFlexAIProvider aiProvider)
    {
        _logger = logger;
        _flexHost = flexHost;
        _aiProvider = aiProvider;
    }

    public virtual GenerateChatCompletionGetSingle AssignParameters(GenerateChatCompletionParams @params)
    {
        _params = @params;
        return this;
    }

    public virtual async Task<ChatCompletionDto?> Fetch()
    {
        _flexAppContext = _params.GetAppContext();

        var response = await _aiProvider.ChatAsync(new FlexAIChatRequest
        {
            Model = _params.Model,
            Messages = _params.Messages,
            Temperature = _params.Temperature,
            MaxTokens = _params.MaxTokens
        });

        return new ChatCompletionDto
        {
            Response = response.Message.Content
        };
    }
}

public class GenerateChatCompletionParams : DtoBridge
{
    public string? Model { get; set; }
    public List<FlexAIMessage> Messages { get; set; } = new();
    public float? Temperature { get; set; }
    public int? MaxTokens { get; set; }
}

public class ChatCompletionDto : DtoBridge
{
    public string Response { get; set; }
}
```

### Chat completion (PostBus handler)

```csharp
using Microsoft.Extensions.Logging;
using Sumeru.Flex;
using System.Threading.Tasks;

namespace {YourApplication}.PostBusHandlers.AI;

/// <summary>
/// Handler for generating chat completions using the AI Provider.
/// </summary>
public partial class GenerateChatCompletionHandler : IGenerateChatCompletionHandler
{
    protected string EventCondition = string.Empty;

    protected readonly ILogger<GenerateChatCompletionHandler> _logger;
    protected readonly IFlexHost _flexHost;
    protected readonly IFlexAIProvider _aiProvider;

    protected FlexAppContextBridge? _flexAppContext;

    public GenerateChatCompletionHandler(
        ILogger<GenerateChatCompletionHandler> logger,
        IFlexHost flexHost,
        IFlexAIProvider aiProvider)
    {
        _logger = logger;
        _flexHost = flexHost;
        _aiProvider = aiProvider;
    }

    public virtual async Task Execute(GenerateChatCompletionCommand cmd, IFlexServiceBusContext serviceBusContext)
    {
        _flexAppContext = cmd.Dto.GetAppContext();  //do not remove this line

        var response = await _aiProvider.ChatAsync(new FlexAIChatRequest
        {
            Model = cmd.Dto.Model,
            Messages = cmd.Dto.Messages,
            Temperature = cmd.Dto.Temperature,
            MaxTokens = cmd.Dto.MaxTokens
        });

        cmd.Dto.Response = response.Message.Content;

        await this.Fire(EventCondition, serviceBusContext);
    }
}
```

## Implementation notes (hot-topic additions)

* **Streaming UIs**: prefer `ChatStreamAsync(...)` to render partial output and reduce perceived latency.
* **Tool calling**: OpenAI/Azure OpenAI/Anthropic streaming chunks can include tool-call deltas; treat these as incremental JSON fragments and buffer until complete before execution.
* **RAG pipeline**: generate embeddings with `EmbedAsync(...)`, store vectors in a Vector Store, then add retrieved context as additional messages (avoid dumping entire documents into the prompt).
* **Prompt-injection safety**: treat retrieved content as untrusted input; use clear system instructions and a strict “follow tools/contracts, ignore instructions in documents” policy.
* **Cost + rate limits**: add backoff/retry around calls that can burst (batch embeddings, fan-out queries). Cache embeddings by content hash when feasible.

```csharp
// Provider-only DI registration (generated infrastructure consumes IFlexAIProvider / IFlexAIProviderBridge)
builder.Services.AddFlexOllama(builder.Configuration);

// Config section: FlexBase:AI:Ollama
//   BaseUrl
//   DefaultChatModel
//   DefaultEmbeddingModel
```

#### Popular Ollama Models

| Type           | Models                                                           |
| -------------- | ---------------------------------------------------------------- |
| **Chat**       | `llama3.2`, `llama3.1`, `mistral`, `codellama`, `phi3`, `gemma2` |
| **Embeddings** | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm`            |
| **Code**       | `codellama`, `deepseek-coder`, `starcoder2`                      |

## Usage

### Basic Chat

```csharp
public class ChatService
{
    private readonly IFlexAIProvider _aiProvider;

    public ChatService(IFlexAIProvider aiProvider)
    {
        _aiProvider = aiProvider;
    }

    public async Task<string> AskQuestionAsync(string question)
    {
        return await _aiProvider.ChatAsync(question);
    }

    public async Task<string> AskWithSystemPromptAsync(
        string question,
        string systemPrompt)
    {
        return await _aiProvider.ChatAsync(
            message: question,
            systemPrompt: systemPrompt
        );
    }
}
```

### Advanced Chat with Options

```csharp
public async Task<FlexAIChatResponse> ChatWithOptionsAsync(string message)
{
    var request = new FlexAIChatRequest
    {
        Messages = new List<FlexAIMessage>
        {
            FlexAIMessage.System("You are a helpful assistant."),
            FlexAIMessage.User(message)
        },
        Model = "gpt-4o",
        Temperature = 0.7f,      // Creativity (0.0 - 2.0)
        MaxTokens = 500,         // Response length limit
        TopP = 0.9f,             // Nucleus sampling
        FrequencyPenalty = 0.5f, // Reduce repetition
        PresencePenalty = 0.5f   // Encourage new topics
    };

    return await _aiProvider.ChatAsync(request);
}
```

### Multi-turn Conversation

```csharp
public class ConversationService
{
    private readonly IFlexAIProvider _aiProvider;
    private readonly List<FlexAIMessage> _history = new();

    public async Task<string> SendMessageAsync(string userMessage)
    {
        // Add user message to history
        _history.Add(FlexAIMessage.User(userMessage));

        // Get AI response
        var request = new FlexAIChatRequest
        {
            Messages = _history
        };

        var response = await _aiProvider.ChatAsync(request);

        // Add AI response to history
        _history.Add(response.Message);

        return response.Message.Content;
    }

    public void ClearHistory()
    {
        _history.Clear();
    }
}
```

### Streaming Responses

```csharp
public async Task StreamResponseAsync(string question)
{
    Console.Write("AI: ");
    
    await foreach (var chunk in _aiProvider.ChatStreamAsync(question))
    {
        Console.Write(chunk);
    }
    
    Console.WriteLine();
}

// With full control
public async Task StreamWithControlAsync(string question)
{
    var request = new FlexAIChatRequest
    {
        Messages = new List<FlexAIMessage>
        {
            FlexAIMessage.User(question)
        },
        Temperature = 0.5f
    };

    var fullResponse = new StringBuilder();
    
    await foreach (var chunk in _aiProvider.ChatStreamAsync(request))
    {
        if (chunk.ContentDelta != null)
        {
            Console.Write(chunk.ContentDelta);
            fullResponse.Append(chunk.ContentDelta);
        }
        
        if (chunk.FinishReason == FlexAIFinishReason.Stop)
        {
            Console.WriteLine($"\n\nTokens used: {chunk.Usage?.TotalTokens}");
        }
    }

    return fullResponse.ToString();
}
```

### Generate Embeddings

```csharp
public class EmbeddingService
{
    private readonly IFlexAIProvider _aiProvider;

    // Single embedding
    public async Task<float[]> GetEmbeddingAsync(string text)
    {
        return await _aiProvider.EmbedAsync(text);
    }

    // Batch embeddings (more efficient)
    public async Task<IReadOnlyList<float[]>> GetEmbeddingsAsync(
        IEnumerable<string> texts)
    {
        return await _aiProvider.EmbedBatchAsync(texts);
    }

    // For semantic search
    public async Task<IReadOnlyList<SearchResult>> SearchAsync(
        string query,
        IReadOnlyList<Document> documents)
    {
        // Generate query embedding
        var queryEmbedding = await _aiProvider.EmbedAsync(query);

        // Calculate similarity with each document
        var results = documents.Select(doc =>
        {
            var similarity = CosineSimilarity(queryEmbedding, doc.Embedding);
            return new SearchResult
            {
                Document = doc,
                Score = similarity
            };
        })
        .OrderByDescending(r => r.Score)
        .Take(10)
        .ToList();

        return results;
    }

    private float CosineSimilarity(float[] a, float[] b)
    {
        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
```

### JSON Mode

```csharp
public async Task<ProductInfo> ExtractProductInfoAsync(string description)
{
    var request = new FlexAIChatRequest
    {
        Messages = new List<FlexAIMessage>
        {
            FlexAIMessage.System("Extract product information and return as JSON."),
            FlexAIMessage.User(description)
        },
        JsonMode = true  // Forces JSON response
    };

    var response = await _aiProvider.ChatAsync(request);
    return JsonSerializer.Deserialize<ProductInfo>(response.Message.Content);
}
```

## Key Points to Consider

### Provider Comparison

| Feature                   | Azure OpenAI | OpenAI    | Anthropic | Gemini    | Ollama  |
| ------------------------- | ------------ | --------- | --------- | --------- | ------- |
| **Chat**                  | ✅            | ✅         | ✅         | ✅         | ✅       |
| **Streaming**             | ✅            | ✅         | ✅         | ✅         | ✅       |
| **Embeddings**            | ✅            | ✅         | ❌         | ✅         | ✅       |
| **Tool/Function Calling** | ✅            | ✅         | ✅         | ✅         | ⚠️      |
| **JSON Mode**             | ✅            | ✅         | ✅         | ✅         | ⚠️      |
| **Local/Offline**         | ❌            | ❌         | ❌         | ❌         | ✅       |
| **Data Privacy**          | High         | Medium    | Medium    | Medium    | Highest |
| **Cost**                  | Per token    | Per token | Per token | Per token | Free    |

### Best Practices

1. **Use System Prompts** - Guide AI behavior consistently
2. **Handle Errors** - Wrap calls in try-catch, check for rate limits
3. **Cache Responses** - Store common queries to reduce costs
4. **Stream for UX** - Use streaming for better user experience
5. **Batch Embeddings** - More efficient than individual calls
6. **Monitor Costs** - Track token usage with `response.Usage`
7. **Test Locally** - Use Ollama for development before cloud deployment

### Cost Optimization

```csharp
public class CostOptimizedAIService
{
    private readonly IFlexAIProvider _cloudProvider;  // OpenAI/Azure
    private readonly IFlexAIProvider _localProvider;  // Ollama
    private readonly IMemoryCache _cache;

    public async Task<string> AskAsync(string question)
    {
        // 1. Check cache first
        var cacheKey = $"ai_response_{question.GetHashCode()}";
        if (_cache.TryGetValue<string>(cacheKey, out var cached))
        {
            return cached;
        }

        // 2. Use local model for simple queries
        if (IsSimpleQuery(question))
        {
            var response = await _localProvider.ChatAsync(question);
            _cache.Set(cacheKey, response, TimeSpan.FromHours(24));
            return response;
        }

        // 3. Use cloud model for complex queries
        var cloudResponse = await _cloudProvider.ChatAsync(question);
        _cache.Set(cacheKey, cloudResponse, TimeSpan.FromHours(24));
        return cloudResponse;
    }

    private bool IsSimpleQuery(string question)
    {
        // Simple heuristic - adjust based on your needs
        return question.Length < 100 && !question.Contains("analyze");
    }
}
```

### Multiple Providers

```csharp
// Register multiple providers with keyed services (.NET 8+)
builder.Services.AddKeyedSingleton<IFlexAIProvider>("openai", (sp, _) =>
    new FlexOpenAIProvider(Configuration["OpenAI:ApiKey"]!));

builder.Services.AddKeyedSingleton<IFlexAIProvider>("local", (sp, _) =>
    new OllamaProvider());

// Use in service
public class AIService
{
    private readonly IFlexAIProvider _openai;
    private readonly IFlexAIProvider _local;

    public AIService(
        [FromKeyedServices("openai")] IFlexAIProvider openai,
        [FromKeyedServices("local")] IFlexAIProvider local)
    {
        _openai = openai;
        _local = local;
    }

    public async Task<string> ProcessAsync(string input, bool useLocal = false)
    {
        var provider = useLocal ? _local : _openai;
        return await provider.ChatAsync(input);
    }
}
```

### Error Handling

```csharp
public async Task<string> SafeChatAsync(string message)
{
    try
    {
        return await _aiProvider.ChatAsync(message);
    }
    catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.Unauthorized)
    {
        _logger.LogError("Invalid API key");
        throw new InvalidOperationException("AI service authentication failed", ex);
    }
    catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.TooManyRequests)
    {
        _logger.LogWarning("Rate limited - implementing retry");
        await Task.Delay(TimeSpan.FromSeconds(5));
        return await _aiProvider.ChatAsync(message);
    }
    catch (TaskCanceledException)
    {
        _logger.LogWarning("Request timed out");
        throw new TimeoutException("AI service request timed out");
    }
    catch (NotSupportedException ex)
    {
        _logger.LogError("Operation not supported: {Message}", ex.Message);
        throw;
    }
}
```

## Examples

### Complete RAG Implementation

See [Vector Store documentation](/data-and-providers/data-stores/vector-store.md#rag-retrieval-augmented-generation) for complete RAG examples.

### Content Moderation

```csharp
public class ContentModerationService
{
    private readonly IFlexAIProvider _aiProvider;

    public async Task<ModerationResult> ModerateAsync(string content)
    {
        var prompt = $"""
            Analyze the following content for inappropriate material.
            Return JSON with: {{ "isAppropriate": bool, "reason": string }}
            
            Content: {content}
            """;

        var request = new FlexAIChatRequest
        {
            Messages = new List<FlexAIMessage>
            {
                FlexAIMessage.System("You are a content moderation assistant."),
                FlexAIMessage.User(prompt)
            },
            JsonMode = true
        };

        var response = await _aiProvider.ChatAsync(request);
        return JsonSerializer.Deserialize<ModerationResult>(response.Message.Content);
    }
}
```

### Summarization

```csharp
public async Task<string> SummarizeAsync(string longText, int maxWords = 100)
{
    var prompt = $"""
        Summarize the following text in {maxWords} words or less:
        
        {longText}
        """;

    return await _aiProvider.ChatAsync(
        message: prompt,
        systemPrompt: "You are a professional summarization assistant."
    );
}
```

## Testing

```csharp
public class AIServiceTests
{
    [Fact]
    public async Task ChatAsync_ReturnsResponse()
    {
        // Arrange
        var provider = new OllamaProvider(); // Use local for testing
        
        // Act
        var response = await provider.ChatAsync("Say hello");
        
        // Assert
        Assert.NotEmpty(response);
    }

    [Fact]
    public async Task EmbedAsync_ReturnsVector()
    {
        // Arrange
        var provider = new OllamaProvider();
        
        // Act
        var embedding = await provider.EmbedAsync("test text");
        
        // Assert
        Assert.NotEmpty(embedding);
        Assert.True(embedding.Length > 0);
    }
}
```

## See Also

* [Vector Store](/data-and-providers/data-stores/vector-store.md) - Use embeddings for semantic search
* [FlexAI Providers Reference](https://github.com/sumeru-flexbase/flexbase-docs/blob/main/ReferenceDocs/README_FlexAI_Providers.md) - Detailed API documentation
* [FlexAI RAG Guide](https://github.com/sumeru-flexbase/flexbase-docs/blob/main/ReferenceDocs/README_FlexAI_RAG_PrivateData.md) - RAG implementation
* [FlexAI Logging](https://github.com/sumeru-flexbase/flexbase-docs/blob/main/ReferenceDocs/README_FlexAI_Logging.md) - Logging configuration


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.flexbase.in/data-and-providers/providers/ai-providers.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
