# Ollama

## Description

The Ollama provider exposes local LLM chat and embeddings through the `IFlexAIProvider` abstraction.

Provider capabilities (based on the implementation):

* Chat completions: `ChatAsync(...)`
* Streaming chat: `ChatStreamAsync(...)`
* Embeddings: `EmbedAsync(...)`

## Important concepts

* Application code should depend on `IFlexAIProvider` (see the sketch after this list).
* Flex generates Queries/Handlers that consume `IFlexAIProvider`; you only register the provider.
* Defaults (when not overridden by request/config):
  * Chat model: `llama3.2`
  * Embedding model: `nomic-embed-text`
  * Base URL: `http://localhost:11434`
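
The generated templates below are the usual consumption path, but any class can also take `IFlexAIProvider` by constructor injection. A minimal sketch of a direct call, leaving `Model` unset so the configured default (`llama3.2`) applies; the `Role`/`Content` property names on `FlexAIMessage` are assumptions, while `ChatAsync`, `FlexAIChatRequest`, and `response.Message.Content` are taken from the templates below:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Sumeru.Flex;

public class QuickChatExample
{
	private readonly IFlexAIProvider _aiProvider;

	public QuickChatExample(IFlexAIProvider aiProvider) => _aiProvider = aiProvider;

	public async Task<string> AskAsync(string prompt)
	{
		// Model is omitted, so the configured default ("llama3.2") is used.
		// FlexAIMessage.Role/Content are assumed property names.
		var response = await _aiProvider.ChatAsync(new FlexAIChatRequest
		{
			Messages = new List<FlexAIMessage>
			{
				new FlexAIMessage { Role = "user", Content = prompt }
			}
		});

		return response.Message.Content;
	}
}
```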

## Configuration in DI

```csharp
services.AddFlexOllama(configuration);
```
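
A minimal wiring sketch, assuming a standard ASP.NET Core `Program.cs` and that the extension method lives in the `Sumeru.Flex` namespace; only `AddFlexOllama(configuration)` itself comes from this page:

```csharp
using Sumeru.Flex; // assumed namespace for the AddFlexOllama extension

var builder = WebApplication.CreateBuilder(args);

// Registers the Ollama-backed IFlexAIProvider using the FlexBase:AI:Ollama section.
builder.Services.AddFlexOllama(builder.Configuration);

var app = builder.Build();
app.Run();
```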

## appsettings.json

Configuration section: `FlexBase:AI:Ollama`

```json
{
  "FlexBase": {
    "AI": {
      "Ollama": {
        "BaseUrl": "http://localhost:11434",
        "DefaultModel": "llama3.2",
        "DefaultEmbeddingModel": "nomic-embed-text",
        "Timeout": "00:05:00"
      }
    }
  }
}
```
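
`Timeout` uses the standard .NET `TimeSpan` string format (`hh:mm:ss`), so `00:05:00` is five minutes; the snippet below only illustrates the format and assumes FlexBase binds the value to a `TimeSpan`:

```csharp
// "00:05:00" parses as hours:minutes:seconds.
var timeout = TimeSpan.Parse("00:05:00");
Console.WriteLine(timeout == TimeSpan.FromMinutes(5)); // True
```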

## Examples (template-based)

These examples mirror the generated Query and PostBus handler templates. You do **not** register these types manually.

### Query: generate a completion

```csharp
using Microsoft.Extensions.Logging;
using Sumeru.Flex;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace {YourApplication}.Queries.AI;

public class GenerateChatCompletionGetSingle : FlexiQueryBridgeAsync<ChatCompletionDto>
{
	protected readonly ILogger<GenerateChatCompletionGetSingle> _logger;
	protected readonly IFlexHost _flexHost;
	protected readonly IFlexAIProvider _aiProvider;
	protected FlexAppContextBridge _flexAppContext;
	protected GenerateChatCompletionParams _params;

	public GenerateChatCompletionGetSingle(
		ILogger<GenerateChatCompletionGetSingle> logger,
		IFlexHost flexHost,
		IFlexAIProvider aiProvider)
	{
		_logger = logger;
		_flexHost = flexHost;
		_aiProvider = aiProvider;
	}

	public virtual GenerateChatCompletionGetSingle AssignParameters(GenerateChatCompletionParams @params)
	{
		_params = @params;
		return this;
	}

	public virtual async Task<ChatCompletionDto?> Fetch()
	{
		_flexAppContext = _params.GetAppContext();

		// Unset request fields (Model, Temperature, MaxTokens) fall back to the configured defaults.
		var response = await _aiProvider.ChatAsync(new FlexAIChatRequest
		{
			Model = _params.Model,
			Messages = _params.Messages,
			Temperature = _params.Temperature,
			MaxTokens = _params.MaxTokens
		});

		return new ChatCompletionDto { Response = response.Message.Content };
	}
}

public class GenerateChatCompletionParams : DtoBridge
{
	public string? Model { get; set; }
	public List<FlexAIMessage> Messages { get; set; } = new();
	public float? Temperature { get; set; }
	public int? MaxTokens { get; set; }
}

public class ChatCompletionDto : DtoBridge
{
	public string Response { get; set; } = string.Empty;
}
```
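
A sketch of populating the parameters DTO before it is handed to `AssignParameters`; the `Role`/`Content` property names on `FlexAIMessage` are assumptions, and fields left `null` fall back to the configured defaults:

```csharp
var @params = new GenerateChatCompletionParams
{
	Model = "llama3.2", // optional; omit to use the configured default
	Messages = new List<FlexAIMessage>
	{
		new FlexAIMessage { Role = "system", Content = "You are a concise assistant." },
		new FlexAIMessage { Role = "user", Content = "Summarize this release note in one sentence." }
	},
	Temperature = 0.2f,
	MaxTokens = 256
};
```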

### PostBus handler: generate a completion

```csharp
using Microsoft.Extensions.Logging;
using Sumeru.Flex;
using System.Threading.Tasks;

namespace {YourApplication}.PostBusHandlers.AI;

public partial class GenerateChatCompletionHandler : IGenerateChatCompletionHandler
{
	protected string EventCondition = string.Empty;
	protected readonly ILogger<GenerateChatCompletionHandler> _logger;
	protected readonly IFlexHost _flexHost;
	protected readonly IFlexAIProvider _aiProvider;

	protected FlexAppContextBridge? _flexAppContext;

	public GenerateChatCompletionHandler(
		ILogger<GenerateChatCompletionHandler> logger,
		IFlexHost flexHost,
		IFlexAIProvider aiProvider)
	{
		_logger = logger;
		_flexHost = flexHost;
		_aiProvider = aiProvider;
	}

	public virtual async Task Execute(GenerateChatCompletionCommand cmd, IFlexServiceBusContext serviceBusContext)
	{
		_flexAppContext = cmd.Dto.GetAppContext();  //do not remove this line

		var response = await _aiProvider.ChatAsync(new FlexAIChatRequest
		{
			Model = cmd.Dto.Model,
			Messages = cmd.Dto.Messages,
			Temperature = cmd.Dto.Temperature,
			MaxTokens = cmd.Dto.MaxTokens
		});

		cmd.Dto.Response = response.Message.Content;
		await this.Fire(EventCondition, serviceBusContext);
	}
}
```

## Provider considerations

* **Local runtime**: make sure Ollama is running and the models are pulled (`ollama pull llama3.2`, `ollama pull nomic-embed-text`); a quick readiness check is sketched below.
* **Token usage**: local providers often do not report token counts; treat any usage figures as best effort.
* **Capacity**: CPU/GPU saturation is the common bottleneck; keep concurrency conservative for stable latency.
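
A quick readiness check, assuming the default base URL; Ollama answers `Ollama is running` at `/` and lists locally pulled models at `/api/tags`:

```csharp
using System;
using System.Net.Http;

using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

Console.WriteLine(await http.GetStringAsync("/"));   // "Ollama is running"

var tags = await http.GetStringAsync("/api/tags");   // JSON list of pulled models
Console.WriteLine(tags.Contains("llama3.2")
	? "llama3.2 is pulled"
	: "llama3.2 missing - run: ollama pull llama3.2");
```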
