Ollama

Description

The Ollama provider integrates local LLM chat and embeddings behind the IFlexAIProvider abstraction.

Provider capabilities (based on the implementation; a sketch of this surface follows the list):

  • Chat completions: ChatAsync(...)

  • Streaming chat: ChatStreamAsync(...)

  • Embeddings: EmbedAsync(...)
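
The exact contract ships with Flex, so the sketch below is only an illustration of that surface from the caller's side: every type name other than IFlexAIProvider (FlexChatRequest, FlexChatResponse, FlexChatChunk, FlexEmbedRequest, FlexEmbedResponse) is a placeholder, not the real Flex type. Later sketches on this page reuse the same placeholder names.

    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    // Placeholder request/response shapes used by the sketches on this page.
    public sealed record FlexChatRequest(string Prompt, string? Model = null);
    public sealed record FlexChatResponse(string Content);
    public sealed record FlexChatChunk(string Delta);
    public sealed record FlexEmbedRequest(string Input, string? Model = null);
    public sealed record FlexEmbedResponse(float[] Vector);

    // Illustrative view of the provider surface; the real IFlexAIProvider is defined by Flex.
    public interface IFlexAIProvider
    {
        // Single-shot chat completion.
        Task<FlexChatResponse> ChatAsync(FlexChatRequest request, CancellationToken ct = default);

        // Incremental (token-by-token) chat completion.
        IAsyncEnumerable<FlexChatChunk> ChatStreamAsync(FlexChatRequest request, CancellationToken ct = default);

        // Vector embedding for an input string.
        Task<FlexEmbedResponse> EmbedAsync(FlexEmbedRequest request, CancellationToken ct = default);
    }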

Important concepts

  • Application code should depend on IFlexAIProvider (a consumer sketch follows this list).

  • Flex generates Queries/Handlers that consume IFlexAIProvider; you only register the provider.

  • Defaults (when not overridden by request/config):

    • Chat model: llama3.2

    • Embedding model: nomic-embed-text

    • Base URL: http://localhost:11434
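
To illustrate the first point above (application code depending only on IFlexAIProvider), a consumer might look like the sketch below; DocumentIndexer is a made-up example class and the request type is the placeholder from the earlier sketch. When the request does not name a model, the defaults above apply.

    // Sketch: a service that depends only on the abstraction, never on the Ollama-specific type.
    public sealed class DocumentIndexer
    {
        private readonly IFlexAIProvider _ai;

        public DocumentIndexer(IFlexAIProvider ai) => _ai = ai;

        public async Task<float[]> EmbedTextAsync(string text, CancellationToken ct = default)
        {
            // No model specified, so the provider falls back to nomic-embed-text.
            var response = await _ai.EmbedAsync(new FlexEmbedRequest(text), ct);
            return response.Vector;
        }
    }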

Configuration in DI
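
Flex's generated startup code may already wire the provider; if you register it yourself, a minimal sketch looks like the following. OllamaOptions and OllamaProvider are placeholder type names: use the concrete options and provider types that ship with the Flex Ollama package.

    // Program.cs: minimal manual registration sketch (placeholder type names).
    var builder = WebApplication.CreateBuilder(args);

    // Bind the provider options to the FlexBase:AI:Ollama section shown below.
    builder.Services.Configure<OllamaOptions>(
        builder.Configuration.GetSection("FlexBase:AI:Ollama"));

    // Expose the Ollama implementation through the IFlexAIProvider abstraction.
    builder.Services.AddSingleton<IFlexAIProvider, OllamaProvider>();

    var app = builder.Build();
    app.Run();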

appsettings.json

The provider reads its settings from the FlexBase:AI:Ollama configuration section.
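
A minimal sketch of that section, assuming key names of BaseUrl, ChatModel, and EmbeddingModel (check the provider's options type for the exact names); the values shown are simply the documented defaults:

    {
      "FlexBase": {
        "AI": {
          "Ollama": {
            "BaseUrl": "http://localhost:11434",
            "ChatModel": "llama3.2",
            "EmbeddingModel": "nomic-embed-text"
          }
        }
      }
    }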

Examples (template-based)

These examples mirror the generated Query and PostBus handler templates. You do not register these types manually.

Query: generate a completion
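
The generated template is the source of truth; the sketch below only mirrors its shape: a query carrying a prompt, and a handler that receives IFlexAIProvider and calls ChatAsync. Handler and query names are illustrative, and the request/response types are the placeholders sketched earlier.

    // Sketch of a generated-style Query and its handler (placeholder names).
    public sealed record GenerateCompletionQuery(string Prompt, string? Model = null);

    public sealed class GenerateCompletionQueryHandler
    {
        private readonly IFlexAIProvider _ai;

        public GenerateCompletionQueryHandler(IFlexAIProvider ai) => _ai = ai;

        public async Task<string> HandleAsync(GenerateCompletionQuery query, CancellationToken ct)
        {
            // A null Model falls back to the provider default (llama3.2).
            var response = await _ai.ChatAsync(new FlexChatRequest(query.Prompt, query.Model), ct);
            return response.Content;
        }
    }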

PostBus handler: generate a completion
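
Likewise for the PostBus side: a hedged sketch of a handler that streams the completion via ChatStreamAsync and returns the assembled text. The command and handler names are illustrative, not the actual Flex template types.

    using System.Text;

    // Sketch of a PostBus-style handler (placeholder names throughout).
    public sealed record GenerateCompletionCommand(string Prompt);

    public sealed class GenerateCompletionHandler
    {
        private readonly IFlexAIProvider _ai;

        public GenerateCompletionHandler(IFlexAIProvider ai) => _ai = ai;

        public async Task<string> HandleAsync(GenerateCompletionCommand command, CancellationToken ct)
        {
            var text = new StringBuilder();

            // Consume tokens as they arrive rather than waiting for the full completion.
            await foreach (var chunk in _ai.ChatStreamAsync(new FlexChatRequest(command.Prompt), ct))
            {
                text.Append(chunk.Delta);
            }

            return text.ToString();
        }
    }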

Provider considerations

  • Local runtime: make sure Ollama is running and the required models are pulled (ollama pull llama3.2, ollama pull nomic-embed-text).

  • Token usage: usage/tokens are often unavailable from local providers; treat them as “best effort”.

  • Capacity: CPU/GPU saturation is the common bottleneck; keep concurrency conservative for stable latency (a throttling sketch follows this list).
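
For the capacity point, one hedged way to keep concurrency conservative is to gate provider calls behind a SemaphoreSlim, as in the sketch below; the limit of two concurrent requests is an arbitrary starting point to tune against your hardware, not a value recommended by the provider.

    // Sketch: cap concurrent requests to the local Ollama instance (placeholder types as above).
    public sealed class ThrottledChatClient
    {
        // Allow at most two in-flight requests; tune for your CPU/GPU.
        private static readonly SemaphoreSlim Gate = new(initialCount: 2, maxCount: 2);

        private readonly IFlexAIProvider _ai;

        public ThrottledChatClient(IFlexAIProvider ai) => _ai = ai;

        public async Task<FlexChatResponse> ChatAsync(FlexChatRequest request, CancellationToken ct)
        {
            await Gate.WaitAsync(ct);
            try
            {
                return await _ai.ChatAsync(request, ct);
            }
            finally
            {
                Gate.Release();
            }
        }
    }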

Last updated