# Ollama

## Description

The Ollama provider integrates local LLM chat and embeddings behind `IFlexAIProvider`.
Provider capabilities (based on the implementation):

- Chat completions: `ChatAsync(...)`
- Streaming chat: `ChatStreamAsync(...)`
- Embeddings: `EmbedAsync(...)`
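As a rough sketch of that surface: the method names above come from the provider, but the request/response DTOs below (`ChatRequest`, `ChatResult`, `ChatChunk`, `EmbedRequest`, `EmbedResult`) are illustrative assumptions, not the library's actual types.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Illustrative DTOs -- names and shapes are assumptions, not the library's actual types.
public sealed record ChatMessage(string Role, string Content);
public sealed record ChatRequest(IReadOnlyList<ChatMessage> Messages, string? Model = null);
public sealed record ChatResult(string Content);
public sealed record ChatChunk(string Delta);
public sealed record EmbedRequest(IReadOnlyList<string> Inputs, string? Model = null);
public sealed record EmbedResult(IReadOnlyList<float[]> Vectors);

// The three capabilities listed above; exact signatures in the real interface may differ.
public interface IFlexAIProvider
{
    // Single chat completion.
    Task<ChatResult> ChatAsync(ChatRequest request, CancellationToken ct = default);

    // Streaming chat: yields partial chunks as the local model generates tokens.
    IAsyncEnumerable<ChatChunk> ChatStreamAsync(ChatRequest request, CancellationToken ct = default);

    // Embeddings for one or more input texts.
    Task<EmbedResult> EmbedAsync(EmbedRequest request, CancellationToken ct = default);
}
```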
## Important concepts

- Application code should depend on `IFlexAIProvider`. Flex generates Queries/Handlers that consume `IFlexAIProvider`; you only register the provider.
- Defaults (when not overridden by request/config):
  - Chat model: `llama3.2`
  - Embedding model: `nomic-embed-text`
  - Base URL: `http://localhost:11434`
## Configuration in DI

Configuration lives in `appsettings.json` under the section `FlexBase:AI:Ollama`.
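A minimal sketch of that section, assuming key names that mirror the defaults listed above (`BaseUrl`, `ChatModel`, and `EmbeddingModel` are assumed names, not confirmed option keys):

```json
{
  "FlexBase": {
    "AI": {
      "Ollama": {
        "BaseUrl": "http://localhost:11434",
        "ChatModel": "llama3.2",
        "EmbeddingModel": "nomic-embed-text"
      }
    }
  }
}
```

When these values are omitted, the provider falls back to the defaults listed under Important concepts.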
## Examples (template-based)
These examples mirror the generated Query and PostBus handler templates. You do not register these types manually.
### Query: generate a completion
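The generated template itself is not reproduced here; the sketch below only illustrates the shape of a query handler that consumes `IFlexAIProvider.ChatAsync(...)`. The `GenerateCompletionQuery` and handler types are illustrative assumptions, and the DTOs reuse the sketch from the interface example above, not the actual generated code.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Illustrative only: the generated Query/Handler shapes differ. The key point is
// that the handler depends on IFlexAIProvider and calls ChatAsync.
public sealed record GenerateCompletionQuery(string Prompt, string? Model = null);

public sealed class GenerateCompletionQueryHandler
{
    private readonly IFlexAIProvider _ai;

    public GenerateCompletionQueryHandler(IFlexAIProvider ai) => _ai = ai;

    public async Task<string> HandleAsync(GenerateCompletionQuery query, CancellationToken ct)
    {
        // Falls back to the provider's default chat model (llama3.2) when Model is null.
        var request = new ChatRequest(
            new[] { new ChatMessage("user", query.Prompt) },
            query.Model);

        var result = await _ai.ChatAsync(request, ct);
        return result.Content;
    }
}
```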
### PostBus handler: generate a completion
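Again, a sketch rather than the generated template: the message and handler shapes below are assumptions. It shows the same `IFlexAIProvider` dependency from inside a message handler, this time using the streaming variant.

```csharp
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only: the real PostBus handler base types/signatures come from the
// generated template; this sketch just shows the IFlexAIProvider usage.
public sealed record GenerateCompletionMessage(string Prompt);

public sealed class GenerateCompletionPostHandler
{
    private readonly IFlexAIProvider _ai;

    public GenerateCompletionPostHandler(IFlexAIProvider ai) => _ai = ai;

    public async Task HandleAsync(GenerateCompletionMessage message, CancellationToken ct)
    {
        var request = new ChatRequest(new[] { new ChatMessage("user", message.Prompt) });

        // Streaming variant: accumulate chunks as the local model produces them.
        var buffer = new StringBuilder();
        await foreach (var chunk in _ai.ChatStreamAsync(request, ct))
        {
            buffer.Append(chunk.Delta);
        }

        // What happens to the completed text (persist, publish, etc.) is
        // application-specific and omitted here.
    }
}
```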
## Provider considerations
- Local runtime: make sure Ollama is running and the models are pulled (`ollama pull llama3.2`, `ollama pull nomic-embed-text`).
- Token usage: usage/token counts are often unavailable from local providers; treat them as "best effort".
- Capacity: CPU/GPU saturation is the common bottleneck; keep concurrency conservative for stable latency (see the sketch below).
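One common way to keep concurrency conservative is to gate calls into the provider behind a small fixed limit. The sketch below wraps `ChatAsync` with a plain `SemaphoreSlim`; the wrapper class and the limit of 2 are illustrative, not part of the provider.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Illustrative: caps in-flight chat calls so a CPU/GPU-bound local Ollama
// instance isn't oversaturated. Tune the limit to your hardware.
public sealed class ThrottledChat
{
    private readonly IFlexAIProvider _ai;
    private readonly SemaphoreSlim _gate = new(initialCount: 2, maxCount: 2);

    public ThrottledChat(IFlexAIProvider ai) => _ai = ai;

    public async Task<ChatResult> ChatAsync(ChatRequest request, CancellationToken ct)
    {
        await _gate.WaitAsync(ct);
        try
        {
            return await _ai.ChatAsync(request, ct);
        }
        finally
        {
            _gate.Release();
        }
    }
}
```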