Sections
Backend
Generation
Terminal typing animation
Prompts
Persistent memory
Retrieval (RAG)
- Source syntax
- Advanced retrieval tuning
A note on defaults
See Also

chat_llm.ini

ctrl/chat_llm.ini configures the LLM-backed chat engine — which model to talk to, how it generates replies, what it remembers, and what it retrieves for grounding. It is read by both the chat engine and the RAG index builder.

Sections

The file is organised into a [default] section plus optional per-persona sections. A persona section is named for the persona's code — the value passed to the IRC adapter, or the persona a Guru's chat module runs under:

[default]
endpoint = http://localhost:11434/v1/chat/completions
model    = qwen2.5:14b
 
[guru]
; overrides for the "guru" persona only
temperature = 0.2

Any key absent from a persona section falls back to [default]. This lets one host run several personas (a terminal Guru, an IRC bot, a multinode helper) off one file with shared backend settings and per-persona prompts.

Backend

Key	Default	Purpose
`provider`	`openai`	API dialect. `openai` covers any OpenAI-compatible endpoint (Ollama, Groq, OpenRouter, llama.cpp, LM Studio, OpenAI itself). `gemini` (Google native) is reserved, not yet implemented.
`endpoint`	(required)	Full URL of the chat-completions endpoint, e.g. `http://localhost:11434/v1/chat/completions`.
`model`	(required)	Model identifier sent to the endpoint, e.g. `qwen2.5:14b`.
`api_key`	`placeholder`	Bearer token. Ollama ignores it but it must be non-empty; real providers (Groq, OpenRouter, OpenAI) require their actual key.
`keep_alive`	(empty)	Optional per-request model-retention hint. Ollama's OpenAI-compatible endpoint ignores this — for durable retention with Ollama set the `OLLAMA_KEEP_ALIVE` environment variable on the Ollama server instead. Some other backends honor values like `5m`, `-1` (forever), `0`.

Generation

Key	Default	Purpose
`max_tokens`	`500`	Maximum reply length in tokens.
`temperature`	`0.7`	Sampling temperature. Lower (e.g. `0.3`) is more conservative and less likely to fabricate; higher is more creative.
`timeout`	`60`	Request timeout in seconds. Set generously — a cold model load on the first call can take 15–30s for a 7B and longer for bigger models.

Terminal typing animation

These only affect the terminal (private) path, where the reply is “typed out” character by character.

Key	Default	Purpose
`typing_speed_factor`	`2.0`	Multiplier on the per-character delay. `0` disables the animation (instant output); `1.0` matches the legacy Guru's 25–150 ms/char; higher is snappier.
`simulate_typos`	`true`	Whether the animation includes fat-finger and transposition typos (with corrections). At very high speeds the corrections flash by oddly; some sysops prefer `false`.

Prompts

Long prompts live in plain-text files under ctrl/ (editable in any editor, no line-length or escaping pain). Each has an inline alternative for short prompts; the file form wins if both are set.

Key	Bundled file	Purpose
`system_prompt_file` / `system_prompt`	`chat_llm_persona.utf8`	The persona/system prompt for normal turns.
`opening_system_prompt_file` / `opening_system_prompt`	`chat_llm_opening_persona.utf8`	System prompt for the opening turn. Falls back to the normal system prompt if unset.
`opening_prompt_file` / `opening_prompt`	`chat_llm_greeting.utf8`	The line that elicits a greeting at session start. Leave blank to disable the opening greeting (the bot waits silently for the caller to speak).

The .utf8 extension is a reminder to save these files as UTF-8 (what the API expects). Prompts may contain @macro@ tokens, substituted per turn:

Macro	Expands to
`@bot_name@`	Persona display name
`@system_name@`	BBS name
`@system_op@`	Sysop name
`@sync_version@`	Synchronet version string
`@alias@`	Caller's alias
`@real_name@`	Caller's real name (if known)
`@level@`	Caller's security level
`@location@`	Caller's location
`@lang@` / `@lang_name@`	Caller's language code / English name
`@language_directive@`	Assembled “Respond in X.” sentence (English-only fallback)
`@memory_summary@`	Rolling long-term memory; empty when there's no prior context
`@retrieved_context@`	Top RAG hits for this turn; empty when nothing clears the relevance gate

Unknown tokens expand to empty strings.

Persistent memory

The engine keeps a per-(caller, persona) memory file under data/chat/. See chat_llm → Persistent memory for behavior.

Key	Default	Purpose
`memory_persist`	`true`	Master switch. Set `false` for public contexts (e.g. an IRC bot in open channels) where you don't want to retain anything about strangers.
`history_window`	`8`	Number of recent verbatim turns sent to the model each request. Larger = better recall but more tokens and slower prompt processing. Very short inputs automatically use a smaller window.
`summarize_threshold`	`30`	When the stored transcript exceeds this many turns, the oldest turns are compressed into the long-term summary.
`summarize_batch`	`10`	How many of the oldest turns are folded into the summary each time the threshold is crossed.
`memory_summary_in_prompt`	`true`	Whether the rolling summary is injected into the system prompt. Set `false` if your model echoes the summary back into replies.
`memory_max_age_days`	`365`	Memory files untouched for longer than this are dropped on next load. `0` disables age-based pruning.

Retrieval (RAG)

The index builder (jsexec llm_index.js <persona>) builds a BM25 index from configured content sources; the engine queries it each turn and injects the top hits via @retrieved_context@.

Key	Default	Purpose
`index_sources`	`msgbase`	Semicolon-separated list of source crawlers (commas are reserved for a source's own argument list). See below.
`index_output`	`chat/<persona>.idx`	Index file path, relative to the data directory. `<persona>` is replaced with the section name.
`index_top_k`	`5`	How many top-scoring chunks are injected per turn. Lower = less noise; higher = better recall but more risk of drowning the model.
`index_min_score_per_token`	`3.5`	Relevance gate. If the top hit's score per query token (after stopword removal) is below this, nothing is injected and the prompt's “I haven't seen anything about that” rule takes over. Prevents fabricated BBS-local detail.
`index_max_chunks`	`5000`	Cap on total chunks ingested by the builder; oldest are trimmed if exceeded.

Source syntax

Each source names a crawler module under exec/llm_index/. A :arg suffix passes an argument whose meaning is source-specific. The bundled crawlers (see llm_index):

msgbase — all message-base groups. msgbase:Local,DOVE-Net,FsxNet,FidoNet restricts to named groups (recommended — indexing every group lets large networks drown out community content).
filebase — local file-base descriptions.
dokuwiki:<path-to-data/pages> — a local DokuWiki page tree (point at the data/pages subdirectory).

A typical community-grounded configuration:

index_sources = msgbase:Local,DOVE-Net,FsxNet,FidoNet

Advanced retrieval tuning

These are optional and not present in the bundled file:

index_source_weights — per-source score multipliers to favor curated sources, e.g. dokuwiki=2.0,gitlab=1.5. (Documentation-style questions automatically boost the wiki source for that turn.)
index_recency_halflife — recency-decay half-life in days for scoring; 0 (default) disables recency bias.
group_aliases — semicolon-separated alias map so message-base group names in a question match your actual group codes, e.g. fidonet=fidonet; dove,dovenet=dove-net;.

A note on defaults

The bundled ctrl/chat_llm.ini ships with several values tuned more conservatively than the built-in fallbacks above, to reduce fabrication and allow for slow cold model loads. Where your file and this page disagree, your file wins. The notable pre-set overrides are: temperature = 0.3 (vs 0.7), timeout = 180 (vs 60), max_tokens = 300 (vs 500), index_top_k = 2 (vs 5), index_min_score_per_token = 3.0 (vs 3.5), and typing_speed_factor = 1.5 (vs 2.0).

Table of Contents