ctrl/chat_llm.ini configures the LLM-backed chat
engine — which model to talk to, how it generates replies, what it
remembers, and what it retrieves for grounding. It is read by both the chat
engine and the RAG index builder.
The file is organised into a [default] section plus optional per-persona
sections. A persona section is named for the persona's code — the value
passed to the IRC adapter, or the persona a Guru's
chat module runs under:
```ini
[default]
endpoint = http://localhost:11434/v1/chat/completions
model = qwen2.5:14b

[guru]
; overrides for the "guru" persona only
temperature = 0.2
```
Any key absent from a persona section falls back to [default]. This lets
one host run several personas (a terminal Guru, an IRC bot, a multinode
helper) off one file with shared backend settings and per-persona prompts.
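As a sketch of the fallback rule, the fragment below shows three personas sharing one backend. The section names ircbot and helper are hypothetical persona codes chosen for illustration, and the values are examples rather than recommendations.

```ini
[default]
endpoint = http://localhost:11434/v1/chat/completions
model = qwen2.5:14b
temperature = 0.7

[guru]
; inherits endpoint and model from [default]; overrides temperature only
temperature = 0.2

[ircbot]
; hypothetical persona code; overrides one memory key and inherits the rest
memory_persist = false

[helper]
; hypothetical persona code with no overrides: every key falls back to [default]
```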
| Key | Default | Purpose |
|---|---|---|
| provider | openai | API dialect. openai covers any OpenAI-compatible endpoint (Ollama, Groq, OpenRouter, llama.cpp, LM Studio, OpenAI itself). gemini (Google native) is reserved but not yet implemented. |
| endpoint | (required) | Full URL of the chat-completions endpoint, e.g. http://localhost:11434/v1/chat/completions. |
| model | (required) | Model identifier sent to the endpoint, e.g. qwen2.5:14b. |
| api_key | placeholder | Bearer token. Ollama ignores it but it must be non-empty; real providers (Groq, OpenRouter, OpenAI) require their actual key. |
| keep_alive | (empty) | Optional per-request model-retention hint. Ollama's OpenAI-compatible endpoint ignores this; for durable retention with Ollama, set the OLLAMA_KEEP_ALIVE environment variable on the Ollama server instead. Some other backends honor values like 5m, -1 (forever), or 0. |
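For illustration, a persona could point at a hosted OpenAI-compatible provider while [default] stays on a local Ollama instance. This is a sketch under assumptions: the section name cloud is a hypothetical persona code, the model identifier is only an example of what a provider might offer, and YOUR_API_KEY_HERE stands in for a real key.

```ini
[default]
provider = openai
endpoint = http://localhost:11434/v1/chat/completions
model = qwen2.5:14b
; Ollama ignores the key, but it must be non-empty
api_key = placeholder

[cloud]
; hypothetical persona code; provider is inherited from [default]
endpoint = https://api.groq.com/openai/v1/chat/completions
; example model identifier (an assumption); check the provider's current model list
model = llama-3.3-70b-versatile
api_key = YOUR_API_KEY_HERE
```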
The following keys control how replies are generated:

| Key | Default | Purpose |
|---|---|---|
| max_tokens | 500 | Maximum reply length in tokens. |
| temperature | 0.7 | Sampling temperature. Lower (e.g. 0.3) is more conservative and less likely to fabricate; higher is more creative. |
| timeout | 60 | Request timeout in seconds. Set generously: a cold model load on the first call can take 15–30 s for a 7B model and longer for bigger models. |
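A sketch of how these keys might be combined: a shared baseline in [default] plus a terser, more conservative persona. The values are illustrative choices, and the section name ircbot is a hypothetical persona code.

```ini
[default]
max_tokens = 500
temperature = 0.7
; leave room for a cold model load on the first request
timeout = 120

[ircbot]
; hypothetical persona code: shorter, more conservative replies
max_tokens = 150
temperature = 0.3
```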
The following keys affect only the terminal (private) path, where the reply is “typed out” character by character.

| Key | Default | Purpose |
|---|---|---|
| typing_speed_factor | 2.0 | Speed multiplier for the typing animation: higher values shorten the per-character delay. 0 disables the animation (instant output); 1.0 matches the legacy Guru's 25–150 ms/char; higher is snappier. |
| simulate_typos | true | Whether the animation includes fat-finger and transposition typos (with corrections). At very high speeds the corrections flash by oddly; some sysops prefer false. |
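One possible combination, shown as a sketch: a faster animation with the typo simulation switched off, since corrections can flash by oddly at high speed. The value 4.0 is an arbitrary illustration; setting typing_speed_factor to 0 instead would skip the animation entirely.

```ini
[guru]
; faster than the 2.0 default
typing_speed_factor = 4.0
; avoid typo corrections flashing by at this speed
simulate_typos = false
```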
Long prompts live in plain-text files under ctrl/ (editable in any editor,
no line-length or escaping pain). Each has an inline alternative for short
prompts; the file form wins if both are set.
| Key | Bundled file | Purpose |
|---|---|---|
| system_prompt_file / system_prompt | chat_llm_persona.utf8 | The persona/system prompt for normal turns. |
| opening_system_prompt_file / opening_system_prompt | chat_llm_opening_persona.utf8 | System prompt for the opening turn. Falls back to the normal system prompt if unset. |
| opening_prompt_file / opening_prompt | chat_llm_greeting.utf8 | The line that elicits a greeting at session start. Leave blank to disable the opening greeting (the bot waits silently for the caller to speak). |
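A sketch of the two forms side by side. It assumes the file keys take a file name resolved under ctrl/, which the bundled file names suggest but this page does not state outright; the inline greeting text is invented for illustration.

```ini
[default]
; file form (wins if both forms are set); these are the bundled file names
system_prompt_file = chat_llm_persona.utf8
opening_system_prompt_file = chat_llm_opening_persona.utf8
; inline form, suited to a short prompt; this wording is only an example
opening_prompt = Greet the caller briefly and ask how you can help.
```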
The .utf8 extension is a reminder to save these files as UTF-8 (what the
API expects). Prompts may contain @macro@ tokens, substituted per turn:
| Macro | Expands to |
|---|---|
| @bot_name@ | Persona display name |
| @system_name@ | BBS name |
| @system_op@ | Sysop name |
| @sync_version@ | Synchronet version string |
| @alias@ | Caller's alias |
| @real_name@ | Caller's real name (if known) |
| @level@ | Caller's security level |
| @location@ | Caller's location |
| @lang@ / @lang_name@ | Caller's language code / English name |
| @language_directive@ | Assembled “Respond in X.” sentence (English-only fallback) |
| @memory_summary@ | Rolling long-term memory; empty when there's no prior context |
| @retrieved_context@ | Top RAG hits for this turn; empty when nothing clears the relevance gate |
Unknown tokens expand to empty strings.
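To show the macros in use, here is a sketch of an inline system prompt. The wording is illustrative and is not the bundled persona text. Because the file form wins when both are set, an inline prompt like this would take effect only when system_prompt_file is unset.

```ini
[default]
; illustrative wording only; each @token@ is replaced per turn
system_prompt = You are @bot_name@, a helper on @system_name@, run by @system_op@. You are talking to @alias@ from @location@. @language_directive@ What you remember about this caller: @memory_summary@ Relevant local content: @retrieved_context@
```

On a first contact with no indexed hits, @memory_summary@ and @retrieved_context@ would both expand to empty strings, so the surrounding wording should still read sensibly without them.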
The engine keeps a per-(caller, persona) memory file under data/chat/.
See chat_llm → Persistent memory for
behavior.
| Key | Default | Purpose |
|---|---|---|
| memory_persist | true | Master switch. Set false for public contexts (e.g. an IRC bot in open channels) where you don't want to retain anything about strangers. |
| history_window | 8 | Number of recent verbatim turns sent to the model each request. Larger = better recall but more tokens and slower prompt processing. Very short inputs automatically use a smaller window. |
| summarize_threshold | 30 | When the stored transcript exceeds this many turns, the oldest turns are compressed into the long-term summary. |
| summarize_batch | 10 | How many of the oldest turns are folded into the summary each time the threshold is crossed. |
| memory_summary_in_prompt | true | Whether the rolling summary is injected into the system prompt. Set false if your model echoes the summary back into replies. |
| memory_max_age_days | 365 | Memory files untouched for longer than this are dropped on next load. 0 disables age-based pruning. |
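A sketch contrasting a private terminal persona with a public one. The section name ircbot is a hypothetical persona code, the 90-day age limit is an arbitrary illustration, and the arithmetic in the comments is one reading of the threshold rule rather than a statement of the engine's exact behavior.

```ini
[guru]
memory_persist = true
history_window = 8
; one reading of the two keys below: once the stored transcript grows past
; 30 turns, the oldest 10 are folded into the summary, leaving about 21 verbatim
summarize_threshold = 30
summarize_batch = 10
; drop memory files untouched for more than 90 days
memory_max_age_days = 90

[ircbot]
; hypothetical persona code for a bot in open channels: retain nothing
memory_persist = false
```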
The index builder (jsexec llm_index.js <persona>)
builds a BM25 index from configured content sources; the engine queries it
each turn and injects the top hits via @retrieved_context@.
| Key | Default | Purpose |
|---|---|---|
| index_sources | msgbase | Semicolon-separated list of source crawlers (commas are reserved for a source's own argument list). See below. |
| index_output | chat/<persona>.idx | Index file path, relative to the data directory. <persona> is replaced with the section name. |
| index_top_k | 5 | How many top-scoring chunks are injected per turn. Lower = less noise; higher = better recall but more risk of drowning the model. |
| index_min_score_per_token | 3.5 | Relevance gate. If the top hit's score per query token (after stopword removal) is below this, nothing is injected and the prompt's “I haven't seen anything about that” rule takes over. Prevents fabricated BBS-local detail. |
| index_max_chunks | 5000 | Cap on total chunks ingested by the builder; oldest are trimmed if exceeded. |
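The relevance gate is easiest to see with numbers. The sketch below uses the documented defaults except for index_top_k, which is lowered as an illustration; the comment works through the gate for a hypothetical question.

```ini
[guru]
index_sources = msgbase
; for this section, resolves to chat/guru.idx under the data directory
index_output = chat/<persona>.idx
index_top_k = 3
; worked example of the gate: a question with 4 tokens left after stopword
; removal needs a top hit scoring at least 4 x 3.5 = 14.0, otherwise nothing
; is injected for that turn
index_min_score_per_token = 3.5
index_max_chunks = 5000
```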
Each source names a crawler module under exec/llm_index/. A :arg
suffix passes an argument whose meaning is source-specific. The bundled
crawlers (see llm_index):
- msgbase: all message-base groups. The form msgbase:Local,DOVE-Net,FsxNet,FidoNet restricts indexing to the named groups (recommended, since indexing every group lets large networks drown out community content).
- filebase: local file-base descriptions.
- dokuwiki:<path-to-data/pages>: a local DokuWiki page tree (point at the data/pages subdirectory).

A typical community-grounded configuration:

```ini
index_sources = msgbase:Local,DOVE-Net,FsxNet,FidoNet
```
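Combining several crawlers might look like the sketch below. The DokuWiki path is a hypothetical location, and the list is written without spaces around the semicolons because this page does not say whether surrounding whitespace is tolerated.

```ini
[guru]
; semicolons separate sources; commas belong to msgbase's own group list
; /var/www/dokuwiki/data/pages is a hypothetical path
index_sources = msgbase:Local,DOVE-Net;filebase;dokuwiki:/var/www/dokuwiki/data/pages
```

After changing the source list, the index would need rebuilding with the index builder, here jsexec llm_index.js guru.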
The following keys are optional and not present in the bundled file:

- index_source_weights: per-source score multipliers to favor curated sources, e.g. dokuwiki=2.0,gitlab=1.5. (Documentation-style questions automatically boost the wiki source for that turn.)
- index_recency_halflife: recency-decay half-life in days for scoring; 0 (default) disables recency bias.
- group_aliases: semicolon-separated alias map so message-base group names in a question match your actual group codes, e.g. fidonet=fidonet; dove,dovenet=dove-net;.
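A sketch of these optional keys set together. The weight and alias values are taken from the examples in the descriptions above; the 90-day half-life is an arbitrary illustration, and the comment on it assumes conventional half-life decay, which this page does not spell out.

```ini
[guru]
; favor the curated wiki source over others
index_source_weights = dokuwiki=2.0
; assuming conventional half-life decay, a hit 90 days old would count about
; half as much as a fresh one
index_recency_halflife = 90
; let "dove" or "dovenet" in a question match the dove-net group
group_aliases = fidonet=fidonet; dove,dovenet=dove-net;
```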
The bundled ctrl/chat_llm.ini ships with several values tuned more
conservatively than the built-in fallbacks above, to reduce fabrication and
allow for slow cold model loads. Where your file and this page disagree, your
file wins. The notable pre-set overrides are: temperature = 0.3 (vs
0.7), timeout = 180 (vs 60), max_tokens = 300 (vs 500),
index_top_k = 2 (vs 5), index_min_score_per_token = 3.0 (vs
3.5), and typing_speed_factor = 1.5 (vs 2.0).
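Written out as a fragment, the overrides listed above would look roughly like this. Placing them in [default] is an assumption; the bundled file may organise them differently, so treat your installed copy as authoritative.

```ini
[default]
temperature = 0.3
timeout = 180
max_tokens = 300
index_top_k = 2
index_min_score_per_token = 3.0
typing_speed_factor = 1.5
```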