chat_llm.ini

ctrl/chat_llm.ini configures the LLM-backed chat engine — which model to talk to, how it generates replies, what it remembers, and what it retrieves for grounding. It is read by both the chat engine and the RAG index builder.

Sections

The file is organised into a [default] section plus optional per-persona sections. A persona section is named for the persona's code — the value passed to the IRC adapter, or the persona a Guru's chat module runs under:

[default]
endpoint = http://localhost:11434/v1/chat/completions
model    = qwen2.5:14b
 
[guru]
; overrides for the "guru" persona only
temperature = 0.2

Any key absent from a persona section falls back to [default]. This lets one host run several personas (a terminal Guru, an IRC bot, a multinode helper) off one file with shared backend settings and per-persona prompts.
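
Python's standard configparser supports the same fallback pattern when its default_section is set to "default", so it can illustrate the lookup. This sketch shows only the semantics and is not the engine's own INI reader. The function name persona_settings and the persona name "irc" are hypothetical.

```python
import configparser

# Mirrors the example above: [guru] overrides only temperature.
SAMPLE_INI = """\
[default]
endpoint = http://localhost:11434/v1/chat/completions
model    = qwen2.5:14b

[guru]
; overrides for the "guru" persona only
temperature = 0.2
"""


def persona_settings(ini_text: str, persona: str) -> dict:
    """Return the effective settings for `persona`, falling back to [default]."""
    # default_section="default" makes every key in [default] visible from
    # every other section unless that section sets the key itself.
    # interpolation=None keeps any '%' characters literal.
    parser = configparser.ConfigParser(default_section="default",
                                       interpolation=None)
    parser.read_string(ini_text)
    # A persona without its own section simply gets the defaults.
    name = persona if parser.has_section(persona) else "default"
    return dict(parser[name])


print(persona_settings(SAMPLE_INI, "guru")["temperature"])     # 0.2
print(persona_settings(SAMPLE_INI, "irc").get("temperature"))  # None
```

A key missing from both sections (temperature for "irc" here) is where the built-in fallback from the tables below would apply.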

Backend

| Key | Default | Purpose |
| --- | --- | --- |
| provider | openai | API dialect. openai covers any OpenAI-compatible endpoint (Ollama, Groq, OpenRouter, llama.cpp, LM Studio, OpenAI itself). gemini (Google native) is reserved but not yet implemented. |
| endpoint | (required) | Full URL of the chat-completions endpoint, e.g. http://localhost:11434/v1/chat/completions. |
| model | (required) | Model identifier sent to the endpoint, e.g. qwen2.5:14b. |
| api_key | placeholder | Bearer token. Ollama ignores it, but it must be non-empty; hosted providers (Groq, OpenRouter, OpenAI) require your actual key. |
| keep_alive | (empty) | Optional per-request hint for how long the backend keeps the model loaded. Ollama's OpenAI-compatible endpoint ignores it; for durable retention with Ollama, set the OLLAMA_KEEP_ALIVE environment variable on the Ollama server instead. Some other backends honor values such as 5m, -1 (forever), or 0. |

Generation

| Key | Default | Purpose |
| --- | --- | --- |
| max_tokens | 500 | Maximum reply length in tokens. |
| temperature | 0.7 | Sampling temperature. Lower values (e.g. 0.3) are more conservative and less likely to fabricate; higher values are more creative. |
| timeout | 60 | Request timeout in seconds. Set it generously: a cold model load on the first call can take 15–30 s for a 7B model, and longer for bigger models. |
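
To make the Backend and Generation keys concrete, here is a minimal standard-library sketch of the request an OpenAI-compatible client would build from them. The function names build_chat_request and send are hypothetical, and forwarding keep_alive as a top-level body field is an assumption about how a backend that honors it would receive it. This is not the engine's actual code.

```python
import json
import urllib.request


def build_chat_request(cfg: dict, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    api_key = cfg.get("api_key", "placeholder")
    if not api_key:
        # Must be non-empty even for Ollama, which ignores the value.
        raise ValueError("api_key must be non-empty")
    body = {
        "model": cfg["model"],
        "messages": messages,
        "max_tokens": int(cfg.get("max_tokens", 500)),
        "temperature": float(cfg.get("temperature", 0.7)),
    }
    if cfg.get("keep_alive"):
        # Assumption: forwarded as-is for backends that honor it.
        body["keep_alive"] = cfg["keep_alive"]
    return urllib.request.Request(
        cfg["endpoint"],
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60) -> str:
    """POST the request and return the first choice's text."""
    # A generous timeout covers a cold model load on the first call.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

A turn would then be roughly send(build_chat_request(cfg, messages), timeout=float(cfg.get("timeout", 60))), where messages holds the system prompt, recent history, and the caller's line.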

Terminal typing animation

These only affect the terminal (private) path, where the reply is “typed out” character by character.

| Key | Default | Purpose |
| --- | --- | --- |
| typing_speed_factor | 2.0 | Speed multiplier for the animation: higher values shorten the per-character delay and make the typing snappier. 0 disables the animation (instant output); 1.0 matches the legacy Guru's pace of 25–150 ms per character. |
| simulate_typos | true | Whether the animation includes fat-finger and transposition typos, each followed by a correction. At very high speeds the corrections flash by oddly, so some sysops prefer false. |
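
A rough model of the animation, assuming the legacy 25–150 ms per-character delay is drawn uniformly and then divided by typing_speed_factor. The neighbor-key map, the typo rate, and the use of a backspace event for each correction are hypothetical. The sketch yields (text, delay) events instead of writing to a terminal, so the final rendered line can be checked.

```python
import random

# Hypothetical neighbor map for "fat-finger" typos (partial QWERTY).
NEIGHBORS = {"a": "qs", "e": "wr", "o": "ip", "t": "ry", "n": "bm"}
BACKSPACE = "\b"


def keystrokes(text, speed_factor=2.0, typos=True, typo_rate=0.03, rng=None):
    """Yield (chars, delay_seconds) events; BACKSPACE erases one character."""
    rng = rng or random.Random()
    if speed_factor <= 0:
        # 0 disables the animation: everything appears at once.
        yield text, 0.0
        return

    def delay():
        # Legacy pace of 25-150 ms per character, divided by the factor
        # (so a higher factor types faster).
        return rng.uniform(0.025, 0.150) / speed_factor

    for i, ch in enumerate(text):
        roll = rng.random()
        if typos and ch in NEIGHBORS and roll < typo_rate:
            # Fat-finger: hit a neighboring key, then erase it.
            yield rng.choice(NEIGHBORS[ch]), delay()
            yield BACKSPACE, delay()
        elif typos and i + 1 < len(text) and roll < 2 * typo_rate:
            # Transposition: type the next two characters swapped, erase both.
            yield text[i + 1], delay()
            yield ch, delay()
            yield BACKSPACE, delay()
            yield BACKSPACE, delay()
        yield ch, delay()


def render(events):
    """Apply the events to an empty line and return what the caller sees."""
    out = []
    for chars, _ in events:
        if chars == BACKSPACE:
            out.pop()
        else:
            out.append(chars)
    return "".join(out)
```

Every typo is erased before the correct character is typed, so render() always returns the original text; only the timing and the on-screen journey differ.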

Prompts

Long prompts live in plain-text files under ctrl/, which you can edit in any text editor without worrying about line lengths or escaping. Each prompt also has an inline alternative for short prompts; the file form wins if both are set.

| Key | Bundled file | Purpose |
| --- | --- | --- |
| system_prompt_file / system_prompt | chat_llm_persona.utf8 | The persona (system) prompt for normal turns. |
| opening_system_prompt_file / opening_system_prompt | chat_llm_opening_persona.utf8 | System prompt for the opening turn. Falls back to the normal system prompt if unset. |
| opening_prompt_file / opening_prompt | chat_llm_greeting.utf8 | The line that elicits a greeting at session start. Leave it blank to disable the opening greeting; the bot then waits silently for the caller to speak. |
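
The precedence rules for prompts (the file form beats inline text, the opening system prompt falls back to the normal one, and a blank greeting disables the opening turn) can be sketched as follows. The helper names, and the assumption that relative file names resolve against the ctrl/ directory, are illustrative.

```python
from pathlib import Path


def load_prompt(cfg: dict, key: str, ctrl_dir: Path) -> str:
    """Return the prompt for `key`; the `<key>_file` form wins if both are set."""
    file_name = cfg.get(key + "_file", "").strip()
    if file_name:
        # Assumption: relative names resolve against the ctrl/ directory.
        return (ctrl_dir / file_name).read_text(encoding="utf-8")
    return cfg.get(key, "")


def opening_prompts(cfg: dict, ctrl_dir: Path):
    """Return (system_prompt, greeting_request) for the opening turn."""
    system = (load_prompt(cfg, "opening_system_prompt", ctrl_dir)
              or load_prompt(cfg, "system_prompt", ctrl_dir))
    greeting = load_prompt(cfg, "opening_prompt", ctrl_dir).strip()
    # None means no greeting: the bot waits for the caller to speak.
    return system, (greeting or None)
```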

The .utf8 extension is a reminder to save these files as UTF-8 (what the API expects). Prompts may contain @macro@ tokens, substituted per turn:

| Macro | Expands to |
| --- | --- |
| @bot_name@ | Persona display name |
| @system_name@ | BBS name |
| @system_op@ | Sysop name |
| @sync_version@ | Synchronet version string |
| @alias@ | Caller's alias |
| @real_name@ | Caller's real name (if known) |
| @level@ | Caller's security level |
| @location@ | Caller's location |
| @lang@ / @lang_name@ | Caller's language code / English name |
| @language_directive@ | Assembled “Respond in X.” sentence (English-only fallback) |
| @memory_summary@ | Rolling long-term memory; empty when there's no prior context |
| @retrieved_context@ | Top RAG hits for this turn; empty when nothing clears the relevance gate |

Unknown tokens expand to empty strings.
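
Macro expansion amounts to a single regular-expression substitution with an empty-string default. A minimal sketch, assuming tokens consist of lowercase letters and underscores as in the table; the values dictionary (with the hypothetical names "Example BBS" and "Zed") would be rebuilt on every turn from the caller and system records.

```python
import re

# Tokens look like @name@ with lowercase letters and underscores.
MACRO = re.compile(r"@([a-z_]+)@")


def expand_macros(template: str, values: dict) -> str:
    """Substitute @name@ tokens; unknown tokens expand to empty strings."""
    return MACRO.sub(lambda m: str(values.get(m.group(1), "")), template)


turn_values = {"bot_name": "Guru", "system_name": "Example BBS", "alias": "Zed"}
print(expand_macros("I am @bot_name@ on @system_name@.@memory_summary@ Hi @alias@!",
                    turn_values))
# I am Guru on Example BBS. Hi Zed!
```

Because @memory_summary@ and @retrieved_context@ expand to nothing when empty, a prompt can place them unconditionally without leaving stray labels behind.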

Persistent memory

The engine keeps a per-(caller, persona) memory file under data/chat/. See chat_llm → Persistent memory for behavior.

| Key | Default | Purpose |
| --- | --- | --- |
| memory_persist | true | Master switch. Set it to false in public contexts (e.g. an IRC bot in open channels) where you don't want to retain anything about strangers. |
| history_window | 8 | Number of recent verbatim turns sent to the model with each request. Larger values improve recall but cost more tokens and slow prompt processing. Very short inputs automatically use a smaller window. |
| summarize_threshold | 30 | When the stored transcript exceeds this many turns, the oldest turns are compressed into the long-term summary. |
| summarize_batch | 10 | How many of the oldest turns are folded into the summary each time the threshold is crossed. |
| memory_summary_in_prompt | true | Whether the rolling summary is injected into the system prompt. Set it to false if your model echoes the summary back in its replies. |
| memory_max_age_days | 365 | Memory files untouched for longer than this are dropped on next load. 0 disables age-based pruning. |
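
The interplay of history_window, summarize_threshold, summarize_batch, and memory_max_age_days can be modeled with a few small functions. This is a hypothetical sketch: it treats one stored entry as one turn, folds a single batch per check, ignores the smaller window used for very short inputs, and takes the summarizer as a callable (in the engine this would presumably be another model request).

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    summary: str = ""
    turns: list = field(default_factory=list)


def fold_if_needed(mem, summarize, threshold=30, batch=10):
    """Fold the oldest `batch` turns into the summary once over `threshold`."""
    if len(mem.turns) > threshold:
        oldest, mem.turns = mem.turns[:batch], mem.turns[batch:]
        mem.summary = summarize(mem.summary, oldest)
    return mem


def context_turns(mem, window=8):
    """The most recent verbatim turns sent with each request."""
    # Guard: turns[-0:] would return every turn, not none.
    return mem.turns[-window:] if window > 0 else []


def is_stale(age_days, max_age_days=365):
    """True if a memory file should be dropped on load; 0 disables pruning."""
    return max_age_days > 0 and age_days > max_age_days
```

With the defaults, a transcript that reaches 31 turns is cut back to 21, while each request still carries only the last 8 of them plus the summary.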

Retrieval (RAG)

The index builder (jsexec llm_index.js <persona>) builds a BM25 index from configured content sources; the engine queries it each turn and injects the top hits via @retrieved_context@.

| Key | Default | Purpose |
| --- | --- | --- |
| index_sources | msgbase | Semicolon-separated list of source crawlers (commas are reserved for a source's own argument list). See Source syntax below. |
| index_output | chat/<persona>.idx | Index file path, relative to the data directory. <persona> is replaced with the section name. |
| index_top_k | 5 | How many top-scoring chunks are injected per turn. Lower means less noise; higher improves recall but risks drowning the model in context. |
| index_min_score_per_token | 3.5 | Relevance gate. If the top hit's score per query token (after stopword removal) is below this, nothing is injected and the prompt's “I haven't seen anything about that” rule takes over. This prevents fabricated BBS-local detail. |
| index_max_chunks | 5000 | Cap on the total number of chunks the builder ingests; the oldest are trimmed if it is exceeded. |
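
The retrieval step and its relevance gate can be sketched with a textbook BM25 scorer. The tokenizer, the stopword list, and the k1 = 1.2 / b = 0.75 parameters are common choices assumed here, not values taken from llm_index.js, and trimming assumes chunks arrive oldest first.

```python
import math
import re
from collections import Counter

# Illustrative stopword list; the real builder's list is not documented here.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "what", "about"}


def tokenize(text):
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class BM25Index:
    def __init__(self, chunks, k1=1.2, b=0.75, max_chunks=5000):
        # Assumes chunks arrive oldest first, so trimming keeps the newest.
        self.chunks = list(chunks)[-max_chunks:] if max_chunks > 0 else list(chunks)
        self.docs = [Counter(tokenize(c)) for c in self.chunks]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs) if self.docs else 0.0
        self.k1, self.b = k1, b
        df = Counter()
        for d in self.docs:
            df.update(d.keys())
        n = len(self.docs)
        # BM25 inverse document frequency (the "+1" form keeps it positive).
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query_tokens, i):
        doc, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for t in query_tokens:
            tf = doc.get(t, 0)
            if tf:
                norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[t] * tf * (self.k1 + 1) / norm
        return total


def retrieve(index, query, top_k=5, min_score_per_token=3.5):
    """Return up to top_k chunks, or [] when the relevance gate stays closed."""
    q = tokenize(query)
    if not q or not index.docs:
        return []
    ranked = sorted(((index.score(q, i), i) for i in range(len(index.docs))),
                    reverse=True)
    if ranked[0][0] / len(q) < min_score_per_token:
        return []  # nothing injected; the prompt's "haven't seen" rule applies
    return [index.chunks[i] for s, i in ranked[:top_k] if s > 0]
```

On a three-chunk toy corpus, a query such as "door games" scores only about 1.0 per token, so this sketch needs a gate far below 3.5 to return anything. How the engine's own scores scale on a full message-base index is not something the sketch can tell you.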

Source syntax

Each source names a crawler module under exec/llm_index/. A :arg suffix passes an argument whose meaning is source-specific. The bundled crawlers (see llm_index):

A typical community-grounded configuration:

index_sources = msgbase:Local,DOVE-Net,FsxNet,FidoNet
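
The following sketch shows how an index_sources value splits into crawler names and raw arguments: semicolons separate sources, and the first colon separates a crawler name from its argument. Each crawler interprets its own argument. Splitting the msgbase argument on commas, as in the loop below, is an assumption based on the example above.

```python
from typing import List, Optional, Tuple


def parse_sources(spec: str) -> List[Tuple[str, Optional[str]]]:
    """Split index_sources into (crawler, raw_argument) pairs."""
    sources = []
    for item in spec.split(";"):              # semicolons separate sources
        item = item.strip()
        if not item:
            continue                           # tolerate a trailing ';'
        name, sep, arg = item.partition(":")   # first ':' starts the argument
        sources.append((name.strip(), arg.strip() if sep else None))
    return sources


for crawler, arg in parse_sources("msgbase:Local,DOVE-Net,FsxNet,FidoNet"):
    # Assumption: msgbase reads its argument as a comma-separated list.
    names = arg.split(",") if arg else []
    print(crawler, names)
# msgbase ['Local', 'DOVE-Net', 'FsxNet', 'FidoNet']
```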

Advanced retrieval tuning

These are optional and not present in the bundled file:

A note on defaults

The bundled ctrl/chat_llm.ini ships with several values that differ from the built-in fallbacks above, mostly tuned to reduce fabrication and to allow for slow cold model loads. Where your file and this page disagree, your file wins. The notable pre-set overrides are: temperature = 0.3 (vs 0.7), timeout = 180 (vs 60), max_tokens = 300 (vs 500), index_top_k = 2 (vs 5), index_min_score_per_token = 3.0 (vs 3.5), and typing_speed_factor = 1.5 (vs 2.0). Note that the lower index_min_score_per_token slightly loosens the relevance gate rather than tightening it.

See Also