====== chat_llm.ini ======

''ctrl/chat_llm.ini'' configures the [[module:chat_llm|LLM-backed chat
engine]] — which model to talk to, how it generates replies, what it
remembers, and what it retrieves for grounding.  It is read by both the chat
engine and the [[module:llm_index|RAG index builder]].

===== Sections =====

The file is organised into a ''[default]'' section plus optional per-persona
sections.  A persona section is named for the persona's code — the value
passed to the [[module:chat_llm_irc|IRC adapter]], or the persona a Guru's
chat module runs under:

<code ini>
[default]
endpoint = http://localhost:11434/v1/chat/completions
model    = qwen2.5:14b

[guru]
; overrides for the "guru" persona only
temperature = 0.2
</code>

Any key absent from a persona section falls back to ''[default]''.  This lets
one host run several personas (a terminal Guru, an IRC bot, a multinode
helper) off one file with shared backend settings and per-persona prompts.
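
As a sketch of that fallback, the fragment below shares one backend between two personas.  The section names ''guru'' and ''ircbot'' and the prompt file names are hypothetical, and the sketch assumes prompt file names resolve under ''ctrl/''; only ''[default]'' is a fixed name.

<code ini>
[default]
; shared by every persona unless overridden
endpoint    = http://localhost:11434/v1/chat/completions
model       = qwen2.5:14b
temperature = 0.7

[guru]
; hypothetical terminal persona: inherits endpoint and model, lowers temperature
temperature        = 0.3
system_prompt_file = guru_persona.utf8

[ircbot]
; hypothetical IRC persona: inherits endpoint, model and temperature (0.7)
system_prompt_file = ircbot_persona.utf8
memory_persist     = false
</code>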

===== Backend =====

^ Key ^ Default ^ Purpose ^
| ''provider'' | ''openai'' | API dialect.  ''openai'' covers any OpenAI-compatible endpoint (Ollama, Groq, OpenRouter, llama.cpp, LM Studio, OpenAI itself).  ''gemini'' (Google native) is reserved, not yet implemented. |
| ''endpoint'' | //(required)// | Full URL of the chat-completions endpoint, e.g. ''%%http://localhost:11434/v1/chat/completions%%''. |
| ''model'' | //(required)// | Model identifier sent to the endpoint, e.g. ''qwen2.5:14b''. |
| ''api_key'' | ''placeholder'' | Bearer token.  Ollama ignores it but it must be non-empty; real providers (Groq, OpenRouter, OpenAI) require their actual key. |
| ''keep_alive'' | //(empty)// | Optional per-request model-retention hint.  **Ollama's OpenAI-compatible endpoint ignores this** — for durable retention with Ollama set the ''OLLAMA_KEEP_ALIVE'' environment variable on the Ollama server instead.  Some other backends honor values like ''5m'', ''-1'' (forever), ''0''. |
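
For a hosted provider the same keys point at a remote endpoint.  A minimal sketch, assuming OpenAI's public chat-completions URL; the model name is only an example and the key shown is a placeholder, not a working value:

<code ini>
[default]
provider = openai
endpoint = https://api.openai.com/v1/chat/completions
; example model name: use whichever model your account offers
model    = gpt-4o-mini
; placeholder: substitute your real key
api_key  = sk-REPLACE-ME
</code>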

===== Generation =====

^ Key ^ Default ^ Purpose ^
| ''max_tokens'' | ''500'' | Maximum reply length in tokens. |
| ''temperature'' | ''0.7'' | Sampling temperature.  Lower (e.g. ''0.3'') is more conservative and less likely to fabricate; higher is more creative. |
| ''timeout'' | ''60'' | Request timeout in seconds.  Set generously: a cold model load on the first call can take 15–30 s for a 7B model and longer for bigger models. |
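
A fabrication-averse setup might look like the following.  The values are the ones the bundled file pre-sets (see [[#a_note_on_defaults|A note on defaults]]); treat them as a starting point rather than a recommendation for every model:

<code ini>
[default]
; shorter replies than the built-in 500
max_tokens  = 300
; lower temperature: more conservative sampling
temperature = 0.3
; leaves room for a slow cold model load on the first call
timeout     = 180
</code>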

===== Terminal typing animation =====

These keys affect only the terminal (private) path, where the reply is "typed
out" character by character.

^ Key ^ Default ^ Purpose ^
| ''typing_speed_factor'' | ''2.0'' | Typing-speed multiplier: the per-character delay shrinks as the value grows.  ''1.0'' matches the legacy Guru's 25–150 ms/char; higher is snappier.  ''0'' is a special case that disables the animation (instant output). |
| ''simulate_typos'' | ''true'' | Whether the animation includes fat-finger and transposition typos (with corrections).  At very high speeds the corrections flash by oddly; some sysops prefer ''false''. |
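
For example, a sysop who wants fast output without the typo corrections flashing by could try something like this (the ''4.0'' is an illustrative value, not a tested recommendation):

<code ini>
[default]
; faster than the built-in 2.0; use 0 for instant output
typing_speed_factor = 4.0
; skip simulated fat-finger and transposition typos
simulate_typos      = false
</code>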

===== Prompts =====

Long prompts live in plain-text files under ''ctrl/'' (editable in any editor,
no line-length or escaping pain).  Each has an inline alternative for short
prompts; **the file form wins if both are set.**

^ Key ^ Bundled file ^ Purpose ^
| ''system_prompt_file'' / ''system_prompt'' | ''chat_llm_persona.utf8'' | The persona/system prompt for normal turns. |
| ''opening_system_prompt_file'' / ''opening_system_prompt'' | ''chat_llm_opening_persona.utf8'' | System prompt for the **opening** turn.  Falls back to the normal system prompt if unset. |
| ''opening_prompt_file'' / ''opening_prompt'' | ''chat_llm_greeting.utf8'' | The line that elicits a greeting at session start.  Leave blank to disable the opening greeting (the bot waits silently for the caller to speak). |

The ''.utf8'' extension is a reminder to save these files as UTF-8 (what the
API expects).  Prompts may contain ''@macro@'' tokens, substituted per turn:

^ Macro ^ Expands to ^
| ''@bot_name@'' | Persona display name |
| ''@system_name@'' | BBS name |
| ''@system_op@'' | Sysop name |
| ''@sync_version@'' | Synchronet version string |
| ''@alias@'' | Caller's alias |
| ''@real_name@'' | Caller's real name (if known) |
| ''@level@'' | Caller's security level |
| ''@location@'' | Caller's location |
| ''@lang@'' / ''@lang_name@'' | Caller's language code / English name |
| ''@language_directive@'' | Assembled "Respond in X." sentence (English-only fallback) |
| ''@memory_summary@'' | Rolling long-term memory; empty when there's no prior context |
| ''@retrieved_context@'' | Top RAG hits for this turn; empty when nothing clears the relevance gate |

Unknown tokens expand to empty strings.
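
A short inline prompt using these macros might look like the sketch below.  The section name and the prompt wording are hypothetical, and it assumes no ''system_prompt_file'' applies to this persona (including via ''[default]''), since the file form wins when both are set:

<code ini>
[guru]
; macros are substituted on every turn
system_prompt = You are @bot_name@, the resident guru of @system_name@ (sysop: @system_op@). You are chatting with @alias@. @language_directive@
</code>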

===== Persistent memory =====

The engine keeps a per-(caller, persona) memory file under ''data/chat/''.
See [[module:chat_llm#persistent_memory|chat_llm → Persistent memory]] for
behavior.

^ Key ^ Default ^ Purpose ^
| ''memory_persist'' | ''true'' | Master switch.  Set ''false'' for public contexts (e.g. an IRC bot in open channels) where you don't want to retain anything about strangers. |
| ''history_window'' | ''8'' | Number of recent verbatim turns sent to the model each request.  Larger = better recall but more tokens and slower prompt processing.  Very short inputs automatically use a smaller window. |
| ''summarize_threshold'' | ''30'' | When the stored transcript exceeds this many turns, the oldest turns are compressed into the long-term summary. |
| ''summarize_batch'' | ''10'' | How many of the oldest turns are folded into the summary each time the threshold is crossed. |
| ''memory_summary_in_prompt'' | ''true'' | Whether the rolling summary is injected into the system prompt.  Set ''false'' if your model echoes the summary back into replies. |
| ''memory_max_age_days'' | ''365'' | Memory files untouched for longer than this are dropped on next load.  ''0'' disables age-based pruning. |
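
Reading the key descriptions literally, the built-in values interact roughly as annotated below.  The turn counts in the comments are an illustration of those descriptions, not verified engine behavior, and the ''ircbot'' section name is hypothetical:

<code ini>
[default]
memory_persist      = true
history_window      = 8
summarize_threshold = 30
summarize_batch     = 10
; once the stored transcript reaches 31 turns (more than 30), the oldest
; 10 would be folded into the summary, leaving 21 stored turns; each
; request still sends at most the 8 most recent turns verbatim

[ircbot]
; hypothetical public persona: retain nothing about strangers
memory_persist = false
</code>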

===== Retrieval (RAG) =====

The [[module:llm_index|index builder]] (''jsexec llm_index.js <persona>'')
builds a BM25 index from configured content sources; the engine queries it
each turn and injects the top hits via ''@retrieved_context@''.

^ Key ^ Default ^ Purpose ^
| ''index_sources'' | ''msgbase'' | **Semicolon**-separated list of source crawlers (commas are reserved for a source's own argument list).  See below. |
| ''index_output'' | ''chat/<persona>.idx'' | Index file path, relative to the data directory.  ''<persona>'' is replaced with the section name. |
| ''index_top_k'' | ''5'' | How many top-scoring chunks are injected per turn.  Lower = less noise; higher = better recall but more risk of drowning the model. |
| ''index_min_score_per_token'' | ''3.5'' | Relevance gate.  If the top hit's score per query token (after stopword removal) is below this, nothing is injected and the prompt's "I haven't seen anything about that" rule takes over.  Prevents fabricated BBS-local detail. |
| ''index_max_chunks'' | ''5000'' | Cap on total chunks ingested by the builder; oldest are trimmed if exceeded. |
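
The relevance gate can be illustrated with some arithmetic.  This sketch assumes "score per query token" means the top hit's score divided by the number of query tokens left after stopword removal; the scores are made-up numbers:

<code ini>
index_top_k               = 5
index_min_score_per_token = 3.5
; illustrative numbers only:
;   query keeps 3 tokens after stopword removal, top hit scores 12.0
;     12.0 / 3 = 4.0, which is >= 3.5, so up to 5 chunks are injected
;   same query, top hit scores 9.0
;     9.0 / 3 = 3.0, which is < 3.5, so nothing is injected
</code>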

==== Source syntax ====

Each source names a crawler module under ''exec/llm_index/''.  A ''%%:arg%%''
suffix passes an argument whose meaning is source-specific.  The bundled
crawlers (see [[module:llm_index]]):

  * ''msgbase'' — all message-base groups.  ''%%msgbase:Local,DOVE-Net,FsxNet,FidoNet%%'' restricts to named groups (recommended — indexing every group lets large networks drown out community content).
  * ''filebase'' — local file-base descriptions.
  * ''%%dokuwiki:<path-to-data/pages>%%'' — a local DokuWiki page tree (point at the ''data/pages'' subdirectory).

A typical community-grounded configuration:

<code ini>
index_sources = msgbase:Local,DOVE-Net,FsxNet,FidoNet
</code>
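
To combine several crawlers, separate them with semicolons.  In this sketch the DokuWiki path is a hypothetical location; substitute the ''data/pages'' directory of your own wiki:

<code ini>
; three sources; the commas belong to the msgbase argument list
index_sources = msgbase:Local,DOVE-Net;filebase;dokuwiki:/var/www/dokuwiki/data/pages
</code>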

==== Advanced retrieval tuning ====

These are optional and not present in the bundled file:

  * ''index_source_weights'' — per-source score multipliers to favor curated sources, e.g. ''%%dokuwiki=2.0,gitlab=1.5%%''.  (Documentation-style questions automatically boost the wiki source for that turn.)
  * ''index_recency_halflife'' — recency-decay half-life in days for scoring; ''0'' (default) disables recency bias.
  * ''group_aliases'' — semicolon-separated alias map so message-base group names in a question match your actual group codes, e.g. ''%%fidonet=fidonet; dove,dovenet=dove-net;%%''.
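
Put together, the optional keys might be set as follows.  The half-life of ''90'' days is an arbitrary illustrative value, and the comment on it assumes the usual meaning of a half-life; the other values reuse the examples above:

<code ini>
; favor wiki hits over other sources
index_source_weights   = dokuwiki=2.0
; assumed behavior: a hit's recency weight halves every 90 days
index_recency_halflife = 90
; "dove" or "dovenet" in a question maps to the dove-net group
group_aliases          = fidonet=fidonet; dove,dovenet=dove-net;
</code>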

===== A note on defaults =====

The bundled ''ctrl/chat_llm.ini'' ships with several values that **differ from
the built-in fallbacks** listed above, mostly to reduce fabrication and to
allow for slow cold model loads.  Where your file and this page disagree, your
file wins.  The notable pre-set overrides are: ''temperature = 0.3'' (vs
''0.7''), ''timeout = 180'' (vs ''60''), ''max_tokens = 300'' (vs ''500''),
''index_top_k = 2'' (vs ''5''), ''index_min_score_per_token = 3.0'' (vs
''3.5''), and ''typing_speed_factor = 1.5'' (vs ''2.0'').

===== See Also =====

  * [[module:chat_llm]] — the chat engine
  * [[module:llm_index]] — building the RAG index
  * [[module:llm_tools]] — the tool registry
  * [[howto:llm-guru]] — start-to-finish setup

{{tag>chat guru llm chat_llm config ini ai rag}}