Chat¶
The Chat API runs multi-turn conversations with an optional system
prompt and tool calling. It wraps the C-side Conversation API,
which applies the model's chat template (e.g. Gemma's
<|turn>user … <turn|>) on every turn.
chat, err := client.NewChat(ctx,
	litertlm.WithSystemPrompt("You are a friendly assistant."),
)
if err != nil {
	log.Fatal(err) // check before deferring Close on a Chat that failed to open
}
defer chat.Close()

reply, err := chat.Send(ctx, "Hi, what is your name?")
fmt.Println(reply.Text())

reply, err = chat.Send(ctx, "Tell me a fun fact.")
fmt.Println(reply.Text())
Chat keeps dialogue history internally — successive Send calls
have access to prior turns.
Creating NewChat(ctx, opts...)¶
Open a Chat from a Client with Client.NewChat(ctx, opts...).
Call Close() on the returned *Chat when done.
Chat configuration:
| Option | Effect |
|---|---|
| WithSystemPrompt(s) | The system message. Pass just the content; the C side wraps it in a {role,content} envelope itself. |
| WithTool(defs ...) | One or more ToolDefinitions the model may call. Mix RawTool (hand-built) and ManagedTool (typed handler) freely. |
| WithInitialMessages(msgs) | Seed history with prior turns. Each Message{Role, Parts} carries a []Part body: text-only history uses []Part{Text("...")}, multimodal history may include Image / Audio parts. |
| WithConstrainedDecoding(on) | Toggle the engine's constrained-decoding mode (boolean only; schema delivery is upstream-pending). |
| WithExtraContext(json) | JSON string used as the conversation preface's extra context. |
| WithFilterChannelContentFromKVCache(on) | Exclude the model's reasoning-channel tokens from the KV cache (won't persist across turns). |
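Since WithExtraContext takes a JSON string, it is usually safer to marshal a Go value than to concatenate the string by hand. The sketch below uses only the standard library. The helper name extraContextJSON and the keys "locale" and "verbose" are placeholders invented for illustration; which keys a given chat template actually reads is not specified on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extraContextJSON marshals a map into a JSON object string.
// The helper and the keys used in main are hypothetical.
func extraContextJSON(fields map[string]any) (string, error) {
	b, err := json.Marshal(fields)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := extraContextJSON(map[string]any{"locale": "en-GB", "verbose": true})
	if err != nil {
		panic(err)
	}
	// encoding/json writes map keys in sorted order, so the output is stable.
	fmt.Println(s)
}
```

The resulting string is what would be handed to WithExtraContext.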
Send(ctx, message) and Reply¶
Send issues a synchronous user-role message and returns a *Reply.
type Reply struct{ /* unexported */ }
func (r *Reply) Text() string // concatenated text content parts
func (r *Reply) ToolCalls() []ToolCall // structured function-call requests
func (r *Reply) HasToolCalls() bool
func (r *Reply) Raw() string // original C-side JSON, for debugging
SendStream(ctx, message)¶
SendStream is the streaming variant of Send. It returns an
iter.Seq2[Chunk, error] over the model's response chunks as they
arrive.
for chunk, err := range chat.SendStream(ctx, message) {
	if err != nil {
		break // stop on the first error; log or return err here
	}
	fmt.Print(chunk.Text)
}
Multimodal turns: SendMulti and SendMultiStream¶
Send image and audio inputs through the same Chat handle as text
turns. SendMulti accepts []Part instead of string; the
underlying Conversation accumulates KV state across turns
regardless of modality, so follow-up text turns can reference
earlier multimodal content.
img, err := litertlm.ImageFromFile("photo.jpg")
// ...
reply, err := chat.SendMulti(ctx, []litertlm.Part{
	img,
	litertlm.Text("Describe this image in one sentence."),
})
// Follow-up text turn on the same Chat; image embeddings are still in the KV cache:
reply, err = chat.Send(ctx, "What's the dominant color?")
SendMultiStream is the streaming sibling:
for chunk, err := range chat.SendMultiStream(ctx, parts) {
	if err != nil {
		break // stop on the first error; log or return err here
	}
	fmt.Print(chunk.Text)
}
Requirements:
- Image Parts require WithVisionBackend on the Client at New time. Audio Parts require WithAudioBackend. Calling SendMulti with the wrong backend (or no backend) surfaces a C-side conversation_create failure.
- An empty []Part is rejected up front with "litertlm: SendMulti: empty parts". Pass at least one Part.
- Tool dispatch behaves identically to Send: the model can emit tool_call content in response to multimodal input, and SendMulti runs the auto-dispatch loop the same way.
- A []Part containing only text Parts is equivalent to Send with the text concatenated; the multimodal path is only needed when at least one image / audio Part is present.
Contrast with Client.GenerateMulti / GenerateMultiStream /
GenerateMultiResponse: those are one-shot calls that open a
fresh Conversation, run one inference, and discard it. KV state
does not persist between calls. Use Chat.SendMulti* when you
want successive turns to share the same conversation state.
Per-call options¶
All five Chat.Send* methods accept variadic RuntimeOption values
applied to a single turn (and to every dispatch hop the turn
triggers).
WithVisualTokenBudget(n) caps the vision tokens consumed by a
SendMulti / SendMultiStream turn. Text-only turns ignore it.
Effective on Gemma 4 vision-enabled models.
reply, err := chat.SendMulti(ctx, []litertlm.Part{img, litertlm.Text("...")},
	litertlm.WithVisualTokenBudget(512),
)
WithReturnToolRequests(true) bypasses the auto-dispatch loop on
Send, SendMulti, and SendToolResult. The first reply
containing tool calls is returned directly via Reply.ToolCalls()
even when every call maps to a registered ManagedTool. Pair with
Chat.SendToolResult to feed the result back. Streaming methods
ignore this flag — SendStream / SendMultiStream always run the
dispatch loop.
reply, err := chat.Send(ctx, "What's the weather in Paris?",
	litertlm.WithReturnToolRequests(true),
)
if err != nil {
	log.Fatal(err)
}
for _, call := range reply.ToolCalls() {
	// inspect the name and arguments before dispatching manually
	fmt.Println(call.Function.Name, call.Function.Arguments)
}
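Once the calls are in hand, manual dispatch amounts to a lookup by name followed by argument validation. The sketch below is self-contained: ToolCall and ToolCallFunction are local copies of the shapes documented on this page, while the handler type, the registry map, the "get_weather" tool, and the "function" value for Type are all assumptions made for illustration. Feeding results back through Chat.SendToolResult is left out because its signature is not shown here.

```go
package main

import "fmt"

// ToolCall and ToolCallFunction mirror the shapes documented on this page.
type ToolCall struct {
	Type     string
	Function ToolCallFunction
}

type ToolCallFunction struct {
	Name      string
	Arguments map[string]any
}

// handler is a hypothetical signature for hand-written tool code.
type handler func(args map[string]any) (string, error)

// dispatch runs each call against the registry in order and collects the
// results. An unknown tool name is reported as an error, not skipped.
func dispatch(registry map[string]handler, calls []ToolCall) ([]string, error) {
	results := make([]string, 0, len(calls))
	for _, call := range calls {
		h, ok := registry[call.Function.Name]
		if !ok {
			return nil, fmt.Errorf("unknown tool %q", call.Function.Name)
		}
		out, err := h(call.Function.Arguments)
		if err != nil {
			return nil, fmt.Errorf("tool %q: %w", call.Function.Name, err)
		}
		results = append(results, out)
	}
	return results, nil
}

// weather is a toy handler that validates its one string argument.
func weather(args map[string]any) (string, error) {
	city, ok := args["city"].(string)
	if !ok {
		return "", fmt.Errorf("missing string argument %q", "city")
	}
	return "sunny in " + city, nil
}

func main() {
	registry := map[string]handler{"get_weather": weather}
	calls := []ToolCall{{
		Type: "function", // assumed value
		Function: ToolCallFunction{
			Name:      "get_weather",
			Arguments: map[string]any{"city": "Paris"},
		},
	}}
	results, err := dispatch(registry, calls)
	fmt.Println(results, err)
}
```

Rejecting unknown names up front keeps a misbehaving model from silently dropping a call.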
WithMaxConcurrentTools(n) runs tool handlers in parallel within a
single dispatch hop. n <= 1 keeps the default sequential dispatch;
n > 1 caps in-flight handlers at n. Result ordering in the
follow-up tool-role message matches the model's original call order
regardless of completion order. The first handler to fail in
wall-clock time cancels the rest of the batch; sibling handlers see
their ctx canceled and may return early.
reply, err := chat.Send(ctx, "Compare weather in Paris and Tokyo",
	litertlm.WithMaxConcurrentTools(4),
)
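The semantics described for WithMaxConcurrentTools (a cap on in-flight handlers, results in call order, cancellation on first failure) can be modeled with a counting semaphore and a shared context. This is a standalone sketch of that pattern, not the library's implementation; task, runBounded, and sleepy are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// task is a hypothetical stand-in for one tool-handler invocation.
type task func(ctx context.Context) (string, error)

var errLookup = errors.New("lookup failed")

// runBounded runs tasks with at most limit in flight. Results keep the
// input order, and the first failure cancels ctx for the other tasks.
func runBounded(ctx context.Context, limit int, tasks []task) ([]string, error) {
	if limit < 1 {
		limit = 1 // one task at a time
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	results := make([]string, len(tasks))
	sem := make(chan struct{}, limit) // counting semaphore
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for i, t := range tasks {
		sem <- struct{}{} // blocks while limit tasks are running
		if ctx.Err() != nil {
			<-sem
			break // canceled already; start nothing new
		}
		wg.Add(1)
		go func(i int, t task) {
			defer wg.Done()
			defer func() { <-sem }()
			out, err := t(ctx)
			if err != nil {
				once.Do(func() {
					firstErr = err
					cancel()
				})
				return
			}
			results[i] = out // each goroutine writes only its own slot
		}(i, t)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	if err := ctx.Err(); err != nil {
		return nil, err // the caller's context was canceled
	}
	return results, nil
}

// sleepy returns a task that waits d, or gives up when ctx is canceled.
func sleepy(d time.Duration, out string) task {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(d):
			return out, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

func demo() ([]string, error) {
	// The second task finishes first, but results stay in input order.
	return runBounded(context.Background(), 4, []task{
		sleepy(30*time.Millisecond, "paris"),
		sleepy(10*time.Millisecond, "tokyo"),
	})
}

func demoFailure() error {
	_, err := runBounded(context.Background(), 4, []task{
		sleepy(time.Second, "slow"),
		func(context.Context) (string, error) { return "", errLookup },
	})
	return err
}

func main() {
	fmt.Println(demo())
	fmt.Println(demoFailure())
}
```

Writing each result into its own slice index is what preserves call order without any sorting step.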
Tool handlers must be safe to invoke from multiple goroutines when
n > 1. Shared mutable state in a closure-captured ManagedTool
handler needs its own synchronization.
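As an illustration of that requirement, here is a minimal sketch of closure-captured state guarded by a mutex. The callCounter type and the "get_weather" name are hypothetical; the point is only that every read-modify-write of shared state happens under the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// callCounter is shared state that a handler closure might update.
// The mutex makes concurrent increments safe.
type callCounter struct {
	mu    sync.Mutex
	calls map[string]int
}

// record bumps the counter for name and returns the new value.
func (c *callCounter) record(name string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.calls[name]++
	return c.calls[name]
}

// hammer calls record from n goroutines at once and returns the final count.
func hammer(n int) int {
	c := &callCounter{calls: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.record("get_weather")
		}()
	}
	wg.Wait()
	return c.record("get_weather") - 1 // read under the lock, undo the extra bump
}

func main() {
	fmt.Println(hammer(100))
}
```

Without the mutex, concurrent writes to the map would be a data race and can crash the program; running tests with the race detector (go test -race) is a quick way to catch a handler that needs this treatment.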
WithSampler and WithMaxOutputTokens are not per-call on Chat —
they apply at NewChat time on the Client's session config.
Tool calling¶
Two flavors of tool attach to a Chat via WithTool:
| Flavor | Constructor | Dispatch |
|---|---|---|
| RawTool | NewRawTool | Manual: Reply.ToolCalls() + Chat.SendToolResult |
| ManagedTool | RegisterTool | Framework dispatches the typed handler |
Both satisfy ToolDefinition and may be mixed in the same call:
chat, _ := client.NewChat(ctx,
	litertlm.WithSystemPrompt("You are a calculator. Always call the tool."),
	litertlm.WithTool(calcAdd),
)
When the chat has at least one ManagedTool registered, Chat.Send
runs the dispatch loop and returns the post-tool natural-language
reply directly. Replies whose tool calls aren't all dispatchable
(unknown name or RawTool) come back for manual handling.
Cap the loop with WithMaxToolHops(n) (default 5). Override the
per-tool error-propagation policy with WithToolPolicy(p) at
RegisterTool time.
See the Tools guide for the full reference: schema
reflection rules, dispatch semantics, mixed-registration behavior,
ErrToolHopsExceeded / ToolHopsError, and ToolPolicy modes.
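To make the hop cap concrete, the loop can be pictured as alternating model steps and tool calls until the model answers in text. The sketch below is a simplified mental model, not the library's code: turn, runLoop, and errHopsExceeded are local stand-ins, and counting one hop per tool round is an assumption about how the limit is measured.

```go
package main

import (
	"errors"
	"fmt"
)

// errHopsExceeded is a local stand-in for the library's hop-limit error.
var errHopsExceeded = errors.New("tool hops exceeded")

// turn is what a hypothetical model step returns: either final text or
// the name of a tool it wants called.
type turn struct {
	text string
	tool string
}

// runLoop alternates model steps and tool calls. It returns when the model
// produces text, asks for a tool that is not registered (handed back for
// manual handling), or would need more than maxHops tool rounds.
func runLoop(step func(toolResult string) turn, tools map[string]func() string, maxHops int) (turn, int, error) {
	result := ""
	for hops := 0; ; hops++ {
		t := step(result)
		if t.tool == "" {
			return t, hops, nil // plain text: done
		}
		h, ok := tools[t.tool]
		if !ok {
			return t, hops, nil // not dispatchable: caller handles it
		}
		if hops == maxHops {
			return turn{}, hops, errHopsExceeded
		}
		result = h()
	}
}

// demo drives a fake model that requests the "add" tool twice, then answers.
func demo(maxHops int) (string, int, error) {
	asked := 0
	step := func(toolResult string) turn {
		if asked < 2 {
			asked++
			return turn{tool: "add"}
		}
		return turn{text: "The answer is " + toolResult}
	}
	tools := map[string]func() string{"add": func() string { return "4" }}
	t, hops, err := runLoop(step, tools, maxHops)
	return t.text, hops, err
}

func main() {
	fmt.Println(demo(5)) // two tool rounds fit within the cap
	fmt.Println(demo(1)) // the second round exceeds the cap
}
```

A cap like this bounds latency and cost when a model keeps requesting tools instead of answering.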
ToolDefinition and ToolCall types¶
type ToolDefinition interface {
	Name() string
	Description() string
	Parameters() map[string]any // JSON-Schema-shaped
}

type ToolCall struct {
	Type     string
	Function ToolCallFunction
}

type ToolCallFunction struct {
	Name      string
	Arguments map[string]any // numbers come as float64 (encoding/json default)
}
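The float64 note matters in practice: a direct .(int) assertion on a value from Arguments never succeeds. The sketch below shows one way to read a whole-number argument safely using only the standard library. The helpers intArg and decodeArgs are hypothetical names, and decodeArgs simply reproduces how encoding/json fills a map[string]any.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// intArg reads a whole-number argument from a decoded arguments map.
// encoding/json stores every JSON number as float64 in map[string]any,
// so the value is asserted as float64 and then checked for a fraction.
func intArg(args map[string]any, key string) (int, error) {
	v, ok := args[key]
	if !ok {
		return 0, fmt.Errorf("missing argument %q", key)
	}
	f, ok := v.(float64)
	if !ok {
		return 0, fmt.Errorf("argument %q is %T, want number", key, v)
	}
	if f != math.Trunc(f) {
		return 0, fmt.Errorf("argument %q is not a whole number: %v", key, f)
	}
	return int(f), nil
}

// decodeArgs parses a JSON object into map[string]any.
func decodeArgs(raw string) (map[string]any, error) {
	var args map[string]any
	err := json.Unmarshal([]byte(raw), &args)
	return args, err
}

func main() {
	args, err := decodeArgs(`{"a": 2, "b": 3.5}`)
	if err != nil {
		panic(err)
	}
	_, isInt := args["a"].(int)
	fmt.Println("a is int:", isInt) // false: the dynamic type is float64

	n, err := intArg(args, "a")
	fmt.Println(n, err)

	_, err = intArg(args, "b")
	fmt.Println(err)
}
```

Very large values can exceed the range of int; a production handler may want an explicit bounds check before converting.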
Multi-turn¶
A single Chat instance preserves history across calls. Successive
Send calls let the model see prior turns:
chat.Send(ctx, "I'm planning a trip to Tokyo.")
chat.Send(ctx, "Suggest a 3-day itinerary.") // model knows "Tokyo"
chat.Send(ctx, "What about the third day?") // model knows the prior itinerary
When you need a fresh context, open a new Chat.
Seeding history with WithInitialMessages accepts both text-only and
multimodal turns:
img, _ := litertlm.ImageFromFile("photo.jpg")
chat, _ := client.NewChat(ctx,
	litertlm.WithInitialMessages([]litertlm.Message{
		{Role: "user", Parts: []litertlm.Part{img, litertlm.Text("What is this?")}},
		{Role: "assistant", Parts: []litertlm.Part{litertlm.Text("A wooden table with a lamp.")}},
	}),
)
chat.Send(ctx, "What's the dominant color?") // resumes with image in KV
Introspection¶
Chat.TokenCount() returns the cumulative tokens the underlying
Conversation has processed across all turns. The result decomposes
into prompt (prefill) and completion (decode) totals.
Token counts are collected only when the Client was created with
WithBenchmarkEnabled. Without it, the C side reports zero turns
and TokenCount returns a zero TokenUsage.
Tool-dispatch hops within a single Send count as additional
prefill and decode turns and are included in the cumulative totals.
See also¶
- Tools guide: full reference for RawTool, ManagedTool, RegisterTool, schema reflection, the dispatch loop, WithMaxToolHops, and ToolPolicy.
- examples/chat/: minimal multi-turn demo.
- examples/autotool/: typed tool registration with auto-dispatch.
- examples/conversation/: manual dispatch with NewRawTool + SendToolResult.
- Structured output: when you want type-safe JSON instead of free-form text.
- Low-level API: Conversation, ConversationConfig.