
Chat

The Chat API runs multi-turn conversations with an optional system prompt and tool calling. It wraps the C-side Conversation API, which applies the model's chat template (e.g. Gemma's <|turn>user … <turn|>) on every turn.

chat, err := client.NewChat(ctx,
    litertlm.WithSystemPrompt("You are a friendly assistant."),
)
if err != nil {
    log.Fatal(err)
}
defer chat.Close()

reply, err := chat.Send(ctx, "Hi, what is your name?")
if err != nil {
    log.Fatal(err)
}
fmt.Println(reply.Text())

reply, err = chat.Send(ctx, "Tell me a fun fact.")
if err != nil {
    log.Fatal(err)
}
fmt.Println(reply.Text())

Chat keeps the dialogue history internally, so each successive Send call has access to the prior turns.

Creating a Chat with NewChat(ctx, opts...)

Open a Chat from a Client with Client.NewChat(ctx, opts...). Call Close() on the returned *Chat when done.

Chat configuration options:

| Option | Effect |
| --- | --- |
| WithSystemPrompt(s) | Sets the system message. Pass only the content; the C side wraps it in a {role, content} envelope itself. |
| WithTool(defs ...) | Registers one or more ToolDefinitions the model may call. RawTool (hand-built) and ManagedTool (typed handler) can be mixed freely. |
| WithInitialMessages(msgs) | Seeds the history with prior turns. Each Message{Role, Parts} carries a []Part body: text-only history uses []Part{Text("...")}, while multimodal history may include Image or Audio parts. |
| WithConstrainedDecoding(on) | Toggles the engine's constrained-decoding mode. This is a boolean switch only; schema delivery is pending upstream. |
| WithExtraContext(json) | Supplies a JSON string used as extra context in the conversation preface. |
| WithFilterChannelContentFromKVCache(on) | Excludes the model's reasoning-channel tokens from the KV cache, so they do not persist across turns. |

Send(ctx, message) and Reply

Send issues a synchronous user-role message and returns a *Reply.

type Reply struct{ /* unexported */ }
func (r *Reply) Text() string         // concatenated text content parts
func (r *Reply) ToolCalls() []ToolCall // structured function-call requests
func (r *Reply) HasToolCalls() bool
func (r *Reply) Raw() string          // original C-side JSON, for debugging

SendStream(ctx, message)

SendStream is the streaming variant of Send. It returns an iter.Seq2[Chunk, error] over the model's response chunks as they arrive.

for chunk, err := range chat.SendStream(ctx, message) {
    if err != nil {
        log.Printf("SendStream: %v", err) // surface the error instead of dropping it
        break
    }
    fmt.Print(chunk.Text)
}

Multimodal turns: SendMulti and SendMultiStream

Send image and audio inputs through the same Chat handle as text turns. SendMulti accepts []Part instead of string; the underlying Conversation accumulates KV state across turns regardless of modality, so follow-up text turns can reference earlier multimodal content.

img, err := litertlm.ImageFromFile("photo.jpg")
// ...
reply, err := chat.SendMulti(ctx, []litertlm.Part{
    img,
    litertlm.Text("Describe this image in one sentence."),
})

// Follow-up text turn — same Chat, image embeddings still in KV cache:
reply, err = chat.Send(ctx, "What's the dominant color?")

SendMultiStream is the streaming sibling:

for chunk, err := range chat.SendMultiStream(ctx, parts) {
    if err != nil {
        log.Printf("SendMultiStream: %v", err) // surface the error instead of dropping it
        break
    }
    fmt.Print(chunk.Text)
}

Requirements:

  • Image Parts require WithVisionBackend on the Client at New time. Audio Parts require WithAudioBackend. Calling SendMulti with the wrong backend (or no backend) surfaces a C-side conversation_create failure.
  • An empty []Part is rejected up front with the error "litertlm: SendMulti: empty parts". Pass at least one Part.
  • Tool dispatch behaves exactly as it does for Send: the model can emit tool_call content in response to multimodal input, and SendMulti runs the same auto-dispatch loop.
  • A []Part containing only text Parts is equivalent to Send with the text concatenated; the multimodal path is only needed when at least one image / audio Part is present.

By contrast, Client.GenerateMulti, GenerateMultiStream, and GenerateMultiResponse are one-shot calls: each opens a fresh Conversation, runs one inference, and discards it, so KV state does not persist between calls. Use Chat.SendMulti* when you want successive turns to share the same conversation state.

Per-call options

All five Chat.Send* methods accept variadic RuntimeOption values applied to a single turn (and to every dispatch hop the turn triggers).

WithVisualTokenBudget(n) caps the vision tokens consumed by a SendMulti / SendMultiStream turn. Text-only turns ignore it. It takes effect on vision-enabled Gemma 4 models.

reply, err := chat.SendMulti(ctx, []litertlm.Part{img, litertlm.Text("...")},
    litertlm.WithVisualTokenBudget(512),
)

WithReturnToolRequests(true) bypasses the auto-dispatch loop on Send, SendMulti, and SendToolResult. The first reply containing tool calls is returned directly via Reply.ToolCalls() even when every call maps to a registered ManagedTool. Pair with Chat.SendToolResult to feed the result back. Streaming methods ignore this flag — SendStream / SendMultiStream always run the dispatch loop.

reply, err := chat.Send(ctx, "What's the weather in Paris?",
    litertlm.WithReturnToolRequests(true),
)
if err != nil {
    log.Fatal(err)
}
for _, call := range reply.ToolCalls() {
    // Inspect the requested function and arguments before dispatching manually.
    fmt.Println(call.Function.Name, call.Function.Arguments)
}

WithMaxConcurrentTools(n) runs tool handlers in parallel within a single dispatch hop. With n <= 1, dispatch stays sequential (the default); with n > 1, at most n handlers run at once. Results in the follow-up tool-role message appear in the model's original call order, regardless of which handler finishes first. The first handler to fail (in wall-clock order) ends the batch; its sibling handlers observe ctx cancellation and may return early.

reply, err := chat.Send(ctx, "Compare weather in Paris and Tokyo",
    litertlm.WithMaxConcurrentTools(4),
)

Tool handlers must be safe to invoke from multiple goroutines when n > 1. Shared mutable state in a closure-captured ManagedTool handler needs its own synchronization.

WithSampler and WithMaxOutputTokens cannot be set per call on a Chat. They belong to the Client's session configuration and are applied when NewChat creates the Chat.

Tool calling

Two flavors of tool attach to a Chat via WithTool:

| Flavor | Constructor | Dispatch |
| --- | --- | --- |
| RawTool | NewRawTool | Manual: read Reply.ToolCalls(), then call Chat.SendToolResult |
| ManagedTool | RegisterTool | The framework dispatches the typed handler |

Both satisfy ToolDefinition and may be mixed in the same WithTool call. A minimal setup with a single tool (calcAdd, defined elsewhere) looks like this:

chat, _ := client.NewChat(ctx,
    litertlm.WithSystemPrompt("You are a calculator. Always call the tool."),
    litertlm.WithTool(calcAdd),
)

When the chat has at least one ManagedTool registered, Chat.Send runs the dispatch loop and returns the post-tool natural-language reply directly. Replies whose tool calls aren't all dispatchable (unknown name or RawTool) come back for manual handling.

Cap the loop with WithMaxToolHops(n) (default 5). Override the per-tool error-propagation policy with WithToolPolicy(p) at RegisterTool time.

See the Tools guide for the full reference: schema reflection rules, dispatch semantics, mixed-registration behavior, ErrToolHopsExceeded / ToolHopsError, and ToolPolicy modes.

ToolDefinition and ToolCall types

type ToolDefinition interface {
    Name() string
    Description() string
    Parameters() map[string]any  // JSON-Schema-shaped
}

type ToolCall struct {
    Type     string
    Function ToolCallFunction
}

type ToolCallFunction struct {
    Name      string
    Arguments map[string]any  // numbers come as float64 (encoding/json default)
}

Multi-turn

A single Chat instance preserves history across calls. Successive Send calls let the model see prior turns:

chat.Send(ctx, "I'm planning a trip to Tokyo.")
chat.Send(ctx, "Suggest a 3-day itinerary.")  // model knows "Tokyo"
chat.Send(ctx, "What about the third day?")   // model knows the prior itinerary

When you need a fresh context, open a new Chat.

Seeding history with WithInitialMessages accepts both text-only and multimodal turns:

img, _ := litertlm.ImageFromFile("photo.jpg")
chat, _ := client.NewChat(ctx,
    litertlm.WithInitialMessages([]litertlm.Message{
        {Role: "user", Parts: []litertlm.Part{img, litertlm.Text("What is this?")}},
        {Role: "assistant", Parts: []litertlm.Part{litertlm.Text("A wooden table with a lamp.")}},
    }),
)
chat.Send(ctx, "What's the dominant color?")  // resumes with image in KV

Introspection

Chat.TokenCount() returns the cumulative tokens the underlying Conversation has processed across all turns. The result decomposes into prompt (prefill) and completion (decode) totals.

usage, err := chat.TokenCount()

Token counts are collected only when the Client was created with WithBenchmarkEnabled. Without it, the C side reports zero turns and TokenCount returns a zero TokenUsage.

client, _ := litertlm.New(ctx,
    litertlm.WithModel(modelPath),
    litertlm.WithBenchmarkEnabled(),
)

Tool-dispatch hops within a single Send count as additional prefill and decode turns and are included in the cumulative totals.
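As a rough model of that accounting, the sketch below sums per-turn prefill and decode counts into cumulative totals. The turnStats and usage types, their field names, and the token counts in main are made up for illustration; they are not litertlm's TokenUsage fields or real measurements. A Send that triggers one tool hop contributes two turns.

```go
package main

import "fmt"

// turnStats is a hypothetical per-turn record of prefilled and decoded tokens.
type turnStats struct {
	Prefill, Decode int
}

// usage is a hypothetical cumulative total split into prompt and completion.
type usage struct {
	Prompt, Completion int
}

// cumulative sums every turn, tool-dispatch hops included.
func cumulative(turns []turnStats) usage {
	var u usage
	for _, t := range turns {
		u.Prompt += t.Prefill
		u.Completion += t.Decode
	}
	return u
}

func main() {
	turns := []turnStats{
		{Prefill: 40, Decode: 12}, // first Send: model requests a tool
		{Prefill: 25, Decode: 30}, // same Send: tool-result hop, final answer
		{Prefill: 18, Decode: 22}, // second Send
	}
	u := cumulative(turns)
	fmt.Printf("prompt=%d completion=%d\n", u.Prompt, u.Completion) // prompt=83 completion=64
}
```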

See also