LLM parameters

Temperature, top-p, repetition penalty, and the rest — what each sampling parameter actually does to your AI character.

LLM sampling parameters are the knobs that control how your AI character picks each word. They're the difference between a character that sounds bone-dry and predictable, and one that sounds creative but unhinged. This page documents what each parameter does with examples.

Different LLM services expose different subsets of these parameters. Examples here use illustrative values — the same setting on different models can produce different effects.

Temperature

Range: 0.00–2.00. Most common setting: 0.7.

The headline parameter. Controls how "freewheeling" or "predictable" the model is when picking the next word.

  • Low (near 0) — deterministic. Always picks the most likely word. Predictable, sometimes dull.
  • Moderate (0.7) — balanced. The default for most setups.
  • High (1.5+) — wild. Surprising, creative, sometimes nonsensical.

Temp ≈ 0.0  →  "The cat sat on the mat."
Temp = 0.7  →  "The cat sat on the couch."
Temp = 1.5  →  "The cat sat on the clouds of imagination."
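
Under the hood, temperature divides the model's raw scores (logits) before they're turned into probabilities. A minimal Python sketch with made-up scores, not Voxta code:

import math

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution (the top pick dominates);
    # higher temperature flattens it (unlikely words gain ground).
    scaled = [score / temperature for score in logits]
    peak = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for the word after "The cat sat on the ..."
words = {"mat": 3.0, "couch": 2.0, "clouds": 0.5}
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(list(words.values()), t)
    print(t, {w: round(p, 2) for w, p in zip(words, probs)})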

Top-p (nucleus sampling)

Range: 0.00–1.00. Common: 0.90–0.95.

Restricts the model's next-word choices to a probability-mass slice. With top_p = 0.9, the model only considers the smallest set of most-likely words whose combined probability adds up to 0.9.

  • Lower — focused, more predictable.
  • Higher — broader vocabulary, more variety.

Temperature and top-p both control variety — most setups use one or the other (not both aggressively).
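
A sketch of the cutoff logic, using illustrative probabilities rather than anything model-specific:

def top_p_filter(probs, top_p=0.9):
    # Keep the smallest set of most-likely words whose combined
    # probability reaches top_p, then renormalize the survivors.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = {}, 0.0
    for word, p in ranked:
        kept[word] = p
        mass += p
        if mass >= top_p:
            break
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}

print(top_p_filter({"mat": 0.55, "couch": 0.25, "floor": 0.12, "clouds": 0.08}))
# "clouds" falls outside the 0.9 slice and can never be picked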

Top-k

Range: 0–200. Common: 40.

Restricts the model to the top k most likely next words.

  • Lower top-k — predictable, narrow choices.
  • Higher top-k — more variety.

Compared to top-p: top-k cuts off by count, top-p cuts off by probability. Top-p is generally preferred today, but some samplers combine both.
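
The count-based cut is even simpler. A sketch, using the same toy numbers as above:

def top_k_filter(probs, k=2):
    # Keep only the k most likely words, no matter how much (or little)
    # probability mass they cover; that's the difference from top-p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {w: p / total for w, p in ranked}

print(top_k_filter({"mat": 0.55, "couch": 0.25, "floor": 0.12, "clouds": 0.08}))
# Exactly 2 words survive, regardless of how the probabilities are shaped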

Typical-p

Range: 0.00–1.00. Default 1.0 (disabled).

Typical sampling keeps tokens that are roughly as likely as the "typical" word at this point in the text — neither too obvious nor too surprising.

  • High (1.0) — disabled, normal sampling.
  • Lower (0.5) — more creative / less predictable.
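
A sketch of the idea: rank words by how far their surprisal (-log p) sits from the distribution's entropy, then keep the closest ones until typical_p of the probability mass is covered. Illustrative, not a reference implementation:

import math

def typical_filter(probs, typical_p=0.5):
    # Entropy = the "typical" surprisal at this point in the text.
    entropy = -sum(p * math.log(p) for p in probs.values())
    # Words closest to typical surprisal come first: neither too
    # obvious (low surprisal) nor too strange (high surprisal).
    ranked = sorted(probs.items(),
                    key=lambda kv: abs(-math.log(kv[1]) - entropy))
    kept, mass = {}, 0.0
    for word, p in ranked:
        kept[word] = p
        mass += p
        if mass >= typical_p:
            break
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}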

Repetition penalty

Range: 1.00–1.50. Default 1.0 (no penalty). Common: 1.1–1.2.

Discourages the model from repeating the same words or phrases. 1.0 = no penalty. Higher values = more variety, at the cost of sometimes avoiding natural repetition.

Pen = 1.0  →  "I love eating ice cream. Ice cream is my favorite. Ice cream is so good."
Pen = 1.2  →  "I love eating ice cream because it's creamy, sweet, and offers many flavors."

Repetition penalty range

Range: 0–4096 tokens. 0 means "all prior tokens."

How far back the repetition penalty looks. Smaller range = penalty only avoids very recent repeats; larger range = avoids repetition across the whole reply.
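
A sketch combining both settings: the penalty dampens scores of tokens already used, and the range limits how far back "already used" looks. The divide/multiply convention below is a common one; exact formulas vary between backends:

def apply_repetition_penalty(logits, generated_ids, penalty=1.2, penalty_range=0):
    # logits is a list of scores indexed by token id.
    # penalty_range = 0 means "look at all prior tokens".
    window = generated_ids if penalty_range == 0 else generated_ids[-penalty_range:]
    for token_id in set(window):
        if logits[token_id] > 0:
            logits[token_id] /= penalty   # shrink positive scores
        else:
            logits[token_id] *= penalty   # push negative scores further down
    return logits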

Min length

Range: 1–2000 tokens.

Minimum length of generated output. Prevents the model from cutting replies short. Set higher than 1 if you want substantive responses, not "Yes."

Max new tokens

Range: 1–4096 tokens. Common: 100–400 for chat, higher for storytelling.

Hard cap on response length. Useful for keeping replies snappy, controlling cost (cloud LLMs bill per token), or preventing runaway generation.
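
How the two caps typically interact inside a generation loop. Here model, the per-token scores, and the greedy pick are stand-ins for whatever backend is in use:

def generate(model, prompt_ids, min_length=10, max_new_tokens=200, eos_id=2):
    output = []
    while len(output) < max_new_tokens:            # hard cap: stop no matter what
        logits = model(prompt_ids + output)        # stand-in: returns a score per token id
        if len(output) < min_length:
            logits[eos_id] = float("-inf")         # min length: forbid ending too early
        token = max(range(len(logits)), key=logits.__getitem__)  # greedy pick for brevity
        if token == eos_id:                        # model chose to stop on its own
            break
        output.append(token)
    return output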

Top-a (RWKV-specific)

Range: 0.0–1.0. 0 disables.

Top-a sampling is specific to RWKV models. It adapts the cutoff based on the probability of the single most likely next token: the more confident the model, the more aggressively unlikely words are filtered out.
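
Top-a is usually described as an adaptive probability floor: the cutoff scales with the square of the best candidate's probability. A sketch under that description:

def top_a_filter(probs, a=0.2):
    # Drop words below a * (highest probability)^2. When the model is
    # confident (one big peak), the floor rises and few words survive;
    # when it's uncertain (flat distribution), the floor drops.
    floor = a * max(probs.values()) ** 2
    kept = {w: p for w, p in probs.items() if p >= floor}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}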

TFS (Tail Free Sampling)

Range: 0.0–1.0. 1.0 disables.

Tail Free Sampling cuts off the "tail" of unlikely words by analyzing the shape of the probability distribution. Less aggressive than top-p in most cases.
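
A sketch of the tail detection: TFS looks at the second differences of the sorted probabilities (roughly, where the curve stops bending) and keeps tokens until their share of that curvature reaches the threshold z. Illustrative only; real implementations differ in the details:

def tail_free_filter(probs, z=0.95):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    ps = [p for _, p in ranked]
    d1 = [ps[i + 1] - ps[i] for i in range(len(ps) - 1)]       # first differences
    d2 = [abs(d1[i + 1] - d1[i]) for i in range(len(d1) - 1)]  # "curvature"
    weight_total = sum(d2) or 1.0
    keep, mass = len(ps), 0.0
    for i, w in enumerate(d2):
        mass += w / weight_total
        if mass > z:
            keep = i + 1            # everything past here is "tail"
            break
    kept = dict(ranked[:keep])
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}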

Encoder repetition penalty

Range: 0.8–1.5. Also called the "Hallucinations filter."

Penalizes tokens that aren't in the input prompt. Higher values keep the AI more grounded in the prompt context; lower values let it freelance.

No repeat n-gram size

Range: 0–20. 0 disables.

Hard-blocks n-grams of this size from repeating. Setting to 3 means no 3-token sequence can repeat. Use sparingly — small values produce unnatural output; only 0 or large values are usually a good idea.
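
A sketch of the blocking rule: before each pick, find every token that would complete an n-gram the reply already contains, and ban it outright.

def banned_tokens(generated_ids, n=3):
    # With n = 3: take the last 2 tokens, find every earlier spot where
    # that same pair occurred, and ban whatever token followed it there.
    if n == 0 or len(generated_ids) < n - 1:
        return set()
    prefix = tuple(generated_ids[len(generated_ids) - n + 1:])
    banned = set()
    for i in range(len(generated_ids) - n + 1):
        if tuple(generated_ids[i:i + n - 1]) == prefix:
            banned.add(generated_ids[i + n - 1])
    return banned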

Length penalty

Range: -5.0 to +5.0.

Adjusts the model's preference for longer or shorter responses.

  • Negative — favors shorter sequences.
  • Positive — favors longer sequences.
  • 1.0 (default) — no adjustment.
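
Where it applies (mostly beam-search decoding), the usual mechanic is to rescale a candidate reply's total log-probability by its length raised to this exponent. Conventions for the neutral value differ between backends, so treat this as a sketch of the mechanic rather than any specific implementation:

def candidate_score(sum_logprobs, length, length_penalty=1.0):
    # sum_logprobs is negative; dividing by a larger length ** exponent
    # brings it closer to zero, so higher exponents favor longer replies
    # and negative exponents favor shorter ones.
    return sum_logprobs / (length ** length_penalty)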

Mirostat

Mirostat is a different family of sampler that targets a consistent perplexity (a measure of how "surprising" the output is). When enabled, it overrides top-k, top-p, typical-p, and TFS.

Mirostat mode

  • 0 — disabled.
  • 1 — llama.cpp mode.
  • 2 — Hugging Face mode (ExLlama, llamacpp_HF, etc.).

Mirostat tau

Range: 0–10. Default 5. Target entropy. Lower = more coherent / focused. Higher = more diverse.

Mirostat eta

Range: 0–1. Default 0.1. Learning rate. Higher = more responsive to feedback. Lower = more stable.
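
A sketch of one Mirostat v2 step: words whose surprise exceeds the running threshold mu are cut, one word is sampled from the rest, and mu is nudged by eta so the observed surprise tracks tau. mu customarily starts at 2 * tau:

import math, random

def mirostat_v2_step(probs, mu, tau=5.0, eta=0.1):
    # Truncate: drop words more "surprising" than the current threshold.
    allowed = {w: p for w, p in probs.items() if -math.log2(p) <= mu}
    if not allowed:                               # always keep the top word
        best = max(probs, key=probs.get)
        allowed = {best: probs[best]}
    total = sum(allowed.values())
    words = list(allowed)
    weights = [allowed[w] / total for w in words]
    word = random.choices(words, weights=weights)[0]
    # Feedback: if the pick was more surprising than tau, lower mu
    # (be stricter next time); if it was blander, raise mu.
    mu -= eta * (-math.log2(probs[word]) - tau)
    return word, mu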

Penalty alpha

Range: 0.00–5.00. 0 disables.

Used with contrastive search: pair with a low top-k (e.g. 4) and do_sample = false. Balances the model's confidence against a penalty for matching previous context, producing diverse-yet-coherent output.
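
For example, with Hugging Face transformers this combination triggers contrastive search (the model choice and values here are illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")     # any causal LM works
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The cat sat on", return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    penalty_alpha=0.6,    # strength of the degeneration penalty
    top_k=4,              # small candidate pool, as contrastive search expects
    do_sample=False,      # contrastive search decodes deterministically
    max_new_tokens=40,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))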

Do sample

  • On — stochastic sampling. Random picks from the probability distribution. Variable output.
  • Off — deterministic (greedy decoding). Same input → same output. Used for contrastive search and debugging.

Seed

-1 = random per generation. Any other value = deterministic — the same prompt with the same seed produces the same output. Useful for debugging.
