Lesson 3: How to Tune Model Settings for Better Output
Why This Lesson Matters
Even the best prompt won’t get you what you want if your model settings are off. These settings — like temperature, top-p, and output length — act as dials that let you control how creative, verbose, or structured your output is.
Whether you're writing marketing copy, analyzing data, or building a chatbot, knowing how to tune these levers gives you more control, less frustration, and better results.
Key Concepts & Definitions
- Temperature: How random or deterministic the output is
- Top-k: How many of the top token options the model considers
- Top-p: Limits token selection based on cumulative probability
- Max Tokens: Caps the output length, controlling verbosity or truncation
Temperature
What it is:
Controls randomness. Low temperature = focused and repeatable. High temperature = creative and diverse.
Use when:
- 0.0–0.3 for factual, consistent responses (e.g., summaries, data tasks)
- 0.7–1.0 for creative writing, brainstorming, ideation
Example:
Prompt: “Write a story about a robot discovering music.”
- At temp 0.0: Straightforward, possibly bland
- At temp 1.0: Wild, imaginative, unexpected turns
Try This:
“Name a startup that makes smart socks.”
Run it at temp 0.2 and 0.9. Compare results.
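Under the hood, temperature divides the model's raw scores (logits) before they're turned into probabilities. A minimal sketch, using made-up logits for four hypothetical candidate tokens, shows why low temperature is near-deterministic and high temperature spreads probability around:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits (not from a real model)
logits = [2.0, 1.0, 0.5, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 1.0)   # mass spread across tokens

print(f"temp 0.2 top prob: {cold[0]:.2f}")  # top token dominates (~0.99)
print(f"temp 1.0 top prob: {hot[0]:.2f}")   # much flatter (~0.57)
```

At temperature 0.2 the top token swallows nearly all the probability, which is why low-temperature runs feel repeatable; at 1.0 the other tokens stay in play, producing the "wild, imaginative" variation above.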
Top-k
What it is:
Instead of picking from all possible words, top-k limits the pool to the k most likely options. Lower k = safer, more predictable output; higher k = more variety.
Use when:
- You want more or less randomness
- You're fine-tuning creative outputs or limiting overly safe answers
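Conceptually, top-k just sorts the candidate tokens by probability, keeps the top k, and renormalizes. A small sketch with hypothetical token probabilities:

```python
def top_k_filter(token_probs, k):
    """Keep only the k most likely tokens, then renormalize
    so the surviving probabilities sum to 1."""
    top = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

# Hypothetical next-token probabilities (not from a real model)
probs = {"the": 0.5, "a": 0.3, "music": 0.15, "zebra": 0.05}

filtered = top_k_filter(probs, 2)
print(filtered)  # only "the" and "a" survive, renormalized
```

With k=2, unlikely tokens like "zebra" are cut entirely; raising k lets them back into the pool, which is where the extra creativity (and risk) comes from.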
Top-p (Nucleus Sampling)
What it is:
Instead of picking the top k words, top-p picks the smallest set of tokens whose combined probability mass exceeds a threshold (e.g., 0.9).
Use when:
- You want natural variation with fewer surprises
- You're targeting “creative but coherent” outputs
Typical ranges:
- 0.9–0.95 = balanced
- 0.5 = more restricted, factual tone
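The difference from top-k is that the cutoff adapts to the distribution: top-p keeps however many tokens it takes to cover the probability threshold. A sketch with the same hypothetical probabilities:

```python
def top_p_filter(token_probs, p):
    """Nucleus sampling: keep the smallest set of tokens whose
    cumulative probability reaches p, then renormalize."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}

# Hypothetical next-token probabilities (not from a real model)
probs = {"the": 0.5, "a": 0.3, "music": 0.15, "zebra": 0.05}

nucleus = top_p_filter(probs, 0.9)
print(nucleus)  # keeps "the", "a", "music" (0.95 >= 0.9)
```

When the model is confident, the nucleus is small (at p=0.5 here, only "the" survives); when probability is spread out, more tokens get through. That adaptiveness is why top-p tends toward "creative but coherent."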
Max Tokens
What it is:
The upper limit on how long the model’s response can be.
Why it matters:
- Set too low? The output gets cut off mid-sentence.
- Set too high? The response may ramble, and you pay for every token generated.
- Note: it's a hard cap, not a target — the model won't "aim" for that length, it just stops when it hits the limit.
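The cap behaves like a loop bound on generation: the model emits tokens one at a time, and whichever comes first — a natural stopping point or the max-tokens limit — ends the response. A toy sketch (the "model" here is a hypothetical stand-in that wants to say five words):

```python
def generate(prompt_words, next_word_fn, max_tokens):
    """Toy generation loop: append words until the model stops
    on its own or the max_tokens cap cuts the output off."""
    output = []
    for _ in range(max_tokens):
        word = next_word_fn(prompt_words + output)
        if word is None:  # model chose to stop naturally
            break
        output.append(word)
    return output

# Hypothetical "model" that would emit five words if allowed
script = iter(["robots", "learned", "to", "love", "jazz"])
result = generate(["Once"], lambda ctx: next(script, None), max_tokens=3)
print(result)  # truncated after 3 tokens: ['robots', 'learned', 'to']
```

This is why an output that ends abruptly mid-thought usually means the cap fired before the model finished — raise max tokens rather than rewriting the prompt.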
Use When / Avoid When Summary
- Temperature: Use low (0.0–0.3) for factual, repeatable tasks; high (0.7–1.0) for creative work. Avoid high values when consistency matters.
- Top-k: Raise it to loosen overly safe answers; lower it to rein in randomness. Avoid very high k for factual tasks.
- Top-p: Use 0.9–0.95 for "creative but coherent"; around 0.5 for a restricted, factual tone. Avoid pairing aggressive top-p with high temperature unless you want surprises.
- Max tokens: Set high enough to avoid mid-sentence truncation, low enough to control rambling and cost.
Recap
You now understand how temperature, top-k, top-p, and max tokens control the “feel” of your output — whether that’s short and safe or long and creative.
These tuning skills will come in handy throughout the rest of this course, especially as we build prompt templates and debug unexpected model behavior.