Lesson 3: How to Tune Model Settings for Better Output
Why This Lesson Matters
Even the best prompt won’t get you what you want if your model settings are off. These settings — like temperature, top-p, and output length — act as dials that let you control how creative, verbose, or structured your output is.
Whether you're writing marketing copy, analyzing data, or building a chatbot, knowing how to tune these levers gives you more control, less frustration, and better results.
Key Concepts & Definitions
- Temperature: How random or deterministic the output is
- Top-k: How many of the top token options the model considers
- Top-p: Limits token selection based on cumulative probability
- Max Tokens: Caps the output length, controlling verbosity or truncation
Temperature
What it is:
Controls randomness. Low temperature = focused and repeatable. High temperature = creative and diverse.
Use when:
- 0.0–0.3 for factual, consistent responses (e.g., summaries, data tasks)
- 0.7–1.0 for creative writing, brainstorming, ideation
Example:
Prompt: “Write a story about a robot discovering music.”
- At temp 0.0: Straightforward, possibly bland
- At temp 1.0: Wild, imaginative, unexpected turns
Try This:
“Name a startup that makes smart socks.”
Run it at temp 0.2 and 0.9. Compare results.
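Under the hood, temperature divides the model's raw scores (logits) before they're turned into probabilities. A minimal sketch, using made-up logits for four hypothetical candidate tokens, shows why low temperature is near-deterministic and high temperature spreads probability around:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits (not from a real model)
logits = [2.0, 1.0, 0.5, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 1.0)   # mass spread across tokens

print(f"temp 0.2 top prob: {cold[0]:.2f}")  # top token dominates (~0.99)
print(f"temp 1.0 top prob: {hot[0]:.2f}")   # much flatter (~0.57)
```

At temperature 0.2 the top token swallows nearly all the probability, which is why low-temperature runs feel repeatable; at 1.0 the other tokens stay in play, producing the "wild, imaginative" variation above.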
Top-k
What it is:
Instead of picking from all possible words, top-k limits the pool to the k most likely options. Lower k = safer, more predictable output; higher k = more variety.
Use when:
- You want more or less randomness
- You're fine-tuning creative outputs or limiting overly safe answers
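Conceptually, top-k just sorts the candidate tokens by probability, keeps the top k, and renormalizes. A small sketch with hypothetical token probabilities:

```python
def top_k_filter(token_probs, k):
    """Keep only the k most likely tokens, then renormalize
    so the surviving probabilities sum to 1."""
    top = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

# Hypothetical next-token probabilities (not from a real model)
probs = {"the": 0.5, "a": 0.3, "music": 0.15, "zebra": 0.05}

filtered = top_k_filter(probs, 2)
print(filtered)  # only "the" and "a" survive, renormalized
```

With k=2, unlikely tokens like "zebra" are cut entirely; raising k lets them back into the pool, which is where the extra creativity (and risk) comes from.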
Top-p (Nucleus Sampling)
What it is:
Instead of picking the top k words, top-p picks the smallest set of tokens whose combined probability mass exceeds a threshold (e.g., 0.9).
Use when:
- You want natural variation with fewer surprises
- You're targeting “creative but coherent” outputs
Typical ranges:
- 0.9–0.95 = balanced
- 0.5 = more restricted, factual tone
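The difference from top-k is that the cutoff adapts to the distribution: top-p keeps however many tokens it takes to cover the probability threshold. A sketch with the same hypothetical probabilities:

```python
def top_p_filter(token_probs, p):
    """Nucleus sampling: keep the smallest set of tokens whose
    cumulative probability reaches p, then renormalize."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}

# Hypothetical next-token probabilities (not from a real model)
probs = {"the": 0.5, "a": 0.3, "music": 0.15, "zebra": 0.05}

nucleus = top_p_filter(probs, 0.9)
print(nucleus)  # keeps "the", "a", "music" (0.95 >= 0.9)
```

When the model is confident, the nucleus is small (at p=0.5 here, only "the" survives); when probability is spread out, more tokens get through. That adaptiveness is why top-p tends toward "creative but coherent."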
Max Tokens
What it is:
The upper limit on how long the model’s response can be.
Why it matters:
- Set too low? The output gets cut off mid-sentence.
- Set too high? The response may ramble, and you pay for every token generated.
- Note: it's a hard cap, not a target — the model won't "aim" for that length, it just stops when it hits the limit.
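The cap behaves like a loop bound on generation: the model emits tokens one at a time, and whichever comes first — a natural stopping point or the max-tokens limit — ends the response. A toy sketch (the "model" here is a hypothetical stand-in that wants to say five words):

```python
def generate(prompt_words, next_word_fn, max_tokens):
    """Toy generation loop: append words until the model stops
    on its own or the max_tokens cap cuts the output off."""
    output = []
    for _ in range(max_tokens):
        word = next_word_fn(prompt_words + output)
        if word is None:  # model chose to stop naturally
            break
        output.append(word)
    return output

# Hypothetical "model" that would emit five words if allowed
script = iter(["robots", "learned", "to", "love", "jazz"])
result = generate(["Once"], lambda ctx: next(script, None), max_tokens=3)
print(result)  # truncated after 3 tokens: ['robots', 'learned', 'to']
```

This is why an output that ends abruptly mid-thought usually means the cap fired before the model finished — raise max tokens rather than rewriting the prompt.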
Use When / Avoid When Summary
- Temperature: Use low (0.0–0.3) for factual, repeatable tasks; high (0.7–1.0) for creative work. Avoid high values when consistency matters.
- Top-k: Raise it to loosen overly safe answers; lower it to rein in randomness. Avoid very high k for factual tasks.
- Top-p: Use 0.9–0.95 for "creative but coherent"; around 0.5 for a restricted, factual tone. Avoid pairing aggressive top-p with high temperature unless you want surprises.
- Max tokens: Set high enough to avoid mid-sentence truncation, low enough to control rambling and cost.
Recap
You now understand how temperature, top-k, top-p, and max tokens control the “feel” of your output — whether that’s short and safe or long and creative.
These tuning skills will come in handy throughout the rest of this course, especially as we build prompt templates and debug unexpected model behavior.