Reasoning models (Beta)
OpenAI o1 series models are new large language models trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, and can produce a long internal chain of thought before responding to the user. o1 models excel in scientific reasoning, ranking in the 89th percentile on competitive programming questions (Codeforces), placing among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeding human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).
There are two reasoning models available in the API:
- `o1-preview`: an early preview of our o1 model, designed to reason about hard problems using broad general knowledge about the world.
- `o1-mini`: a faster and cheaper version of o1, particularly adept at coding, math, and science tasks where extensive general knowledge isn't required.
o1 models offer significant advancements in reasoning, but they are not intended to replace GPT-4o in all use cases.
For applications that need image inputs, function calling, or consistently fast response times, the GPT-4o and GPT-4o mini models will continue to be the right choice. However, if you're aiming to develop applications that demand deep reasoning and can accommodate longer response times, the o1 models could be an excellent choice. We're excited to see what you'll create with them!
Quickstart
Both `o1-preview` and `o1-mini` are available through the chat completions endpoint.
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."
        }
    ]
)

print(response.choices[0].message.content)
```
Depending on the amount of reasoning required by the model to solve the problem, these requests can take anywhere from a few seconds to several minutes.
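If your client enforces a short request timeout, you may need to raise it for reasoning-heavy prompts. As a sketch, the Python SDK supports a per-request timeout override via `with_options`; the 600-second value below is an arbitrary illustration, not an official recommendation:

```python
from openai import OpenAI

client = OpenAI()

# Allow up to 10 minutes for a single reasoning-heavy request.
# The timeout value is illustrative; tune it for your workload.
response = client.with_options(timeout=600.0).chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)

print(response.choices[0].message.content)
```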
Beta Limitations
During the beta phase, many chat completion API parameters are not yet available. Most notably:
- Modalities: text only; images are not supported.
- Message types: user and assistant messages only; system messages are not supported.
- Tools: tools, function calling, and response format parameters are not supported.
- Logprobs: not supported.
- Other: `temperature` and `top_p` are fixed at `1`, while `presence_penalty` and `frequency_penalty` are fixed at `0`.
- Assistants and Batch: these models are not supported in the Assistants API or Batch API.
We will be adding support for some of these parameters in the coming weeks as we move out of beta. Features like multimodality and tool usage will be included in future models of the o1 series.
How reasoning works
The o1 models introduce reasoning tokens. The models use these reasoning tokens to "think", breaking down their understanding of the prompt and considering multiple approaches to generating a response. After generating reasoning tokens, the model produces an answer as visible completion tokens, and discards the reasoning tokens from its context.
Consider a multi-step conversation between a user and an assistant: input and output tokens from each step are carried over to the next step, while reasoning tokens are discarded.
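A minimal sketch of such a multi-step exchange: you only ever pass the visible messages back to the API, since the reasoning tokens from earlier turns are never exposed and do not need to be resent.

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "What is the capital of France?"}]

first = client.chat.completions.create(model="o1-preview", messages=messages)

# Carry only the visible output forward; the reasoning tokens from this
# turn were already discarded by the API and are not part of the message.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "And what is its population?"})

second = client.chat.completions.create(model="o1-preview", messages=messages)
print(second.choices[0].message.content)
```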
Managing the context window
The o1-preview and o1-mini models offer a context window of 128,000 tokens. Each completion has an upper limit on the maximum number of output tokens—this includes both the invisible reasoning tokens and the visible completion tokens. The maximum output token limits are:
- o1-preview: Up to 32,768 tokens
- o1-mini: Up to 65,536 tokens
It's important to ensure there's enough space in the context window for reasoning tokens when creating completions. Depending on the problem's complexity, the models may generate anywhere from a few hundred to tens of thousands of reasoning tokens. The exact number of reasoning tokens used is visible in the usage object of the chat completion response, under `completion_tokens_details`:
```
usage: {
  total_tokens: 1000,
  prompt_tokens: 400,
  completion_tokens: 600,
  completion_tokens_details: {
    reasoning_tokens: 500
  }
}
```
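In the Python SDK (recent versions), the same fields are exposed as attributes on the response object, so you can log reasoning-token usage per request. A minimal sketch, assuming `response` is a chat completion from an o1 model:

```python
# Inspect token usage on an existing o1 chat completion response.
usage = response.usage
details = usage.completion_tokens_details

print(f"prompt tokens:     {usage.prompt_tokens}")
print(f"completion tokens: {usage.completion_tokens}")
print(f"reasoning tokens:  {details.reasoning_tokens}")

# Visible output tokens are whatever remains after reasoning tokens.
print(f"visible tokens:    {usage.completion_tokens - details.reasoning_tokens}")
```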
Controlling costs
To manage costs with the o1 series models, you can limit the total number of tokens the model generates (including both reasoning and completion tokens) by using the `max_completion_tokens` parameter.
In previous models, the `max_tokens` parameter controlled both the number of tokens generated and the number of tokens visible to the user, which were always equal. However, with the o1 series, the total tokens generated can exceed the number of visible tokens due to the internal reasoning tokens.

Because some applications might rely on `max_tokens` matching the number of tokens received from the API, the o1 series introduces `max_completion_tokens` to explicitly control the total number of tokens generated by the model, including both reasoning and visible completion tokens. This explicit opt-in ensures no existing applications break when using the new models. The `max_tokens` parameter continues to function as before for all previous models.
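For example, to cap a single request's total generation (reasoning plus visible output) at a fixed budget; the 4,000-token cap below is an arbitrary illustration, not a recommended value:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "Summarize the proof of Fermat's little theorem."}],
    # Caps reasoning + visible completion tokens for this request.
    max_completion_tokens=4000,
)

print(response.choices[0].message.content)
```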
Allocating space for reasoning
If the generated tokens reach the context window limit or the `max_completion_tokens` value you've set, you'll receive a chat completion response with the `finish_reason` set to `length`. This might occur before any visible completion tokens are produced, meaning you could incur costs for input and reasoning tokens without receiving a visible response.
To prevent this, ensure there's sufficient space in the context window, or raise the `max_completion_tokens` value. OpenAI recommends reserving at least 25,000 tokens for reasoning and outputs when you start experimenting with these models. As you become familiar with the number of reasoning tokens your prompts require, you can adjust this buffer accordingly.
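A minimal sketch of guarding against truncated responses, using the recommended 25,000-token reservation as a starting point:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "Outline a proof of the four color theorem."}],
    max_completion_tokens=25000,  # the recommended starting reservation
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # The budget ran out, possibly entirely on reasoning tokens, so
    # message.content may be empty even though the request was billed.
    print("Truncated: retry with a larger max_completion_tokens or a shorter prompt.")
else:
    print(choice.message.content)
```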
Advice on prompting
These models perform best with straightforward prompts. Some prompt engineering techniques, like few-shot prompting or instructing the model to "think step by step," may not enhance performance and can sometimes hinder it. Here are some best practices:
- Keep prompts simple and direct: The models excel at understanding and responding to brief, clear instructions without the need for extensive guidance.
- Avoid chain-of-thought prompts: Since these models perform reasoning internally, prompting them to "think step by step" or "explain your reasoning" is unnecessary.
- Use delimiters for clarity: Use delimiters like triple quotation marks, XML tags, or section titles to clearly indicate distinct parts of the input, helping the model interpret different sections appropriately (a short example follows this list).
- Limit additional context in retrieval-augmented generation (RAG): When providing additional context or documents, include only the most relevant information to prevent the model from overcomplicating its response.
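As an illustration of the delimiter advice above, a prompt might separate instructions from data with XML-style tags. The `<instructions>` and `<document>` tag names here are arbitrary delimiters chosen for this sketch, not special tokens the model requires:

```python
from openai import OpenAI

client = OpenAI()

# Arbitrary XML-style tags mark where instructions end and data begins.
prompt = """
<instructions>
Summarize the document below in exactly three bullet points.
</instructions>

<document>
Quarterly revenue grew 12% year over year, driven primarily by...
</document>
"""

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```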
Prompt examples
OpenAI o1 series models are able to implement complex algorithms and produce code. This prompt asks o1 to refactor a React component based on some specific criteria.
```python
from openai import OpenAI

client = OpenAI()

prompt = """
Instructions:
- Given the React component below, change it so that nonfiction books have red
  text.
- Return only the code in your reply
- Do not include any additional formatting, such as markdown code blocks
- For formatting, use four space tabs, and do not allow any lines of code to
  exceed 80 columns

const books = [
    { title: 'Dune', category: 'fiction', id: 1 },
    { title: 'Frankenstein', category: 'fiction', id: 2 },
    { title: 'Moneyball', category: 'nonfiction', id: 3 },
];

export default function BookList() {
    const listItems = books.map(book =>
        <li>
            {book.title}
        </li>
    );

    return (
        <ul>{listItems}</ul>
    );
}
"""

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
            ],
        }
    ]
)

print(response.choices[0].message.content)
```
Use case examples
Some examples of using o1 for real-world use cases can be found in the cookbook.