Reasoning models
Beta

Explore advanced reasoning and problem-solving models.

OpenAI o1 series models are new large language models trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, and can produce a long internal chain of thought before responding to the user. o1 models excel in scientific reasoning, ranking in the 89th percentile on competitive programming questions (Codeforces), placing among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeding human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).

There are two reasoning models available in the API:

  1. o1-preview: an early preview of our o1 model, designed to reason about hard problems using broad general knowledge about the world.
  2. o1-mini: a faster and cheaper version of o1, particularly adept at coding, math, and science tasks where extensive general knowledge isn't required.

o1 models offer significant advancements in reasoning, but they are not intended to replace GPT-4o in all use cases.

For applications that need image inputs, function calling, or consistently fast response times, the GPT-4o and GPT-4o mini models will continue to be the right choice. However, if you're aiming to develop applications that demand deep reasoning and can accommodate longer response times, the o1 models could be an excellent choice. We're excited to see what you'll create with them!

Quickstart

Both o1-preview and o1-mini are available through the chat completions endpoint.

Using the o1-preview model
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user", 
            "content": "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."
        }
    ]
)

print(response.choices[0].message.content)

Depending on the amount of reasoning required by the model to solve the problem, these requests can take anywhere from a few seconds to several minutes.
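Because a single request can run for minutes, you may want to raise the client's request timeout so long-running completions aren't cut off. Below is a minimal sketch using the timeout option of the official Python SDK; the 10-minute value is an arbitrary choice, and with_options requires a recent SDK version.

Setting a longer request timeout
from openai import OpenAI

# Allow up to 10 minutes per request instead of the default timeout.
# The exact value is an assumption -- tune it to your workload.
client = OpenAI(timeout=600.0)

# The timeout can also be overridden for a single request:
response = client.with_options(timeout=600.0).chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)

print(response.choices[0].message.content)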

Beta Limitations

During the beta phase, many chat completion API parameters are not yet available. Most notably:

  • Modalities: text only, images are not supported.
  • Message types: user and assistant messages only; system messages are not supported (a workaround is sketched at the end of this section).
  • Tools: tools, function calling, and response format parameters are not supported.
  • Logprobs: not supported.
  • Other: temperature and top_p are fixed at 1, while presence_penalty and frequency_penalty are fixed at 0.
  • Assistants and Batch: these models are not supported in the Assistants API or Batch API.

We will be adding support for some of these parameters in the coming weeks as we move out of beta. Features like multimodality and tool usage will be included in future models of the o1 series.
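In the meantime, because system messages are rejected, one workaround is to fold system-style instructions into the first user message. A minimal sketch (the instruction wording is illustrative):

Folding instructions into the user message
from openai import OpenAI

client = OpenAI()

# o1 models don't accept the "system" role during the beta, so prepend
# any system-style instructions to the user's message instead.
instructions = "Answer concisely, in at most three sentences."
question = "What is the key idea behind dynamic programming?"

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": f"{instructions}\n\n{question}"}
    ],
)

print(response.choices[0].message.content)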

How reasoning works

The o1 models introduce reasoning tokens. The models use these reasoning tokens to "think", breaking down their understanding of the prompt and considering multiple approaches to generating a response. After generating reasoning tokens, the model produces an answer as visible completion tokens, and discards the reasoning tokens from its context.

Here is an example of a multi-step conversation between a user and an assistant. Input and output tokens from each step are carried over, while reasoning tokens are discarded.

[Figure: reasoning tokens aren't retained in context across conversation turns]

While reasoning tokens are not visible via the API, they still occupy space in the model's context window and are billed as output tokens.

Managing the context window

The o1-preview and o1-mini models offer a context window of 128,000 tokens. Each completion has an upper limit on the maximum number of output tokens—this includes both the invisible reasoning tokens and the visible completion tokens. The maximum output token limits are:

  • o1-preview: Up to 32,768 tokens
  • o1-mini: Up to 65,536 tokens
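
For example, with o1-preview, a 100,000-token prompt leaves only 128,000 - 100,000 = 28,000 tokens for reasoning and visible completion combined, even though the model's per-completion output cap is 32,768 tokens.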

It's important to ensure there's enough space in the context window for reasoning tokens when creating completions. Depending on the problem's complexity, the models may generate anywhere from a few hundred to tens of thousands of reasoning tokens. The exact number of reasoning tokens used is visible in the usage object of the chat completion response, under completion_tokens_details:

Chat completions usage
usage: {
  total_tokens: 1000,
  prompt_tokens: 400,
  completion_tokens: 600,
  completion_tokens_details: {
    reasoning_tokens: 500
  }
}
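
In the Python SDK, the same fields are available on the response object. A minimal sketch, assuming a recent SDK version that exposes completion_tokens_details and a response from chat.completions.create as in the quickstart above:

Inspecting reasoning token usage
usage = response.usage

print(usage.prompt_tokens)      # tokens in the input
print(usage.completion_tokens)  # reasoning + visible output tokens

# Reasoning tokens are broken out under completion_tokens_details.
reasoning = usage.completion_tokens_details.reasoning_tokens
visible = usage.completion_tokens - reasoning
print(reasoning, visible)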

Controlling costs

To manage costs with the o1 series models, you can limit the total number of tokens the model generates (including both reasoning and completion tokens) by using the max_completion_tokens parameter.

In previous models, the max_tokens parameter controlled both the number of tokens generated and the number of tokens visible to the user, which were always equal. However, with the o1 series, the total tokens generated can exceed the number of visible tokens due to the internal reasoning tokens.

Because some applications might rely on max_tokens matching the number of tokens received from the API, the o1 series introduces max_completion_tokens to explicitly control the total number of tokens generated by the model, including both reasoning and visible completion tokens. This explicit opt-in ensures no existing applications break when using the new models. The max_tokens parameter continues to function as before for all previous models.
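A minimal sketch of capping generation with max_completion_tokens (the 4,000-token limit here is an arbitrary example, not a recommended value):

Capping total generated tokens
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-mini",
    # Caps reasoning and visible completion tokens combined.
    max_completion_tokens=4000,
    messages=[
        {"role": "user", "content": "How many prime numbers are less than 100?"}
    ],
)

print(response.choices[0].message.content)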

Allocating space for reasoning

If the generated tokens reach the context window limit or the max_completion_tokens value you've set, you'll receive a chat completion response with the finish_reason set to length. This might occur before any visible completion tokens are produced, meaning you could incur costs for input and reasoning tokens without receiving a visible response.

To prevent this, ensure there's sufficient space in the context window or adjust the max_completion_tokens value to a higher number. OpenAI recommends reserving at least 25,000 tokens for reasoning and outputs when you start experimenting with these models. As you become familiar with the number of reasoning tokens your prompts require, you can adjust this buffer accordingly.
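A minimal sketch of detecting a truncated response and retrying once with a larger budget (the retry policy and token values are illustrative):

Handling a length finish reason
from openai import OpenAI

client = OpenAI()

def ask(prompt, budget):
    # budget caps reasoning + visible completion tokens combined.
    return client.chat.completions.create(
        model="o1-mini",
        max_completion_tokens=budget,
        messages=[{"role": "user", "content": prompt}],
    )

prompt = "Design a week-long onboarding plan for a new backend engineer."
response = ask(prompt, budget=25000)
choice = response.choices[0]

if choice.finish_reason == "length":
    # The model ran out of budget, possibly before emitting any visible
    # tokens. Retry once with a larger allowance (still within o1-mini's
    # 65,536-token output cap).
    response = ask(prompt, budget=50000)
    choice = response.choices[0]

print(choice.message.content)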

Advice on prompting

These models perform best with straightforward prompts. Some prompt engineering techniques, like few-shot prompting or instructing the model to "think step by step," may not enhance performance and can sometimes hinder it. Here are some best practices:

  • Keep prompts simple and direct: The models excel at understanding and responding to brief, clear instructions without the need for extensive guidance.
  • Avoid chain-of-thought prompts: Since these models perform reasoning internally, prompting them to "think step by step" or "explain your reasoning" is unnecessary.
  • Use delimiters for clarity: Use delimiters like triple quotation marks, XML tags, or section titles to clearly indicate distinct parts of the input, helping the model interpret different sections appropriately (a sketch follows this list).
  • Limit additional context in retrieval-augmented generation (RAG): When providing additional context or documents, include only the most relevant information to prevent the model from overcomplicating its response.
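
To illustrate the delimiter advice, here is a minimal sketch that separates the instructions from the input document with XML-style tags (the tag name and document text are made up for this example):

Using delimiters to separate instructions from input
from openai import OpenAI

client = OpenAI()

document = (
    "Q3 revenue grew 12% year over year, driven by strong demand in the "
    "enterprise segment. Operating costs rose 4% over the same period."
)

# XML-style tags mark exactly where the instructions end and the input begins.
prompt = (
    "Summarize the text between the <document> tags in one sentence.\n"
    "<document>\n"
    f"{document}\n"
    "</document>"
)

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {"role": "user", "content": prompt}
    ],
)

print(response.choices[0].message.content)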

Prompt examples

OpenAI o1 series models are able to implement complex algorithms and produce code. This prompt asks o1 to refactor a React component based on some specific criteria.

Ask o1 models to refactor code
from openai import OpenAI

client = OpenAI()

prompt = """
Instructions:
- Given the React component below, change it so that nonfiction books have red
  text. 
- Return only the code in your reply
- Do not include any additional formatting, such as markdown code blocks
- For formatting, use four space tabs, and do not allow any lines of code to 
  exceed 80 columns

const books = [
  { title: 'Dune', category: 'fiction', id: 1 },
  { title: 'Frankenstein', category: 'fiction', id: 2 },
  { title: 'Moneyball', category: 'nonfiction', id: 3 },
];

export default function BookList() {
  const listItems = books.map(book =>
    <li>
      {book.title}
    </li>
  );

  return (
    <ul>{listItems}</ul>
  );
}
"""

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
            ],
        }
    ]
)

print(response.choices[0].message.content)

Use case examples

Some examples of using o1 for real-world use cases can be found in the OpenAI Cookbook.