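Batch API

Process jobs asynchronously with the Batch API.

Learn how to use OpenAI's Batch API to send asynchronous groups of requests with 50% lower costs, a separate pool of significantly higher rate limits, and a clear 24-hour turnaround time. The service is ideal for processing jobs that don't require immediate responses. You can also explore the API reference directly.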

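Overview

While some uses of the OpenAI Platform require you to send synchronous requests, there are many cases where requests do not need an immediate response or where rate limits prevent you from executing a large number of queries quickly. Batch processing jobs are often helpful in use cases like:

Running evaluations
Classifying large datasets
Embedding content repositories

The Batch API offers a straightforward set of endpoints that allow you to collect a set of requests into a single file, kick off a batch processing job to execute these requests, query the status of that batch while the underlying requests execute, and eventually retrieve the collected results when the batch is complete.

Compared to using the standard endpoints directly, the Batch API has:

Better cost efficiency: 50% cost discount compared to the synchronous APIs
Higher rate limits: substantially more headroom compared to the synchronous APIs
Fast completion times: each batch completes within 24 hours (and often more quickly)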

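Getting started

1. Preparing your batch file

Batches start with a .jsonl file in which each line contains the details of an individual request to the API. For now, the available endpoints are /v1/chat/completions (Chat Completions API) and /v1/embeddings (Embeddings API). For a given input file, the parameters in each line's body field are the same as the parameters for the underlying endpoint. Each request must include a unique custom_id value, which you can use to reference results after completion. Here is an example of an input file with 2 requests. Note that each input file can only include requests to a single model.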

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
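An input file like the example above can also be generated programmatically. The following is a minimal sketch using only the Python standard library. The helper names `build_batch_lines` and `write_batch_file`, the `request-N` numbering scheme, and the default system prompt are assumptions made for illustration; they are not part of the API.

```python
import json


def build_batch_lines(prompts, model="gpt-3.5-turbo-0125",
                      system_prompt="You are a helpful assistant.",
                      max_tokens=1000):
    """Return one JSON string per request, in the input format shown above."""
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        request = {
            # Hypothetical naming scheme; any value works if it is unique per file.
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,  # every line in one file must use the same model
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt},
                ],
                "max_tokens": max_tokens,
            },
        }
        lines.append(json.dumps(request))
    return lines


def write_batch_file(path, prompts, **kwargs):
    """Write the requests to a .jsonl file, one request per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in build_batch_lines(prompts, **kwargs):
            f.write(line + "\n")
```

For example, `write_batch_file("batchinput.jsonl", ["Hello world!"])` would produce a one-line file in the format shown above.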

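2. Uploading your batch input file

As with the Fine-tuning API, you must first upload your input file so that you can reference it correctly when kicking off the batch. Upload your .jsonl file using the Files API.

Upload files for Batch API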

from openai import OpenAI
client = OpenAI()

batch_input_file = client.files.create(
  file=open("batchinput.jsonl", "rb"),
  purpose="batch"
)

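3. Creating the batch

Once you have successfully uploaded your input file, you can use the input File object's ID to create a batch. In this case, let's assume the file ID is file-abc123. For now, the completion window can only be set to 24h. You can also provide custom metadata via the optional metadata parameter.

Create the batch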

batch_input_file_id = batch_input_file.id

client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
      "description": "nightly eval job"
    }
)

This request returns a Batch object with metadata about your batch:

{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file-abc123",
  "completion_window": "24h",
  "status": "validating",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1714508499,
  "in_progress_at": null,
  "expires_at": 1714536634,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "request_counts": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "metadata": null
}

4. Checking the status of a batch

You can check the status of a batch at any time, which also returns a Batch object.

Check the status of a batch

from openai import OpenAI
client = OpenAI()

client.batches.retrieve("batch_abc123")

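The status of a given Batch object can be any of the following:

Status	Description
`validating`	The input file is being validated before the batch can begin
`failed`	The input file has failed the validation process
`in_progress`	The input file was successfully validated and the batch is currently running
`finalizing`	The batch has completed and the results are being prepared
`completed`	The batch has completed and the results are ready
`expired`	The batch could not be completed within the 24-hour time window
`cancelling`	The batch is being cancelled (this may take up to 10 minutes)
`cancelled`	The batch was cancelled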
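Because a batch moves through these statuses on its own, a common pattern is to poll until it reaches a status that will not change again. The following is a minimal sketch of such a loop. The helper name `wait_for_batch`, the 60-second default interval, and the choice of which statuses count as terminal are assumptions made for illustration; `client` is expected to be an `OpenAI()` client (or any object exposing `batches.retrieve`).

```python
import time

# Statuses from the table above that are treated here as final (an assumption
# of this sketch): once reached, the batch is not expected to change again.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id, poll_seconds=60, max_polls=None,
                   sleep=time.sleep):
    """Poll a batch until it reaches a terminal status, then return it.

    Raises TimeoutError if max_polls is set and the batch is still
    running after that many status checks.
    """
    polls = 0
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(
                f"batch {batch_id} still {batch.status} after {polls} polls"
            )
        sleep(poll_seconds)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default is sufficient, for example `wait_for_batch(client, "batch_abc123")`.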

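5. Retrieving the results

Once the batch is complete, you can download the output by making a request against the Files API using the output_file_id field from the Batch object, and then write it to a file on your machine (in this case, batch_output.jsonl).

Retrieve the batch results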

from openai import OpenAI
client = OpenAI()

file_response = client.files.content("file-xyz123")
print(file_response.text)

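The output .jsonl file has one response line for every successful request line in the input file. Any failed requests in the batch have their error information written to an error file, which can be found via the batch's error_file_id.

Note that the output line order may not match the input line order. Instead of relying on order to process your results, use the custom_id field, which is present in each line of your output file and lets you map requests in your input to results in your output.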

{"id": "batch_req_123", "custom_id": "request-2", "response": {"status_code": 200, "request_id": "req_123", "body": {"id": "chatcmpl-123", "object": "chat.completion", "created": 1711652795, "model": "gpt-3.5-turbo-0125", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello."}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 22, "completion_tokens": 2, "total_tokens": 24}, "system_fingerprint": "fp_123"}}, "error": null}
{"id": "batch_req_456", "custom_id": "request-1", "response": {"status_code": 200, "request_id": "req_789", "body": {"id": "chatcmpl-abc", "object": "chat.completion", "created": 1711652789, "model": "gpt-3.5-turbo-0125", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello! How can I assist you today?"}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}, "system_fingerprint": "fp_3ba"}}, "error": null}
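Since output order is not guaranteed, results are usually indexed by custom_id before use. The following is a minimal sketch that parses output text in the format shown above. The helper names `index_results_by_custom_id` and `chat_reply` are hypothetical, and `chat_reply` assumes a successful /v1/chat/completions response line.

```python
import json


def index_results_by_custom_id(output_jsonl):
    """Map each custom_id to its parsed output record."""
    results = {}
    for line in output_jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        record = json.loads(line)
        results[record["custom_id"]] = record
    return results


def chat_reply(record):
    """Return the assistant message text from a successful chat completion record."""
    return record["response"]["body"]["choices"][0]["message"]["content"]
```

With the retrieval example above, this could be used as `results = index_results_by_custom_id(file_response.text)`, followed by `chat_reply(results["request-1"])`.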

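6. Cancelling a batch

If necessary, you can cancel an ongoing batch. The batch's status changes to cancelling until in-flight requests are complete (up to 10 minutes), after which the status changes to cancelled.

Cancel a batch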

from openai import OpenAI
client = OpenAI()

client.batches.cancel("batch_abc123")

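7. Getting a list of all batches

You can see all of your batches at any time. For users with many batches, you can use the limit and after parameters to paginate your results.

Get a list of all batches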

from openai import OpenAI
client = OpenAI()

client.batches.list(limit=10)
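To walk through every page, the limit and after parameters can be combined in a loop. The following is a minimal sketch under two assumptions that are not stated in this guide: that the returned page exposes its batches as a `data` list, and that after takes the ID of the last batch already seen. The helper name `iter_all_batches` is hypothetical, and a page shorter than the requested size is treated as the last page.

```python
def iter_all_batches(client, page_size=10):
    """Yield every batch, requesting one page at a time."""
    after = None
    while True:
        kwargs = {"limit": page_size}
        if after is not None:
            kwargs["after"] = after  # cursor: ID of the last batch already seen
        page = client.batches.list(**kwargs)
        items = list(page.data)  # assumes the page exposes a `data` list
        yield from items
        if len(items) < page_size:
            return  # a short (or empty) page is treated as the final page
        after = items[-1].id
```

For example, `[b.id for b in iter_all_batches(client)]` would collect the IDs of all batches.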

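Model availability

The Batch API can currently be used to execute queries against the following models. The Batch API supports text and vision inputs in the same format as the endpoints for these models: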

gpt-4o
gpt-4o-2024-08-06
gpt-4o-mini
gpt-4-turbo
gpt-4
gpt-4-32k
gpt-3.5-turbo
gpt-3.5-turbo-16k
gpt-4-turbo-preview
gpt-4-vision-preview
gpt-4-turbo-2024-04-09
gpt-4-0314
gpt-4-32k-0314
gpt-4-32k-0613
gpt-3.5-turbo-0301
gpt-3.5-turbo-16k-0613
gpt-3.5-turbo-1106
gpt-3.5-turbo-0613
text-embedding-3-large
text-embedding-3-small
text-embedding-ada-002

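The Batch API also supports fine-tuned models.

Rate limits

Batch API rate limits are separate from existing per-model rate limits. The Batch API has two new types of rate limits:

Per-batch limits: a single batch may include up to 50,000 requests, and a batch input file can be up to 200 MB in size. Note that /v1/embeddings batches are also restricted to a maximum of 50,000 embedding inputs across all requests in the batch.
Enqueued prompt tokens per model: each model has a maximum number of enqueued prompt tokens allowed for batch processing. You can find these limits on the Platform Settings page.

There are currently no limits on output tokens or on the number of submitted requests for the Batch API. Because Batch API rate limits are a new, separate pool, using the Batch API does not consume tokens from your standard per-model rate limits. This gives you a convenient way to increase the number of requests and processed tokens you can use when querying the API.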
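The per-batch limits above, together with the input-file rules from step 1 (unique custom_id values and a single model per file), can be checked locally before uploading. The following is a minimal sketch; the helper name `check_batch_input` is hypothetical, and interpreting 200 MB as 200 * 1024 * 1024 bytes is an assumption, since this guide does not define the unit precisely.

```python
import json

MAX_REQUESTS_PER_BATCH = 50_000
# Assumption: 1 MB is taken to mean 1024 * 1024 bytes.
MAX_INPUT_FILE_BYTES = 200 * 1024 * 1024


def check_batch_input(jsonl_text):
    """Return a list of human-readable problems found in a batch input file."""
    problems = []

    size = len(jsonl_text.encode("utf-8"))
    if size > MAX_INPUT_FILE_BYTES:
        problems.append(f"file is {size} bytes, over the 200 MB limit")

    requests = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    if len(requests) > MAX_REQUESTS_PER_BATCH:
        problems.append(f"{len(requests)} requests, over the 50,000 limit")

    ids = [r.get("custom_id") for r in requests]
    if len(set(ids)) != len(ids):
        problems.append("custom_id values are not unique")

    models = {r.get("body", {}).get("model") for r in requests}
    if len(models) > 1:
        names = sorted(str(m) for m in models)
        problems.append(f"requests target more than one model: {names}")

    return problems
```

An empty list means none of these checks failed. This sketch does not count embedding inputs or enqueued prompt tokens.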

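Batch expiration

Batches that do not complete in time eventually move to an expired state. Unfinished requests within that batch are cancelled, and any responses to completed requests are made available via the batch's output file. You are charged for tokens consumed by any completed requests.

Expired requests are written to your error file with a message like the ones shown below. You can use the custom_id to retrieve the request data for expired requests.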

{"id": "batch_req_123", "custom_id": "request-3", "response": null, "error": {"code": "batch_expired", "message": "This request could not be executed before the completion window expired."}}
{"id": "batch_req_123", "custom_id": "request-7", "response": null, "error": {"code": "batch_expired", "message": "This request could not be executed before the completion window expired."}}
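One way to act on error lines like these is to collect the expired custom_id values and pull the matching requests out of the original input file, so they can be submitted again in a new batch. The following is a minimal sketch using only the standard library; the helper names `expired_custom_ids` and `select_requests` are hypothetical, and resubmitting is a design choice of this example rather than something this guide prescribes.

```python
import json


def expired_custom_ids(error_jsonl):
    """Return the custom_id of every error-file line with code batch_expired."""
    ids = []
    for line in error_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        error = record.get("error") or {}
        if error.get("code") == "batch_expired":
            ids.append(record["custom_id"])
    return ids


def select_requests(input_jsonl, custom_ids):
    """Return the original input lines whose custom_id is in custom_ids."""
    wanted = set(custom_ids)
    return [
        line
        for line in input_jsonl.splitlines()
        if line.strip() and json.loads(line)["custom_id"] in wanted
    ]
```

The selected lines can then be written to a new .jsonl file, one per line, and uploaded as the input for a follow-up batch.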

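Other resources

For more concrete examples, visit the OpenAI Cookbook, which contains sample code for use cases like classification, sentiment analysis, and summary generation.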