看完这篇，你也能做 AI 搜索：论「结构化输出」

赛博禅心 2024-08-19 11:12:18

先说结论：结构化输出，是绝大多数 AI 产品和 Agent 的核心

无论是 AI 搜索、有记忆的 bot，还是各类 agent，都是基于结构化输出搭建的。

在这篇文章里，我讲从多角度讲一下结构化输出的用途、起源和示例。

同时，文章会附带一些代码片段，方便更直观的表现（当然，完全不看关系也没问题）

一、什么是「格式化输出」

如果我问 GPT：介绍一下乔布斯

通常会收到这样的回复：

史蒂夫·乔布斯，1955年2月24日出生，2011年10月5日去世，美国人。他活跃于科技、创新、企业管理和动画领域。乔布斯创立了Apple、NeXT和Pixar公司，推出了Mac、iPod、iPhone等具有划时代意义的产品，重塑了个人电脑、音乐和手机行业，奠定了苹果在全球科技领域的领军地位。作为Pixar的创办人之一，他也在动画领域留下了深远的影响。乔布斯是20世纪末至21世纪初最具影响力的企业家和创新者之一。

而在很多 AI 产品中，我们并非看到一串文字，而是这样的脑图：

其背后，就是结构化输出 – 让 AI 输出 json，而非文本，比如：

{
“name”: “史蒂夫·乔布斯”,
“birth_date”: “1955-02-24”,
“death_date”: “2011-10-05”,
“nationality”: “美国”,
“fields”: [“科技”, “创新”, “企业管理”, “动画”],
“companies_founded”: [“Apple”, “NeXT”, “Pixar”],
“achievements”: [
“创立苹果公司”,
“推出Mac、iPod、iPhone等产品”,
“重塑个人电脑、音乐、手机行业”,
“奠定苹果全球科技领军地位”,
“Pixar创办人之一”
],
“influence”: “20世纪末至21世纪初最具影响力的企业家和创新者之一”
}

二、产品的背后，都是结构化输出

依然拿「介绍一下乔布斯」这个问题举例，在不同 AI 产品中，这个问题的内部输出是不同的。

如果是搜索，它的内部输出可能是这样：

{
“query”: “乔布斯”,
“search_by”: “Google”
}

获得这个结果后，再用谷歌搜索「乔布斯」，并将结果通过 AI 总结，返回给用户。

对于 Rag 工具，其数据库为《硅谷县志》，它的内部输出可能是这样

{
“rag1”: “乔布斯的家庭”,
“rag2”: “乔布斯的成长”,
“rag3”: “乔布斯的产品”,
“rag4”: “乔布斯的成就”,
}

分别对这几个信息进行 rag 后，在把结果汇总，通过 AI 总结，返回给用户。

对于四格漫画，其内部输出可能是这样的：

{
“stories”: [
{
“story”: “乔布斯的家庭”,
“prompt”: “20世纪70年代复古风格，温暖的色调，柔和的线条。在美国加州的一间温馨家庭住宅，窗外阳光明媚，庭院中充满绿植和鲜花。年轻的乔布斯与他的养父母在客厅里，其母亲在织毛衣，父亲在看报纸，乔布斯坐在地上玩着一台老式计算机。画面呈现出和谐温馨的家庭场景，浓厚的亲情氛围中，乔布斯的眼中充满了好奇与探索。”,
“caption”: “家庭的力量塑造了伟大的梦想”},
{
“story”: “乔布斯的成长”,
“prompt”: “1970年代末期的黑白摄影风格，带有强烈的对比效果。在旧金山一所简朴的高中教室，光线从窗外斜射进来，课桌上摆满了书本和笔记。年轻的乔布斯坐在教室后排，注视着老师手中的物理实验，身边的同学们都在认真听课。画面体现了乔布斯对知识的渴望，眼神专注，透出不凡的好奇心和思考的深度。”,
“caption”: “追求知识与个人成长”},
{
“story”: “乔布斯的产品”,
“prompt”: “极简主义风格，采用现代化的色彩搭配，注重设计感。在苹果公司现代化的办公室内，简洁的玻璃桌面上摆放着第一代Macintosh，背景是白色的墙壁和大型苹果标志。乔布斯站在桌前，手指轻触Macintosh，身后几位工程师在讨论。画面重点突出乔布斯与他的产品，展示出科技与设计的完美结合，乔布斯的神态自信且充满远见。”,
“caption”: “通过产品改变世界”},
{
“story”: “乔布斯的成就”,
“prompt”: “超现实主义风格，带有未来感，色彩鲜明且具有冲击力。在庞大的苹果公司总部前，未来风格的天空中悬浮着乔布斯的头像，周围环绕着iPhone、iPad、Mac等产品。乔布斯的巨大肖像与天空中的科技产品融为一体，象征着他对现代科技的深远影响。画面展现了一幅震撼的图景，乔布斯的形象如同神话般屹立在现代科技的顶峰。”,
“caption”: “达到科技的巅峰”
}
]}

然后分别对这几个信息，进行画图，在展示给用户。

三、以「AI 天气预报」为例

现在换个例子：我有一个天气预报 AI，如果用户问到了天气，则进行告知。

实际上，这个 AI 并不是真的用 AI 去实时预测，而是问题，转化成一个请求，去查询「天气预报数据库」，再返回给用户（当然，如果用户的问题和天气无关，则正常回答）。

具体的流程如下：

从程序的角度，执行以下：

1）通过结构化输出，判断该问题和天气有关，并拆解出 location 和 date 两个字段分别是北京和明天

2）向接口请求信息北京和明天（严谨来说是，Beijing 和 2024-08-16）

3）AI 进行结构化返回：

{
“date”: “2024-08-16”,
“location”: “北京”,
“temperature”:
{
“high”: “32°C”,
“low”: “24°C”
},
“weather”: “雷阵雨，云层厚”,
“humidity”: “77%”,
“UV_index”: “高”,
“advice”: “携带雨具，注意防晒”
}

4）向接口发送类似下面的信息

client.chat.completions.create(
model=”gpt-4o”,
messages=[
{“role”: “system”, “content”: “你是天气预报机器人，今天是 2024-08-16”},
{“role”: “user”, “content”: “””
{“date”: “2024-08-16″,”location”: “北京”,”temperature”: {“high”: “32°C”,”low”: “24°C”},”weather”: “雷阵雨，云层厚”,”humidity”: “77%”,”UV_index”: “高”,”advice”: “携带雨具，注意防晒”}
北京明天的天气怎么样？
“””}
]
)

5）接口返回的结果，会类似：

明天北京的天气预报显示白天的气温将达到最高约32°C，夜间最低温度约为24°C。预计会有雷阵雨，全天云层较厚，湿度较高，达到77%左右。雷阵雨可能会在上午和下午出现，因此出行时建议携带雨具，并注意防晒，因为紫外线指数较高。总的来说，天气闷热，空气湿润，体感温度可能会比实际温度更高

也可以利用「结构化输出」，对设备进行 IoT 适配。

比如，我学过电工（EE），就可以让 Coze 变成一个家庭中控，如：

家里的数据：

封装成 Coze Bot：

四、产业演化史

在 AI 领域，我们通常认为，结构化输出的第一次大规模使用，是源自去年 5 月 OpenAI 的 Plugin 正式上线：AI 可以通过结构化输出，来调用外部工具。

并且，截止到当前，OpenAI 在结构化输出这块，供进行了 4 次迭代，包括 Plugin 方法，Function Calling，Json Mode 和前两天新出的 Structured Outputs。

当然了，你也可以用 markdown 等 prompt 方法来模拟结构化输出，但不在本次的讨论范围。

Plugin 方法

在 2023 年 3 月，当时参与到 plugin 内测的朋友，会看到一份如何让 ChatGPT 调用外部工具的文档，也是结构化输出的雏形。

流程就和上文一样，ChatGPT 在获知用户的请求后，通过结构化输出的方式，生成包括插件选择在内的一个 json，插件在接受到这些参数后开始处理，并给到一个回调。之后这套东西，变成了 GPTs 的 Action。

注意：这套方法并未通过接口的方式发布

Function Calling

在 2023 年 6 月，OpenAI 带来了 0613 年中更新，并发布了 Function Calling，也是现在看来最广泛使用的调用方法，国内模型普遍支持。

下面，我们以一个更直观的例子，来看看 Function Calling 的使用过程。以用户查询包裹为例，这个 bot 处理任务的过程中，总计分 2 步：

1）用户向 AI 询问【我的包裹，编号12345，寄了吗？】的时候，其请求额外带上字段 tools，在其中定义要获取的信息 order_id

2）假设获取到的信息是 order_12345 ，通过查询数据库，获得包裹信息 2024-08-01

3）将这个信息，和历史提问合并，再交给大模型，获得最终输出包裹在 2024-08-01 的时候已经寄出去了

如果用代码的方式，就是：

tools = [
{
“type”: “function”,
“function”: {
“name”: “get_delivery_date”,
“description”: “Get the delivery date for a customer’s order. Call this whenever you need to know the delivery date, for example when a customer asks ‘Where is my package'”,
“parameters”: {
“type”: “object”,
“properties”: {
“order_id”: {
“type”: “string”,
“description”: “The customer’s order ID.”
}
},
“required”: [“order_id”],
“additionalProperties”: False
}
}
}
]

messages = []
messages.append({“role”: “system”, “content”: “You are a helpful customer support assistant. Use the supplied tools to assist the user.”})
messages.append({“role”: “user”, “content”: “Hi, can you tell me the delivery date for my order?”})
messages.append({“role”: “assistant”, “content”: “Hi there! I can help with that. Can you please provide your order ID?”})
messages.append({“role”: “user”, “content”: “i think it is order_12345”})

rsp = client.chat.completions.create(
model=’gpt-4o’,
messages=messages,
tools=tools
)

之后，AI 会返回类似：

ChatCompletion(id=’chatcmpl-9wY3ulTLZswqZLF58L0LQ0sM1EAsG’, choices=[Choice(finish_reason=’tool_calls’, index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role=’assistant’, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id=’call_W1KzfgxvkoxjCAGT3Td9oVPk’, function=Function(arguments='{“order_id”:”order_12345″}’, name=’get_delivery_date’), type=’function’)]))], created=1723740986, model=’gpt-4o-2024-05-13′, object=’chat.completion’, service_tier=None, system_fingerprint=’fp_3aa7262c27′, usage=CompletionUsage(completion_tokens=19, prompt_tokens=140, total_tokens=159))

其中 response.choices[0].message.tool_calls[0].function.arguments 的值，就是 {“order_id”:”order_12345″}

假定查询到的结果是 2024-08-01

# Prepare the chat completion call payload
completion_payload = {
“model”: “gpt-4o”,
“messages”: [
{“role”: “system”, “content”: “You are a helpful customer support assistant. Use the supplied tools to assist the user.”},
{“role”: “user”, “content”: “Hi, can you tell me the delivery date for my order?”},
{“role”: “assistant”, “content”: “Hi there! I can help with that. Can you please provide your order ID?”},
{“role”: “user”, “content”: “i think it is order_12345”},
rsp.choices[0].message,
{“role”: “tool”, “content”: “delivery_date：2024-08-01”, “tool_call_id”: rsp.choices[0].message.tool_calls[0].id},
]
}

# Call the OpenAI API’s chat completions endpoint to send the tool call result back to the model
response = client.chat.completions.create(
model=completion_payload[“model”],
messages=completion_payload[“messages”],
)

# Print the response from the API. In this case it will typically contain a message such as “The delivery date for your order #12345 is xyz. Is there anything else I can help you with?”
print(response)

最终，你会得到

ChatCompletion(id=’chatcmpl-9wYV7Yhkimzlpg3ejNkjRjI0GKqyw’, choices=[Choice(finish_reason=’stop’, index=0, logprobs=None, message=ChatCompletionMessage(content=’Your order with ID “order_12345” is scheduled to be delivered on August 1, 2024. If you have any other questions or need further assistance, feel free to ask!’, refusal=None, role=’assistant’, function_call=None, tool_calls=None))], created=1723742673, model=’gpt-4o-2024-05-13′, object=’chat.completion’, service_tier=None, system_fingerprint=’fp_3aa7262c27′, usage=CompletionUsage(completion_tokens=40, prompt_tokens=111, total_tokens=151))

也就是包裹在 2024-08-01 的时候已经寄出去了

回顾一下

上面完成这个对话的时候，用户给出了一次 prompt: i think it is order_12345，但 AI 实际上是跑了 2 次：

第一次是获取 order id

第二次才是真正是生成内容包裹在 2024-08-01 的时候已经寄出去了

同时，在第二次的对话中，结尾挂着第一次的 response 和数据库查找结果。

在数据库的查询结果中，role 为 tool

还需注意

如果你在某些代码中，看到 Function Calling 的查询信息，不是用 tool，而是用 function，这也没错。

因为 OpenAI 曾经改过 Function Calling 的接口实现：最开始是 function 结构，后面改成了 tool 结构。对于 tool 和 function 这两种写法，目前都行，但后续 OpenAI 将只支持 tool 结构

吐槽：我个人更喜欢 function 结构，更优雅

使用 tool 结构”messages”:

“messages”: [
{“role”: “system”, “content”: “You are a helpful customer support assistant. Use the supplied tools to assist the user.”},
{“role”: “user”, “content”: “Hi, can you tell me the delivery date for my order?”},
{“role”: “assistant”, “content”: “Hi there! I can help with that. Can you please provide your order ID?”},
{“role”: “user”, “content”: “i think it is order_12345”},
rsp.choices[0].message,
{“role”: “tool”, “content”: “delivery_date：2024-08-01”, “tool_call_id”: rsp.choices[0].message.tool_calls[0].id}]

使用 function 结构

“messages”: [
{“role”: “system”, “content”: “You are a helpful customer support assistant. Use the supplied tools to assist the user.”},

{“role”: “user”, “content”: “Hi, can you tell me the delivery date for my order?”},
{“role”: “assistant”, “content”: “Hi there! I can help with that. Can you please provide your order ID?”},
{“role”: “user”, “content”: “i think it is order_12345”},
{“role”: “function”, “content”: “delivery_date：2024-08-01”, “name”: “delevery_record”}]

使用 function 结构

“messages”:[
{“role”: “system”, “content”: “You are a helpful customer support assistant. Use the supplied tools to assist the user.”},
{“role”: “user”, “content”: “Hi, can you tell me the delivery date for my order?”},
{“role”: “assistant”, “content”: “Hi there! I can help with that. Can you please provide your order ID?”},
{“role”: “user”, “content”: “i think it is order_12345”},
{“role”: “function”, “content”: “delivery_date：2024-08-01”, “name”: “delevery_record”}
]

使用 function 结构

“messages”: [
{“role”: “system”, “content”: “You are a helpful customer support assistant. Use the supplied tools to assist the user.”},
{“role”: “user”, “content”: “Hi, can you tell me the delivery date for my order?”},
{“role”: “assistant”, “content”: “Hi there! I can help with that. Can you please provide your order ID?”},
{“role”: “user”, “content”: “i think it is order_12345”},
{“role”: “function”, “content”: “delivery_date：2024-08-01”, “name”: “delevery_record”}]

另外：也可以两种结构都不用

“messages”: [
{“role”: “system”, “content”: “You are a helpful customer support assistant. Use the supplied tools to assist the user.”},
{“role”: “user”, “content”: “Hi, can you tell me the delivery date for my order?”},
{“role”: “assistant”, “content”: “Hi there! I can help with that. Can you please provide your order ID?”},
{“role”: “user”, “content”: “i think it is order_12345. Related record is: delivery_date：2024-08-01”}]

Json Mode

在 2023 年 11 月，OpenAI 在开发者大会上，带来了 Json Mode 更新。

仔细看上面的 Function Calling，其参数是通过 string 给到的，不够稳定。Json Mode 便是为了解决这一问题：直接输出 Json。

注意：这种方法仍然不够稳定，并已被 Structured Outputs 取代

调用的时候，要求：

prompt 里出现 json 这个单词
response_format 设置为 “type”: “json_object”

比如

completion_payload = {
‘model’: ‘gpt-3.5-turbo’,
‘messages’: [{‘role’: ‘user’, ‘content’: ‘告诉我四大名著分别是什么，以及他们的作者是谁，按这个 json 格式: {{‘书名’:’xxx’，’作者’:’xxx’}…}’}],
‘response_format’: {‘type’: ‘json_object’}
}

# Call the OpenAI API’s chat completions endpoint to send the tool call result back to the model
response = client.chat.completions.create(
model=completion_payload[“model”],
messages=completion_payload[“messages”],
)

得到 resoponse 为Chat

Completion(id=’chatcmpl-9wZ5DHWicaarxccmTBGi8MfJsa6AQ’, choices=[Choice(finish_reason=’stop’, index=0, logprobs=None, message=ChatCompletionMessage(content=”{n {‘书名’: ‘西游记’, ‘作者’: ‘吴承恩’},n {‘书名’: ‘红楼梦’, ‘作者’: ‘曹雪芹’},n {‘书名’: ‘水浒传’, ‘作者’: ‘施耐庵’},n {‘书名’: ‘三国演义’, ‘作者’: ‘罗贯中’}n}”, refusal=None, role=’assistant’, function_call=None, tool_calls=None))], created=1723744911, model=’gpt-3.5-turbo-0125′, object=’chat.completion’, service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=92, prompt_tokens=57, total_tokens=149))

其中，通过 response.choices[0].message.content 可去到 json 信息，如需进行后续处理，依然沿用 function calling 中的方法

Structured Outputs

较之 Function Calling 和 Json Mode，Structured OutPuts 明显好用了很多，当前支持以下模型：gpt-4o-mini, gpt-4o-2024-08-06，当然，也包括之后的模型。

简单调试测试一下

刚才的四大名著的例子，代码这么写

from pydantic import BaseModel

class theBook(BaseModel):
name: str
writer: str

class theFour(BaseModel):
steps: list[theBook]

completion = client.beta.chat.completions.parse(
model=”gpt-4o-2024-08-06″,
messages=[
{“role”: “system”, “content”: “Extract the event information.”},
{“role”: “user”, “content”: “告诉我四大名著分别是什么，以及他们的作者是谁”},
],
response_format = theFour,
)

response = completion.choices[0].message.parsed

得到的结果是

theFour
(
steps=[
theBook(name=’《红楼梦》’, writer=’曹雪芹’),
theBook(name=’《西游记》’, writer=’吴承恩’),
theBook(name=’《三国演义》’, writer=’罗贯中’),
theBook(name=’《水浒传》’, writer=’施耐庵’)])

非常好用！

通过这种方法，还可以完成单次对话的 CoT，比如：

from pydantic import BaseModel

class Step(BaseModel):
explanation: str
output: str

class MathReasoning(BaseModel):
steps: list[Step]
final_answer: str

completion = client.beta.chat.completions.parse(
model=”gpt-4o-2024-08-06″,
messages=[
{“role”: “system”, “content”: “You are a helpful math tutor. Guide the user through the solution step by step.”},
{“role”: “user”, “content”: “how can I solve 8x + 7 = -23”} ], response_format=MathReasoning,)math_reasoning = completion.choices[0].message.parsed

得到结果

{
“steps”: [
{
“explanation”: “Start with the equation 8x + 7 = -23.”, “output”: “8x + 7 = -23”
},
{
“explanation”: “Subtract 7 from both sides to isolate the term with the variable.”, “output”: “8x = -23 – 7”
},
{
“explanation”: “Simplify the right side of the equation.”, “output”: “8x = -30”

},

{

“explanation”: “Divide both sides by 8 to solve for x.”, “output”: “x = -30 / 8”

},

{

“explanation”: “Simplify the fraction.”, “output”: “x = -15 / 4”

} ],

“final_answer”: “x = -15 / 4”

}