计算机的核心从 CPU 转向 GPU,上个时代依靠程序员写代码指挥 CPU 执行指令解决问题,构成了现在庞大的 IT 产业,程序员是中心。现在的时代逐渐转变,GPU 生产的 token 逐渐能解决越来越多的问题,能思考,能生成代码指挥 CPU 去执行解决问题,计算的核心一定会转向 GPU,世界对 GPU 的需求只会越来越高。
给 AI 分了四个阶段,Perception AI → Generative AI → Agentic AI → Physical AI,不是很认同,Agentic 和 Physical 都是 Generative AI 的延续,不过无所谓,可以看到 Agentic 这个概念实在是火爆。
Scaling Law 没有停止,Agentic AI 需要深度思考,深度思考有新的 Test-time Scaling Law,越多的 token 输出效果越好需,要多轮理解和工具调用对 token 的消耗更是指数级上涨。
Physical AI 要更好地理解现实世界,声音/视觉/触感,都会比纯文本思考对 token 消耗的诉求更高,像 2G 时代看文字新闻,3G 4G 图片,5G 视频一样。
多个 session 都在推广 NVidia 的 Video Search and Summarization Agent,串联从视频的获取→分割→VLM识别、CV物体识别和跟踪→数据处理存储和RAG召回→用户对话 整个流程,做到可以对视频提供实时分析和报警,也可以自然语言交互查询视频内容,边缘部署,适合用于监控,算是用 NVidia 技术栈做 AI 应用的一个标杆范例。
AIGC
关注了下视频 AIGC 相关的几个 Session
在好莱坞干了几十年的视觉效果的 Ed Ulbrich 开了个公司 Metaphysic,以前的电影特效制作成本巨大,对人的处理还很难跨过恐怖谷,而基于 AI 技术做特效,用完全不同的技术栈,效果好成本低,是一种颠覆。metaphysic 给娱乐行业提供人脸替换、数字人的服务,看起来是用的 GAN,在人物换脸技术上,GAN 还是更能做到稳定和实时,特别是实时这个点,基于 diffusion 很难做到。基于市场需求,利用已有的不同技术(甚至是上一代技术)深入解决问题,是有空间的。
但模型本身目前解决不了所有问题,还需要工程上的一些策略和串联做优化。例如 Tree of Thought 让任务不是以线性一步步执行的形式,而是生成解决问题的多个节点,多角度思考问题,形成树结构的任务,评估节点的价值,在里面寻找最优解。 Reflexion 会有 Evaluator 对各种反馈(工具调用结果/模型输出/用户指令)进行反思,梳理改进方向,也会把反思结果作为知识库经验,指导后续的任务。
Claude 3.7 Sonnet 在万众期待中推出了,为什么期待,因为从 Claude 3.5 Sonnet 发布后,一直是AI Coding Agent 领域最好的模型,综合效果没有对手,后面陆续推出的 o1/o3/DeepSeek 都没能撼动,更让人期待 Claude 3.7 Sonnet 在 AI Coding 领域能不能有进一步提升。
Claude 3.7 放出来的 Benchmark 里,有两个是跟 AI Coding Agent 表现强相关的:
SWE-bench Verified: Claude 3.7 Sonnet scores 62.3% out of 500 problems using pass@1 with bash/editor tools plus a "thinking tool" for single-attempt patches — no additional test-time compute used. 70.3% score uses internal scoring and custom scaffold on a reduced subset of problems. Scaffold Deepseek R1 results use the "Agentless" framework. OpenAI results from o3-mini system card cover a different subset of problems with a custom compute.
Claude 没有具体说明怎么做检索代码的,官方blog附录里提到在运行环境上是用了极简的方案,只提供了命令和编辑文件的能力,看起来是只把代码仓库目录扔给 LLM,让它自行去做文件搜索。另外有提到 Deepseek 的跑分是基于 Agentless 框架跑的,Agentless 专门介绍说明了如何跑 SWE-Bench,以及具体是怎么做代码检索的。见 Agentless/README_swebench.md。
{
"annotator": 0,
"user_id": "omar_rossi_1241",
"instruction": "Your user id is omar_rossi_1241. For your upcoming trip from New York to Chicago, you want to change the passenger to yourself, upgrade it to economy class, and have 3 checked bags. You prefer gift card payment. Your birthday is in your user profile so you do not prefer to provide it. You are reactive to the agent and will not say anything that is not asked.",
"actions": [
{
"name": "update_reservation_flights",
"arguments": {
"reservation_id": "FQ8APE",
"cabin": "economy",
"flights": [
{
"flight_number": "HAT056",
"date": "2024-05-25",
},
{
"flight_number": "HAT138",
"date": "2024-05-25",
},
],
"payment_id": "gift_card_8190333",
},
},
{
"name": "update_reservation_passengers",
"arguments": {
"reservation_id": "FQ8APE",
"passengers": [
{
"first_name": "Omar",
"last_name": "Rossi",
"dob": "1970-06-06",
}
],
},
},
{
"name": "update_reservation_baggages",
"arguments": {
"reservation_id": "FQ8APE",
"total_baggages": 3,
"nonfree_baggages": 0,
"payment_id": "gift_card_8190333",
},
},
],
},
转化后这个case实际跟大模型交互的过程:
{"traj": [
{
"role": "system",
"content": "# Airline Agent Policy\n\nThe current time is 2024-05-15 15:00:00 EST.\n\nAs an airline agent, you can help users book, modify, or cancel flight reservations.\n\n- Before taking any actions that update the booking database (booking, modifying flights, editing baggage, upgrading cabin class, or updating passenger information), you must list the action details and obtain explicit user confirmation (yes) to proceed.\n\n- You should not provide any information, knowledge, or procedures not provided by the user or available tools, or give subjective recommendations or comments.\n\n- You should only make one tool call at a time, and if you make a tool call, you should not respond to the user simultaneously. If you respond to the user, you should not make a tool call at the same time.\n\n- You should deny user requests that are against this policy.\n\n- You should transfer the user to a human agent if and only if the request cannot be handled within the scope of your actions.\n\n## Domain Basic\n\n- Each user has a profile containing user id, email, addresses, date of birth, payment methods, reservation numbers, and membership tier.\n\n- Each reservation has an reservation id, user id, trip type (one way, round trip), flights, passengers, payment methods, created time, baggages, and travel insurance information.\n\n- Each flight has a flight number, an origin, destination, scheduled departure and arrival time (local time), and for each date:\n - If the status is "available", the flight has not taken off, available seats and prices are listed.\n - If the status is "delayed" or "on time", the flight has not taken off, cannot be booked.\n - If the status is "flying", the flight has taken off but not landed, cannot be booked.\n\n## Book flight\n\n- The agent must first obtain the user id, then ask for the trip type, origin, destination.\n\n- Passengers: Each reservation can have at most five passengers. The agent needs to collect the first name, last name, and date of birth for each passenger. All passengers must fly the same flights in the same cabin.\n\n- Payment: each reservation can use at most one travel certificate, at most one credit card, and at most three gift cards. The remaining amount of a travel certificate is not refundable. All payment methods must already be in user profile for safety reasons.\n\n- Checked bag allowance: If the booking user is a regular member, 0 free checked bag for each basic economy passenger, 1 free checked bag for each economy passenger, and 2 free checked bags for each business passenger. If the booking user is a silver member, 1 free checked bag for each basic economy passenger, 2 free checked bag for each economy passenger, and 3 free checked bags for each business passenger. If the booking user is a gold member, 2 free checked bag for each basic economy passenger, 3 free checked bag for each economy passenger, and 3 free checked bags for each business passenger. Each extra baggage is 50 dollars.\n\n- Travel insurance: the agent should ask if the user wants to buy the travel insurance, which is 30 dollars per passenger and enables full refund if the user needs to cancel the flight given health or weather reasons.\n\n## Modify flight\n\n- The agent must first obtain the user id and the reservation id.\n\n- Change flights: Basic economy flights cannot be modified. Other reservations can be modified without changing the origin, destination, and trip type. Some flight segments can be kept, but their prices will not be updated based on the current price. The API does not check these for the agent, so the agent must make sure the rules apply before calling the API!\n\n- Change cabin: all reservations, including basic economy, can change cabin without changing the flights. Cabin changes require the user to pay for the difference between their current cabin and the new cabin class. Cabin class must be the same across all the flights in the same reservation; changing cabin for just one flight segment is not possible.\n\n- Change baggage and insurance: The user can add but not remove checked bags. The user cannot add insurance after initial booking.\n\n- Change passengers: The user can modify passengers but cannot modify the number of passengers. This is something that even a human agent cannot assist with.\n\n- Payment: If the flights are changed, the user needs to provide one gift card or credit card for payment or refund method. The agent should ask for the payment or refund method instead.\n\n## Cancel flight\n\n- The agent must first obtain the user id, the reservation id, and the reason for cancellation (change of plan, airline cancelled flight, or other reasons)\n\n- All reservations can be cancelled within 24 hours of booking, or if the airline cancelled the flight. Otherwise, basic economy or economy flights can be cancelled only if travel insurance is bought and the condition is met, and business flights can always be cancelled. The rules are strict regardless of the membership status. The API does not check these for the agent, so the agent must make sure the rules apply before calling the API!\n\n- The agent can only cancel the whole trip that is not flown. If any of the segments are already used, the agent cannot help and transfer is needed.\n\n- The refund will go to original payment methods in 5 to 7 business days.\n\n## Refund\n\n- If the user is silver/gold member or has travel insurance or flies business, and complains about cancelled flights in a reservation, the agent can offer a certificate as a gesture after confirming the facts, with the amount being $100 times the number of passengers.\n\n- If the user is silver/gold member or has travel insurance or flies business, and complains about delayed flights in a reservation and wants to change or cancel the reservation, the agent can offer a certificate as a gesture after confirming the facts and changing or cancelling the reservation, with the amount being $50 times the number of passengers.\n\n- Do not proactively offer these unless the user complains about the situation and explicitly asks for some compensation. Do not compensate if the user is regular member and has no travel insurance and flies (basic) economy.\n"
},
{
"role": "user",
"content": "Hi! I'd like to make a few changes to my trip from New York to Chicago. Can you help me with that?"
},
{
"role": "assistant",
"content": "Of course! I can help you modify your trip. Could you please provide your user ID and the reservation ID for the trip you want to change?",
},
{
"role": "user",
"content": "Sure! My user ID is omar_rossi_1241. But I'm not sure about the reservation ID at the moment."
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"function": {
"arguments": "{"user_id":"omar_rossi_1241"}",
"name": "get_user_details"
},
"id": "call_FXi5dyufwOlkHksVgNwVhhVB",
"type": "function"
}
]
},
{
"role": "assistant",
"content": "I found your reservations. Could you please confirm which reservation ID you'd like to modify for your trip from New York to Chicago? Here are your reservation IDs:\n\n- UM3OG5\n- 5RJ7UH\n- FQ8APE\n- QKRY03\n\nLet me know which one you'd like to modify!",
},
{
"role": "user",
"content": "I'm not entirely certain, but I believe it might be the first one, UM3OG5."
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"function": {
"arguments": "{"reservation_id":"UM3OG5"}",
"name": "get_reservation_details"
},
"id": "call_FApEDaUHdL2hx8FNbu5UCMb8",
"type": "function"
}
]
},
{
"role": "assistant",
"content": "The reservation ID "UM3OG5" is for a round trip from Seattle (SEA) to Dallas/Fort Worth (DFW), not from New York to Chicago. \n\nCould you please check if any of the other reservation IDs match your trip from New York to Chicago? Here are the remaining reservation IDs:\n\n- 5RJ7UH\n- FQ8APE\n- QKRY03\n\nLet me know which one you'd like to modify!",
},
{
"role": "user",
"content": "In that case, I think it might be FQ8APE. Can you check that for me?"
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"function": {
"arguments": "{"reservation_id":"FQ8APE"}",
"name": "get_reservation_details"
},
"id": "call_W507pQxieFVdAxvqONuYWvN2",
"type": "function"
}
]
},
{
"content": "The reservation ID "FQ8APE" is for a one-way trip from Newark (EWR) to Chicago (ORD) with a stop in Houston (IAH). Since this reservation is in basic economy, please note that the flights cannot be modified. However, I can assist with other changes like cabin upgrades or baggage additions.\n\nPlease let me know what specific changes you would like to make to this reservation!",
"role": "assistant"
},
{
"role": "user",
"content": "Great! I'd like to add three checked bags, change the passenger to myself, and upgrade to economy class, please."
},
{
"role": "assistant",
"content": "Here are the changes you want to make to your reservation:\n\n1. **Add three checked bags**: \n - Since you are a gold member and upgrading to economy class, you will have 3 free checked bags, so no additional charges for the bags.\n\n2. **Change the passenger to yourself**:\n - Current passenger: Ivan Garcia\n - New passenger: Omar Rossi (DOB: 1970-06-06)\n\n3. **Upgrade to economy class**:\n - Upgrade from basic economy to economy class.\n\nPlease confirm if you would like to proceed with these changes.",
},
{
"role": "user",
"content": "Yes, I'd like to proceed with those changes. And please use a gift card for any payment needed."
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"function": {
"arguments": "{"reservation_id": "FQ8APE", "passengers": [{"first_name": "Omar", "last_name": "Rossi", "dob": "1970-06-06"}]}",
"name": "update_reservation_passengers"
},
"id": "call_jK7xz4ERk3csc9jBfroPobGs",
"type": "function"
}
]
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"function": {
"arguments": "{"reservation_id":"FQ8APE","cabin":"economy","flights":[{"origin":"EWR","destination":"IAH","flight_number":"HAT056","date":"2024-05-25"},{"origin":"IAH","destination":"ORD","flight_number":"HAT138","date":"2024-05-25"}],"payment_id":"gift_card_8190333"}",
"name": "update_reservation_flights"
},
"id": "call_zeyT5c2EYzRvfY42X7YOKOng",
"type": "function"
}
]
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"function": {
"arguments": "{"reservation_id":"FQ8APE","total_baggages":3,"nonfree_baggages":0,"payment_id":"gift_card_8190333"}",
"name": "update_reservation_baggages"
},
"id": "call_PA1XaKLPX8egjewaxIArCkRc",
"type": "function"
}
]
},
{
"role": "assistant",
"content": "Your reservation has been successfully updated with the following changes:\n\n1. **Passenger Change**: The passenger is now Omar Rossi.\n2. **Cabin Upgrade**: Upgraded to economy class.\n3. **Checked Baggage**: Added three checked bags, all free due to your gold membership.\n\nThe payment for the cabin upgrade was processed using your gift card. If you have any more questions or need further assistance, feel free to ask!",
},
{
"role": "user",
"content": "Thank you for helping with the changes! That's all for now. ###STOP###"
}
],
}
PPO 用 Reward Model 计算大模型完整输出的奖励分数(r),过程中会跟原始模型的输出做对比(KL),避免偏离太远。大模型每步 token 输出,都会经过 Value Model 判定输出对后续奖励的期望值(v),经过 GAE 函数计算每步动作的好坏程度,更新我们在训练的大模型 Policy Model 参数。
GRPO 去掉了 Value Model,对每个输入(q)输出多个结果o1 o2 o3 …,奖励模型判断这些结果的好坏,从中选出最好的,更新模型参数。对算法和公式更详细的介绍,推荐这两个讲解:(1),(2)
2. Aha moment:训练过程中模型学会了停下来重新评估前面的思考,而且使用拟人的口气进行了反思。强化学习过程中人类没有输入任何类似的反思引导,模型能自主进行这种反思,十分有趣。我理解为,在预训练的模型里,模型已经有这样的反思意识潜力,在训练的某次输出过程中偶然出现,而出现后的结果对复杂推理有帮助,强化学习的机制会持续鼓励这样的输出。
训练的结果,在推理相关的 benchmark上,基本达到 o1 的水平:
R1-Zero 有非常有趣的发现,即不做监督学习,仅对预训练模型进行强化学习这条路是 OK 的,最终的模型有媲美o1的推理能力,但当前这种方式训练出的 R1-Zero 实际对外使用会有些问题,主要是对人不友好,比如输出的内容中英文混杂,猜测是奖励模型并没有做这种人类阅读友好的奖励判定,模型没理由能做好这块。
R1
为了做出一个推理能力强,输出对人类友好,综合能力 OK 的模型,DeepSeek 另外训练了R1模型。