2025-01-10 00:16:51
Finally got my copy! “AI Engineering” is officially out 🙏 🎉
2024-12-13 11:44:56
During the process of writing AI Engineering, I went through so many papers, case studies, blog posts, repos, tools, etc. This repo contains ~100 resources that really helped me understand various aspects of building with foundation models.
https://github.com/chiphuyen/aie-book/blob/main/resources.md
Here are the highlights:
1. Anthropic’s Prompt Engineering Interactive Tutorial
The Google Sheets-based interactive exercises make it easy to experiment with different prompts and see immediately what works and what doesn’t. I’m surprised other model providers don’t have similar interactive guides: https://docs.google.com/spreadsheets/d/19jzLgRruG9kjUQNKtCg1ZjdD6l6weA6qRXG5zLIAhC8/edit
2. OpenAI’s best practices for finetuning
While this guide focuses on GPT-3, many techniques are applicable to full finetuning in general. It explains how finetuning works, how to prepare training data, how to pick training hyperparameters, and common finetuning mistakes: https://docs.google.com/document/d/1rqj7dkuvl7Byd5KQPUJRxc19BJt8wo0yHNwK84KfU3Q/edit
3. Llama 3 paper
The section on post-training data is a gold mine, detailing the different techniques they used to generate 2.7 million examples for supervised finetuning. It also covers a crucial but less discussed topic, data verification: how to evaluate the quality of synthetic data: https://arxiv.org/abs/2407.21783
4. Efficiently Scaling Transformer Inference (Pope et al., 2022)
An amazing paper co-authored by Jeff Dean on inference optimization for transformer models. It not only covers different optimization techniques and their tradeoffs, but also provides guidelines for what to do if you want to optimize for a specific goal, e.g., the lowest possible latency, the highest possible throughput, or the longest context length: https://arxiv.org/abs/2211.05102
5. Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models (Lu et al., 2023)
My favorite study on LLM planners, how they use tools, and their failure modes. An interesting finding is that different LLMs have different tool preferences: https://arxiv.org/abs/2304.09842
6. AI Incident Database
For those interested in seeing how AI can go wrong, this database contains over 3,000 reports of AI harms: https://incidentdatabase.ai/
7. I find case studies from teams that have successfully deployed AI applications extremely educational. Here are some of my favorite enterprise case studies; I'll add more soon!
- LinkedIn: https://www.linkedin.com/blog/engineering/generative-ai/musings-on-building-a-generative-ai-product
- Pinterest's Text-to-SQL: https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff
- Gmail’s Smart Compose (2019): https://arxiv.org/abs/1906.00080
- Grab: https://engineering.grab.com/llm-powered-data-classification
2024-12-05 03:02:17
It’s done! 150,000 words, 200+ illustrations, 250 footnotes, and over 1200 reference links.
My editor just told me the manuscript has been sent to the printers.
- The ebook will be coming out later this week.
- Paperback copies should be available in a few weeks (hopefully before the end of the year). Preorder: https://amzn.to/49j1cGS
- The full manuscript is also accessible on the O'Reilly platform: https://oreillymedia.pxf.io/c/5719111/2146021/15173
This wouldn’t have been possible without the help of so many people who reviewed the early drafts, answered my thousands of questions, introduced me to fascinating use cases, or helped me see the beauty of overlooked techniques.
Thank you everyone for making this happen!
2024-10-10 08:06:02
Re @Luke_Metz @barret_zoph @LiamFedus @johnschulman2 What a ride! Can't wait to see what you'll build next
2024-07-26 01:10:47
Building a platform for generative AI applications
https://huyenchip.com/2024/07/25/genai-platform.html
After studying how companies deploy generative AI applications, I noticed many similarities in their platforms. This post outlines the common components, what they do, and the considerations that go into implementing them.
This post starts from the simplest architecture and progressively adds more components.
1. Enhance the context input into the model by giving it access to external data sources and tools for information gathering.
2. Put in guardrails to protect your system and your users.
3. Add a model router and gateway to support complex pipelines and strengthen security (see the sketch after this list).
4. Optimize for latency and cost with caching.
5. Add complex logic and write actions to maximize your system’s capabilities.
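To make steps 2-4 concrete, here is a minimal sketch (mine, not from the post) of how an input guardrail, a simple router, and an exact-match cache might wrap a model call behind a gateway. The model names and the call_model function are placeholders, not any specific provider's API.
```
import hashlib

CACHE = {}  # exact-match response cache: prompt hash -> response
BLOCKED_TERMS = ["credit card number", "ssn"]  # toy input guardrail list

def passes_input_guardrail(prompt: str) -> bool:
    """Reject prompts containing obviously sensitive terms (toy check)."""
    return not any(term in prompt.lower() for term in BLOCKED_TERMS)

def route(prompt: str) -> str:
    """Pick a model with a crude heuristic; real routers often use a small classifier."""
    if len(prompt) > 2000 or "analyze" in prompt.lower():
        return "strong-model"   # hypothetical larger, pricier model
    return "cheap-model"        # hypothetical smaller, faster model

def call_model(model: str, prompt: str) -> str:
    """Placeholder for the actual inference call sitting behind the gateway."""
    return f"[{model}] response to: {prompt[:40]}..."

def gateway(prompt: str) -> str:
    # 1. Guardrail before anything else touches the model.
    if not passes_input_guardrail(prompt):
        return "Request blocked by input guardrail."
    # 2. Cache lookup: a hit skips the model call entirely.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in CACHE:
        return CACHE[key]
    # 3. Route to a model, call it, and cache the result.
    response = call_model(route(prompt), prompt)
    CACHE[key] = response
    return response

print(gateway("Analyze last quarter's churn and summarize the main drivers."))
```
In practice, the cache may be semantic rather than exact-match and the router may be a trained classifier, but the overall control flow stays roughly the same.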
I try my best to keep the architecture general, but certain applications might deviate. As always, feedback is appreciated!
2024-06-27 01:57:05
Re 3. Hallucinations make GenAI applications unusable
Models hallucinate because they are probabilistic. However, a model is much more likely to hallucinate when it doesn’t have access to the right information.
Multiple studies have shown that hallucinations can be significantly reduced by giving the model the right context via retrieval or tools that the model can use to gather context (e.g., web search).
While hallucinations are dealbreakers for many applications, they can be sufficiently curtailed to make GenAI usable for many more.
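To illustrate the retrieval point above, here is a minimal sketch (mine, not from the thread): fetch relevant context first, then instruct the model to answer only from that context. The retrieve and call_model functions are toy stand-ins for a real retriever and inference API.
```
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever; real systems use BM25 or embeddings."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]

def call_model(prompt: str) -> str:
    """Placeholder for the actual model call."""
    return "[model response grounded in the provided context]"

def grounded_answer(question: str, docs: list[str]) -> str:
    # Build a prompt that constrains the model to the retrieved context.
    context = "\n".join(retrieve(question, docs))
    prompt = (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_model(prompt)

print(grounded_answer(
    "When was the refund policy last updated?",
    ["The refund policy was last updated in March 2023.",
     "Shipping takes 3-5 business days."],
))
```
The instruction to say "I don't know" when the context is insufficient matters as much as the retrieval itself: it gives the model an alternative to making something up.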