Blog of Tim Kellogg

AI architect, software engineer, and tech enthusiast.

PowerShell Users Like To Vomit

2025-01-05 08:00:00

In a stunning new study, PowerShell users insist that they like to vomit. How can this be? James Neno said, “PowerShell syntax is too wordy”, a quip about how PowerShell cmdlets can sometimes feel verbose. Similarly, Novel Logan said, “I have a lot of questions for those that chose vomiting lol”, indicating an apparently common sentiment about PowerShell users. Another respondent, going by the name “Fred”, said, “[bash] makes me feel powerful and runs on all platforms […]”, a somewhat gross allusion to Bash users’ preference in contrast to PowerShell.

We caught up with Tim Kellogg, CEO of dentropy, who said:

This is certainly a surprising result. I wouldn’t have predicted this. In fact, I’m not sure what I would have predicted at all. But it’s data, we can see the data and it’s crystal clear. You can’t question the data.

You can’t question the data, quite true Tim, quite true. The Data Gods have spoken, and their wisdom sends a clear message: PowerShell users prefer vomiting, Bash users prefer diarrhea.

You can’t question the data.

[Bar chart made in Excel: Bash + Diarrhea = 86%, Bash + Vomiting = 14%, PowerShell + Diarrhea = 33%, PowerShell + Vomiting = 66%]


Cut! Cut! Cut!

Yes, you absolutely can question the data, please question the data.

The Poll

I manufactured the data. Sort of. Given a sample size that’s large enough to show a pattern yet small enough to be noisy, a bias will form in the data. It’s not guaranteed, but close enough.

I sent a poll out on 🦋 Bluesky asking people whether they preferred Bash or PowerShell, and another question about whether they preferred diarrhea or vomiting. I cut it off at 10 respondents, the perfect number to show a clear trend, but not large enough for the Law of Large Numbers to take over and wash out the ridiculous result.
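
To see why a poll this small is so easy to game, here’s a rough simulation of my own (not part of the actual poll): with 10 respondents answering two completely unrelated 50/50 questions, a “clear trend” shows up by pure chance a surprising fraction of the time.

```python
# Rough illustration: how often do two unrelated coin-flip questions
# look correlated in a 10-person poll? (Pure chance, no manufactured bias.)
import random

random.seed(0)
trials = 10_000
spurious = 0
for _ in range(trials):
    shell = [random.random() < 0.5 for _ in range(10)]    # Bash vs PowerShell
    ailment = [random.random() < 0.5 for _ in range(10)]  # diarrhea vs vomiting
    agreement = sum(s == a for s, a in zip(shell, ailment))
    if agreement >= 8 or agreement <= 2:  # 80%+ of answers line up, one way or the other
        spurious += 1

print(f"'Clear trend' by chance alone: {spurious / trials:.0%} of polls")
# Roughly 1 in 9 ten-person polls shows a strong apparent association between
# two questions that have nothing to do with each other.
```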

(btw be suspicious of social media polls, they’re not a random sample, they’re a poll of that person’s followers)

The Title

Technically there was a much stronger correlation with Bash users, but you gotta admit that people who prefer vomiting are sus. Plus, people who prefer PowerShell are in the minority, and there’s this weird tension such that focusing on PowerShell is guaranteed to bring heat to the conversation. The clicks don’t bait themselves.

The Quotes

For good measure, I took some of the quotes very out of context; I made one say the opposite of what it actually said by selectively quoting. Plus I quoted myself, lol.

The Data Gods Demand Obedience

Is data better than pure vibes?

I honestly don’t know. In a lot of ways, data is worse than vibes because it triggers a sort of virtue cycle that makes people stop questioning. Shuts their brains off.

But that’s not a problem with having data, it’s a problem with not questioning.

Carl Bergstrom & Jevin West wrote a book, “Calling Bullshit”, where they discuss several different categories of “bullshit”, or data that appears to be worth paying attention to but is actually lying. They also teach a class by the same title at the University of Washington.

Data is good — for starting a conversation. Most mistakes with data are made in how it’s interpreted. Interpreting data is challenging, but like all challenges, it gets easier with practice. You can practice by talking about it with other people.

What should you talk about?

Doubt everything. Doubt yourself and your own biases, doubt other people’s biases, doubt the author’s interpretation of data. Doubt everything.

When questioning others’ interpretations, ask yourself if there’s another plausible interpretation. List them all. Practice makes perfect.

The AI Generation

Here’s the other thing: We can’t trust anything anyway. Get used to it.

Google places AI generated answers (that regularly hallucinate) before actual results. AI generated images and video now effortlessly impersonate elected officials. Politicians… (heh, you know).

Skepticism isn’t just your friend, it’s your only hope.

One day, you’ll be sitting in your living room and your daughter comes over after buying her first home at age 45, as generation alpha does. Talking on the phone she says, “he’s ranting again about AI hallucinations, I’m worried about him, it’s like he never learned to be skeptical.” Because kids learn stuff.

It might seem like a sudden change, but it’s really not. It’s been developing for 20 years. It’s never too late to learn.

Be skeptical. Talk about the data.

Normware: The Decline of Software Engineering

2025-01-02 08:00:00

There’s a common refrain that “AI won’t replace software engineers”, often with the resolution, “you’ll be replaced by software engineers that use AI”, implying that AI will be used to make software engineers more productive and efficient. I argue that software engineers will be replaced by normies: non-engineers (accountants, lawyers, etc.) who are empowered to create their own software to solve their own problems.

I don’t think that we’re close to automating software engineering. But what if we simply didn’t need software engineers?

I envision a world where AI tools & no-code tools enable normies to create normware: software created by anyone to solve their own problems. To some extent, it’s been happening for decades. But now it’s possible to easily make good normware.

First, let’s start at the beginning.

Are You Valuable If You Don’t Write Code?

There are a lot of job disciplines that write code but aren’t software engineers. The first non-engineer coder I worked with was a computational linguist. He was a linguist, definitely not a software engineer, but he did write a lot of Python. I also worked with a game artist, who was clearly an artist, but whose medium was code (and visual design). I’ve since worked with dozens of professions that are comfortable writing code but aren’t software engineers.

At Amazon, I was introduced to the idea that principal software engineers don’t write any code. Principal engineers would publicly lament that they write so little code that they likely aren’t even capable of writing production code.

That made me think a lot. I had always defined myself by my ability to write code. But I was a senior software engineer, and it appeared that all paths upward involved not writing code. It shook me.

Today, I’m seeing a similar phenomenon with AI coding. Software engineers of all levels are wrestling with what their value is. If it’s not writing code, what do we do?

Why Hire A Software Engineer?

Without talking directly about all the things a software engineer does, why are we hired in the first place?

  1. Big Projects — I’ve seen business folk prototype a solution, but eventually get to the point where further development is too difficult to do themselves. Software engineers bring in things like design patterns and unit tests that allow projects to grow even larger.
  2. Distribution — Similarly, after prototyping a solution, how can people use it? e.g. web app, Excel spreadsheet on a SharePoint server, mobile app.
  3. Scale — How many users? Any top-level internet site, like Google or Facebook, has gotten huge, and it’s very complicated to operate at this scale. They hire legions of software engineers and publicly declare that their success hinges on the quality of engineers they hire.
  4. Maintenance — Software engineers have lore about how software is alive. An untouched application will eventually inexplicably break. In fact, some engineers will brag that a piece of software they wrote has run untouched in production for 20 years. That brag only carries weight because it’s unusual, most software inexplicably breaks without maintenance.
  5. Safety — In some cases, often real-time embedded devices, human health & safety is at risk and software engineers are hired to take responsibility for the quality and integrity.

None of those things are inherently about writing code. So why do I think software engineering is disappearing?

An Experiment: Storymode

Over the holidays I hacked together a web app that I dubbed storymode. I wanted Claude to write stories & convert them into audiobooks to occupy my kids for multiple 10 hour road trips.

The catch: I didn’t write it. I made a rule for myself: I would only use Cursor’s Composer with the new Agent Mode. Not only did I not write code, I didn’t even read it. I blindly accepted every change without review.

And it worked! With only a couple exceptions, I was able to build an entire web app that solved my problem, and I did so solely in English prose. A couple of times I ran into doom loops and had to manually intervene. But that’s happening less and less as models & dev tools get smarter.

Normware: Solve Your Own Problem

We’re clearly at a juncture with software and software engineering. The debate is over where we’re going. It’s hard to imagine that software will remain unchanged in five years. I think the production of software will move toward its users.

Why? Simple economics.

I call this normware: the software that normies write to solve their own problems. Think about how good dev tools are in relation to, say, the software used to submit expense reports. Dev tools are good because they’re made by the same people who experience the problem. At the other extreme, in enterprise software the buyers & users are different people; it’s notorious for being difficult to use.

Minimize the distance between developer & user. The bigger the distance, the harder it will be to get right.

Normware is that, but taken to the limit. It’s the minimum possible distance between developer and user. For everything.

Hybrid Roles Reduce Communication Overhead

If you’ve watched software engineering develop over the decades, you’ll notice that this pattern has already been playing out on repeat. Group A and Group B have diverging skill sets, and a lot of time & energy is wasted by having them miscommunicate, so instead we create Group AB that has both skill sets.

Reducing distance increases effectiveness:

  • Dev Ops = development + operations
  • Full stack = frontend + backend dev
  • ML engineers = data scientist + engineer
  • DataOps = data engineer + operations
  • DevSecOps = development + security + operations
  • Tech evangelist = developer + marketer
  • Legal technologist = lawyer + software developer
  • Game artist = game developer + artist
  • Computational linguist = linguist + developer
  • Bioinformatics engineer = biologist + data scientist

The list gets longer as the years go by, because it is desirable to reduce communication overhead. Yes, communication is good, but even better is not needing to communicate. And someone who can think clearly in two domains is infinitely more useful than two people who can’t communicate.

The obvious downside is that these hybrid roles tend to be not quite as proficient at either of the pre-combined roles, but the business prefers them anyway because they deliver the right thing at a much higher rate.

The Curse of Scale (The Problem Solved By Normware)

My hottest take of all is that product managers are neither Group A nor Group B. Maybe they started as a software engineer, but when they become a PM they’re disallowed from writing software. Or maybe they were a subject matter expert, but when they become a PM, they’re disallowed from continuing to practice their expertise, so the skill atrophies.

Worst of all, product managers are pushed by upper management to chase big ideas that push the product into the market leader position. But in pursuit of big ideas, they lose touch with the original mission.

This is an artifact of scale.

When software becomes a product, it needs users. Prior to being a product, it simply had users. As it continues to live as a product, it needs growth in users. To achieve growth, it needs to cater to new groups of users, and in the process it caters to fewer of the original users, because it’s difficult to serve a diverse user base.

Cory Doctorow’s enshittification is a special case of this effect that addresses what happens to a two-sided marketplace as it scales.

Scale always creates problems. In distributed systems, cellular architecture was created to artificially reduce the scale of traffic. Engineers at AWS created the architecture after realizing that new problems always continue to emerge as scale increases.

Similar to cellular architecture, normware constrains scale, which makes it fundamentally simpler for new solutions to emerge and thrive.

Normware Doesn’t Need Software Engineers

Normware is inherently a lot smaller and simpler, and thus can largely avoid the overhead caused by scale.

  1. Small Projects — They solve only one problem at a time, and rewrite when it gets complex.
  2. Distribution — Little to no distribution infrastructure, e.g. maybe they run a webapp on their laptop or use a no-code platform.
  3. Small scale — They make it for their friends or immediate team.
  4. Maintenance — Still needs to be maintained, but it’s small so AI tooling is a viable option.

Normware doesn’t need software engineers. AI dev tools may be good enough for normies to solve their own problems and maintain their own solutions. After storymode, I’m convinced of this.

Normware Dev Tooling

What kinds of software do normies use to create normware?

  • MS Excel — The classic normware. Accountants and business people have created spreadsheets to solve their own problems for years. The UI isn’t great, so I think this will steadily be replaced by other options.
  • Cursor & Windsurf — Code editors with heavy AI enablement. It’s more formidable to get started, but there’s virtually no upper bound to what you can make.
  • UIPath & RPA software — These tools let you directly automate a mouse-driven point-and-click workflow on your computer. From what I hear, UIPath is investing heavily in AI & computer vision. Claude’s computer use tools will give UIPath stiff competition, and lots of other options will soon emerge.
  • Custom GPTs & MS Copilot — A great way to integrate data sources into your workflow. These are essential no-code AI tools for making your data very useful to others.

It’s not necessarily about replicating what software engineers do, it’s just about solving your own problems.

Normware Will Become Dominant

My hot take here is NOT that businesses want their researchers, accountants, lawyers, etc. solving their own problems. Businesses have always wanted that. I can name more than 5 teams I’ve worked on that were started by a prototype made by a normie & a spreadsheet. Thus far, normware has only rationalized the need for increased software engineering investment.

The hot take is that normware can be developed & maintained by normies.

While working on storymode it struck me that anyone can do this. The main reason they don’t is that they don’t know they can. It’s just an education problem.

Software Engineers Will Still Have Jobs

I do think software engineers will be able to weather this fine.

  1. Societal change is slow, you have years to redefine yourself
  2. Normware isn’t always appropriate, e.g. when health & safety is at risk
  3. Existing software will always need to be maintained (e.g. there are still 800B lines of COBOL in use today).
  4. Normware depends on non-normware software platforms like Cursor or MS Excel

But regardless, you need to expand yourself.

Traditionally, software engineers formed a tight-knit pod and isolated themselves from the rest of the business. We had enough jargon and inside jokes that we could sustain our own parallel culture. That’s unlikely to continue.

Make friends with the sales guys and generally branch out.

Explainer: Latent Space Experts

2024-12-24 08:00:00

A new paper just dropped from Google DeepMind, Deliberation in Latent Space via Differentiable Cache Augmentation. I don’t think this paper is very readable, but it also seems quite important so I wanted to take a moment to break it down, as I understand it.

In this paper, they take a normal, frozen LLM that acts as a generalist. Then they attach a coprocessor LLM that acts as an “expert” in a specific domain. The coprocessor expert talks to the generalist LLM by adding extra embeddings.

You could take a reasoning model (like o3) that’s just good at making logical deductions and combine it with a coprocessor model that’s an expert in biomed. Together, they’d become a team of a PhD-level generalist reasoner and a PhD-level biomed expert that could pair up and tackle tough challenges, like designing a new drug. The expert hasn’t been trained to do o1/o3-style reasoning, but it has a tremendous bank of not just factual knowledge but also procedural knowledge (“how” to do something).
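
I haven’t reimplemented the paper, but here’s a toy PyTorch sketch of the general shape as I understand it: a frozen generalist produces hidden states, and a small trainable coprocessor reads them and emits a handful of extra latent embeddings that get appended to the generalist’s context. The module names and sizes here are made up for illustration; this is not the paper’s actual architecture.

```python
# Toy sketch (not the paper's architecture): a trainable coprocessor that reads
# the frozen generalist's hidden states and emits extra latent embeddings.
import torch
import torch.nn as nn

class LatentCoprocessor(nn.Module):
    def __init__(self, d_model: int = 512, n_latents: int = 8, n_heads: int = 8):
        super().__init__()
        # Learned "queries" that get filled with domain knowledge during training.
        self.latent_queries = nn.Parameter(torch.randn(n_latents, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, base_hidden: torch.Tensor) -> torch.Tensor:
        # base_hidden: (batch, seq_len, d_model), hidden states from the frozen generalist.
        batch = base_hidden.size(0)
        q = self.latent_queries.unsqueeze(0).expand(batch, -1, -1)
        latents, _ = self.attn(q, base_hidden, base_hidden)
        return latents  # (batch, n_latents, d_model), appended to the generalist's context

# Only the coprocessor gets gradient updates; the generalist stays frozen.
coproc = LatentCoprocessor()
fake_hidden = torch.randn(1, 128, 512)  # stand-in for the generalist's hidden states
extra_context = coproc(fake_hidden)
print(extra_context.shape)              # torch.Size([1, 8, 512])
```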

Wait, Isn’t This Just RAG?

This does have a lot of overlap with RAG.

In RAG, you use an embedding model, which is itself an LLM that supplies embeddings rather than mapping them to tokens, just like this coprocessor model. In fact, people often recommend using domain-specific embedding models for RAG.

The main difference is that RAG integrates its knowledge as input text, whereas the knowledge supplied by the coprocessor is trained into the model. So a coprocessor is a lot more expensive to create & manage, but it provides much higher quality input than RAG does.

Latent Space vs Text

The hot topic of the month, as far as I can tell, is latent space vs text in LLMs. The discussion is all about using the LLM’s internal representation (embeddings or hidden layer outputs) vs converting that back into text.

I have a loose understanding that latent space is a lot more information dense than text. When I think about that, I see that English really sucks at communicating clearly. So many unfortunate ambiguities. So in that sense, anything else seems better. But when I think about how latent space would be better, I have little to no comprehension of what latent space really is, what it’s communicating, or what the downsides are.

The pursuit of latent space feels a lot like magical thinking. It may very well be that it’s 100% as good as the claims. It just doesn’t sit well with me that I don’t understand why latent space is good, I only understand why text is bad.

Fundamentally, the advantage is that the coprocessor & generalist LLMs are optimized together using machine learning. By using thousands of examples, they’re able to optimize the information transfer between the two models. Embedding models, by contrast, are optimized completely independently, and for far more rudimentary tasks (like similarity, clustering, etc.).

How Will This be Used?

If this approach takes off, I think it’ll be used in conjunction with RAG.

LLMs will become smaller and always trained to do o1-style reasoning. Expert coprocessors will be trained for every domain (e.g. biomed, material science, astronomy, poetry, etc.) and attached at runtime. At first, you’ll manually select which expert is needed, but over time that will be automatically selected as well.

There might even become a marketplace for coprocessor experts. This could really take off if the act of adapting a coprocessor to a generalist LLM was as simple as training a LoRA.

Also RAG is not dead. RAG will never die, because RAG is just a database and you simply can’t provide real-time fresh data cheaper and more effectively than a database. But these latent space experts will help cover over a lot of the problems with RAG. This seems like it could be a net good thing.

AI Engineering Primer

2024-12-19 08:00:00

How do you get up to speed with AI engineering? Unfortunately, I don’t know of any good consolidated resources, so I’m going to attempt to make one here. My first attempt at this focused more on what an AI engineer is and made only a feeble attempt at providing resources to get started. Let’s go!

The reason it’s difficult is that AI Engineering is so new, it’s bleeding edge. People still scoff at the idea that it’s even a title that someone can hold. It’s moving so fast that 3 months is roughly equivalent to a decade, so any resources that might exist become obsolete within a few months.

Things to Avoid

Avoid: LangChain

LangChain is used pervasively in tutorials. They’re usually one of the first to implement a new prompting technique right after the paper comes out. However, nobody I know uses it in production. Many attempt to, but then replace it with either a LangChain competitor or their own code.

Instead:

  • Hand-roll (has its own problems, but sometimes it’s easier than getting burnt repeatedly by solutions that almost work)
  • LlamaIndex — direct langchain competitor
  • griptape — direct langchain competitor, focused on DAG workflows & tools
  • Haystack — oriented toward search, it’s more than a bare vector store
  • DSPy — focused on automatic prompt optimization
  • gradio — prototype apps quickly
  • Vendor SDKs from Cohere, OpenAI and Anthropic are sometimes quite powerful.

There’s a very long list of other good options, both open source & proprietary. The reason LangChain doesn’t work is that the code isn’t structured well. It works seamlessly until you run into a case that they didn’t explicitly plan for. Experienced software engineers would say that LangChain doesn’t “compose well”.

Avoid: Prompt Influencers

There’s no shortage of people on LinkedIn or X that are hawking “one weird trick”, the magic prompt, or in one way or another trying to convince you that there are special words or phrases that magically make an LLM do your bidding. If it sounds like a salesman trying to sell you something, it’s definitely a salesman trying to sell you something. In fact, they’re almost always the sales type, and very rarely have any sort of engineering experience. Avoid.

Avoid: Traditional ML People

This is a contentious topic; I’ve written about it. They can be an asset, but beware of blindly taking advice from people who have been deep into traditional pre-LLM machine learning.

Boring Advice

Advice: Use LLMs A Lot

They’re both amazingly intelligent and unexpectedly dumb. The only real way to know what you’re dealing with is to use them a lot, for everything. Yes, you do need to get burnt. Just do it in a way that doesn’t matter too much. The goal here is to develop an instinct. You should be able to tell yourself, “if I do X it’ll probably go poorly, but if I rephrase it as Y then I can be confident in what it says”.

Advice: Basic Design Patterns

You should know RAG inside & out, as well as Chain of Thought (CoT) and the ReAct pattern. Skim the rest of this post for more leads.
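
For instance, the core of RAG is only a few lines. Here’s a bare-bones sketch of my own (using the sentence-transformers package for embeddings; the documents and the final LLM call are placeholders):

```python
# Bare-bones RAG: embed documents, retrieve the closest ones, stuff them in the prompt.
# Uses sentence-transformers for embeddings; docs and the final LLM call are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Support hours are Monday through Friday, 9am to 5pm.",
    "Enterprise plans include a dedicated account manager.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    # With normalized vectors, cosine similarity is just a dot product.
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q
    return [docs[i] for i in np.argsort(-scores)[:k]]

question = "How long do I have to return something?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# send `prompt` to whichever LLM you're using and return its answer to the user
```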

Advice: Buy Apple Silicon

Better yet, get a gaming laptop with an NVIDIA graphics card and Linux. But if not, get a MacBook from the M1, M2, M3, etc. series. The main memory & GPU memory are shared, so you can rock some surprisingly big models, all local.

I’m a big advocate of local LLMs, especially for AI engineers. They’re worse than the big SOTA models, which means you learn the sharp edges faster; you learn to properly distrust an LLM. Plus, you can send logs with passwords to a local model, but it’s highly unwise to send passwords to OpenAI, Anthropic, or any computer that isn’t your own.

Topics

Here are several large areas to learn about. Not all of them will be important to you.

Topic: New Models

As new models are released, their capabilities increase. As an AI engineer, it’s crucial you stay on top of this. You should know about the pre-training scaling laws that have brought LLMs into the public’s eye.

Ways that models improve:

  • Benchmarks — MMLU, GSM8K, HellaSwag, HumanEval, etc. There are tons of these; they’re always improving, and you also shouldn’t trust them. They’re easily gamed. Yet you also have to pay attention and know what they mean. The open LLM leaderboard has a lot of good info.
  • Context width — The size of the input. As this improves, RAG becomes easier. But LLMs also get worse at recall with bigger context, so it’s not a slam dunk.
  • Reasoning — Models like o1 do CoT natively without prompting to achieve better reasoning scores.
  • Model size — measured in number of parameters. 13B = 13 billion parameters. Bigger models are generally more capable, but smaller models are faster. When you consider TTC, smaller is smarter.
  • Modalities — Beyond text, being able to take or emit other modalities like image, video, audio, etc. can be a game changer. As of today, Google seems to be leading with Gemini 2.0
  • APIs — Occasionally new APIs & features enable wildly new things. e.g. Anthropic’s prompt caching enabled the Contextual Retrieval pattern for embeddings.

Most of this shows up in blog announcements from the AI labs and gets announced on X.

Topic: New Patterns

AI Engineering is still being figured out. If you go back far enough in programming history, languages didn’t even have control structures like if/then or for loops. It took time to figure that stuff out. We’re in a similar spot with AI engineering, where the patterns are still emerging.

Check out Prompting Guide for a comprehensive list of current patterns. Also subscribe to Latent Space and read Simon Willison to keep up to date.

Topic: Infrastructure

Outside of the AI labs, you may want to watch some providers:

Additionally, pay attention to vector stores:

  • Pinecone
  • Qdrant
  • pgvector — Postgres extension that treats vector search as just another SQL index on any table rather than a standalone database. This is a winning strategy; your SQL DB probably already has something like this. Use it (see the sketch after this list).
  • Redis — Classic NoSQL database. Watch this one, though, because its creator, antirez, has been talking about some wildly different ideas where the index is more of a plain data structure. This might be the key to enabling a lot more patterns, like clustering. Watch antirez’ work for updates.
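
Here’s roughly what the pgvector approach looks like in practice. This is a sketch with made-up table, column, and dimension names, assuming Postgres with the pgvector extension installed and the psycopg driver:

```python
# Sketch of pgvector as "just another index": vector search lives next to normal SQL.
# Assumes Postgres with the pgvector extension installed and the psycopg driver;
# the table, column, and dimension here are made up for illustration.
import psycopg

query_vec = "[" + ",".join(["0.1"] * 384) + "]"  # stand-in for a real query embedding

with psycopg.connect("dbname=app") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id serial PRIMARY KEY, body text, embedding vector(384))"
    )
    # Nearest neighbors by cosine distance (<=>), combined with ordinary SQL filters.
    rows = conn.execute(
        "SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 5",
        (query_vec,),
    ).fetchall()
```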

Also, look into edge compute. Ollama for personal computers, vLLM for Linux servers, but also pay attention to work being done to run LLMs on IoT devices and phones.
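
If you want a sense of how low the barrier to local inference is, this is all it takes to hit a model through Ollama’s HTTP API (assuming Ollama is running locally and you’ve already pulled a model; “llama3.2” is just an example):

```python
# Minimal local inference through Ollama's HTTP API (no extra dependencies).
# Assumes `ollama serve` is running locally and the model has been pulled.
import json
import urllib.request

payload = {"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```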

Topic: Model Development & Optimization

Generally, do not do this unless you know you need to. It’s often tempting to try to fine tune, but it’s usually a red herring.

Topics:

Topic: Evaluation & Testing

This is quickly evolving and there’s unfortunately not much here.

Topics

Topic: Test Time Compute (TTC)

As I’m writing, this is a hot topic. The train time scaling laws seem to be fading and the new promising area is having models “think” longer during inference (see o1). This also seems to be a significant key to agents.

Generally follow any of the sources below. The information is spread out.

Topic: Agents

There are two kinds of perspectives here:

  1. “Agent” is anything that uses tools
  2. “Agent” is autonomous and interacts with the world

The former isn’t very interesting; it’s just the ReAct pattern. The latter is an area of active research. Within agents you have topics like:

In my experience, present agents are like riding a unicycle. It’s possible to make them work, but it takes a lot of experience to not fall off. The main blocker to having them rolled out more broadly is reasoning & planning. I think Test Time Compute (TTC) might be part of the puzzle, others are betting on world models. In reality, it’s going to be a bit of everything; the whole field needs to evolve.
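
For the first, tool-using sense of “agent”, the whole trick fits in a short loop. Here’s a hand-rolled sketch of my own, using the openai SDK as an example backend; the Action/Final text convention and the calculator tool are invented for illustration, not a standard API.

```python
# Hand-rolled sketch of "agent = anything that uses tools" (a bare ReAct-style loop).
# openai SDK used as an example backend; the Action/Final convention is invented here.
import json
from openai import OpenAI

client = OpenAI()

def calculator(expression: str) -> str:
    return str(eval(expression))  # toy tool for the sketch; never eval untrusted input

TOOLS = {"calculator": calculator}

SYSTEM = (
    'To use a tool, reply exactly: Action: {"tool": "calculator", "input": "<expression>"}\n'
    "When you know the answer, reply exactly: Final: <answer>"
)

def run_agent(question: str, max_steps: int = 5) -> str:
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": question},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        text = reply.choices[0].message.content.strip()
        messages.append({"role": "assistant", "content": text})
        if text.startswith("Final:"):
            return text.removeprefix("Final:").strip()
        action = json.loads(text.removeprefix("Action:").strip())
        observation = TOOLS[action["tool"]](action["input"])
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "gave up (a real agent needs better planning than this)"

# run_agent("What is 1234 * 5678?")
```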

Sources

Primers

  • Prompting Guide — Exhaustive coverage of individual topics. All prompting. Very useful for any AI engineer.
  • Hugging Face docs — More oriented toward training new models

The AI labs’ documentation often also has good primers:

Courses

AI Labs

People to Watch

News Venues & Newsletters

Github

This is a new one for me, but some people highly recommend following people on Github first and then maybe following individual repos. It’s far better to follow people, because then you learn about new repos. Following repos gets noisy very fast, so only do that when you want to keep close tabs. Look for new repos, new ideas, and new trends.

See People to Watch for Github links.

HuggingFace

[HuggingFace](https://huggingface.co/) is like “Github for AI/ML models”. Typically, the code for a model is kept in Github and the model artifacts are hosted on HuggingFace. The transformers library makes it very easy to download models off HuggingFace and run them, or fine-tune them, or disassemble them and use just the tokenizer, or steal the attention layers from an LLM to fine-tune an embedding model, etc.
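
As a taste of how little ceremony transformers requires, here’s a minimal sketch (using gpt2 as a tiny stand-in; swap in whatever model you actually want to run):

```python
# Pull a model and its tokenizer off HuggingFace and generate a few tokens.
# gpt2 is just a tiny stand-in; any causal LM on the Hub works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)           # you can stop here and use only the tokenizer
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10)
print(tok.decode(out[0], skip_special_tokens=True))
```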

HuggingFace also offers inference, so you can host model inference there. And it’s not limited to just model inference; for example, the Open LLM Leaderboard is hosted there too.

Additionally, a lot of papers are posted to HuggingFace (sometimes instead of arXiv). There seems to be a social networking aspect to it, where you can comment on papers, follow authors, etc. It’s safe to say that HuggingFace is a core part of the AI ecosystem. While it’s not an AI lab in the traditional sense, it’s in many ways just as critical to AI development, maybe more so.

Discussion

If I forgot something contact me, or else use the Github repo for this blog to create an issue or PR. Or add to one of the discussion links.

Is ML Experience A Liability For AI Engineering?

2024-12-10 08:00:00

Yesterday I posted here about becoming an AI Engineer and made a statement that prior ML experience is often a liability for transitioning into AI engineering. That turned out to be quite the hot take! In this post I’ll incorporate feedback and try to expand that into a more balanced take. I’ll expand on the perspective of it being an asset, as well as where it’s a liability.

First of all, the responses were very polarized, either enthusiastic agreement or disagreement (no in between, so you know it was extra spicy). That seems like a strong indicator that it’s a difference between archetypes.

My hunch is that those who disagreed with my statement tend to align with the “researcher” archetype.

  • Data pipeline archetype — “the data is most important, improve the data/indexes/storage/etc.”
  • UX archetype — “the user & AI collaborate to create intelligence, improve the fluency that they can collaborate”
  • Researcher archetype — “the model/algorithms are most important, improve the model”

The researcher archetype is probably poorly named, although I’m not sure what a better name would be. They’re model-centric.

Why it’s a liability

I originally formed that opinion back in 2022 about a week or two after trying to build on top of LLMs for the first time. I was talking to a data scientist (who I’m close with both personally and professionally) about how to incorporate LLMs. I recall there being a ton of friction in those initial conversations, which led me to state something overly dramatic like, “I think data science is going to be dead post-LLM”.

Since then, I’ve had a lot of people independently validate that opinion. One take I’ve heard went something like this:

ML people think their job is to produce a model, whereas (pure) engineering folk do not, which leads engineers to view fine-tuning as an optimization that’s often premature.

I’ve also used the argument that ML folk take Occam’s Razor to mean they should produce the simplest (smallest) possible model first and increase model complexity as needed, whereas engineers tend to think Occam’s Razor means they should start with the approach that’s most likely to work easily (the biggest, baddest LLM available) and work downward toward more efficient models to optimize costs.

I’ve talked to hiring managers who explicitly seek “Please No ML Experience”. In their words, they’ve seen ML people push their org into spending tens or hundreds of thousands of dollars fine tuning models. Those projects fail at an unfortunately high rate and deliver slowly. Whereas simply prompting better will often get you close enough to launch (and therefore mitigate project risk).

Why it’s an asset

Rahul Dave posted on Bluesky that it’s sometimes difficult to know when you need to fine tune, and he found that his prior ML experience was critical in identifying that situation.

That’s a very good point. Organizationally, the act of identifying that a threshold has been crossed is very difficult. Historically, in my engineering experience, it shows up as:

We built component X to solve problem Y. But 3 months ago problem Y disappeared due to a change in management/customers/business and now component X only causes people friction. We’re stuck with it forever because nobody realized that the original problem it solved is now gone.

One of the big ways a staff+ engineer contributes is to identify and explain change. With LLM apps, it often takes ML intuition to be able to correctly identify the situation where performance isn’t good enough (and therefore a huge architectural change is needed).

Vicki Boykis took another tack, arguing that the non-determinism of LLMs is unfamiliar to software engineers:

I think software engineering in general favors things like: unit tests where you get the same input and same output, a for loop n times will only loop through n times, type checking (in most languages 😅) confer correctness etc. LLMs are none of that, and lossy compression to boot.

Her experience is that, for this reason, ML people have an easier time transitioning into AI engineering. I personally think some engineers, e.g. distributed systems background, are already adept at dealing with non-determinism, so this isn’t much of a hurdle for them. But she’s correct, this is a huge hurdle for a lot of engineers. If you’re a hiring manager, you should probably address non-determinism in the interview.

Conclusion

If you have too much ML experience, your organization will definitely fine tune models and it will cost a lot of money. If you have too little, you won’t fine tune any models and you’ll be leaving performance on the table.

Fine tuning historically has a much riskier track record, which leads a lot of people to recommend against fine tuning. However, it might be wise to include a staff+ engineer with ML experience on your team so they can identify when your team needs to transition into the researcher archetype.

How Can I Be An AI Engineer?

2024-12-09 08:00:00

You want to be an AI Engineer? Do you even have the right skills? What do they do? All great questions. I’ve had this same conversation several times, so I figured it would be best to write it down. Here I answer all those, and break down the job into archetypes that should help you understand how you’ll contribute.

What is it?

An AI engineer is a specialized software engineer that integrates GenAI models into applications. It can involve training or fine-tuning LLMs, but it often does not. It can involve working on low-level harnesses, like llama.cpp or vLLM, but it often does not.

More often AI engineering involves building UIs, APIs, and data pipelines. It can look wildly different from job to job. The common thread is that you send prompts to an LLM or image model, e.g. via OpenAI’s API, and use the result in an application somehow.
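
In its simplest form, that common thread looks like this. A minimal sketch with the official openai SDK; the model name is just an example and the API key is assumed to be in your environment:

```python
# The common thread of AI engineering: send a prompt, use the result in your app.
# Minimal sketch with the openai SDK; model name is an example, OPENAI_API_KEY must be set.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this support ticket in one sentence: ..."}],
)
summary = resp.choices[0].message.content
print(summary)  # then feed it into a UI, an API response, a data pipeline, etc.
```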

Am I a good fit?

You’ll be a great AI engineer if:

  1. You’re a software engineer
  2. You have breadth (broad knowledge of a lot of domains)

Seriously, you don’t typically need to have AI experience; it’s a new field, so not many people actually have prior experience. It’s tempting to think machine learning (ML) experience is helpful, but it’s actually often more of a liability[1] to approach problems the way a data scientist does.

Here are a few archetypes of AI engineers distinguished by how they look at problems. You’ll likely know which archetype you are based on what you already do.

The Data Pipeline Archetype

An extension of a data engineer, this archetype is most likely to use RAG architecture to build AI applications using company databases or knowledge banks. When asked, “how can I make this better?”, your answer is to improve the quality of the data, or how it’s indexed, or the model used to index it, etc. All problems center around the data.

This archetype should have a thorough understanding of RAG architecture and embeddings, hold strong opinions about vector databases vs just using a vector index, and maybe be able to diagram how the HNSW algorithm works on the back of a bar napkin.

The UX Archetype

This archetype of AI engineer views “intelligence” as an inseparable collaboration between human & AI. They aren’t necessarily a UX designer or frontend engineer, but you typically can’t live as this archetype without slinging a fair bit of React code.

If you’re living this archetype, you might work with the Data Pipeline Archetype, or even also be one. But when it comes to, “how can I make this app better”, your answer is typically “tighter collaboration with the user”. You work to improve the quality of information you glean from the user, or use AI to improve the user’s experience with the app or the value they get out of it.

You might be a UX Archetype if you admire ChatGPT, Cursor, or NotebookLM for how they helped us reimagine how we can use LLMs. You probably get excited about new LLMs that are faster or lower latency, that are multimodal, or that support new modalities.

The Researcher Archetype

The Researcher Archetype isn’t necessarily a researcher, but they’re focused on the models and algorithms. When asked, “how can I make this app better”, their answer is about algorithms, new models, more compute, etc.

The Researcher Archetype is most likely to fine-tune their own model. To be successful as this archetype, you need to spend a lot of time keeping track of AI news on X/Bluesky/Reddit. The AI space moves fast, but as this archetype especially, you ride the bleeding edge, so it takes extra effort to keep pace. Make time to read 1-5 papers per week, and become adept at using NotebookLM.

Also, hack a lot in your spare time. You should definitely be running models locally (e.g. via Ollama). You should be comfortable running PyTorch models via the Transformers library in a Jupyter notebook. Your eyes probably light up every time SmolLM is in the news. And you may have a desktop with an RTX 3060 (and not for gaming).

Other Archetypes

There’s probably several others. For example, I have a poorly-understood concept of an “artist” archetype, that uses AI to create something beautiful. Maybe more for safety, philosophy, and others. The ones outlined above are what you’re most likely to be hired for.

How is AI Engineering different from Software Engineering?

For the most part, AI & Software engineering are the same. The main difference is how fast the AI field moves. Because of this, you have to be extra okay with throwing out all your work from time to time. For example, if a new framework comes out and you rewrite everything in DSPy.

(By the way, you should really check out DSPy 🔥)

Another thing is management. I keep thinking about how using AI as a tool in your work feels a lot like management, or at least being your own tech lead. I’m not sure we’ve properly equipped most engineers with the right skills, but if you thrive in the next few years, you’ll be well set up to go into management, if that’s your thing.

How do I get started?

You’re already a solid engineer, so you’re most of the way there already. The other part is getting your continuing education setup.

I personally am not a fan of courses. There’s an absolute ton of them out there, but the mere fact that a course has to be prepared in advance and delivered many times in order to make money kinda implies the material is going to be a bit stale, since AI moves so fast.

My recommendations:

  1. Subscribe to The Rundown — it’s mostly business & product releases, table stakes imo.
  2. Read everything Simon Willison writes. He’s basically the godfather of AI Engineering, and everything he writes is intensely practical.

Data archetypes should check out episode S2E16 of the How AI Is Built podcast. It goes into detail on strategies for improving the quality of the source data.

All archetypes should probably have a solid social media source. I think 🦋 Bluesky is the best; it has starter packs to get you zeroed in on the right group very quickly. I know X has a lot of great chatter, but it’s extremely noisy, so it’s hard to recommend. Feel free to scrape my account for followers.

That’s it! I hope that helps.

Footnotes

  • [1] “prior ML experience is a liability” turned out to be quite a controversial statement. I’ve followed it up with a new post expanding on the pros and cons of prior ML experience.

Discussion