2025-04-19 08:00:00
I’ve been thinking recently about what sets apart my coworkers who’ve done the best work.
You might think that the main thing that makes people really effective at research or engineering is technical ability, and among the general population that’s true. Among my Anthropic coworkers, though, we’ve restricted the range by screening for extremely high-percentile technical ability, so the remaining differences, while they still matter, aren’t quite as critical. Instead, people’s biggest bottleneck eventually becomes their ability to get leverage—i.e., to find and execute work that has a big impact-per-hour multiplier.
For example, here are some types of work at Anthropic that tend to have high impact-per-hour, or a high impact-per-hour ceiling when done well (of course this list is extremely non-exhaustive!):
I think of finding high-leverage work as having two interrelated components: taste, i.e. the judgment to pick the right thing to work toward, and agency, i.e. the initiative to actually get there.
Without taste, you’re likely to work toward the wrong thing. Without agency, even if you work toward the right thing, you’re likely to get nowhere.
How can someone improve at these? Mostly, I think, by practicing: there’s no substitute for good feedback loops with reality. So maybe the most important takeaway here is that you have permission to try to exercise good taste and high agency! But below are some more specific pieces of advice that have helped me improve on these dimensions.
One of the easiest ways to get more leverage is to take a goal you’re already trying to accomplish, and figure out a better way to accomplish the same thing. (For example: changing your experiment design to be more directly relevant to the high-level question; finding an 80/20 way of building a tool; just deciding not to do something entirely; etc.) In order to do this, you need to keep careful track of the high-level context and priorities that your specific project is aimed at, so that you can understand and judge the trade-offs.
Obviously, whoever’s supervising your project will make an attempt to point you at the most effective way of achieving the goal that they can think of. But finding the most effective way of doing something takes a lot of time and attention, they probably have a lot of other things to do, and they also have much less detailed context than you, so they’re liable to miss things!
For example, when I did a work trial at Anthropic in late 2022, I was assigned the task of building a custom scheduler for our compute cluster to fix some unfortunate behaviors in the default scheduler that were making researchers’ lives hard. But after seeing the team work for a week and thinking about the system design, I started to worry that this project would end up imposing a high maintenance burden on the team that owned the cluster, which was already overloaded, and I ended up suggesting that we postpone it. It turned out that second-order effect would have made this project net-negative for the root goal of “keep the clusters working well!”
A counterintuitive fact about the highest-leverage projects is that they’re often not obviously high-leverage to most people in advance (because if they were, they would already have been done a long time ago). That means people are often skeptical of the value of pursuing them.
That skepticism can come from a couple of different places: sometimes the project genuinely isn’t worth doing, and sometimes the skeptics simply lack the context to see its value (in the real world, most projects have some mix of both going on).
A personal example: shortly after I joined Anthropic’s inference team, a couple folks on the team proposed completely rewriting our inference service codebase, because they felt the current version was overcomplicated. I was somewhat skeptical of this, based on the standard software engineering heuristic of “most rewrites fail and most people are over-optimistic about how long they will take,” although I didn’t push back very hard because I knew I didn’t have much context. It turned out that the rewrite took about a week, was an immediate integer-multiple efficiency improvement, and made it much easier to make further efficiency improvements. I’m glad that people didn’t listen to my skepticism!
One tactic for addressing this dynamic is that instead of asking people for approval to go do something, you can just tell them what you intend to do (implicitly giving them space to object or course-correct if they feel strongly).
Of course, the same “high leverage projects are non-obvious” phenomenon also means that some of your self-driven high-impact project attempts will probably fail, because you misjudged their impact or difficulty. Because of this, you should think of them as bets that might or might not pay off, and—at least until you’ve proven your ability to make good bets over fairly long time horizons—it’s best to keep these as part of a portfolio with low-risk projects as well, to avoid the situation where you bite off a single high-risk project, it doesn’t pan out, and people lose trust in your ability to pick your own projects.
A common trait of high-agency people is that they take accountability for achieving a goal, not just doing some work.
There’s a huge difference between the following two operating modes: (1) doing the tasks you were assigned and surfacing blockers as you hit them, and (2) taking ownership of the end goal and doing whatever it takes (re-planning, escalating, asking for help) to make sure it actually gets achieved.
Mode 1 makes you a leaky abstraction—if your project is critical, someone else needs to be constantly monitoring it and figuring out how to resolve blockers. My concept handle for being in mode 2 is “making success inevitable,” because “inevitable” is the bar where other people can stop spending substantial fractions of mental energy on worrying about the project.
People who can be trusted to make something inevitable are really rare, and are typically the bottleneck for how many different things a team or company can do at once. So if someone else is responsible for making your project inevitable, you’re consuming some of that scarce resource; if you’re the one making your own projects inevitable, you’re a producer of that resource, and you’re helping unblock a key constraint for your team.
(Of course in practice you will never achieve full inevitability, but getting closer to it still makes a huge difference!)
I wrote above that you should expect high-impact projects to be non-obvious because if they were obvious, they’d have been done already. This points to another interesting dynamic, which is that it’s quite rare for different people’s “zone of best taste” to overlap very much. Instead, the quality of most people’s taste is highly idiosyncratic and area-specific, where areas can be as localized as e.g. “language model personality design” or “what blog post titles will get upvoted on Hacker News” (one of my own less-useful areas of good taste…).
For this reason, an important signal to keep track of is: where is your taste the best?
I’ve noticed a lot of people underestimate their own taste, because they expect having good taste to feel like being very smart or competent or good at things. Unfortunately, I am here to tell you that, at least if you are similar to me, you will never feel smart, competent, or good at things; instead, you will just start feeling more and more like everyone else mysteriously sucks at them.
For this reason, the prompt I suggest here is: what does it seem like everyone else is mysteriously bad at? That’s probably a sign that you have good taste there.
It’s okay if this prompt doesn’t immediately yield anything; you might just be on a team with a lot of really good people, where you don’t have a really unique angle on anything yet. Even so, a weakened version of this question—where do the most people seem the least competent?—is a useful gradient signal!
One way of thinking about taste is that it’s about the quality of your predictive models and search heuristics. If I design the experiment this way, what will I find? If I design the tool this way, how easy will it be to use? If I write the doc this way, how much will it resonate with people?
Doing enough search and prediction to come up with great ideas takes time. The first domain that I got some degree of taste in was software design, and I remember a pretty clear phase transition where I gained the ability to improve my designs by thinking harder about them. After that point, I spent a lot of time iterating on many different design improvements—most of which I never implemented because I couldn’t come up with something I was happy enough with, but a few of which turned into major wins.
The easiest way to improve at this is just to try it! Whenever you’re debating what to do, explicitly ask yourself “what do I predict will happen if I choose option A?” and try to unroll the trajectory. Even if you think you’re already intuitively predicting the results of your choices, I’ve found it helps surprisingly much to be explicit—one of my manager role models asks me this (“what do you think will happen?”) every time I ask him for advice and it’s kind of silly how often it helps me realize something new. (For bonus points, revisit your predictions afterwards.)
Things that I’ve found benefit from a lot of thinking time:
(See also: Think real hard. Although if you’re someone whose natural failure mode is to overthink things, it’s possible you should reverse this advice.)
If taste is about the quality of your predictive models and search heuristics, it’s important to wring out every possible update to these from the data that you get.
For that reason, many of the most effective people I’ve worked with also do the most metacognition, i.e., reflecting on their own (and their team’s) work and thought processes, and figuring out how to improve them. They’re often the people who are most likely to identify improvements to our processes or mental models—things like:
Often, they don’t just do their own metacognition but also help drive “group metacognition” by sharing these reflections with the team, scheduling retrospectives, etc. Even if each of the lessons here is small individually, they compound over time to help people and teams become much more effective.
My best habit for encouraging myself to metacogitate more is a weekly review. The format and typical outcomes of my reviews have evolved a lot since I first wrote about them, but I still do them and find them extremely valuable for helping me improve at whatever I’m currently focused on improving at!
(See also / further reading: Chris Olah’s Research Taste Exercises)
2025-04-01 08:00:00
This post was adapted from an internal doc I wrote at Wave.
Welcome to being a manager! Your time-management problem just got a lot harder.
As an IC, you can often get away with a very simple time-management strategy: figure out your single most important thing, work on it until it’s done, and repeat.
As a team lead, this isn’t going to work, because much more of your work is interrupt-driven or gets blocked for long periods of time. One-on-ones! Code reviews! Design reviews! Prioritization meetings! Project check-ins! All of these are subject to an external schedule rather than being the type of thing that you can push on in a single focused block until it’s done.
Being a team lead means three big changes for your time management:
You no longer have a single most important thing. You’ll have to learn how to juggle competing priorities.
Your most important responsibility is for your team’s output, not your personal output. That means that individual engineering work goes last in your list of potential most important things.
You’ll need to start spending some of your time on a manager’s schedule. As Paul Graham describes in Maker’s Schedule, Manager’s Schedule:
There are two types of schedule, which I’ll call the manager’s schedule and the maker’s schedule. The manager’s schedule is for bosses. It’s embodied in the traditional appointment book, with each day cut into one hour intervals. You can block off several hours for a single task if you need to, but by default you change what you’re doing every hour.
Most powerful people are on the manager’s schedule. It’s the schedule of command. But there’s another way of using time that’s common among people who make things, like programmers and writers. They generally prefer to use time in units of half a day at least. You can’t write or program well in units of an hour. That’s barely enough time to get started.
Here’s some advice on how to cope with those changes.
Your responsibilities to your team will take time, and even more importantly, attention. That means you’ll be a lot less productive on IC work than you have been in the past—especially at first while you’re finding your legs. Additionally, your time will be less predictable week-to-week as you might have to spend an unknown amount of time responding to “inbound” work.
For the first few months, you should treat any individual engineering work that you get done as a bonus. Even after that, you should expect to have something like 10-20% less individual output per engineer you manage (so with four reports, expect your IC output to drop by something like half), depending on how experienced you are, they are, etc., and with substantial week-to-week variance.
To mitigate this, my rule for myself has been to make sure that my IC work is important but not urgent—i.e. that nobody will be sad and no plans will be derailed if I end up having to spend the next week firefighting instead of pushing it forward.
For honing my intuitions about how much I can actually expect to accomplish, I’ve found time tracking very useful (see How time tracking helped me be a better manager and I apparently got 50% better at my job last month).
A corollary of the above is that it becomes very important for you to prioritize what to work on, both on an hour-to-hour cadence and on a larger timescale.
It’s not possible to write down a full algorithm for prioritizing in a blog post—that’s why they pay us the big bucks—but here are some heuristics for which things are most worth prioritizing:
One of the most important types of “work that increases your or your team’s future bandwidth” is delegating things. This is something entire books have been written about, but here’s how to avoid a few common delegation pitfalls for new team leads:
Negotiate how hands-on to be. Effective delegation means finding the right balance between micromanaging and throwing your report to the wolves. The stereotype is that new managers often micromanage, but at Wave, I’ve noticed that new managers often err on the side of undermanaging, or being too hands-off, perhaps out of a desire to signal that they trust their reports.
If you’re unsure, it’s good to have an explicit conversation with the person you’re delegating to about how much support they want. E.g. “do you want to do the design for this feature yourself, have me do the high-level and fill in the details, or have me write the entire design doc?”
Calibrate to your team members’ task-relevant maturity, or how capable they are of independently doing a particular type of task. A senior engineer should be able to design most features independently (with review), but give the same design task to a junior engineer and they’ll probably flail around and make no progress. You should be maintaining a mental map of each of your team members’ strengths and weaknesses—and updating it over time as you help them improve.
Delegate ahead of future growth. Your team’s workload is going to increase over time, so even if you don’t feel like you have too much work to do right now, you probably will in the future, unless you currently feel underutilized. You should aim for a workload where in the steady state, you feel like you have some slack capacity.
Delegate “stretch projects” to help your team level up. Getting enough slack might require you to delegate work that no one else on your team can currently do. Take your mental strengths-and-weaknesses map, ask yourself what the most important growth directions for each of those team members are, and figure out how to give them work that stretches them in that direction. Note that these delegations will probably require more frequent monitoring, since your reports will have less task-relevant maturity on their stretch projects!
Despite your best efforts to follow the above advice, there will probably come a time when you feel very stressed about the amount of work on your plate. When that time comes, here’s what to do:
If you’re not careful, it’s easy to fill your entire calendar with meetings, Slack, etc. and have no time for deep work. With careful planning, you can avoid this by “batching” all your distractions to particular times of day.
There are lots of tactical tips for doing this; I catalogued some that work for me in Tools for keeping focused.
One tech-lead-specific one that I’ll add is batching meetings: I schedule all my meetings back-to-back on Tuesdays and Thursdays to leave the rest of the week as free as possible for deep work.
2025-03-16 08:00:00
My few most productive individual weeks at Anthropic have all been “crisis project management”: coordinating major, time-sensitive implementation or debugging efforts.
In a company like Anthropic, excellent project management is an extremely high-leverage skill, and not just during crises: our work has tons of moving parts with complex, non-obvious interdependencies and hard schedule constraints, which means organizing them is a huge job, and can save weeks of delays if done right. Although a lot of the examples here come from crisis projects, most of the principles here are also the way I try to run any project, just more-so.
I think excellent project management is also rarer than it needs to be. During the crisis projects I didn’t feel like I was doing anything particularly impressive; mostly it felt like I was putting in a lot of work but doing things that felt relatively straightforward. On the other hand, I often see other people miss chances to do those things, maybe for lack of having seen a good playbook.
So here’s an attempt to describe my playbook for when I’m being intense about project management.
(I’ve described what I did as “coordinating” above, but that’s underselling it a bit; it mattered a lot for this playbook that I had enough technical context, and organizational trust, to autonomously make most prioritization decisions about the project. Sometimes we instead try to have the trusted decisionmakers not be highly involved in managing execution, and instead farm that out to a lower-context or less-trusted project manager to save the trusted decisionmaker time, but IMO this is usually a false economy for projects where it’s critical that they be executed well.)
For each of the crisis management projects, I completely cleared my schedule to focus on the project, and ended up spending 6+ hours a day organizing it.
This is a bit unintuitive because I’m used to thinking of information processing as basically a free action. After all, you’re “just” moving info from place to place, not doing real work like coding, right? But if you add it all up—running meetings, pinging for updates, digesting Slack threads, pinging for updates again, thinking about what’s next, pinging for updates a third time, etc.—it’s surprisingly time-intensive.
Even more importantly than freeing up time, clearing my schedule made sure the project was the top idea in my mind. If I don’t do that, it’s easy for me to let projects “go on autopilot,” where I keep them running but don’t proactively make time to think through things like whether we should change goals, add or drop priorities, or do other “non-obvious” things.
For non-crisis projects, it’s often not tenable (or the right prioritization) to spend 6+ hours a day project-managing; but it’s still the case that you can improve execution a lot if you focus and make them a top priority, e.g. by carving out dedicated time every day to check statuses, contemplate priorities, broadcast updates, and so on.
A specific tool that I’ve found critical for staying oriented and updating quickly is a detailed plan for victory, i.e., a list of steps, as concrete as possible, that end with the goal being achieved.
The plan is important because whether or not we’re achieving the plan is the best way to figure out how well or badly things are going. Knowing how well or badly things are going is important because it tells me when to start asking for more support, cutting scope, escalating problems, and otherwise sounding more alarms. One of the most common megaproject failure modes is to not freak out soon enough, and having a concrete plan is the best antidote.
As an example of this going both well and badly: during a recent sprint to release a new implementation of a model, we took a detailed accounting of all the work we thought we had to do to launch.
As the above example shows, having a plan can’t completely save you if you underestimate how long all the steps in the plan will take. But it certainly helps! My sense is that the main things that would have helped even more in the above case were:
OODA stands for “observe, orient, decide, act”—in other words, the process by which you update your plans and behavior based on new information.
Most of the large projects I’ve worked on have been characterized by incomplete information:
In fact, I’d make a stronger claim: usually getting complete information was the hard part of the project, and took up a substantial fraction of the overall critical-path timeline.
For example, let’s take a recent project to kick off a training run. The critical path probably looked something like:
Practically all of these steps are about information-processing, not writing code! Even the step where the compute partner debugged the problems on their side was itself constrained by information processing speed, since there were tens of people working on the debugging effort and coordinating / sharing info between them was difficult. Overall, the project timeline was strongly constrained by how quickly information could round-trip from our compute partner’s large-scale debugging effort, through their tech lead, me, and Anthropic’s large-scale debugging effort.
This pattern generalizes to most projects I’ve been a part of, and as a result, one of my most productive project management habits is to try to run the fastest OODA loop that I can.
A few specific things that I’ve found help:
It’s not just enough for me personally to be running a fast OODA loop—in a large group, everyone needs to be autonomously making frequent, high-quality, local prioritization decisions, without needing a round-trip through me. To get there, they need to be ambiently aware of:
I’ve usually found that to create the right level of ambient awareness, I have to repeat the same things way more often than I intuitively expect. This is roughly the same “communicate uncomfortably much” principle above, but applied to broadcasts and not just 1:1 conversations with people.
For example, although the first team I managed at Anthropic started with a daily asynchronous standup, we found that synchronous meetings were much more effective for creating common knowledge and reorienting, so we moved to a twice-weekly synchronous standup, which probably qualified as “uncomfortably much” synchronous communication for Anthropic at the time.
Once a project gets over maybe 10 people, I can’t track everything in enough detail to project-manage the entire thing myself. At this point, it becomes critical to delegate.
Here I mean delegating the project management, not just the execution (that’s what I’d be delegating to the first 10 people). This is the point where I need other people to help split up the work, monitor and communicate progress, escalate blockers, etc.
A few things I try to keep in mind when delegating project management:
One of my favorite things to make delegation easier is to keep goals simple—if they can fit in a Slack message while still crisply describing a path to the desired end state, then the people working on the goal will be much more able to prioritize autonomously, and point their work at the real end goal rather than doing something that turns out to be useless for some reason they didn’t think about.
“Keep goals simple” doesn’t have to mean “do less”—the best way to keep goals simple is to find the latent structure that enables a clean recursive decomposition into subgoals. This often requires a deceptively large amount of work—both cognitive and hands-on-keyboard—to identify the right intermediate goals, but I’ve found that it pays off immensely by clarifying what’s important to work on.
Some of my favorite memories of Anthropic are of helping out with these big projects. While they can be intense, it’s also really inspiring to see how our team comes together, and the feeling of being part of a big team of truly excellent people cooking something ambitious together can be really magical! So I try to enjoy the chaos :)
Here’s the internal doc I share with folks on my team who are getting into being responsible for large projects.
So you’re the directly responsible individual (DRI) of a project, or part of one. Concretely, what do you do to “be DRI”?
This doc is my suggested “starter kit” answer to that question. The habits and rituals described here aren’t perfect for every situation, but they’re lightweight and broadly helpful. I suggest you use them as a starting point for iteration: try them out, then adjust as necessary. This is an SL init; the RL is your job :)
The goal is to help you do your job as DRI without adding too much overhead.
(Note: being DRI will still unavoidably add some overhead—e.g. you’ll have to track what other people are doing, delegate work, unblock people, set and communicate goals, etc. The goal is specifically for the process/paperwork to be minimal.)
You should schedule at least one 30-minute weekly meeting with everyone working on the project.
The goal of this meeting is to (1) be a backstop for any coordination that needs to happen and didn’t happen asynchronously; (2) be an efficient way to create common knowledge of goals, updates, etc.; (3) help you track whether things are going well.
It’s really helpful for discoverability and wayfinding to have a single “master doc” with all the most important info about a project. As you loop more people in, they can read the doc to get up to speed. And anyone who thinks “I wonder how X is going” can stop by there to find out.
Create a doc for your workstream with the project’s goals and plan, status updates, and running notes.
If it’s part of a larger project, your doc should be nested within the larger project’s working doc.
If this ends up being too much for one doc, you can fork these out into sub-docs (esp. running notes and updates).
Thanks to Kelley Rivoire for many thoughtful comments on a draft!
2024-07-21 08:00:00
This is an adaptation of an internal doc I wrote for Anthropic.
Recently I’ve been having a lot of conversations about how to structure and staff teams. One framework I’ve referenced repeatedly is to break down team leadership into a few different categories of responsibility.
This is useful for a couple reasons. One is that it helps you get more concrete about what leading a team involves; for new managers, having an exhaustive list of job responsibilities is helpful to make sure you’re tracking all of them.
More importantly, though, we often want to somehow split these responsibilities between people. Team leadership covers a huge array of things—as you can see from how long this post is—and trying to find someone who can be great at all of them is often a unicorn hunt. Even if you do find someone good-enough at all of them, they usually spike in 1-2 areas, and it might be higher-leverage for them to fully focus on those.
Here’s a breakdown I use a lot:
The most important responsibility of a team’s leadership is to ensure that the team is headed in the right direction—that is, are they working towards the right high-level goal, and do they have an achievable plan to get there? Overall direction tends to get input from many people inside and outside a team, but who is most accountable for it can vary; see Example divisions of responsibility below.
Overall direction involves working on things like:
The most important skill for getting this right is having good predictive models (of both the team’s domain and the organization)—since prioritization is ultimately a question about “what will be the impact if we pursue this project.” Being great at communicating those predictive models, and the team’s priorities and goals, to other stakeholders is also important.
Good team direction mostly looks like the team producing a steady stream of big wins. Poor direction most commonly manifests as getting caught by surprise or falling behind—that is, mispredicting what work will be most important and doing too little of it, for example by starting too late, under-hiring, or not growing people into the right skillset or role. Other signs of poor direction include team members not understanding why they’re working on something; the team working on projects that deliver little value; friction with peer teams or arguments about scope; or important projects falling through the cracks between teams.
People management means being responsible for the success of the people on the team, most commonly including things like:
Day to day, the most important responsibility here is recurring 1:1s (the coaching kind, not the status update kind). Others include writing job descriptions, setting up interview loops, sourcing candidates, gathering feedback, writing performance reviews, helping people navigate org policies, giving career coaching, etc.
The most important skill for people management is understanding people—both in the traditional “high EQ” sense of being empathetic and good at seeing others’ perspectives, but also in the sense of knowing what contributes to high performance in a domain (e.g. what makes someone a great engineer or researcher). It’s also important to be good at having tricky conversations in a compassionate but firm way.
The main outcome of people management is whether people on the team are high-performing and happy. Teams with the best people management hire great people, give them fast feedback on anything that’s not working, course-correct them quickly, help them grow their impact over time, and generally help them have a great time at work. Bad people management looks like people chronically underperforming or having low morale.
A common question here is how technical a people manager needs to be. Opinions vary widely. The bar I typically suggest is that the people manager doesn’t need to have the most technical depth on the team, but they need enough depth that they can follow most discussions without slowing them down, understand who’s correct in most debates without needing to rely on trust, and generally stay oriented easily.
The people manager is responsible for making sure their reports get mentorship and feedback if needed, but they don’t need to be the primary person doing the mentorship or feedback themselves. Often, domain-specific mentorship comes from whoever is responsible for technical direction, but it can also come from anyone else senior on the team, or less commonly, somewhere else in the org.
Project management means making sure the team executes well: i.e., that everyone works efficiently towards the team’s top priorities while staying unblocked and situationally aware of what else is going on. In the short run, it’s the key determinant of a team’s productivity.
Day to day, project management looks like:
Project management isn’t just administrative; doing it well requires a significant amount of domain expertise (to follow project discussions, understand status updates, track dependencies, etc.). Beyond that, it’s helpful to be organized and detail-oriented, and to have good mental models of people (who will be good at what types of work? What kinds of coordination rituals are helpful for this team?).
Good project management is barely visible—it just feels like “things humming along.” It’s more visible when it’s going badly, which mostly manifests as inefficient work: people being blocked, context-switching frequently due to priority thrash, flailing around because they’re working on a project that’s a bad fit, doing their work wrong because they don’t understand the root goal, missing out on important information that was in the wrong Slack channel, and so on.
When teams get big, project management is one of the areas that’s easiest to delegate and split up. For example, when Anthropic’s inference team got up to 10+ people, we split it up into multiple “pods” focused on different areas, where each pod had a “pod lead” that was responsible for that pod’s project management.
Technical leadership means being responsible for the quality of a team’s technical work. In complex orgs that integrate multiple technical skillsets, teams often need some amount of technical leadership in each skillset—for example, research teams at Anthropic need both research and engineering leadership, although the exact balance varies by team.
Specific work includes:
Because technical leadership benefits a lot from the detailed context and feedback loops of working on execution yourself, it’s fairly common for tech leads to be individual contributors. In practice, many teams have a wide enough surface area that they end up with multiple technical leads in different domains—split either “vertically” by project, “horizontally” by skillset, or some combination of the two.
Perhaps obviously, the most important skill for a tech lead is domain expertise. Technical communication is probably next most important, and what separates this archetype of senior IC from others.
When technical leadership isn’t going well, it most often manifests as accumulating debt or other friction that slows down execution: bogus research results, uninformative experiments, creaky systems, frequent outages, etc.
Here are a few different real-world examples of how these responsibilities can be divided up.
When a new company introduces their first technical managers, they often do it by moving their strongest technical person (or people) into a management role and expecting them to fulfill all four responsibilities. Some people do just fine in such roles, but more commonly, the new manager isn’t great at one or more of the responsibilities—most often people management—and struggles to improve due to the number of other things they’re responsible for. (Further reading: Tech Lead Management roles are a trap)
Although tech-lead-manager (TLM) roles have some pitfalls, they’re not impossible. Here are a few protective factors that make them more likely to succeed:
This type of split, an engineering manager (EM) paired with a tech lead (TL), is common in larger tech companies, with the EM responsible for overall direction, people management, and project management, and the TL responsible for technical leadership (and potentially also contributing to overall direction). “Tech lead” doesn’t have to be a formal title here, and sometimes a team will have multiple tech leads in different areas.
At Anthropic, a good example of this is our inference team, where the managers don’t set much technical direction themselves, and instead are focused on hiring, organizing, coaching, establishing priorities, and being glue with the team’s many, many client teams. Since the domain is highly complex and the team is senior-heavy, tech leadership is provided by multiple different ICs for different parts of the service (model implementation, server architecture, request scheduling, capacity management, etc.).
This is an example of a less-common split. At Wave, we used a division similar to the EM/TL split described above, but the team managers (whom we called Product Managers, although it was a somewhat atypical shape for a PM role) often came from non-technical backgrounds.
PMs were expected to act as the “mini CEO” of a product area (e.g. our bill payment product, our agent network, etc.) with fairly broad autonomy to work within that area. Because the “mini CEO” role involved a bunch of other competencies, we decided they didn’t also need to be as technical as a normal engineering manager might.
Although unusual, this worked well for a couple main reasons:
Notably, this broke the suggestion I mentioned above that people managers should be reasonably technical. This worked mostly because we were able to lean heavily on tech leads for the parts of people management that required technical context. Tech lead was a formal role, with secondary reporting into an engineering manager-of-managers; and while PMs were ultimately responsible for people management, the TL played a major role as well. Both of them would have 1:1s with each team member, and performance reviews would be co-written between the PM and the TL.
Anthropic has a few examples of splitting people management from research leadership; the longest-running one is on our Interpretability team, where Chris Olah owned overall direction and technical leadership, and Shan Carter owned people and project management. (This has changed a bit now that Interpretability has multiple sub-teams.)
In this split, unlike an EM/TL split on an engineering team, it made more sense for the research lead to be accountable for overall direction because it depended very heavily on high-context intuitive judgment calls about which research direction to pursue (e.g. betting heavily on the superposition hypothesis, which led to several major results). Many (though not all!) engineering teams’ prioritization depends less on this kind of highly technical judgment call.
This is interesting as an example of a setup where the people manager wasn’t (primarily) responsible for overall direction. It’s somewhat analogous to the CTO / VP Engineering split in some tech companies, where the CTO is responsible for overall direction but most people-leadership responsibility lies with the VPE who reports to them.
Thanks to Milan Cvitkovic and many Anthropic coworkers for reading a draft of this post.
2024-07-13 08:00:00
This is an adaptation of an internal doc I wrote for Anthropic.
I’ve been noticing recently that often, a big blocker to teams staying effective as they grow is trust.
“Alice doesn’t trust Bob” makes Alice sound like the bad guy, but it’s often completely appropriate for people not to trust each other in some areas:
One might have an active reason to expect someone to be bad at something. For example, recently I didn’t fully trust two of my managers to set their teams’ roadmaps… because they’d joined about a week ago and had barely gotten their laptops working. (Two months later, they’re doing great!)
One might just not have data. For example, I haven’t seen most of my direct reports deal with an underperforming team member yet, and this is a common blind spot for many managers, so I shouldn’t assume that they will reliably be effective at this without support.
In general, if Alice is Bob’s manager and is an authority on, say, prioritizing research directions, Bob is probably actively trying to build a good mental “Alice simulator” so that he can prioritize autonomously without checking in all the time. But his simulator might not be good yet, or Alice might not have verified that it’s good enough. Trust comes from common knowledge of shared mental models, and that takes investment from both sides to build.
If low trust is sometimes appropriate, what’s the problem? It’s that trust is what lets collaboration scale. If I have a colleague I don’t trust to (say) make good software design decisions, I’ll have to review their designs much more carefully and ask them to make more thorough plans in advance. If I have a report that I don’t fully trust to handle underperforming team members, I’ll have to manage them more granularly, digging into the details to understand what’s going on, forming my own views about what should happen, and checking on the situation repeatedly to make sure it’s heading in the right direction. That’s a lot more work, both for me and for my teammates, who have to spend a bunch more time making their work “inspectable” in this way.
The benefits here are most obvious when work gets intense. For example, Anthropic had a recent crunch time during which one of our teams was under intense pressure to quickly debug a very tricky issue. We were able to work on this dramatically more efficiently because the team (including most of the folks who joined the debugging effort from elsewhere) had high trust in each other’s competence; at peak we had probably ~25 people working on related tasks, but we were mostly able to split them into independent workstreams where people just trusted the other stuff would get done. In similar situations with a lower-mutual-trust team, I’ve seen things collapse into endless FUD and arguments about technical direction, leading to much slower forward progress.
Trust also becomes more important as the number of stakeholders increases. It’s totally manageable for me to closely supervise a report dealing with an underperformer; it’s a lot more costly and high-friction if, say, 5 senior managers need to do deep dives on a product decision. In an extreme case, I once saw an engineering team with a tight deadline choose to build something they thought was unnecessary, because getting the sign-off to cut scope would have taken longer than doing the work. From the perspective of the organization as an information-processing entity, given the people and relationships that existed at the time, that might well have been the right call; but it does suggest that if they had invested in enough trust to make that kind of sign-off cheap, they’d probably move much faster overall.
As you work with people for longer, you’ll naturally have more experience with each other and build more trust. So on most teams, these kinds of things work themselves out over time. But if you’re going through hypergrowth, then unless you’re very proactive about this, at any given time most of your colleagues will have some sort of trust deficit.
Symptoms I sometimes notice that can indicate a buildup of trust deficits:
It’s easy to notice these and think that the solution is for people to “just trust each other more.” There are some situations and personalities where that’s the right advice. But often it’s reasonable not to trust someone yet! In that case, a better tactic is to be more proactive about building trust. In a large, fast-growing company you’ll probably never get to the utopian ideal of full pairwise trust between everyone—it takes too long to build. But on the margin, more effort still helps a lot.
Some ways to invest more effort in trusting others that I’ve seen work well:
Share your most important mental models broadly. At Anthropic, Dario gives biweekly-ish “informal vision updates” (hour-long talks on important updates to parts of company strategy) that I think of as the canonical example of this. Just about everyone at Anthropic is trying to build an internal “Dario simulator” who they can consult when the real one is too busy (i.e. ~always). For high level strategy, these updates do an amazing job of that.
Put in time. In addition to one-way broadcasts, trust-building benefits a lot from one-on-one bidirectional communication so that you can get feedback on how well the other person is building the right models. This is one of the reasons I schedule lots of recurring 1:1s with peers in addition to my team. Offsites are also very helpful here.
Try people out. If you’re unsure whether someone on your team will be great at something, try giving them a trial task and monitoring how it’s going more closely than you would by default, to catch issues early. This is a great way to invest in your long-term ability to delegate things.
Give feedback. It’s easy to feel like something is “too minor” to give feedback on and let it slide, especially when there’s always too much to do. But I’ve never regretted erring on the side of giving feedback, and often regretted deciding to “deal with it” or keep quiet. One pro-tip here: if you feel anxious about giving someone negative feedback, consider whether you’ve given them enough positive feedback—which is a helpful buffer against people interpreting negative feedback as “you’re not doing well overall.”
Inspection forums, i.e., recurring meetings where leadership monitors the status of many projects by setting goals and tracking progress against them. The above tactics are mostly 1:1 or one-to-all, but sometimes you want to work with a small group and this is an efficient way of doing that.
To help other people trust you:
Accept that you start out with incomplete trust. When someone, say, tries to monitor my work more closely than I think is warranted, my initial reaction is to be defensive and ask them to trust me more. It takes effort to put myself into their shoes and remind myself that they probably don’t have a good enough model of me to trust me yet.
Overcommunicate status. This helps in two ways: first, it gives stakeholders more confidence that if something goes off the rails they’ll know quickly. And second, it gives them more data and helps them build a higher-fidelity model of how you operate.
Proactively own up when something isn’t going well. Arguably a special case of overcommunicating, but one that’s especially important to get right: if you can be relied on to ask for help when you need it, it’s a lot less risky for people to “try you out” on stuff at the edge of what they trust you on.
Related reading: Inspection and the limits of trust
2024-02-25 08:00:00
This is an adaptation of an internal doc I wrote for Wave.
I used to think that behavioral interviews were basically useless, because it was too easy for candidates to bullshit them and too hard for me to tell what was a good answer. I’d end up grading every candidate as an “okay, I guess” because I was never sure what bar I should hold them to.
I still think most behavioral interviews are like that, but after grinding out way too many of them, I now think it’s possible to escape that trap. Here are my tips and tricks for doing so!
Confidence level: doing this stuff worked better than not doing it, but I still feel like I could be a lot better at behavioral interviews, so please suggest improvements and/or do your own thing :)
I usually spend about two hours designing and preparing a new type of interview. If I put in that time thinking about what questions and follow-ups to ask, I’m much more likely to get a strong signal about which candidates performed well.
It might sound ridiculous to spend 2 hours building a 1-hour interview that you’ll only give 4 times. But it’s worth it: your most limited resource is time with candidates, so if you can spend more of your own time to use candidates’ time better, that’s a good trade.
I spend most of those 2 hours trying to answer the following question: “what answers to these questions would distinguish a great candidate from a mediocre one, and how can I dig for that?” I find that if I wait until after the interview to evaluate candidates, I rarely have conviction about them, and fall back to grading them a “weak hire” or “weak no-hire.”
To avoid this, write yourself a rubric of all the things you care about assessing, and what follow-up questions you’ll ask to assess those things. This will help you deliver the interview consistently, but most importantly, you’ll ask much better follow-up questions if you’ve thought about them beforehand. See the appendix for an example rubric.
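If it helps to see the shape of one, here’s a minimal sketch of a rubric kept as structured data. The trait, follow-ups, and good/bad signals are illustrative placeholders (not the actual rubric from the appendix), loosely adapted from the failure modes listed later in this post:

```python
# A hypothetical rubric skeleton: one entry per trait you're assessing.
# The trait, follow-ups, and signal descriptions are illustrative
# placeholders, not the actual rubric from the appendix.
RUBRIC = [
    {
        "trait": "self-improvement mindset",
        "follow_ups": [
            "Is there anything you wish you'd done differently?",
            "What would help you catch that sooner next time?",
        ],
        "strong_answer": "specific changes, plus a mechanism for making them happen",
        "weak_answer": "'nothing', or vague non-actionable takeaways",
    },
    # ...one entry per trait, for the 1-3 traits this interview targets
]

# Score each trait during the interview, not after, so you notice in the
# moment when you haven't gathered enough signal on something yet.
for entry in RUBRIC:
    print(entry["trait"], "- first follow-up:", entry["follow_ups"][0])
```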
I usually focus on 1-3 related skills or traits.
To get a strong signal from a behavioral interview question I usually need around 15 minutes, which only leaves time to discuss a small number of scenarios. For example, for a head of technical recruiting, I decided to focus my interview on the cluster of related traits of being great at communication, representing our culture to candidates, and holding a high bar for job candidate experience.
You should coordinate with the rest of the folks on your interview loop to make sure that, collectively, you cover all the most important traits for the role.
My formula for kicking off a behavioral question is “Tell me about a recent time when [X situation happened]. Just give me some brief high-level context on the situation, what the problem was, and how you addressed it. You can keep it high-level and I’ll ask follow-up questions afterward.”
I usually ask for a recent time to avoid having them pick the one time that paints them in the best possible light.
The second sentence (context/problem/solution) is important for helping the candidate keep their initial answer focused—otherwise, they are more likely to ramble for a long time and leave less time for you to ask follow-up questions.
Almost everyone will answer the initial behavioral interview prompt with something that sounds vaguely like it makes sense, even if they don’t actually behave in the ways you’re looking for. The best way to figure out whether they’re real or BSing you is to get them to tell you a lot of details about the situation—the more you get them to tell you, the harder it will be to BS all the details.
General follow-ups you can use to get more detail:
Ask for a timeline—how quickly people operate can be very informative. (Example: I asked someone how they dealt with an underperforming direct report and they gave a compelling story, but when I asked for the timeline, it seemed that weeks had elapsed between noticing the problem and doing anything about it.)
“And then what happened?” / “What was the outcome?” (Example: I asked this to a tech recruiter for the “underperforming report” question and they admitted they had to fire the person, which they hadn’t previously mentioned—that’s a yellow flag on honesty.)
Ask how big of an effect something had and how they know. (Example: I had a head of technical recruiting tell me “I did X and our outbound response rate improved”; when I asked how much, he said from 11% to 15%, but the sample size was small enough that that could have been random chance! See the quick significance check after this list.)
“Is there anything you wish you’d done differently?” (Sometimes people respond to this with non-actionable takeaways like “I wish I’d thought of that idea earlier” but having no plan or mechanism that could possibly cause them to think about the idea earlier the next time.)
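On that response-rate example, a rough significance check is easy to run after the fact. Here’s a minimal sketch using a two-proportion z-test, with made-up sample sizes since the candidate didn’t share real ones; at roughly 150 outbound emails per period, a jump from 11% to 15% is well within the range of chance:

```python
# Two-proportion z-test: could the response-rate "improvement" be noise?
# Sample sizes here are hypothetical; the candidate didn't give real ones.
from math import erf, sqrt

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x2 / n2 - x1 / n1) / se
    # Two-sided p-value via the normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# ~11% of 150 emails before, ~15% of 150 emails after.
z, p = two_proportion_z_test(x1=17, n1=150, x2=23, n2=150)
print(f"z = {z:.2f}, p = {p:.2f}")  # z ≈ 1.02, p ≈ 0.31: easily random chance
```

Even without running numbers live, having a feel for how large a sample a 4-point difference needs helps you calibrate how hard to push on “how do you know?”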
One of the worst mistakes you can make in a behavioral interview is to wing it: to ask whatever follow-up questions pop into your head, and then at the end try to answer the question, “did I like this person?” If you do that, you’re much more likely to be a “weak yes” or “weak no” on every candidate, and to miss asking the follow-up questions that could have given you stronger signal.
Instead, you should know what you’re looking for, and what directions to probe in, before you start the interview. The best way to do this is to build a scoring rubric, where you decide what you’re going to look for and what a good vs. bad answer looks like. See the appendix for an example.
Of course, most of your rubric should be based on the details of what traits you’re trying to evaluate! But here are some failure modes that are common to most behavioral interviews:
Vague platitudes: some people have a tendency to fall back on vague generalities in behavioral interviews. “In recruiting, it’s all about communication!” “No org structure is perfect!” If they don’t follow this up with a more specific, precise or nuanced claim, they may not be a strong first-principles thinker.
Communication bandwidth: if you find that you’re struggling to understand what the person is saying or get on the same page as them, this is a bad sign about your ability to discuss nuanced topics in the future if you work together.
Self-improvement mindset: if the person responds to “what would you do differently” with “nothing,” or with non-actionable vague platitudes, it’s a sign they may not be great at figuring out how to get better at things over time.
Being embarrassingly honest: if probing for more details causes you to learn that things went less well than your original impression, the candidate is probably trying to “spin” the story at least a little bit.
High standards: if they say there’s nothing they wish they’d done differently, this may also be a lack of embarrassing honesty, or a sign that they don’t hold themselves to a high standard. (Personally, even for a project that went exceptionally well, I can think of lots of individual things I could have done better!)
Scapegoating: if you ask about solving a problem, do they take responsibility for contributing to the problem? It’s common for people to imply or say that problems were all caused by other people and solved by them (e.g., “this hiring manager wanted to do it their way, and I knew they were wrong, but I couldn’t convince them…”). Sometimes this is true, but usually problems aren’t a single person’s fault!
Here’s an example rubric and set of follow-up questions for a Head of Technical Recruiting.
Question: “Tell me about a time when your report wasn’t doing a good job.”