2025-03-16 08:00:00
My few most productive individual weeks at Anthropic have all been “crisis project management”: coordinating major, time-sensitive implementation or debugging efforts.
In a company like Anthropic, excellent project management is an extremely high-leverage skill, and not just during crises: our work has tons of moving parts with complex, non-obvious interdependencies and hard schedule constraints, which means organizing them is a huge job that, done right, can save weeks of delays. Although a lot of the examples here come from crisis projects, these principles are also how I try to run any project, just more so.
I think excellent project management is also rarer than it needs to be. During the crisis projects I didn’t feel like I was doing anything particularly impressive; mostly it felt like I was putting in a lot of work but doing things that felt relatively straightforward. On the other hand, I often see other people miss chances to do those things, maybe for lack of having seen a good playbook.
So here’s an attempt to describe my playbook for when I’m being intense about project management.
(I’ve described what I did as “coordinating” above, but that’s underselling it a bit; it mattered a lot for this playbook that I had enough technical context, and organizational trust, to autonomously make most prioritization decisions about the project. Sometimes we instead try to have the trusted decisionmakers not be highly involved in managing execution, and instead farm that out to a lower-context or less-trusted project manager to save the trusted decisionmaker time, but IMO this is usually a false economy for projects where it’s critical that they be executed well.)
For each of the crisis-management projects, I completely cleared my schedule to focus on it, and ended up spending 6+ hours a day organizing it.
This is a bit unintuitive because I’m used to thinking of information processing as basically a free action. After all, you’re “just” moving info from place to place, not doing real work like coding, right? But if you add it all up—running meetings, pinging for updates, digesting Slack threads, pinging for updates again, thinking about what’s next, pinging for updates a third time, etc.—it’s surprisingly time-intensive.
Even more importantly than freeing up time, clearing my schedule made sure the project was the top idea in my mind. If I don’t do that, it’s easy for me to let projects “go on autopilot,” where I keep them running but don’t proactively make time to think through things like whether we should change goals, add or drop priorities, or do other “non-obvious” things.
For non-crisis projects, it’s often not tenable (or the right prioritization) to spend 6+ hours a day project-managing; but it’s still the case that you can improve execution a lot if you focus and make them a top priority, e.g. by carving out dedicated time every day to check statuses, contemplate priorities, broadcast updates, and so on.
A specific tool that I’ve found critical for staying oriented and updating quickly is a detailed plan for victory, i.e., a list of steps, as concrete as possible, that end with the goal being achieved.
The plan is important because tracking whether we’re achieving it is the best way to figure out how well or badly things are going, which in turn tells me when to start asking for more support, cutting scope, escalating problems, and otherwise sounding more alarms. One of the most common megaproject failure modes is not freaking out soon enough, and having a concrete plan is the best antidote.
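The “plan as alarm system” idea can be made mechanical. As a minimal sketch (all of the step names, fields, and the `schedule_slip` helper here are hypothetical, not tooling the post prescribes), you can track planned vs. actual effort per step and treat slippage on finished steps as the signal to start freaking out:

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    planned_days: float      # original estimate
    spent_days: float = 0.0  # actual effort so far
    done: bool = False

def schedule_slip(steps):
    """Ratio of actual to planned effort on completed steps.

    A value well above 1 means the plan's estimates are running long,
    and it's time to cut scope, escalate, or ask for support."""
    finished = [s for s in steps if s.done]
    if not finished:
        return 1.0
    return sum(s.spent_days for s in finished) / sum(s.planned_days for s in finished)

# Hypothetical plan for a small launch:
plan = [
    Step("write data pipeline", planned_days=3, spent_days=5, done=True),
    Step("implement model", planned_days=5, spent_days=6, done=True),
    Step("launch run", planned_days=2),
]
slip = schedule_slip(plan)  # 1.375: finished work took ~40% longer than planned
```

The point isn’t the arithmetic; it’s that a concrete, step-by-step plan gives you something to measure slippage against at all.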
As an example of this, both positive and negative: during a recent sprint to release a new implementation of a model, we took a detailed accounting of all the work we thought we had to do to launch.
As the above example shows, having a plan can’t completely save you if you underestimate how long all the steps in the plan will take. But it certainly helps! My sense is that the main things that would have helped even more in the above case were:
OODA stands for “observe, orient, decide, act”—in other words, the process by which you update your plans and behavior based on new information.
Most of the large projects I’ve worked on have been characterized by incomplete information:
In fact, I’d make a stronger claim: usually getting complete information was the hard part of the project, and took up a substantial fraction of the overall critical-path timeline.
For example, let’s take a recent project to kick off a training run. The critical path probably looked something like:
Practically all of these steps are about information-processing, not writing code! Even the step where the compute partner debugged the problems on their side was itself constrained by information processing speed, since there were tens of people working on the debugging effort and coordinating / sharing info between them was difficult. Overall, the project timeline was strongly constrained by how quickly information could round-trip from our compute partner’s large-scale debugging effort, through their tech lead, me, and Anthropic’s large-scale debugging effort.
This pattern generalizes to most projects I’ve been a part of, and as a result, one of my most productive project management habits is to try to run the fastest OODA loop that I can.
A few specific things that I’ve found help:
It’s not enough for just me personally to be running a fast OODA loop: in a large group, everyone needs to be autonomously making frequent, high-quality, local prioritization decisions, without needing a round-trip through me. To get there, they need to be ambiently aware of:
I’ve usually found that to create the right level of ambient awareness, I have to repeat the same things way more often than I intuitively expect. This is roughly the same “communicate uncomfortably much” principle above, but applied to broadcasts and not just 1:1 conversations with people.
For example, although the first team I managed at Anthropic started with a daily asynchronous standup, we found that synchronous meetings were much more effective for creating common knowledge and reorienting, so we moved to a twice-weekly synchronous standup, which probably qualified as “uncomfortably much” synchronous communication for Anthropic at the time.
Once a project gets over maybe 10 people, I can’t track everything in enough detail to project-manage the entire thing myself. At this point, it becomes critical to delegate.
Here I mean delegating the project management, not just the execution (that’s what I’d be delegating to the first 10 people). This is the point where I need other people to help split up the work, monitor and communicate progress, escalate blockers, etc.
A few things I try to keep in mind when delegating project management:
One of my favorite ways to make delegation easier is to keep goals simple: if a goal can fit in a Slack message while still crisply describing a path to the desired end state, then the people working on it will be much more able to prioritize autonomously, and to point their work at the real end goal rather than doing something that turns out to be useless for a reason they didn’t think about.
“Keep goals simple” doesn’t have to mean “do less”—the best way to keep goals simple is to find the latent structure that enables a clean recursive decomposition into subgoals. This often requires a deceptive amount of work—both cognitive and hands-on-keyboard—to identify the right intermediate goals, but I’ve found that it pays off immensely by clarifying what’s important to work on.
Some of my favorite memories of Anthropic are of helping out with these big projects. While they can be intense, it’s also really inspiring to see how our team comes together, and the feeling of being part of a big team of truly excellent people cooking something ambitious together can be really magical! So I try to enjoy the chaos :)
Here’s the internal doc I share with folks on my team who are getting into being responsible for large projects.
So you’re the DRI of a project (or part of one). Concretely, what do you do to “be DRI”?
This doc is my suggested “starter kit” answer to that question. The habits and rituals described here aren’t perfect for every situation, but they’re lightweight and broadly helpful. I suggest you use them as a starting point for iteration: try them out, then adjust as necessary. This is an SL init; the RL is your job :)
The goal is to help you do your job as DRI—
—without adding too much overhead:
(Note: being DRI will still unavoidably add some overhead—e.g. you’ll have to track what other people are doing, delegate work, unblock people, set and communicate goals, etc. The goal is specifically for the process/paperwork to be minimal.)
You should schedule at least one 30-minute weekly meeting with everyone working on the project.
The goal of this meeting is to (1) be a backstop for any coordination that needs to happen and didn’t happen asynchronously; (2) be an efficient way to create common knowledge of goals, updates, etc.; (3) help you track whether things are going well.
It’s really helpful for discoverability and wayfinding to have a single “master doc” with all the most important info about a project. As you loop more people in, they can read the doc to get up to speed. And anyone who thinks “I wonder how X is going” can stop by there to find out.
Create a doc for your workstream with:
If it’s part of a larger project, your doc should be nested within the larger project’s working doc.
If this ends up being too much for one doc, you can fork these out into sub-docs (esp. running notes and updates).
Thanks to Kelley Rivoire for many thoughtful comments on a draft!
2024-07-21 08:00:00
This is an adaptation of an internal doc I wrote for Anthropic.
Recently I’ve been having a lot of conversations about how to structure and staff teams. One framework I’ve referenced repeatedly is to break down team leadership into a few different categories of responsibility.
This is useful for a couple reasons. One is that it helps you get more concrete about what leading a team involves; for new managers, having an exhaustive list of job responsibilities is helpful to make sure you’re tracking all of them.
More importantly, though, we often want to somehow split these responsibilities between people. Team leadership covers a huge array of things—as you can see from how long this post is—and trying to find someone who can be great at all of them is often a unicorn hunt. Even if you do find someone good-enough at all of them, they usually spike in 1-2 areas, and it might be higher-leverage for them to fully focus on those.
Here’s a breakdown I use a lot:1
The most important responsibility of a team’s leadership is to ensure that the team is headed in the right direction—that is, are they working towards the right high-level goal, and do they have an achievable plan to get there? Overall direction tends to get input from many people inside and outside a team, but who is most accountable for it can vary; see Example divisions of responsibility below.
Overall direction involves working on things like:
The most important skill for getting this right is having good predictive models (of both the team’s domain and the organization)—since prioritization is ultimately a question about “what will be the impact if we pursue this project.” Being great at communicating those predictive models, and the team’s priorities and goals, to other stakeholders is also important.
Good team direction mostly looks like the team producing a steady stream of big wins. Poor direction most commonly manifests as getting caught by surprise or falling behind—that is, mispredicting what work will be most important and doing too little of it, for example by starting too late, under-hiring, or not growing people into the right skillset or role. Other signs of poor direction include team members not understanding why they’re working on something; the team working on projects that deliver little value; friction with peer teams or arguments about scope; or important projects falling through the cracks between teams.
People management means being responsible for the success of the people on the team, most commonly including things like:
Day to day, the most important responsibility here is recurring 1:1s (the coaching kind, not the status update kind). Others include writing job descriptions, setting up interview loops, sourcing candidates, gathering feedback, writing performance reviews, helping people navigate org policies, giving career coaching, etc.
The most important skill for people management is understanding people—both in the traditional “high EQ” sense of being empathetic and good at seeing others’ perspectives, but also in the sense of knowing what contributes to high performance in a domain (e.g. what makes someone a great engineer or researcher). It’s also important to be good at having tricky conversations in a compassionate but firm way.
The main outcome of people management is whether people on the team are high-performing and happy. Teams with the best people management hire great people, give them fast feedback on anything that’s not working, course-correct them quickly, help them grow their impact over time, and generally help them have a great time at work. Bad people management looks like people chronically underperforming or having low morale.
A common question here is how technical a people manager needs to be. Opinions vary widely. The bar I typically suggest is that the people manager doesn’t need to have the most technical depth on the team, but they need enough depth that they can follow most discussions without slowing them down, understand who’s correct in most debates without needing to rely on trust, and generally stay oriented easily.
The people manager is responsible for making sure their reports get mentorship and feedback if needed, but they don’t need to be the primary person doing the mentorship or feedback themselves. Often, domain-specific mentorship comes from whoever is responsible for technical direction, but it can also come from anyone else senior on the team, or less commonly, somewhere else in the org.
Project management means making sure the team executes well: i.e., that everyone works efficiently towards the team’s top priorities while staying unblocked and situationally aware of what else is going on. In the short run, it’s the key determinant of a team’s productivity.
Day to day, project management looks like:
Project management isn’t just administrative; doing it well requires a significant amount of domain expertise (to follow project discussions, understand status updates, track dependencies, etc.). Beyond that, it’s helpful to be organized and detail-oriented, and to have good mental models of people (who will be good at what types of work? What kinds of coordination rituals are helpful for this team?).
Good project management is barely visible—it just feels like “things humming along.” It’s more visible when it’s going badly, which mostly manifests as inefficient work: people being blocked, context-switching frequently due to priority thrash, flailing around because they’re working on a project that’s a bad fit, doing their work wrong because they don’t understand the root goal, missing out on important information that was in the wrong Slack channel, and so on.
When teams get big, project management is one of the areas that’s easiest to delegate and split up. For example, when Anthropic’s inference team got up to 10+ people, we split it up into multiple “pods” focused on different areas, where each pod had a “pod lead” that was responsible for that pod’s project management.
Technical leadership means being responsible for the quality of a team’s technical work. In complex orgs integrating multiple technical skillsets, a team often needs some amount of technical leadership in each skillset—for example, research teams at Anthropic need both research and engineering leadership, although the exact balance varies by team.
Specific work includes:
Because technical leadership benefits a lot from the detailed context and feedback loops of working on execution yourself, it’s fairly common for tech leads to be individual contributors.2 In practice, many teams have a wide enough surface area that they end up with multiple technical leads in different domains—split either “vertically” by project, “horizontally” by skillset, or some combination of the two.
Perhaps obviously, the most important skill for a tech lead is domain expertise. Technical communication is probably next most important, and what separates this archetype of senior IC from others.
When technical leadership isn’t going well, it most often manifests as accumulating debt or other friction that slows down execution: bogus research results, uninformative experiments, creaky systems, frequent outages, etc.
Here are a few different real-world examples of how these responsibilities can be divided up.3
When a new company introduces their first technical managers, they often do it by moving their strongest technical person (or people) into a management role and expecting them to fulfill all four responsibilities. Some people do just fine in such roles, but more commonly, the new manager isn’t great at one or more of the responsibilities—most often people management—and struggles to improve due to the number of other things they’re responsible for. (Further reading: Tech Lead Management roles are a trap)
Although TLM roles have some pitfalls, they’re not impossible. Here are a few protective factors that make them more likely to succeed:
This type of split is common in larger tech companies, with the EM responsible for overall direction, people and project management, and the TL responsible for technical leadership (and potentially also contributing to overall direction). “Tech lead” doesn’t have to be a formal title here, and sometimes a team will have multiple tech leads in different areas.
At Anthropic, a good example of this is our inference team, where the managers don’t set much technical direction themselves, and instead are focused on hiring, organizing, coaching, establishing priorities, and being glue with the team’s many many client teams. Since the domain is highly complex and the team is senior-heavy, tech leadership is provided by multiple different ICs for different parts of the service (model implementation, server architecture, request scheduling, capacity management, etc.).
This is an example of a less-common split. At Wave, we used a division similar to the EM/TL split described above, but the team managers (which we called Product Managers, although it was a somewhat atypical shape for a PM role) often came from non-technical backgrounds.
PMs were expected to act as the “mini CEO” of a product area (e.g. our bill payment product, our agent network, etc.) with fairly broad autonomy to work within that area. Because the “mini CEO” role involved a bunch of other competencies, we decided they didn’t also need to be as technical as a normal engineering manager might.
Although unusual, this worked well for a couple main reasons:
Notably, this broke the suggestion I mentioned above that people managers should be reasonably technical. This worked mostly because we were able to lean heavily on tech leads for the parts of people management that required technical context. Tech lead was a formal role, with secondary reporting into an engineering manager-of-managers; and while PMs were ultimately responsible for people management, the TL played a major role as well. Both of them would have 1:1s with each team member, and performance reviews would be co-written between the PM and the TL.
Anthropic has a few examples of splitting people management from research leadership; the longest-running one is on our Interpretability team, where Chris Olah owned overall direction and technical leadership, and Shan Carter owned people and project management. (This has changed a bit now that Interpretability has multiple sub-teams.)
In this split, unlike an EM/TL split on an engineering team, it made more sense for the research lead to be accountable for overall direction because it depended very heavily on high-context intuitive judgment calls about which research direction to pursue (e.g. betting heavily on the superposition hypothesis, which led to several major results). Many (though not all!) engineering teams’ prioritization depends less on this kind of highly technical judgment call.
This is interesting as an example of a setup where the people manager wasn’t (primarily) responsible for overall direction. It’s somewhat analogous to the CTO / VP Engineering split in some tech companies, where the CTO is responsible for overall direction but most people-leadership responsibility lies with the VPE who reports to them.
Thanks to Milan Cvitkovic and many Anthropic coworkers for reading a draft of this post.
2024-07-13 08:00:00
This is an adaptation of an internal doc I wrote for Anthropic.
I’ve been noticing recently that often, a big blocker to teams staying effective as they grow is trust.
“Alice doesn’t trust Bob” makes Alice sound like the bad guy, but it’s often completely appropriate for people not to trust each other in some areas:
One might have an active reason to expect someone to be bad at something. For example, recently I didn’t fully trust two of my managers to set their teams’ roadmaps… because they’d joined about a week ago and had barely gotten their laptops working. (Two months later, they’re doing great!)
One might just not have data. For example, I haven’t seen most of my direct reports deal with an underperforming team member yet, and this is a common blind spot for many managers, so I shouldn’t assume that they will reliably be effective at this without support.
In general, if Alice is Bob’s manager and is an authority on, say, prioritizing research directions, Bob is probably actively trying to build a good mental “Alice simulator” so that he can prioritize autonomously without checking in all the time. But his simulator might not be good yet, or Alice might not have verified that it’s good enough. Trust comes from common knowledge of shared mental models, and that takes investment from both sides to build.
If low trust is sometimes appropriate, what’s the problem? It’s that trust is what lets collaboration scale. If I have a colleague I don’t trust to (say) make good software design decisions, I’ll have to review their designs much more carefully and ask them to make more thorough plans in advance. If I have a report that I don’t fully trust to handle underperforming team members, I’ll have to manage them more granularly, digging into the details to understand what’s going on, forming my own views about what should happen, and checking on the situation repeatedly to make sure it’s heading in the right direction. That’s a lot more work, both for me and for my teammates, who have to spend a bunch more time making their work “inspectable” in this way.
The benefits here are most obvious when work gets intense. For example, Anthropic had a recent crunch time during which one of our teams was under intense pressure to quickly debug a very tricky issue. We were able to work on this dramatically more efficiently because the team (including most of the folks who joined the debugging effort from elsewhere) had high trust in each other’s competence; at peak we had probably ~25 people working on related tasks, but we were mostly able to split them into independent workstreams where people just trusted the other stuff would get done. In similar situations with a lower-mutual-trust team, I’ve seen things collapse into endless FUD and arguments about technical direction, leading to much slower forward progress.
Trust also becomes more important as the number of stakeholders increases. It’s totally manageable for me to closely supervise a report dealing with an underperformer; it’s a lot more costly and high-friction if, say, 5 senior managers need to do deep dives on a product decision. In an extreme case, I once saw an engineering team with a tight deadline choose to build something they thought was unnecessary, because getting the sign-off to cut scope would have taken longer than doing the work. From the perspective of the organization as an information-processing entity, given the people and relationships that existed at the time, that might well have been the right call; but it does suggest that if they had built enough trust to make that kind of sign-off cheap, they’d probably move much faster overall.
As you work with people for longer, you’ll naturally have more experience with each other and build more trust. So on most teams, these kinds of things work themselves out over time. But if you’re going through hypergrowth, then unless you’re very proactive, at any given time most of your colleagues will have some sort of trust deficit.
Symptoms I sometimes notice that can indicate a buildup of trust deficits:
It’s easy to notice these and think that the solution is for people to “just trust each other more.” There are some situations and personalities where that’s the right advice. But often it’s reasonable not to trust someone yet! In that case, a better tactic is to be more proactive about building trust. In a large, fast-growing company you’ll probably never get to the utopian ideal of full pairwise trust between everyone—it takes too long to build. But on the margin, more effort still helps a lot.
Some ways to invest more effort in trusting others that I’ve seen work well:
Share your most important mental models broadly. At Anthropic, Dario gives biweekly-ish “informal vision updates” (hour-long talks on important updates to parts of company strategy) that I think of as the canonical example of this. Just about everyone at Anthropic is trying to build an internal “Dario simulator” who they can consult when the real one is too busy (i.e. ~always). For high level strategy, these updates do an amazing job of that.
Put in time. In addition to one-way broadcasts, trust-building benefits a lot from one-on-one bidirectional communication so that you can get feedback on how well the other person is building the right models. This is one of the reasons I schedule lots of recurring 1:1s with peers in addition to my team. Offsites are also very helpful here.
Try people out. If you’re unsure whether someone on your team will be great at something, try giving them a trial task and monitoring how it’s going more closely than you would by default, to catch issues early. This is a great way to invest in your long-term ability to delegate things.
Give feedback. It’s easy to feel like something is “too minor” to give feedback on and let it slide, especially when there’s always too much to do. But I’ve never regretted erring on the side of giving feedback, and often regretted deciding to “deal with it” or keep quiet. One pro-tip here: if you feel anxious about giving someone negative feedback, consider whether you’ve given them enough positive feedback—which is a helpful buffer against people interpreting negative feedback as “you’re not doing well overall.”
Inspection forums, i.e., recurring meetings where leadership monitors the status of many projects by setting goals and tracking progress against them. The above tactics are mostly 1:1 or one-to-all, but sometimes you want to work with a small group and this is an efficient way of doing that.
To help other people trust you:
Accept that you start out with incomplete trust. When someone, say, tries to monitor my work more closely than I think is warranted, my initial reaction is to be defensive and ask them to trust me more. It takes effort to put myself into their shoes and remind myself that they probably don’t have a good enough model of me to trust me yet.
Overcommunicate status. This helps in two ways: first, it gives stakeholders more confidence that if something goes off the rails they’ll know quickly. And second, it gives them more data and helps them build a higher-fidelity model of how you operate.
Proactively own up when something isn’t going well. Arguably a special case of overcommunicating, but one that’s especially important to get right: if you can be relied on to ask for help when you need it, it’s a lot less risky for people to “try you out” on stuff at the edge of what they trust you on.
Related reading: Inspection and the limits of trust
2024-02-25 08:00:00
This is an adaptation of an internal doc I wrote for Wave.
I used to think that behavioral interviews were basically useless, because it was too easy for candidates to bullshit them and too hard for me to tell what was a good answer. I’d end up grading every candidate as an “okay, I guess” because I was never sure what bar I should hold them to.
I still think most behavioral interviews are like that, but after grinding out way too many of them, I now think it’s possible to escape that trap. Here are my tips and tricks for doing so!
Confidence level: doing this stuff worked better than not doing it, but I still feel like I could be a lot better at behavioral interviews, so please suggest improvements and/or do your own thing :)
I usually take about two hours to design and prepare a new type of interview. If I spend that time thinking about what questions and follow-ups to ask, I’m much more likely to get a strong signal about which candidates performed well.
It might sound ridiculous to spend 2 hours building a 1-hour interview that you’ll only give 4 times. But your most limited resource is time with candidates, so spending more of your own time to use candidates’ time better is a good trade.
I spend most of those 2 hours trying to answer the following question: “what answers to these questions would distinguish a great candidate from a mediocre one, and how can I dig for that?” I find that if I wait until after the interview to evaluate candidates, I rarely have conviction about them, and fall back to grading them a “weak hire” or “weak no-hire.”
To avoid this, write yourself a rubric of all the things you care about assessing, and what follow-up questions you’ll ask to assess those things. This will help you deliver the interview consistently, but most importantly, you’ll ask much better follow-up questions if you’ve thought about them beforehand. See the appendix for an example rubric.
I usually focus on 1-3 related skills or traits.
To get a strong signal from a behavioral interview question I usually need around 15 minutes, which only leaves time to discuss a small number of scenarios. For example, for a head of technical recruiting, I decided to focus my interview on the cluster of related traits of being great at communication, representing our culture to candidates, and holding a high bar for job candidate experience.
You should coordinate with the rest of the folks on your interview loop to make sure that, collectively, you cover all the most important traits for the role.
My formula for kicking off a behavioral question is “Tell me about a recent time when [X situation happened]. Just give me some brief high-level context on the situation, what the problem was,1 and how you addressed it. You can keep it high-level and I’ll ask follow-up questions afterward.”
I usually ask for a recent time to avoid having them pick the one time that paints them in the best possible light.
The second sentence (context/problem/solution) is important for helping the candidate keep their initial answer focused—otherwise, they are more likely to ramble for a long time and leave less time for you to…
Almost everyone will answer the initial behavioral interview prompt with something that sounds vaguely like it makes sense, even if they don’t actually usually behave in the ways you’re looking for. To figure out whether they’re real or BSing you, the best way is to get them to tell you a lot of details about the situation—the more you get them to tell you, the harder it will be to BS all the details.
General follow-ups you can use to get more detail:
Ask for a timeline—how quickly people operate can be very informative. (Example: I asked someone how they dealt with an underperforming direct report and they gave a compelling story, but when I asked for the timeline, it seemed that weeks had elapsed between noticing the problem and doing anything about it.)
“And then what happened?” / “What was the outcome?” (Example: I asked this to a tech recruiter for the “underperforming report” question and they admitted they had to fire the person, which they hadn’t previously mentioned—that’s a yellow flag on honesty.)
Ask how big of an effect something had and how they know. (Example: I had a head of technical recruiting tell me “I did X and our outbound response rate improved;” when I asked how much, he said from 11% to 15%, but the sample size was small enough that that could have been random chance!)
“Is there anything you wish you’d done differently?” (Sometimes people respond to this with non-actionable takeaways like “I wish I’d thought of that idea earlier” but having no plan or mechanism that could possibly cause them to think about the idea earlier the next time.)
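The "random chance" point in the response-rate example above can be made concrete with a quick significance check. This is just an illustrative sketch: the batch size of 100 outbound messages is an assumption (the candidate's actual numbers weren't given), and the tail probability is computed directly from the binomial distribution.

```python
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    successes in n trials if the true success rate is p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers: 100 outbound messages with 15 responses,
# against a baseline response rate of 11%.
p_value = binom_tail(100, 15, 0.11)
print(f"P(>= 15 responses | true rate 11%) = {p_value:.2f}")
```

With a batch that size, a jump from 11% to 15% has well over a 10% chance of occurring by luck alone, so it's weak evidence on its own — which is exactly what asking "how much, and how do you know?" surfaces.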
One of the worst mistakes you can make in a behavioral interview is to wing it: to ask whatever follow-up questions pop into your head, and then at the end try to answer the question, “did I like this person?” If you do that, you’re much more likely to be a “weak yes” or “weak no” on every candidate, and to miss asking the follow-up questions that could have given you stronger signal.
Instead, you should know what you’re looking for, and what directions to probe in, before you start the interview. The best way to do this is to build a scoring rubric, where you decide what you’re going to look for and what a good vs. bad answer looks like. See the appendix for an example.
Of course, most of your rubric should be based on the details of what traits you’re trying to evaluate! But here are some failure modes that are common to most behavioral interviews:
Vague platitudes: some people have a tendency to fall back on vague generalities in behavioral interviews. “In recruiting, it’s all about communication!” “No org structure is perfect!” If they don’t follow this up with a more specific, precise or nuanced claim, they may not be a strong first-principles thinker.
Communication bandwidth: if you find that you’re struggling to understand what the person is saying or get on the same page as them, this is a bad sign about your ability to discuss nuanced topics in the future if you work together.
Self-improvement mindset: if the person responds to “what would you do differently” with “nothing,” or with non-actionable vague platitudes, it’s a sign they may not be great at figuring out how to get better at things over time.
Being embarrassingly honest: if probing for more details reveals that things went less well than the candidate’s original framing suggested, they are probably trying to “spin” the story at least a little bit.
High standards: if they say there’s nothing they wish they’d done differently, this may also be lack of embarrassing honesty, or not holding themselves to a high standard. (Personally, even for any project that went exceptionally well I can think of lots of individual things I could have done better!)
Scapegoating: if you ask about solving a problem, do they take responsibility for contributing to it? It’s common for people to say or imply that problems were all caused by other people and solved by them (e.g., “this hiring manager wanted to do it their way, but I knew they were wrong, but couldn’t convince them…”). Sometimes this is true, but usually problems aren’t a single person’s fault!
Here’s an example rubric and set of follow-up questions for a Head of Technical Recruiting.
Question: “Tell me about a time when your report wasn’t doing a good job.”
2023-04-23 08:00:00
This post was adapted from a “management roundtable” I gave at Anthropic.
I had an unusually hard time becoming a manager: I went back and forth three times before it stuck, mostly because I made lots of mistakes each time. Since then, as I had to grow my team and grow other folks into managing part of it, I’ve seen a lot of other people have varying degrees of a rough time as well—often in similar ways.
Here’s a small, lovingly hand-curated selection of my prodigious oeuvre of mistakes, and strategies that helped me mitigate them.
The first thing I noticed about being a manager was that I wasn’t sure whether anything I was doing was useful.
As an engineer, I had a fast feedback loop—I could design something, code it, test it, show it to coworkers, ship it and see users happily using it all within a day or two.
Managing doesn’t have that kind of feedback. If I gave someone helpful advice in a one-on-one, at best they might mention it offhandedly three weeks later; more typically, they might forget to, and I’d never know. Without being able to tell whether I was doing anything useful, it was hard for me to stay motivated.
Gradually, over my first year, I built up better self-evaluation instincts. Today, if I give someone advice, I can usually guess right away whether it’s useful—not perfectly, of course, but well enough that I can feel good about my day-to-day output.
But those self-evaluation instincts took time to develop. In the meantime, I went through a demotivated slump, and I’ve seen lots of other new managers go through it too.
Three strategies helped me through it:
I was open with my manager when I was feeling down—sometimes I’d even explicitly ask him for a pep talk. Because he had a higher-level, longer-term perspective and had been a manager for longer, he was often able to point out ways I was having a big impact without noticing.
I asked people for feedback. I found that if I just asked “do you have any feedback for me?” people often wouldn’t, but if I asked more granular questions—“was that meeting useful?”—I would usually learn a lot from it. (See also § angsting.)
I built up other sources of fun and validation. For a long time, my work was the primary thing that helped me feel good about myself. Diversifying that to include more of friends, relationships, hobbies, Twitter likes, etc. smoothed out the ups and downs.
I started managing with only a few reports, so it was easy for me to tell myself that I still had time to code. In principle that was true. What I didn’t have was enough attention to split between two things:
Like many people, I have most of my best ideas in the shower…. The time when it was most constraining was the first time I became a manager. I only had a few reports, so managing them wasn’t a full-time job. But I was very bad at it, and so it should have been what I spent all my shower insights on.
Unfortunately, I was spending my non-management time on programming. And even if I tried to use my showers to think about my thorny and awkward people issues, my mind somehow always wandered off to tackle those nice, juicy software design problems instead.
This was extra-bad when the programming was urgent: I’d end up caught between, say, disappointing our operations team by not shipping a critical tooling improvement, or letting down my own team by half-assing planning and letting them work on unimportant things. I found these periods really stressful.
Eventually, I decided that I’d only allow myself to work on programming projects if nobody else cared when they shipped—say, cleaning up some non-blocking tech debt, or doing small bits of UI polish. If I had spare time after getting through my more important management work, I could pick up one of those projects, but if I had a busy week and had to put it on hold, nothing bad would happen.
(See also: Attention is your scarcest resource, Tech Lead Management roles are a trap.)
I read a bunch of management books that warned me against micromanaging my reports, so I resolved not to do that. I would give my team full autonomy, and participate in their work only by “editing” or helping them reach a higher quality bar. “These people are smart,” I thought. “They’ll figure it out, or if they get stuck they’ll ask me for help.”
That plan fell apart almost immediately, when I asked a junior engineer to write a design doc for a new feature. He did his best, but when he came back a few days later, it was clear he was flailing—he didn’t understand what level of abstraction to write at, had a hard time imagining the future pros and cons of various decisions, and didn’t know how much weight to put on the ones he did identify.
Eventually we decided that I would write the design and he would implement it. After that, the project went much better.
In hindsight, it was silly of me to assume he’d ask me for enough help. He may not have realized that what he was experiencing was the feeling of being out of his depth—and even if he had, he might (reasonably!) have been reluctant to ask for more support from me, if he thought I’d expected him not to need it.
Instead of “don’t micromanage,” the advice I wish I’d gotten is:
Manage projects according to the owner’s level of task-relevant maturity.1
People with low task-relevant maturity appreciate some amount of micromanagement (if they’re self-aware and you’re nice about it).
One thing that really helped me calibrate on this was talking about it explicitly. When delegating a task: “Do you feel like you know how to do this?” “What kind of support would you like?” In one-on-ones: “How did the hand-off for that work go?” “Is there any extra support that would be useful here?”
(See also: Situational Leadership theory.)
Being a manager put me in the line of fire for a lot of emotionally draining situations—most often, for example, needing to give people tough feedback or let them go. At the beginning, I just tried to avoid thinking about these: if someone wasn’t performing well, I’d ignore it or convince myself they were doing a good enough job.
Fortunately, my manager was exceptional at “staring into the abyss” and convincing other people to do the same. He coached me through my first couple tough situations, and each time I realized afterwards that I felt relieved of a huge burden, and having the “abyss” resolved made me way happier. After I internalized that, I was much happier to spend time thinking about things that made me uncomfortable.
Staring into the abyss as a core life skill suggests some strategies for getting better at this:
Another abyss-staring strategy I’ve found useful is to talk to someone else. One reason that I sometimes procrastinate on staring into the abyss is that, when I try to think about the uncomfortable topic, I don’t do it in a productive way: instead, I’ll ruminate or think myself in circles. If I’m talking to someone else, they can help me break out of those patterns and make progress. They can also be an accountability buddy for actually spending time thinking about the thing.
… One solution to the timing problem is to check in about your abyss-staring on a schedule. For example, if you think it might be time for you to change jobs, rather than idly ruminating about it for weeks, block out a day or two to really seriously weigh the pros and cons and get advice, with the goal at the end of deciding either to leave, or to stay and stop thinking about quitting until you’ve gotten a bunch of new information.
“Deferred maintenance” means postponing repairs or upkeep for physical assets like buildings, equipment, infrastructure, etc. It’s often done by, e.g., underfunded transit agencies to make up for going over budget in other areas. But maintenance is needed for a reason—unmaintained infrastructure degrades more quickly, and is more expensive to fix in the long run.
As a new manager in a quickly growing team, I always felt like I was “over budget.” One-on-ones! Hiring! Onboarding! Code reviews! Design reviews! Incident response! Postmortems! There was always enough time-sensitive work for three of me. That meant that I’d “postpone” the managerial equivalent of maintenance over and over:
Eventually I realized that I needed to have slack by default. It’s okay if I sometimes defer maintenance during much-busier-than-usual periods, but only if I’m honest with myself about what “much busier than usual” actually means. If it’s not one of my 4-8 worst weeks of the year, I should be spending some time on long-term investments.
Of course, this requires me to manage my workload well enough that it’s below my capacity by default. I could still improve at this, but I’ve found a trigger-action plan for when I feel overwhelmed that usually does the job:
It was really helpful for me to realize that it was okay for me to change or discard priorities if I did it right—people are usually quite sympathetic as long as I warn them in advance (e.g. “sorry, I have to slip this deadline / give up on this due to [whatever more important thing]”), so that it doesn’t come as a surprise and they can change their plans or push back.
(See also: Slack.)
I care a lot about my coworkers’ opinions of me. About 95% of the time this is a force for good: it makes me less likely to do low-integrity things, go on power trips, etc. The other 5% is when I, e.g., say something to Dave the product manager that comes out wrong and spend the next six weeks stressed about whether Dave is secretly steaming at me.
I had a very illuminating conversation about this with Drew at one point:
Ben. I’m worried I pissed off Dave the product manager by saying something that came out wrong.
Drew. Have you asked him whether you pissed him off?
Ben. (facepalming) I should have known you were going to say that.
(Since then, I’ve been on the other side of this exact conversation with most new managers I’ve supported! So if you feel silly for not asking them yet, you’re in excellent company.)
If you’re worried that you made someone upset and you ask them about it, one of three things can happen:
You didn’t upset them, they tell you about it, and you can stop stressing.
You did upset them, but they’re understanding about it, and glad that you opened up a conversation. You can apologize and figure out how to do better next time, and they’re happy that the situation seems likely to improve.
You upset them so deeply that they respond by unleashing the incredibly vicious-yet-perceptive tirade that they’ve been stewing on since the incident, reducing you to tears. Congratulations on hiring someone in the bottom ~2% of professionalism? At least your conscience can be clean at this point I guess.
This also applies to most other things you might be worried about. Is my team’s strategy good? Does this recurring meeting add value? Is this new hire spinning up fast enough? Just ask people!
If you’re worried that they won’t be honest if you ask them directly—maybe because you barely know each other or there’s a large power imbalance—you can ask for a backchannel from your manager or theirs. Similarly, having your own manager do skip-level 1:1s with your reports can give you more perspective and confidence that your team is happy with you.
There are a few core reasons that being a new manager is hard:
It requires an almost completely different set of skills than the ones you’ve been building so far.
The scope of what you’re responsible for (the health of an entire team) is much broader. You can’t just focus on, say, writing good code—you need to worry about prioritization and planning and hiring and coaching and running meetings and…
Similarly, the set of actions you can take is much broader, so it’s harder to figure out what to focus on.
You’re less likely to get great support and mentorship—most companies are much better at supporting new ICs than new managers.
Because of that, you should expect to make a bunch of mistakes while you’re starting out. But it’s still useful to know a basic set of pitfalls to avoid, so that you can spend your quota on new, exciting types of mistakes instead :)
2023-02-01 08:00:00
Last Friday was my last day as CTO at Wave, capping an incredible ~8 years filled with more professional and personal growth, joy, and meaning than I could have hoped for.
This was the hardest decision I’ve made. I’m still just as excited about Wave’s mission and trajectory. And most of the important work is still ahead of us: as of yet there’s only one of our potential markets where we move over 70% of GDP.1 Most wrenchingly, I’ll be saying goodbye to a huge number of dear friends, inspiring colleagues, and incredible mentors.
It’s been the opportunity of a lifetime—helping grow Wave’s engineering team has taught me a tremendous amount, and by helping build something that’s important to millions of underserved people, I’ve been lucky to have more positive impact than I ever expected to be able to. It’s hard to describe how much Wave has meant to me, though I hope to capture more in a future post.
But over the last few months, I’ve had two key shifts that made me think about working on something else.
First, it finally feels like Wave is at a point where I’m not critical for engineering work to go smoothly. We’ve hired and grown lots of other amazing people who can take over every part of my job—building product, making it scale, hiring, and building teams and organizations—without needing much input.
Second, recent progress in AI has made me pretty worried about whether that transition is on track to go well for society. For example, large language models are becoming human-level at new tasks faster than anyone expected, but we don’t know how to provide almost any guarantees about their behavior; our best known safety measures, like those on ChatGPT, are easy to bypass. For a model with a ChatGPT-ish level of capability, this is concerning but not catastrophic. But if you extrapolate the scaling laws out a surprisingly short time, things start to look scary.2
Because of this, while there’s never a good time to change direction, now felt like the least-worst time to take a leap.
When I looked at what organizations were working on AI alignment, and filtered down to the ones where it looked like the engineering leadership skills I’ve built at Wave could be most useful, the clear standout ended up being Anthropic. Anthropic builds large language models, similar to the one behind ChatGPT,3 and does safety research on them—for example, training them to be more helpful, less harmful, and more honest.
Sidebar to avoid contributing to any potential information cascade: I want to emphasize that this doesn’t mean I personally strongly endorse everything Anthropic does, and that much of my excitement about Anthropic at this point comes from the vetting of other people I trust, not from my own first-principles reasoning.
In particular, for example, Anthropic probably has at least some effect of accelerating the “race through the minefield” to develop transformative AI. They try to be thoughtful about how to balance this trade-off (e.g. holding back the release of their chat model), and most of the people I got advice from thought their existence was worth the trade, but some were quite uncertain.
I intend to get to the point where I can have my own first-principles takes on what I do/don’t endorse, but there are a lot of moving parts here and I’m not very familiar with the space, so I expect it to take a while for me to flesh out my understanding.
Aside from their mission, the thing that makes me most excited about Anthropic is the quality of the team: I originally heard of them because they employed more people I thought were super cool than any other company.4 That high bar has let them do an impressive amount for their size in a short time (~2 years)!
While Anthropic’s team is relatively small right now, it’s growing quickly, and I’m excited to try to help navigate that rapid growth using what I’ve learned at Wave. I don’t know exactly what I’ll end up doing, but my plan is to show up, start working on whatever seems most useful, and iterate from there, the same way I did last time.
I’ll also be moving to the Bay Area (likely Berkeley) later this year to work onsite, so will be in the market for new friends—say hi if you’re in the area and interested in meeting up!