Stay SaaSy

We started this blog to share what we’ve learned on how to scale product and engineering through all stages of startup hypergrowth.

Leading From The Front

2025-04-20 14:59:22

Leading from the front means leading where the action is happening.

Here we’ll talk about three critical aspects of leading from the front:

  • Why you should do it
  • Finding the front
  • Leading when you’re there

Why you should do it

Leading from the front is important for two main reasons:

  • It makes you a more effective leader
  • It motivates your team

Learning from the front

First, leading from the front makes you a more effective leader because you learn from the front.

Being on the front lines - in the incidents, in the customer calls, in the meltdowns - gets you critical information with first-hand knowledge. There’s a major difference between “an engineer made a Jira ticket that I saw that said this was an important incident followup” and “I saw this happen and if we don’t do this followup we deserve to fail. It’s not a question - we’re doing it.”

But it not only gives you the information, it also gives you the motivation. Being woken up at 3am, or having a $1M customer scream at you, catalyzes action like few other things can. Leading from the front gives you the literal incentive and neurological shock to overcome the forces of inertia, politeness, and busy-ness that normally hinder action.

Motivating from the front

Second, effort is contagious.

Most people will try really hard as long as they know there’s at least one person who will storm the hill, who will go into oncoming fire and spit bullets back at the enemy.

The problem with large organizations is that those people often get promoted and then stuck in meetings all day. They’re no longer on the frontlines. They might swoop in to help, but that’s not the same as being in the foxhole.

This is why leading from the front is so important. You literally just need one single leader with general stripes who is in the foxhole with the team saying “I ain’t going nowhere. I have nothing more important than fighting with you.”

That’s the difference between a motivated org and a disengaged org. Somebody who has authority and power is in the stuff with you.

You might say - but everyone is incentivized to try hard, why do we need this? Incentives are necessary but only go so far. In fact, they don’t cover two critical cases.

The first case is collective action. I might be incentivized to try hard, but if my whole team is phoning it in, I don’t believe my efforts are worth it. But if a leader comes in and gets these bozos to rise to the occasion, I’m ready to charge the hill. Nobody wants to be the hardest worker on the last place team.

The second is that you can only unlock the higher order gears like professional pride, duty, and honor with the moral authority of putting yourself on the line. Yeah, I’ll try hard enough if I’m incentivized. But if there’s a leader out here who looks just as intense at 5 hours into this incident as they were 5 minutes in, with no inkling of quit in them, if there’s someone who is just hardwired to never give up, I can sprint.

This is why coaches matter.

Finding The Front

To be that contagion, to lead from the front, the most important thing is that your leadership is predictable and consistent, and that you know where the front is.

Most leaders think that the most important thing is that they are reachable if things get bad enough. While this is a worthy goal, it regularly and quickly falls apart.

A leader says “if things get bad enough, call me, page me, I’ll come running”. And when the leader says that, they mean it. But then that leader:

  • Goes to a dinner with new investors that absolutely must go well
  • Has an all week onsite with a partner company where they absolutely must finish the integration while they’re in San Jose
  • Goes to a conference to speak and will be out of pocket for at least 8 hours

Quickly you find that even though that leader genuinely wants to help when things get bad, they’re simply too busy.

The right way to lead from the front is to be predictable and consistent. There are a lot of ways to do this, but it can look like:

  • I have an operational standup every other day. I never miss it twice in a row. Ever.
  • I will not do big blocking events more than one or two weeks per quarter.

This is what committing to leading from the front looks like. It means a commitment to being present.

Note that not every role can be on the front with these time requirements. That’s ok. It’s important to be clear if you can actually commit to leading from the front. It doesn’t make you less good at your job to admit this.

Some leaders might also say - hey traveling and being out of pocket is the front lines, isn’t it? Well yes, it is, but it’s often just a different front line than the one your team is on. Some generals need to go to Washington to secure funding. They’re just as important - if not more - than the frontline generals. The problem arises if the general in Washington believes they’re both (and doesn’t find someone to man the front).

Beyond just being available, you must also know where the front is. In some leadership positions, without intentionality you can be boxed out of any of the real action (which happens in teams you’re not part of). To solve for this, you must ensure you have checkpoints to coordinate with your teams’ goings-on.

Leading At the Front

Ok, you’ve decided to lead from the front, you’ve actually set yourself up to be able to do it and be there. Here’s how to lead when you’re there.

First, you have to know what you’re doing. Most leaders avoid leading from the front by never investing enough to actually know what to do when they’re there. To lead from the front you must be ready when the battle starts with skills that matter. These include:

  • Coordinating resources to drive outcomes. Most proverbial battles involve multiple stakeholders and a specific outcome - a sale, an incident, a decision. Organizing stakeholders to decide and deliver outcomes is the core of almost every activity on the front.
  • Basic debugging skills. Whether it’s technical debugging, or product behavior, or the intricacies of pricing and packaging - knowing enough to drive an outcome is essential.

This is why it’s so important for engineering managers to go on-call - to learn the basics of how to debug in a system. It’s why it’s important for design leaders, for example, to use and test the products under their remit.

To learn to coordinate resources you must simply do it. Shirking away from hard product decisions, or arbitrating disagreements, or late night incidents will only entrench your lack of ability. Confidence in how to wrangle people, how to organize thought and action, and how to drive next steps is only learned from experience.

Second, your demeanor must be poised and determined and urgent. If you enter an incident hysterical or say “how the hell did this happen”, you are a distraction. Go fly a kite. Instead, your presence must imbue the team with a sense of “it does not matter how we got here. We will fix this and I will be here until we do.”

You must also, critically, have the energy. One of the most common things to fail at in frontline leadership is giving up too early. “We’ve already been at this for 8 hours, do we really need to dot this i and cross this t?” In incidents small and large, if you leave an ember burning you will be woken up by another fire. As a leader, you likely won’t be in the deep deep weeds. Your job is to make sure that people don’t give up before the fire is out. Your job is to be measured and calm, while being urgent and intense. You will not quit and you will not yell. Inhabiting this duality is one of the most powerful skills of leadership.

Finally, you must follow through. You must make sure the things that caused this fire drill will never happen again. Lots of people can help out when things fail, but to have the grit to follow up on the root causes and drive the change to prevent recurrence is a superpower.

If you can put all of these together, you will be a great leader.

Summary

Most people will not lead from the front. They won’t learn what’s needed to be useful when they get there. They won’t block the time to be able to show up. Or when they get there they’ll be a distraction. Or after being woken up twice at 3am in a month they’ll say “this really just isn’t for me.”

But those that do will earn the respect of their team. They’ll motivate their team to be better than the sum of their parts. And they’ll deliver outcomes that are outsized to their resourcing.

In summary:

  • Leading from the front is important because it gives you the information and motivation to effect change. It is also incredibly motivating for your team.
  • To be on the front you must be available. Blocking time to deal with operational issues and being available for issues is key.
  • To lead once you’re on the front lines, you must be capable, poised, and able to manage follow-ups with intense focus.

The Anti-patterns

For those who learn better from anti-patterns, reminder that great ways to not lead from the front are:

  • Being on the road and unavailable all the time
  • Being hysterical and blameful
  • Being forgetful and unable to track follow-ups
  • Being afraid to work hard off hours

The Precise Language Of Good Management

2025-04-06 14:59:22

As a manager, your words are your bond

In the squishy realm of managing humans, the specific things you say have specific outcomes.

Unfortunately, most managers are very bad at speaking precisely. Speaking precisely, especially about long-term, uncertain things, is not something many people do by nature.

To compound issues, there are a variety of factors that drive managers to speak non-specifically and say the wrong thing. These include things like:

  • Nervousness or laziness
  • Lack of experience
  • Not wanting to disappoint people
  • Not wanting to seem like you don’t have the answer

Let’s explore some common examples of imprecise language and how to fix them.


Common Examples

How Am I Doing?

The most common example of imprecise language is when someone asks you in a 1:1 “how am I doing?” Very few managers are ready to answer this question well on the spot. But managers answer the question anyway and often say things like:

“Oh you’re doing well, communication could improve a bit but overall you’re doing well.”

Well, what often happens is that the performance round happens and the person gets below expectations.

“I thought I was doing well?”

Yes, you did say that. You didn’t say that the communication piece is actually a major factor in their performance that is causing regular friction in the team.

The right answer in this situation is almost always to say that you need some time to gather your thoughts and will come to the next 1:1 with a more considered answer. By default this should be your answer any time anyone asks you how they’re doing, because unless you hold a ton of information in your head at all times and have regularly reviewed it for an accurate synthesis of their performance, there’s no way you can answer that question specifically, or well.


Performance Assessments

I mentioned this topic in our article on writing performance reviews. It’s extremely common for managers to use vague and overly flowery language in performance reviews.

Imagine you’re a junior engineer and you get a review that says:

“Angela is a great engineer and always up for a challenge.”

Problems:

  • If Angela is a junior engineer and is already “great”, what does growth look like? Have they already peaked?
  • Feedback is to either tell someone what to change or what to keep doing. “Great engineer” does none of that.
  • People change. You’re describing them as having invariant properties vs. describing things they’ve done. This makes it harder for people to process feedback that’s counter to these claims in the future.

So, be very specific in what you’re calling out, and give feedback around actions, not attributes. Good examples:

  • “Angela showed impressive persistence in delivering her feature through several requirements changes”
  • “Angela is a detail-oriented code reviewer, much to the benefit of our team”

Can You Do This?

Another common example of unspecific language happens when your boss asks you if your team can do something. There are two common, bad answers that people habitually use:

  • Yes, we can (The refrain of yes men everywhere)
  • No we can’t, we have too much to do, we’d have to make tradeoffs that would collapse the company (The refrain of people who are afraid of their team)

The problem is that neither of these answers is specific or true. The right answer here is to gather more information and circle back. But if there are meaningful tradeoffs, the right answer is always:

“We probably can but let me draw out the implications of taking on this work so we can align on priorities.”


Summary

The point here is that specific language really matters. “Good” is not the same as “persistent”. “Soon” is not the same thing as “6 months.”

One of the reasons why writing things down all the time is so useful is that it forces people into more measured, specific language. If you’re a manager, using measured and specific language needs to be a skill you learn ASAP.


Appendix: More Examples

Here’s a bunch of examples where people often use vague language when they shouldn’t:

Promotions

  • Don’t: “I think a promotion soon is looking good.”
  • Do: “We have two promotion cycles in the next 12 months. I think the one in 3 months is less likely and the one in 9 months is about 80% there. Let’s discuss why I’m thinking about those probabilities and we can make sure you feel that’s fair.”

PIPs

  • Don’t: “Hey these last two weeks have been great.”
  • Do: “These last two weeks have been at the level of output we determined was necessary for you to succeed in this PIP. You’ll have to keep this up for the remainder to pass.”

Hiring

  • Don’t: “Yeah you can basically choose your team once you’re in.”
  • Do: “We have three teams that have open roles right now. One of them you aren’t a fit for based on your background, so realistically that leaves two, and of those two the X team seems like a better fit, so if you wouldn’t want to join that team, let’s talk.”

Goals

  • Don’t: “That’ll get done next quarter.”
  • Do: “Our target date for that is March 1st.”

Performance assessments

  • Don’t: “always”, “great”, “unbelievable”
  • Do: “consistently”, “outstanding”, “leader amongst their cohort.”

Growth paths

  • Don’t: “You should focus on more leadership opportunities this half.”
  • Do: “Your goal this half should be to lead project Y. Starting tomorrow I will name you the project lead and we’ll have checkins monthly.”

Upward feedback

  • Don’t: “This new bonus structure is stupid and punitive. Everyone hates it.”
  • Do: “4/9 of my team have expressed severe concerns about this new bonus structure and I personally agree. I believe the schedule of money is non-standard and creates a feeling that our comp is always in flux.”

First Principles Problems, Secondhand Solutions

2025-04-02 14:59:22

If you spend a lot of time in tech, you’ll inevitably hear people extolling the virtues of being a First Principles Thinker – that is, someone who analyzes situations in terms of foundational axioms and then uses their impeccable reasoning to determine a bold and original course of action.

But if you’ve spent significant time operating a business, it’s obvious that the solutions to your problems are rarely unique. In business the optimal move is often just to reason by analogy quickly and decisively.

So I’m going to propose a somewhat different way to break down the debate on whether reasoning from first principles or reasoning by analogy is better. In most situations that I’ve seen, you’ll get better results if you:

  • Reason from first principles to establish what your problems actually are.
  • Reason from analogy to figure out what solutions you should actually deploy.

A First Principles Example: Retention

Let’s take a conceptually simple problem – why is retention low? This seemingly straightforward problem could have many possible root causes:

  • Is the product weak? This could be due to poor product management, design, or engineering. Weakness in any of the three, or a combination, could lead to similarly bad metrics.
  • Is something about our support process failing? Is the support team’s leadership bad? (A weak product and a weak support team can manifest in similar symptoms, such as poor response times and broken integrations)
  • Is the customer success team failing? Are they under-resourced, incorrectly incentivized, or just not operating well?
  • Have we oversold customers because we were forced to stretch out of our Ideal Customer Profile (ICP)? Did sales oversell, or sell to bad-fit customers?
  • If sales oversold some customers, was it an execution issue on their side, or were they effectively forced to oversell because they didn’t have enough pipeline? (And keep in mind that a marketing problem might go all the way back to the top – it’s often a product issue!)

With this many possibilities, you can’t just draw from your past experiences to identify the core problem. Unfortunately, it’s all too common for experienced leaders to jump to the conclusion that whatever was most broken at their last team or company is broken again. Poor retention? Seen it before, fire the Head of Support. Seen it before, the product is broken – fire the Head of Product. Seen it before, you need to replatform to remove tech debt. Seen it before, you need to hire more customer success.

This incurious approach consistently leads to poor outcomes. In about 75% of cases that I’ve seen, identifying root cause problems by analogy ends with blaming the nearest leader who’s a weak communicator, somewhat abrasive, overly passive, or all 3. This bystander is usually a problem, but he’s often not the problem.

Using first principles to identify problems is like solving a math proof: Start from your ground truths, use them to generate hypotheses, and check whether those hypotheses would logically lead to the visible symptoms. A good litmus test is whether you can articulate your root problem using the word “therefore.” For example, retention is low because:

  • We can see that adoption of our features is low, especially in customers who churn. We also know that our engineering team has high velocity and support response times / CSAT have been improving – they’re likely not the problem. Therefore, we’re probably building the wrong things which is leading to failing product-market fit.
  • We can see that our Security team buyers are churning at disproportionately high rates, but our Engineering buyers seem pretty happy. Our growth rate is only barely acceptable as-is. Therefore, we’re likely overselling a Security offering that is not actually competitive in the market, causing customers to churn.

Closely observing the problem also helps. In the retention example, if you actually talk to 5 customers who are churning your learnings should align with your first principles analysis.

There Are Many Solutions, But Not Infinite Solutions

But once you correctly identify your problems, there’s a relatively short list of common solution archetypes in startups and business writ large. Your problem might be unique but the optimal solution is probably not.

As a result, you should pull solutions from a library of similar scenarios wherever possible – this not only improves the odds that you’ll get to a good outcome, but also critically lets you move much faster. The faster you implement fixes the less accurate you need to be, because you can take more shots on goal.
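
To make the shots-on-goal arithmetic concrete, here’s a rough sketch in Python. It assumes, purely for illustration, that each attempted fix succeeds independently and with the same probability. Real fixes are rarely that tidy, and the function name and the numbers below are hypothetical, not data from any company.

```python
def chance_at_least_one_fix_works(p_success: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent fixes succeeds.

    Assumption (illustrative only): every attempt succeeds with the same
    probability `p_success`, independently of the others.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # All attempts fail with probability (1 - p)^n; subtract that from 1.
    return 1.0 - (1.0 - p_success) ** attempts


# One slow, carefully analyzed fix vs. three quick, less accurate ones.
# Both probabilities are made-up inputs chosen only to show the effect.
slow_and_careful = chance_at_least_one_fix_works(0.7, 1)
fast_and_rough = chance_at_least_one_fix_works(0.4, 3)
```

Under these assumptions, three quick attempts at 40% each land at roughly 78%, ahead of a single 70% attempt. The exact numbers don’t matter; the point is that speed buys you extra attempts, and extra attempts can compensate for lower accuracy per attempt.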

A list of some of the most common solutions to have in your repertoire:

  • You have a bad leader at some level of the organization – change them out.
  • You are not following best practices in an important function and must re-establish high-quality execution, typically via a time-consuming but conceptually simple back-to-basics approach. This is most common on teams like sales, engineering, and support that have a large number of people doing the same fundamental job.
  • You are in the wrong market, and should be shifting the markets where you focus in a dramatic way, e.g. by going up-market or refocusing your geographic footprint.
  • You are misallocating financial resources and need to rebaseline. For example, marketing is underfunded vs. sales. Engineering is underfunded vs. customer support.
  • You are doing too much and must re-establish focus by cutting projects.
  • Your pricing and packaging are unworkable in some regard and need to be changed.
  • You must fix broken incentives – for example, compensating sales based upon their deals renewing, not just deals closed.
  • (In startups) You do not have product-market fit and need to reset the company.

Using first principles to determine solutions can have equally bad outcomes.

The first pattern is a tendency to over-complicate simple situations. A classic example is a startup that crafts an entirely unique pricing structure, because they have some logical first principles argument that their custom pricing scheme makes more sense than all known industry standards. Enterprise buyers show up, get confused, and run in horror until the startup picks a standard, boring pricing scheme.

Another common failure mode is moving too slowly because teams don’t realize that they’re in a known scenario. For example, maybe our startup is losing competitiveness because we’re building too slowly. The answer is not to get advisors or run a huge analysis on your unique velocity challenges; the answer is almost certainly that you need more senior engineers, better PMs and designers, a culture that pushes everyone to ship with urgency, and basic processes to improve quality of products produced.

You Need Both

It’s worth mentioning that in many circles first principles thinking is viewed as some sort of inherently superior activity – first principles thinking is treated like some kind of lightsaber that only the most worthy can wield to invent new secrets of the universe.

Maybe that’s true in theoretical physics, but in 99.5% of situations it’s just a tool. Few B2B SaaS startups that I’ve seen are busy discovering new laws of the universe. To be effective you need to have the skillset and willingness to both think from first principles and reason by analogy as the situation demands.

This is a really powerful combination, and some of the very strongest leaders that I’ve worked with are older, more experienced people who have a strong ability to reason from first principles about their problems and then draw from a huge array of potential solutions. Because really, the key is to have a bit of humility: To admit that maybe your experience doesn’t mean that you know best, or that maybe the sheer force of your intellect isn’t the only solution.

Managing People You Can’t Fire

2025-03-26 14:59:22

One of the worst situations in management is needing to fire someone and getting blocked. This happens somewhat regularly and is one of the most trust-breaking experiences between a manager and their boss. Let’s talk about why it happens and how to right-size the situation.

The Ingredients

Managers end up needing to fire someone for many reasons, the most common being:

  • Behavioral issues
  • Significant underperformance

Almost all managers loathe the idea of firing someone, so when they advocate for it, it’s almost certainly needed. Think about that again - the point is important. Much, much more often managers are late to fire someone. Because firing is such a traumatic experience on both sides, managers do not come to the conclusion lightly.

So if you’re a manager’s manager and they come to you asking for a termination, the chances that it’s the right thing to do are very high.

Despite this reality, many managers of managers often block the termination, or say not now. The reasons they do this vary, but include:

  • They have a personal relationship with the person being fired and intervene.
  • They think the manager is being too harsh. Note - this happens, but again, it’s really not common.
  • They don’t trust their hire to get it done without that person.
  • They’re being risk averse.

From the perspective of the manager in this situation, this was the formative moment where your boss could show that they trust you. You regularly take on work for them, make them look good, accept their critical feedback. All of that is worth it for their respect and trust (and the money). And at this moment, the time where they could really prove they trust you - they do the opposite. They do something that shows that you are in fact just on training wheels.

Then it’s not like HR can write down “manager was probably right, overridden by their boss.” So in cases of behavioral issues, including with the manager, there is often some documented feedback that the manager was wrong and needs to improve themselves.

Finally, the manager then has to keep on living with that underperformer and dealing with all of the chaos they cause.

The result of this triple whammy is often a deep-seated frustration on the part of the manager, which often leads to them quitting. And it’s an issue that doesn’t go away, because if that person needs to be fired in the future, the manager’s boss is now invested in that not happening, so as not to be proven wrong.

A Better Way

There should only be two versions of firing decisions.

The first is learning mode. The boss is clear to their new manager that for the first 6 months they’ll be the final decision on any performance or firing decisions, because the new manager needs to understand the culture and calibrate to organizational standards. Set the expectation clearly, and it’s fine.

The only other mode is trust. Again, extremely few managers are going to offer up a termination proposal without cause. If you’re sure it’s a mistake - intervene. If you’re unsure - the tie goes to the manager. They’re living the reality, they deserve that trust.

Summary

In summary:

  • If a manager is proposing a termination, it’s almost always needed
  • That moment in time is an opportunity for their boss to show if they trust them or not
  • So, they deserve to be trusted

Note: there are grizzled VP/C-level psychopaths who fire people like eating Tic Tacs, but this post is not about them. The managers in question are managing ICs, probably have been managing for under 7 years…etc.

Tips For Better Interactions

2025-03-17 14:59:22

The following are an assortment of tips for having better interactions and better meetings.

Don’t Be Frustrated

Don’t ever agree to the premise that you’re “frustrated” or “upset” at work.

People will often consciously or subconsciously say things like “I know you’re frustrated, but…”

If you’re in a meeting and this happens, cut them off. Say “to be clear I’m not frustrated I’m…”

  • Just escalating an issue
  • Just getting more information
  • Just making sure that X is accounted for

… or whatever

Once you allow the premise that you’re frustrated, everything that happens after is someone dealing with a frustrated person, which is never to your benefit.

Another benefit of this technique is that if you are frustrated, it acts as a mantra to remind yourself to stop acting that way ASAP.

Taking Notes

One of the simplest ways to be a better leader is to be the note taker in meetings.

Taking notes:

  • Is important. You codify the discussion.
  • Is humble. It shows everyone an act of service.
  • Helps you shape the conversation. If you can’t write the logic, the logic isn’t happening.
  • Gives people something to look at (present the notes).
  • Gives you a chance to slow the meeting if things get emotional, and lets you create natural pauses to get things back on track.
  • Shows people that their opinion is both heard and recorded, by way of visibly creating a record. This acknowledges contributions and dissuades people from saying wildly stupid stuff (e.g. not wanting to be codified as the person that won’t let the new company-wide process happen because it’ll harsh their vibe when they work from their Tahoe house).

Of all the ways I’ve tried to get people to run meetings better, having them take notes while facilitating is the only one that works reliably and well. I’ve seen people go from rambling rabbit hole-ing messes to focused meeting facilitators with just this one change.

Avoid Boundary Objections

So much of bad meeting feedback is “if you take this idea too far we’ll have problems.” This is one of the most useless pieces of feedback.

If someone says “we should let people choose lunch once a week,” and you respond with “if we let people choose everything around here the inmates will run the asylum!” - you’re being actively counter-productive.

Boundary objections are almost never realistic and are often disrespectful - they imply that the presenter is dumb, foolish, or would act with low integrity. No Alan, I’m not proposing we let employees choose executive comp just because I said once a week they should be able to choose pizza or burgers.

Boundary thinking has a place, namely in thinking about the behavior of engineered systems. Beyond that, it’s almost always a negative distraction.

Let People Be Wrong

Another way people often waste time in meetings is chasing the accuracy of details that don’t matter. The meeting is about whether we should code freeze on the 10th or the 11th, and your department head is debating a detail on slide 5 about how many customers we have in Italy. It doesn’t matter.

People do this all the time, because:

  • Some people see it as their job to correct every mistake they see. ATTN: it’s not.
  • Some people get itchy if they think something is wrong. Instead of living with the tension of an unimportant thing they don’t agree with, they need to resolve it.
  • Some people are well intentioned but don’t realize they’re wasting time.
  • Some people believe that not calling out something they think is false is implicit agreement with it.

Let unimportant things go even if you think they might be wrong. If you really need to do something about them, you can:

  • Say “I don’t agree with all the details here, but it doesn’t matter, I think __”
  • Send a follow up to a meeting or discussion with a note on the thing you think is false

Time Your Interactions

Important meetings shouldn’t be on Monday or Friday, and should never be the last meeting of the day, and shouldn’t be during lunch hour, and shouldn’t be changed by calendaring software, and nobody should be off camera (if remote).

Narrowly, avoid all of these in particular for 1:1s. Dynamic calendaring software that often moves 1:1s around to “optimize” your calendar does more harm than good. There’s value in having meetings at the same time every week. It allows for a rhythm of preparation and a consistency in environment (e.g. morning vs afternoon).

Prioritize Your Meeting Agenda and Don’t Force Yourself to Get To Everything

“We have to move on to get to the rest of the agenda” is a sign of a bad meeting.

Your meeting should have an agenda with topics in priority order. You move on from things when you’re done with them.

Bad meetings focus on moving on to the next item even when the current item is unresolved.

This is dumb.

Stop having meetings with 5 topics that get cut off and produce no value. Discuss 2 points well and find time for the rest or punt them.

Companies that can’t run good meetings almost always can’t prioritize well. An overbooked, under-prioritized meeting is a symptom of an overbooked and under-prioritized company.

Relatedly, don’t try to rush the end of a meeting to finish something. It’s always better to find a separate time. One of the most common unforced errors is rushing things. A great way to be wrong is to be right in a rush.

Blameful Post-Mortems

2025-03-12 14:59:22

Over the last decade, the concept of a “blameless” post-mortem has become a software industry standard. According to ChatGPT, blameless was introduced to the software world by a 2012 blog post:

Having a “blameless” Post-Mortem process means that engineers whose actions have contributed to an accident can give a detailed account of [what happened]...and that they can give this detailed account without fear of punishment or retribution.

Why shouldn’t they be punished or reprimanded? Because an engineer who thinks they’re going to be reprimanded are disincentivized to give the details necessary to get an understanding of the mechanism, pathology, and operation of the failure. This lack of understanding of how the accident occurred all but guarantees that it will repeat. If not with the original engineer, another one in the future.

The intent of this philosophy is good - let’s avoid fear of unfair punishment so we can learn from incidents.

The problem is that there is a metric ton of nuance in these statements. Instead of finding a middle ground of accountability and empathy, many companies ran with these principles into a no-man’s land of non-accountability.

Let’s talk about why. But first, some background.

The History Of Blameless

The idea of blameless post-mortems came from aviation and healthcare. I’m not an expert, but I think that the situations that prompted these industries to conduct blameless post-mortems had the following qualities:

  • Severity: Something really really bad happened. A plane crashed. A person died in the operating room.
  • Frequency: Incidents are generally infrequent. Most people are rarely involved in a post-mortem.
  • Punishment: The punishment for negligence can be extraordinarily high, e.g. murder charges. You need to reassure people that they aren’t at risk of criminal punishment for well-intentioned mistakes.
  • Recurrence: Given the severity of these incidents, you need to be absolutely sure you are doing every possible thing to prevent recurrence.
  • Follow-through: Following an incident, official governing bodies change official rules or laws.

This all makes sense. If a plane crashes at the Toledo airport once in 50 years, we gotta make sure the person who screwed in the lug nuts isn’t afraid to tell us that their wrench did seem a little loose that day.

Blameless in Software

Post-mortems in many software companies happen regularly. Many companies do weekly incident reviews. The properties of software post-mortems look like:

  • Severity: Comparatively, most software incidents aren’t that bad. Your average incident might be something like an API going down for 5 minutes. You gotta fix it, but they’re not hauling bodies out of the bay.
  • Frequency: Frequent. Often weekly.
  • Punishment: Listen, if you cause a 5 minute API outage and we’re holding you accountable, you’re probably not even getting rated Below Expectations at most companies. Do it twice and you probably are. Do it three times and you have to go find another job that might pay you 10% more. You’re not going to jail.
  • Recurrence: We do want to avoid recurrences of software incidents. It’s very important.
  • Follow-through: Follow-ups are typically owned by teams, often with medium-level priority, sometimes forgotten.

Software vs Aviation/Health

In summary, software post-mortems are much lower severity and much more frequent than aviation or healthcare post-mortems. In fact, they’re so common that they’re a critical part of regular accountability and learning in software organizations.

As a result, software culture becoming too blameless is just as bad as being too blameful:

  • Individuals and teams miss the opportunity to learn. Without actually saying whose fault something is, people can end up living in a world where they never hear the thing that they need to hear most - this one was on you and you must change what you’re doing.
  • Because follow-ups are not as fanatically and centrally implemented, the key driver of quality is accountability, not new rules. When you obscure accountability in software incidents, you create sustained mediocrity.

Said another way, the severity of an average software incident is not so bad that it’s worth the non-accountability of a blameless approach. As a result, it’s critical that software post-mortems tweak these practices to more effectively serve their needs.

Blameful Post-Mortems - A Goldilocks Solution

There must be a middle ground where we can achieve all of our goals. Blameful post-mortems should have the following properties:

It’s Somebody’s Fault

An absolutely critical part of every post-mortem is that it’s somebody’s fault. Every issue is either:

  • Someone’s job and thus their fault when it fails
  • Nobody’s job and the fault of the leader who has an ambiguous organizational design

Note: it must be primarily one person or one team’s fault. If it’s multiple teams that are allegedly at fault, it’s the same as no teams at fault. Driving to the core of who should have prevented an incident is often one of the most fruitful exercises in refining and clarifying responsibilities.

There are no exceptions. Common objections are:

  • “It’s a third party provider, how can that be my fault?” You picked the third party provider, you need to own their outcomes.
  • “I inherited this thing and never worked on it, how can that be my fault?” You might not be in a ton of trouble for this failure, but it’s still your responsibility.

Note: the word “fault” here is admittedly a bit harsh-sounding, but it’s used intentionally. Other words start you on the slippery slope to blameless avoidance. Every incident should have had someone whose job it was to own the prevention or risk mitigation.

Accountability Is Fair

A key failure with most blameless cultures is that people believe it means you don’t have consequences when things break. That’s a non-accountable culture. That’s nonsense. A good post-mortem and engineering culture promises that people will be held accountable in a fair and balanced way.

For example, if there was really no way that you could have been expected to prevent an incident, it might still be your fault, but you might have 0 repercussions. We might walk away agreeing that next time is the one you’re expected to prevent.

On the flip side, if you really messed up, you might get fired. If we said we’re in a code freeze and you YOLOed a release to try to push out a project to game the performance assessment round and you took out prod for 2 days, you will be blamed and you will be fired.

Most repercussions are a middle ground. Good culture doesn’t mean people face all-or-nothing repercussions - it means they face the appropriate accountability.

The Right Incentives

As a quick aside - I absolutely hate how most people treat incentives. So many leaders act like incentives are the only thing you can expect people to follow.

“Hey, my top sales guy sold a $1M deal but did it at -90% margin, but there was no rule against it, so what do you expect?”

“Hey, you created this process of always holding people accountable to incidents, don’t be surprised when people hide stuff, right?”

Horse apples. Nonsense.

There is one main incentive that all employees have - act with high integrity or get fired. Stop excusing people for bad behavior because your little point systems don’t cover every case of common sense.

So, as it applies to post-mortems - be clear with your team:

  • Everything is someone’s fault
  • Something being your fault doesn’t mean it’s a huge deal. Most things are not.
  • If you game the system, hide information, or otherwise prioritize your own rewards over the health of our company, you will be subject to disciplinary action up to and including termination

Blameful Post-Mortems - Final Thoughts

Software isn’t aviation or healthcare, so let’s stop acting like it. Post-mortems are good. Focus on non-recurrence of incidents is good. Let’s keep doing those.

But not having accountability for failures is bad. Let’s stop doing that.

Finally, leaders make or break processes like this. The worst leaders are overly blameful and punish people unfairly, often to cover their own tracks. As we make it back to a more healthy culture of appropriate blame, let’s make sure that leaders are held accountable as well.