
Steve Ballmer was an underrated CEO

2024-10-28 08:00:00

There's a common narrative that Microsoft was moribund under Steve Ballmer and then later saved by the miraculous leadership of Satya Nadella. This is the dominant narrative in every online discussion about the topic I've seen and it's a commonly expressed belief "in real life" as well. While I don't have anything negative to say about Nadella's leadership in this post, this narrative underrates Ballmer's role in Microsoft's success. Not only did Microsoft's financials, revenue and profit, look great under Ballmer, Microsoft under Ballmer made deep, long-term bets that set up Microsoft for success in the decades after his reign. At the time, the bets were widely panned, indicating that they weren't necessarily obvious, but we can see in retrospect that the company made very strong bets despite the criticism at the time.

In addition to overseeing deep investments in areas that people would later credit Nadella for, Ballmer set Nadella up for success by clearing out political barriers for any successor. Much like Gary Bernhardt's talk, which was panned because he made the problem statement and solution so obvious that people didn't realize they'd learned something non-trivial, Ballmer set up Microsoft for future success so effectively that it's easy to criticize him for being a bum because his successor is so successful.

Criticisms of Ballmer

For people who weren't around before the turn of the century: in the 90s, Microsoft was considered the biggest, baddest company in town. But it wasn't long before people's opinions of Microsoft changed — by 2007, many people thought of Microsoft as the next IBM and Paul Graham wrote Microsoft is Dead, in which he noted that Microsoft being considered effective was ancient history:

A few days ago I suddenly realized Microsoft was dead. I was talking to a young startup founder about how Google was different from Yahoo. I said that Yahoo had been warped from the start by their fear of Microsoft. That was why they'd positioned themselves as a "media company" instead of a technology company. Then I looked at his face and realized he didn't understand. It was as if I'd told him how much girls liked Barry Manilow in the mid 80s. Barry who?

Microsoft? He didn't say anything, but I could tell he didn't quite believe anyone would be frightened of them.

These kinds of comments often came with predictions that Microsoft's revenue was destined to fall, such as this one from Graham:

Actors and musicians occasionally make comebacks, but technology companies almost never do. Technology companies are projectiles. And because of that you can call them dead long before any problems show up on the balance sheet. Relevance may lead revenues by five or even ten years.

Graham names Google and the web as primary causes of Microsoft's death, which we'll discuss later. Although Graham doesn't name Ballmer or note his influence in Microsoft is Dead, Ballmer has been a favorite punching bag of techies for decades. Ballmer came up on the business side of things and later became EVP of Sales and Support; techies love belittling non-technical folks in tech[1]. A common criticism, then and now, is that Ballmer didn't understand tech and was a poor leader because all he knew was sales and the bottom line and all he could do was copy what other people had done. Just for example, if you look at online comments on tech forums (minimsft, HN, slashdot, etc.) when Ballmer pushed Sinofsky out in 2012, Ballmer's leadership is nearly universally panned[2]. Here's a fairly typical comment from someone claiming to be an anonymous Microsoft insider:

Dump Ballmer. Fire 40% of the workforce starting with the loser online services (they are never going to get any better). Reinvest the billions in start-up opportunities within the puget sound that can be accretive to MSFT and acquisition targets ... Reset Windows - Desktop and Tablet. Get serious about business cloud (like Salesforce ...)

To the extent that Ballmer defended himself, it was by pointing out that the market appeared to be undervaluing Microsoft. Ballmer noted that Microsoft's market cap at the time was extremely low relative to its fundamentals/financials when compared to Amazon, Google, Apple, Oracle, IBM, and Salesforce. This seems to have been a fair assessment by Ballmer, as Microsoft has outperformed all of those companies since then.

When Microsoft's market cap took off after Nadella became CEO, it was only natural the narrative would be that Ballmer was killing Microsoft and that the company was struggling until Nadella turned it around. You can pick other discussions if you want, but just for example, if we look at the most recent time Microsoft is Dead hit #1 on HN, a quick ctrl+F has Ballmer's name showing up 24 times. Ballmer has some defenders, but the standard narrative that Ballmer was holding Microsoft back is there, and one of the defenders even uses part of the standard narrative: Ballmer was an unimaginative hack, but he at least set up Microsoft well financially. If you look at high ranking comments, they're all dunking on Ballmer.

And if you look on less well informed forums, like Twitter or Reddit, you see the same attacks, but Ballmer has fewer defenders. On Twitter, when I search for "Ballmer", the first four results are unambiguously making fun of Ballmer. The fifth hit could go either way, but from the comments, seems to generally be taken as making fun of Ballmer, and as far as I scrolled down, all but one of the remaining videos were making fun of Ballmer (the one that wasn't was an interview where Ballmer notes that he offered Zuckerberg "$20B+, something like that" for Facebook in 2009, which would've been the 2nd largest tech acquisition ever at the time, second only to Carly Fiorina's acquisition of Compaq for $25B in 2001). Searching reddit (incognito window with no history) is the same story (excluding the stories about him as an NBA owner, where he's respected by fans). The top story is making fun of him, the next one notes that he's wealthier than Bill Gates, and the top comment on his performance as a CEO starts with "The irony is that he is Microsofts [sic] worst CEO" and then has the standard narrative that the only reason the company is doing well is due to Nadella saving the day, that Ballmer missed the boat on all of the important changes in the tech industry, etc.

To sum it up, for the past twenty years, people have been dunking on Ballmer for being a buffoon who doesn't understand tech and who was, at best, some kind of bean counter who knew how to keep the lights on but didn't know how to foster innovation and caused Microsoft to fall behind in every important market.

Ballmer's wins

The common view is at odds with what actually happened under Ballmer's leadership. Among the financially material positive things that happened under Ballmer after Graham declared Microsoft dead, we have:

  • 2009: Bing launched. This is considered a huge failure, but the bar here is fairly high. A quick web search finds that Bing allegedly made $1B in profit in 2015 and $6.4B in FY 2024 on $12.6B of revenue (given Microsoft's P/E ratio in 2022, a rough estimate for Bing's value in 2022 would be $240B; a back-of-envelope version of this arithmetic is sketched just after this list)
  • 2010: Microsoft creates Azure
    • I can't say that I personally like it as a product, but in terms of running large scale cloud infrastructure, the three companies that are head-and-shoulders ahead of everyone else in the world are Amazon, Google, and Microsoft. From a business standpoint, the worst thing you could say about Microsoft here is that they're a solid #2 in terms of the business and the biggest threat to become the #1
    • The enterprise sales arm, built and matured under Ballmer, was and is critical to the success of Azure and Office
  • 2010: Office 365 released
    • Microsoft transitioned its enterprise / business suite of software from boxed software to subscription-based software with online options
      • there isn't really a fixed date for this; the official release of Office 365 seems like as good a date as any
    • Like Azure, I don't personally like these products, but if Microsoft were to split up into major business units, the enterprise software suite is the business unit that could possibly rival Azure in market cap
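The valuation estimate in the Bing item above is just back-of-envelope arithmetic: take a unit's profit and apply the parent company's P/E multiple. Here's a minimal sketch of that calculation in Python; the FY 2024 profit figure comes from the bullet above, while the P/E values and the 2022 profit figure are assumptions chosen for illustration, not exact historical numbers.

```python
# Back-of-envelope segment valuation: unit profit x parent company's P/E ratio.
# The FY 2024 profit figure is from the text above; the P/E values and the
# assumed 2022 profit are illustrative assumptions, not exact historical data.

def implied_value_billions(unit_profit_billions: float, parent_pe: float) -> float:
    """Rough market value of a business unit, in billions of dollars."""
    return unit_profit_billions * parent_pe

# Bing's reported FY 2024 profit of $6.4B at an assumed parent P/E of 35
print(implied_value_billions(6.4, 35))  # ~224, i.e. on the order of $200B+

# The ~$240B estimate for 2022 corresponds to a pair like an assumed ~$8B of
# profit at an assumed P/E of 30
print(implied_value_billions(8.0, 30))  # 240.0
```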

There are certainly plenty of big misses as well. From 2010-2015, HoloLens was one of Microsoft's biggest bets, behind only Azure and then Bing, but no one's big AR or VR bets have had good returns to date. Microsoft failed to capture the mobile market. Although Windows Phone was generally well received by reviewers who tried it, depending on who you ask, Microsoft was either too late or wasn't willing to subsidize Windows Phone for long enough. Although .NET is still used today, in terms of marketshare, .NET and Silverlight didn't live up to early promises and critical parts were hamstrung or killed as a side effect of internal political battles. Bing is, by reputation, a failure and, at least given Microsoft's choices at the time, probably needed antitrust action against Google to succeed, but this failure still resulted in a business unit worth hundreds of billions of dollars. And despite all of the failures, the biggest bet, Azure, is probably worth on the order of a trillion dollars.

The enterprise sales arm of Microsoft was built out under Ballmer before he was CEO (he was, for a time, EVP for Sales and Support, and actually started at Microsoft as the first business manager) and continued to get built out when Ballmer was CEO. Microsoft's sales playbook was so effective that, when I was at Microsoft, Google would offer Google's enterprise suite (Docs, etc.) for free to some customers who were on Office 365. Microsoft salespeople noted that they would still usually be able to close the sale of Microsoft's paid product even when competing against a Google that was giving their product away. For the enterprise, the combination of Microsoft's offering and its enterprise sales team was so effective that Google couldn't even give its product away.

If you're reading this and you work at a "tech" company, the company is overwhelmingly likely to choose the Google enterprise suite over the Microsoft enterprise suite, and the enterprise sales pitch Microsoft salespeople give probably sounds ridiculous to you.

An acquaintance of mine who ran a startup had a Microsoft Azure salesperson come in and try to sell them on Azure, opening with "You're on AWS, the consumer cloud. You need Azure, the enterprise cloud". For most people in tech companies, enterprise is synonymous with overpriced, unreliable junk. In the same way it's easy to make fun of Ballmer because he came up on the sales and business side of the house, it's easy to make fun of an enterprise sales pitch when you hear it but, overall, Microsoft's enterprise sales arm does a good job. When I worked in Azure, I looked into how it worked and, having just come from Google, there was a night and day difference. This was in 2015, under Nadella, but the culture and processes that let Microsoft scale this up were built out under Ballmer. I think there were multiple months where Microsoft hired and onboarded more salespeople than Google employed in total and every stage of the sales pipeline was fairly effective.

Microsoft's misses under Ballmer

When people point to a long list of failures like Bing, Zune, Windows Phone, and HoloLens as evidence that Ballmer was some kind of buffoon who was holding Microsoft back, this demonstrates a lack of understanding of the tech industry. This is like pointing to a list of failed companies a VC has funded as evidence the VC doesn't know what they're doing. But that's silly in a hits-based industry like venture capital. If you want to claim the VC is bad, you need to point out poor total return or a lack of big successes, which would imply poor total return. Similarly, a large company like Microsoft has a large portfolio of bets and one successful bet can pay for a huge number of failures. Ballmer's critics can't point to a poor total return because Microsoft's total return was very good under his tenure. Revenue increased from $14B or $22B to $83B, depending on whether you want to count from when Ballmer became President in July 1998 or when Ballmer became CEO in January 2000. The company was also quite profitable when Ballmer left, recording $27B in profit over the previous four quarters, more than the revenue of the company he took over. By market cap, Azure alone would be in the top 10 largest public companies in the world and the enterprise software suite minus Azure would probably just miss being in the top 10.

As a result, critics also can't point to a lack of hits when Ballmer presided over the creation of Azure, the conversion of Microsoft's enterprise software from a set of local desktop apps to Office 365 et al., the creation of the world's most effective enterprise sales org, the creation of Microsoft's video game empire (among other things, Ballmer was CEO when Microsoft acquired Bungie and made Halo the Xbox's flagship game at launch in 2001), etc. Even Bing, widely considered a failure, would, on last reported revenue and current P/E ratio, be the 12th most valuable tech company in the world, between Tencent and ASML. When attacking Ballmer, people cite Bing as a failure that occurred on Ballmer's watch, which tells you something about the degree of success Ballmer had. Most companies would love to have their successes be as successful as Bing, let alone their failures. Of course it would be better if Ballmer had been prescient and all of his bets had succeeded, making Microsoft worth something like $10T instead of the lowly $3T market cap it has today, but the criticism of Ballmer that says that he had some failures and some $1T successes is a criticism that he wasn't the greatest CEO of all time by a gigantic margin. True, but not much of a criticism.

And, unlike Nadella, Ballmer didn't inherit a company that was easily set up for success. As we noted earlier, it wasn't long into Ballmer's tenure that Microsoft was considered a boring, irrelevant company and the next IBM, mostly due to decisions made when Bill Gates was CEO. As a very senior Microsoft employee from the early days, Ballmer was also partially responsible for the state of Microsoft at the time, so Microsoft's problems are also at least partially attributable to him (but that also means he should get some credit for the success Microsoft had through the 90s). Nevertheless, he navigated Microsoft's most difficult problems well and set up his successor for smooth sailing.

Earlier, we noted that Paul Graham cited Google and the rise of the web as two causes for Microsoft's death prior to 2007. As we discussed in this look at antitrust action in tech, these both share a common root cause, antitrust action against Microsoft. If we look at the documents from the Microsoft antitrust case, it's clear that Microsoft knew how important the internet was going to be and had plans to control the internet. As part of these plans, they used their monopoly power on the desktop to kill Netscape. They technically lost an antitrust case due to this, but if you look at the actual outcomes, Microsoft basically got what they wanted from the courts. The remedies levied against Microsoft are widely considered to have been useless (the initial decision involved breaking up Microsoft, but they were able to reverse this on appeal), the case dragged on for long enough that Netscape was doomed by the time the case was decided, and the remedies that weren't specifically targeted at the Netscape situation were meaningless.

A later part of the plan to dominate the web, discussed at Microsoft but never executed, was to kill Google. If we're judging Microsoft by how "dangerous" it is, how effectively it crushes its competitors, like Paul Graham did when he judged Microsoft to be dead, then Microsoft certainly became less dangerous, but the feeling at Microsoft was that their hand was forced due to the circumstances. One part of the plan to kill Google was to redirect users who typed google.com into their address bar to MSN search. This was before Chrome existed and before mobile existed in any meaningful form. Windows desktop marketshare was 97% and IE had between 80% to 95% marketshare depending on the year, with most of the rest of the marketshare belonging to the rapidly declining Netscape. If Microsoft makes this move, Google is killed before it can get Chrome and Android off the ground and, barring extreme antitrust action, such as a breakup of Microsoft, Microsoft owns the web to this day. And then for dessert, it's not clear there wouldn't be a reason to go after Amazon.

After internal debate, Microsoft declined to kill Google not due to fear of antitrust action, but due to fear of bad PR from the ensuing antitrust action. Had Microsoft redirected traffic away from Google, the impact on Google would've been swifter and more severe than Microsoft's moves against Netscape, and in the time it would take for the DoJ to win another case against Microsoft, Google would suffer the same fate as Netscape. It might be hard to imagine this if you weren't around at the time, but the DoJ vs. Microsoft case was regular front-page news in a way that we haven't seen since (in part because companies learned their lesson from this one — Google supposedly killed the 2011-2012 FTC case against them with lobbying and has cleverly maneuvered the more recent case so that it doesn't dominate the news cycle in the same way). The closest thing we've seen since the Microsoft antitrust media circus was the media response to the Crowdstrike outage, but that was a flash in the pan compared to the DoJ vs. Microsoft case.

If there's a criticism of Ballmer here, perhaps it's something like: Microsoft didn't pre-emptively learn the lessons its younger competitors learned from its big antitrust case before that case happened. A sufficiently prescient executive could've advocated for heavy lobbying to head the antitrust case off at the pass, like Google did in 2011-2012, or maneuvered to make the antitrust case just another news story, like Google has been doing for the current case. Another possible criticism is that Microsoft didn't correctly read the political tea leaves and realize that there wasn't going to be serious US tech antitrust for at least two decades after the big case against Microsoft. In principle, Ballmer could've overridden the decision not to kill Google if he had the right expertise on staff to realize that the United States was entering a two decade period of reduced antitrust scrutiny in tech.

As criticisms go, I think the former criticism is correct, but it's not an indictment of Ballmer unless you expect CEOs to be infallible, so as evidence that Ballmer was a bad CEO, this would be very weak. And it's not clear that the latter criticism is correct. Google was able to get away with things ranging from hardcoding the search engine in Android to prevent users from changing their search engine setting, to having badware installers trick users into making Chrome the default browser, because they were considered the "good guys" and didn't get much scrutiny for these sorts of actions; Microsoft wasn't treated with kid gloves in the same way by the press or the general public. Google didn't trigger a serious antitrust investigation until 2011, so it's possible the lack of serious antitrust action between 2001 and 2010 was an artifact of Microsoft being careful to avoid antitrust scrutiny and Google being too small to draw scrutiny, and that a move to kill Google when it was still possible would've drawn serious antitrust scrutiny and another PR circus. That's one way in which the company Ballmer inherited was in a more difficult situation than its competitors — Microsoft's hands were perceived to be tied and may have actually been tied. Microsoft could and did get severe criticism for taking an action when the exact same action taken by Google would be lauded as clever.

When I was at Microsoft, there was a lot of consternation about this. One funny example was when, in 2011, Google officially called out Microsoft for unethical behavior and the media jumped on this as yet another example of Microsoft behaving badly. A number of people I talked to at Microsoft were upset by this because, according to them, Microsoft got the idea to do this when they noticed that Google was doing it, but reputations take a long time to change and actions taken while Gates was CEO significantly reduced Microsoft's ability to maneuver.

Another difficulty Ballmer had to deal with on taking over was Microsoft's intense internal politics. Again, as a very senior Microsoft employee going back to almost the beginning, he bears some responsibility for this, but Ballmer managed to clear the board of the worst bad actors so that Nadella didn't inherit such a difficult situation. If we look at why Microsoft didn't dominate the web under Ballmer, in addition to concerns that killing Google would cause a PR backlash, internal political maneuvering killed most of Microsoft's most promising web products and reduced the appeal and reach of most of the rest of its web products. For example, Microsoft had a working competitor to Google Docs in 1997, one year before Google was founded and nine years before Google acquired Writely, but it was killed for political reasons. And likewise for NetMeeting and other promising products. Microsoft certainly wasn't alone in having internal political struggles, but it was famous for having more brutal politics than most.

Although Ballmer certainly didn't do a perfect job at cleaning house, when I was at Microsoft and asked about promising projects that were sidelined or killed due to internal political struggles, the biggest recent sources of those issues were shown the door under Ballmer, leaving a much more functional company for Nadella to inherit.

The big picture

Stepping back to look at the big picture, Ballmer inherited a company that was in a financially strong position but was hemmed in by internal and external politics in a way that caused outside observers to think the company was overwhelmingly likely to slide into irrelevance, leading to predictions like Graham's famous declaration that Microsoft was dead, with revenues expected to decline in five to ten years. In retrospect, we can see that moves made under Gates limited Microsoft's ability to use its monopoly power to outright kill competitors, but there was no inflection point at which a miraculous turnaround was mounted. Instead, Microsoft continued its very strong execution on enterprise products and continued making reasonable bets on the future in a successful effort to supplant revenue streams that were internally viewed as long-term dead ends, even if they were going to be profitable dead ends, such as Windows and boxed (non-subscription) software.

Unlike most companies in that position, Microsoft was willing to very heavily subsidize a series of bets that leadership thought could power the company for the next few decades, such as Windows Phone, Bing, Azure, Xbox, and HoloLens. From the internal and external commentary on these bets, you can see why it's so hard for companies to use their successful lines of business to subsidize new lines of business when the writing is on the wall for the successful businesses. People panned these bets as stupid moves that would kill the company, saying the company should focus its efforts on its most profitable businesses, such as Windows. Even when there's very clear data showing that bucking the status quo is the right thing, people usually don't do it, in part because you look like an idiot when it doesn't pan out, but Ballmer was willing to make the right bets in the face of decades of ridicule.

Another reason it's hard for companies to make these bets is that companies are usually unable to launch new things that are radically different from their core business. When yet another non-acquisition Google consumer product fails, everyone writes this off as a matter of course — of course Google failed there, they're a technical-first company that's bad at product. But Microsoft made this shift multiple times and succeeded. Once was with Xbox. If you look at the three big console manufacturers, two are hardware companies going way back and one is Microsoft, a boxed software company that learned how to make hardware. Another time was with Azure. If you look at the three big cloud providers, two are online services companies going back to their founding and one is Microsoft, a boxed software company that learned how to get into the online services business. Other companies whose core lines of business were neither hardware nor online services saw these opportunities, tried to make the change, and failed.

And if you look at the process of transitioning here, it's very easy to make fun of Microsoft in the same way it's easy to make fun of Microsoft's enterprise sales pitch. The core Azure folks came from Windows, so in the very early days of Azure, they didn't have an incident management process to speak of and, during their first big global outages, people were walking around the hallways asking "is Azure down?" and trying to figure out what to do. Azure would continue to have major global outages for years while learning how to ship somewhat reliable software, but they were able to address the problems well enough to build a trillion dollar business. Another time, before Azure really knew how to build servers, a Microsoft engineer pulled up Amazon's pricing page and noticed that AWS's retail price for disk was cheaper than Azure's cost to provision disks. When I was at Microsoft, a big problem for Azure was building out datacenters fast enough. People joked that the recent hiring of a ton of salespeople worked too well and the company sold too much Azure, which was arguably true and also a real emergency for the company. In the other cases, Microsoft mostly learned how to do it themselves, and in this case they brought in some very senior people from Amazon who had deep expertise in supply chain and building out datacenters. It's easy to say that, when you have a problem and a competitor has the right expertise, you should hire some experts and listen to them, but most companies fail when they try to do this. Sometimes, companies don't recognize that they need help but, more frequently, they do bring in senior expertise that people don't listen to. It's very easy for the old guard at a company to shut down efforts to bring in senior outside expertise, especially at a company as fractious as Microsoft, but leadership was able to make sure that key initiatives like this were successful[3].

When I talked to Google engineers about Azure during Azure's rise, they were generally down on Azure and would make fun of it for issues like the above, which seemed comical to engineers working at a company that grew up as a large scale online services company with deep expertise in operating large scale services, building efficient hardware, and building out datacenters, but despite starting in a very deep hole technically, operationally, and culturally, Microsoft built a business unit worth a trillion dollars with Azure.

Not all of the bets panned out, but if we look at comments from critics who were saying that Microsoft was doomed because it was subsidizing the wrong bets or because younger companies would surpass it, well, today, Microsoft is worth 50% more than Google and twice as much as Meta. If we look at the broader history of the tech industry, Microsoft has had sustained strong execution from its founding in 1975 until today, a nearly fifty year run that's arguably been unmatched in the tech industry. Intel's been around a bit longer, but they stumbled very badly around the turn of the century and they've had a number of problems over the past decade. IBM has a long history, but it just wasn't all that big during its early history, e.g., when T.J. Watson renamed Computing-Tabulating-Recording Company to International Business Machines, its revenue was still well under $10M a year (inflation adjusted, on the order of $100M a year). Computers started becoming big business and IBM was big for a tech company by the 50s, but the antitrust case brought against IBM in 1969, which dragged on until it was dropped for being "without merit" in 1982, hamstrung the company and its culture in ways that are still visible when you look at, for example, why IBM's various cloud efforts have failed; and in the 90s, the company was on its deathbed and only managed to survive at all due to Gerstner's turnaround. If we look at older companies that had long sustained runs of strong execution, most of them are gone, like DEC and Data General, or had very bad stumbles that nearly ended the company, like IBM and Apple. There are companies that have had similarly long periods of strong execution, like Oracle, but those companies haven't been nearly as effective as Microsoft at expanding their lines of business and, as a result, Oracle is worth perhaps two Bings. That makes Oracle the 20th most valuable public company in the world, which certainly isn't bad, but it's no Microsoft.

If Microsoft stumbles badly, a younger company like Nvidia, Meta, or Google could overtake Microsoft's track record, but that would be no fault of Ballmer's and we'd still have to acknowledge that Ballmer was a very effective CEO, not just in terms of bringing the money in, but in terms of setting up a vision that set Microsoft up for success for the next fifty years.

Appendix: Microsoft's relevance under Ballmer

Besides the headline items mentioned above, off the top of my head, here are a few things I thought were interesting that happened under Ballmer after Graham declared Microsoft to be dead:

  • 2007: Microsoft releases LINQ, still fairly nice by in-use-by-practitioners standards today
  • 2011: Sumit Gulwani, at MSR, publishes "Automating string processing in spreadsheets using input-output examples", named a most influential POPL paper 10 years later
    • This paper is about using program synthesis for spreadsheet "autocomplete/inference" (a toy sketch of the idea appears just after this list)
    • I'm not a fan of patents, but I would guess that the reason autocomplete/inference works fairly well in Excel and basically doesn't work at all in Google Sheets is that MS has a patent on this based on this work
  • 2012: Microsoft releases TypeScript
    • This has to be the most widely used programming language released this century and it's a plausible candidate for becoming the most widely used language, period (as long as you don't also count TS usage as JS)
  • 2012: Microsoft Surface released
    • Things haven't been looking so good for the Surface line since Panos Panay left in 2022, and this was arguably a failure even in 2022, but this was a $7B/yr line of business in 2022, which goes to show you how big and successful Microsoft is — most companies would love to have something doing as well as a failed $7B/yr business
  • 2015: Microsoft releases vscode (after the end of Ballmer's tenure in 2014, but this work came out of work under Ballmer's tenure in multiple ways)
    • This seems like the most widely used editor among programmers today by a very large margin. When I looked at survey data on this a number of years back, I was shocked by how quickly this happened. It seems like vscode has achieved a level of programmer editor dominance that's never been seen before. Probably the closest thing was Visual Studio a decade before Paul declared Microsoft dead, but that never achieved the same level of marketshare due to a combination of effectively being Windows only software and also costing quite a bit of money
    • Heath Borders notes that Erich Gamma, hired in 2011, was highly influential here
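To make the program synthesis item above a bit more concrete, here's a toy sketch of the "infer a program from input-output examples" idea. This is a minimal illustration, not Gulwani's actual algorithm: it enumerates a tiny, hand-picked space of candidate string transformations, keeps the ones consistent with every example the user has given, and applies a survivor to new rows.

```python
# Toy programming-by-example: given a few (input, output) pairs, find which of a
# small set of candidate string transformations is consistent with all of them.
# The candidate set here is made up for illustration; real FlashFill-style
# synthesis searches a much richer space of string programs.

CANDIDATES = {
    "upper": str.upper,
    "lower": str.lower,
    "first word": lambda s: s.split()[0],
    "last word": lambda s: s.split()[-1],
    "initials": lambda s: "".join(w[0].upper() for w in s.split()),
    "swap 'last, first'": lambda s: " ".join(reversed(s.split(", "))),
}

def synthesize(examples):
    """Return the names of all candidates that match every (input, output) example."""
    return [name for name, fn in CANDIDATES.items()
            if all(fn(inp) == out for inp, out in examples)]

examples = [("Ada Lovelace", "AL"), ("Grace Hopper", "GH")]
matches = synthesize(examples)
print(matches)                            # ['initials']
print(CANDIDATES[matches[0]]("Dan Luu"))  # 'DL', the inferred fill for a new row
```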

One response to Microsoft's financial success, both the direct success that happened under Ballmer as well as later success that was set up by Ballmer, is that Microsoft is financially successful but irrelevant for trendy programmers, like IBM. For one thing, rounded to the nearest Bing, IBM is probably worth either zero or one Bings. But even if we put aside the financial aspect and we just look at how much each $1T tech company (Apple, Nvidia, Microsoft, Google, Amazon, and Meta) has impacted programmers, Nvidia, Apple, and Microsoft all have a lot of programmers who are dependent on the company due to some kind of ecosystem dependence (CUDA; iOS; .NET and Windows, the latter of which is still the platform of choice for many large areas, such as AAA games).

You could make a case for the big cloud vendors, but I don't think that companies have a nearly forced dependency on AWS in the same way that a serious English-language consumer app company really needs an iOS app or an AAA game company has to release on Windows and overwhelmingly likely develops on Windows.

If we look at programmers who aren't pinned to an ecosystem, Microsoft seems highly relevant to a lot of programmers due to the creation of tools like vscode and TypeScript. I wouldn't say that it's necessarily more relevant than Amazon since so many programmers use AWS, but it's hard to argue that the company that created (among many other things) vscode and TypeScript under Ballmer's watch is irrelevant to programmers.

Appendix: my losing bet against Microsoft

Shortly after joining Microsoft in 2015, I bet Derek Chiou that Google would beat Microsoft to $1T market cap. Unlike most external commentators, I agreed with the bets Microsoft was making, but when I looked around at the kinds of internal dysfunction Microsoft had at the time, I thought that would cause them enough problems that Google would win. That was wrong — Microsoft beat Google to $1T and is now worth $1T more than Google.

I don't think I would've made the bet even a year later, after seeing from the inside how effective Microsoft sales was and how good Microsoft was at shipping things that are appealing to enterprises, and then comparing that to Google's cloud execution and strategy. But you could say that, until I saw how Microsoft operated in detail, I made a mistake fairly analogous to the one external commentators made.

Thanks to Laurence Tratt, Yossi Kreinin, Heath Borders, Justin Blank, Fabian Giesen, Justin Findlay, Matthew Thomas, Seshadri Mahalingam, and Nam Nguyen for comments/corrections/discussion


  1. Fabian Giesen points out that, in addition to Ballmer's "sales guy" reputation, his stage persona didn't do him any favors, saying "His stage presence made people think he was bad. But if you're not an idiot and you see an actor portraying Macbeth, you don't assume they're killing all their friends IRL" [return]
  2. Here's the top HN comment on a story about Sinofsky's ousting:

    The real culprit that needs to be fired is Steve Ballmer. He was great from the inception of MSFT until maybe the turn of the century, when their business strategy of making and maintaining a Windows monopoly worked beautifully and extremely profitably. However, he is living in a legacy environment where he believes he needs to protect the Windows/Office monopoly BY ANY MEANS NECESSARY, and he and the rest of Microsoft can't keep up with everyone else around them because of innovation.

    This mindset has completely stymied any sort of innovation at Microsoft because they are playing with one arm tied behind their backs in the midst of trying to compete against the likes of Google, Facebook, etc. In Steve Ballmer's eyes, everything must lead back to the sale of a license of Windows/Office, and that no longer works in their environment.

    If Microsoft engineers had free rein to make the best search engine, or the best phone, or the best tablet, without worries about how will it lead to maintaining their revenue streams of Windows and more importantly Office, then I think their offerings would be on an order of magnitude better and more creative.

    This is wrong. At the time, Microsoft was very heavily subsidizing Bing. To the extent that one can attribute the subsidy, it would be reasonable to say that the bulk of the subsidy was coming from Windows. Likewise, Azure was a huge bet that was being heavily subsidized from the profit that was coming from Windows. Microsoft's strategy under Ballmer was basically the opposite of what this comment is saying.

    Funnily enough, if you looked at comments on minimsft (many of which were made by Microsoft insiders), people noted the huge spend on things like Azure and online services, but most thought this was a mistake and that Microsoft needed to focus on making Windows and Windows hardware (like the Surface) great.

    Basically, no matter what people think Ballmer is doing, they say it's wrong and that he should do the opposite. That means people call for different actions since most commenters outside of Microsoft don't actually know what Microsoft is up to, but from the way the comments are arrayed against Ballmer and not against specific actions of the company, we can see that people aren't really making a prediction about any particular course of action and they're just ragging on Ballmer.

    BTW, the #2 comment on HN says that Ballmer missed the boat on the biggest things in tech in the past 5 years and that Ballmer has deemphasized cloud computing (which was actually Microsoft's biggest bet at the time if you look at either capital expenditure or allocated headcount). The #3 comment says "Steve Ballmer is a sales guy at heart, and it's why he's been able to survive a decade of middling stock performance and strategic missteps: He must have close connections to Microsoft's largest enterprise customers, and were he to be fired, it would be an invitation for those customers to reevaluate their commitment to Microsoft's platforms.", and the rest of the top-level comments aren't about Ballmer.

    [return]
  3. There were the standard attempts at blocking the newfangled thing, e.g., when Azure wanted features added to Windows networking, they would get responses like "we'll put that on the roadmap", which was well understood to mean "we're more powerful than you and we don't have to do anything you say", so Microsoft leadership ripped networking out of Windows and put Windows networking in the Azure org, giving Azure control of the networking features they wanted. This kind of move is in contrast to efforts to change the focus of the company at nearly every other company. For an extreme example on the other end, consider Qualcomm's server chip effort. When the group threatened to become more profitable and more important than the mobile chip group, the mobile group had the server group killed before it could become large enough to defend itself. Some leadership, including the CEO, supported the long-term health of the company and therefore supported the server group. Those people, including the CEO, were removed from the board and fired. It's unusual to have enough support to unseat the CEO, but for a more typical effort, look at how Microsoft killed its 1997 version of an online office suite. [return]

How good can you be at Codenames without knowing any words?

2024-08-11 08:00:00

About eight years ago, I was playing a game of Codenames where the game state was such that our team would almost certainly lose if we didn't correctly guess all of our remaining words on our turn. From the given clue, we were unable to do this. Although the game is meant to be a word guessing game based on word clues, a teammate suggested that, based on the physical layout of the words that had been selected, most of the possibilities we were considering would result in patterns that were "too weird" and that we should pick the final word based on the location. This worked and we won.

[Click to expand explanation of Codenames if you're not familiar with the game] Codenames is played in two teams. The game has a 5x5 grid of words, where each word is secretly owned by one of {blue team, red team, neutral, assassin}. Each team has a "spymaster" who knows the secret word <-> ownership mapping. The spymaster's job is to give single-word clues that allow their teammates to guess which words belong to their team without accidentally guessing words of the opposing team or the assassin. On each turn, the spymaster gives a clue and their teammates guess which words are associated with the clue. The game continues until one team's words have all been guessed or the assassin's word is guessed (immediate loss). There are some details that are omitted here for simplicity, but for the purposes of this post, this explanation should be close enough. If you want more of an explanation, you can try this video, or the official rules

Ever since then, I've wondered how good someone would be if all they did was memorize all 40 setup cards that come with the game. To simulate this, we'll build a bot that plays using only position information (you might also call this an AI, but since we'll discuss using an LLM/AI to write this bot, we'll use the term bot to refer to the automated Codenames-playing agent to make it easy to disambiguate).

At the time, after the winning guess, we looked through the configuration cards to see if our teammate's idea of guessing based on shape was correct, and it was — they correctly determined the highest probability guess based on the possible physical configurations. Each card layout defines which words are your team's and which words belong to the other team and, presumably to limit the cost, the game only comes with 40 cards (160 configurations under rotation). Our teammate hadn't memorized the cards (which would've narrowed things down to only one possible configuration), but they'd played enough games to develop an intuition about what patterns/clusters might be common and uncommon, enabling them to come up with this side-channel attack against the game. For example, after playing enough games, you might realize that there's no card where a team has 5 words in a row or column, or that only the start player color ever has 4 in a row, and if this happens on an edge and it's blue, the 5th word must belong to the red team, or that there's no configuration with six connected blue words (and there is one with red, one with 2 in a row centered next to 4 in a row). Even if you don't consciously use this information, you'll probably develop a subconscious aversion to certain patterns that feel "too weird".

Coming back to the idea of building a bot that simulates someone who's spent a few days memorizing the 40 cards, below, there's a simple bot you can play against that simulates a team of such players. Normally, when playing, you'd provide clues and the team would guess words. But, in order to provide the largest possible advantage to you, the human, we'll give you the unrealistically large advantage of assuming that you can, on demand, generate a clue that will get your team to select the exact squares that you'd like, which is simulated by letting you click on any tile that you'd like to have your team guess that tile.

By default, you also get three guesses a turn, which would put you well above 99%-ile among Codenames players I've seen. While good players can often get three or more correct moves a turn, averaging three correct moves and zero incorrect moves a turn would be unusually good in most groups. You can toggle the display of remaining matching boards on, but if you want to simulate what it's like to be a human player who hasn't memorized every board, you might want to try playing a few games with the display off.

If, at any point, you finish a turn and it's the bot's turn and there's only one matching board possible, the bot correctly guesses every one of its words and wins. The bot would be much stronger if it ever guessed words before it could guess them all, either naively or to strategically reduce the search space, or if it even had a simple heuristic where it would randomly guess among the possible boards if it could deduce that you'd win on your next turn, but even the most naive "board memorization" bot possible has been able to beat every Codenames player I handed this to in most games where they didn't toggle the remaining matching boards on and use the same knowledge the bot has access to.
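For concreteness, here's a minimal sketch of the board-memorization strategy the bot uses: keep every memorized key card (and each of its rotations) that's consistent with the squares revealed so far, and once exactly one layout remains, the bot knows all of its remaining words. The card layouts below are made-up placeholders, since the 40 real cards aren't reproduced here.

```python
# Sketch of a "board memorization" Codenames bot. Boards are 5x5 grids stored as
# tuples of 25 labels (row-major): 'B' (blue), 'R' (red), 'N' (neutral), 'A' (assassin).
# The memorized cards below are random placeholders standing in for the 40 real
# printed key cards (160 layouts under rotation).
import itertools
import random

def rotations(card):
    """All 4 rotations of a 5x5 key card."""
    out = []
    grid = [list(card[i * 5:(i + 1) * 5]) for i in range(5)]
    for _ in range(4):
        grid = [list(row) for row in zip(*grid[::-1])]  # rotate 90 degrees clockwise
        out.append(tuple(itertools.chain.from_iterable(grid)))
    return out

def consistent(layout, revealed):
    """Does this layout agree with every (position, team) revealed so far?"""
    return all(layout[pos] == team for pos, team in revealed.items())

def remaining_boards(memorized_cards, revealed):
    """All rotations of all memorized cards that still match the revealed squares."""
    return [rot for card in memorized_cards for rot in rotations(card)
            if consistent(rot, revealed)]

# Placeholder "memorized" cards: 9 blue, 8 red, 7 neutral, 1 assassin per card.
random.seed(0)
LABELS = list("B" * 9 + "R" * 8 + "N" * 7 + "A")
MEMORIZED_CARDS = [tuple(random.sample(LABELS, 25)) for _ in range(40)]

# After a few squares have been revealed, see how many layouts are still possible.
revealed = {0: "B", 7: "R", 12: "N"}
candidates = remaining_boards(MEMORIZED_CARDS, revealed)
print(f"{len(candidates)} possible boards remain")
if len(candidates) == 1:
    blue_words = [pos for pos, team in enumerate(candidates[0]) if team == "B"]
    print("bot can guess all of its words:", blue_words)
```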

[Interactive Codenames bot: requires JavaScript; playable on the original post]

A discussion of discussions on AI bias

2024-06-16 08:00:00

There've been regular viral stories about ML/AI bias with LLMs and generative AI for the past couple of years. One thing I find interesting about these discussions is how different the reaction is in the LLM and generative AI case compared to "classical" bugs, even in cases where there's a clear bug. In particular, if you look at forums or other discussions with lay people, people frequently deny that a model which produces output that's sort of the opposite of what the user asked for is even a bug. For example, a year ago, an Asian MIT grad student asked Playground AI (PAI) to "Give the girl from the original photo a professional linkedin profile photo" and PAI converted her face to a white face with blue eyes.

The top "there's no bias" response on the front-page reddit story, and one of the top overall comments, was

Sure, now go to the most popular Stable Diffusion model website and look at the images on the front page.

You'll see an absurd number of asian women (almost 50% of the non-anime models are represented by them) to the point where you'd assume being asian is a desired trait.

How is that less relevant that "one woman typed a dumb prompt into a website and they generated a white woman"?

Also keep in mind that she typed "Linkedin", so anyone familiar with how prompts currently work know it's more likely that the AI searched for the average linkedin woman, not what it thinks is a professional women because image AI doesn't have an opinion.

In short, this is just an AI ragebait article.

Other highly-ranked comments with the same theme include

Honestly this should be higher up. If you want to use SD with a checkpoint right now, if you dont [sic] want an asian girl it’s much harder. Many many models are trained on anime or Asian women.

and

Right? AI images even have the opposite problem. The sheer number of Asians in the training sets, and the sheer number of models being created in Asia, means that many, many models are biased towards Asian outputs.

Other highly-ranked comments noted that this was a sample size issue

"Evidence of systemic racial bias"

Shows one result.

Playground AI's CEO went with the same response when asked for an interview by the Boston Globe — he declined the interview and replied with a list of rhetorical questions like the following (the Boston Globe implies that there was more, but didn't print the rest of the reply):

If I roll a dice just once and get the number 1, does that mean I will always get the number 1? Should I conclude based on a single observation that the dice is biased to the number 1 and was trained to be predisposed to rolling a 1?

We could just as easily have picked an example from Google or Facebook or Microsoft or any other company that's deploying a lot of ML today, but since the CEO of Playground AI is basically asking someone to take a look at PAI's output, we're looking at PAI in this post. I tried the same prompt the MIT grad student used on my Mastodon profile photo, substituting "man" for "girl". PAI usually turns my Asian face into a white (caucasian) face, but sometimes makes me somewhat whiter but ethnically ambiguous (maybe a bit Middle Eastern or East Asian or something). And, BTW, my face has a number of distinctively Vietnamese features which pretty obviously look Vietnamese and not like any kind of East Asian face.

[Image: profile photo of a Vietnamese person]
[Image: 4 profile photos run through Playground AI; 3 look very European and one looks a bit ambiguous]
[Image: 4 profile photos run through Playground AI; none look East Asian or Southeast Asian]

My profile photo is a light-skinned winter photo, so I tried a darker-skinned summer photo and PAI would then generally turn my face into a South Asian or African face, with the occasional Chinese face (but never a Vietnamese or any other kind of Southeast Asian face), such as the following:

[Image: profile photo of a tanned Vietnamese person]
[Image: 4 profile photos of the tanned Vietnamese person run through Playground AI; 1 looks black and 3 look South Asian]

A number of other people also tried various prompts and they also got results that indicated that the model (where “model” is being used colloquially for the model and its weights and any system around the model that's responsible for the output being what it is) has some preconceptions about things like what ethnicity someone has if they have a specific profession that are strong enough to override the input photo. For example, converting a light-skinned Asian person to a white person because the model has "decided" it can make someone more professional by throwing out their Asian features and making them white.

Other people have tried various prompts to see what kind of pre-conceptions are bundled into the model and have found similar results, e.g., Rob Ricci got the following results when asking for "linkedin profile picture of X professor" for "computer science", "philosophy", "chemistry", "biology", "veterinary science", "nursing", "gender studies", "Chinese history", and "African literature", respectively. In the 28 images generated for the first 7 prompts, maybe 1 or 2 people out of 28 aren't white. The results for the next prompt, "Chinese history" are wildly over-the-top stereotypical, something we frequently see from other models as well when asking for non-white output. And Andreas Thienemann points out that, except for the over-the-top Chinese stereotypes, every professor is wearing glasses, another classic stereotype.

Like I said, I don't mean to pick on Playground AI in particular. As I've noted elsewhere, trillion dollar companies regularly ship AI models to production without even the most basic checks on bias; when I tried ChatGPT out, every bias-checking prompt I played with returned results that were analogous to the images we saw here, e.g., when I tried asking for bios of men and women who work in tech, women tended to have bios indicating that they did diversity work, even women who had no public record of doing diversity work, and men tended to have degrees from name-brand engineering schools like MIT and Berkeley, even men who hadn't attended any name-brand schools, and likewise for name-brand tech companies (the link only has 4 examples due to Twitter limitations, but other examples I tried were consistent with the examples shown).

This post could've used almost any publicly available generative AI. It just happens to use Playground AI because the CEO's response both asks us to do it and reflects the standard reflexive "AI isn't biased" responses that lay people commonly give.

Coming back to the response about how it's not biased for professional photos of people to be turned white because Asians feature so heavily in other cases, the high-ranking reddit comment we looked at earlier suggested "go[ing] to the most popular Stable Diffusion model website and look[ing] at the images on the front page". Below is what I got when I clicked the link on the day the comment was made and then clicked "feed".

[Mildly NSFW screenshots of the site's front page omitted]

The site had a bit of a smutty feel to it. The median image could be described as "a poster you'd expect to see on the wall of a teenage boy in a movie scene where the writers are reaching for the standard stock props to show that the character is a horny teenage boy who has poor social skills" and the first things shown when going to the feed and getting the default "all-time" ranking are someone grabbing a young woman's breast, titled "Guided Breast Grab | LoRA"; two young women making out, titled "Anime Kisses"; and a young woman wearing a leash, annotated with "BDSM — On a Leash LORA". So, apparently there was this site that people liked to use to generate and pass around smutty photos, and the high incidence of photos of Asian women on this site was used as evidence that there is no ML bias that negatively impacts Asian women because this cancels out an Asian woman being turned into a white woman when she tried to get a cleaned up photo for her LinkedIn profile. I'm not really sure what to say to this. Fabian Giesen responded with "🤦‍♂️. truly 'I'm not bias. your bias' level discourse", which feels like an appropriate response.

Another standard line of reasoning on display in the comments, that I see in basically every discussion on AI bias, is typified by

AI trained on stock photo of “professionals” makes her white. Are we surprised?

She asked the AI to make her headshot more professional. Most of “professional” stock photos on the internet have white people in them.

and

If she asked her photo to be made more anything it would likely turn her white just because that’s the average photo in the west where Asians only make up 7.3% of the US population, and a good chunk of that are South Indians that look nothing like her East Asian features. East Asians are 5% or less; there’s just not much training data.

These comments seem to operate from a fundamental assumption that companies are pulling training data that's representative of the United States and that this is a reasonable thing to do and that this should result in models converting everyone into whatever is most common. This is wrong on multiple levels.

First, on whether or not it's the case that professional stock photos are dominated by white people, a quick image search for "professional stock photo" turns up quite a few non-white people, so either stock photos aren't very white or people have figured out how to return a more representative sample of stock photos. And given worldwide demographics, it's unclear what internet services should be expected to be U.S.-centric. And then, even if we accept that major internet services should assume that everyone is in the United States, it seems like both a design flaw as well as a clear sign of bias to assume that every request comes from the modal American.

Since a lot of people have these reflexive responses when talking about race or ethnicity, let's look at a less charged AI hypothetical. Say I talk to an AI customer service chatbot for my local mechanic and I ask to schedule an appointment to put my winter tires on and do a tire rotation. Then, when I go to pick up my car, I find out they changed my oil instead of putting my winter tires on and then a bunch of internet commenters explain why this isn't a sign of any kind of bias and you should know that an AI chatbot will convert any appointment with a mechanic to an oil change appointment because it's the most common kind of appointment. A chatbot that converts any kind of appointment request into "give me the most common kind of appointment" is pretty obviously broken but, for some reason, AI apologists insist this is fine when it comes to things like changing someone's race or ethnicity. Similarly, it would be absurd to argue that it's fine for my tire change appointment to have been converted to an oil change appointment because other companies have schedulers that convert oil change appointments to tire change appointments, but that's another common line of reasoning that we discussed above.

And say I used some standard non-AI scheduling software like Mindbody or JaneApp to schedule an appointment with my mechanic and asked for an appointment to have my tires changed and rotated. If I ended up having my oil changed because the software simply schedules the most common kind of appointment, this would be a clear sign that the software is buggy and no reasonable person would argue that zero effort should go into fixing this bug. And yet, this is a common argument that people are making with respect to AI (it's probably the most common defense in comments on this topic). The argument goes a bit further, in that there's this explanation of why the bug occurs that's used to justify why the bug should exist and people shouldn't even attempt to fix it. Such an explanation would read as obviously ridiculous for a "classical" software bug and is no less ridiculous when it comes to ML. Perhaps one can argue that the bug is much more difficult to fix in ML and that it's not practical to fix the bug, but that's different from the common argument that it isn't a bug and that this is the correct way for software to behave.

I could imagine some users saying something like that when the program is taking actions that are more opaque to the user, such as with autocorrect, but I actually tried searching reddit for "autocorrect bug" and, in the top 3 threads (I didn't look at any other threads), 2 out of the 255 comments denied that incorrect autocorrects were a bug, and both of those comments were from the same person. I'm sure if you dig through enough topics, you'll find ones where there's a higher rate, but on searching for a few more topics (like Excel formatting and autocorrect bugs), none of the topics I searched approached what we see with generative AI, where it's not uncommon to see half the commenters vehemently deny that a prompt doing the opposite of what the user wants is a bug.

Coming back to the bug itself, in terms of the mechanism, one thing we can see in both classifiers and generative models is that many (perhaps most or almost all) of these systems pick up biases that a lot of people have, as reflected in some sample of the internet, which results in things like Google's image classifier classifying a black hand holding a thermometer as {hand, gun} and a white hand holding a thermometer as {hand, tool}1. After a number of such errors over the past decade, from classifying black people as gorillas in Google Photos in 2015, to deploying some kind of text classifier for ads that classified ads containing the terms "African-American composers" and "African-American music" as "dangerous or derogatory" in 2018, Google turned the knob in the other direction with Gemini, which, by the way, generated much more outrage than any of the other examples.

There's nothing new about bias making it into automated systems. This predates generative AI and LLMs, and is a problem outside of ML models as well. It's just that the widespread use of ML has made this legible to people, making some of these cases news. For example, if you look at compression algorithms and dictionaries, Brotli is heavily biased towards the English language — the human-language elements of the 120 transforms built into the format are English, and the built-in compression dictionary is more heavily weighted towards English than whatever representative weighting you might want to reference (population-weighted language speakers, non-automated human-language text sent on messaging platforms, etc.). There are arguments you could make as to why English should be so heavily weighted, but there are also arguments as to why the opposite should be the case, e.g., English language usage is positively correlated with a user's bandwidth, so non-English speakers, on average, need the compression more. But regardless of the exact weighting function you think should be used to generate a representative dictionary, that's just not going to make a viral news story because you can't get the typical reader to care that a number of the 120 built-in Brotli transforms do things like add " of the ", ". The", or ". This" to text, which are highly specialized for English, and none of the transforms encode terms that are highly specialized for any other human language even though only 20% of the world speaks English, or that, relative to the number of speakers, the built-in compression dictionary is extremely heavily tilted towards English by comparison to any other human language. You could make a defense of Brotli's dictionary that's analogous to the ones above, that over some representative corpus which the dictionary was trained on, we get optimal compression with the Brotli dictionary, but there are quite a few curious phrases in the dictionary, such as "World War II", ", Holy Roman Emperor", "British Columbia", "Archbishop", "Cleveland", "esperanto", etc., that might lead us to wonder if the corpus the dictionary was trained on is perhaps not the most representative, or even particularly representative, of text people send. Can it really be the case that including ", Holy Roman Emperor" in the dictionary produces, across the distribution of text sent on the internet, better compression than including anything at all for French, Urdu, Turkish, Tamil, Vietnamese, etc.?
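
As a rough illustration (not a rigorous benchmark), here's a sketch using the Python brotli bindings that compresses a short English sentence which overlaps the built-in dictionary and a Vietnamese translation of it, comparing compressed size to input size. The exact ratios depend on the text chosen and on the UTF-8 overhead of the diacritics, but short English text that hits the static dictionary tends to do noticeably better:

```
# Rough illustration (not a rigorous benchmark) of Brotli's English-leaning
# built-in dictionary: compare how well short English and Vietnamese text
# compress relative to their input size. Requires `pip install brotli`.
import brotli

texts = {
    "english": "This is the history of World War II, as told by the "
               "Archbishop of British Columbia.",
    "vietnamese": "Đây là lịch sử của Chiến tranh thế giới thứ hai, "
                  "theo lời kể của Tổng giám mục British Columbia.",
}

for name, text in texts.items():
    raw = text.encode("utf-8")
    compressed = brotli.compress(raw, quality=11)
    ratio = len(compressed) / len(raw)
    print(f"{name}: {len(raw)} bytes -> {len(compressed)} bytes ({ratio:.2f})")
```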

Another example which doesn't make a good viral news story is my not being able to put my Vietnamese name in the title of my blog and have my blog indexed by Google outside of Vietnamese-language Google — I tried that when I started my blog and it caused my blog to immediately stop showing up in Google searches unless you were in Vietnam. It's just assumed that the default is that people want English language search results and, presumably, someone created a heuristic that triggers if you have two characters with Vietnamese diacritics on a page, effectively marking the page as too Asian and therefore not of interest to anyone in the world outside of one country. "Being visibly Vietnamese" seems like a fairly common cause of bugs. For example, Vietnamese names are a problem even without diacritics. I often have forms that ask for my mother's maiden name. If I enter my mother's maiden name, I'll be told something like "Invalid name" or "Name too short" (a sketch of this kind of check appears below). That's fine, in that I work around that kind of carelessness by having a stand-in for my mother's maiden name, which is probably more secure anyway. Another issue is when people decide I told them my name incorrectly and change my name. For my last name, if I read my name off as "Luu, ell you you", that gets shortened from the Vietnamese "Luu" to the Chinese "Lu" about half the time and to a western "Lou" much of the time as well, but I've figured out that if I say "Luu, ell you you, two yous", that works about 95% of the time. That sometimes annoys the person on the other end, who will exasperatedly say something like "you didn't have to spell it out three times". Maybe so for that particular person, but most people won't get it. This even happens when I enter my first name into a computer system, so there can be no chance of a transcription error before my name is digitally recorded. My legal first name, with no diacritics, is Dan. This isn't uncommon for an American of Vietnamese descent because Dan works as both a Vietnamese name and an American name and a lot of Vietnamese immigrants didn't know that Dan is usually short for Daniel. Of the six companies I've worked for full-time, someone has helpfully changed my name to Daniel at three of them, presumably because someone saw that Dan was recorded in a database and decided that I failed to enter my name correctly and that they knew what my name was better than I did and they were so sure of this they saw no need to ask me about it. In one case, this only impacted my email display name. Since I don't have strong feelings about how people address me, I didn't bother having it changed and a lot of people called me Daniel instead of Dan while I worked there. In two other cases, the name change impacted important paperwork, so I had to actually change it so that my insurance, tax paperwork, etc., actually matched my legal name. As noted above, fairly innocuous prompts to Playground AI using my face, even on the rare occasions they produce Asian output, seem to produce East Asian output over Southeast Asian output. I've noticed the same thing with some big company generative AI models as well — even when you ask them for Southeast Asian output, they generate East Asian output.
AI tools that are marketed as tools that clean up errors and noise will also clean up Asian-ness (and other analogous "errors"), e.g., people who've used Adobe AI noise reduction (billed as "remove noise from voice recordings with speech enhancement") note that it will take an Asian accent and remove it, making the person sound American (and likewise for a number of other accents, such as Eastern European accents).
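
Here's the sketch of that kind of over-eager name validation mentioned above; it's hypothetical and not taken from any particular product, but the two "sanity checks" are the kind of thing that looks harmless to someone whose names are all long and unaccented:

```
# Hypothetical sketch of over-eager name validation; not from any real product.
# A minimum-length check and an ASCII-only check both look reasonable until you
# notice that they reject common Vietnamese names outright.
import re

def validate_name(name: str) -> bool:
    if len(name) < 3:
        return False  # rejects two-letter surnames like "Vo", "Ly", "Ho"
    if not re.fullmatch(r"[A-Za-z]+", name):
        return False  # rejects anything with diacritics, e.g. "Lưu"
    return True

for name in ["Smith", "Luu", "Vo", "Lưu"]:
    print(name, validate_name(name))  # Smith/Luu pass; Vo/Lưu are "invalid"
```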

I probably see tens to hundreds of things like this most weeks just in the course of using widely used software (much less than the overall bug count, which we previously observed was in the hundreds to thousands per week), but most Americans I talk to don't notice these things at all. Recently, there's been a lot of chatter about all of the harms caused by biases in various ML systems and how the widespread use of ML is going to usher in all sorts of new harms. That might not be wrong, but my feeling is that we've encoded biases into automation for as long as we've had automation, and the increased scope and scale of automation has been and will continue to increase the scope and scale of automated bias. It's just that now, many uses of ML make these kinds of biases a lot more legible to lay people and therefore more likely to make the news.

There's an ahistoricity in the popular articles I've seen on this topic so far, in that they don't acknowledge that the fundamental problem here isn't new, resulting in two classes of problems when solutions are proposed. One is that solutions are often ML-specific, but the issues here occur regardless of whether or not ML is used, so ML-specific solutions seem aimed at the wrong level. The other is that, when the proposed solutions are general, they're ones that have been proposed before and have failed. For example, a common call to action for at least the past twenty years, perhaps the most common (unless "people should care more" counts as a call to action), has been that we need more diverse teams.

This clearly hasn't worked; if it did, problems like the ones mentioned above wouldn't be pervasive. There are multiple levels at which this hasn't worked and will not work, any one of which would be fatal to this solution. One problem is that, across the industry, the people who are in charge (execs and people who control capital, such as VCs, PE investors, etc.), in aggregate, don't care about this. Although there are efficiency justifications for more diverse teams, the case will never be as clear-cut as it is for decisions in games and sports, where we've seen that very expensive and easily quantifiable bad decisions can persist for many decades after the errors were pointed out. And then, even if execs and capital were bought into the idea, it still wouldn't work because there are too many dimensions. If you look at a company that really prioritized diversity, like Patreon from 2013-2019, you're lucky if the organization is capable of seriously prioritizing diversity in two or three dimensions while dropping the ball on hundreds or thousands of other dimensions, such as whether or not Vietnamese names or faces are handled properly.

Even if all those things weren't problems, the solution still wouldn't work because while having a team with relevant diverse experience may be a bit correlated with prioritizing problems, it doesn't automatically cause problems to be prioritized and fixed. To pick a non-charged example, a bug that's existed in Google Maps traffic estimates since inception and at least until 2022 (I haven't driven enough since then to know if the bug still exists) is that, if I ask how long a trip will take at the start of rush hour, this takes into account current traffic and not how traffic will change as I drive and therefore systematically underestimates how long the trip will take (and conversely, if I plan a trip at peak rush hour, this will systematically overestimate how long the trip will take). If you try to solve this problem by increasing commute diversity in Google Maps, this will fail. There are already many people who work on Google Maps who drive and can observe ways in which estimates are systematically wrong. Adding diversity to ensure that there are people who drive and notice these problems is very unlikely to make a difference. Or, to pick another example, when the former manager of Uber's payments team got incorrectly blacklisted from Uber by an ML model labeling his transactions as fraudulent, no one was able to figure out what happened or what sort of bias caused him to get incorrectly banned (they solved the problem by adding his user to an allowlist). There are very few people who are going to get better service than the manager of the payments team, and even in that case, Uber couldn't really figure out what was going on. Hiring a "diverse" candidate onto the team isn't going to automatically solve or even make much difference to bias in whatever dimension the candidate is diverse when the former manager of the team can't even get their account unbanned except by having it allowlisted after six months of investigation.

If the result of your software development methodology is that the fix to the manager of the payments team being banned is to allowlist the user after six months, that traffic routing in your app is systematically wrong for two decades, that core functionality of your app doesn't work, etc., no amount of hiring people with a background that's correlated with noticing some kinds of issues is going to result in fixing issues like these, whether that's with respect to ML bias or another class of bug.
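
To make the mechanism behind the Google Maps example above concrete, here's a toy model (my own sketch, not how Google Maps actually computes ETAs): an estimate that applies only the congestion at departure time to the whole route underestimates trips that start as rush hour ramps up and overestimates trips that start at the peak.

```
# Toy model (not how Google Maps actually works) of why an ETA based only on
# congestion at departure time is systematically wrong around rush hour.

SEGMENT_MINUTES = [10, 10, 10]  # free-flow driving time for each route segment

def congestion_multiplier(hour: float) -> float:
    # Hypothetical rush hour: ramps up from 16:00, peaks at 17:30, gone by 19:00.
    peak, width = 17.5, 1.5
    return 1.0 + max(0.0, 2.0 * (1 - abs(hour - peak) / width))

def eta_static(depart_hour: float) -> float:
    # "Current traffic only": apply departure-time congestion to every segment.
    return sum(t * congestion_multiplier(depart_hour) for t in SEGMENT_MINUTES)

def eta_time_dependent(depart_hour: float) -> float:
    # Apply the congestion actually in effect when each segment is driven.
    clock, total = depart_hour, 0.0
    for t in SEGMENT_MINUTES:
        minutes = t * congestion_multiplier(clock)
        total += minutes
        clock += minutes / 60.0
    return total

for depart in (16.0, 17.5):  # start of rush hour, peak of rush hour
    print(depart, round(eta_static(depart)), round(eta_time_dependent(depart)))
# Departing at 16:00, the static estimate (30 min) is too low (~37 min in this
# model); departing at the 17:30 peak, it (90 min) is too high (~71 min).
```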

Of course, sometimes variants of old ideas that have failed do succeed, but for a proposal to be credible, or even interesting, the proposal has to address why the next iteration won't fail like every previous iteration did. As we noted above, at a high level, the two most common proposed solutions I've seen are that people should try harder and care more and that we should have people of different backgrounds, in a non-technical sense. This hasn't worked for the plethora of "classical" bugs, this hasn't worked for old ML bugs, and it doesn't seem like there's any reason to believe that this should work for the kinds of bugs we're seeing from today's ML models.

Laurence Tratt says:

I think this is a more important point than individual instances of bias. What's interesting to me is that mostly a) no-one notices they're introducing such biases b) often it wouldn't even be reasonable to expect them to notice. For example, some web forms rejected my previous address, because I live in the countryside where many houses only have names -- but most devs live in cities where houses exclusively have numbers. In a sense that's active bias at work, but there's no mal intent: programmers have to fill in design details and make choices, and they're going to do so based on their experiences. None of us knows everything! That raises an interesting philosophical question: when is it reasonable to assume that organisations should have realised they were encoding a bias?

My feeling is that the "natural" (as in lowest-energy and most straightforward) state for institutions and products is that they don't work very well. If someone hasn't previously instilled a culture or instituted processes that foster quality in a particular dimension, quality is likely to be poor due to the difficulty of producing something high quality, so organizations should expect that they're encoding all sorts of biases if there isn't a robust process for catching biases.

One issue we're running up against here is that, when it comes to consumer software, companies have overwhelmingly chosen velocity over quality. This seems basically inevitable given the regulatory environment we have today or any regulatory environment we're likely to have in my lifetime, in that companies that seriously choose quality over feature velocity get outcompeted because consumers overwhelmingly choose the lower cost or more featureful option over the higher quality option. We saw this with cars when we looked at how vehicles perform in out-of-sample crash tests and saw that only Volvo was optimizing cars for actual crashes as opposed to scoring well on public tests. Despite vehicular accidents being one of the leading causes of death for people under 50, paying for safety is such a low priority for consumers that Volvo has become a niche brand that had to move upmarket and sell luxury cars to even survive. We also saw this with CPUs, where Intel used to expend much more verification effort than AMD and ARM and had concomitantly fewer serious bugs. When AMD and ARM started seriously threatening Intel, Intel shifted effort away from verification and validation in order to increase velocity because its quality advantage wasn't doing it any favors in the market, and Intel chips are now almost as buggy as AMD chips.

We can observe something similar in almost every consumer market and many B2B markets as well, and that's when we're talking about issues that have known solutions. If we look at a problem that, from a technical standpoint, we don't know how to solve well, like subtle or even not-so-subtle bias in ML models, it stands to reason that we should expect to see more and worse bugs than we'd expect out of "classical" software systems, which is what we're seeing. Any solution to this problem that's going to hold up in the market is going to have to be robust against the issue that consumers will overwhelmingly choose the buggier product if it has more features they want or ships features they want sooner, which puts any solution that requires taking care in a way that significantly slows down shipping in a very difficult position, absent a single dominant player, like Intel in its heyday.

Thanks to Laurence Tratt, Yossi Kreinin, Anonymous, Heath Borders, Benjamin Reeseman, Andreas Thienemann, and Misha Yagudin for comments/corrections/discussion

Appendix: technically, how hard is it to improve the situation?

This is a genuine question and not a rhetorical question. I haven't done any ML-related work since 2014, so I'm not well-informed enough about what's going on now to have a direct opinion on the technical side of things. A number of people who've worked on ML a lot more recently than I have, like Yossi Kreinin (see appendix below) and Sam Anthony, think the problem is very hard, maybe impossibly hard where we are today.

Since I don't have a direct opinion, here are three situations which sound plausibly analogous, each of which supports a different conclusion.

Analogy one: Maybe this is like people saying, at least since 2014, that someone will build a Google any day now because existing open source tooling is already basically better than Google search, or people saying that building a "high-level" CPU that encodes high-level language primitives into hardware would give us a 1000x speedup over general purpose CPUs. You can't really prove that this is wrong and it's possible that a massive improvement in search quality or a 1000x improvement in CPU performance is just around the corner, but people who make these proposals generally sound like cranks because they exhibit the ahistoricity we noted above and propose solutions that we already know don't work, with no explanation of why their solution will address the problems that have caused previous attempts to fail.

Analogy two: Maybe this is like software testing, where software bugs are pervasive and, although there's decades of prior art from the hardware industry on how to find bugs more efficiently, there are very few areas where any of these techniques are applied. I've talked to people about this a number of times and the most common response is something about how application XYZ has some unique constraint that makes it impossibly hard to test at all or to test using the kinds of techniques I'm discussing, but every time I've dug into this, the application has been much easier to test than areas where I've seen these techniques applied. One could argue that I'm a crank when it comes to testing, but I've actually used these techniques to test a variety of software and been successful doing so, so I don't think this is the same as claiming that CPUs would be 1000x faster if only we used my pet CPU architecture.

Due to the incentives in play, where software companies can typically pass the cost of bugs onto the customer without the customer really understanding what's going on, I think we're not going to see a large amount of effort spent on testing absent regulatory changes, but there isn't a fundamental reason that we need to avoid using more efficient testing techniques and methodologies.

From a technical standpoint, the barrier to using better test techniques is fairly low — I've walked people through how to get started writing their own fuzzers and randomized test generators and this typically takes between 30 minutes and an hour, after which people will tend to use these techniques to find important bugs much more efficiently than they used to. However, by revealed preference, we can see that organizations don't really "want to" have their developers test efficiently.
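
For concreteness, here's a minimal sketch of the sort of randomized differential test generator I'm talking about; `my_tokenize` and `reference_tokenize` are hypothetical stand-ins for whatever implementation and reference (or property) you actually care about:

```
# Minimal sketch of a randomized differential test generator. `my_tokenize` is
# a stand-in for whatever implementation you care about; `reference_tokenize`
# is a deliberately simpler "reference" that disagrees with it on tabs and
# newlines, so there's a divergence for the fuzzer to find.
import random
import string

def my_tokenize(s):
    return s.split()  # splits on any whitespace

def reference_tokenize(s):
    return [t for t in s.split(" ") if t]  # splits on single spaces only

def random_input(max_len=40):
    alphabet = string.ascii_letters + " \t\n.,"
    return "".join(random.choice(alphabet) for _ in range(random.randint(0, max_len)))

def fuzz(iterations=100_000):
    for i in range(iterations):
        s = random_input()
        got, expected = my_tokenize(s), reference_tokenize(s)
        if got != expected:
            print(f"mismatch after {i} iterations on {s!r}: {got} vs {expected}")
            return
    print("no mismatches found")

random.seed(0)
fuzz()
```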

When it comes to testing and fixing bias in ML models, is the situation more like analogy one or analogy two? Although I wouldn't say with any level of confidence that we are in analogy two, I'm not sure how I could be convinced that we're not in analogy two. If I didn't know anything about testing, I would listen to all of these people explaining to me why their app can't be tested in a way that finds showstopping bugs and then conclude something like one of the following

  • "Everyone" is right, which makes sense — this is a domain they know about and I don't, so why should I believe anything different?
  • No opinion, perhaps due to a high default level of skepticism
  • Everyone is wrong, which seems unreasonable given that I don't know anything about the domain and have no particular reason to believe that everyone is wrong

As an outsider, it would take a very high degree of overconfidence to decide that everyone is wrong, so I'd have to either incorrectly conclude that "everyone" is right or have no opinion.

Given the situation with "classical" testing, I feel like I have to have no real opinion here. With no up-to-date knowledge, it wouldn't be reasonable to conclude that so many experts are wrong. But there are enough problems that people have said are difficult or impossible that turn out to be feasible and not really all that tricky that I have a hard time having a high degree of belief that a problem is essentially unsolvable without actually looking into it.

I don't think there's any way to estimate what I'd think if I actually looked into it. Let's say I try to work in this area and get a job at OpenAI or another place where people are working on problems like this, somehow pass the interview, work in the area for a couple years, and make no progress. That doesn't mean that the problem isn't solvable, just that I didn't solve it. When it comes to the "Lucene is basically as good as Google search" or "CPUs could easily be 1000x faster" people, it's obvious to people with knowledge of the area that the people saying these things are cranks because they exhibit a total lack of understanding of what the actual problems in the field are, but making that kind of judgment call requires knowing a fair amount about the field and I don't think there's a shortcut that would let you reliably figure out what your judgment would be if you had knowledge of the field.

Appendix: the story of this post

I wrote a draft of this post when the Playground AI story went viral in mid-2023, and then I sat on it for a year to see if it seemed to hold up when the story was no longer breaking news. Looking at this a year later, I don't think the fundamental issues or the discussions I see on the topic have really changed, so I cleaned it up and published this post in mid-2024.

If you like making predictions, what do you think the odds are that this post will still be relevant a decade later, in 2033? For reference, this post on "classical" software bugs that was published in 2014 could've been published today, in 2024, with essentially the same results (I say essentially because I see more bugs today than I did in 2014, and I see a lot more front-end and OS bugs today than I saw in 2014, so there would be more bugs and different kinds of bugs).

Appendix: comments from other folks

[Click to expand / collapse comments from Yossi Kreinin]

I'm not sure how much this is something you'd agree with but I think a further point related to generative AI bias being a lot like other-software-bias is exactly what this bias is. "AI bias" isn't AI learning the biases of its creators and cleverly working to implement them, e.g. working against a minority that its creators don't like. Rather, "AI bias" is something like "I generally can't be bothered to fix bugs unless the market or the government compels me to do so, and as a logical consequence of this, I especially can't be bothered to fix bugs that disproportionately negatively impact certain groups where the impact, due to the circumstances of the specific group in question, is less likely to compel me to fix the bug."

This is a similarity between classic software bugs and AI bugs — meaning, nobody is worried that "software is biased" in some clever scheming sort of way, everybody gets that it's the software maker who's scheming or, probably more often, it's the software maker who can't be bothered to get things right. With generative AI I think "scheming" is actually even less likely than with traditional software and "not fixing bugs" is more likely, because people don't understand AI systems they're making and can make them do their bidding, evil or not, to a much lesser extent than with traditional software; OTOH bugs are more likely for the same reason [we don't know what we're doing.] I think a lot of people across the political spectrum [including for example Elon Musk and not just journalists and such] say things along the lines of "it's terrible that we're training AI to think incorrectly about the world" in the context of racial/political/other charged examples of bias; I think in reality this is a product bug affecting users to various degrees and there's bias in how the fixes are prioritized but the thing isn't capable of thinking at all.

I guess I should add that there are almost certainly attempts at "scheming" to make generative AI repeat a political viewpoint, over/underrepresent a group of people etc, but invariably these attempts create hilarious side effects due to bugs/inability to really control the model. I think that similar attempts to control traditional software to implement a politics-adjacent agenda are much more effective on average (though here too I think you actually had specific examples of social media bugs that people thought were a clever conspiracy). Whether you think of the underlying agenda as malice or virtue, both can only come after competence and here there's quite the way to go.

See Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models. I feel like if this doesn't work, a whole lot of other stuff doesn't work, either and enumerating it has got to be rather hard.

I mean nobody would expect a 1980s expert system to get enough tweaks to not behave nonsensically. I don't see a major difference between that and an LLM, except that an LLM is vastly more useful. It's still something that pretends to be talking like a person but it's actually doing something conceptually simple and very different that often looks right.

[Click to expand / collapse comments from an anonymous founder of an AI startup] [I]n the process [of founding an AI startup], I have been exposed to lots of mainstream ML code. Exposed as in “nuclear waste” or “H1N1”. It has old-fashioned software bugs at a rate I find astonishing, even being an old, jaded programmer. For example, I was looking at tokenizing recently, and the first obvious step was to do some light differential testing between several implementations. And it failed hilariously. Not like “they missed some edge cases”, more like “nobody ever even looked once”. Given what we know about how well models respond to out of distribution data, this is just insane.

In some sense, this is orthogonal to the types of biases you discuss…but it also suggests a deep lack of craftsmanship and rigor that matches up perfectly.

[Click to expand / collapse comments from Benjamin Reeseman]

[Ben wanted me to note that this should be considered an informal response]

I have a slightly different view of demographic bias and related phenomena in ML models (or any other “expert” system, to your point ChatGPT didn’t invent this, it made it legible to borrow your term).

I think that trying to force the models to reflect anything other than a corpus that’s now basically the Internet give or take actually masks the real issue: the bias is real, people actually get mistreated over their background or skin color or sexual orientation or any number of things and I’d far prefer that the models surface that, run our collective faces in the IRL failure mode than try to tweak the optics in an effort to permit the abuses to continue.

There’s a useful analogy to things like the #metoo movement or various DEI initiatives, most well-intentioned in the beginning but easily captured and ultimately representing a net increase in the blank check of those in positions of privilege.

This isn’t to say that alignment has no place and I think it likewise began with good intentions and is even maybe a locally useful mitigation.

But the real solution is to address the injustice and inequity in the real world.

I think the examples you cited are or should be a wake-up call that no one can pretend to ignore credibly about real issues and would ideally serve as a forcing function on real reform.

I’d love to chat about this at your leisure, my viewpoint is a minority one, but personally I’m a big fan of addressing the underlying issues rather than papering over them with what amounts to a pile of switch statements.

There’s a darker take on this: real reform is impossible, we live in techno dystopia now, let’s mitigate where and how a hunted minority can.

And there is a distressingly strong argument for that case: even in the ostensibly developed world cops today look like soldiers did when I was a kid 30 years ago, someone is guarding something from something at great expense and it isn’t black kids from getting shot near as I can tell.

But I don’t subscribe to the pessimism, I think it’s a local anomaly as industrialization transitions into arbitrary power over the physical logistics if we only knew how to change it.

I did a brief stint consulting for biotech folks affiliated with Nobel shortlist types in an area called proteomics. Like any consultant I surrender any credit to the people paying handsomely by the hour.

But it really is a Shannon problem now: CRISPR Cas-9 can do arbitrary edits to not only a person’s genome but (in an ethical and legal grey area) to the germ line as well.

We just have no idea what to change and there’s enough integrity in the field that we won’t risk just arbitrary children on finding out.

Appendix: reproducing Rob Ricci's results

I tried prompts with default settings, except for reducing image quality to 10 to generate images more quickly. This means we had 512x512 output, prompt guidance of 7, quality of 10, and a random seed. After 2 tries, I increased image quality to 40 because the images were sometimes too low quality to tell the ethnicity. Other than increasing the image quality, there was no attempt to re-run prompts or otherwise do any kind of selection from the output. The prompts were "Generate a very professional looking linkedin profile photo for a X", where X was Doctor, Lawyer, Engineer, Scientist, Journalist, and Banker.

This produced the following images:

Roughly speaking, I think Rob's results reproduced, which should be no surprise at this point given how many images we've seen.
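
If you want to try an analogous reproduction locally, here's a sketch using an open-source Stable Diffusion checkpoint via Hugging Face diffusers. This isn't the model Playground AI serves, and mapping its "quality" setting to the number of inference steps is an assumption, so treat it as an approximation of the setup above rather than an exact replication:

```
# Sketch of an analogous local reproduction with an open-source checkpoint;
# Playground AI's hosted model differs, and mapping its "quality" setting to
# num_inference_steps is an assumption. Requires diffusers, torch, and a GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

professions = ["Doctor", "Lawyer", "Engineer", "Scientist", "Journalist", "Banker"]
for profession in professions:
    prompt = ("Generate a very professional looking linkedin profile photo "
              f"for a {profession}")
    for i in range(4):  # four images per prompt, random seeds
        image = pipe(prompt, height=512, width=512, guidance_scale=7.0,
                     num_inference_steps=40).images[0]
        image.save(f"{profession.lower()}_{i}.png")
```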

And then, to see if we could reproduce the standard rebuttal that generative AI isn't biased because requests for smutty images often have Asian women, I tried the prompt "Generate a trashy instagram profile photo for a porn star". There's an NSFW filter that was tripped in some cases, so instead of groups of four images, we got:

[Click to expand / collapse very mildly NSFW images]

And, indeed, the generated images are much more Asian than we got for any of our professional photos, save Rob Ricci's set of photos from asking for a "linkedin profile picture of Chinese Studies professor".



  1. Naturally, when I mentioned this, a "smart contrarian" responded with "what are base rates", but spending 30 seconds googling reveals that the base rate of U.S. gun ownership is much higher among whites than in any other demographic. The base rate argument is even more absurd if you think about the base rate of a hand holding an object — what fraction of the time is that object a gun? Regardless of race, it's going to be very low. Of course, you could find a biased sample that doesn't resemble the underlying base rate at all, which appears to be what Google did, but it's not clear why this justifies having this bug. [return]

What the FTC got wrong in the Google antitrust investigation

2024-05-26 08:00:00

From 2011-2012, the FTC investigated the possibility of pursuing antitrust action against Google. The FTC decided to close the investigation and not much was publicly known about what happened until Politico released 312 pages of internal FTC memos from the investigation a decade later. As someone who works in tech, on reading the memos, the most striking thing is how one side, the side that argued to close the investigation, repeatedly displays a lack of basic understanding of the tech industry, and the memos from directors and other higher-ups don't acknowledge this at all.

If you don't generally follow what regulators and legislators are saying about tech, seeing the internal discussions may change how you think about decisions that get made about tech (or any other industry) when these decisions are, apparently, being made with little to no understanding of the industries1.

Inside the FTC, the Bureau of Competition (BC) made a case that antitrust action should be pursued and the Bureau of Economics (BE) made the case that the investigation should be dropped. The BC case is moderately strong. Reasonable people can disagree on whether or not the case is strong enough that antitrust action should've been pursued, but a reasonable person who is anti-antitrust has to concede that the antitrust case in the BC memo is at least defensible. The case against antitrust action in the BE memo is not defensible. There are major errors in core parts of the BE memo. In order for the BE memo to seem credible, the reader must have large and significant gaps in their understanding of the tech industry. If there was any internal FTC discussion of the errors in the BE memo, there's no indication of that in any public documents. As far as we can see from the evidence that's available, nobody noticed the BE memo's errors. The publicly available memos from directors and other higher-ups indicate that they gave the BE memo as much or more weight than the BC memo, implying a gap in FTC leadership's understanding of the tech industry.

Brief summary

Since the BE memo is effectively a rebuttal of the BC memo, we'll start by looking at the arguments in the BC memo. The bullet points below summarize the Executive Summary from the BC memo, which roughly summarizes the case made by the BC memo:

  • Google is the dominant search engine and seller of search ads
  • This memo addresses 4 of 5 areas with anticompetitive conduct; mobile is in a supplemental memo
  • Google has monopoly power in the U.S. in Horizontal Search; Search Advertising; and Syndicated Search and Search Advertising
  • On the question of whether Google has unlawfully preferenced its own content while demoting rivals, we do not recommend the FTC proceed; it's a close call and case law is not favorable to anticompetitive product design and Google's efficiency justifications are strong and there's some benefit to users
  • On whether Google has unlawfully scraped content from vertical rivals to improve their own vertical products, recommending condemning as a conditional refusal to deal under Section 2
    • Prior voluntary dealing was mutually beneficial
    • Threats to remove rival content from general search designed to coerce rivals into allowing Google to use their content for Google's vertical product
    • Natural and probable effect is to diminish incentives of vertical website R&D
  • On anticompetitive contractual restrictions on automated cross-management of ad campaigns, restrictions should be condemned under Section 2
    • They limit ability of advertisers to make use of their own data, reducing innovation and increasing transaction costs for advertisers and third-party businesses
    • Also degrade the quality of Google's rivals in search and search advertising
    • Google's efficiency justifications appear to be pretextual
  • On anticompetitive exclusionary agreements with websites for syndicated search and search ads, Google should be condemned under Section 2
    • Only modest anticompetitive effects on publishers, but deny scale to competitors, competitively significant to main rival (Bing) as well as significant barrier to entry in longer term
    • Google's efficiency justifications are, on balance, non-persuasive
  • Possible remedies
    • Scraping
      • Could be required to provide an opt-out for snippets (reviews, ratings) from Google's vertical properties while retaining snippets in web search and/or Universal Search on main search results page
      • Could be required to limit use of content indexed from web search results
    • Campaign management restrictions
      • Could be required to remove problematic contractual restrictions from license agreements
    • Exclusionary syndication agreements
      • Could be enjoined from entering into exclusive search agreements with search syndication partners and required to loosen restrictions surrounding syndication partners' use of rival search ads
  • There are a number of risks to the case, not named in the summary except that Google can argue that Microsoft's most efficient distribution channel is bing.com and that any scale MS might gain will be immaterial to Bing's competitive position
  • [BC] Staff concludes Google's conduct has resulted and will result in real harm to consumers and to innovation in online search and ads.

In their supplemental memo on mobile, BC staff claim that Google dominates mobile search via exclusivity agreements and that mobile search was rapidly growing at the time. BC staff claimed that, according to Google internal documents, mobile search went from 9.5% to 17.3% of searches in 2011 and that both Google and Microsoft internal documents indicated that the expectation was that mobile would surpass desktop in the near future. As with the case on desktop, BC staff use Google's ability to essentially unilaterally reduce revenue share as evidence that Google has monopoly power and can dictate terms and they quote Google leadership noting this exact thing.

BC staff acknowledge that many of Google's actions have been beneficial to consumers, but balance this against the harms of anticompetitive tactics, saying

the evidence paints a complex portrait of a company working toward an overall goal of maintaining its market share by providing the best user experience, while simultaneously engaging in tactics that resulted in harm to many vertical competitors, and likely helped to entrench Google's monopoly power over search and search advertising

BE staff strongly disagreed with BC staff. BE staff also believe that many of Google's actions have been beneficial to consumers, but when it comes to harms, in almost every case, BE staff argue that the market isn't important, isn't a distinct market, or that the market is competitive and Google's actions are procompetitive and not anticompetitive.

Common errors

At least in the documents provided by Politico, BE staff generally declined to engage with BC staff's arguments and numbers directly. For example, in addition to arguing that Google's agreements and exclusivity (insofar as agreements are exclusive) are procompetitive and that foreclosing the possibility of such agreements might have significant negative impacts on the market, they argue that mobile is a small and unimportant market. The BE memo argues that mobile is only 8% of the market and, while it's growing rapidly, is unimportant, as it's only a "small percentage of overall queries and an even smaller percentage of search ad revenues". They also claim that there is robust competition in mobile because, in addition to Apple, there's also BlackBerry and Windows Mobile. Between when the FTC investigation started and when the memo was written, BlackBerry's marketshare dropped from ~14% to ~6%, which was part of a long-term decline that showed no signs of changing. Windows Mobile's drop was less precipitous, from ~6% to ~4%, but in a market with such strong network effects, it's curious that BE staff would argue that these platforms with low and declining marketshare would provide robust competition going forward.

When the authors of the BE memo make a prediction, they seem to have a facility for predicting the opposite of what will happen. In doing so, the authors of the BE memo took positions that were opposed to the general consensus at the time. Another example of this is when they imply that there is robust competition in the search market, which they imply should be expected to continue without antitrust action. Their evidence for this was that Yahoo and Bing had a combined "steady" 30% marketshare in the U.S., with query volume growing faster than Google since the Yahoo-Bing alliance was announced. The BE memo authors go even further and claim that Microsoft's query volume is growing faster than Google's and that Microsoft and Yahoo combined have higher marketshare than Google as measured by search MAU.

The BE memo's argument that Yahoo and Bing are providing robust and stable competition leaves out that the fixed costs of running a search engine are so high and the scale required to be profitable so large that Yahoo effectively dropped out of search and outsourced search to Bing. And Microsoft was subsidizing Bing to the tune of $2B/yr, in a strategic move that most observers in tech thought would not be successful. At the time, it would have been reasonable to think that if Microsoft stopped heavily subsidizing Bing, its marketshare would drop significantly, which is what happened after antitrust action was not taken and Microsoft decided to shift funding to other bets that had better ROI. Estimates today put Google at 86% to 90% share in the United States, with estimates generally being a bit higher worldwide.

On the wilder claims, such as Microsoft and Yahoo combined having more active search users than Google, and Microsoft query volume (and therefore search marketshare) growing faster than Google's, they use comScore data. There are a couple of curious things about this.

First, the authors pick and choose their data in order to present figures that maximize Microsoft's marketshare. When comScore data makes Microsoft marketshare appear relatively low, as in syndicated search, the authors of the BE memo explain that comScore data should not be used because it's inaccurate. However, when comScore data is prima facie unrealistic and makes Microsoft marketshare look larger than is plausible or growing faster than is plausible, the authors rely on comScore data without explaining why they rely on a source that they said should not be used because it's unreliable.

Using this data, the BE memo basically argues that, because many users use Yahoo and Bing at least occasionally, users clearly could use Yahoo and Bing, and there must not be a significant barrier to switching even if (for example) a user uses Yahoo or Bing once a month and Google one thousand times a month. From having worked with and talked to people who work on product changes to drive growth, the overwhelming consensus has been that it's generally very difficult to convert a lightly-engaged user who barely registers as an MAU into a heavily-engaged user who uses the product regularly, and that this is generally considered more difficult than converting a brand-new user into a heavily-engaged user. Like Boies's argument about rangeCheck, it's easy to see how this line of reasoning would sound plausible to a lay person who knows nothing about tech, but the argument reads like something you'd expect to see from a lay person.

Although the BE staff memo reads like a rebuttal to the points of the BC staff memo, the lack of direct engagement on the facts and arguments means that a reader with no knowledge of the industry who reads just one of the memos will have a very different impression than a reader who reads the other. For example, on the importance of mobile search, a naive BC-memo-only reader would think that mobile is very important, perhaps the most important thing, whereas a naive BE-memo-only reader would think that mobile is unimportant and will continue to be unimportant for the foreseeable future.

Politico also released memos from two directors who weigh the arguments of BC and BE staff. Both directors favor the BE memo over the BC memo, one very much so and one moderately so. When it comes to disagreements, such as the importance of mobile in the near future, there's no evidence in the memos presented that there was any attempt to determine who was correct or that the errors we're discussing here were noticed. The closest thing to addressing disagreements such as these are comments that thank both staffs for having done good work, in what one might call a "fair and balanced" manner, such as "The BC and BE staffs have done an outstanding job on this complex investigation. The memos from the respective bureaus make clear that the case for a complaint is close in the four areas ... ". To the extent that this can be inferred, it seems that the reasoning and facts laid out in the BE memo were given at least as much weight as the reasoning and facts in the BC memo despite much of the BE memo's case seeming highly implausible to an observer who understands tech.

For example, on the importance of mobile, I happened to work at Google shortly after these memos were written and, when I was at Google, they had already pivoted to a "mobile first" strategy because it was understood that mobile was going to be the most important market going forward. This was also understood at other large tech companies at the time and had been understood going back further than the dates of these memos. Many consumers didn't understand this and redesigns that degraded the desktop experience in order to unify desktop and mobile experiences were a common cause of complaints at the time. But if you looked at the data on this or talked to people at big companies, it was clear that, from a business standpoint, it made sense to focus on mobile and deal with whatever fallout might happen in desktop if that allowed for greater velocity in mobile development.

Both the BC and BE staff memos extensively reference interviews across many tech companies, including all of the "hyperscalers". It's curious that someone could have access to all of these internal documents from these companies as well as interviews and then make the argument that mobile was, at the time, not very important. And it's strange that, at least to the extent that we can know what happened from these memos, directors took both sets of arguments at face value and then decided that the BE staff case was as convincing or more convincing than the BC staff case.

That's one class of error we repeatedly see in the BC and BE staff memos: stretching data to make a case that a knowledgeable observer can plainly see is not true. In most cases, it's BE staff who have stretched the data to push a tenuous position as far as it can go, but there are some instances of BC staff making a case that's a stretch.

Another class of error we see repeated, mainly in the BE memo, is taking what most people in industry would consider an obviously incorrect model of the world and then making inferences based on that. An example of this is the discussion on whether or not vertical competitors such as Yelp and TripAdvisor were or would be significantly disadvantaged by actions BC staff allege are anticompetitive. BE staff, in addition to arguing that Google's actions were actually procompetitive and not anticompetitive, argued that it would not be possible for Google to significantly harm vertical competitors because the amount of traffic Google drives to them is small, only 10% to 20% of their total traffic, going so far as to say "the effect on traffic from Google to local sites is very small and not statistically significant". Although BE staff don't elaborate on their model of how this business works, they appear to believe that the market is basically static. If Google removes Yelp from its listings (which they threatened to do if they weren't allowed to integrate Yelp's data into their own vertical product) or downranks Yelp to preference Google's own results, this will, at most, reduce Yelp's traffic by 10% to 20% in the long run because only 10% to 20% of traffic comes from Google.

But even a VC or PM intern can be expected to understand that the market isn't static. What one would expect if Google can persistently take a significant fraction of search traffic away from Yelp and direct it to Google's local offerings instead is that, in the long run, Yelp will end up with very few users and become a shell of what it once was. This is exactly what happened and, as of this writing, Yelp is valued at $2B despite having a trailing P/E ratio of 24, which is a fairly low P/E for a tech company. But the P/E ratio is unsurprisingly low because it's not generally believed that Yelp can turn this around due to Google's dominant position in search as well as maps making it very difficult for Yelp to gain or retain users. This is not just obvious in retrospect; it was well understood at the time. In fact, I talked to a former colleague at Google who was working on one of a number of local features that leveraged the position that Google had and that Yelp could never reasonably attain; the expected outcome of these features was to cripple Yelp's business. Not only was it understood that this was going to happen, it was also understood that Yelp was not likely to be able to counter this due to Google's ability to leverage its market power from search and maps. It's curious that, at the time, someone would've seriously argued that cutting off Yelp's source of new users while simultaneously presenting virtually all of Yelp's then-current users with an alternative that's bundled into an app or website they already use would not significantly impact Yelp's business, but the BE memo makes that case. One could argue that the set of maneuvers used here is analogous to the ones done by Microsoft that were brought up in the Microsoft antitrust case, where it was alleged that a Microsoft exec said that they were going to "cut off Netscape’s air supply", but the BE memo argues that the impact of having one's air supply cut off is "very small and not statistically significant" (after all, a typical body has blood volume sufficient to bind 1L of oxygen, much more than the oxygen normally taken in during one breath).

Another class of, if not error, then poorly supported reasoning, is relying on cocktail-party-level reasoning when there's data or other strong evidence that can be directly applied. This happens throughout the BE memo even though, at other times, when the BC memo has some moderately plausible reasoning, the BE memo's counter is that we should not accept such reasoning and need to look at the data and not just reason about things in the abstract. The BE memo heavily leans on the concept that we must rely on data over reasoning and calls arguments from the BC memo that aren't rooted in rigorous data anecdotal, "beyond speculation", etc., but the BE memo only does this in cases where knowledge or reasoning might lead one to conclude that there was some kind of barrier to competition. When the data indicates that Google's behavior creates some kind of barrier in the market, the authors of the BE memo ignore all relevant data and instead rely on reasoning over data even when the reasoning is weak and has the character of the Boies argument we referenced earlier. One could argue that the standard of evidence for pursuing an antitrust case should be stronger than the standard of evidence for not pursuing one, but if the asymmetry observed here were for that reason, the BE memo could have listed areas where the evidence wasn't strong enough without making its own weak assertions in the face of stronger evidence. An example of this is the discussion of the impact of mobile defaults.

The BE memo argues that defaults are essentially worthless and have little to no impact, saying multiple times that users can switch with just "a few taps", adding that this takes "a few seconds" and that, therefore, "[t]hese are trivial switching costs". The most obvious and direct piece of evidence on the impact of defaults is the amount of money Google pays to retain its default status. In a 2023 antitrust action, it was revealed that Google paid Apple $26.3B to retain its default status in 2021. As of this writing, Apple's P/E ratio is 29.53. If we think of this payment as, at the margin, pure profit and having default status is as worthless as indicated by the BE memo, a naive estimate of how much this is worth to Apple is that it can account for something like $776B of Apple's $2.9T market cap. Or, looking at this from Google's standpoint, Google's P/E ratio is 27.49, so Google is willing to give up $722B of its $2.17T market cap. Google is willing to pay this to be the default search for something like 25% to 30% of phones in the world. This calculation is too simplistic, but there's no reasonable adjustment that could give anyone the impression that the value of being the default is as trivial as claimed by the BE memo. For reference, a $776B tech company would be the 7th most valuable publicly traded U.S. tech company and the 8th most valuable publicly traded U.S. company (behind Meta/Facebook and Berkshire Hathaway, but ahead of Eli Lilly). Another reference point is that YouTube's ad revenue in 2021 was $28.8B. It would be difficult to argue that spending one YouTube worth of revenue, in profit, in order to retain default status makes sense if, in practice, user switching costs are trivial and defaults don't matter. If we look for publicly available numbers close to 2012 instead of 2021, in 2013, TechCrunch reported a rumor that Google was paying Apple $1B/yr for default search status, and a lawsuit later revealed that Google paid Apple $1B for default search status in 2014. This is not long after these memos were written, and $1B/yr is still a non-trivial amount of money that belies the BE memo's claim that mobile is unimportant and that defaults don't matter because user switching costs are trivial.
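
Spelling out the back-of-the-envelope arithmetic (the same naive calculation as above, not a serious valuation; the in-text figures truncate rather than round, so they differ by about a billion):

```
# The naive calculation above: treat the default payment as marginal profit
# and multiply by each company's P/E ratio to get an implied chunk of market cap.
payment_b = 26.3  # Google's 2021 payment to Apple, in $B

apple_pe, google_pe = 29.53, 27.49
print(f"Apple:  {payment_b * apple_pe:.0f}B of ~2900B market cap")   # roughly the $776B figure above
print(f"Google: {payment_b * google_pe:.0f}B of ~2170B market cap")  # roughly the $722B figure above
```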

It's curious, given the heavy emphasis in the BE memo on not trusting plausible reasoning and having to rely on empirical data, that BE staff appeared to make no attempt to find out how much Google was paying for its default status (a memo by a director who agrees with BE staff suggests that someone ought to check on this number, but there's no evidence that this was done and the FTC investigation was dropped shortly afterwards). Given the number of internal documents the FTC was able to obtain, it seems unlikely that the FTC would not have been able to obtain this number from either Apple or Google. But, even if it were the case that the number were unobtainable, it's prima facie implausible that defaults don't matter and switching costs are low in practice. If FTC staff had interviewed product-oriented engineers and PMs or looked at the history of products in tech, they would have seen how implausible this claim was, so in order to make this case, BE staff had to ignore or avoid finding out how much Google was paying for default status, not talk to product-focused engineers, PMs, or leadership, and also avoid learning about the tech industry.

One could make the case that, while defaults are powerful, companies have been able to overcome being non-default, which could lead to a debate on exactly how powerful defaults are. For example, one might argue about the impact of defaults when Google Chrome became the dominant browser and debate how much of it was due to Chrome simply being a better browser than IE, Opera, and Firefox, how much was due to blunders by Microsoft that Google is unlikely to repeat in search, how much was due to things like tricking people into making Chrome the default via a bundle deal with badware installers, and how much was due to pressuring people into setting Chrome as the default via google.com. That's an interesting discussion where a reasonable person with an understanding of the industry could take either side of the debate, unlike the claim that defaults basically don't matter at all and user switching costs are trivial in practice, which is not plausible even without access to the data on how much Google pays Apple and others to retain default status. And as of the 2020 DoJ case against Google, roughly half of Google searches occur via a default search that Google pays for.

Another repeated error, closely related to the one above, is bringing up marketing statements, press releases, or other statements that are generally understood to be exaggerations, and relying on these as if they're meaningful statements of fact. For example, the BE memo states:

Microsoft's public statements are not consistent with statements made to antitrust regulators. Microsoft CEO Steve Ballmer stated in a press release announcing the search agreement with Yahoo: "This agreement with Yahoo! will provide the scale we need to deliver even more rapid advances in relevancy and usefulness. Microsoft and Yahoo! know there's so much more that search could be. This agreement gives us the scale and resources to create the future of search"

This is the kind of marketing pablum that generally accompanies an acquisition or partnership. Because this kind of meaningless statement is common across many industries, one would expect regulators, even ones with no understanding of tech, to recognize this as marketing and not give it as much or more weight than serious evidence.

A few interesting tidbits

Now that we've covered the main classes of errors observed in the memos, we'll look at a few tidbits from the memos.

Between the approval of the compulsory process on June 3rd 2011 and the publication of the BC memo dated August 8th 2012, staff received 9.5M pages of documents across 2M docs and said they reviewed "many thousands of these documents", so staff were only able to review a small fraction of the documents.

Prior to the FTC investigation, there were a number of lawsuits related to the same issues, and all were dismissed, some with arguments that would, if they were taken as broad precedent, make it difficult for any litigation to succeed. In SearchKing v. Google, plaintiffs alleged that Google unfairly demoted their results but it was ruled that Google's rankings are constitutionally protected opinion and even malicious manipulation of rankings would not expose Google to liability. In Kinderstart v. Google, part of the ruling was that Google search is not an essential facility for vertical providers (such as Yelp, eBay, and Expedia). Since the memos are ultimately about legal proceedings, there is, of course, extensive discussion of Verizon v. Trinko and Aspen Skiing Co. v. Aspen Highlands Skiing Corp and the implications thereof.

As of the writing of the BC memo, 96% of Google's $38B in revenue was from ads, mostly from search ads. The BC memo makes the case that other forms of advertising, other than social media ads, have only limited potential for growth. That's certainly wrong in retrospect. For example, video ads are a significant market. YouTube's ad revenue was $28.8B in 2021 (a bit more than what Google pays to Apple to retain default search status), Twitch supposedly generated another $2B-$3B in video revenue, and a fair amount of video ad revenue goes directly from sponsors to streamers without passing through YouTube and Twitch, e.g., the #137 largest streamer on Twitch was offered $10M/yr to stream online gambling for 30 minutes a day, and he claims that the #42 largest streamer, whom he personally knows, was paid $10M/mo from online gambling sponsorships. And this isn't just apparent in retrospect — even at the time, there were strong signs that video would become a major advertising market. It happens that those same signs also showed that Google was likely to dominate the market for video ads, but it's still the case that the specific argument here was overstated.

In general, the BC memo seems to overstate the expected primacy of search ads as well as how distinct a market search ads are, claiming that other online ad spend is not a substitute in any way and, if anything, is a complement. Although one might be able to reasonably argue that search ads are a somewhat distinct market and that the elasticity of substitution is low once you start moving a significant amount of your ad spend away from search, the degree to which the BC memo makes this claim is a stretch. Search ads and other ad budgets being complements and not substitutes is a very different position from what I've heard from talking to people about how ad spend is allocated in practice. Perhaps one can argue that it makes sense to try to make a strong case here in light of Person v. Google, where Judge Fogel of the Northern District of California criticized the plaintiff's market definition, finding no basis for distinguishing a "search advertising market" from the larger market for internet advertising, which likely foreshadows an objection that would be raised in any future litigation. However, as someone who's just trying to understand the facts of the matter at hand and the veracity of the arguments, the argument here seems dubious.

For Google's integrated products like local search and product search (formerly Froogle), the BC memo claims that if Google treated its own properties like other websites, the products wouldn't be ranked, and that Google artificially placed its own vertical properties above organic results. The webspam team declined to include Froogle results because the results are exactly the kind of thing that Google removes from the index for being spammy, saying "[o]ur algorithms specifically look for pages like these to either demote or remove from the index". Bill Brougher, product manager for web search, said "Generally we like to have the destination pages in the index, not the aggregated pages. So if our local pages are lists of links to other pages, it's more important that we have the other pages in the index". After the webspam team was overruled and the results were inserted, the ads team complained that the less clicked (and implied to be lower quality) results would lead to a loss of $154M/yr. The response to this essentially contained the same content as the BC memo's argument on the importance of scale and why Google's actions to deprive competitors of scale are costly:

We face strong competition and must move quickly. Turning down onebox would hamper progress as follows - Ranking: Losing click data harms ranking; [t]riggering: Losing CTR and google.com query distribution data [harms] triggering accuracy; [c]omprehensiveness: Losing traffic harms merchant growth and therefore comprehensiveness; [m]erchant cooperation: Losing traffic reduces effort merchants put into offer data, tax, & shipping; PR: Turning off onebox reduces Google's credibility in commerce; [u]ser awareness: Losing shopping-related UI on google.com reduces awareness of Google's shopping features

Normally, CTR is used as a strong signal to rank results, but this would've resulted in a low ranking for Google's own vertical properties, so "Google used occurrence of competing vertical websites to automatically boost the ranking of its own vertical properties above that of competitors" — if a comparison shopping site was relevant, Google would insert Google Product Search above any rival, and if a local search site like Yelp or CitySearch was relevant, Google automatically returned Google Local at the top of the SERP.
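
For concreteness, here's a toy sketch of the kind of intervention the BC memo describes; the function, domain lists, and property names are hypothetical illustrations, not Google's actual ranking code:

```python
# Toy illustration of the ranking intervention described in the BC memo. All of
# the names and data structures here are hypothetical; this is not Google's code.

RIVAL_SHOPPING = {"nextag.com", "foundem.co.uk", "shopping.com"}
RIVAL_LOCAL = {"yelp.com", "citysearch.com", "tripadvisor.com"}

def boost_own_verticals(ranked_results):
    """ranked_results: domains in the order the organic ranker produced them."""
    results = list(ranked_results)
    # If a comparison shopping rival is relevant, insert Google Product Search above it.
    for i, domain in enumerate(results):
        if domain in RIVAL_SHOPPING:
            results.insert(i, "google.com/products")
            break
    # If a local search rival is relevant, return Google Local at the top of the SERP.
    if any(domain in RIVAL_LOCAL for domain in results):
        results.insert(0, "google.com/local")
    return results

print(boost_own_verticals(["yelp.com", "example.com", "nextag.com"]))
# ['google.com/local', 'yelp.com', 'example.com', 'google.com/products', 'nextag.com']
```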

Additionally, in order to have content for Google local results, Google took Yelp content and integrated it into Google Places. When Yelp observed this was happening, they objected, and Google threatened to ban Yelp from traditional Google search results and further threatened to ban any vertical provider that didn't allow its content to be used in Google Places. Marissa Mayer testified that it was, from a technical standpoint, extraordinarily difficult to remove Yelp from Google Places without also removing Yelp from traditional organic search results. But when Yelp sent a cease and desist letter, Google was able to remove Yelp results immediately, seemingly indicating that it was less difficult than claimed. Google then claimed that it was technically infeasible to remove Yelp from Google Places without removing Yelp from the "local merge" interface on the SERP. BC staff believe this claim is false as well, and Marissa Mayer later admitted in a hearing that this claim was false and that Google was concerned about the consequences of allowing sites to opt out of Google Places while staying in "local merge". There was also a very similar story with Amazon results and product search. As noted above, the BE memo's counterargument to all of this is that Google traffic is "very small and not statistically significant".

The BC memo claims that the activities above both reduced the incentives of companies like Yelp, CitySearch, Amazon, etc., to invest in the area and also reduced the incentives for new companies to form in this area. This seems true. In addition to the evidence presented in the BC memo (which goes beyond what was summarized above), if you just talked to founders looking for an idea or to VCs around the time of the FTC investigation, there had already been a real movement away from founding and funding companies like Yelp because it was understood that Google could seriously cripple any similar company in this space by cutting off its air supply.

We'll defer the BC memo's discussion of the AdWords API restrictions, which specifically disallow programmatic porting of campaigns to other platforms such as Bing, to the appendix. But one interesting bit there is that Google was apparently aware of the legal sensitivity of this matter, so meeting notes and internal documentation on the topic are unusually incomplete. For one meeting, apparently the most informative written record BC staff were able to find consists of a message from Director of PM Richard Holden to SVP of ads Susan Wojcicki which reads, "We didn't take notes for obvious reasons hence why I'm not elaborating too much here in email but happy to brief you more verbally".

We'll also defer a detailed discussion of the BC memo's comments on Google's exclusive and restrictive syndication agreements to the appendix, except for a couple of funny bits. One is that Google claimed they were unaware of the terms and conditions in their own standard online service agreements. In particular, the terms and conditions contained a "preferred placement" clause, which a number of parties believe is a de facto exclusivity agreement. When FTC staff questioned Google's VP of search services about this clause, the VP claimed they were not aware of it. Afterwards, Google sent a letter to Barbara Blank of the FTC explaining that they were removing the preferred placement clause from the standard online agreement.

Another funny bit involves Google's market power and how it allowed them to collect an increasingly large share of revenue for themselves and decrease the revenue share their partners received. Only a small number of Google's customers who were impacted by this found it concerning. Those that did were some of the largest and most sophisticated customers (such as Amazon and IAC); their concern was that Google's restrictive and exclusive provisions would increase Google's dominance over Bing/Microsoft and allow them to dictate worse terms to customers. Even as Google was executing a systematic strategy to reduce the revenue share paid to customers, which was only possible due to their dominance of the market, most customers appeared not to understand either the long-term implications of Google's market power in this area or the importance of the internet.

For example, Best Buy didn't find this concerning because Best Buy viewed their website and the web as a way for customers to find presale information before entering a store, and Walmart didn't find this concerning because they viewed the web as an extension of brick and mortar retail. It seems that the same lack of understanding of the importance of the internet which led Walmart and Best Buy to express their lack of concern over Google's dominance here also led to these retailers, which previously had a much stronger position than Amazon, falling greatly behind in both online and overall profit. Walmart later realized its error and acquired Jet.com for $3.3B in 2016 and also seriously (relative to other retailers) funded programmers to do serious tech work inside Walmart. Since Walmart started taking the internet seriously, it's made a substantial comeback online and has averaged a 30% CAGR in online net sales since 2018, but taking two decades to mount a serious response to Amazon's online presence has put Walmart solidly behind Amazon in online retail despite nearly a decade of serious investment, and Best Buy has still not been able to mount an effective response to Amazon after three decades.

The BE memo uses the lack of concern on the part of most customers as evidence that the exclusive and restrictive conditions Google dictated here were not a problem but, in retrospect, it's clear that it was only a lack of understanding of the implications of online business that led customers to be unconcerned. And when the BE memo refers to the customers who understood the implications as sophisticated, that's relative to people in lines of business where leadership tended not to understand the internet. While these customers are sophisticated by comparison to a retailer that took two decades to mount a serious response to the threat Amazon posed to their business, if you just talked to people in the tech industry at the time, you wouldn't need to find a particularly sophisticated individual to find someone who understood what was going on. It was generally understood that retail revenue and, even more so, retail profit were going to move online, and you'd have to find someone extremely out of the loop to find someone who didn't at least roughly understand the implications.

There's a lengthy discussion on search and scale in both the BC and BE memos. On this topic, the BE memo seems wrong and the implications of the BC memo are, if not subtle, at least not obvious. Let's start with the BE memo because that one's simpler to discuss, although we'll very briefly discuss the argument in the BC memo in order to frame the discussion in the BE memo. A rough sketch of the argument in the BC memo is that there are multiple markets (search, ads) where scale has a significant impact on product quality. Google's own documents acknowledge this "virtuous cycle" where having more users lets you serve better ads, which gives you better revenue for ads and, likewise in search, having more scale gives you more data which can be used to improve results, which leads to user growth. And for search in particular, the BC memo claims that click data from users is of high importance and that more data allows for better results.

The BE memo claims that this is not really the case. On the importance of click data, the BE memo raises two large objections. First, that this is "contrary to the history of the general search market" and second, that "it is also contrary to the evidence that factors such as the quality of the web crawler and web index; quality of the search algorithm; and the type of content included in the search results [are as important or more important]".

On the first argument, the BE memo elaborates with a case that's roughly "Google used to be smaller than it is today, and the click data at the time was sufficient, therefore being as large as Google used to be means that you have sufficient click data". Independent of knowledge of the tech industry, this seems like a strange line of reasoning. "We now produce a product that's 1/3 as good as our competitor for the same price, but that should be fine because our competitor previously produced a product that's 1/3 as good as their current product when the market was less mature and no one was producing a better product" is generally not going to be a winning move. That's especially true in markets where there's a virtuous cycle between market share and product quality, like in search.

The second argument also seems like a strange argument to make even without knowledge of the tech industry in that it's a classic fallacious argument. It's analogous to saying something like "the BC memo claims that it's important for cars to have a right front tire, but that's contrary to evidence that it's at least as important for a car to have a left front tire and a right rear tire". The argument is even less plausible if you understand tech, especially search. Calling out the quality of the search algorithm as distinct doesn't feel quite right because scale and click data directly feed into algorithm development (and this is discussed at some length in the BC memo — the authors of the BE memo surely had access to the same information and, from their writing, seem to have had access to the argument). And as someone who's worked on search indexing, as much as I'd like to agree with the BE memo and say that indexing is as important or more important than ranking, I have to admit that indexing is an easier and less important problem than ranking, and likewise for crawling vs. ranking. This was generally understood at the time so, given the number of interviews FTC staff did, the authors of the BE memo should've known this as well. Moreover, given the "history of the general search market" which the BE memo refers to, even without talking to engineers, this should've been apparent.
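
As a rough illustration of why click volume matters for ranking (the mechanism Udi Manber describes in the testimony quoted in the BC memo summary in the appendix), here's a minimal CTR-based re-ranking sketch; the smoothing prior and data layout are made up for illustration and this isn't any search engine's actual ranking code:

```python
from collections import defaultdict

impressions = defaultdict(int)   # (query, url) -> times the result was shown
clicks = defaultdict(int)        # (query, url) -> times the result was clicked

def record(query, shown_urls, clicked_url=None):
    """Log one search session's worth of click feedback."""
    for url in shown_urls:
        impressions[(query, url)] += 1
    if clicked_url is not None:
        clicks[(query, clicked_url)] += 1

def rerank(query, candidate_urls, prior_clicks=10, prior_ctr=0.1):
    """Reorder candidates by smoothed CTR. With little traffic, the prior dominates
    and observed clicks barely move the ranking; with more traffic, observed CTR
    takes over sooner -- which is the scale argument in miniature."""
    def smoothed_ctr(url):
        shown = impressions[(query, url)]
        clicked = clicks[(query, url)]
        return (clicked + prior_clicks * prior_ctr) / (shown + prior_clicks)
    return sorted(candidate_urls, key=smoothed_ctr, reverse=True)

# With enough sessions where 80% of users click the second result and 10% click
# the first (the kind of hypothetical in the Manber testimony), the second result
# ends up ranked above the first.
for _ in range(80):
    record("best pizza", ["a.com", "b.com"], clicked_url="b.com")
for _ in range(10):
    record("best pizza", ["a.com", "b.com"], clicked_url="a.com")
for _ in range(10):
    record("best pizza", ["a.com", "b.com"])
print(rerank("best pizza", ["a.com", "b.com"]))  # ['b.com', 'a.com']
```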

For example, Cuil was famous for building a larger index than Google. While that's not a trivial endeavor, at the time, quite a few people had the expertise to build an index that rivaled Google's index in raw size or whatever other indexing metric you prefer, if given enough funding for a serious infra startup. Cuil and other index-focused attempts failed because having a large index without good search ranking is worth little. While it's technically true that having good ranking with a poor index is also worth little, this is not something we've really seen in practice because ranking is the much harder problem and a company that's competent to build a good search ranker will, as a matter of course, have a good enough index and good enough crawling.

As for the case in the BC memo, I don't know what the implications should be. The BC memo correctly points out that increased scale greatly improves search quality, that the extra data Bing got from the Yahoo deal greatly increased search quality and increased CTR, that further increased scale should be expected to continue to provide high returns, that the costs of creating a competitor to Google are high (Bing was said to be losing $2B/yr at the time and was said to be spending $4.5B/yr "developing its algorithms and building the physical capacity necessary to operate Bing"), and that Google undertook actions that might be deemed anticompetitive which disadvantaged Bing compared to the counterfactual world where Google did not take those actions, and they make a similar case for ads. However, despite the strength of the stated BC memo case and the incorrectness of the stated BE memo case, the BE memo's case is correct in spirit, in that there are actions Microsoft could've taken but did not take in order to compete much more effectively in search, and one could argue that the FTC shouldn't be in the business of rescuing a company from competing ineffectively.

Personally, I don't think it's too interesting to discuss the position of the BC memo vs. the BE memo at length because the positions the BE memo takes seem extremely weak. It's not fair to call it a straw man because it's a real position, and one that carried the day at the FTC, but the decision to take action or not seemed more about philosophy than the arguments in the memos. But we can discuss what else might've been done.

What might've happened

What happened after the FTC declined to pursue antitrust action was that Microsoft effectively defunded Bing as a serious bet, taking resources that could've gone to continuing to fund a very expensive fight against Google and moving them to other bets that it deemed to be higher ROI. The big bets Microsoft pursued were Azure, Office, and HoloLens (and arguably Xbox). HoloLens was a pie-in-the-sky bet, but Azure and Office were lines of business where, instead of fighting an uphill battle in which its competitor could use its dominance in related markets to push around competitors, Microsoft could fight downhill battles in which it could use its own dominance in related markets to push around competitors, resulting in a much higher return per dollar invested. As someone who worked on Bing and thought that Bing had the potential to seriously compete with Google given sustained, unprofitable, heavy investment, I find that disappointing but also likely the correct business decision. If you look at any particular submarket, like Teams vs. Slack, the Microsoft product doesn't need to be nearly as good as the competing product to take over the market, which is the opposite of the case in search, where Google's ability to push competitors around means that Bing would have to be much better than Google to attain marketshare parity.

Based on their public statements, Biden's DoJ Antitrust AAG appointee, Jonathan Kanter, would argue for pursuing antitrust action under the circumstances, as would Biden's FTC commissioner and chair appointee Lina Khan. Prior to her appointment as FTC commissioner and chair, Khan was probably best known for writing Amazon's Antitrust Paradox, which has been influential as well as controversial. Obama appointees, who more frequently agreed with the kind of reasoning in the BE memo, would have argued against antitrust action, and the investigation under discussion was stopped on their watch. More broadly, they argued against the philosophy driving Kanter and Khan. Obama's FTC Commissioner appointee, GMU economist and legal scholar Josh Wright, actually wrote a rebuttal titled "Requiem for a Paradox: The Dubious Rise and Inevitable Fall of Hipster Antitrust", a scathing critique of Khan's position.

If, in 2012, the FTC and DoJ were run by Biden appointees instead of Obama appointees, what difference would that have made? We can only speculate, but one possibility would be that they would've taken action and then lost, as happened with the recent cases against Meta and Microsoft which seem like they would not have been undertaken under an Obama FTC and DoJ. Under Biden appointees, there's been much more vigorous use of the laws that are on the books, the Sherman Act, the Clayton Act, the FTC Act, the Robinson–Patman Act, as well as "smaller" antitrust laws, but the opinion of the courts hasn't changed under Biden and this has led to a number of unsuccessful antitrust cases in tech. Both the BE and BC memos dedicate significant space to whether or not a particular line of reasoning will hold up in court. Biden's appointees are much less concerned with this than previous appointees and multiple people in the DoJ and the FTC are on the record saying things like "it is our duty to enforce the law", meaning that when they see violations of the antitrust laws that were put into place by elected officials, it's their job to pursue these violations even if courts may not agree with the law.

Another possibility is that there would've been some action, but the action would've been in line with most corporate penalties we see: something like a small fine that costs the company an insignificant fraction of the marginal profit they made from their actions, or some kind of consent decree (basically a cease and desist), where the company is required to stop doing specific actions while keeping its marketshare, i.e., keeping the main thing it wanted to gain, a massive advantage in a market dominated by network effects. Perhaps there would be a few more meetings where "[w]e didn't take notes for obvious reasons" to work around the new limitations, and business as usual would continue. Given the specific allegations in the FTC memos and the attitudes of the courts at the time, my guess is that something like this second set of possibilities would've been the most likely outcome had the FTC proceeded with its antitrust investigation instead of dropping it: some kind of nominal victory that makes little to no difference in practice. Given how long it takes for these cases to play out, it's overwhelmingly likely that Microsoft would've already scaled back its investment in Bing and moved Bing from a subsidized bet it was trying to grow to a profitable business it wanted to keep by the time any decision was made. There are a number of cases that were brought by other countries which had remedies in line with what we might've expected if the FTC investigation had continued. On Google using its market power in mobile to push the software Google wants onto nearly all Android phones, an EU action was nominally successful but made little to no difference in practice. Cristina Caffarra of the Centre for Economic Policy Research characterized this as

Europe has failed to drive change on the ground. Why? Because we told them, don't do it again, bad dog, don't do it again. But in fact, they all went and said 'ok, ok', and then went out, ran back from the back door and did it again, because they're smarter than the regulator, right? And that's what happens.

So, on the tying case, in Android, the issue was, don't tie again so they say, "ok, we don't tie". Now we got a new system. If you want Google Play Store, you pay $100. But if you want to put search in every entry point, you get a discount of $100 ... the remedy failed, and everyone else says, "oh, that's a nice way to think about it, very clever"

Another pair of related cases are Yandex's Russian case on mobile search defaults and a later EU consent decree. In 2015, Yandex brought a suit about mobile default status on Android in Russia, which was settled by adding a "choice screen" which has users pick their search engine without preferencing a default. This immediately caused Yandex to start gaining marketshare on Google, and Yandex eventually surpassed Google in marketshare in Russia according to StatCounter. In 2018, the EU required a similar choice screen in Europe, which didn't make much of a difference, except maybe sort of in the Czech Republic. There are a number of differences between the situation in Russia and in the EU. One, arguably the most important, is that when Yandex brought the case against Google in Russia, Yandex was still fairly competitive, with marketshare in the high 30% range. At the time of the EU decision in 2018, Bing was the #2 search engine in Europe, with about 3.6% marketshare. Giving consumers a choice when one search engine completely dominates the market can be expected to have fairly little impact. One argument the BE memo heavily relies on is the idea that, if we intervene in any way, that could have bad effects down the line, so we should be very careful and probably not do anything, just in case. But in these winner-take-most markets with such strong network effects, there's a relatively small window in which you can cheaply intervene. Perhaps, and this is highly speculative, if the FTC had required a choice screen in 2012, Bing would've continued to invest enough to at least maintain its marketshare against Google.

For verticals, in shopping, the EU required some changes to how Google presents results in 2017. This appears to have had little to no impact, being both perhaps 5-10 years too late and also a trivial change that wouldn't have made much difference even if enacted a decade earlier. The 2017 ruling came out of a case that started in 2010, and in the 7 years it took to take action, Google managed to outcompete its vertical competitors, making them barely relevant at best.

Another place we could look is at the Microsoft antitrust trial. That's a long story, at least as long as this document, but to very briefly summarize, in 1990, the FTC started an investigation over Microsoft's allegedly anticompetitive conduct. A vote to continue the investigation ended up in a 2-2 tie, causing the investigation to be closed. The DoJ then did its own investigation, which led to a consent decree that was generally considered to not be too effective. There was then a 1998 suit by the DoJ about Microsoft's use of monopoly power in the browser market, which initially led to a decision to break Microsoft up. But, on appeal, the breakup was overturned, which led to a settlement in 2002. A major component of the 1998 case was about browser bundling and Microsoft's attack on Netscape. By the time the case was settled, in 2002, Netscape was effectively dead. The parts of the settlements having to do with interoperability were widely regarded as ineffective at the time, not only because Netscape was dead, but because they weren't going to be generally useful. A number of economists took the same position as the BE memo, that no intervention should've happened at the time and that any intervention is dangerous and could lead to a fettering of innovation. Nobel Prize winning economist Milton Friedman wrote a Cato Policy Forum essay titled "The Business Community's Suicidal Impulse", predicting that tech companies calling for antitrust action against Microsoft were committing suicide, that a critical threshold had been passed, and that this would lead to the bureaucratization of Silicon Valley:

When I started in this business, as a believer in competition, I was a great supporter of antitrust laws; I thought enforcing them was one of the few desirable things that the government could do to promote more competition. But as I watched what actually happened, I saw that, instead of promoting competition, antitrust laws tended to do exactly the opposite, because they tended, like so many government activities, to be taken over by the people they were supposed to regulate and control. And so over time I have gradually come to the conclusion that antitrust laws do far more harm than good and that we would be better off if we didn’t have them at all, if we could get rid of them. But we do have them.

Under the circumstances, given that we do have antitrust laws, is it really in the self-interest of Silicon Valley to set the government on Microsoft? ... you will rue the day when you called in the government. From now on the computer industry, which has been very fortunate in that it has been relatively free of government intrusion, will experience a continuous increase in government regulation. Antitrust very quickly becomes regulation. Here again is a case that seems to me to illustrate the suicidal impulse of the business community.

In retrospect, we can see that this wasn't correct and, if anything, was the opposite of correct. On the idea that even attempting antitrust action against Microsoft would lead to an inevitable increase in government intervention, we saw the opposite, a two-decade long period of relatively light regulation and antitrust activity. And in terms of the impacts on innovation, although the case against Microsoft was too little and too late to save Netscape, Google's success appears to be causally linked to the antitrust trial. At one point, in the early days of Google, when Google had no market power and Microsoft effectively controlled how people access the internet, Microsoft internally discussed proposals aimed at killing Google. One proposal involved redirecting users who tried to navigate to Google to Bing (at the time, called MSN Search, and of course this was before Chrome existed and IE dominated the browser market). Another idea was to put up a big scary warning that warned users that Google was dangerous, much like the malware warnings browsers have today. Gene Burrus, a lawyer for Microsoft at the time, stated that Microsoft chose not to attempt to stop users from navigating to google.com due to concerns about further antitrust action after they'd been through nearly a decade of serious antitrust scrutiny. People at both Google and Microsoft who were interviewed about this believe that Microsoft would've killed Google had they done this, so, in retrospect, we can see that Milton Friedman was wrong about the impacts of the Microsoft antitrust investigations, and one can make the case that it's only because of the antitrust investigations that web 1.0 companies like Google and Facebook were able to survive, let alone flourish.

Another possibility is that a significant antitrust action would've been undertaken, been successful, and been successful quickly enough to matter. It's possible that, by itself, a remedy wouldn't have changed the equation for Bing vs. Google, but if a reasonable remedy was found and enacted, it still could've been in time to keep Yelp and other vertical sites as serious concerns and maybe even spur more vertical startups. And in the hypothetical universe where people with the same philosophy as Biden's appointees were running the FTC and the DoJ, we might've also seen antitrust action against Microsoft in markets where they can leverage their dominance in adjacent markets, making Bing a more appealing area for continued heavy investment. Perhaps that would've resulted in Bing being competitive with Google and the aforementioned concerns that "sophisticated customers" like Amazon and IAC had may not have come to pass. With antitrust against Microsoft and other large companies that can use their dominance to push competitors around, perhaps Slack would still be an independent product and we'd see more startups in enterprise tools (a number of commenters believe that Slack was basically forced into being acquired because it's too difficult to compete with Teams given Microsoft's dominance in related markets). And Slack continuing to exist and innovate is small potatoes — the larger hypothetical impact would be all of the new startups and products that would be created that no one even bothers to attempt because they're concerned that a behemoth with an integrated bundle like Microsoft would crush their standalone product. If you add up all of these, if not best-case, at least very-good-case outcomes for antitrust advocates, one could argue that consumers and businesses would be better off. But, realistically, it's hard to see how this very-good-case set of outcomes could have come to pass.

Coming back to the FTC memo, if we think about what it would take to put together a set of antitrust actions that actually fosters real competition, that seems extraordinarily difficult. A number of the more straightforward and plausible sounding solutions are off the table for political reasons, due to legal precedent, or due to arguments like the Boies argument we referenced or some of the arguments in the BE memo that are clearly incorrect, but appear to be convincing to very important people.

For the solutions that seem to be on the table, weighing the harms caused by them is non-trivial. For example, let's say the FTC had mandated a mobile and desktop choice screen in 2012. This would've killed Mozilla in fairly short order unless Mozilla completely changed its business model, because Mozilla basically relies on payments from Google for default status to survive. We've seen with Opera that even when you have a superior browser that introduces features other browsers later copy, which has better performance than other browsers, etc., you can't really compete with free browsers when you have a paid browser. So then we would've quickly been down to IE/Edge and Chrome. And in terms of browser engines, just Chrome after not too long, as Edge is now running Chromium under the hood. Maybe we can come up with another remedy that allows for browser competition as well, but the BE memo isn't wrong to note that antitrust remedies can cause other harms.

Another example which highlights the difficulty of crafting a politically suitable remedy is the set of restrictions the Bundeskartellamt imposed against Facebook, which have to do with user privacy and the use of data (for personalization, ranking, general ML training, etc.), which is considered an antitrust issue in Germany. Michal Gal, Professor and Director of the Forum on Law and Markets at the University of Haifa, pointed out that, of course, Facebook, in response to the rulings, is careful to only limit its use of data if Facebook detects that you're German. If the concern is that ML models are trained on user data, this doesn't do much to impair Facebook's capability. Hypothetically, if Germany had a tech scene that was competitive with American tech and German companies were concerned about a similar ruling being leveled against them, this would be disadvantageous to nascent German companies that initially focus on the German market before expanding internationally. For Germany, this is only a theoretical concern since, other than SAP, no German company has even approached the size and scope of large American tech companies. But when looking at American remedies and American regulation, this isn't a theoretical concern, and some lawmakers will want to weigh the protection of American consumers against the drag imposed on American firms when compared to Korean, Chinese, and other foreign firms that can grow in local markets with fewer privacy concerns before expanding to international markets. This concern, if taken seriously, could be used to argue against nearly any pro-antitrust-action argument.

What can we do going forward?

This document is already long enough, so we'll defer a detailed discussion of policy specifics for another time, but in terms of high-level actions, one thing that seems like it would be helpful is to have tech people intimately involved in crafting remedies and regulation as well as during investigations2. From the directors' memos on the 2011-2012 FTC investigation that are publicly available, it would appear this was not done, because arguments from the BE memo that wouldn't pass the sniff test for a tech person appear to have been taken seriously. Another example is the one EU remedy that Cristina Caffarra noted was immediately worked around by Google, in a way that many people in tech would find to be a delightful "hack".

There's a long history of this kind of "hacking the system" being lauded in tech, going back to before anyone called it "tech", when it was just physics and electrical engineering. To pick a more recent example, one of the reasons Sam Altman became President of Y Combinator, which eventually led to him becoming CEO of OpenAI, was that Paul Graham admired his ability to hack systems; in his 2010 essay on founders, under the section titled "Naughtiness", Paul wrote:

Though the most successful founders are usually good people, they tend to have a piratical gleam in their eye. They're not Goody Two-Shoes type good. Morally, they care about getting the big questions right, but not about observing proprieties. That's why I'd use the word naughty rather than evil. They delight in breaking rules, but not rules that matter. This quality may be redundant though; it may be implied by imagination.

Sam Altman of Loopt is one of the most successful alumni, so we asked him what question we could put on the Y Combinator application that would help us discover more people like him. He said to ask about a time when they'd hacked something to their advantage—hacked in the sense of beating the system, not breaking into computers. It has become one of the questions we pay most attention to when judging applications.

Or, to pick one of countless examples from Google: in order to reduce travel costs at Google, Google engineers implemented a system where they computed some kind of baseline "expected cost" for flights, and then gave people a credit for taking flights that came in under the baseline cost, which could be used to upgrade future flights and travel accommodations. This was a nicer experience for employees compared to what stodgier companies were doing in terms of expense limits, and Google engineers were proud of creating a system that made things better for everyone, which was one kind of hacking the system. The next level of hacking the system was when some employees optimized their flights and even set up trips to locations that were highly optimizable (many engineers would consider this a fun challenge, a variant of the classic dynamic programming problems that are given in interviews, etc.), allowing them to upgrade to first class flights and the nicest hotels.
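
As a sketch of what this kind of system might look like, here's a minimal illustration; the baseline numbers, routes, and credit rule are all hypothetical, since the actual internal system isn't public:

```python
# Hypothetical sketch of the travel-credit scheme described above: compute a
# baseline "expected" fare for a route and credit employees for booking under it.
# The routes and fares are made up; this is not Google's actual system.

BASELINE_FARE = {            # route -> expected fare in dollars (hypothetical)
    ("SEA", "SFO"): 320,
    ("SEA", "JFK"): 650,
    ("SFO", "AUS"): 410,
}

def trip_credit(route, actual_fare):
    """Credit earned (if any) for coming in under the baseline for this route."""
    baseline = BASELINE_FARE.get(route)
    if baseline is None:
        return 0
    return max(0, baseline - actual_fare)

# The "hack the system" move: among trips you could plausibly take, prefer the ones
# where cheap fares sit far below the baseline, then spend the accumulated credit
# on upgrades to future flights and hotels.
candidate_trips = [(("SEA", "SFO"), 180), (("SEA", "JFK"), 640), (("SFO", "AUS"), 150)]
best = max(candidate_trips, key=lambda trip: trip_credit(*trip))
print(best, trip_credit(*best))   # (('SFO', 'AUS'), 150) 260
```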

When I've talked about this with people in management in traditional industries, they've frequently been horrified and can't believe that these employees weren't censured or even fired for cheating the system. But when I was at Google, people generally found this to be admirable, as it exemplified the hacker spirit.

We can see, from the history of antitrust in tech going back at least two decades, that courts, regulators, and legislators have not been prepared for the vigor, speed, and delight with which tech companies hack the system.

And there's precedent for bringing in tech folks to work on the other side of the table. For example, this was done in the big Microsoft antitrust case. But there are incentive issues that make this difficult at every level that stem from, among other things, the sheer amount of money that tech companies are willing to pay out. If I think about tech folks I know who are very good at the kind of hacking the system described here, the ones who want to be employed at big companies frequently make seven figures (or more) annually, a sum not likely to be rivaled by an individual consulting contract with the DoJ or FTC. If we look at the example of Microsoft again, the tech group that was involved was managed by Ron Schnell, who was taking a break from working after his third exit, but people like that are relatively few and far between. Of course there are people who don't want to work at big companies for a variety of reasons, often moral reasons or a dislike of big company corporate politics, but most people I know who fit that description haven't spent enough time at big companies to really understand the mechanics of how big companies operate and are the wrong people for this job even if they're great engineers and great hackers.

At an antitrust conference a while back, a speaker noted that the mixing and collaboration between the legal and economics communities was a great boon for antitrust work. Notably absent from the speech as well as the conference were practitioners from industry. The conference had the feel of an academic conference, so you might see CS academics at the conference some day, but even if that were to happen, many of the policy-level discussions are ones that are outside the area of interest of CS academics. For example, one of the arguments from the BE memo that we noted as implausible was the way they used MAU to basically argue that switching costs were low. That's something outside the area of research of almost every CS academic, so even if the conference were to expand and bring in folks who work closely with tech, the natural attendees would still not be the right people to weigh in on the topic when it comes to the plausibility of nitty gritty details.

Besides the aforementioned impact on policy discussions, the lack of collaboration with tech folks also meant that, when people spoke about the motives of actors, they would often make assumptions that were unwarranted. On one specific example of what someone might call a hack of the system, the speaker described an exec's reaction (high-fives, etc.), and inferred a contempt for lawmakers and the law that was not in evidence. It's possible the exec in question does, in fact, have a contempt and disdain for lawmakers and the law, but that celebration is exactly what you might've seen after someone at Google figured out how to get upgraded to first class "for free" on almost all their flights by hacking the system at Google, which wouldn't indicate contempt or disdain at all.

Coming back to the incentive problem, it goes beyond getting people who understand tech on the other side of the table in antitrust discussions. If you ask Capitol Hill staffers who were around at the time, the general belief is that the primary factor that scuttled the FTC investigation was Google's lobbying, and of course Google and other large tech companies spend more on lobbying than entities that are interested in increased antitrust scrutiny.

And in the civil service, if we look at the lead of the BC investigation and the first author on the BC memo, they're now Director and Associate General Counsel of Competition and Regulatory Affairs at Facebook. I don't know them, so I can't speak to their motivations, but if I were offered as much money as I expect they make to work on antitrust and other regulatory issues at Facebook, I'd probably take the offer. Even putting aside the pay, if I was a strong believer in the goals of increased antitrust enforcement, that would still be a very compelling offer. Working for the FTC, maybe you lead another investigation where you write a memo that's much stronger than the opposition memo, which doesn't matter when a big tech company pours more lobbying money into D.C. and the investigation is closed. Or maybe your investigation leads to an outcome like the EU investigation that led to a "choice screen" that was too little and far too late. Or maybe it leads to something like the Android Play Store untying case where, seven years after the investigation was started, an enterprising Google employee figures out a "hack" that makes the consent decree useless in about five minutes. At least inside Facebook, you can nudge the company towards what you think is right and have some impact on how Facebook treats consumers and competitors.

Looking at it from the standpoint of people in tech (as opposed to people working in antitrust), in my extended social circles, it's common to hear people say "I'd never work at company X for moral reasons". That's a fine position to take, but almost everyone I know who does this ends up working at a much smaller company that has almost no impact on the world. If you want to take a moral stand, you're more likely to make a difference by working from the inside or by finding a smaller direct competitor and helping it become more successful.

Thanks to Laurence Tratt, Yossi Kreinin, Justin Hong, [email protected], Sophia Wisdom, @[email protected], @[email protected], and Misha Yagudin for comments/corrections/discussion

Appendix: non-statements

This is analogous to the "non-goals" section of a technical design doc, but weaker, in that a non-goal in a design doc is often a positive statement that implies something that couldn't be inferred from reading the doc, whereas these non-statements don't add any information.

  • Antitrust action against Google should have been pursued in 2012
    • Not that anyone should care what my opinion is, but if you'd asked me at the time if antitrust action should be pursued, I would've said "probably not". The case for antitrust action seems stronger now and the case against seems weaker, but you could still mount a fairly strong argument against antitrust action today.
    • Even if you believe that, ceteris paribus, antitrust action would've been good for consumers and the "very good case" outcome in "what might've happened" would occur if antitrust action were pursued, it's still not obvious that Google and other tech companies are the right target as opposed to (just for example) Visa and Mastercard's dominance of payments, hospital mergers leading to increased concentration that's had negative impacts on both consumers and workers, Ticketmaster's dominance, etc. Or perhaps you think the government should focus on areas where regulation specifically protects firms, such as in shipping (which is exempt from the Sherman Act) or car dealerships (which have special protections in the law in many U.S. states that prevent direct sales and compel car companies to abide by their demands in certain ways), etc.
  • Weaker or stronger antitrust measures should be taken today
    • I don't think I've spent enough time reading up on the legal, political, historical, and philosophical background to have an opinion on what should be done, but I know enough about tech to point out a few errors that I've seen and to call out common themes in these errors.

BC Staff Memo

By "Barbara R. Blank, Gustav P. Chiarello, Melissa Westman-Cherry, Matthew Accornero, Jennifer Nagle, Anticompetitive Practices Division; James Rhilinger, Healthcare Division; James Frost, Office of Policy and Coordination; Priya B. Viswanath, Office of the Director; Stuart Hirschfeld, Danica Noble, Northwest Region; Thomas Dahdouh, Western Region-San Francisco, Attorneys; Daniel Gross, Robert Hilliard, Catherine McNally, Cristobal Ramon, Sarah Sajewski, Brian Stone, Honors Paralegals; Stephanie Langley, Investigator"

Dated August 8, 2012

Executive Summary

  • Google is dominant search engine and seller of search ads
  • This memo addresses 4 of 5 areas with anticompetitive conduct; mobile is in a supplemental memo
  • Google has monopoly power in the U.S. in Horizontal Search; Search Advertising; and Syndicated Search and Search Advertising
  • On the question of whether Google has unlawfully preferenced its own content while demoting rivals, we do not recommend the FTC proceed; it's a close call, case law is not favorable on anticompetitive product design claims, Google's efficiency justifications are strong, and there's some benefit to users
  • On whether Google has unlawfully scraped content from vertical rivals to improve their own vertical products, recommending condemning as a conditional refusal to deal under Section 2
    • Prior voluntary dealing was mutually beneficial
    • Threats to remove rival content from general search designed to coerce rivals into allowing Google to use their content for Google's vertical product
    • Natural and probable effect is to diminish incentives of vertical website R&D
  • On anticompetitive contractual restrictions on automated cross-management of ad campaigns, restrictions should be condemned under Section 2
    • They limit ability of advertisers to make use of their own data, reducing innovation and increasing transaction costs for advertisers and third-party businesses
    • Also degrade the quality of Google's rivals in search and search advertising
    • Google's efficiency justifications appear to be pretextual
  • On anticompetitive exclusionary agreements with websites for syndicated search and search ads, Google should be condemned under Section 2
    • Only modest anticompetitive effects on publishers, but deny scale to competitors, competitively significant to main rival (Bing) as well as significant barrier to entry in longer term
    • Google's efficiency justifications are, on balance, non-persuasive
  • Possible remedies
    • Scraping
      • Could be required to provide an opt-out for snippets (reviews, ratings) from Google's vertical properties while retaining snippets in web search and/or Universal Search on main search results page
      • Could be required to limit use of content indexed from web search results
    • Campaign management restrictions
      • Could be required to remove problematic contractual restrictions from license agreements
    • Exclusionary syndication agreements
      • Could be enjoined from entering into exclusive search agreements with search syndication partners and required to loosen restrictions surrounding syndication partners' use of rival search ads
  • There are a number of risks to the case, not named in the summary except that Google can argue that Microsoft's most efficient distribution channel is bing.com and that any scale MS might gain would be immaterial to Bing's competitive position
  • Staff concludes Google's conduct has resulted and will result in real harm to consumers and to innovation in online search and ads.

I. HISTORY OF THE INVESTIGATION AND RELATED PROCEEDINGS

A. FTC INVESTIGATION

  • Compulsory process approved on June 03 2011
  • Received over 2M docs (9.5M pages) "and have reviewed many thousands of those documents"
  • Reviewed documents produced to DoJ in the Google-Yahoo (2008) and ITA (2010) investigations and documents produced in response to European Commission and U.S. State investigations
  • Interviewed dozens of parties including vertical competitors in travel, local, finance, and retail; U.S. advertisers and ad agencies; Google U.S. syndication and distribution partners; mobile device manufacturers and wireless carriers
  • 17 investigational hearings of Google execs & employees

B. EUROPEAN COMMISSION INVESTIGATION

  • Parallel investigation since November 2010
  • May 21, 2012: Commissioner Joaquin Almunia issued letter signaling EC's possible intent to issue Statement of Objections for abuse of dominance in violation of Article 102 of EC Treaty
    • Concerns
      • "favourable treatment of its own vertical search services as compared to those of its competitors in its natural search results"
      • "practice of copying third party content" to supplement own vertical content
      • "exclusivity agreements with publishers for the provision of search advertising intermediation services"
      • "restrictions with regard to the portability and cross-platform management of online advertising campaigns"
    • offered opportunity to resolve concerns prior to issuance of SO by producing description of solutions
    • Google denied infringement of EU law, but proposed several commitments to address stated concerns
  • FTC staff coordinated with EC staff

C. MULTI-STATE INVESTIGATION

  • Texas investigating since June 2010, leader of multi-state working group
  • FTC working closely with states

D. PRIVATE LITIGATION

  • Several private lawsuits related to issues in our investigation; all dismissed
  • Two categories, manipulation of search rankings and increases in minimum prices for AdWords search ads
  • In Kinderstart.com LLC v. Google, Inc. and SearchKing, Inc. v. Google Tech., Inc., plaintiffs alleged that Google unfairly demoted their results
    • SearchKing court ruled that Google's rankings are constitutionally protected opinion; even malicious manipulation of rankings would not expose Google to tort liability
    • Kinderstart court rejected Google search being an essential facility for vertical websites
  • In the AdWords cases, plaintiffs argue that Google increased minimum bids for keywords they'd purchased, making those keywords effectively unavailable, depriving plaintiff websites of traffic
      • TradeComet.com, LLC v. Google, Inc. dismissed for improper venue and Google, Inc. v. myTriggers.com, Inc. dismissed for failing to describe harm to competition as a whole
      • both dismissed with little discussion of merits
    • Person v. Google, Inc.: Judge Fogel of the Northern District of California criticized plaintiff's market definition, finding no basis for distinguishing "search advertising market" from larger market for internet advertising

II. STATEMENT OF FACTS

A. THE PARTIES

1. Google

  • Products include "horizontal" search engine and integrated "vertical" websites that focus on specific areas (product or shopping comparisons, maps, finance, books, video), search advertising via AdWords, search and search advertising syndication through AdSense, computer and software applications such as Google Toolbar, Gmail, Chrome, also have Android for mobile and applications for mobile devices and recently acquired Motorola Mobility
  • 32k people, $38B annual revenue

2. General search competitors

a. Microsoft
  • MSN search released in 1998, rebranded Bing in 2009. Filed complaints against Google in 2011 with FTC and EC
b. Yahoo
  • Partnership with Bing since 2010; Bing provides search results and parties jointly operate a search ad network

3. Major Vertical Competition

  • In general, these companies complain that Google's practice of preferencing its own vertical results has negatively impacted ability to compete for users and advertisers
  • Amazon
    • Product search directly competes with Google Product Search
  • eBay
    • product search competes with Google Product Search
  • NexTag
    • shopping comparison website that competes with Google Product Search
  • Foundem
    • UK product comparison website that competes with Google Product Search
    • Complaint to EC, among others, prompted EC to open its investigation into Google's web search practices
    • First vertical website to publicly accuse Google of preferencing its own vertical content over competitors on Google's search page
  • Expedia
    • competes against Google's fledgling Google Flight Search
  • TripAdvisor
    • TripAdvisor competes with Google Local (formerly Google Places)
    • has complained that Google has appropriated / scraped its user-generated reviews, placing them on Google's own local property
  • Yelp
    • has complained that Google has appropriated / scraped its user-generated reviews, placing them on Google's own local property
  • Facebook
    • Competes with Google's recently introduced Google Plus
    • has complained that Google's preferencing of Google Plus results over Facebook results is negatively impacting ability to compete for users

B. INDUSTRY BACKGROUND

1. General Search

  • [nice description of search engines for lay people omitted]

2. Online Advertising

  • Google's core business is ads; 96% of its nearly $38B in revenue was from ad sales
  • [lots of explanations of ad industry for lay people, mostly omitted]
  • Reasons advertisers have shifted business to web include the high degree of tracking possible and quantifiable, superior ROI
  • Search ads make up most of online ad spend, primarily because advertisers believe search ads provided best precision in IDing customers, measurability, and the highest ROI
  • Online advertising continues to evolve, with new offerings that aren't traditional display or search ads, such as contextual ads, re-targeted behavioral ads, and social media ads
    • these new ad products don't account for a significant portion of online ads today and, with the exception of social media ads, appear to have only limited potential for growth [Surely video is pretty big now, especially if you include "sponsorships" and not just ads inserted by the platform?]

3. Syndicated Search and Search Advertising

  • Search engines "syndicate" search and/or search ads
    • E.g., if you go to AOL or Ask.com, you can do a search that is powered by a search provider, like Google
  • Publisher gets to keep user on own platform, search provider gets search volume and can monetize traffic
    • End-user doesn't pay; publisher pays Google either on cost-per-user-query basis or by accepting search ads and splitting revenues from search ads run on publisher's site. The revenue share Google pays out to publishers is often called "traffic acquisition cost" (TAC)
  • Publishers can get search ads without offering search (AdSense) and vice versa

4. Mobile Search

  • Focus of search has been moving from desktop to "rapid emerging — and lucrative — frontier of mobile"
  • Android at forefront; has surpassed iPhone in U.S. market share
  • Mobile creates opportunities for location-based search ads; even more precise intent targeting than desktop search ads
  • Google and others have signed distribution agreements with device makers and wireless carriers, so user-purchased devices usually come pre-installed with search and other apps

C. THE SIGNIFICANCE OF SCALE IN INTERNET SEARCH

  • Scale (user queries and ad volume) important to competitive dynamics

1. Search Query Volume

  • Microsoft claims it needs higher query volume to improve Bing
    • Logs of queries can be used to improve tail queries
    • Suggestions, instant search, spelling correction
    • Trend identification, fresh news stories
  • Click data important for evaluating search quality (a toy sketch of this kind of click-based reranking appears after this list)
    • Udi Manber (former Google chief of search quality) testimony: "The ranking itself is affected by the click data. If we discover that, for a particular query, hypothetically, 80 percent of people click on Result No. 2 and only 10 percent click on Result No. 1, after a while we figure out, well, probably Result 2 is the one people want. So we'll switch it."
    • Testimony from Eric Schmidt and Sergey Brin confirms click data important and provides feedback on quality of search results
    • Scale / volume allows more experiments
      • Larry and Sergey's annual letter in 2005 notes importance of experiments, running multiple simultaneous experiments
      • More scale allows for more experiments as well as for experiments to complete more quickly
      • Susan Athey (Microsoft chief economist) says Microsoft search quality team is greatly hampered by insufficient search volume to run experiments
  • 2009 comment from Udi Manber: "The bottom line is this. If Microsoft had the same traffic we have their quality will improve *significantly*, and if we had the same traffic they have, ours will drop significantly. That's a fact"
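  [To make the Manber testimony above concrete, here's a toy sketch of click-share-based reranking. This is just an illustration of the idea as described in the testimony — the data, names, and threshold are all made up, and this is obviously not Google's ranking code.]

```python
from collections import defaultdict

# Hypothetical click log: (query, position_clicked) pairs.
click_log = [
    ("camera reviews", 2), ("camera reviews", 2), ("camera reviews", 1),
    ("camera reviews", 2), ("camera reviews", 2),
]

def click_share_by_position(log):
    """Fraction of clicks each result position receives, per query."""
    counts = defaultdict(lambda: defaultdict(int))
    for query, pos in log:
        counts[query][pos] += 1
    shares = {}
    for query, by_pos in counts.items():
        total = sum(by_pos.values())
        shares[query] = {pos: n / total for pos, n in by_pos.items()}
    return shares

def maybe_swap_top_results(ranked, shares, threshold=0.5):
    """If the second result gets the majority of clicks (as in Manber's
    80%-vs-10% example), promote it above the first result."""
    if shares.get(2, 0.0) > threshold and shares.get(2, 0.0) > shares.get(1, 0.0):
        ranked = ranked[:]
        ranked[0], ranked[1] = ranked[1], ranked[0]
    return ranked

shares = click_share_by_position(click_log)["camera reviews"]
print(maybe_swap_top_results(["result A", "result B", "result C"], shares))
# ['result B', 'result A', 'result C']
```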

2. Advertising Volume

  • Microsoft claims they need more ad volume to improve relevance and quality of ads
    • More ads means more choices over what ads to serve to users, better matched ads / higher conversion rates
    • Also means more queries
    • Also has similar feedback loop to search
  • Increased volume of advertisers increases competitiveness for ad properties, gives more revenue to search engine
    • Allows search engine to amortize costs, re-invest in R&D, provide better advertiser coverage, revenue through revenue-sharing agreements to syndication partners (website publishers). Greater revenue to partners attracts more publishers and more advertisers

3. Scale Curve

  • Google acknowledges the importance of scale (outside of the scope of this particular discussion)
  • Google documents replete with references to "virtuous cycle" among users, advertisers, and publishers
    • Testimony from Google execs confirms this
  • But Google argues scale no longer matters at Google's scale or Microsoft's scale, that additional scale at Microsoft's scale would not "significantly improve" Microsoft search quality
  • Susan Athey argues that relative scale, Bing being 1/5th the size of Google, matters, not absolute size
  • Microsoft claims that 5% to 10% increase in query volume would be "very meaningful", notes that gaining access to Yahoo queries and ad volume in 2010 was significant for search quality and monetization
    • Claim that Yahoo query data increased click through rate for "auto suggest" from 44% to 61% [the timeframe here is July 2010 to September 2011 — too bad they didn't provide an A/B test here, since this more than 1 year timeframe allows for many other changes to impact the suggest feature as well; did they ship a major change here without A/B testing it? That seems odd]
  • Microsoft also claims search quality improvements due to experiment volume enabled by extra query volume

D. GOOGLE'S SUSPECT CONDUCT

  • Five main areas of staff investigation of alleged anticompetitive conduct:

1. Google's Preferencing of Google Vertical Properties Within Its Search Engine Results Page ("SERP")

  • Allegation is that Google's conduct is anticompetitive because "it forecloses alternative search platforms that might operate to constrain Google's dominance in search and search advertising"
  • "Although it is a close call, we do not recommend that the Commission issue a complaint against Google for this conduct."
a. Overview of Changes to Google's SERP
  • Google makes changes to UI and algorithms, sometimes without user testing
  • sometimes with testing via a launch review process, typically including:
    • "the sandbox", internal testing by engineers
    • "SxS", side-by-side testing by external raters who compare existing results to proposed results
    • Testing on a small percent of live traffic (a generic sketch of how such a test might be evaluated appears after this list)
    • "launch report" for Launch Committee
  • Google claims to have run 8000 SxS tests and 2500 "live" click tests in 2010, with 500 changes launched
  • "Google's stated goal is to make its ranking algorithms better in order to provide the user with the best experience possible."
b. Google's Development and Introduction of Vertical Properties
  • Google vertical properties launched in stages, initially around 2001
  • Google News, Froogle (shopping), Image Search, and Groups
  • Google has separate indexes for each vertical
  • Around 2005, Google realized that vertical search engines, i.e., aggregators in some categories, were a "threat" to dominance in web search, and feared that these could cause some searches to shift away from Google
  • From GOOG-Texas-1325832-33 (2010): "Vertical search is of tremendous strategic importance to Google. Otherwise the risk is that Google is the go-to place for finding information only in the cases where there is sufficiently low monetization potential that no niche vertical search competitor has filled the space with a better alternative."
  • 2008 presentation titled "Online Advertising Challenges: Rise of the Aggregators":
    • "Issue 1. Consumers migrating to MoneySupermarket. Driver: General search engines not solving consumer queries as well as specialized vertical search Consequence: Increasing proportion of visitors going directly to MoneySupermarket. Google Implication: Loss of query volumes."
    • Issue 2: "MoneySupermarket has better advertiser proposition. Driver: MoneySupermarket offers cheaper, lower risk (CPA-based) leads to advertisers. Google Implication: Advertiser pull: Direct advertisers switch spend to MoneySupermarket/other channels"
  • In response to this threat, Google invested in existing verticals (shopping, local) and invested in new verticals (mortgages, offers, hotel search, flight search)
c. The Evolution of Display of Google's Vertical Properties on the SERP
  • Google initially had tabs that let users search within verticals
  • In 2003, Marissa Mayer started developing "Universal Search" (launched in 2007), to put this content directly on Google's SERP. Mayer wrote:
    • "Universal Search is an effort to redesign the user interface of the main Google.com results page SO that Google deliver[s] the most relevant information to the user on Google.com no matter what corpus that information comes from. This design is motivated by the fact that very few users are motivated to click on our tabs, SO they often miss relevant results in the other corpora."
  • Prior to Universal Search launch, Google used "OneBoxes", which put vertical content at the top of Google's SERP
  • After launching Universal Search, vertical results could go anywhere
d. Google's Preferential Display of Google Vertical Properties on the SERP
  • Google used control over Google SERP both to improve UX for searches and to maximize benefit to its own vertical properties
  • Google wanted to maximize percentage of queries that had Universal Search results and drive traffic to Google properties
    • In 2008, goal to "[i]ncrease google.com product search inclusion to the level of google.com searches with 'product intent', while preserving clickthrough rate." (GOOG-Texas-0227159-66)
    • Q1 2008, goal of triggering Product Universal on 6% of English searches
    • Q2 2008, goal changed to top OneBox coverage of 50% with 10% CTR and "[i]ncrease coverage on head queries. For example, we should be triggering on at least 5 of the top 10 most popular queries on amazon.com at any given time, rather than only one."
    • "Larry thought product should get more exposure", GOOG-ITA-04-0004120-46 (2009)
    • Mandate from exec meeting to push product-related queries as quickly as possible
    • Launch Report for one algorithm change: 'To increase triggering on head queries, Google also implemented a change to trigger the Product Universal on google.com queries if they appeared often in the product vertical. "Using Exact Corpusboost to Trigger Product Onebox" compares queries on www.google.com with queries on Google Shopping, triggers the Product OneBox if the same query is often searched in Google Shopping, and automatically places the universal in position 4, regardless of the quality of the universal results or user "bias" for top placement of the box.' (a toy sketch of this triggering logic appears after this list)
    • "presentation stating that Google could take a number of steps to be "#1" in verticals, including "[e]ither [getting] high traffic from google.com, or [developing] a separate strong brand," and asking: "How do we link from Search to ensure strong traffic without harming user experience or AdWords proposition for advertisers?")", GOOGFOX-000082469 (2009)
    • Jon Hanke, head of Google Local, to Marissa Mayer: "long term, I think we need to commit to a more aggressive path w/ google where we can show non-webpage results on google outside of the universal 'box' most of us on geo think that we won't win unless we can inject a lot more of local directly into google results."
      • "Google's key strengths are: Google.com real estate for the ~70MM of product queries/day in US/UK/DE alone"
      • "I think the mandate has to come down that we want to win [in local] and we are willing to take some hits [i.e., trigger incorrectly sometimes]. I think a philosophical decision needs to get made that results that are not web search results and that displace web pages are "OK" on google.com and nothing to be ashamed of. That would open the door to place page or local entities as ranked results outside of some 'local universal' container. Arguably for many queriesall of the top 10 results should be local entities from our index with refinement options. The current mentality is that the google results page needs to be primarily about web pages, possibly with some other annotations if they are really, really good. That's the big weakness that bing is shooting at w/ the 'decision engine' pitch - not a sea of pointers to possible answers, but real answers right on the page. "
    • In spring 2008, Google estimated top placement of Product Universal would lead to loss of $154M/yr on product queries. Ads team requested reduction in triggering frequency and Product Universal team objected, "We face strong competition and must move quickly. Turning down onebox would hamper progress as follows - Ranking: Losing click data harms ranking; [t]riggering Losing CTR and google.com query distribution data triggering accuracy; [c]omprehensiveness: Losing traffic harms merchant growth and therefore comprehensiveness; [m]erchant cooperation: Losing traffic reduces effort merchants put into offer data, tax, & shipping; PR: Turning off onebox reduces Google's credibility in commerce; [u]ser awareness: Losing shopping-related UI on google.com reduces awareness of Google's shopping features."
  • "Google embellished its Universal Search results with photos and other eye-catching interfaces, recognizing that these design choices would help steer users to Google's vertical properties"
    • "Third party studies show the substantial difference in traffic with prominent, graphical user interfaces"; "These 'rich' user interfaces are not available to competing vertical websites"
  • Google placed its own results near or at top of SERP, pushing other results down, resulting in reduced CTR for "natural search results"
    • Google did this without comparing quality of Google's vertical content to competitors or evaluating whether users prefer Google's vertical content to displaced results
  • Click-through data from eBay (Jan-Apr 2012) indicates that Google Product Search appeared in a top 5 position 64% of the time when displayed, and that Google Product Search had lower CTR than web search in the same position, regardless of position [below is rank: natural result CTR / Google Shopping CTR / eBay CTR]
    • 1: 38% / 21% / 31%
    • 2: 21% / 14% / 20%
    • 3: 16% / 12% / 18%
    • 4: 13% / 9% / 11%
    • 5: 10% / 8% / 10%
    • 6: 8% / 6% / 9%
    • 7: 7% / 5% / 9%
    • 8: 6% / 2% / 7%
    • 9: 6% / 3% / 6%
    • 10: 5% / 2% / 6%
    • 11: 5% / 2% / 5%
    • 12: 3% / 1% / 4%
  • Although Google tracks CTR and relies on CTR to improve web results, it hasn't relied on CTR to rank Universal Search results against other web search results
  • Marissa Mayer said Google didn't use CTR "because it would take too long to move up on the SERP on the basis of user click-through rate"
  • Instead, "Google used occurrence of competing vertical websites to automatically boost the ranking of its own vertical properties above that of competitors"
    • If comparison shopping site was relevant, Google would insert Google Product Search above any rival
    • If local search like Yelp or CitySearch was relevant, Google automatically returned Google Local at top of SERP
  • Google launched commission-based verticals, mortgage, flights, offers, in ad space reserved exclusively for its own properties
    • In 2012, Google announced that Google Product Search would transition to paid and Google would stop including product listings for merchants who don't pay to be listed
    • Google's dedicated ads don't compete with other ads via AdWords and automatically get the most effective ad spots, usually above natural search results
    • As with Google's Universal results, its own ads have a rich user interface not available to competitors which results in higher CTR
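  [Here's a toy sketch of the "Exact Corpusboost" triggering logic as the launch report quoted above describes it: trigger the Product OneBox when the query is popular on the shopping vertical and slot it into a fixed position regardless of quality. The threshold and data are invented; this is an illustration of the described behavior, not Google's code.]

```python
# Hypothetical query counts; the threshold is invented for illustration.
shopping_query_counts = {"digital camera": 12_000, "weather seattle": 3}
SHOPPING_TRIGGER_THRESHOLD = 1_000

def insert_product_onebox(query, web_results, onebox, position=4):
    """Trigger the Product OneBox when the query is popular on the shopping
    vertical, and place it at a fixed position regardless of result quality
    (per the launch report quoted above)."""
    if shopping_query_counts.get(query, 0) >= SHOPPING_TRIGGER_THRESHOLD:
        return web_results[: position - 1] + [onebox] + web_results[position - 1 :]
    return web_results

web = [f"web result {i}" for i in range(1, 11)]
print(insert_product_onebox("digital camera", web, "PRODUCT ONEBOX")[:5])
# ['web result 1', 'web result 2', 'web result 3', 'PRODUCT ONEBOX', 'web result 4']
```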
e. Google's Demotion of Competing Vertical Websites
  • "While Google embarked on a multi-year strategy of developing and showcasing its own vertical properties, Google simultaneously adopted a strategy of demoting, or refusing to display, links to certain vertical websites in highly commercial categories"
  • "Google has identified comparison shopping websites as undesirable to users, and has developed several algorithms to demote these websites on its SERP. Through an algorithm launched in 2007, Google demoted all comparison shopping websites beyond the first two on its SERP"
  • "Google's own vertical properties (inserted into Google's SERP via Universal Search) have not been subject to the same demotion algorithms, even though they might otherwise meet the criteria for demotion."
    • Google has acknowledged that its own vertical sites meet the exact criteria for demotion
    • Additionally, Google's web spam team originally refused to add Froogle to search results because "[o]ur algorithms specifically look for pages like these to either demote or remove from the index."
    • Google's web spam team also refused to add Google's local property
f. Effects of Google's SERP Changes on Vertical Rivals
  • "Google's prominent placement and display of its Universal Search properties, combined with the demotion of certain vertical competitors in Google's natural search results, has resulted in significant loss of traffic to many competing vertical websites"
  • "Google's internal data confirms the impact, showing that Google anticipated significant traffic loss to certain categories of vertical websites when it implemented many of the algorithmic changes described above"
  • "While Google's changes to its SERP led to a significant decrease in traffic for the websites of many vertical competitors, Google's prominent showcasing of its vertical properties led to gains in user share for its own properties"
  • "For example, Google's inclusion of Google Product Search as a Universal Search result took Google Product Search from a rank of seventh in page views in July 2007 to the number one rank by July 2008. Google product search leadership acknowledged that '[t]he majority of that growth has been driven through product search universal.'"
  • "Beyond the direct impact on traffic to Google and its rivals, Google's changes to its SERP have led to reduced investment and innovation in vertical search markets. For example, as a result of the rise of Google Product Search (and simultaneous fall of rival comparison shopping websites), NexTag has taken steps to reduce its investment in this area. Google's more recent launch of its flight search product has also caused NexTag to cease development of an 'innovative and competitive travel service.'"

2. Google's "Scraping" of Rivals' Vertical Content

  • "Staff has investigated whether Google has "scraped" - or appropriated - the content of rival vertical websites in order to improve its own vertical properties SO as to maintain, preserve, or enhance Google's monopoly power in the markets for search and search advertising. We recommend that the Commission issue a complaint against Google for this conduct."
  • In addition to developing its own vertical properties, Google scraped content from existing vertical websites (e.g., Yelp, TripAdvisor, Amazon) in order to improve its own vertical listings, "e.g., GOOG-Texas-1380771-73 (2009), at 71-72 (discussing importance of Google Places carrying better review content from Yelp)."
a. The "Local" Story
  • "Some local information providers, such as Yelp, TripAdvisor, and CitySearch, disapprove of the ways in which Google has made use of their content"
  • "Google recognized that review content, in particular, was "critical to winning in local search," but that Google had an 'unhealthy dependency' on Yelp for much of its review content. Google feared that its heavy reliance on Yelp content, along with Yelp's success in certain categories and geographies, could lead Yelp and other local information websites to siphon users' local queries away from Google"
    • "concern that Yelp could become competing local search platforms" (Goog-Texas-0975467-97)
  • Google Local execs tried to convince Google to acquire Yelp, but failed
  • Yelp, on finding that Google was going to use reviews on its own property, discontinued its feed and asked for Yelp content to be removed from Google Local
  • "after offering its own review site for more than two years, Google recognized that it had failed to develop a community of users - and thus, the critical mass of user reviews - that it needed to sustain its local product.", which led to failed attempt to buy Yelp
    • To address this problem, Google added Google Places results on SERP: "The listing for each business that came up as a search result linked the user directly to Google's Places page, with a label indicating that hundreds of reviews for the business were available on the Places page (but with no links to the actual sources of those reviews). On the Places Page itself, Google provided an entire paragraph of each copied review (although not the complete review), followed by a link to the source of the review, such as Yelp (which it crawled for reviews) and TripAdvisor (which was providing a feed)."
    • Yelp noticed this in July 2010, that Google was featuring Yelp's content without a license and protested to Google. TripAdvisor chose not to renew license with Google after finding same
    • Google implemented new policy that would ban properties from Google search if they didn't allow their content to be used in Google Places
      • "GOOG-Texas-1041511-12 (2010), at 12 ("remove blacklist of yelp [reviews] from Web-extracted Reviews once provider based UI live"); GOOG-Texas-1417391-403 (2010), at 394 ("stating that Google should wait to publish a blog post on the new UI until the change to "unblacklist Yelp" is "live")."
    • Along with this policy, launched new reviews product and seeded it with reviews from 3rd party websites without attribution
    • Yelp, CitySearch, and TripAdvisor all complained and were all told that they could only remove their content if they were fully removed from search results. "This was not technically necessary - it was just a policy decision by Google."
    • Yelp sent Google a C&D
    • Google claimed it was technically infeasible to remove Yelp content from Google Places without also banning Yelp from search results
      • Google later did this, making it clear that the claim that it was technically infeasible was false
      • Google still maintained that it would be technically infeasible to remove Yelp from Google Places without removing it from "local merge" interface on SERP. Staff believes this assertion is false as well because Google maintains numerous "blacklists" that prevent content from being shown in specific locations
      • Mayer later admitted during hearing that the infeasible claim was false and that Google feared consequences of allowing websites to opt out of Google Places while staying in "local merge"
      • "Yelp contends that Google's continued refusal to link to Yelp on Google's 'local merge' interface on the main SERP is simply retaliation for Yelp seeking removal from Google Places."
  • "Publicly, Google framed its changes to Google Local as a redesign to move toward the provision of more original content, and thereby, to remove all third-party content and review counts from Google Local, as well as from the prominent "local merge" Universal Search interface on the main SERP. But the more likely explanation is that, by July 2011,Google had already collected sufficient reviews by bootstrapping its review collection on the display of other websites' reviews. It no longer needed to display third-party reviews, particularly while under investigation for this precise conduct."
b. The "Shopping" Story
  • [full notes omitted; story is similar to above, but with Amazon; similar claims of impossibility of removing from some places and not others; Amazon wanted Google to stop using Amazon star ratings, which Google claimed was impossible without blacklisting Amazon from all of web search, etc.; there's also a parallel story about Froogle's failure and Google's actions after that]
c. Effects of Google's "Scraping" on Vertical Rivals
  • "Because Google scraped content from these vertical websites over an extended period of time, it is difficult to point to declines in traffic that are specifically attributable to Google's conduct. However, the natural and probable effect of Google's conduct is to diminish the incentives of companies like Yelp, TripAdvisor, CitySearch, and Amazon to invest in, and to develop, new and innovative content, as the companies cannot fully capture the benefits of their innovations"

3. Google's API Restrictions

  • "Staff has investigated whether Google's restrictions on the automated cross-management of advertising campaigns has unlawfully contributed to the maintenance, preservation, or enhancement of Google's monopoly power in the markets for search and search advertising. Microsoft alleges that these restrictions are anticompetitive because they prevent Google's competitors from achieving efficient scale in search and search advertising. We recommend that the Commission issue a complaint against Google for this conduct."
a. Overview of the AdWords Platform
  • To set up AdWords, advertisers prepare bids. Can have thousands or hundreds of thousands of keywords.
    • E.g., DirectTV might bid on "television", "TV", and "satellite" plus specific TV show names, such as "Friday Night Lights", as well as misspellings
    • Bids can be calibrated by time and location
    • Advertisers then prepare ads (called "creatives") and match with various groups of keywords
    • Advertisers get data from AdWords, can evaluate effectiveness and modify bids, add/drop keywords, modify creative
      • This is called "optimization" when done manually; expensive and time-intensive
  • Initially two ways to access the AdWords system: the AdWords Front End and AdWords Editor
    • Editor is a program. Allows advertisers to download campaign information from Google, make bulk changes offline, then upload changes back to AdWords
    • Advertisers would make so many changes that system's capacity would be exceeded, causing outages
  • In 2004, Google added AdWords API to address problems
  • [description of what an API is omitted]
b. The Restrictive Conditions
  • AdWords API terms and conditions non-negotiable, apply to all users
  • One restriction prevents advertisers from using a 3rd party tool, or having a 3rd party use a tool, to copy data from the AdWords API into an ad campaign on another search network
  • Another prevents advertisers from using a 3rd party tool, or having a 3rd party use a tool, to commingle AdWords campaign data with data from another search engine
  • The two conditions above will be referred to as "the restrictive conditions"
  • "These restrictions essentially prevent any third-party tool developer or advertising agency from creating a tool that provides a single user interface for multiple advertising campaigns. Such tools would facilitate cross-platform advertising."
  • "However, the restrictions do not apply to advertisers themselves, which means that very large advertisers, such as.Amazon and eBay, can develop - and have developed - their own multi-homing tools that simultaneously manage campaigns across platforms"
  • "The advertisers affected are those whose campaign volumes are large enough to benefit from using the AdWords API, but too small to justify devoting the necessary resources to develop in-house the software and expertise to manage multiple search network ad campaigns."
c. Effects of the Restrictive Conditions
i. Effects on Advertisers and Search Engine Marketers ("SEMs")
  • Prevents development of tools that would allow advertisers to manage ad campaigns on multiple search ad networks simultaneously
  • Google routinely audits API clients for compliance
  • Google has required SEMs to remove functionality, "e.g., GOOGEC-0180810-14 (2010) (Trada); GOOGEC-0180815-16 (2010) (MediaPlex); GOOGEC-0181055-58 (2010) (CoreMetrics); GOOGEC-0181083-87 (2010) (Keybroker); GOOGEC-0182218-330 (2008) (Marin Software). 251 Acquisio IR (Sep. 12, 2011); Efficient Frontier IR (Mar. 5, 2012)"
  • Other SEMs have stated they would develop this functionality without restrictions
  • "Google anticipated that the restrictive conditions would eliminate SEM incentives to innovate.", "GOOGKAMA-000004815 (2004), at 2."
  • "Many advertisers have said they would be interested in buying a tool that had multi-homing functionality. Such functionality would be attractive to advertisers because it would reduce the costs of managing multiple ad campaigns, giving advertisers access to additional advertising opportunities on multiple search advertising networks with minimal additional investment of time. The advertisers who would benefit from such a tool appear to be the medium-sized advertisers, whose advertising budgets are too small to justify hiring a full service agency, but large enough to justify paying for such a tool to help increase their advertising opportunities on multiple search networks."
ii. Effects on Competitors
  • Removing restrictions would increase ad spend on networks that compete with Google
  • Data on advertiser multi-homing show some effects of restrictive conditions. Nearly all the largest advertisers multi-home, but percentage declines as spend decreases
    • Advertisers would also multi-home with more intensity
      • Microsoft claims that multi-homing advertisers optimize their Google campaigns almost-daily, Microsoft campaigns less frequently, weekly or bi-weekly
  • Without incremental transaction costs, "all rational advertisers would multi-home"
  • Staff interviewed randomly selected small advertisers. Interviews "strongly supported" thesis that advertisers would multi-home if cross-platform optimization tool were available
    • Some advertisers don't advertise on Bing due to lack of tool, the ones that do do less optimization
d. Internal Google Discussions Regarding the Restrictions
  • Internal discussions support the above
  • PM wrote the following in 2007, endorsed by director of PM Richard Holden:
    • "If we offer cross-network SEM in [Europe], we will give a significant boost to our competitors. Most advertisers that I have talked to in [Europe] don't bother running campaigns on [Microsoft] or Yahoo because the additional overhead needed to manage these other networks outweighs the small amount of additional traffic. For this reason, [Microsoft] and Yahoo still have a fraction of the advertisers that we have in [Europe], and they still have lower average CPAs [cost per acquisition]"
    • "This last point is significant. The success of Google's AdWords auctions has served to raise the costs of advertising on Google. With more advertisers entering the AdWords auctions, the prices it takes to win those auctions have naturally risen. As a result, the costs per acquisition on Google have risen relative to the costs per acquisition on Bing and Yahoo!. Despite these higher costs, as this document notes, advertisers are not switching to Bing and Yahoo! because, for many of them, the transactional costs are too great."
  • In Dec 2008, Google team led by Richard Holden evaluated possibility of relaxing or removing restrictive conditions and consulted with Google chief economist Hal Varian. Some of Holden's observations:
    • Advertisers seek out SEMs and agencies for cross-network management technology and services;
    • The restrictive conditions make the market more inefficient;
    • Removing the restrictive conditions would "open up the market" and give Google the opportunity to compete with a best-in-class SEM tool with "a streamlined workflow";
    • Removing the restrictive conditions would allow SEMs to improve their tools as well;
    • While there is a risk of additional spend going to competing search networks, it is unlikely that Google would be seriously harmed because "advertisers are going where the users are," i.e., to Google
  • "internally, Google recognized that removing the restrictions would create a more efficient market, but acknowledged a concern that doing so might diminish Google's grip on advertisers."
  • "Nonetheless, following up on that meeting, Google began evaluating ways to improve the DART Search program. DART Search was a cross-network campaign management tool owned by DoubleClick, which Google acquired in 2008. Google engineers were looking at improving the DART Search product, but had to confront limitations imposed by the restrictive conditions. During his investigational hearing, Richard Holden steadfastly denied any linkage between the need to relax the restrictive conditions and the plans to improve DART Search. ²⁷⁴ However, a series of documents - documents authored by Holden - explicitly link the two ideas."
  • Dec 2008: Holden to SVP of ad products Susan Wojcicki and others, after a meeting.
    • Holden wrote: "[O]ne debate we are having is whether we should eliminate our API T&Cs requirement that AW [AdWords] features not be co-mingled with competitor network features in SEM cross-network tools like DART Search. We are advocating that we eliminate this requirement and that we build a much more streamlined and efficient DART Search offering and let SEM tool provider competitors do the same. There was some debate about this, but we concluded that it is better for customers and the industry as a whole to make things more efficient and we will maximize our opportunity by moving quickly and providing the most robust offering"
  • Feb 2009, Holden wrote exec summary for DART, suggested Google "alter the AdWords Ts&Cs to be less restrictive and produce the leading cross-network toolset that increases advertiser/agency efficiency" to "[r]educe friction in the search ads sales and management process and grow the industry faster"
  • Larry Page rejected this. Afterwards, Holden wrote "We've heard that and we will focus on building the product to be industry-leading and will evaluate it with him when it is done and then discuss co-mingling and enabling all to do it."
  • Sep 2009, API PM raised possibility of eliminating restrictive conditions to help DART. Comment from Holden:
    • "I think the core issue on which I'd like to get Susan's take is whether she sees a high risk of existing spend being channeled to MS/Yahoo! due to a more lenient official policy on campaign cloning. Then, weigh that risk against the benefits: enabling DART Search to compete better against non-compliant SEM tools, more industry goodwill, easier compliance enforcement. Does that seem like the right high level message?"
  • "The documents make clear that Google was weighing the efficiency of relaxing the restrictions against the potential cost to Google in market power"
  • "At a January 2010 meeting, Larry Page decided against removing or relaxing the restrictive conditions. However, there is no record of the rationale for that decision or what weight was given to the concern that relaxing the restrictive conditions might result in spend being channeled to Google's competitors. Larry Page has not testified. Holden testified that he did not recall the discussion. The participants at the meeting did not take notes "for obvious reasons." Nonetheless, the documents paint a clear picture: Google rejected relaxing the API restrictions, and at least part of the reason for this was fear of diverting advertising spend to Microsoft."

4. Google's Exclusive and Restrictive Syndication Agreements

  • "Staff has investigated whether Google has entered into exclusive or highly restrictive agreements with website publishers that have served to maintain, preserve, or enhance Google's monopoly power in the markets for search, search advertising, or search and search advertising syndication (or "search intermediation"). We recommend that the Commission issue a complaint against Google for this conduct."
a. Publishers and Market Structure
  • Buyers of search and search ad syndication are website publishers
  • Largest sites account for vast majority of syndicated search traffic and volume
  • Biggest customers are e-commerce retailers (e.g., Amazon and eBay), traditional retailers with websites (e.g., Wal-Mart, Target, Best Buy), and ISPs which operate their own portals
  • Below this group, companies with significant query volume, including vertical e-commerce sites such as Kayak, smaller retailers and ISPs such as EarthLink; all of these are < 1% of Google's total AdSense query volume
  • Below, publisher size rapidly drops off to < 0.1% of Google's query volume
  • Payment a publisher receives is a function of (a worked example appears after this list):
    • volume of clicks on syndicated ad
    • "CPC", or cost-per-click advertiser willing to pay for each click
    • revenue sharing percentage
  • rate of user clicks and CPC aggregated to form "monetization rate"
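  [A quick worked example of the payment function above, with invented numbers; the 74% revenue share is in the ballpark of the TAC figures cited later in these notes.]

```python
def publisher_payment(queries, ad_ctr, avg_cpc_usd, revenue_share):
    """Publisher revenue = clicks on syndicated ads * CPC * revenue share.
    Click rate and CPC together are what the memo calls the "monetization rate"."""
    clicks = queries * ad_ctr
    gross_ad_revenue = clicks * avg_cpc_usd
    return gross_ad_revenue * revenue_share

# Invented numbers: 10M queries/month, 3% ad CTR, $0.60 average CPC, 74% share.
print(f"${publisher_payment(10_000_000, 0.03, 0.60, 0.74):,.0f}/month")  # $133,200/month
```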
b. Development of the Market for Search Syndication
  • First AdSense for Search (AFS) agreements with AOL and EarthLink in 2002
    • Goal then was to grow nascent industry of syndicated search ads
    • At the time, Google was bidding against incumbent Overture (later acquired by Yahoo) for exclusive agreements with syndication partners
  • Google's early deals favored publishers
  • To establish a presence, Google offered up-front financial guarantees to publishers

c. Specifics of Google's Syndication Agreements

  • "Today, the typical AdSense agreement contains terms and conditions that describe how and when Google will deliver search, search advertising, and other (contextual or domain related) advertising services."
  • Two main categories are AFS (search) and AFC (content). Staff investigation focused on AFS
  • For AFS, two types of agreements. GSAs (Google Service Agreements) negotiated with large partners and standard online contracts, which are non-negotiable and non-exclusive
  • Bulk of AFS partners are on standard online agreements, but those are a small fraction of revenue
  • Bulk of revenue comes from GSAs with Google's 10 largest partners (almost 80% of query volume in 2011). All GSAs have some form of exclusivity or "preferred placement" for Google
  • "Google's exclusive AFS agreements effectively prohibit the use of non-Google search and search advertising within the sites and pages designated in the agreement. Some exclusive agreements cover all properties held by a publisher globally; other agreements provide for a property-by-property (or market-by-market) assignment"
  • By 2008, Google began to migrate away from exclusivity to "preferred placement". Publishers must display a minimum of 3 Google ads or the number of ads shown from any competitor (whichever is greater), in an unbroken block, with "preferred placement" (in the most prominent position on publisher's website)
  • Google had preferred placement restrictions in GSAs and standard online agreement. Google maintains it was not aware of this provision in standard online agreement until investigational hearing of Google VP for search services, Joan Braddi, where staff questioned Braddi
    • See Letter from Scott Sher, Wilson Sonsini, to Barbara Blank (May 25, 2012) (explaining that, as of the date of the letter, Google was removing the preferred placement clause from the Online Terms and Conditions, and offering no further explanation of this decision)
d. Effects of Exclusivity and Preferred Placement
  • Staff interviewed large and small customers for search and search advertising syndication. Key findings:
i. Common Publisher Responses
  • Universal agreement that Bing's search and search advertising markedly inferior, not competitive across-the-board
    • Amazon reports that Bing monetizes at half the rate of Google
    • business.com told staff that Google would have to cut revenue share from 64.5% to 30% and Microsoft would have to provide 90% share because Microsoft's platform has such low monetization
  • Customers "generally confirmed" Microsoft's claim that Bing's search syndication is inferior in part because Microsoft's network is smaller than Google's
    • With a larger ad base, Google more likely to have relevant, high-quality, ad for any given query, which improves monetization rate
  • A small publisher said, essentially, the only publishers exclusively using Bing are ones who've been banned from Google's service
    • We know from other interviews this is an exaggeration, but it captures the general tenor of comments about Microsoft
  • Publishers reported Microsoft not aggressively trying to win their business
    • Microsoft exec acknowledged that Bing needs a larger portfolio of advertisers, has been focused there over winning new syndication business
  • Common theme from many publishers is that search is a relatively minor part of their business and not a strategic focus. For example, Wal-Mart operates website as extension to retail and Best Buy's main goal of website is to provide presale info
  • Most publishers hadn't seriously considered Bing due to poor monetization
  • Amazon, which does use Bing and Google ads, uses a single syndication provider on a page to avoid showing the user the same ad multiple times on the same page; mixing and matching arrangement generally considered difficult by publishers
  • Starting in 2008, Google systematically tried to lower revenue share for AdSense partners
    • E.g., "Our general philosophy with renewals has been to reduce TAC across the board", "2009 Traffic Acquisition Cost (TAC) was down 3 percentage points from 2008 attributable to the application of standardized revenue share guidelines for renewals and new partnerships...", etc.
  • Google reduced payments (TAC) to AFS partners from 80.4% to 74% between Q1 2009 and Q1 2010
  • No publisher viewed reduction as large enough to justify shifting to Bing or serving more display ads instead of search ads
ii. Publishers' Views of Exclusivity Provisions
  • Some large publishers reported exclusive contracts and some didn't
  • Most publishers with exclusivity provisions didn't complain about them
  • A small number of technically sophisticated publishers were deeply concerned by exclusivity
    • These customers viewed search and search advertising as a significant part of business, have the sophistication to integrate multiple suppliers into on-line properties
    • eBay: largest search and search ads partner, 27% of U.S. syndicated search queries in 2011
      • Contract requires preferential treatment for AdSense ads, which eBay characterizes as equivalent to exclusivity
      • eBay wanted this removed in last negotiation, but assented to not removing it in return for not having revenue share cut while most other publishers had revenue share cut
      • eBay's testing indicates that Bing is competitive in some sectors, e.g., tech ads; they believe they could make more money with multiple search providers
    • NexTag: In 2015, Google's 15th largest AFS customer
      • Had exclusivity, was able to remove it in 2010, but NexTag considers restrictions "essentially the same thing as exclusivity"; "NexTag reports that moving away from explicit exclusivity even to this kind of de facto exclusivity required substantial, difficult negotiations with Google"
      • Has had discussions with Yahoo and Bing about using their products "on a filler basis", but unable to do so due to Google contract restrictions
    • business.com: B2B lead generation / vertical site; much smaller than above. Barely in top 60 of AdSense query volume
      • Exclusive agreement with Google
      • Would test Bing and Yahoo without exclusive agreement
      • Agreement also restricts how business.com can design pages
      • Loosening exclusivity would improve business.com revenue and allow for new features that make the site more accessible and user-friendly
    • Amazon: 2nd largest AFS customer after eBay; $175M from search syndication, $169M from Google AdSense
      • Amazon uses other providers despite their poor monetization due to concerns about having a single supplier; because Amazon operates on thin margins, $175M is a material source of profit
      • Amazon concerned it will be forced to sign an exclusive agreement in next negotiation
      • During last negotiation, Amazon wanted 5-year deal, Google would only give 1-year extension unless Amazon agreed to send Google 90% of search queries (Amazon refused to agree to this formally, although they do this)
    • IAC: umbrella company operating ask.com, Newsweek, CityGrid, Urbanspoon, and other websites
      • Agreement is exclusive on a per-property basis
      • IAC concerned about exclusivity. CityGrid wanted mix-and-match options, but couldn't compete with Google's syndication network, forced to opt into IAC's exclusive agreement; CityGrid wants to use other networks (including its own), but can't under agreement with Google
      • IAC concerned about lack of competition in search and search advertising syndication
      • Executive who expressed above concerns left; new executive didn't see a possibility of splitting or moving traffic
      • "The departure of the key executive with the closest knowledge of the issues and the most detailed concerns suggests we may have significant issues obtaining clear, unambiguous testimony from IAC that reflects their earlier expressed concerns."
iii.Effects on Competitors
  • Microsoft asserts even 5%-10% increase in query volume "very meaningful" and Google's exclusive and restrictive agreements deny Microsoft incremental scale to be more efficient competitor
  • Specialty search ad platforms also impacted; IAC sought to build platform for local search advertising, but Google's exclusivity provisions "make it less likely that small local competitors like IAC's nascent offering can viably emerge."

III. LEGAL ANALYSIS

  • "A monopolization claim under Section 2 of the Sherman Act, 15 U.S.C. § 2, has two elements: (i) the 'possession of monopoly power in the relevant market' and (ii) the 'willful acquisition or maintenance of that power as distinguished from growth or development as a consequence of a superior product, business acumen, or historic accident.'"
  • "An attempted monopolization claim requires a showing that (i) 'the defendant has engaged in predatory or anticompetitive conduct' with (ii) 'a specific intent to monopolize' and (iii) a dangerous probability of achieving or maintaining monopoly power."

A. GOOGLE HAS MONOPOLY POWER IN RELEVANT MARKETS

  • "'A firm is a monopolist if it can profitably raise prices substantially above the competitive level. [M]onopoly power may be inferred from a firm's possession of a dominant share of a relevant market that is protected by entry barriers.' Google has monopoly power in one or more properly defined markets."

1. Relevant Markets and Market Shares

  • "A properly defined antitrust market consists of 'any grouping of sales whose sellers, if unified by a hypothetical cartel or merger, could profitably raise prices significantly above the competitive level.'"
  • "Typically, a court examines 'such practical indicia as industry or public recognition of the submarket as a separate economic entity, the product's peculiar characteristics and uses, unique production facilities, distinct customers, distinct prices, sensitivity to price changes, and specialized vendors.'"
  • "Staff has identified three relevant antitrust markets."
a. Horizontal Search
  • Vertical search engines not a viable substitute to horizontal search; formidable barriers to expanding into horizontal search
  • Vertical search properties could pick up query volume in response to a SSNIP (small but significant, non-transitory increase in price) in horizontal search, potentially displacing horizontal search providers (the critical-loss sketch after this list shows how a SSNIP test is often operationalized)
  • Google views these with concern, has aggressively moved to build its own vertical offerings
  • No mechanism for vertical search properties to broadly discipline a monopolist in horizontal search
    • Web search queries monetized through search ads, ads sold by keyword which have independent demand functions. So, at best, monopolist might be inhibited from SSNIP on a narrow set of keywords with strong vertical competition. But for billions of queries with no strong vertical, nothing constrains monopolist from SSNIP
  • Where vertical websites exist, still hard to compete; comprehensive coverage of all areas seems to be important driver of demand, even to websites focusing on specific topics. Eric Schmidt noted this:
    • "So if you, for example, are an academic researcher and you use Google 30 times for your academics, then perhaps you'll want to buy a camera... So long as the product is very, very, very, very good, people will keep coming back... The general product then creates the brand, creates demand and so forth. Then occasionally, these ads get clicked on"
  • Schmidt's testimony corroborated by several vertical search firms, who note that they're dependent on horizontal search providers for traffic because vertical search users often start with Google, Bing, or Yahoo
  • When asked about competitors in search, Eric Schmidt mentioned zero vertical properties
    • Google internal documents monitor Bing and Yahoo and compare quality. Sergey Brin testified that he wasn't aware of any such regular comparison against vertical competitors
  • Relevant geo for web search limited to U.S. here; search engines return results relevant to users in country they're serving, so U.S. users unlikely to view foreign-specialized search engines as viable substitute
  • Although Google has managed to cross borders, other major international search engines (Baidu, Yandex) have failed to do this
  • Google dominant for "general search" in U.S.; 66.7% share according to ComScore, and also provides results to ask.com and AOL, another 4.6%
  • Yahoo 15%, Bing 14%
  • Google's market share above generally accepted floor for monopolization; defendants with share in this range have been found to have monopoly power
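  [For readers unfamiliar with the SSNIP test referenced above: one standard way to operationalize it is "critical loss" arithmetic — a price increase of t is profitable for a hypothetical monopolist unless it loses more than t/(t+m) of its volume, where m is the contribution margin. This is textbook antitrust economics rather than anything from the memo, and the numbers below are invented.]

```python
def critical_loss(price_increase, margin):
    """Fraction of volume a hypothetical monopolist can lose before a price
    increase of `price_increase` stops being profitable, given a contribution
    margin of `margin` (both expressed as fractions of price)."""
    return price_increase / (price_increase + margin)

# Invented example: a 5% SSNIP with a 60% margin.
print(f"{critical_loss(0.05, 0.60):.1%}")  # 7.7% -- if fewer than ~7.7% of queries
# or ad dollars would divert to vertical sites, the SSNIP is profitable and
# horizontal search is plausibly its own market.
```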
b. Search Advertising
  • Search ads likely a properly defined market
  • Search ads distinguishable from other online ads, such as display ads, contextual ads, behavioral ads, and social media ads, due to "inherent scale, targetability, and control"
    • Google: "[t]hey are such different products that you do not measure them against one another and the technology behind the products is different"
  • Evidence suggests search and display ads are complements, not substitutes
    • "Google has observed steep click declines when advertisers have attempted to shift budget to display advertising"
    • Chevrolet suspended search ads for 2 weeks and relied on display ads alone; lost 30% of clicks
  • New ad offerings don't fit into traditional search or display categories: contextual, re-targeted display (or behavioral), social media
    • Only search ads allow advertisers to show ad based on when user is expressing an interest in the moment the ad is shown; numerous advertisers confirmed this point
    • Search ads convert at much higher rate due to this advantage
  • Numerous advertisers report they wouldn't shift ad spend away from search ads if prices increased more than SSNIP. Living Social would need 100% price increase before shifting ads (a minority of advertisers reported they would move ad dollars from search in response to SSNIP)
  • Google internal documents and testimony confirm lack of viable substitute for search. AdWords VP Nick Fox and chief economist Hal Varian have stated that search ad spend doesn't come at expense of other ad dollars, Eric Schmidt has testified multiple times that search ads are the most effective ad tool, has best ROI
  • Google, through AdWords, has 76% to 80% of the market according to industry-wide trackers (rival Bing-Yahoo has 12% to 16%)
  • [It doesn't seem wrong to say that search ads are a market and that Google dominates that market, but the primacy of search ads seems overstated here? Social media ads, just becoming important at the time, ended up becoming very important, and of course video as well]
c. Syndicated Search and Search Advertising ("Search Intermediation")
  • Syndicated search and search advertising ("search intermediation") are likely a properly defined product market
  • Horizontal search providers sell ("syndicate") services to other websites
  • Search engine can also return search ads to the website; search engine and website share revenue
  • Consumers are websites that want search; sellers are horizontal search providers, Google, Bing, Yahoo
  • Publishers of various sizes consistent on cross-elasticity of demand; report that search ad syndication monetizes better than display advertising or other content
  • No publisher told us that a modest (5% to 10%) increase in the price of search and search ad syndication would cause them to shift to other forms of advertising or web content
  • Google's successful efforts to systematically reduce TAC support this, are a natural experiment to determine likely response to SSNIP
  • Google, via AdSense, is dominant provider of search and search ad syndication; 75% of market according to ComScore (Microsoft and Yahoo combine for 22%)

2. Substantial Barriers to Entry Exist

  • "Developing and maintaining a competitively viable search or search ad platform requires substantial investment in specialized knowledge, technology, infrastructure, and time. These markets are also characterized by significant scale effects"
a. Technology and Specialization
  • [no notes, extremely obvious to anyone technical who's familiar with the area]
b. Substantial Upfront Investment
  • Enormous investments required. For example, in 2011 Google spent $5B on R&D. And in 2010, MS spent more than $4.5B developing algorithms and building physical capacity for Bing
c. Scale Effects
  • More usage leads to better algorithms and greater accuracy w.r.t. what consumers want
  • Also leads to greater number of advertisers
  • Greater number of advertisers and consumers leads to better ad serving accuracy, better monetization of ads, leads to better monetization for search engine, advertisers, and syndication partners
  • Cyclical effect, "virtuous cycle"
  • According to Microsoft, greatest barrier is obtaining sufficient scale. Losing $2B/yr trying to compete with Google, and Bing is only competing horizontal search platform to Google
d. Reputation, Brand Loyalty, and the "Halo Effect"
  • [no notes]
e. Exclusive and Restrictive Agreements -
  • "Google's exclusive and restrictive agreements pose yet another barrier to entry, as many potential syndication partners with a high volume of customers are locked into agreements with Google."

B. GOOGLE HAS ENGAGED IN EXCLUSIONARY CONDUCT

  • "Conduct may be judged exclusionary when it tends to exclude competitors 'on some basis other than efficiency,' i.e., when it 'tends to impair the opportunities of rivals' but 'either does not further competition on the merits or does SO in an unnecessarily restrictive way.' In order for conduct to be condemned as 'exclusionary,' Staff must show that Google's conduct likely impairs the ability of its rivals to compete effectively, and thus to constrain Google's exercise of monopoly power"

1. Google's Preferencing of Google Vertical Properties Within Its SERP

  • "Although we believe that this is a close question, we conclude that Google's preferencing conduct does not violate Section 2."
a. Google's Product Design Impedes Vertical Competitors
  • "As a general rule, courts are properly very skeptical about claims that competition has been harmed by a dominant firm's product design changes. Judicial deference to product innovation, however, does not mean that a monopolist's product design decisions are per se lawful", United States v. Microsoft
  • We evaluate, through Microsoft lens of monopoly maintenance, whether Google took these actions to impede a nascent threat to Google's monopoly power
  • "Google's internal documents explicitly reflect - and testimony from Google executives confirms - a concern that Google was at risk of losing, in particular, highly profitable queries to vertical websites"
  • VP of product management Nicholas Fox:
    • "[Google's] inability to serve this segment [of vertical lead generation] well today is negatively impacting our business. Query growth among high monetizing queries (>$120 RPM) has declined to ~0% in the UK. US isn't far behind (~6%). There's evidence (e.g., UK Finance) that we're losing share to aggregators"
  • Threat to Google isn't that vertical websites will displace Google, but that they'll undercut Google's power over the most lucrative segments of its search and search ads portfolio
  • Additionally, vertical websites could help erode barriers to growth for general search competitors
b. Google's SERP Changes Have Resulted In Anticompetitive Effects
  • Google expanding its own offerings while demoting rival offerings caused significant drops in traffic to rivals, confirmed by Google's internal data
  • Google's prominent placement of its own Universal Search properties led to gains in share of its own properties
    • "For example, Google's inclusion of Google Product Search as a Universal Search result turned a property that the Google product team could not even get indexed by Google's web search results into the number one viewed comparison shopping website on Google"
c. Google's Justifications for the Conduct
  • "Product design change is an area of conduct where courts do not tend to strictly scrutinize asserted procompetitive justifications. In any event, Google's procompetitive justifications are compelling."
  • Google argues design changes to SERP have improved product, provide consumers with "better" results
  • Google notes that path toward Universal Search and OneBox predates concern about vertical threat
  • Google justifies preferential treatment of Universal Search by asserting "apples and oranges" problem prevents Google from doing head-to-head comparison of its property vs. competing verticals, verticals and web results ranked with different criteria. This seems to be correct.
    • Microsoft says Bing uses a single signal, click-through-rate, that can be compared across Universal Search content and web search results
  • Google claims that its Universal Search results are more helpful than "blue links" to other comparison shopping websites
  • Google claims that showing 3rd party data would create technical and latency issues
    • " The evidence shows that it would be technologically feasible to serve up third-party results in Google's Universal Search results. Indeed, Bing does this today with its flight vertical, serving up Kayak results and Google itself originally considered third-party OneBoxes"
  • Google defends "demotion" of competing vertical content, "arguing that Google's algorithms are designed solely with the goal of improving a user's search experience"
    • "one aspect of Google's demotions that especially troubles Staff - and is not addressed by the above justification - is the fact that Google routinely, and prominently, displays its own vertical properties, while simultaneously demoting properties that are identical to its own, but for the fact that the latter are competing vertical websites", See Brin Tr. 79:16-81:24 (acknowledging the similarities between Google Product Search and its competitors); Fox Tr. 204:6-204:20 (acknowledging the similarities between Google Product Search and its competitors).
d. Google's Additional Legal Defenses
  • "Google has argued - successfully in several litigations - that it owes no duty to assist in the promotion of a rival's website or search platform, and that it owes no duty to promote a rival's product offering over its own product offerings"
  • "one reading of Trinko and subsequent cases is that Google is privileged in blocking rivals from its search platform unless its conduct falls into in one of several specific exceptions referenced in Trinko"
    • "Alternatively, one may argue that Trinko should not be read so broadly as to overrule swathes of antitrust doctrine."
  • "Google has long argued that its general search results are opinions that are protected speech under the First Amendment, and that such speech should not be subject to government regulation"; staff believes this is overbroad
  • "the evidence paints a complex portrait of a company working toward an overall goal of maintaining its market share by providing the best user experience, while simultaneously engaging in tactics that resulted in harm to many vertical competitors, and likely helped to entrench Google's monopoly power over search and search advertising"
  • "The determination that Google's conduct is anticompetitive, and deserving of condemnation, would require an extensive balancing of these factors, a task that courts have been unwilling - in similar circumstances - to perform under Section 2. Thus, although it is a close question, Staff does not recommend that the Commission move forward on this cause of action."

2. Google's "Scraping" of Rivals' Vertical Content

  • "We conclude that this conduct violates Section 2 and Section 5."
a. Google's "Scraping" Constitutes a Conditional Refusal to Deal or Unfair Method Of Competition
  • Scraping and threats of refusal to deal with some competitors can be condemned as conditional refusal to deal under Section 2
  • Post-Trinko, identification of circumstances ("[u]nder certain circumstances, a refusal to cooperate with rivals can constitute anticompetitive conduct and violate § 2") "subject of much debate"
  • Aspen Skiing Co. v. Aspen Highlands Skiing Corp: defendant (owner of 3 of 4 ski areas in Aspen) canceled all-ski area ticket with plaintiff (owner of 4th ski area in Aspen)
    • After demanding an increased share of profit, defendant canceled the ticket and rejected plaintiff's "increasingly desperate measures" to recreate the joint ticket, even rejecting plaintiff's offer to buy tickets at retail price
    • Supreme court upheld jury's finding of liability; Trinko court: "unilateral termination of a voluntary (and thus presumably profitable) course of dealing suggested a willingness to forsake short-term profits to achieve an anticompetitive end. Similarly, the defendant's unwillingness to renew the ticket even if compensated at retail price revealed a distinctly anticompetitive bent"
  • Appellate courts have focused on Trinko's reference to "unilateral termination of a voluntary course of dealing", e.g., in American Central Eastern Texas Gas Co. v. Duke Energy Fuels LLC, Fifth Circuit upheld determination that defendant natural gas processor's refusal to contract with competitor for additional capacity was unlawful
    • Plaintiff contracted with defendant for processing capacity; after two years, defendant proposed terms it "knew were unrealistic or completely unviable ... in order to exclude [the plaintiff] from competition with [the defendant] in the gas processing market."
  • Case here is analogous to Aspen Skiing and Duke Energy [a lot of detail not written down in notes here]
b. Google's "Scraping" Has Resulted In Anticompetitive Effects
  • Scraping has lessened the incentives of competing websites like Yelp, TripAdvisor, CitySearch, and Amazon to innovate, diminishes incentives of other vertical websites to develop new products
    • entrepreneurs more reluctant to develop new sites, investors more reluctant to sponsor development when Google can use its monopoly power to appropriate content it deems lucrative
c. Google's "Scraping" Is Not Justified By Efficiencies
  • "Marissa Mayer and Sameer Samat testified that was extraordinarily difficult for Google, as a technical matter, to remove sites like Yelp from Google Local without also removing them from web search results"
    • "Google's almost immediate compliance after Yelp sent a formal 'cease and desist' letter to Google, however, suggests that the "technical" hurdles were not a significant factor in Google's refusal to comply with repeated requests to remove competitor content from Google Local"
    • Partners can opt out of inclusion with Google's vertical news offering, Google News
    • "Similarly, Google's almost immediate removal of Amazon product reviews from Google Product Search indicates that technical barriers were quickly surmounted when Google desired to accommodate a partner."
  • "In sum, the evidence shows that Google used its monopoly position in search to scrape content from rivals and to improve its own complementary vertical offerings, to the detriment of those rivals, and without a countervailing efficiency justification. Google's scraping conduct has helped it to maintain, preserve, and enhance Google's monopoly position in the markets for search and search advertising. Accordingly, we believe that this conduct should be condemned by the Commission."

3. Google's API Restrictions

  • "We conclude that Google's API restrictions violate Section 2."
  • AdWords API itself was a procompetitive development
  • But restrictive conditions in API usage agreement anticompetitive, without offsetting procompetitive benefits
  • "Should the restrictive conditions be found to be unreasonable restraints of trade, they could be removed today instantly, with no adverse effect on the functioning of the API. Any additional engineering required to make the advertiser data interoperable with other search networks would be supplied by other market participants. Notably, because Google would not be required to give its competitors access to the AdWords API, there is no concern about whether Google has a duty to deal with its competitors"
a. The Restrictive Conditions Are Unreasonable
  • Restrictive conditions limit ability of advertisers to use their own data, prevent the development and sale of 3rd party tools and services that would allow automated campaign management across multiple search networks
  • "Even Google is constrained by these restrictions, having had to forgo improving its DART Search tool to offer such capabilities, despite internal estimates that such functionality would benefit Google and advertisers alike"
  • Restrictive conditions have no procompetitive virtues, anticompetitive effects are substantial
b. The Restrictive Conditions Have Resulted In Anticompetitive Effects
  • Restrictive conditions reduce innovation, increase transaction costs, degrade quality of Google's rivals in search and search advertising
  • Several SEMs forced to remove campaign cloning functionality by Google; Google's restrictive conditions stopped cross-network campaign management tool market segment in its infancy
  • Restrictive conditions increase transaction costs for all advertisers other than those large enough to make internal investments to develop their own tools [doesn't it also, in some amortized fashion, increase transaction costs for companies that can build their own tools?]
  • Result is that advertisers spend less on non-dominant search networks, reducing quality of ads on non-dominant search networks
c. The Restrictive Conditions Are Not Justified By Efficiencies
  • Concern about "misaligned incentives" is Google's only justification for restrictive conditions; concern is that SEMs and agencies would adopt a "lowest common denominator" approach and degrade AdWords campaign performance
  • "The evidence shows that this justification is unsubstantiated and is likely a pretext"
  • "In brief, these third parties incentives are highly aligned with Google's interests, precisely the opposite of what Google contends."
  • Google unable to identify any examples of ill effects from misaligned incentives
  • Terms and Conditions already have conditions for minimum functionality that prevents lowest common denominator concern from materializing
  • Documents suggest restrictive conditions were not about "misaligned incentives":
    • "Sergey [Brin] and Larry [Page] are big proponents of a protectionist strategy that prevents third party developers from building offerings which promote the consolidated management of [keywords] on Google and Overture (and whomever else)."
    • In a 2004 doc, API product manager was looking for "specific points on how we can prevent a new entrant (MSN Ad Network) from benefitting from a common 3rd party platform that is cross-network."
    • In a related presentation, Google's lists as a concern, "other competitors are buoyed by lowered barriers to entry"; options to prevent this were "applications must have Google-centric UI functions and branding" and "disallow cross-network compatible applications from using API"

4. Google's Exclusive and Restrictive Syndication Agreements

  • "Staff has investigated whether Google has entered into anticompetitive, exclusionary agreements with websites for syndicated search and search advertising services (AdSense agreements) that serve to maintain, preserve, or enhance Google's monopoly power in the markets for search, search advertising, or search and search advertising syndication (search intermediation). We conclude that these agreements violate Section 2."
a. Google's Agreements Foreclose a Substantial Portion of the Relevant Market
  • "Exclusive deals by a monopolist harm competition by foreclosing rivals from needed relationships with distributors, suppliers, or end users. For example, in Microsoft, then-defendant Microsoft's exclusive agreements with original equipment manufacturers and software vendors were deemed anticompetitive where they were found to prevent third parties from installing rival browser Netscape, thus foreclosing Netscape from the most efficient distribution channel, and helping Microsoft to preserve its operating system monopoly. The fact that an agreement is not explicitly exclusive does not preclude a finding of liability."
  • [notes on legal background of computing foreclosure percentage omitted]
  • Staff relied on ComScore dataset to compute foreclosure; Microsoft and Yahoo's syndicated query volume is higher than in ComScore, resulting in lower foreclosure number. "We are trying to get to the bottom of this discrepancy now. However, based on our broader understanding of the market, we believe that the ComScore set more accurately reflects the relative query shares of each party." [I don't see why staff should believe that ComScore is more accurate than Microsoft's numbers — I would guess the opposite]
  • [more notes on foreclosure percentage omitted]
b. Google's Agreements Have Resulted In Anticompetitive Effects
  • Once foreclosure is established as above "safe harbor" levels, need a qualitative, rule of reason analysis of market effects
  • Google's exclusive agreements impact immediate market for search and search syndication advertising and have broader effects in markets for search and search advertising
  • In search and search ad syndication (search intermediation), exclusivity precludes some of the largest and most sophisticated publishers from using competing platforms. Publishers can't credibly threaten to shift some incremental business to other platforms to get price concessions from Google
    • Google's aggressive reduction of revenue shares to customers without significant resistance => agreements seem to be further entrenching Google's monopoly position
  • An objection to this could be that Google's business is because its product is superior
    • This argument rests on fallacious assumption that Bing's average monetization gap is consistent across the board
  • [section on CityGrid impact omitted; this section speaks to broader market effects]
  • Google insists that incremental traffic to Microsoft would be trivial; Microsoft indicates it would be "very meaningful"
    • Not enough evidence for definitive conclusion, but "internal Google documents suggest that Microsoft's view of things may be closer to the truth. — Google's interest in renewing deals in part to prevent Microsoft from gaining scale. Internal Google analysis of 2010 AOL renewal: "AOL holds marginal search share but represents scale gains for a Microsoft + Yahoo! partnership. AOL/Microsoft combination has modest impact on market dynamics, but material increase in scale of Microsoft's search & ads platform"
    • When informed that "Microsoft [is] aggressively wooing AOL with large guarantees," a Google exec responded with: "I think the worse case scenario here is that AOL users get sent to Bing, so even if we make AOL a bit more competitive relative to Google, that seems preferable to growing Bing."
    • Google internal documents show they pursued AOL deal aggressively even though AOL represented "[a] low/no profit partnership for Google."
  • Evidence is that, in near-term, removing exclusivity would not have dramatic impact; largest and most sophisticated publishers would shift modest amounts of traffic to Bing
  • Most significant competitive benefits realized over longer period of time
    • "Removing exclusivity may open up additional opportunities for both established and nascent competitors, and those opportunities may spur more significant changes in the market dynamics as publishers have the opportunity to consider - and test - alternatives to Google's AdSense program."
c. Google's Agreements Are Not Justified By Efficiencies
  • Google has given three business justifications for exclusive and restrictive syndication agreements
    • Long-standing industry practice of exclusivity, dating from when publishers demanded large, guaranteed, revenue share payments regardless of performance
      • "guaranteed revenue shares are now virtually non-existent"
    • "Google is simply engaging in a vigorous competition with Microsoft for exclusive agreements"
      • "Google may argue that the fact that Microsoft is losing in a competitive bidding process (and indeed, not competing as vigorously as it might otherwise) is not a basis on which to condemn Google. However, Google has effectively created the rules of today's game, and Microsoft's substantial monetization disadvantage puts it in a poor competition position to compete on an all-or-nothing basis."
    • "user confusion" — "Google claims that it does not want users to confuse a competitor's poor advertisements with its own higher quality advertisements"
      • "This argument suffers both from the fact that it is highly unlikely that users care about the source of the ad, as well as the fact that, if users did care, less restrictive alternatives are clearly available. Google has not explained why alternatives such as labeling competitor advertisements as originating from the competitor are unavailing here."
      • "Google's actions demonstrate that "user confusion" is not a significant concern. In 2008 Google attempted to enter into a non-exclusive agreement with Yahoo! to supplement Yahoo!'s search advertising platform. Under the proposed agreement, Yahoo! would return its own search advertising, but supplement its inventory with Google search advertisements when Yahoo! did not have sufficient inventory.58, Additionally, Google has recently eliminated its "preferred placement" restriction for its online partners."
  • Rule of reasons analysis shows strong evidence of market protected by high entry barriers
  • Despite limitations to evidence, market is inarguably not robustly competitive today
    • Google has been unilaterally reducing revenue share with apparent impunity

IV. POTENTIAL REMEDIES

A. Scraping

  • At least two possible remedies
  • Opt-out to remove snippets of content from Google's vertical properties, while retaining placement in web search results and/or Universal Search results on main SERP
  • Google could be required to limit use of content it indexes for web search (could only use content in returning the property in its search results, but not for determining its own product or local rankings) unless given explicit permission

B. API Restrictions

  • Require Google to remove problematic contractual restrictions; no technical fixes necessary
    • SEMs report that technology for cross-compatibility already exists, will quickly flourish if unhindered by Google's contractual constraints

C. Exclusive and Restrictive Syndication Agreements

  • Most appropriate remedy is to enjoin Google from entering exclusive agreements with search syndication partners, and to require Google to loosen restrictions surrounding AdSense partners' use of rival search ads

V. LITIGATION RISKS

  • Google does not charge customers, and they are not locked into Google
  • Universal Search has resulted in substantial benefit to users
  • Google's organization and aggregation of content adds value to product for customers
  • Largest advertisers advertise on both Google AdWords and Microsoft AdCenter
  • Most efficient channel through which Bing can gain scale is Bing.com
  • Microsoft has the resources to purchase distribution where it sees greatest value
  • Most website publishers are happy with AdSense

VI. CONCLUSION

  • "Staff concludes that Google's conduct has resulted - and will result - in real harm to consumers and to innovation in the online search and advertising markets. Google has strengthened its monopolies over search and search advertising through anticompetitive means, and has forestalled competitors' and would-be competitors' ability to challenge those monopolies, and this will have lasting negative effects on consumer welfare"
    • "Google has unlawfully maintained its monopoly over general search and search advertising, in violation of Section 2, or otherwise engaged in unfair methods of competition, in violation of Section 5, by scraping content from rival vertical websites in order to improve its own product offerings."
    • "Google has unlawfully maintained its monopoly over general search, search advertising, and search syndication, in violation of Section 2, or otherwise engaged in unfair methods of competition, in violation of Section 5, by entering into exclusive and highly restrictive agreements with web publishers that prevent publishers from displaying competing search results or search advertisements."
    • "Google has unlawfully maintained its monopoly over general search and search advertising, in violation of Section 2, or otherwise engaged in unfair methods of competition, in violation of Section 5, by maintaining contractual restrictions that inhibit the cross-platform management of advertising campaigns."
  • "For the reasons set forth above, Staff recommends that the Commission issue the attached complaint."
  • Memo submitted by Barbara R. Blank, approved by Geoffrey M. Green and Melanie Sabo

FTC BE staff memo

"Bureau of Economics

August 8, 2012

From: Christopher Adams and John Yun, Economists"

Executive Summary

  • Anticompetitive investigation started June 2011
  • Staff presented theories and evidence February 2012
  • This memo offers our final recommendation
  • Four theories of harm
    • preferencing of search results by favoring own web properties over rivals
    • exclusive agreements with publishers and vendors, deprive rival platforms of users and advertisers
    • restrictions on porting advertiser data to rival platforms
    • misappropriating content from Yelp and TripAdvisor
  • "our guiding approach must be beyond collecting complaints and antidotes [presumably meant to be anecdotes?] from competitors who were negatively impacted from a firm's various business practices."
  • Market power in search advertising
    • Google has "significant' share, 65% of paid clicks and 53% of ad impressions among top 5 U.S. search engines
    • Market power may be mitigated by the fact that 80% use a search engine other than Google
    • Empirical evidence consistent with search and non-search ads being substitutes, and that Google considers vertical search to be competitors
  • Preferencing theory
    • Theory is that Google is blending its proprietary content with customary "blue links" and demoting competing sites
    • Google has limited ability to impose significant harm on vertical rivals because it accounts for 10% to 20% of traffic to them. Effect is very small and not statistically significant
      • [Funny that something so obviously wrong at the time and also seemingly wrong in retrospect was apparently taken seriously]
    • Universal Search was a procompetitive response to pressure from vertical sites and an improvement for users
  • Exclusive agreements theory
    • Access to a search engine's site (i.e., not dependent on 3rd party agreement) is most efficient and common distribution channel, which is not impeded by Google. Additionally, strong reasons to doubt that search toolbars and default status on browsers can be viewed as "exclusives" because users can easily switch (on desktop and mobile)
      • [statement implies another wrong model of what's happening here]
      • [Specifically on easy switching on mobile, there's Google's actual blocking of changing the default search engine from Google to what the user wants, but we also know that a huge fraction of users basically don't understand what's happening and can't make an informed decision to switch — if this weren't the case, it wouldn't make sense for companies to bid so high for defaults, e.g. supposedly $26B/yr to obtain default search engine status on iOS; if users simply switched freely, default status would be worth close to $0. Since this payment is, at the margin, pure profit and Apple's P/E ratio is 29.53 as of my typing this sentence, a quick and dirty estimate is that $776B of Apple's market cap is attributable to taking this payment vs. randomly selecting a default]
    • [In addition to explicit, measurable, coercion like the above, there were also things like Google pressuring Samsung into shutting down their Android Browser effort in 2012; although enforcing a search engine default on Android was probably not the primary driver of that or other similar pressure that Google applied, many of these sorts of things also had the impact of funneling users into Google on mobile; these economists seem to like the incentive-based argument that users will use the best product, so the result we see in the market reflects user preference, but if that's the case, why do companies spend so much effort on ecosystem lock-in, including but not limited to supposedly paying $18B/yr to own the default setting in one browser? I guess the argument here is that companies are behaving completely irrationally in expending so much effort here, but consumers are behaving perfectly rationally and are fully informed and are not influenced by all of this spending at all?]
    • In search syndication, Microsoft and Yahoo have a combined greater share than Google's
    • No support for assertion that rivals' access to users has been impaired by Google. MS and Yahoo have had a steady 30% share for years; their query volume has grown faster than Google's since the alliance was announced
      • [Another odd statement; at the time, observers didn't see Bing staying competitive without heavy subsidies from MS, and then MS predictably stopped subsidizing Bing as a big bet and its market share declined. Google's search market share is well above 90% and hasn't been below 90% since the BE memo was written; in the U.S., estimates put Google around 90% share, some a bit below and some a bit above, with low estimates at something like 87%. It's odd that someone could look at the situation at the time and not see that this was about to happen]
    • In December 2011, Microsoft had access to query volume equivalent to what Google had 2 years ago, thus difficult to infer that Microsoft is below some threshold of query volume
      • [this exact argument was addressed in the BC memo; the BE memo does not appear to refute the BC memo's argument]
      • [As with a number of the above arguments, this is a strange argument if you understand the dynamics of fast-growing tech companies. When you have rapidly growing companies in markets with network effects or scale effects, being the same absolute size as a competitor a number of years ago doesn't mean that you're in an ok position. We've seen this play out in a ton of markets and it's fundamental to why VCs shovel so much money at companies in promising markets — being a couple years behind often means you get crushed or, if you're lucky, end up as an also ran that's fighting an uphill battle against scale effects]
    • Characteristics of online search market not consistent with Google buying distribution agreements to raise input costs of rivals
  • Restrictions on porting advertiser data to AdWords API
    • Theory is that Google's terms and conditions for AdWords API anticompetitively disadvantages Microsoft's adCenter
    • Introduction of API with co-mingling restriction made users and Google better off and rivals' costs were unaffected. Any objection therefore implies that when Google introduced the API, it had an obligation to allow its rivals to benefit from increased functionality. Significant risks to long-term innovation incentives from imposing such an obligation [Huh, this seems very weird]
    • Advertisers responsible for overwhelming majority of search ad spend use both Google and Microsoft. Multi-homing advertisers of all sizes spend a significant share of budget on Microsoft [this exact objection is addressed in BC memo]
    • Evidence from SEMs and end-to-end advertisers suggest policy's impact on ad spend on Microsoft's platform is negligible [it's hard to know how seriously to take this considering the comments on Yelp, above — the model of how tech businesses work seems very wrong, which casts doubt on other conclusions that necessarily require having some kind of model of how this stuff works]
  • Scraping allegation is that Google has misappropriated content from Yelp and TripAdvisor
    • Have substantive concerns. Solution proposed in Annex 11
    • To be an antitrust violation, need strong evidence that it increased users on Google at the expense of Yelp or TripAdvisor or decreased incentives to innovate. No strong evidence of either [per above comments, this seems wrong]
  • Recommendation: recommend investigation be closed

1. Does Google possess monopoly power in the relevant antitrust market?

  • To be in violation of Section 2 of the Sherman Act, Google needs to be a monopoly or have substantial market power in a relevant market
  • Online search similar to any other advertising
  • Competition between platforms and advertisers depends on extent to which advertisers consider users on one platform to be substitutes for another
  • Google's market power depends on share of internet users
  • If advertisers can access Google's users at other search platforms, such as Yahoo, Bing, and Facebook, "Google's market power is a lot less"
  • Substantial evidence contradicting proposition that Google has substantial market power in search advertising
  • Google's share is large. In Feb 2012, 65% of paid search clicks of top 5 general search engines went through Google, up from 55% in Sep 2008; these figures show Google offers advertisers what they want
  • Advertisers want "eyeballs"
  • Users multi-home. About 80% of users use a platform other than Google in a given month, so advertisers can get the same eyeballs elsewhere
    • Advertiser can get in front of a user on a different query on Yahoo or another search engine
    • [this is also odd reasoning — if a user uses Google for searches by default, but occasionally stumbles across Yahoo or Bing, this doesn't meaningfully move the needle for an advertiser; the evidence here is comScore saying that 20% of users only use Google, 15% never use Google, and 65% use Google + another search engine; but it's generally accepted that comScore numbers are quite off. Shortly after the report was written, I looked at various companies that reported metrics (Alexa, etc.) and found them to be badly wrong; I don't think it would be easy to dig up the exact info I used at the time now, but on searching for "comscore search engine market accuracy", the first hit I got was someone explaining that while, today, comScore shows that Google has an implausibly low 67% market share, an analysis of traffic to sites this company has access to showed that Google much more plausibly drove 85% of clicks; it seems worth mentioning that comScore is often considered inaccurate]
  • Firm-level advertising between search ads and display ads is negatively correlated
    • [this seems plausible? The evidence in the BC memo for these being complements seemed like a stretch; maybe it's true, but the BE memo's position seems much more plausible]
    • No claim that these are the same market, but can't conclude that they're unrelated
  • Google competes with specialized search engines, similar to a supermarket competing with a convenience store [details on this analogy elided; this memo relies heavily on analogies that relate tech markets to various non-tech markets, some of which were also elided above]
    • For advertising on a search term like "Nikon 5100", Amazon may provide a differentiated but competing product
  • Google is leading seller of search, but this is mitigated by large proportion of users who also use other search engines, by substitution of display and search advertising, by competition in vertical search

Theory 1: The preferencing theory

2.1 Overview

  • Preferencing theory is that Google's blending of content such as shopping comparison results and local business listings with customary blue links disadvantages competing content sites, such as Nextag, eBay, Yelp, and TripAdvisor

2.2 Analysis

  • Blend has two effects, negatively impacting traffic to specialized vertical sites by pushing down sites and impacting Google's incentives to show competing vertical sites
  • Empirical questions
    • "To what extent does Google account for the traffic to vertical sites?"
    • "To what extent do blends impact the likelihood of clicks to vertical sites?"
    • "To what extent do blends improve consumer value from the search results?"

2.3 Empirical evidence

  • Google search responsible for 10% of traffic to shopping comparison sites, 17.5% to local business search sites. "See Annex 4 for a complete discussion of our platform model"
    • [Annex 4", doesn't appear to be included; but, as discussed above, the authors' model of how traffic works seems to be wrong]
  • When blends appear, from Google's internal data, clicks to other shopping comparison sites drop by a large and statistically significant amount. For example, if a site had a pre-blend CTR of 9%, post-blend CTR would be 5.3%, but a blend isn't always presented
  • For local, pre-blend CTR of 6% would be reduced to 5.4%; local blends have smaller impact than shopping
  • "above result for shopping comparison sites is not the same as finding that overall traffic from Google to shopping sites declined due to universal search. As we describe below, if blends represent a quality improvement, this will increase demand and drive greater query volume on Google, which will boost traffic to all sites."
  • All links are substitutes, so we can infer that if users click on ads less, they prefer the content and are getting more value. Overall results indicate that blends significantly increase consumer value
    • [this seems obviously wrong unless the blend is presented with the same visual impact, weight, and position as normal results, which isn't the case at all — I don't disagree that the blend is probably better for consumers, but this methodology seems like a classic misuse of data to prove a point]

2.4 Documentary evidence

  • Since the 90s, general search engines have incorporated vertical blends
  • All major search engines use blends

2.5 Summary of the preferencing theory

  • Google not significant enough source of traffic to foreclose its vertical rivals [as discussed above, the model for this statement is wrong]

Theory 2: Exclusionary practices in search distribution

3.1 Overview

  • Theory is that Google is engaging in exclusionary practices in order to deprive Microsoft of economies of scale
  • Foundational issues
    • Are Google's distribution agreements substantially impairing opportunity of rivals to compete for users?
    • What's the empirical evidence users are being excluded and denied?
    • What's the evidence that Microsoft is at a disadvantage in terms of scale?

3.2 Are the various Google distribution agreements in fact exclusionary?

  • "Exclusionary agreements merit scrutiny when they materially reduce consumer choice and substantially impair the opportunities of rivals"
  • On desktop, users can access search engine directly, via web browser search box, or a search toolbar
  • 73% of desktop search through direct navigation, all search engines have equal access to consumers in terms of direct access; "Consequently, Google has no ability to impair the opportunities of rivals in the most important and efficient desktop distribution channel."
    • [once again, this model seems wrong — if it wasn't wrong, companies wouldn't pay so much to become a search default, including shady stuff like Google paying shady badware installers to make Chrome / Google default on people's desktops. Another model is that if a user uses a search engine because it's a default, this changes the probability that they'll use the search engine via "direct access"; compared to the BE staff model, it's overwhelmingly likely that this model is correct and the BE staff model is wrong]
    • Microsoft is search default on Internet Explorer and 70% of PCs sold
  • For syndication agreement, Google has a base template that contains premium placement provision. This is to achieve minimum level of remuneration in return for Google making its search available. Additionally, clause is often subject to negotiation and can be modified
    • [this negotiation thing is technically correct, but doesn't address the statement about this brought up in the BC memo; many, perhaps most, of the points in this memo have been refuted by the BC memo, and the strategy here seems to be to ignore the refutations without addressing them]
    • "By placing its entire site or suite of suites up for bid, publishers are able to bargain more effectively with search engines. This intensifies the ex ante competition for the contract and lowers publishers' costs. Consequently, eliminating the ability to negotiate a bundled discount, or exclusivity, based on site-wide coverage will result in higher prices to publishers." [this seems to contradict what we observe in practice?]
    • "This suggests that to the extent Google is depriving rivals such as Microsoft of scale economies, this is a result of 'competition on the merits'— much the same way as if Google had caused Microsoft to lose traffic because it developed a better product and offered it at a lower price."
  • Have Google's premium placement requirements effectively denied Microsoft access to publishers?
    • Can approach this by considering market share. Google 44%, including AOL and Ask. MS 31%, including Yahoo. Yahoo 25%. Combined, Yahoo and MS are at 56%. "Thus, combined, Microsoft and Yahoo's syndication shares are higher than their combined shares in a general search engine market" [as noted previously, these stats didn't seem correct at the time and have gotten predictably less directionally correct over time]
  • What would MS's volume be without Google's exclusionary restrictions
    • At most a 5% change because Google's product is so superior [this seems to ignore the primary component of this complaint, which is that there's a positive feedback cycle]
  • Search syndication agreements
    • Final major distribution channel is mobile search
    • U.S. marketshare: Android 47%, iOS 30%, RIM 16%, MS 5%
    • Android and iOS grew from 30% to 77% from December 2009 to December 2011, primarily due to decline of RIM, MS, and Palm
    • Mobile search is 8%. Thus, "small percentage of overall queries and an even smaller percentage of search ad revenues"
      • [The implication here appears to be that mobile is small and unimportant, which was obviously untrue at the time to any informed observer — I was at Google shortly after this was written and the change was made to go "mobile first" on basically everything because it was understood that mobile was the future; this involved a number of product changes that significantly degraded the experience on desktop in order to make the mobile experience better; this was generally considered not only a good decision, but the only remotely reasonable decision. Google was not alone in making this shift at the time. How economists studying this market didn't understand this after interviewing folks at Google and other tech companies is mysterious]
    • Switching cost on mobile implied to be very low, "a few taps" [as noted previously, the staggering amount of money spent on being a mobile default and Google's commit linked above indicate this is not true]
    • Even if switching costs were significant, there's no remedy here. "Too many choices lead to consumer confusion"
    • Repeat of point that barrier to switching is low because it's "a few taps"
    • "Google does not require Google to be the default search engine in order to license the Android OS" [seems technically correct, but misleading at best when taken as part of the broader argument here]
    • OEMs choose Google search as default for market-based reasons and not because their choice is restricted [this doesn't address the commit linked above that actually prevents users from switching the default away from Google; I wonder what the rebuttal to that would be, perhaps also that user choice is bad and confusing to users?]
  • Opportunities available to Microsoft are larger than indicated by marketshare
  • Summary
    • Marketshare could change quickly; two years ago, Apple and Google only had 30% share
    • Default of Google search not anticompetitive and mobile a small volume of queries, "although this is changing rapidly"
    • Basically no barrier to user switching, "a few taps and downloading other search apps can be achieved in a few seconds. These are trivial switching costs" [as noted above, this is obviously incorrect to anyone who understands mobile, especially the part about downloading an app not being a barrier; I continue to find it interesting that the economists used market-based reasoning when it supports the idea that the market is perfectly competitive, with no switching costs, etc., but decline to use market-based reasoning, such as noting the staggeringly high sums paid to set default search, when it supports the idea that the market is not a perfectly competitive market with no switching costs, etc.]

3.3 Are rival search engines being excluded from the market?

  • Prior section found that Google's distribution agreements don't impair opportunity of rivals to reach users. But could it have happened? We'll look at market shares and growth trends to determine
  • "We note that the evidence of Microsoft and Yahoo's share and growth cannot, even in theory, tell us whether Google's conduct has had a significant impact. Nonetheless, if we find that rival shares have grown or not diminished, this fact can be informative. Additionally, assuming that Microsoft would have grown dramatically in the counterfactual, despite the fact that Google itself is improving its product, requires a level of proof that must move beyond speculation." [as an extension of the above, the economists are happy to speculate or even 'move beyond speculation' when it comes to applying speculative reasoning on user switching costs, but apparently not when it comes to inferences that can be made about marketshare; why the drastic difference in the standard of proof?]
  • Microsoft and Yahoo's share shows no sign of being excluded, steady 30% for 4 years [as noted in a previous section, the writing was on the wall for Bing and Yahoo at this time, but apparently this would "move beyond speculation" and is not noted here]
  • Since announcement of MS / Yahoo alliance, MS query volume has grown faster than Google's [this is based on comScore qSearch data and the more detailed quoted claim is that MS query volume increased 134% while Google volume increased 54%; as noted above, this seems like an inaccurate metric, so it's not clear why this would be used to support this point, and it's also misleading at best]
  • MS-Yahoo have the same number of search engine users as Google in a given month [again, as noted above, this appears to come from incorrect data and is also misleading at best because it counts a single use in a month as equivalent to using something many times a day]

3.4 Does Microsoft have sufficient scale to be competitive?

  • In a meeting with Susan Athey, Microsoft could not demonstrate that they had data definitively showing how the cost curve changes as click data changes, "thus, there is no basis for suggesting Microsoft is below some threshold point" [the use of the phrase "threshold point" demonstrates either a use of sleight of hand or a lack of understanding of how it works; the BE memo seems to prefer the idea that it's about some threshold since this could be supported by the argument that, if such a threshold were to be demonstrated, Microsoft's growth would have or will carry it past the threshold, but it doesn't make any sense that there would be a threshold; also, even if this were important, having a single meeting where Microsoft wasn't able to answer this immediately would be weak evidence]
  • [many more incorrect comments in the same vein as the above omitted for brevity]
  • "Finally, Microsoft's public statements are not consistent with statements made to antitrust regulators. Microsoft CEO Steve Ballmer stated in a press release announcing the search agreement with Yahoo: 'This agreement with Yahoo! will provide the scale we need to deliver even more rapid advances in relevancy and usefulness. Microsoft and Yahoo! know there's so much more that search could be. This agreement gives us the scale and resources to create the future of search."
    • [it's quite bizarre to use a press release, which is generally understood to be a meaningless puff piece, as evidence that a strongly supported claim isn't true; again, BE staff seem to be extremely selective about what evidence they look at to a degree that is striking; for example, from conversations I had with credible, senior engineers who worked on search at both Google and Bing, engineers who understand the domain would agree that having more search volume and more data is a major advantage; instead of using evidence like that, BE staff find a press release that, in the tradition of press releases, has some meaningless and incorrect bragging, and bring that in as evidence; why would they do this?]
  • [more examples of above incorrect reasoning, omitted for brevity]

3.5 Theory based on raising rivals' costs

  • Despite the above, it could be that distribution agreements deny rivals and data enough that "feedback effects" are triggered
  • Possible feedback effects
    • Scale effect: cost per unit of quality or ad matching decreases
    • Indirect network effect: more advertisers increases number of users
    • Congestion effect
    • Cash flow effect
  • Scale effect was determined to not be applicable [as noted there, the argument for this is completely wrong]
  • Indirect network effect has weak evidence, evidence exists that it doesn't apply, and even if it did apply, low click-through rate of ads shows that most consumers don't like ads anyway [what? This doesn't seem relevant?], and also, having a greater number of advertisers leads to congestion and reduction in the value of the platform to advertisers [this is a reach; there is a sense in which this is technically true, but we could see then and now that platforms with few advertisers are extremely undesirable to advertisers because advertisers generally don't want to advertise on a platform that's full of low quality ads (and this also impacts the desire of users to use the platform)]
  • Cash flow effect not relevant because Microsoft isn't cash flow constrained, so cost isn't relevant [a funny comment to make because, not too long after this, Microsoft severely cut back investment in Bing because the returns weren't deemed to be worth it; it seems odd for economists to argue that, if you have a lot of money, the cost of things doesn't matter and ROI is irrelevant. Shouldn't they think about marginal cost and marginal revenue?]

[I stopped taking detailed notes at this point because taking notes that are legible to other people (as opposed to just for myself) takes about an order of magnitude longer, and I didn't think that there was much of interest here. I generally find comments of the form "I stopped reading at X" to be quite poor, in that people making such comments generally seem to pick some trivial thing that's unimportant and then declare an entire document to be worthless based on that. This pattern is also common when it comes to engineers, institutions, sports players, etc. and I generally find it counterproductive in those cases as well. However, in this case, there isn't really a single, non-representative, issue. The majority of the reasoning seems not just wrong, but highly disconnected from the on-the-ground situation. More notes indicating that the authors are making further misleading or incorrect arguments in the same style don't seem very useful. I did read the rest of the document and I also continue to summarize a few bits, below. I don't want to call them "highlights" because that would imply that I pulled out particularly interesting or compelling or incorrect bits and it's more of a smattering of miscellaneous parts with no particular theme]

  • There's a claim that removing restrictions on API interoperability may not cause short term problems, but may cause long-term harm due to how this shifts incentives and reduces innovation and this needs to be accounted for, not just the short-term benefit [in form, this is analogous to the argument Tyler Cowen recently made that banning non-competes reduces the incentives for firms to innovate and will reduce innovation]
  • The authors seem to like to refer to advertisements and PR that any reasonable engineer (and I would guess reasonable person) would know are not meant to be factual or accurate. Similar to the PR argument above, they argue that advertising for Microsoft adCenter claims that it's easy to import data from AdWords, therefore the data portability issue is incorrect, and they specifically say that these advertising statements are "more credible than" other evidence
    • They also relied on some kind of SEO blogspam that restates the above as further evidence of this
  • The authors do not believe that Google Search and Google Local are complements or that taking data from Yelp or TripAdvisor and displaying it above search results has any negative impact on Yelp or TripAdvisor, or at least that "the burden of proof would be extremely difficult"

Other memos

[for these, I continued writing high-level summaries, not detailed summaries]

  • After the BE memo, there's a memo from Laura M. Sullivan, Division of Advertising Practices, which makes a fairly narrow case in a few dimensions, including "we continue to believe that Google has not deceived consumers by integrating its own specialized search results into its organic results" and, as a result, they suggest not pursuing further action.
    • There are some recommendations, such as "based on what we have observed of these new paid search results [referring to Local Search, etc.], we believe Google can strengthen the prominence and clarity of its disclosure" [in practice, the opposite has happened!]
    • [overall, the specific points presented here seems like ones a reasonable person could agree with, though whether or not these points are strong enough that they should prevent anti-trust action could be debated]
    • " Updating the 2002 Search Engine Letter is Warranted"
      • "The concerns we have regarding Google's disclosure of paid search results also apply to other search engines. Studies since the 2002 Search Engine letter was issued indicate that the standard methods search engines, including Google, Bing, and Yahoo!, have used to disclose their paid results may not be noticeable or clear enough for consumers. ²¹ For example, many consumers do not recognize the top ads as paid results ... Documents also indicate Google itself believed that many consumers generally do not recognize top ads as paid. For example, in June 2010, a leading team member of Google's in-house research group, commenting on general search research over time, stated: 'I don't think the research is inconclusive at all - there's definitely a (large) group of users who don't distinguish between sponsored and organic results. If we ask these users why they think the top results are sometimes displayed with a different background color, they will come up with an explanation that can range from "because they are more relevant" to "I have no idea" to "because Google is sponsoring them."' [this could've seemed reasonable at the time, but in retrospect we can see that the opposite of this has happened and ads are less distinguishable from search results than they were in 2012, likely meaning that even fewer consumers can distinguish ads from search results]
    • On the topic of whether or not Google should be liable for fraudulent ads such as ones for fake weight-loss products or fake mortgage relief services, "there is no indication so far that Google has played any role in developing or creating the search ads we are investigating" and Google is expending some effort to prevent these ads and Google can claim CDA immunity, so further investigation here isn't worthwhile
  • There's another memo from the same author on whether or not using other consumer data in conjunction with its search advertising business is unfair; the case is generally that this is not unfair and consumers should expect that their data is used to improve search queries
  • There's a memo from Ken Heyer (at the time, a Director of the Agency's Bureau of Economics)
    • Suggests having a remedy that seems "quite likely to do more good than harm" before "even considering seriously filing a Complaint"
    • Seems to generally be in agreement with BE memo
      • On distribution, agrees with economist memo on unimportance of mobile and that Microsoft has good distribution on desktop (due to IE being default on 70% of PCs sold)
      • On API restrictions, mixed opinion
      • On mobile, mostly agrees with BE memo, but suggests getting an idea of how much Google pays for the right to be the default "since if default status is not much of an advantage we would not expect to see large payments being made" and also suggests it would be interesting to know how much switching from the default occurs
        • Further notes that mobile is only 8% of the market, too small to be significant [the 8% figure should've been known to be factually incorrect at the time. By late 2012, when this was written, mobile should've been 20% or more of queries; not sure why the economists are so wrong on so many of the numbers]
    • On vertical sites, agreement with data analysis from BE memo and generally agrees with BE memo
  • Another Ken Heyer memo
    • Recommends more strongly than the previous memo that no action be taken; recommends against consent decree as well as litigation
  • Follow-up memo from BC staff (Barbara R. Blank et al.), recommending that staff negotiate a consent order with Google on mobile
    • Google has exclusive agreement with the 4 major U.S. wireless carriers and Apple to pre-install Google Search; Apple agreement requires exclusivity
      • Google default on 86% of devices
    • BC Staff recommends consent agreement to eliminate these exclusive agreements
    • According to Google documents mobile was 9.5% of Google queries in 2010, 17.3% in 2011 [note that this strongly contradicts the claim from the BE memo that mobile is only 8% of the market here]
      • Rapid growth shows that mobile distribution channel is significant, and both Microsoft and Google internal documents recognize that mobile will likely surpass desktop in the near future
    • In contradiction to their claims, Sprint and T-mobile agreements appear to mandate exclusivity, and AT&T agreement is de facto exclusive due to tiered revenue sharing arrangement; Verizon agreement is exclusive
    • Google business development manager Chris Barton: "So we know with 100% certainty due to contractual terms that: All Android phones on T-Mobile will come with Google as the only search engine out-of-the-box. All Android phones on Verizon will come with Google as the only search engine out-of-the-box. All Android phones on Sprint will come with Google as the only search engine out-of-the-box. I think this approach is really important otherwise Bing or Yahoo can come and steal away our Android search distribution at any time, thus removing the value of entering into contracts with them. Our philosophy is that we are paying revenue share"
    • Andy Rubin laid out a plan to reduce revenue share of partners over time as Google gained search dominance and Google has done this over time
    • Carriers would not switch even without exclusive agreement due to better monetization and/or bad PR
    • When wrapping up Verizon deal, Andy Rubin said "[i]f we can pull this off ... we will own the US market"
  • Memo from Willard K. Tom, General Counsel
    • "In sum, this may be a good case. But it would be a novel one, and as in all such cases, the Commission should think through carefully what it means."
  • Memo from Howard Shelanski, Director in Bureau of Economics
    • Mostly supports the BE memo and the memo from Ken Heyer, except on scraping, where there's support for the BC memo

  1. By analogy to a case that many people in tech are familiar with, consider this exchange between Oracle counsel David Boies and Judge William Alsup on the rangeCheck function, which checks whether a range is a valid array access given the length of an array and throws an exception if the access is out of range:

    • Boies: [argument that Google copied the rangeCheck function in order to accelerate development]
    • Alsup: All right. I have — I was not good — I couldn't have told you the first thing about Java before this trial. But, I have done and still do a lot of programming myself in other languages. I have written blocks of code like rangeCheck a hundred times or more. I could do it. You could do it. It is so simple. The idea that somebody copied that in order to get to market faster, when it would be just as fast to write it out, it was an accident that that thing got in there. There was no way that you could say that that was speeding them along to the marketplace. That is not a good argument.
    • Boies: Your Honor
    • Alsup: [cutting off Boies] You're one of the best lawyers in America. How can you even make that argument? You know, maybe the answer is because you are so good it sounds legit. But it is not legit. That is not a good argument.
    • Boies: Your Honor, let me approach it this way, first, okay. I want to come back to rangeCheck. All right.
    • Alsup: RangeCheck. All it does is it makes sure that the numbers you're inputting are within a range. And if they're not, they give it some kind of exceptional treatment. It is so — that witness, when he said a high school student would do this, is absolutely right.
    • Boies: He didn't say a high school student would do it in an hour, all right.
    • Alsup: Less than — in five minutes, Mr. Boies.

    Boies had previously brought up this function as a non-trivial piece of work and then argues that, in their haste, a Google engineer copied this function from Oracle. As Alsup points out, the function is trivial, so trivial that it wouldn't be worth looking it up in order to copy it, and even a high school student could easily produce the function from scratch. Boies then objects that, sure, maybe a high school student could write the function, but it might take an hour or more, and Alsup correctly responds that an hour is implausible and that it might take five minutes.
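
    To make the triviality concrete, here's a minimal sketch of the kind of bounds check under discussion. The function at issue in the case was Java; this sketch is TypeScript and is not the literal code from the case, just an illustration of how little there is to it:

        // Throw if [fromIndex, toIndex) isn't a valid range for an array of the given length.
        function rangeCheck(arrayLength: number, fromIndex: number, toIndex: number): void {
          if (fromIndex > toIndex) {
            throw new RangeError(`fromIndex(${fromIndex}) > toIndex(${toIndex})`);
          }
          if (fromIndex < 0) {
            throw new RangeError(`fromIndex(${fromIndex}) < 0`);
          }
          if (toIndex > arrayLength) {
            throw new RangeError(`toIndex(${toIndex}) > length(${arrayLength})`);
          }
        }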

    Although nearly anyone who could pass a high school programming class would find Boies's argument not just wrong but absurd3, more like a joke than something that someone might say seriously, it seems reasonable for Boies to make the argument because people presiding over these decisions in court, in regulatory agencies, and in the legislature, sometimes demonstrate a lack of basic understanding of tech. Since my background is in tech and not law or economics, I have no doubt that this analysis will miss some basics about law and economics in the same way that most analyses I've read seem to miss basics about tech, but since there's been extensive commentary on this case from people with strong law and economics backgrounds, I don't see a need to cover those issues in depth here because anyone who's interested can read another analysis instead of or in addition to this one.

    [return]
  2. Although this document is focused on tech, the lack of hands-on industry expertise in regulatory bodies, legislation, and the courts appears to cause problems in other industries as well. An example that's relatively well known due to a NY Times article that was turned into a movie is DuPont's involvement in the popularization of PFAS and, in particular, PFOA. Scientists at 3M and DuPont had evidence of the harms of PFAS going back at least to the 60s, and possibly even as far back as the 50s. Given the severe harms that PFOA caused to people who were exposed to it in significant concentrations, it would've been difficult to set up a production process for PFOA without seeing the harm it caused, but this knowledge, which must've been apparent to senior scientists and decision makers in 3M and DuPont, wasn't understood by regulatory agencies for almost four decades after it was apparent to chemical companies.

    By the way, the NY Times article is titled "The Lawyer Who Became DuPont’s Worst Nightmare" and it describes how DuPont made $1B/yr in profit for years while hiding the harms of PFOA, which was used in the manufacturing process for Teflon. This lawyer brought cases against DuPont that were settled for hundreds of millions of dollars; according to the article and movie, the litigation didn't even cost DuPont a single year's worth of PFOA profit. Also, DuPont managed to drag out the litigation for many years, continuing to reap the profit from PFOA. Now that enough evidence has mounted against PFOA, Teflon is manufactured using PFO2OA or FRD-903, which are newer and have a less well understood safety profile than PFOA. Perhaps the article could be titled "The Lawyer Who Became DuPont's Largest Mild Annoyance".

    [return]
  3. In the media, I've sometimes seen this framed as a conflict between tech vs. non-tech folks, but we can see analogous comments from people outside of tech. For example, in a panel discussion with Yale SOM professor Fiona Scott Morton and DoJ Antitrust Principal Deputy AAG Doha Mekki, Scott Morton noted that the judge presiding over the Sprint/T-mobile merger proceedings, a case she was an expert witness for, had comically wrong ideas about the market, and that it's common for decisions to be made which are disconnected from "market realities". Mekki seconded this sentiment, saying "what's so fascinating about some of the bad opinions that Fiona identified, and there are many, there's AT&T Time Warner, Sabre Farelogix, T-mobile Sprint, they're everywhere, there's Amex, you know ..."

    If you're seeing this or the other footnote in mouseover text and/or tied to a broken link, this is an issue with Hugo. At this point, I've spent more than an entire blog post's worth of effort working around Hugo breakage and am trying to avoid spending more time working around issues in a tool that makes breaking changes at a high rate. If you have a suggestion to fix this, I'll try it, otherwise I'll try to fix it when I switch away from Hugo.

    [return]

How web bloat impacts users with slow devices

2024-03-16 08:00:00

In 2017, we looked at how web bloat affects users with slow connections. Even in the U.S., many users didn't have broadband speeds, making much of the web difficult to use. It's still the case that many users don't have broadband speeds, both inside and outside of the U.S. and that much of the modern web isn't usable for people with slow internet, but the exponential increase in bandwidth (Nielsen suggests this is 50% per year for high-end connections) has outpaced web bloat for typical sites, making this less of a problem than it was in 2017, although it's still a serious problem for people with poor connections.

CPU performance for web apps hasn't scaled nearly as quickly as bandwidth so, while more of the web is becoming accessible to people with low-end connections, more of the web is becoming inaccessible to people with low-end devices even if they have high-end connections. For example, if I try browsing a "modern" Discourse-powered forum on a Tecno Spark 8C, it sometimes crashes the browser. Between crashes, on measuring the performance, the responsiveness is significantly worse than browsing a BBS with an 8 MHz 286 and a 1200 baud modem. On my 1Gbps home internet connection, the 2.6 MB compressed payload size "necessary" to load message titles is relatively light. The over-the-wire payload size has "only" increased by 1000x, which is dwarfed by the increase in internet speeds. But the opposite is true when it comes to CPU speeds — for web browsing and forum loading performance, the 8-core (2x 1.6 GHz Cortex-A75 + 6x 1.6 GHz Cortex-A55) CPU can't handle Discourse. The CPU is something like 100000x faster than our 286. Perhaps a 1000000x faster device would be sufficient.

For anyone not familiar with the Tecno Spark 8C, a quick search indicates that, today, a new one can be had for USD 50-60 in Nigeria and perhaps USD 100-110 in India. As a fraction of median household income, that's substantially more than a current generation iPhone in the U.S. today.

By worldwide standards, the Tecno Spark 8C isn't even close to being a low-end device, so we'll also look at performance on an Itel P32, which is a lower end device (though still far from the lowest-end device people are using today). Additionally, we'll look at performance with an M3 Max Macbook (14-core), an M1 Pro Macbook (8-core), and the M3 Max set to 10x throttling in Chrome dev tools. In order to give these devices every advantage, we'll be on fairly high-speed internet (1Gbps, with a WiFi router that's benchmarked as having lower latency under load than most of its peers). We'll look at some blogging and micro-blogging platforms (this blog, Substack, Medium, Ghost, Hugo, Tumblr, WordPress, Mastodon, Twitter, Threads, Bluesky, Patreon), forum platforms (Discourse, Reddit, Quora, vBulletin, XenForo, phpBB, and MyBB), and platforms commonly used by small businesses (Wix, Squarespace, Shopify, and WordPress again).

In the table below, every row represents a website and every non-label column is a metric. After the website name column, we have the compressed size transferred over the wire (wire) and the raw, uncompressed, size (raw). Then we have, for each device, Largest Contentful Paint* (LCP*) and CPU usage on the main thread (CPU). Google's docs explain LCP as

Largest Contentful Paint (LCP) measures when a user perceives that the largest content of a page is visible. The metric value for LCP represents the time duration between the user initiating the page load and the page rendering its primary content

LCP is a common optimization target because it's presented as one of the primary metrics in Google PageSpeed Insights, a "Core Web Vital" metric. There's an asterisk next to LCP as used in this document because LCP as measured by Chrome is about painting a large fraction of the screen, as opposed to the definition above, which is about content. As sites have optimized for LCP, it's not uncommon to have a large paint (update) that's completely useless to the user, with the actual content of the page appearing well after the LCP. In cases where that happens, I've used the timestamp when useful content appears, not the LCP as defined by when a large but useless update occurs. The full details of the tests and why these metrics were chosen are discussed in an appendix.
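
For reference, a page can watch the LCP candidates Chrome reports using the standard PerformanceObserver API; here's a minimal sketch (this just logs what Chrome considers the largest paint so far, which, as described above, may not be useful content, and it isn't how the numbers in this post were collected):

    // Log each LCP candidate; the last entry reported before user input is
    // what Chrome treats as the page's LCP.
    new PerformanceObserver((entryList) => {
      for (const entry of entryList.getEntries()) {
        console.log(`LCP candidate at ${entry.startTime.toFixed(0)}ms`, entry);
      }
    }).observe({ type: 'largest-contentful-paint', buffered: true });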

Although CPU time isn't a "Core Web Vital", it's presented here because it's a simple metric that's highly correlated with my and other users' perception of usability on slow devices. See appendix for more detailed discussion on this. One reason CPU time works as a metric is that, if a page has great numbers for all other metrics but uses a ton of CPU time, the page is not going to be usable on a slow device. If it takes 100% CPU for 30 seconds, the page will be completely unusable for 30 seconds, and if it takes 50% CPU for 60 seconds, the page will be barely usable for 60 seconds, etc. Another reason it works is that, relative to commonly used metrics, it's hard to cheat on CPU time and make optimizations that significantly move the number without impacting user experience.
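
For this post, CPU time was measured as main-thread time with Chrome's tooling (see the appendix on experimental details), but, as a rough sketch of why main-thread time maps to perceived usability, a page can approximate its own main-thread busyness with the Long Tasks API; this undercounts total CPU time since only tasks over 50ms are reported, but those are exactly the stretches during which taps and scrolling go unhandled:

    // Accumulate time spent in long (>50ms) main-thread tasks; while one of
    // these tasks runs, the page can't respond to input at all.
    let longTaskMs = 0;
    new PerformanceObserver((entryList) => {
      for (const entry of entryList.getEntries()) {
        longTaskMs += entry.duration;
      }
    }).observe({ type: 'longtask', buffered: true });
    // ... later, report longTaskMs alongside other metrics ...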

The color scheme in the table below is that more green = smaller / faster and more red = larger / slower. Extreme values are in black.

Site Size M3 Max M1 Pro M3/10 Tecno S8C Itel P32
wire raw LCP* CPU LCP* CPU LCP* CPU LCP* CPU LCP* CPU
danluu.com 6kB 18kB 50ms 20ms 50ms 30ms 0.2s 0.3s 0.4s 0.3s 0.5s 0.5s
HN 11kB 50kB 0.1s 30ms 0.1s 30ms 0.3s 0.3s 0.5s 0.5s 0.7s 0.6s
MyBB 0.1MB 0.3MB 0.3s 0.1s 0.3s 0.1s 0.6s 0.6s 0.8s 0.8s 2.1s 1.9s
phpBB 0.4MB 0.9MB 0.3s 0.1s 0.4s 0.1s 0.7s 1.1s 1.7s 1.5s 4.1s 3.9s
WordPress 1.4MB 1.7MB 0.2s 60ms 0.2s 80ms 0.7s 0.7s 1s 1.5s 1.2s 2.5s
WordPress (old) 0.3MB 1.0MB 80ms 70ms 90ms 90ms 0.4s 0.9s 0.7s 1.7s 1.1s 1.9s
XenForo 0.3MB 1.0MB 0.4s 0.1s 0.6s 0.2s 1.4s 1.5s 1.5s 1.8s FAIL FAIL
Ghost 0.7MB 2.4MB 0.1s 0.2s 0.2s 0.2s 1.1s 2.2s 1s 2.4s 1.1s 3.5s
vBulletin 1.2MB 3.4MB 0.5s 0.2s 0.6s 0.3s 1.1s 2.9s 4.4s 4.8s 13s 16s
Squarespace 1.9MB 7.1MB 0.1s 0.4s 0.2s 0.4s 0.7s 3.6s 14s 5.1s 16s 19s
Mastodon 3.8MB 5.3MB 0.2s 0.3s 0.2s 0.4s 1.8s 4.7s 2.0s 7.6s FAIL FAIL
Tumblr 3.5MB 7.1MB 0.7s 0.6s 1.1s 0.7s 1.0s 7.0s 14s 7.9s 8.7s 8.7s
Quora 0.6MB 4.9MB 0.7s 1.2s 0.8s 1.3s 2.6s 8.7s FAIL FAIL 19s 29s
Bluesky 4.8MB 10MB 1.0s 0.4s 1.0s 0.5s 5.1s 6.0s 8.1s 8.3s FAIL FAIL
Wix 7.0MB 21MB 2.4s 1.1s 2.5s 1.2s 18s 11s 5.6s 10s FAIL FAIL
Substack 1.3MB 4.3MB 0.4s 0.5s 0.4s 0.5s 1.5s 4.9s 14s 14s FAIL FAIL
Threads 9.3MB 13MB 1.5s 0.5s 1.6s 0.7s 5.1s 6.1s 6.4s 16s 28s 66s
Twitter 4.7MB 11MB 2.6s 0.9s 2.7s 1.1s 5.6s 6.6s 12s 19s 24s 43s
Shopify 3.0MB 5.5MB 0.4s 0.2s 0.4s 0.3s 0.7s 2.3s 10s 26s FAIL FAIL
Discourse 2.6MB 10MB 1.1s 0.5s 1.5s 0.6s 6.5s 5.9s 15s 26s FAIL FAIL
Patreon 4.0MB 13MB 0.6s 1.0s 1.2s 1.2s 1.2s 14s 1.7s 31s 9.1s 45s
Medium 1.2MB 3.3MB 1.4s 0.7s 1.4s 1s 2s 11s 2.8s 33s 3.2s 63s
Reddit 1.7MB 5.4MB 0.9s 0.7s 0.9s 0.9s 6.2s 12s 1.2s ∞ FAIL FAIL

At a first glance, the table seems about right, in that the sites that feel slow unless you have a super fast device show up as slow in the table (as in, max(LCP*, CPU) is high on lower-end devices). When I polled folks (on Mastodon, Twitter, and Threads) about which platforms they thought would be fastest and slowest on our slow devices, they generally correctly predicted that WordPress and Ghost would be faster than Substack and Medium, and that Discourse would be much slower than old PHP forums like phpBB, XenForo, and vBulletin. I also pulled Google PageSpeed Insights (PSI) scores for pages (not shown) and the correlation isn't as strong with those numbers because a handful of sites have managed to optimize their PSI scores without actually speeding up their pages for users.

If you've never used a low-end device like this, the general experience is that many sites are unusable on the device and loading anything resource intensive (an app or a huge website) can cause crashes. Doing something too intense in a resource intensive app can also cause crashes. While reviews note that you can run PUBG and other 3D games with decent performance on a Tecno Spark 8C, this doesn't mean that the device is fast enough to read posts on modern text-centric social media platforms or modern text-centric web forums. While 40fps is achievable in PUBG, we can easily see less than 0.4fps when scrolling on these sites.

We can see from the table how many of the sites are unusable if you have a slow device. All of the pages with 10s+ CPU are a fairly bad experience even after the page loads. Scrolling is very jerky, frequently dropping to a few frames per second and sometimes well below. When we tap on any link, the delay is so long that we can't be sure if our tap actually worked. If we tap again, we can get the dreaded situation where the first tap registers, which then causes the second tap to do the wrong thing, but if we wait, we often end up waiting too long because the original tap didn't actually register (or it registered, but not where we thought it did). Although MyBB doesn't serve up a mobile site and is penalized by Google for not having a mobile friendly page, it's actually much more usable on these slow mobiles than all but the fastest sites because scrolling and tapping actually work.

Another thing we can see is how much variance there is in the relative performance on different devices. For example, comparing an M3/10 and a Tecno Spark 8C, for danluu.com and Ghost, an M3/10 gives a halfway decent approximation of the Tecno Spark 8C (although danluu.com loads much too quickly), but the Tecno Spark 8C is about three times slower (CPU) for Medium, Substack, and Twitter, roughly four times slower for Reddit and Discourse, and over an order of magnitude slower for Shopify. For Wix, the CPU approximation is about accurate, but our Tecno Spark 8C is more than 3 times faster on LCP*. It's great that Chrome lets you conveniently simulate a slower device from your computer, but just enabling Chrome's CPU throttling (or using any combination of out-of-the-box options that are available) gives fairly different results than we get on many real devices. The full reasons for this are beyond the scope of the post; for the purposes of this post, it's sufficient to note that slow pages are often super-linearly slow as devices get slower and that slowness on one page doesn't strongly predict slowness on another page.
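
For anyone who wants to run this kind of throttled measurement themselves, Chrome's CPU throttling can also be driven programmatically; here's a minimal sketch using Puppeteer (the URL is a placeholder, and this isn't the exact setup used for this post). As noted above, a fixed multiplier like 10x on a fast laptop is still a rough stand-in for a real low-end phone rather than a faithful simulation of one:

    import puppeteer from 'puppeteer';

    // Load a page with Chrome's CPU throttled 10x and dump Chrome's metrics,
    // which include main-thread task time (TaskDuration, in seconds).
    async function measureThrottled(url: string): Promise<void> {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.emulateCPUThrottling(10); // 10x slowdown, as in the M3/10 column
      await page.goto(url, { waitUntil: 'networkidle0' });
      const metrics = await page.metrics();
      console.log('main-thread task time (s):', metrics.TaskDuration);
      await browser.close();
    }

    measureThrottled('https://example.com');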

If we take a site-centric view instead of a device-centric view, another way to look at it is that sites like Discourse, Medium, and Reddit don't use all that much CPU on our fast M3 and M1 computers, but they're among the slowest on our Tecno Spark 8C (Reddit's CPU is shown as ∞ because, no matter how long we wait with no interaction, Reddit uses ~90% CPU). Discourse also sometimes crashed the browser after interacting a bit or just waiting a while. For example, one time, the browser crashed after loading Discourse, scrolling twice, and then leaving the device still for a minute or two. For consistency's sake, this wasn't marked as FAIL in the table since the page did load but, realistically, having a page so resource intensive that the browser crashes is a significantly worse user experience than any of the FAIL cases in the table. When we looked at how web bloat impacts users with slow connections, we found that much of the web was unusable for people with slow connections, and slow devices are no different.

Another pattern we can see is how the older sites are, in general, faster than the newer ones, with sites that (visually) look like they haven't been updated in a decade or two tending to be among the fastest. For example, MyBB, the least modernized and oldest-looking forum, is 3.6x / 5x faster (LCP* / CPU) than Discourse on the M3, but on the Tecno Spark 8C, the difference is 19x / 33x and, given the overall scaling, it seems safe to guess that the difference would be even larger on the Itel P32 if Discourse worked on such a cheap device.

Another example is WordPress (old) vs. newer, trendier blogging platforms like Medium and Substack. WordPress (old) is 17.5x / 10x faster (LCP* / CPU) than Medium and 5x / 7x faster than Substack on our M3 Max, and 4x / 19x and 20x / 8x faster, respectively, on our Tecno Spark 8C. Ghost is a notable exception to this, being a modern platform (launched a year after Medium) that's competitive with older platforms (modern WordPress is also arguably an exception, but many folks would probably still consider that to be an old platform). Among forums, NodeBB also seems to be a bit of an exception (see appendix for details).

Sites that use modern techniques like partially loading the page and then dynamically loading the rest of it, such as Discourse, Reddit, and Substack, tend to be less usable than the scores in the table indicate. In principle, you could build such a site in a simple way that works well with cheap devices but, in practice, sites that use dynamic loading tend to be complex enough that the sites are extremely janky on low-end devices. It's generally difficult or impossible to scroll a predictable distance, which means that users will sometimes accidentally trigger more loading by scrolling too far, causing the page to lock up. Many pages actually remove the parts of the page you scrolled past as you scroll; all such pages are essentially unusable. Other basic web features, like page search, also generally stop working. Pages with this kind of dynamic loading can't rely on the simple and fast ctrl/command+F search and have to build their own search. How well this works varies (this used to work quite well in Google docs, but for the past few months or maybe a year, it takes so long to load that I have to deliberately wait after opening a doc to avoid triggering the browser's useless built-in search; Discourse search has never really worked on slow devices or even on not very fast but not particularly slow devices).
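
To be concrete about the pattern being described, here's a minimal sketch of scroll-triggered loading using IntersectionObserver (the element IDs and endpoint are made up for illustration). Even this stripped-down version makes scroll distance unpredictable and means ctrl/command+F can't find content that hasn't loaded yet; real implementations layer far more work on top of this:

    // When a sentinel element near the bottom of the list scrolls into view,
    // fetch the next page of posts and append it to the list.
    let nextPage = 2;
    const list = document.querySelector('#post-list')!;
    const sentinel = document.querySelector('#load-more-sentinel')!;
    new IntersectionObserver(async (entries) => {
      if (entries.some((entry) => entry.isIntersecting)) {
        const response = await fetch(`/posts?page=${nextPage++}`);
        list.insertAdjacentHTML('beforeend', await response.text());
      }
    }).observe(sentinel);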

In principle, these modern pages that burn a ton of CPU when loading could be doing pre-work that means that later interactions on the page are faster and cheaper than on the pages that do less up-front work (this is a common argument in favor of these kinds of pages), but that's not the case for the pages tested here, which are slower to load initially, slower on subsequent loads, and slower after they've loaded.

To understand why the theoretical benefit of doing all this work up-front doesn't generally result in a faster experience later, this exchange between a distinguished engineer at Google and one of the founders of Discourse (and CEO at the time) is illustrative, in a discussion where the founder of Discourse says that you should test mobile sites on laptops with throttled bandwidth but not throttled CPU:

  • Google: *you* also don't have slow 3G. These two settings go together. Empathy needs to extend beyond iPhone XS users in a tunnel.
  • Discourse: Literally any phone of vintage iPhone 6 or greater is basically as fast as the "average" laptop. You have to understand how brutally bad Qualcomm is at their job. Look it up if you don't believe me.
  • Google: I don't need to believe you. I know. This is well known by people who care. My point was that just like not everyone has a fast connection not everyone has a fast phone. Certainly the iPhone 6 is frequently very CPU bound on real world websites. But that isn't the point.
  • Discourse: we've been trending towards infinite CPU speed for decades now (and we've been asymptotically there for ~5 years on desktop), what we are not and will never trend towards is infinite bandwidth. Optimize for the things that matter. and I have zero empathy for @qualcomm. Fuck Qualcomm, they're terrible at their jobs. I hope they go out of business and the ground their company existed on is plowed with salt so nothing can ever grow there again.
  • Google: Mobile devices are not at all bandwidth constraint in most circumstances. They are latency constraint. Even the latest iPhone is CPU constraint before it is bandwidth constraint. If you do well on 4x slow down on a MBP things are pretty alright
  • ...
  • Google: Are 100% of users on iOS?
  • Discourse: The influential users who spend money tend to be, I’ll tell you that ... Pointless to worry about cpu, it is effectively infinite already on iOS, and even with Qualcomm’s incompetence, will be within 4 more years on their embarrassing SoCs as well

When someone asks the founder of Discourse, "just wondering why you hate them", he responds with a link that cites the Kraken and Octane benchmarks from this Anandtech review, which have the Qualcomm chip at 74% and 85% of the performance of the then-current Apple chip, respectively.

The founder and then-CEO of Discourse considers Qualcomm's mobile performance embarrassing and finds this so offensive that he thinks Qualcomm engineers should all lose their jobs for delivering 74% to 85% of the performance of Apple. Apple has what I consider to be an all-time great performance team. Reasonable people could disagree on that, but one has to at least think of them as a world-class team. So, producing a product with 74% to 85% of the performance of an all-time-great team is considered an embarrassment worthy of losing your job.

There are two attitudes on display here which I see in a lot of software folks. First, that CPU speed is infinite and one shouldn't worry about CPU optimization. And second, that gigantic speedups from hardware should be expected and the only reason hardware engineers wouldn't achieve them is due to spectacular incompetence, so the slow software should be blamed on hardware engineers, not software engineers. Donald Knuth expressed a similar sentiment in an interview:

I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the "Itanium" approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write. Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX ... I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years. Even if I knew enough about such methods to write about them in TAOCP, my time would be largely wasted, because soon there would be little reason for anybody to read those parts ... The machine I use today has dual processors. I get to use them both only when I’m running two independent jobs at the same time; that’s nice, but it happens only a few minutes every week.

In the case of Discourse, a hardware engineer is an embarrassment not deserving of a job if they can't hit 90% of the performance of an all-time-great performance team but, as a software engineer, delivering 3% of the performance of a non-highly-optimized application like MyBB is no problem. In Knuth's case, hardware engineers gave programmers a 100x performance increase every decade for decades with little to no work on the part of programmers. The moment this slowed down and programmers had to adapt to take advantage of new hardware, hardware engineers were "all out of ideas", but learning a few "new" (1970s and 1980s era) ideas to take advantage of current hardware would be a waste of time. And we've previously discussed Alan Kay's claim that hardware engineers are "unsophisticated" and "uneducated" and aren't doing "real engineering" and how we'd get a 1000x speedup if we listened to Alan Kay's "sophisticated" ideas.

It's fairly common for programmers to expect that hardware will solve all their problems, and then, when that doesn't happen, pass the issue onto the user, explaining why the programmer needn't do anything to help the user. A question one might ask is how much performance improvement programmers have given us. There are cases of algorithmic improvements that result in massive speedups but, as we noted above, Discourse, the fastest growing forum software today, seems to have given us an approximately 1000000x slowdown in performance.

Another common attitude on display above is the idea that users who aren't wealthy don't matter. When asked if 100% of users are on iOS, the founder of Discourse says "The influential users who spend money tend to be, I’ll tell you that". We see the same attitude all over comments on Tonsky's JavaScript Bloat post, with people expressing cocktail-party sentiments like "Phone apps are hundreds of megs, why are we obsessing over web apps that are a few megs? Starving children in Africa can download Android apps but not web apps? Come on" and "surely no user of gitlab would be poor enough to have a slow device, let's be serious" (paraphrased for length).

But when we look at the size of apps that are downloaded in Africa, we see that people who aren't on high-end devices use apps like Facebook Lite (a couple megs) and commonly use apps that are a single digit to low double digit number of megabytes. There are multiple reasons app makers care about their app size. One is just the total storage available on the phone; if you watch real users install apps, they often have to delete and uninstall things to put a new app on, so a smaller size both makes the app easier to install and lowers the chance of it being uninstalled when the user is looking for more space. Another is that, if you look at data on app size and usage (I don't know of any public data on this; please pass it along if you have something public I can reference), when large apps increase their size and memory usage, they get more crashes, which drives down user retention, growth, and engagement and, conversely, when they optimize their size and memory usage, they get fewer crashes and better user retention, growth, and engagement.

Alex Russell points out that iOS has 7% market share in India (a 1.4B person market) and 6% market share in Latin America (a 600M person market). Although the founder of Discourse says that these aren't "influential users" who matter, these are still real human beings. Alex further points out that, according to Windows telemetry, which covers the vast majority of desktop users, most laptop/desktop users are on low-end machines which are likely slower than a modern iPhone.

On the bit about no programmers having slow devices, I know plenty of people who are using hand-me-down devices that are old and slow. Many of them aren't even really poor; they just don't see why (for example) their kid needs a super fast device, and they don't understand how much of the modern web works poorly on slow devices. After all, the "slow" device can play 3d games and (with the right OS) compile codebases like Linux or Chromium, so why shouldn't the device be able to interact with a site like gitlab?

Contrary to the claim from the founder of Discourse that, within years, every Android user will be on some kind of super fast Android device, it's been six years since his comment and it's going to be at least a decade before almost everyone in the world who's using a phone has a high-speed device and this could easily take two decades or more. If you look up marketshare stats for Discourse, it's extremely successful; it appears to be the fastest growing forum software in the world by a large margin. The impact of having the fastest growing forum software in the world created by an organization whose then-leader was willing to state that he doesn't really care about users who aren't "influential users who spend money", who don't have access to "infinite CPU speed", is that a lot of forums are now inaccessible to people who don't have enough wealth to buy a device with effectively infinite CPU.

If the founder of Discourse were an anomaly, this wouldn't be too much of a problem, but he's just verbalizing the implicit assumptions a lot of programmers have, which is why we see that so many modern websites are unusable if you buy the income-adjusted equivalent of a new, current generation, iPhone in a low-income country.

Thanks to Yossi Kreinen, Fabian Giesen, John O'Nolan, Joseph Scott, Loren McIntyre, Daniel Filan, @acidshill, Alex Russell, Chris Adams, Tobias Marschner, Matt Stuchlik, @[email protected], Justin Blank, Andy Kelley, Julian Lam, Matthew Thomas, avarcat, @[email protected], William Ehlhardt, Philip R. Boulain, and David Turner for comments/corrections/discussion.

Appendix: gaming LCP

We noted above that we used LCP* and not LCP. This is because LCP basically measures when the largest change happens. Before sites deliberately gamed it in ways that don't benefit the user, this was a great metric, but it has become less representative of the actual user experience as more people have gamed it. In the less blatant cases, people do small optimizations that improve LCP but barely improve or don't improve the actual user experience.

In the more blatant cases, developers will deliberately flash a very large change on the page as soon as possible, generally a loading screen that has no value to the user (actually negative value because doing this increases the total amount of work done and the total time it takes to load the page) and then they carefully avoid making any change large enough that any later change would get marked as the LCP.

For the same reason that VW didn't publicly discuss how it was gaming its emissions numbers, developers tend to shy away from discussing this kind of LCP optimization in public. An exception to this is Discourse, which publicly announced this kind of LCP optimization, with comments from their devs and the then-CTO (now CEO) noting that their new "Discourse Splash" feature hugely reduced LCP for sites after they deployed it. And when developers ask why their LCP is high, the standard advice from Discourse developers is to keep elements smaller than the "Discourse Splash", so that the LCP timestamp is computed from this useless element that's thrown up to optimize LCP, as opposed to having the timestamp be computed from any actual element that's relevant to the user. Here's a typical, official, comment from Discourse

If your banner is larger than the element we use for the "Introducing Discourse Splash - A visual preloader displayed while site assets load" you gonna have a bad time for LCP.

The official response from Discourse is that you should make sure that your content doesn't trigger the LCP measurement and that, instead, their loading animation timestamp is what's used to compute LCP.
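
Mechanically, the trick looks something like the sketch below (a hypothetical illustration, not Discourse's actual code): paint something huge as early as possible so it's recorded as the largest contentful paint, then make sure the real content that arrives later never paints anything bigger.

    // Insert a full-viewport "splash" as early as possible so it becomes the
    // largest paint Chrome sees; if later content always renders in smaller
    // pieces, the splash's early timestamp is what gets reported as LCP.
    const splash = document.createElement('div');
    splash.style.cssText = 'position:fixed;inset:0;background:#fff;font-size:10vh';
    splash.textContent = 'Loading…';
    document.body.prepend(splash);
    // ... later, once the real content has finally rendered ...
    // splash.remove();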

The sites with the most extreme ratio of LCP of useful content vs. Chrome's measured LCP were:

  • Wix
    • M3: 6
    • M1: 12
    • Tecno Spark 8C: 3
    • Itel P32: N/A (FAIL)
  • Discourse:
    • M3: 10
    • M1: 12
    • Tecno Spark 8C: 4
    • Itel P32: N/A (FAIL)

Although we haven't discussed the gaming of other metrics, it appears that some websites also game other metrics and "optimize" them even when this has no benefit to users.

Appendix: the selfish argument for optimizing sites

This will depend on the scale of the site as well as its performance, but when I've looked at this data for large companies I've worked for, improving site and app performance is worth a mind boggling amount of money. It's measurable in A/B tests and it's also among the interventions that have, in long-term holdbacks, a relatively large impact on growth and retention (many interventions test well but don't look as good long term, whereas performance improvements tend to look better long term).

Of course you can see this from the direct numbers, but you can also implicitly see this in a lot of ways when looking at the data. One angle is that (just for example), at Twitter, user-observed p99 latency was about 60s in India as well as a number of African countries (even excluding relatively wealthy ones like Egypt and South Africa) and also about 60s in the United States. Of course, across the entire population, people have faster devices and connections in the United States, but in every country, there are enough users that have slow devices or connections that the limiting factor is really user patience and not the underlying population-level distribution of devices and connections. Even if you don't care about users in Nigeria or India and only care about U.S. ad revenue, improving performance for low-end devices and connections has enough of an impact that we could easily see the impact in global as well as U.S. revenue in A/B tests, especially in long-term holdbacks. And you also see the impact among users who have fast devices since a change that improves the latency for a user with a "low-end" device from 60s to 50s might improve the latency for a user with a high-end device from 5s to 4.5s, which has an impact on revenue, growth, and retention numbers as well.

For a variety of reasons that are beyond the scope of this doc, this kind of boring, quantifiable, growth and revenue driving work has been difficult to get funded at most large companies I've worked for, relative to flashy product work that ends up showing little to no impact in long-term holdbacks.

Appendix: designing for low performance devices

When using slow devices or any device with low bandwidth and/or poor connectivity, the best experiences, by far, are generally the ones that load a lot of content at once into a static page. If the images have proper width and height attributes and alt text, that's very helpful. Progressive images (as in progressive jpeg) aren't particularly helpful.
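
As a small sketch of what "proper width and height attributes and alt text" means in practice (shown via DOM APIs here; the equivalent HTML attributes behave the same way, and the file name and selector are made up):

    // Explicit dimensions let the browser reserve layout space before the
    // image bytes arrive (no reflow jank on a slow connection), and alt text
    // gives screen readers and text-only rendering something useful.
    const img = document.createElement('img');
    img.src = '/images/network-diagram.png';
    img.width = 800;
    img.height = 600;
    img.alt = 'Diagram of the request flow between the client and the CDN';
    document.querySelector('article')?.append(img);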

On a slow device with high bandwidth, any lightweight, static page works well, and lightweight dynamic pages can work well if designed for performance. Heavy, dynamic pages are doomed unless the weight doesn't come with corresponding page complexity.

With low bandwidth and/or poor connectivity, lightweight pages are fine. With heavy pages, the best experience I've had is when I trigger a page load, go do something else, and then come back when it's done (or at least the HTML and CSS are done). I can then open each link I might want to read in a new tab, and then do something else while I wait for those to load.

A lot of the optimizations that modern websites do, such as partial loading that triggers more loading when you scroll down the page, and the concomitant hijacking of search (because the browser's built-in search is useless if the page isn't fully loaded), break the interaction model described above and make pages very painful to interact with.

Just for example, a number of people have noted that Substack performs poorly for them because it does partial page loads. Here's a video by @acidshill of what it looks like to load a Substack article and then scroll on an iPhone 8, where the post has a fairly fast LCP, but if you want to scroll past the header, you have to wait 6s for the next page to load, and then on scrolling again, you have to wait maybe another 1s to 2s:

As an example of the opposite approach, I tried loading some fairly large plain HTML pages, such as https://danluu.com/diseconomies-scale/ (0.1 MB wire / 0.4 MB raw) and https://danluu.com/threads-faq/ (0.4 MB wire / 1.1 MB raw) and these were still quite usable for me even on slow devices. 1.1 MB seems to be larger than optimal and breaking that into a few different pages would be better on low-end devices, but a single page with 1.1 MB of text works much better than most modern sites on a slow device. While you can get into trouble with HTML pages that are so large that browsers can't really handle them, for pages with a normal amount of content, it generally isn't until you have complex CSS payloads or JS that the pages start causing problems for slow devices. Below, we test pages that are relatively simple, some of which have a fair amount of media (14 MB in one case), and find that these pages work ok, as long as they stay simple.

Chris Adams has also noted that blind users, using screen readers, often report that dynamic loading makes the experience much worse for them. Like dynamic loading to improve performance, while this can be done well, it's often either done badly or bundled with so much other complexity that the result is worse than a simple page.

@Qingcharles noted another accessibility issue — the (prison) parolees he works with are given "lifeline" phones, which are often very low end devices. From a quick search, in 2024, some people will get an iPhone 6 or an iPhone 8, but there are also plenty of devices that are lower end than an Itel P32, let alone a Tecno Spark 8C. They also get plans with highly limited data, and then when they run out, some people "can't fill out any forms for jobs, welfare, or navigate anywhere with Maps".

As an example of a site that does up-front work and actually gives a decent experience on low-end devices, Andy Kelley pointed out the Zig standard library documentation, which seems to work ok on a slow device (although it would struggle on a very slow connection):

I made the controversial decision to have it fetch all the source code up front and then do all the content rendering locally. In theory, this is CPU intensive but in practice... even those old phones have really fast CPUs!

On the Tecno Spark 8C, this uses 4.7s of CPU and, afterwards, is fairly responsive (relative to the device — of course an iPhone responds much more quickly). Taps cause links to load fairly quickly and scrolling also works fine (it's a little jerky, but almost nothing is really smooth on this device). This seems like the kind of thing people are referring to when they say that you can get better performance if you ship a heavy payload, but there aren't many examples of that which actually improve performance on low-end devices.

Appendix: articles on web performance issues

  • 2015: Maciej Cegłowski: The Website Obesity Crisis
    • Size: 1.0 MB / 1.1 MB
    • Tecno Spark 8C: 0.9s / 1.4s
      • Scrolling a bit jerky, images take a little bit of time to appear if scrolling very quickly (jumping halfway down page from top), but delay is below what almost any user would perceive when scrolling a normal distance.
  • 2015: Nate Berkopec: Page Weight Doesn't Matter
    • Size: 80 kB / 0.2 MB
    • Tecno Spark 8C: 0.8s / 0.7s
      • Does lazy loading, page downloads 650 kB / 1.8 MB if you scroll through the entire page, but scrolling is only a little jerky and the lazy loading doesn't cause delays. Probably the only page I've tried that does lazy loading in a way that makes the experience better and not worse on a slow device; I didn't test on a slow connection, where this would still make the experience worse.
    • Itel P32: 1.1s / 1s
      • Scrolling basically unusable; scroll extremely jerky and moves a random distance, often takes over 1s for text to render when scrolling to new text; can be much worse with images that are lazy loaded. Even though this is the best implementation of lazy loading I've seen in the wild, the Itel P32 still can't handle it.
  • 2017: Dan Luu: How web bloat impacts users with slow connections
    • Size: 14 kB / 57 kB
    • Tecno Spark 8C: 0.5s / 0.3s
      • Scrolling and interaction work fine.
    • Itel P32: 0.7s / 0.5s
  • 2017-2024+: Alex Russell: The Performance Inequality Gap (series)
    • Size: 82 kB / 0.1 MB
    • Tecno Spark 8C: 0.5s / 0.4s
      • Scrolling and interaction work fine.
    • Itel P32: 0.7s / 0.4s
      • Scrolling and interaction work fine.
  • 2024: Nikita Prokopov (Tonsky): JavaScript Bloat in 2024
    • Size: 14 MB / 14 MB
    • Tecno Spark 8C: 0.8s / 1.9s
      • When scrolling, it takes a while for images to show up (500ms or so) and the scrolling isn't smooth, but it's not jerky enough that it's difficult to scroll to the right place.
    • Itel P32: 2.5s / 3s
      • Scrolling isn't smooth. Scrolling accurately is a bit difficult, but can generally scroll to where you want if very careful. Generally takes a bit more than 1s for new content to appear when you scroll a significant distance.
  • 2024: Dan Luu: This post
    • Size: 25 kB / 74 kB
    • Tecno Spark 8C: 0.6s / 0.5s
      • Scrolling and interaction work fine.
    • Itel P32: 1.3s / 1.1s
      • Scrolling and interaction work fine, although I had to make a change for this to be the case — this doc originally had an embedded video, which the Itel P32 couldn't really handle.
        • Note that, while these numbers are worse than the numbers for "Page Weight Doesn't Matter", this page is usable after load, which that other page isn't because it executes some kind of lazy loading that's too complex for this phone to handle in a reasonable timeframe.

Appendix: empathy for non-rich users

Something I've observed over time, as programming has become more prestigious and more lucrative, is that people have tended to come from wealthier backgrounds and have less exposure to people with different income levels. An example we've discussed before: at a well-known, prestigious startup with a very left-leaning employee base, where everyone got rich, in a slack discussion about the covid stimulus checks, a well meaning progressive employee said that the checks were pointless because people would just use them to buy stock. This person had, apparently, never talked to any middle-class (let alone poor) person about where their money goes or looked at the data on who owns equity. And that's just looking at American wealth. When we look at world-wide wealth, the general level of understanding is much lower. People seem to really underestimate the dynamic range in wealth and income across the world. From having talked to quite a few people about this, a lot of people seem to have mental buckets for "poor by American standards" (buys stock with stimulus checks) and "poor by worldwide standards" (maybe doesn't even buy stock), but the range of poverty in the world dwarfs the range of poverty in America to an extent that not many wealthy programmers seem to realize.

Just for example, in this discussion of how lucky I was (in terms of financial opportunities) that my parents made it to America, someone mentioned that it's not that big a deal because they had great financial opportunities in Poland. For one thing, with respect to the topic of the discussion, the probability that someone will end up with a high-paying programming job (senior staff eng at a high-paying tech company) or equivalent, I suspect that, when I was born, being born poor in the U.S. gave you better odds than being fairly well off in Poland, but I could believe the other case as well if presented with data. But if we're comparing Poland v. U.S. to Vietnam v. U.S., if I spend 15 seconds looking up rough wealth numbers for these countries in the year I was born, the GDP/capita ratio of U.S. : Poland was ~8:1, whereas it was ~50 : 1 for Poland : Vietnam. The difference in wealth between Poland and Vietnam was roughly the square of the difference between the U.S. and Poland, so Poland to Vietnam is roughly equivalent to Poland vs. some hypothetical country that's richer than the U.S. by the amount that the U.S. is richer than Poland. These aren't even remotely comparable, but a lot of people seem to have this mental model that there's "rich countries" and "not rich countries" and "not rich countries" are all roughly in the same bucket. GDP/capita isn't ideal, but it's easier to find than percentile income statistics; the quick search I did also turned up that annual income in Vietnam then was something like $200-$300 a year. Vietnam was also going through the tail end of a famine whose impacts are a bit difficult to determine because statistics here seem to be gamed, but if you believe the mortality rate statistics, the famine caused total overall mortality rate to jump to double the normal baseline1.

Of course, at the time, the median person in a low-income country wouldn't have had a computer, let alone internet access. But, today it's fairly common for people in low-income countries to have devices. Many people either don't seem to realize this or don't understand what sorts of devices a lot of these folks use.

Appendix: comments from Fabian Giesen

On the Discourse founder's comments on iOS vs. Android marketshare, Fabian notes

In the US, according to the most recent data I could find (for 2023), iPhones have around 60% marketshare. In the EU, it's around 33%. This has knock-on effects. Not only do iOS users skew towards the wealthier end, they also skew towards the US.

There's some secondary effects from this too. For example, in the US, iMessage is very popular for group chats etc. and infamous for interoperating very poorly with Android devices in a way that makes the experience for Android users very annoying (almost certainly intentionally so).

In the EU, not least because Android is so much more prominent, iMessage is way less popular and anecdotally, even iPhone users among my acquaintances who would probably use iMessage in the US tend to use WhatsApp instead.

Point being, globally speaking, recent iOS + fast Internet is even more skewed towards a particular demographic than many app devs in the US seem to be aware.

And on the comment about mobile app vs. web app sizes, Fabian said:

One more note from experience: with apps, you download them when you install them, and generally have some opportunity to hold off on updates while you're on a slow or metered connection (or just don't have data at all).

Back when I originally got my US phone, I had no US credit history and thus had to use prepaid plans. I still do because it's fine for what I actually use my phone for most of the time, but it does mean that when I travel to Germany once a year, I don't get data roaming at all. (Also, phone calls in Germany cost me $1.50 apiece, even though T-Mobile is the biggest mobile provider in Germany - though, of course, not T-Mobile US.)

Point being, I do get access to free and fast Wi-Fi at T-Mobile hotspots (e.g. major train stations, airports etc.) and on inter-city trains that have them, but I effectively don't have any data plan when in Germany at all.

This is completely fine with mobile phone apps that work offline and sync their data when they have a connection. But web apps are unusable while I'm not near a public Wi-Fi.

Likewise I'm fine sending an email over a slow metered connection via the Gmail app, but I for sure wouldn't use any web-mail client that needs to download a few MBs worth of zipped JS to do anything on a metered connection.

At least with native app downloads, I can prepare in advance and download them while I'm somewhere with good internet!

Another comment from Fabian (this time paraphrased since this was from a conversation) is that people will often justify being quantitatively hugely slower because there's a qualitative reason something should be slow. One example he gave was that screens often take a long time to sync their connection and this is justified because there are operations that have to be done that take time. For a long time, these operations would often take seconds. Recently, a lot of displays sync much more quickly because Nvidia specifies how long this can take for something to be "G-Sync" certified, so display makers actually do this in a reasonable amount of time now. While it's true that there are operations that have to be done that take time, there's no fundamental reason they should take as much time as they often used to. Another example he gave was of someone justifying how long it took to read thousands of files because the operation required a lot of syscalls and "syscalls are slow", which is a qualitatively true statement, but if you look at the actual cost of a syscall, in the case under discussion, it was many orders of magnitude too small to be a reasonable explanation for why it took so long to read thousands of files.

On this topic, when people point out that a modern website is slow, someone will generally respond with the qualitative defense that the modern website has these great features, which the older website is lacking. And while it's true that (for example) Discourse has features that MyBB doesn't, it's hard to argue that its feature set justifies being 33x slower.

Appendix: experimental details

With the exception of danluu.com and, arguably, HN, for each site, I tried to find the "most default" experience. For example, for WordPress, this meant a demo blog with the current default theme, twentytwentyfour. In some cases, this may not be the most likely thing someone uses today, e.g., for Shopify, I looked at the first theme they give you when you browse their themes, but I didn't attempt to find theme data to see what the most commonly used theme is. For this post, I wanted to do all of the data collection and analysis as a short project, something that takes less than a day, so there were a number of shortcuts like this, which will be described below. I don't think it's wrong to use the first-presented Shopify theme since a decent fraction of users will probably use the first-presented theme, but that is, of course, less representative than grabbing whatever the most common theme is and then also testing many different sites that use that theme to see how real-world performance varies when people modify the theme for their own use. If I worked for Shopify or wanted to do competitive analysis on behalf of a competitor, I would do that, but for a one-day project on how large websites impact users on low-end devices, the performance of Shopify demonstrated here seems ok. I actually did the initial work for this around when I ran these polls, back in February; I just didn't have time to really write this stuff up for a month.

For the tests on laptops, I tried to have the laptop at ~60% battery, not plugged in, and the laptop was idle for enough time to return to thermal equilibrium in a room at 20°C, so pages shouldn't be impacted by prior page loads or other prior work that was happening on the machine.

For the mobile tests, the phones were at ~100% charge and plugged in, and also previously at 100% charge, so the phones didn't have any heating effect you can get from rapidly charging. As noted above, these tests were performed with 1Gbps WiFi. No other apps were running, the browser had no other tabs open, and no additional apps were installed on the device, so no extra background tasks should've been running other than whatever users are normally subject to by the device by default. A real user with the same device is going to see worse performance than we measured here in almost every circumstance except if running Chrome Dev Tools on a phone significantly degrades performance. I noticed that, on the Itel P32, scrolling was somewhat jerkier with Dev Tools running than when running normally but, since this was a one-day project, I didn't attempt to quantify this or check whether it impacts some sites much more than others. In absolute terms, the overhead can't be all that large because the fastest sites are still fairly fast with Dev Tools running, but if there's some kind of overhead that's super-linear in the amount of work the site does (possibly indirectly, if it causes some kind of resource exhaustion), then that could be a problem in measurements of some sites.

Sizes were all measured on mobile, so in cases where different assets are loaded on mobile vs. desktop, we measured the mobile asset sizes. CPU was measured as CPU time on the main thread (I did also record time on other threads for sites that used other threads, but didn't use this number; if CPU were a metric people wanted to game, time on other threads would have to be accounted for to prevent sites from trying to offload as much work as possible to other threads, but this isn't currently an issue and time on main thread is more directly correlated to usability than the sum of time across all threads, and the metric that would work for gaming is less legible with no upside for now).

For WiFi speeds, speed tests had the following numbers:

  • M3 Max
    • Netflix (fast.com)
      • Download: 850 Mbps
      • Upload: 840 Mbps
      • Latency (unloaded / loaded): 3ms / 8ms
    • Ookla
      • Download: 900 Mbps
      • Upload: 840 Mbps
      • Latency (unloaded / download / upload): 3ms / 8ms / 13ms
  • Tecno Spark 8C
    • Netflix (fast.com)
      • Download: 390 Mbps
      • Upload: 210 Mbps
      • Latency (unloaded / loaded): 2ms / 30ms
    • Ookla
      • Ookla web app fails, can't see results
  • Itel P32
    • Netflix
      • Download: 44 Mbps
      • Upload: test fails to work (sends one chunk of data and then hangs, sending no more data)
      • Latency (unloaded / loaded): 4ms / 400ms
    • Ookla
      • Download: 45 Mbps
      • Upload: test fails to work
      • Latency: test fails to display latency

One thing to note is that the Itel P32 doesn't really have the ability to use the bandwidth that it nominally has. Looking at the top Google reviews, none of them mention this. The first review reads

Performance-wise, the phone doesn’t lag. It is powered by the latest Android 8.1 (GO Edition) ... we have 8GB+1GB ROM and RAM, to run on a power horse of 1.3GHz quad-core processor for easy multi-tasking ... I’m impressed with the features on the P32, especially because of the price. I would recommend it for those who are always on the move. And for those who take battery life in smartphones has their number one priority, then P32 is your best bet.

The second review reads

Itel mobile is one of the leading Africa distributors ranking 3rd on a continental scale ... the light operating system acted up to our expectations with no sluggish performance on a 1GB RAM device ... fairly fast processing speeds ... the Itel P32 smartphone delivers the best performance beyond its capabilities ... at a whooping UGX 330,000 price tag, the Itel P32 is one of those amazing low-range like smartphones that deserve a mid-range flag for amazing features embedded in a single package.

The third review reads

"Much More Than Just a Budget Entry-Level Smartphone ... Our full review after 2 weeks of usage ... While switching between apps, and browsing through heavy web pages, the performance was optimal. There were few lags when multiple apps were running in the background, while playing games. However, the overall performance is average for maximum phone users, and is best for average users [screenshot of game] Even though the game was skipping some frames, and automatically dropped graphical details it was much faster if no other app was running on the phone.

Notes on sites:

  • Wix
    • www.wix.com/website-template/view/html/3173?originUrl=https%3A%2F%2Fwww.wix.com%2Fwebsite%2Ftemplates%2Fhtml%2Fmost-popular&tpClick=view_button&esi=a30e7086-28db-4e2e-ba22-9d1ecfbb1250: this was the first entry when I clicked to get a theme
    • LCP was misleading on every device
    • On the Tecno Spark 8C, scrolling never really works. It's very jerky and this never settles down
    • On the Itel P32, the page fails non-deterministically (different errors on different loads); it can take quite a while to error out; it was 23s on the first run, with the CPU pegged for 28s
  • Patreon
    • www.patreon.com/danluu: used my profile where possible
    • Scrolling on Patreon and finding old posts is so painful that I maintain my own index of my Patreon posts so that I can find my old posts without having to use Patreon. Although Patreon's numbers don't look that bad in the table when you're on a fast laptop, that's just for the initial load. The performance as you scroll is bad enough that I don't think that, today, there exists a computer and internet connection that can browse Patreon with decent performance.
  • Threads
    • threads.net/danluu.danluu: used my profile where possible
    • On the Itel P32, this technically doesn't load correctly and could be marked as FAIL, but it's close enough that I counted it. The thing that's incorrect is that profile photos have a square box around them
      • However, as with the other heavy pages, interacting with the page doesn't really work and the page is unusable, but this appears to be for the standard performance reasons and not because the page failed to render
  • Twitter
    • twitter.com/danluu: used my profile where possible
  • Discourse
    • meta.discourse.org: this is what turned up when I searched for an official forum.
    • As discussed above, the LCP is highly gamed and basically meaningless. We linked to a post where the Discourse folks note that, on slow loads, they put a giant splash screen up at 2s to cap the LCP at 2s. Also notable is that, on loads that are faster than 2s, the LCP is also highly gamed. For example, on the M3 Max with low-latency 1Gbps internet, the LCP was reported as 115ms, but the page loads actual content at 1.1s. This appears to use the same fundamental trick as "Discourse Splash", in that it paints a huge change onto the screen and then carefully loads smaller elements to avoid having the actual page content detected as the LCP (a sketch of this trick appears after these per-site notes).
    • On the Tecno Spark 8C, scrolling is unpredictable and can jump too far, triggering loading from infinite scroll, which hangs the page for 3s-10s. Also, the entire browser sometimes crashes if you just let the browser sit on this page for a while.
    • On the Itel P32, an error message is displayed after 7.5s
  • Bluesky
    • bsky.app/profile/danluu.com
    • Displays a blank screen on the Itel P32
  • Squarespace
    • cedar-fluid-demo.squarespace.com: this was the second theme that showed up when I clicked themes to get a theme; the first was one called "Bogart", but that was basically a "coming soon" single page screen with no content, so I used the second theme instead of the first one.
    • A lot of errors and warnings in the console with the Itel P32, but the page appears to load and work, although interacting with it is fairly slow and painful
    • LCP on the Tecno Spark 8C was significantly before the page content actually loaded
  • Tumblr
    • www.tumblr.com/slatestarscratchpad: used this because I know this tumblr exists. I don't read a lot of tumblrs (maybe three or four), and this one seemed like the closest thing to my blog that I know of on tumblr.
    • This page fails on the Itel P32, but doesn't FAIL. The console shows that the JavaScript errors out, but the page still works fine (I tried scrolling, clicking links, etc., and these all worked), so you can actually go to the post you want and read it. The JS error appears to have made this page load much more quickly than it otherwise would have and also made interacting with the page after it loaded fairly zippy.
  • Shopify
    • themes.shopify.com/themes/motion/styles/classic/preview?surface_detail=listing&surface_inter_position=1&surface_intra_position=1&surface_type=all: this was the first theme that showed up when I looked for themes
    • On the first M3/10 run, Chrome dev tools reported a nonsensical 697s of CPU time (the run completed in a normal amount of time, well under 697s or even 697/10s). This run was ignored when computing results.
    • On the Itel P32, the page load never completes and it just shows a flashing cursor-like image, which is deliberately loaded by the theme. On devices that load properly, the flashing cursor image is immediately covered up by another image, but that never happens here.
    • I wondered if it wasn't fair to use this example theme because there's some stuff on the page that lets you switch theme styles, so I checked out actual uses of the theme (the page that advertises the theme lists users of the theme). I tried the first two listed real examples and they were both much slower than this demo page.
  • Reddit
    • reddit.com
    • Has an unusually low LCP* compared to how long it takes for the page to become usable. Although not measured in this test, I generally find the page slow and sort of unusable on Intel Macbooks which are, by historical standards, extremely fast computers (unless I use old.reddit.com)
  • Mastodon
    • mastodon.social/@danluu: used my profile where possible
    • Fails to load on Itel P32, just gives you a blank screen. Due to how long things generally take on the Itel P32, it's not obvious for a while if the page is failing or if it's just slow
  • Quora
    • www.quora.com/Ever-felt-like-giving-up-on-your-dreams-How-did-you-come-out-of-it: I tried googling for quora + the username of a metafilter user who I've heard is now prolific on Quora. Rather than giving their profile page, Google returned this page, which appears to have nothing to do with the user I searched for. So, this isn't comparable to the social media profiles, but getting a random irrelevant Quora result from Google is how I tend to interact with Quora, so I guess this is representative of my Quora usage.
    • On the Itel P32, the page stops executing scripts at some point and doesn't fully load. This causes it to fail to display properly. Interacting with the page doesn't really work either.
  • Substack
    • Used thezvi.substack.com because I know Zvi has a substack and writes about similar topics.
  • vBulletin:
    • forum.vbulletin.com: this is what turned up when I searched for an official forum.
  • Medium
    • medium.com/swlh: I don't read anything on Medium, so I googled for programming blogs on Medium and this was the top hit. From looking at the theme, it doesn't appear to be unusually heavy or particularly customized for a Medium blog. Since it appears to be widely read and popular, it's more likely to be served from a CDN than some of the other blogs here.
    • On a run that wasn't a benchmark reference run, on the Itel P32, I tried scrolling starting 35s after loading the page. The delay to scroll was 5s-8s and scrolling moved an unpredictable amount, making the page completely unusable. This wasn't marked as a FAIL in the table, but one could argue that this should be a FAIL since the page is unusable.
  • Ghost
    • source.ghost.io because this is the current default Ghost theme and it was the first example I found
  • Wordpress
    • 2024.wordpress.net because this is the current default wordpress theme and this was the first example of it I found
  • XenForo
    • xenforo.com/community/: this is what turned up when I searched for an official forum
    • On the Itel P32, the layout is badly wrong and page content overlaps itself. There's no reasonable way to interact with the element you want because of this, and reading the text requires reading text that's been overprinted multiple times.
  • Wordpress (old)
    • Used thezvi.wordpress.com because it has the same content as Zvi's substack, and happens to be on some old wordpress theme that used to be a very common choice
  • phpBB
    • www.phpbb.com/community/index.php: this is what turned up when I searched for an official forum.
  • MyBB
    • community.mybb.com: this is what turned up when I searched for an official forum.
    • Site doesn't serve up a mobile version. In general, I find the desktop version of sites to be significantly better than the mobile version when on a slow device, so this works quite well, although they're likely penalized by Google for this.
  • HN
    • news.ycombinator.com
    • In principle, HN should be the slowest social media site or link aggregator because it's written in a custom Lisp that isn't highly optimized and the code was originally written with brevity and cleverness in mind, which generally gives you fairly poor performance. However, that's only poor relative to what you'd get if you were writing high-performance code, which is not a relevant point of comparison here.
  • danluu.com
    • Self explanatory
    • This currently uses a bit less CPU than HN, but I expect this to eventually use more CPU as the main page keeps growing. At the moment, this page has 176 links to 168 articles vs. HN's 199 links to 30 articles but, barring an untimely demise, this page should eventually have more links than HN.
      • As noted above, I find that pagination for such small pages makes the browsing experience much worse on slow devices or with bad connections, so I don't want to "optimize" this by paginating it or, even worse, doing some kind of dynamic content loading on scroll.
  • Woo Commerce
    • I originally measured Woo Commerce as well but, unlike with the pages and platforms tested above, I didn't find that being fast or slow on the initial load was necessarily representative of the performance of subsequent actions, so it wasn't included in the table; having it in the table is sort of asking for a comparison against Shopify. In particular, while the "most default" Woo theme I could find was significantly faster than the "most default" Shopify theme on initial load on a slow device, performance was multidimensional enough that it was easy to find realistic scenarios where Shopify was faster than Woo and vice versa on a slow device, which is quite different from what I saw with newer blogging platforms like Substack and Medium compared to older platforms like Wordpress, or a modern forum like Discourse versus the older PHP-based forums. A real comparison of shopping sites that have carts, checkout flows, etc., would require a better understanding of real-world usage of these sites than I was going to get in a single day.
  • NodeBB
    • community.nodebb.org
    • This wasn't in my original tests and I only tried this out because one of the founders of NodeBB suggested it, saying "I am interested in seeing whether @[email protected] would fare better in your testing. We spent quite a bit of time over the years on making it wicked fast, and I personally feel it is a better representation of modern forum software than Discourse, at least on speed and initial payload."
    • I didn't do the full set of tests because I don't keep the Itel P32 charged (the battery is in rough shape and discharges quite quickly once unplugged, so I'd have to wait quite a while to get it into a charged state)
    • On the tests I did, it got 0.3s/0.4s on the M1 and 3.4s/7.2s on the Tecno Spark 8C. This is moderately slower than vBulletin and significantly slower than the faster PHP forums, but much faster than Discourse. If you need a "modern" forum for some reason and want your forum to be usable by people who aren't, by global standards, rich, this seems like it could work.
    • Another notable thing, given that it's a "modern" site, is that interaction works fine after initial load; you can scroll and tap on things and this all basically works, nothing crashed, etc.
    • Sizes were 0.9 MB / 2.2 MB, so also fairly light for a "modern" site and possibly usable on a slow connection, although slow connections weren't tested here.
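
On the LCP gaming described in the Discourse note above, here's a minimal sketch of the trick (my own illustration, not Discourse's actual code): paint a full-viewport splash element immediately so it becomes the LCP candidate, then swap in the real content in pieces small enough that no later paint replaces it.

    // splash.ts -- illustrative only; not Discourse's implementation, just a
    // sketch of why a splash screen can pin LCP well before content arrives.

    // Paint a full-viewport element as early as possible. Because it's the
    // largest thing painted so far, it becomes the current LCP candidate.
    const splash = document.createElement('div');
    splash.style.cssText =
      'position:fixed;inset:0;background:#fff;display:flex;' +
      'align-items:center;justify-content:center;font-size:2rem';
    splash.textContent = 'Loading…';
    document.body.appendChild(splash);

    // Later, remove the splash and stream in the real content in small pieces.
    // As long as no single element paints a larger area than the splash did,
    // the browser never records a new LCP candidate, so the reported LCP stays
    // at the splash paint time even though usable content arrives much later.
    async function renderContent(posts: string[]) {
      splash.remove();
      const list = document.createElement('ul');
      document.body.appendChild(list);
      for (const text of posts) {
        const item = document.createElement('li');
        item.textContent = text;
        list.appendChild(item); // many small paints, none a new LCP candidate
        await new Promise((resolve) => setTimeout(resolve, 0)); // yield between chunks
      }
    }

The same shape would explain the fast-load case noted above, where an early large paint followed by carefully sized smaller elements produced a reported LCP of 115ms while actual content showed up at 1.1s.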

Another kind of testing would be to try to configure pages to look as similar as possible. I'd be interested in seeing the results for that if anyone does it, but that test would be much more time consuming. For one thing, it requires customizing each site. And for another, it requires deciding what sites should look like. If you test something danluu.com-like, every platform that lets you serve up something light straight out of a CDN, like Wordpress and Ghost, should score similarly, with the score being dependent on the CDN and the CDN cache hit rate. Sites like Medium and Substack, which have relatively little customizability, would score pretty much as they do here. Realistically, from looking at what sites exist, most users will create sites that are slower than the "most default" themes for Wordpress and Ghost, although it's plausible that readers of this blog would, on average, do the opposite, so you'd probably want to test a variety of different site styles.

Appendix: this site vs. sites that don't work on slow devices or slow connections

Just as an aside, something I've found funny for a long time is that I get quite a bit of hate mail about the styling on this page (and a similar volume of appreciation mail). By hate mail, I don't mean polite suggestions to change things, I mean the equivalent of road rage, but for web browsing; web rage. I know people who run sites that are complex enough that they're unusable by a significant fraction of people in the world. How come people are so incensed about the styling of this site and, proportionally, basically don't care at all that the web is unusable for so many people?

Another funny thing here is that the people who appreciate the styling generally appreciate that the site doesn't override any kind of default styling, letting you make the width exactly what you want (by setting your window size how you want it), and that it doesn't override any styling you apply to sites yourself. The people who are really insistent about changing this want everyone to have some width limit they prefer, some font they prefer, etc., but it's always framed as if they don't want it for themselves; it's supposedly for the benefit of people at large, even though accommodating the preferences of the web ragers would directly oppose the preferences of people who prefer (just for example) to be able to adjust the text width by adjusting their window width.

Until I'd pointed this out tens of times, the exchange would usually start with web ragers telling me that "studies show" that narrower text width is objectively better but, on reading every study on the topic that I could find, I didn't find this to be the case. Moreover, on asking for citations, it was clear that people saying this generally hadn't read any studies on the topic at all and would sometimes hastily send me a study that they did not seem to have read. When I'd point this out, people would then change their argument to how studies can't really describe the issue (odd that they'd cite studies in the first place; one person instead cited a book to me, which I read and they, apparently, had not, since it also didn't support their argument) and then move to how this is what everyone wants, even though that's clearly not the case, both from the comments I've gotten as well as the data I have from when I made the change.

Web ragers who follow this line of reasoning generally can't seem to absorb the information that their preferences are not universal and will insist that they are regardless of what people say they like, which I find fairly interesting. On the data, when I switched from Octopress styling (at the time, the most popular styling for programming bloggers) to the current styling, I got what appeared to be a causal increase in traffic and engagement, so it appears that not only do people who write me appreciation mail about the styling like the styling, the overall feeling of people who don't write to me appears to be that the site is fine and apparently more appealing than standard programmer blog styling. When I've noted this, people tend to become further invested in the idea that their preferences are universal and that people who think they have other preferences are wrong, and reply with total nonsense.

For me, two questions I'm curious about are: why do people feel the need to fabricate evidence on this topic (referring to studies when they haven't read any, googling for studies and then linking to one that says the opposite of what they claim it says, presumably because they didn't really read it, etc.) in order to claim that there are "objective" reasons their preferences are universal or correct, and why are people so much more incensed by this than by the global accessibility problems caused by typical web design? On the latter, I suspect if you polled people with an abstract survey, they would rate global accessibility as the larger problem, but by revealed preference, both in terms of what people create as well as what irritates them enough to send hate mail, we can see that a site having fully-adjustable line width instead of capping line width at their preferred length is important enough to do something about, whereas global accessibility is not. As noted above, people who run sites that aren't accessible due to performance problems generally get little to no hate mail about this. And when I used a default Octopress install, I got zero hate mail about the design. Fewer people read my site at the time, but my traffic volume hasn't increased by a huge amount since then and the amount of hate mail I get about my site design has gone from zero to a fair amount, an infinitely higher ratio than the increase in traffic.

To be clear, I certainly wouldn't claim that the design on this site is optimal. I just removed the CSS from the most popular blogging platform for programmers at the time because that CSS seemed objectively bad for people with low-end connections and, as a side effect, got more traffic and engagement overall, not just from locations where people tend to have lower end connections and devices. No doubt a designer who cares about users on low-end connections and devices could do better, but there's something quite odd about both the untruthfulness and the vitriol of comments on this.


  1. This estimate puts backwards-looking life expectancy in the low 60s; that paper also discusses other estimates in the mid 60s and discusses biases in the estimates. [return]