Why is it so hard to buy things that work well?

There's a cocktail party version of the efficient markets hypothesis I frequently hear that's basically, "markets enforce efficiency, so it's not possible that a company can have some major inefficiency and survive". We've previously discussed Marc Andreessen's quote that tech hiring can't be inefficient here and here:

Let's launch right into it. I think the critique that Silicon Valley companies are deliberately, systematically discriminatory is incorrect, and there are two reasons to believe that that's the case. ... No. 2, our companies are desperate for talent. Desperate. Our companies are dying for talent. They're like lying on the beach gasping because they can't get enough talented people in for these jobs. The motivation to go find talent wherever it is unbelievably high.

Variants of this idea that I frequently hear engineers and VCs repeat involve companies being efficient and/or products being basically as good as possible because, if it were possible for them to be better, someone would've outcompeted them and done it already¹.

There's a vague plausibility to that kind of statement, which is why it's a debate I've often heard come up in casual conversation, where one person will point out some obvious company inefficiency or product error and someone else will respond that, if it's so obvious, someone at the company would have fixed the issue or another company would've come along and won based on being more efficient or better. Talking purely abstractly, it's hard to settle the debate, but things are clearer if we look at some specifics, as in the two examples above about hiring, where we can observe that, whatever abstract arguments people make, inefficiencies persisted for decades.

When it comes to buying products and services, at a personal level, most people I know who've checked the work of people they've hired for things like home renovation or accounting have found grievous errors in the work. Although it's possible to find people who don't do shoddy work, it's generally difficult for someone who isn't an expert in the field to determine if someone is going to do shoddy work in the field. You can try to get better quality by paying more, but once you get out of the very bottom end of the market, it's frequently unclear how to trade money for quality, e.g., my friends and colleagues who've gone with large, brand name, accounting firms have paid much more than people who go with small, local, accountants and gotten a higher error rate; as a strategy, trying expensive local accountants hasn't really fared much better. The good accountants are typically somewhat expensive, but they're generally not charging the highest rates and only a small percentage of somewhat expensive accountants are good.

More generally, in many markets, consumers are uninformed and it's fairly difficult to figure out which products are even half decent, let alone good. When people happen to choose a product or service that's right for them, it's often for the wrong reasons. For example, in my social circles, there have been two waves of people migrating from iPhones to Android phones over the past few years. Both waves happened due to Apple PR snafus which caused a lot of people to think that iPhones were terrible at something when, in fact, they were better at that thing than Android phones. Luckily, iPhones aren't strictly superior to Android phones and many people who switched got a device that was better for them because they were previously using an iPhone due to good Apple PR, causing their errors to cancel out. But, when people are mostly making decisions off of marketing and PR and don't have access to good information, there's no particular reason to think that a product being generally better or even strictly superior will result in that product winning and the worse product losing.

In capital markets, we don't need all that many informed participants to think that some form of the efficient market hypothesis holds, ensuring that "prices reflect all available information". It's a truism that published results about market inefficiencies stop being true the moment they're published because people exploit the inefficiency until it disappears. But with the job market examples, even though firms can take advantage of mispriced labor, as Greenspan famously did before becoming Chairman of the Fed, inefficiencies can persist:

Townsend-Greenspan was unusual for an economics firm in that the men worked for the women (we had about twenty-five employees in all). My hiring of women economists was not motivated by women's liberation. It just made great business sense. I valued men and women equally, and found that because other employers did not, good women economists were less expensive than men. Hiring women . . . gave Townsend-Greenspan higher-quality work for the same money . . .

But as we also saw, individual firms exploiting mispriced labor have a limited demand for labor and inefficiencies can persist for decades because the firms that are acting on "all available information" don't buy enough labor to move the price of mispriced people to where it would be if most or all firms were acting rationally.

In the abstract, it seems that, with products and services, inefficiencies should also be able to persist for a long time since, similarly, there also isn't a mechanism that allows actors in the system to exploit the inefficiency in a way that directly converts money into more money, and sometimes there isn't really even a mechanism to make almost any money at all. For example, if you observe that it's silly for people to move from iPhones to Android phones because they think that Apple is engaging in nefarious planned obsolescence when Android devices generally become obsolete more quickly, due to a combination of iPhones getting updates for longer and iPhones being faster at every price point they compete at, allowing the phone to be used on bloated sites for longer, you can't really make money off of this observation. This is unlike a mispriced asset that you can buy derivatives of to make money (in expectation).

A common suggestion to the problem of not knowing what product or service is good is to ask an expert in the field or a credentialed person, but this often fails as well. For example, a friend of mine had trouble sleeping because his window air conditioner was loud and would wake him up when it turned on. He asked a trusted friend of his who works on air conditioners if this could be improved by getting a newer air conditioner and his friend said "no; air conditioners are basically all the same". But any consumer who's compared items with motors in them would immediately know that this is false. Engineers have gotten much better at producing quieter devices when holding power and cost constant. My friend eventually bought a newer, quieter, air conditioner, which solved his sleep problem, but he had the problem for longer than he needed to because he assumed that someone whose job it is to work on air conditioners would give him non-terrible advice about air conditioners. If my friend were an expert on air conditioners or had compared the noise levels of otherwise comparable consumer products over time, he could've figured out that he shouldn't trust his friend, but if he had that level of expertise, he wouldn't have needed advice in the first place.

So far, we've looked at the difficulty of getting the right product or service at a personal level, but this problem also exists at the firm level and is often worse because the markets tend to be thinner, with fewer products available as well as opaque, "call us" pricing. Some commonly repeated advice is that firms should focus on their "core competencies" and outsource everything else (e.g., Joel Spolsky, Gene Kim, Will Larson, Camille Fournier, etc., all say this), but if we look at mid-sized tech companies, we can see that they often need to have in-house expertise that's far outside what anyone would consider their core competency unless, e.g., every social media company has kernel expertise as a core competency. In principle, firms can outsource this kind of work, but people I know who've relied on outsourcing, e.g., kernel expertise to consultants or application engineers on a support contract, have been very unhappy with the results compared to what they can get by hiring dedicated engineers, both in absolute terms (support frequently doesn't come up with a satisfactory resolution in weeks or months, even when it's one a good engineer could solve in days) and for the money (despite engineers being expensive, large support contracts can often cost more than an engineer while delivering worse service than an engineer).

This problem exists not only for support but also for products a company could buy instead of build. For example, Ben Kuhn, the CTO of Wave, has a Twitter thread about some of the issues we've run into at Wave, with a couple of followups. Ben now believes that one of the big mistakes he made as CTO was not putting much more effort into vendor selection, even when the decision appeared to be a slam dunk, and more strongly considering moving many systems to custom in-house versions sooner. Even after selecting the consensus best product in the space from the leading (as in largest and most respected) firm, and using the main offering the company has, the product often not only doesn't work but, by design, can't work.

For example, we tried "buy" instead of "build" for a product that syncs data from Postgres to Snowflake. Syncing from Postgres is the main offering (as in the offering with the most customers) from a leading data sync company, and we found that it would lose data, duplicate data, and corrupt data. After digging into it, it turns out that the product has a design that, among other issues, relies on the data source being able to seek backwards on its changelog. But Postgres throws changelogs away once they're consumed, so the Postgres data source can't support this operation. When their product attempts to do this and the operation fails, we end up with the sync getting "stuck", needing manual intervention from the vendor's operator and/or data loss. Since our data is still on Postgres, it's possible to recover from this by doing a full resync, but the data sync product tops out at 5MB/s for reasons that appear to be unknown to them, so a full resync can take days even on databases that aren't all that large. Resyncs will also silently drop and corrupt data, so multiple cycles of full resyncs followed by data integrity checks are sometimes necessary to recover from data corruption, which can take weeks. Despite being widely recommended and the leading product in the space, the product has a number of major design flaws that mean that it literally cannot work.
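To make the failure mode concrete, here's a toy sketch in Python. It is neither the vendor's code nor Postgres's actual interface: `ConsumeOnceLog`, `read`, `acknowledge`, and `full_resync_days` are hypothetical names invented for illustration, and the database sizes are made-up examples. Only the 5MB/s ceiling and the "changelog is thrown away once consumed" behavior come from the description above.

```python
class ConsumeOnceLog:
    """Toy changelog that discards entries once the consumer acknowledges them.

    Hypothetical model for illustration only; not a real Postgres API.
    """

    def __init__(self, entries):
        self._entries = list(entries)
        self._base = 0  # position of the oldest entry still retained

    def read(self, position):
        if position < self._base:
            raise LookupError(f"position {position} was already discarded")
        return self._entries[position - self._base]

    def acknowledge(self, position):
        """Mark everything up to and including `position` as consumed and drop it."""
        drop = position + 1 - self._base
        if drop > 0:
            del self._entries[:drop]
            self._base = position + 1


def full_resync_days(db_size_gb, throughput_mb_per_s=5.0):
    """Days needed to copy db_size_gb at a fixed throughput (1 GB = 1000 MB)."""
    seconds = db_size_gb * 1000 / throughput_mb_per_s
    return seconds / 86_400


log = ConsumeOnceLog(["insert row 1", "update row 1", "delete row 1"])
log.acknowledge(1)  # the consumer confirms it has processed positions 0 and 1

try:
    log.read(0)  # a design that needs to seek backwards fails here
except LookupError as err:
    print(f"sync stuck: {err}")

# Hypothetical database sizes, all at the 5MB/s ceiling.
for size_gb in (100, 500, 1000, 2000):
    print(f"{size_gb:>5} GB -> {full_resync_days(size_gb):.1f} days per full resync")
```

Once a consumer design depends on re-reading acknowledged positions, the only recovery path in this model is a full copy, and at 5MB/s even a 1TB database takes more than two days per attempt before any integrity checking.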

This isn't so different from Mongo or other products that had fundamental design flaws that caused severe data loss, with the main difference being that, in most areas, there isn't a Kyle Kingsbury who spends years publishing tests on various products in the field, patiently responding to bogus claims about correctness until the PR backlash caused companies in the field to start taking correctness seriously. Without that pressure, most software products basically don't work, hence the Twitter threads from Ben, above, where he notes that the "buy" solutions you might want to choose mostly don't work². Of course, at our scale, there are many things we're not going to build any time soon, like CPUs, but, for many things where the received wisdom is to "buy", "build" seems like a reasonable option. This is even true for larger companies and building CPUs. Fifteen years ago, high-performance (as in, non-embedded level of performance) CPUs were a canonical example of something it would be considered bonkers to build in-house, absurd for even the largest software companies, but Apple and Amazon have been able to produce best-in-class CPUs on the dimensions they're optimizing for, for predictable reasons³.

This isn't just an issue that impacts tech companies; we see this across many different industries. For example, any company that wants to mail items to customers has to either implement shipping themselves or deal with the fallout of having unreliable shipping. As a user, whether or not packages get shipped to you depends a lot on where you live and what kind of building you live in.

When I've lived in a house, packages have usually arrived regardless of the shipper (although they've often arrived late). But, since moving into apartment buildings, some buildings just don't get deliveries from certain delivery services. Once, I lived in a building where the postal service didn't deliver mail properly and I didn't get a lot of mail (although I frequently got mail addressed to other people in the building as well as people elsewhere). More commonly, UPS and FedEx usually won't attempt to deliver and will just put a bunch of notices up on the building door for all the packages they didn't deliver, where the notice falsely indicates that the person wasn't home and correctly indicates that the person has to go to some pick-up location to get the package.

For a while, I lived in a city where Amazon used 3rd-party commercial courier services to do last-mile shipping for same-day delivery. The services they used were famous for marking things as delivered without delivering the item for days, making "same day" shipping slower than next day or even two day shipping. Once, I naively contacted Amazon support because my package had been marked as delivered but wasn't delivered. Support, using a standard script supplied to them by Amazon, told me that I should contact them again three days after the package was marked as delivered because couriers often mark packages as delivered without delivering them, but they often deliver the package within a few days. Amazon knew that the courier service they were using didn't really even try to deliver packages⁴ promptly and the only short-term mitigation available to them was to tell support to tell people that they shouldn't expect that packages have arrived when they've been marked as delivered.

Amazon eventually solved this problem by having their own delivery people or using, by commercial shipping standards, an extremely expensive service (as Apple has done for same-day delivery)⁵. At scale, there's no commercial service you can pay for that will reliably attempt to deliver packages. If you want a service that actually works, you're generally on the hook for building it yourself, just like in the software world. My local grocery store tried to outsource this to DoorDash. I've tried delivery 3 times from my grocery store and my groceries have shown up 2 out of 3 times, which is well below what most people would consider an acceptable hit rate for grocery delivery. Having to build instead of buy to get reliability is a huge drag on productivity, especially for smaller companies (e.g., it's not possible for small shops that want to compete with Amazon and mail products to customers to have reliable delivery since they can't build out their own delivery service).

The amount of waste generated by the inability to farm out services is staggering and I've seen it everywhere I've worked. An example from another industry: when I worked at a small chip startup, we had in-house capability to do end-to-end chip processing (with the exception of having our own fabs), which is unusual for a small chip startup. When the first wafer of a new design came off of a fab, we'd have the wafer flown to us on a flight, at which point someone would use a wafer saw to cut the wafer into individual chips so we could start testing ASAP. This was often considered absurd in the same way that it would be considered absurd for a small software startup to manage its own on-prem hardware. After all, the wafer saw and the expertise necessary to go from a wafer to a working chip will be idle over 99% of the time. Having full-time equipment and expertise that you use less than 1% of the time is a classic example of the kind of thing you should outsource, but if you price out having people competent to do this plus having the equipment available to do it, even at fairly low volumes, it's cheaper to do it in-house even if the equipment and expertise for it are idle 99% of the time. More importantly, you'll get much better service (faster turnaround) in house, letting you ship at a higher cadence. I've both worked at companies that have tried to contract this kind of thing out as well as talked with many people who've done that and you get slower, less reliable, service at a higher cost.

Likewise with chip software tooling; despite it being standard to outsource tooling to large EDA vendors, we got a lot of mileage out of using our own custom tools, generally created or maintained by one person, e.g., while I was there, most simulator cycles were run on a custom simulator that was maintained by one person, which saved millions a year in simulator costs (standard pricing for a simulator at the time was a few thousand dollars per license per year and we had a farm of about a thousand simulation machines). You might think that, if a single person can create or maintain a tool that's worth millions of dollars a year to the company, our competitors would do the same thing, just like you might think that if you can ship faster and at a lower cost by hiring a person who knows how to crack a wafer open, our competitors would do that, but they mostly didn't.
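As a rough sanity check on the "millions a year" figure, here's a back-of-the-envelope sketch in Python. The function name `annual_license_cost` and the specific price points are assumptions for illustration: the only inputs taken from the text are "a few thousand dollars per license per year" and "about a thousand simulation machines", and the sketch assumes one license per machine, which may not match how licenses were actually allocated.

```python
def annual_license_cost(machines, price_per_license, licenses_per_machine=1):
    """Yearly spend on commercial simulator licenses for a simulation farm.

    Assumes every machine needs its own license(s) all year; this is a
    simplification, not a description of any vendor's licensing model.
    """
    return machines * licenses_per_machine * price_per_license


MACHINES = 1_000  # "about a thousand simulation machines"

# "A few thousand dollars" isn't a precise figure, so try several
# hypothetical price points rather than a single one.
for price in (2_000, 3_000, 5_000):
    cost = annual_license_cost(MACHINES, price)
    print(f"${price:,} per license -> ${cost / 1e6:.1f}M per year")
```

Under these assumptions, any price in the "few thousand dollars" range lands in the millions per year, which is the order of magnitude a single maintainer's custom simulator was replacing.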

Joel Spolsky has an old post where he says:

“Find the dependencies — and eliminate them.” When you're working on a really, really good team with great programmers, everybody else's code, frankly, is bug-infested garbage, and nobody else knows how to ship on time.

We had a similar attitude, although I'd say that we were a bit more humble. We didn't think that everyone else was producing garbage, but we also didn't assume that we couldn't produce something comparable to what we could buy for a tenth of the cost. From talking to folks at some competitors, there was a pretty big cultural difference between how we operated and how they operated. It simply didn't occur to them that they didn't have to buy into the standard American business logic that you should focus on your core competencies, that you can think through whether or not it makes sense to do something in-house on the merits of the particular thing instead of outsourcing your thinking to a pithy saying.

I once watched, from the inside, a company undergo this cultural shift. A few people in leadership decided that the company should focus on its core competencies, which meant abandoning custom software for infrastructure. This resulted in quite a few large migrations from custom internal software to SaaS solutions and open source software. If you watched the discussions on "why" various projects should or shouldn't migrate, there were a few unusually unreasonable people who tried to reason through particular cases on the merits of each case (in a post on pushing back against orders from the top, Yossi Kreinin calls these people insane employees; I'm going to refer to the same concept in this post, but instead call people who do this unusually unreasonable). But, for the most part, people bought the party line and pushed for a migration regardless of the specifics.

The thing that I thought was interesting was that leadership didn't tell particular teams they had to migrate and there weren't really negative consequences for teams where an "unusually unreasonable person" pushed back in order to keep running an existing system for reasonable reasons. Instead, people mostly bought into the idea and tried to justify migrations for vaguely plausible sounding reasons that weren't connected to reality, resulting in funny outcomes like moving to an open source system "to save money" when the new system was quite obviously less efficient⁶ and, predictably, required much higher capex and opex. The cost savings was supposed to come from shrinking the team, but the increase in operational cost dominated the change in the cost of the team and the complexity of operating the system meant that the team size increased instead of decreasing. There were a number of cases where it really did make sense to migrate, but the stated reasons for migration tended to be unrelated or weakly related to the reasons it actually made sense to migrate. Once people absorbed the idea that the company should focus on core competencies, the migrations were driven by the cultural idea and not any technical reasons.

The pervasiveness of decisions like the above, technical decisions made without serious technical consideration, is a major reason that the selection pressure on companies to make good products is so weak. There is some pressure, but it's noisy enough that successful companies often route around making a product that works, like in the Mongo example from above, where Mongo's decision to loudly repeat demonstrably bogus performance claims and make demonstrably false correctness claims was, from a business standpoint, superior to focusing on actual correctness and performance; by focusing their resources where it mattered for the business, they managed to outcompete companies that made the mistake of devoting serious resources to performance and correctness.

Yossi's post about how an unusually unreasonable person can have outsized impact in a dimension they value at their firm also applies to impact outside of a firm. Kyle Kingsbury, mentioned above, is an example of this. At the rates that I've heard Jepsen is charging now, Kyle can bring in what a senior developer at BigCo does (actually senior, not someone with the title "senior"), but that was after years of working long hours at below market rates on an uncertain endeavour, refuting FUD from his critics (if you read the replies to the linked posts or, worse yet, the actual tickets where he's involved in discussions with developers, the replies to Kyle were a constant stream of nonsense for many years, including people working for vendors feeling like he has it out for them in particular, casting aspersions on his character⁷, and generally trashing him). I have a deep respect for people who are willing to push on issues like this despite the system being aligned against them but, my respect notwithstanding, basically no one is going to do that. A system that requires someone like Kyle to take a stand before successful firms will put effort into correctness instead of correctness marketing is going to produce a lot of products that are good at marketing correctness without really having decent correctness properties (such as the data sync product mentioned in this post, whose website repeatedly mentions how reliable and safe the syncing product is despite having a design that is fundamentally broken).

It's also true at the firm level that it often takes an unusually unreasonable firm to produce a really great product instead of just one that's marketed as great, e.g., Volvo, the one car manufacturer that seemed to try to produce a level of structural safety beyond what could be demonstrated by IIHS tests, fared so poorly as a business that it's been forced to move upmarket and become a niche, luxury, automaker since safety isn't something consumers are really interested in despite car accidents being a leading cause of death and a significant source of life expectancy loss. And it's not clear that Volvo will be able to persist in being an unreasonable firm since they weren't able to survive as an independent automaker. When Ford acquired Volvo, Ford started moving Volvos to the shared Ford C1 platform, which didn't fare particularly well in crash tests. Since Geely has acquired Volvo, it's too early to tell for sure if they'll maintain Volvo's commitment to designing for real-world crash data and not just crash data that gets reported in benchmarks. If Geely declines to continue Volvo's commitment to structural safety, it may not be possible to buy a modern car that's designed to be safe.

Most markets are like this, except that there was never an unreasonable firm like Volvo in the first place. On unreasonable employees, Yossi says

Who can, and sometimes does, un-rot the fish from the bottom? An insane employee. Someone who finds the forks, crashes, etc. a personal offence, and will repeatedly risk annoying management by fighting to stop these things. Especially someone who spends their own political capital, hard earned doing things management truly values, on doing work they don't truly value – such a person can keep fighting for a long time. Some people manage to make a career out of it by persisting until management truly changes their mind and rewards them. Whatever the odds of that, the average person cannot comprehend the motivation of someone attempting such a feat.

It's rare that people are willing to expend a significant amount of personal capital to do the right thing, whatever that means to someone, but it's even rarer that the leadership of a firm will make that choice and spend down the firm's capital to do the right thing.

Economists have a term for cases where information asymmetry means that buyers can't tell the difference between good products and "lemons", "a market for lemons", like the car market (where the term lemons comes from), or both sides of the hiring market. In economic discourse, there's a debate over whether cars are a market for lemons at all for a variety of reasons (lemon laws, which allow people to return bad cars, don't appear to have changed how the market operates, very few modern cars are lemons when that's defined as a vehicle with serious reliability problems, etc.). But looking at whether or not people occasionally buy a defective car is missing the forest for the trees. There's maybe one car manufacturer that really seriously tries to make a structurally safe car beyond what standards bodies test (and word on the street is that they skimp on the increasingly important software testing side of things) because consumers can't tell the difference between a more or less safe car beyond the level a few standards bodies test to. That's a market for lemons, as is nearly every other consumer and B2B market.

Appendix: culture

Something I find interesting about American society is how many people think that someone who gets the raw end of a deal because they failed to protect themselves against every contingency "deserves" what happened (orgs that want to be highly effective often avoid this by having a "blameless" culture, but very few people have exposure to such a culture).

Some places I've seen this recently:

- Person had a laptop stolen in a cafe; blamed for not keeping their eye on the laptop the entire time since no reasonable person would ever let their eyes off of any belongings for 10 seconds as they turned their head to briefly chat with someone
- Person posted a PSA that they were caught out by a change in the terms of service of a company and other people should be aware of the same thing; people said that the person caught out was dumb for not reading every word of every terms of service update they're sent
- (many times, on r/idiotsincars): person gets in an accident that would've been difficult or impossible to reasonably avoid and people tell the person they're a terrible driver for not having avoided the accident
  - At least once, the person did a frame-by-frame analysis that showed that they reacted, to within one frame of latency, as fast as humanly possible, and was still told they should've avoided the accident
  - Often, people will say things like "I would never get into that situation in the first place", which, in the circumstance where someone is driving past a parked car, results in absurd statements like "I would never pass a vehicle at more than 10mph", as if the person making the comment slows down to 10mph on every street that has parked or stopped cars on it.
- Person griped on flyertalk forum that Google maps instructions are unclear if you're not a robot (e.g., "turn right in 500 meters", which could be one of multiple intersections) and people responded with things like "I never go anywhere without being completely familiar with the route" and that you should map out all of your driving beforehand, just like you would for a road trip with a paper map in 1992 (this was used as a justification for the reasonableness of mapping out all travel beforehand: I did it back then and anyone who isn't dumb would do it now)
  - People with those kinds of negative responses were highly upvoted; no one suggested switching to Apple Maps, which gives clear, landmark based directions like "go through the light and then take the next right"

If you read these kinds of discussions, you'll often see people claiming "that's just how the world is" and going further and saying that there is no other way the world could be, so anyone who isn't prepared for that is an idiot.

Going back to the laptop theft example, anyone who's traveled, or even read about other cultures, can observe that the things that North Americans think are basically immutable consequences of a large-scale society are arbitrary. For example, if you leave your bag and laptop on a table at a cafe in Korea and come back hours later, the bag and laptop are overwhelmingly likely to be there (I've heard this is true in Japan as well). While it's rude to take up a table like that, you're not likely to have your bag and laptop stolen.

And, in fact, if you tweak the context slightly, this is basically true in America. It's not much harder to walk into an empty house and steal things out of the house (it's fairly easy to learn how to pick locks and even easier to just break a window) than it is to steal things out of a cafe. And yet, in most neighbourhoods in America, people are rarely burglarized and when someone posts about being burglarized, they're not excoriated for being a moron for not having kept an eye on their house. Instead, people are mostly sympathetic. It's considered normal to have unattended property stolen in public spaces and not in private spaces, but that's more of a cultural distinction than a technical distinction.

There's a related set of stories Avery Pennarun tells about the culture shock of being an American in Korea. One of them is about some online ordering service you can use that's sort of like Amazon. With Amazon, when you order something, you get a box with multiple bar/QR/other codes on it and, when you open it up, there's another box inside that has at least one other code on it. Of course the outer box needs the barcode because it's being shipped through some facility at-scale where no one knows what the box is or where it needs to go and the inner box also had to go through some other kind of process and it also needs to be able to be scanned by a checkout machine if the item is sold at a retailer. Inside the inner box is the item. If you need to return the item, you put the item back into its barcoded box and then put that box into the shipping box and then slap another barcode onto the shipping box and then mail it out.

So, in Korea, there's some service like Amazon where you can order an item and, an hour or two later, you'll hear a knock at your door. When you get to the door, you'll see an unlabeled box or bag and the item is in the unlabeled container. If you want to return the item, you "tell" the app that you want to return the item, put it back into its container, put it in front of your door, and they'll take it back. After seeing this shipping setup, which is wildly different from what you see in the U.S., he asked someone "how is it possible that they don't lose track of which box is which?". The answer he got was, "why would they lose track of which box is which?". His other stories have a similar feel, where he describes something quite alien and asks a local how things can work in this alien way; the local can't imagine things working any other way and responds with "why would X not work?".

As with the laptop in cafe example, a lot of Avery's stories come down to how there are completely different shared cultural expectations around how people and organizations can work.

Another example of this is with covid. Many of my friends have spent most of the last couple of years in Asian countries like Vietnam or Taiwan, which have had much lower covid rates, so much so that they were barely locked down at all. My friends in those countries were basically able to live normal lives, as if covid didn't exist at all (at least until the latest variants, at which point they were vaccinated and at relatively low risk for the most serious outcomes), while taking basically zero risk of getting covid.

In most western countries, initial public opinion among many people was that locking down was pointless and there was nothing we could do to prevent an explosion of covid. Multiple engineers I know, who understand exponential growth and knew what the implications were, continued normal activities before lockdown and got and (probably) spread covid. When lockdowns were implemented, there was tremendous pressure to lift them as early as possible, resulting in something resembling the "adaptive response" diagram from this post. Since then, many people (I have a project tallying up public opinion on this that I'm not sure I'll ever prioritize enough to complete) have changed their opinion to "having ever locked down was stupid, we were always going to end up with endemic covid, all of this economic damage was pointless". If we look at in-person retail sales data or restaurant data, we can easily see that many people were voluntarily limiting their activities before and after lockdowns in the first year or so of the pandemic when the virus was in broad circulation.

Meanwhile, in some Asian countries, like Taiwan and Vietnam, people mostly complied with lockdowns when they were instituted, which means that they were able to squash covid in the country when outbreaks happened until relatively recently, when covid mutated into forms that spread much more easily and people's tolerance for covid risk went way up due to vaccinations. Of course, covid kept getting reintroduced into countries that were able to squash it because other countries were not, in large part due to the self-fulfilling belief that it would be impossible to squash covid.

Coming back to when it makes sense to bring something in-house, even in cases where it superficially sounds like it shouldn't, because the expertise is 99% idle or a single person would have to be able to build software that a single firm would pay millions of dollars a year for, much of this comes down to whether or not you're in a culture where you can trust another firm's promise. If you operate in a society where it's expected that other firms will push you to the letter of the law with respect to whatever contract you've negotiated, it's frequently not worth the effort to negotiate a contract that would give you service even one half as good as you'd get from someone in house. If you look at how these contracts end up being worded, companies often try to sneak in terms that make the contract meaningless, and even when you manage to stamp out all of that, legally enforcing the contract is expensive. In the cases I know of where companies regularly violated their agreement for their support SLA (just for example), the resolution was to terminate the contract rather than pursue legal action because the cost of legal action wouldn't be worth anything that could be gained.

If you can't trust other firms, you frequently don't have a choice with respect to bringing things in house if you want them to work.

Although this is really a topic for another post, I'll note that lack of trust that exists across companies can also hamstring companies when it exists internally. As we discussed previously, a lot of larger scale brokenness also comes out of the cultural expectations within organizations. A specific example of this that leads to pervasive organizational problems is lack of trust within the organization. For example, a while back, I was griping to a director that a VP broke a promise and that we were losing a lot of people for similar reasons. The director's response was "there's no way the VP made a promise". When I asked for clarification, the clarification was "unless you get it in a contract, it wasn't a promise", i.e., the rate at which VPs at the company lie is high enough that a verbal commitment from a VP is worthless; only a legally binding commitment that allows you to take them to court has any meaning.

Of course, that's absurd, in that no one could operate at a BigCo while going around and asking for contracts for all their promises since they'd immediately be considered some kind of hyperbureaucratic weirdo. But, let's take the spirit of the comment seriously, that you should only trust people close to you. That's good advice in the company I worked for but, unfortunately for the company, the implications are similar to the inter-firm example, where we noted that a norm where you need to litigate the letter of the law is expensive enough that firms often bring expertise in house to avoid having to deal with the details. In the intra-firm case, you'll often see teams and orgs "empire build" because they know that, at least at the management level, they can't trust anyone outside their fiefdom.

While this intra-firm lack of trust tends to be less costly than the inter-firm lack of trust since there are better levers to get action on an organization that's the cause of a major blocker, it's still fairly costly. Virtually all of the VPs and BigCo tech execs I've talked to are so steeped in the culture they're embedded in that they can't conceive of an alternative, but there isn't an inherent reason that organizations have to work like that. I've worked at two companies where people actually trust leadership and leadership does generally follow through on commitments even when you can't take them to court, including my current employer, Wave. But, at the other companies, the shared expectation that leadership cannot and should not be trusted "causes" the people who end up in leadership roles to be untrustworthy, which results in the inefficiencies we've just discussed.

People often think that having a high degree of internal distrust is inevitable as a company scales, but people I've talked to who were in upper management or fairly close to the top of Intel and Google said that the companies had an extended time period where leadership enforced trustworthiness and that stamping out dishonesty and "bad politics" was a major reason the company was so successful, under Andy Grove and Eric Schmidt, respectively. When the person at the top changed and a new person who didn't enforce honesty came in, the standard cultural norms that you see at the upper levels of most big companies seeped in, but that wasn't inevitable.

When I talk to people who haven't been exposed to BigCo leadership culture and haven't seen how decisions are actually made, they often find the decision making processes to be unbelievable in much the same way that people who are steeped in BigCo leadership culture find the idea that a large company could operate any other way to be unbelievable.

It's often difficult to see how absurd a system is from the inside. Another perspective on this is that Americans often find Japanese universities and the work practices of Japanese engineering firms absurd, though often not as absurd as the promotion policies in Korean chaebols, which are famously nepotistic, e.g., Chung Mong-yong is the CEO of Hyundai Sungwoo because he's the son of Chung Soon-yung, who was the head of Hyundai Sungwoo because he was the younger brother of Chung Ju-yung, the founder of Hyundai Group (essentially the top-level Hyundai corporation), etc. But Japanese and Korean engineering firms are not, in general, less efficient than American engineering firms outside of the software industry despite practices that seem absurdly inefficient to American eyes. American firms didn't lose their dominance in multiple industries while being more efficient; if anything, market inefficiencies allowed them to hang on to marketshare much longer than you would naively expect if you just looked at the technical merit of their products.

There are offsetting inefficiencies in American firms that are just as absurd as effectively having familial succession of company leadership in Korean chaebols. It's just that the inefficiencies that come out of American cultural practices seem to be immutable facts about the world to people inside the system. But when you look at firms that have completely different cultures, it becomes clear that cultural norms aren't a law of nature.

Appendix: downsides of build

Of course, building instead of buying isn't a panacea. I've frequently seen internal designs that are just as broken as the data sync product described in this post. In general, when you see a design like that, a decent number of people explained why the design can never work during the design phase and were ignored. Although "build" gives you a lot more control than "buy" and gives you better odds of a product that works because you can influence the design, a dysfunctional team in a dysfunctional org can quite easily make products that don't work.

There's a Steve Jobs quote that's about companies that also applies to teams:

It turns out the same thing can happen in technology companies that get monopolies, like IBM or Xerox. If you were a product person at IBM or Xerox, so you make a better copier or computer. So what? When you have monopoly market share, the company's not any more successful.

So the people that can make the company more successful are sales and marketing people, and they end up running the companies. And the product people get driven out of the decision making forums, and the companies forget what it means to make great products. The product sensibility and the product genius that brought them to that monopolistic position gets rotted out by people running these companies that have no conception of a good product versus a bad product.

They have no conception of the craftsmanship that's required to take a good idea and turn it into a good product. And they really have no feeling in their hearts, usually, about wanting to really help the customers.

For "efficiency" reasons, some large companies try to avoid duplicate effort and kill projects if they seem too similar to another project, giving the team that owns the canonical version of a product a monopoly. If the company doesn't have a culture of trying to do the right thing, this has the same problems that Steve Jobs discusses, but at the team and org level instead of the company level.

The workaround a team I was on used was to basically re-implement a parallel stack of things we relied on that didn't work. But this was only possible because leadership didn't enforce basically anything. Ironically, this was despite their best efforts: leadership made a number of major attempts to impose top-down control, but they didn't understand how to influence an organization, so the attempts failed. Had leadership been successful, the company would've been significantly worse off. There are upsides to effective top-down direction when leadership has good plans, but that wasn't really on the table, so it's actually better that leadership didn't know how to execute.

Thanks to Fabian Giesen, Yossi Kreinin, Peter Bhat Harkins, Ben Kuhn, Laurie Tratt, John Hergenroeder, Tao L., @softminus, Justin Blank, @deadalnix, Dan Lew, @ollyrobot, Sophia Wisdom, Elizabeth Van Nostrand, Kevin Downey, and @PapuaHardyNet for comments/corrections/discussion.

To some, that position is so absurd that it's not believable that anyone would hold that position (in response to my first post that featured the Andreessen quote, above, a number of people told me that it was an exaggerated straw man, which is impossible for a quote, let alone one that sums up a position I've heard quite a few times), but to others, it's an immutable fact about the world. ^[return]
On the flip side, if we think about things from the vendor side of things, there's little incentive to produce working products since the combination of the fog of war plus making false claims about a product working seems to be roughly as good as making a working product (at least until someone like Kyle Kingsbury comes along, which never happens in most industries), and it's much cheaper.

And, as Fabian Giesen points out, when vendors actually want to produce good or working products, the fog of war also makes that difficult:

But producers have a dual problem, which is that all the signal you get from consumers is sporadic, infrequent and highly selected direct communication, as well as a continuous signal of how sales look over time, which is in general very hard to map back to why sales went up or down.

You hear directly from people who are either very unhappy or very happy, and you might hear second-hand info from your salespeople, but often that's pure noise. E.g. with RAD products over the years a few times we had a prospective customer say, "well we would license it but we really need X" and we didn't have X. And if we heard that 2 or 3 times from different customers, we'd implement X and get back to them a few months later. More often than not, they'd then ask for Y next, and it would become clear over time that they just didn't want to license for some other reason and saying "we need X, it's a deal-breaker for us" for a couple choices of X was just how they chose to get out of the eval without sounding rude or whatever.

In my experience that's a pretty thorny problem in general, once you spin something out or buy something you're crossing org boundaries and lose most of the ways you otherwise have to cut through the BS and figure out what's actually going on. And whatever communication does happen is often forced to go through a very noisy, low-bandwidth, low-fidelity, high-latency channel.

^[return]
Note that even though it was somewhat predictable that a CPU design team at Apple or Amazon that was well funded had a good chance of being able to produce a best-in-class CPU (e.g., see this 2013 comment about the effectiveness of Apple's team and this 2015 comment about other mobile vendors) that would be a major advantage for their firm, this doesn't mean that the same team should've been expected to succeed if they tried to make a standalone business. In fact, Apple was able to buy their core team cheaply because the team, after many years at DEC and then successfully founding SiByte, founded PA Semi, which basically failed as a business. Similarly, Amazon's big silicon initial hires were from Annapurna (also a failed business that was up for sale because it couldn't survive independently) and Smooth Stone (a startup that failed so badly that it didn't even need to be acquired and people could be picked up individually). Even when there's an obvious market opportunity, factors like network effects, high fixed costs, up front capital expenditures, the ability of incumbent players to use market power to suppress new competitors, etc., can and often do prevent anyone from taking the opportunity. Even though we can now clearly see that there were large opportunities available for the taking, there's every reason to believe that, based on the fates of many other CPU startups to date, an independent startup that attempted to implement the same ideas wouldn't have been nearly as successful and would most likely have gone bankrupt or taken a low offer relative to the company's value due to the company's poor business prospects.

Also, before Amazon started shipping ARM server chips, the most promising ARM server chip, which had pre-orders from at least one major tech company, was killed because it was on the wrong side of an internal political battle.

The chip situation isn't so different from the motivating example we looked at in our last post, baseball scouting, where many people observed that baseball teams were ignoring simple statistics they could use to their advantage. But, none of the people observing that were in a position to run a baseball team for decades, allowing the market opportunity to persist for decades.
^[return]
Something that amuses me is how some package delivery services appear to apply relatively little effort to make sure that someone even made an attempt to deliver the package. When packages are marked delivered, there's generally a note about how it was delivered, which is frequently quite obviously wrong for the building, e.g., "left with receptionist" for a building with no receptionist or "left on porch" for an office building with no porch and a receptionist who was there during the alleged delivery time. You could imagine services would, like Amazon, request a photo along with "proof of delivery" or perhaps use GPS to check that the driver was plausibly at least in the same neighborhood as the building at the time of delivery, but they generally don't seem to do that?

I'd guess that a lot of the fake deliveries come from having some kind of quota, one that's difficult or impossible to achieve, combined with weak attempts at verifying that a delivery was done or even attempted.
^[return]
When I say they solved it, I mean that Amazon delivery drivers actually try to deliver the package maybe 95% of the time to the apartment buildings I've lived in, vs. about 25% for UPS and Fedex and much lower for USPS and Canada Post, if we're talking about big packages and not letters. ^[return]
Very fittingly for this post, I saw an external discussion on this exact thing where someone commented that it must've been quite expensive for the company to switch to the new system due to its known inefficiencies.

In true cocktail party efficient markets hypothesis form, an internet commenter replied that the company wouldn't have done it if it was inefficient and therefore it must not have been as inefficient as the first commenter thought.

I suspect I spent more time looking at software TCO than anyone else at the company and the system under discussion was notable for having one of the largest increases in cost of any system at the company without a concomitant increase in load. Unfortunately, the assumption that competition results in good internal decisions is just as false as the assumption that competition results in good external decisions.
^[return]
Note that if you click the link but don't click through to the main article, the person defending Kyle made the original quote seem more benign than it really is out of politeness because he elided the bit where the former Redis developer advocate (now "VP of community" for Zig) said that Jepsen is "ultimately not that different from other tech companies, and thus well deserving of boogers and cum". ^[return]