Why Hardware Development Is Hard, Part 2: The Physical World Is Unforgiving

Facebook. Twitter. Snapchat. Pinterest. Every day, you hear about another successful startup by kids who are just out of school, or even still in school. Hardware is different. You never hear about a new team successfully making a high-performance microprocessor.

Sure, PA Semi had a moderately successful exit, but where did that team come from? They were the SiByte team, which left after SiByte was acquired by Broadcom, and SiByte was composed of key people from DEC who had been working together for over a decade. My old company was similar: an IBM fellow collected the best people he worked with at IBM, became CTO at Dell (back when Dell still did interesting design work), then split off to create a chip startup. A hardware team where most of the people are smart new grads usually spends on the order of $100 million over five or six years, only to find that it doesn’t have a competitive product (or, more likely, doesn’t even have anything that’s close to working) [1].

“Smart and gets things done” has become the standard for software hiring, but that isn’t even enough for plumbing or carpentry. Next time you have a plumbing emergency, let me know how you pick a plumber. Do you hire the same way you do for software, taking the smart kid who’s read a few books, tried out some tools at Home Depot, and is a great hacker? Or do you go with the grizzled veteran with decades of experience?

Physical work isn’t the kind of thing you can derive from first principles, no matter how smart you are. Consider South Korea after WWII. Its GDP per capita was lower than that of Ghana or Kenya, and just barely above that of the Congo. For various reasons, the new regime didn’t have to deal with legacy institutions, and it wanted Korea to become a first-world nation.

The story I’ve heard is that the government started by subsidizing concrete. After many years making concrete, they wanted to move up the chain and start more complex manufacturing. They eventually got to building ships, because shipping was a critical part of the export economy they wanted to create.

They pulled in some of their best business people, who had learned skills like management and operations in other kinds of manufacturing. Those people knew they didn’t have the expertise to build ships themselves, so they contracted it out. They made the choice to work with Scottish firms, because Scotland has a long history of shipbuilding. Makes sense, right?

It didn’t work. For historical and geographic reasons, Scotland’s shipyards weren’t full-sized; they built their ships in two halves and then assembled them. Worked fine for them, because they’d been doing it at scale since the 1800s, and had world-renowned expertise by the 1900s. But when the unpracticed Koreans tried to build ships using Scottish plans and detailed step-by-step directions, the result was two ship halves that didn’t quite fit together and sank when assembled.

The Koreans eventually managed to start a shipbuilding industry by hiring foreign companies to come and build ships locally, showing people how it’s done. And it took decades to get what we would consider basic manufacturing working smoothly, even though all of the requisite knowledge existed in books, was taught in university courses, and could be had from experts for a small fee.

Today, anyone with a CS 101 background can take Geoffrey Hinton’s course on neural networks and deep learning, and start applying state-of-the-art machine learning techniques in production within a couple of months. In software land, you can fix minor bugs in real time. If it takes a whole day to run your regression test suite, you consider yourself lucky because it means you’re in one of the few environments that takes testing seriously. If the architecture is fundamentally flawed, you pull out your copy of Feathers’ “Working Effectively with Legacy Code” and you apply minor fixes until you’re done.

This isn’t to say that software isn’t hard; it’s just a different kind of hard: the sort of hard that can be attacked with genius and perseverance, even without experience. But, if you want to build a ship, and you “only” have a decade of experience with carpentry, milling, metalworking, etc., well, good luck. You’re going to need it. With a large ship, “minor” fixes can take days or weeks, and a fundamental flaw means that your ship sinks and you’ve lost half a year of work and tens of millions of dollars. By the time you get to something with the complexity of a modern high-performance microprocessor, a minor bug discovered in production costs three months and five million dollars. A fundamental flaw in the architecture will cost you five years and hundreds of millions of dollars [2].

Physical mistakes are costly. There’s no undo and editing isn’t simply a matter of pressing some keys; changes consume real, physical resources. You need enough wisdom and experience to avoid common mistakes entirely – especially the ones that can’t be fixed.

Part 1: Verilog is Weird

Part 2: The Physical World is Unforgiving

  1. Comparing my old company to another x86 startup founded within a year of ours is instructive. Both started at around the same time. Both had great teams of smart people. Our competitor even had famous software and business people on their side. But it’s notable that their hardware implementers weren’t a core team of multi-decade industry veterans who had worked together before. It took us about two years to get a working x86 chip, on $15M in funding. Our goal was to produce a low-cost chip and we nailed it. It took them five years, with over $250M in funding. Their original goal was to produce a high-performance low-power processor, but they missed their performance target so badly that they were forced into the low-cost space. They ended up with worse performance than us, with a chip that was 50% bigger (and hence cost more than 50% more to produce), using a team four times our size. They eventually went under, because there’s no way they could survive with 4x our burn rate and weaker performance. But not before burning through $969M in funding (including $230M from patent lawsuits).

  2. A funny side effect of the importance of experience is that age discrimination doesn’t affect the areas I’ve worked in. At 30, I’m bizarrely young for someone who’s done microprocessor design. The core folks at my old place were in their 60s. They’d picked up some younger folks along the way, but 30? Freakishly young. People are much younger at the new gig: I’m surrounded by ex-supercomputer folks from Cray and SGI, who are barely pushing 50, along with a couple kids from Synplify and DESRES who, at 40, are unusually young. Not all hardware folks are that old. In another arm of the company, there are folks who grew up in the FPGA world, which is a lot more forgiving. In that group, I think I met someone who’s only a few years older than me. Kidding aside, you’ll see younger folks doing RTL design on complex projects at large companies that are willing to spend a decade mentoring folks. But, at startups and on small hardware teams that move fast, it’s rare to hire someone into design who doesn’t have a decade of experience.

    There’s a crowd that’s even younger than the FPGA folks, even younger than me, working on Arduinos and microcontrollers, doing hobbyist electronics and consumer products. I’m genuinely curious how many of those folks will decide to work on large-scale systems design. In one sense, it’s inevitable, as the area matures, and solutions become more complex. The other sense is what I’m curious about: will the hardware renaissance spark an interest in supercomputers, microprocessors, and warehouse-scale computers?