I love reading postmortems. They’re educational, but unlike most educational docs, they tell an entertaining story. I’ve spent a decent chunk of time reading postmortems at both Google and Microsoft. I haven’t done any kind of formal analysis on the most common causes of bad failures (yet), but there are a handful of postmortem patterns that I keep seeing over and over again.
TIL that Bell Labs and a whole lot of other websites block archive.org, not to mention most search engines. Turns out I have a broken website link in a github repo, caused by the deletion of an old webpage. When I tried to pull the original from archive.org, I found that it’s not available because Bell Labs blocks the archive.org crawler in their robots.txt:
Here’s a conversation I keep having:
Someone: Did you hear that Facebook/Google uses a giant monorepo? WTF!
Me: Yeah! It’s really convenient, don’t you think?
Someone: That’s THE MOST RIDICULOUS THING I’ve ever heard. Don’t FB and Google know what a terrible idea it is to put all your code in a single repo?
Me: I think engineers at FB and Google are probably familiar with using smaller repos (doesn’t Junio Hamano work at Google?), and they still prefer a single huge repo for [reasons].
Someone: Oh that does sound pretty nice. I still think it’s weird but I could see why someone would want that.
“[reasons]” is pretty long, so I’m writing this down in order to avoid repeating the same conversation over and over again.
Why are people so concerned with hardware power consumption nowadays? Some common answers to this question are that power is critically important for phones, tablets, and laptops and that we can put more silicon on a modern chip than we can effectively use. In 2001 Patrick Gelsinger observed that if scaling continued at then-current rates, chips would have the power density of a nuclear reactor by 2005, a rocket nozzle by 2010, and the surface of the sun by 2015. Needless to say, that didn’t happen. The importance of portables and scaling limits are both valid and important reasons, but since they’re widely discussed, I’m going to talk about an underrated reason.
People often focus on the portable market because it’s cannibalizing desktop market, but that’s not the only growth market – servers are also becoming more important than desktops, and power is really important for servers. To see why power is important for servers, let’s look at some calculations about how what it costs to run a datacenter from Hennessy & Patterson.
It’s really common to see claims that some meme is backed by “studies” or “science”. But when I look at the actual studies, it usually turns out that the data are opposed to the claim. Here are the last few instances of this that I’ve run across.
I’ve been reading a lot about software testing, lately. Coming from a hardware background (CPUs and hardware accelerators), it’s interesting how different software testing is. Bugs in software are much easier to fix, so it makes sense to spend a lot less effort spent on testing. because less effort is spent on testing, methodologies differ; software testing is biased away from methods with high fixed costs, towards methods with high variable costs. But that doesn’t explain all of the differences, or even most of the differences. Most of the differences come from a cultural path dependence, which shows how non-optimally test effort is allocated in both hardware and software.