What’s New in CPUs Since the 80s and How Does It Affect Programmers?

This is a response to the following question from David Albert:

My mental model of CPUs is stuck in the 1980s: basically boxes that do arithmetic, logic, bit twiddling and shifting, and loading and storing things in memory. I’m vaguely aware of various newer developments like vector instructions (SIMD) and the idea that newer CPUs have support for virtualization (though I have no idea what that means in practice).

What cool developments have I been missing? What can today’s CPU do that last year’s CPU couldn’t? How about a CPU from two years ago, five years ago, or ten years ago? The things I’m most interested in are things that programmers have to manually take advantage of (or programming environments have to be redesigned to take advantage of) in order to use and as a result might not be using yet. I think this excludes things like Hyper-threading/SMT, but I’m not honestly sure. I’m also interested in things that CPUs can’t do yet but will be able to do in the near future.

Julia Is Awesome, But…

Here’s a language that gives near-C performance, feels like Python or Ruby, has optional type annotations (which you can feed to one of two static analysis tools), has good support for macros, and has decent-ish support for FP, plus a lot more. What’s not to like? I’m mostly not going to talk about how great Julia is, though, because you can find plenty of blog posts that do that all over the internet.

The last time I used Julia (around Oct. 2014), I ran into two new (to me) bugs involving bogus exceptions when processing unicode strings. To work around those, I used a try/catch, but of course that runs into a non-deterministic bug I’ve found with try/catch. I also hit a bug where a function returned a completely wrong result if you passed it an argument of the wrong type instead of throwing a “no method” error. I spent half an hour writing a throwaway script and ran into four bugs in the core language.

The Performance Cost of Integer Overflow Checking

How much overhead should we expect from enabling integer overflow checks? Using a compiler flag or built-in intrinsics, we should be able to do the check with a conditional branch that branches based on the overflow flag that add and sub set. Code that looks like

add     %esi, %edi

should turn into something like

add     %esi, %edi
jo      <handle_overflow>

Assuming that branch is always correctly predicted (which should be the case for most code), the costs of the branch are the cost of executing that correctly predicted not-taken branch, the pollution the branch causes in the branch history table, and the cost of decoding the branch (on x86, jo and jno don’t fuse with add or sub, which means that on the fast path, the branch will take up one of the 4 ops that can come from the decoded instruction cache per cycle). That’s probably less than a 2x penalty per add or sub in the worst, front-end-limited case (which might happen in a tightly optimized loop, but should be rare in general), plus some nebulous penalty from branch history pollution which is really difficult to measure in microbenchmarks. Overall, we can use 2x as a pessimistic guess for the total penalty.
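For concreteness, here’s roughly what that check looks like from C. This is a minimal sketch using GCC/Clang’s __builtin_add_overflow (the intrinsic choice and the checked_add wrapper are mine, for illustration); compilers lower it to approximately the add/jo sequence above.

#include <limits.h>
#include <stdio.h>

/* Adds a and b, storing the (possibly wrapped) sum in *result.
   Returns nonzero iff the add overflowed; GCC and Clang compile this
   to an add followed by a branch on the overflow flag, as above. */
static int checked_add(int a, int b, int *result)
{
    return __builtin_add_overflow(a, b, result);
}

int main(void)
{
    int sum;
    if (checked_add(INT_MAX, 1, &sum)) {
        fprintf(stderr, "overflow!\n");
        return 1;
    }
    printf("%d\n", sum);
    return 0;
}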

2x sounds like a lot, but how much time do applications spend adding and subtracting? If we look at the most commonly used benchmark of “workstation” integer workloads, SPECint, the composition is maybe 40% load/store ops, 10% branches, and 50% other operations. Of the 50% “other” operations, maybe 30% are integer add/sub ops, which makes add/sub about 15% of all ops. If we guesstimate that load/store ops are 10x as expensive as add/sub ops, and that other ops are as expensive as add/sub, a 2x penalty on add/sub should result in a (40*10 + 10 + 50 + 15) / (40*10 + 10 + 50) ≈ 1.03 slowdown, i.e., a 3% penalty. That the penalty for a branch is 2x, that load/store ops are only 10x as expensive as add/sub ops, and that add/sub ops aren’t faster than the other “other” ops are all pessimistic assumptions, so this estimate should be on the high end for most workloads.

A Quick Tutorial on Implementing and Debugging Malloc, Free, Calloc, and Realloc

Let’s write a malloc and see how it works with existing programs!

This tutorial is going to assume that you know what pointers are, and that you know enough C to know that *ptr dereferences a pointer, ptr->foo means (*ptr).foo, that malloc is used to dynamically allocate space, and that you’re familiar with the concept of a linked list. For a basic intro to C, Pointers on C is one of my favorite books. If you want to look at all of this code at once, it’s available here.

Preliminaries aside, malloc’s function signature is

void *malloc(size_t size);

It takes as input a number of bytes and returns a pointer to a block of memory of that size.

There are a number of ways we can implement this. We’re going to arbitrarily choose to use the sbrk syscall. The OS reserves stack and heap space for processes and sbrk lets us manipulate the heap. sbrk(0) returns a pointer to the current top of the heap. sbrk(foo) increments the heap size by foo and returns a pointer to the previous top of the heap.
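To see the mechanics, here’s a minimal sketch of a malloc that can never free anything (the name nofree_malloc is mine; it’s a bump allocator, just enough to show how sbrk is used).

#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Bump allocator: grow the heap by `size` bytes and return the old top.
   There's no bookkeeping at all, so free can never reclaim anything. */
void *nofree_malloc(size_t size)
{
    void *p = sbrk(0);          /* current top of the heap */
    void *request = sbrk(size); /* grow the heap by size bytes */
    if (request == (void *)-1)
        return NULL;            /* sbrk failed: out of memory */
    assert(p == request);       /* not thread safe */
    return p;
}

Every call permanently burns at least size bytes, which is why a real malloc has to tag each block with metadata so that free and realloc can find and reuse blocks later.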

Markets Don’t Eliminate Discrimination

I literally can’t remember the last time I saw a public discussion of discrimination in tech where someone didn’t assert that discrimination is impossible because of the market. Here’s a quote from Marc Andreessen that sums up a common view.

Let’s launch right into it. I think the critique that Silicon Valley companies are deliberately, systematically discriminatory is incorrect, and there are two reasons to believe that that’s the case. … No. 2, our companies are desperate for talent. Desperate. Our companies are dying for talent. They’re like lying on the beach gasping because they can’t get enough talented people in for these jobs. The motivation to go find talent wherever it is is unbelievably high.

Marc Andreessen’s point is that the market is too competitive for discrimination to exist. But VC funded startups aren’t the first companies in the world to face a competitive hiring market. Consider the market for PhD economists from, say, 1958 to 1987. Alan Greenspan had this to say about how that market looked to his firm, Townsend-Greenspan.

What Do Linux Developers Say in Commit Messages?

I was curious what different people worked on in Linux, so I grabbed data from the current git repository to see what I could pull out of the commit messages. This doesn’t include history from before the switch to git, so it only goes back to 2005, but that’s still a decent chunk of history.

Here’s a list of the most commonly used words in commit messages from the four most frequent committers, with committers ordered by number of commits.

Given That We Aren’t Going to Put Any Effort Into Testing, What’s the Best Way to Test?

If I had to guess, I’d say I probably work around hundreds of bugs in an average week, and thousands in a bad week. It’s not unusual for me to run into a hundred new bugs in a single week. But I often get skepticism when I mention that I run into multiple new (to me) bugs per day, and that this is inevitable if we don’t change how we write tests. Well, here’s a log of one week of bugs, limited to bugs that were new to me that week. After a brief description of the bugs, I’ll talk about what we can do to improve the situation. The obvious answer is to spend more effort on testing, but everyone already knows we should do that and no one does it. That doesn’t mean it’s hopeless, though.

One week of bugs

Julia

Unicode sequence causes match/ismatch to blow up with a bounds error.

Unicode sequence causes using a string as a hash index to blow up with a bounds error.

Exception randomly not caught by catch. This sucks because putting things in a try/catch was the workaround for the two bugs above. I’ve seen other variants of this before; it’s possible this shouldn’t count as a new bug because it might be the same root cause as some bug I’ve already seen.

Speeding Up Octopress by 30x

I’ve seen all these studies that show how a 100ms improvement in page load time has a significant effect on page views, conversion rate, etc., but I’d never actually tried to optimize my site. This blog is a static Octopress site, hosted on GitHub Pages. Static sites are supposed to be fast, and GitHub Pages uses Fastly, which is supposed to be fast, so everything should be fast, right?

Not having done this before, I didn’t know what to do. But in a great talk on how the internet works, Dan Espeset suggested trying webpagetest; let’s give it a shot.

What Are the Odds the Build Actually Works?

I’ve noticed that builds are broken and tests fail a lot more often on open source projects than on “work” projects. I wasn’t sure how much of that was my perception vs. reality, so I grabbed the Travis CI data for a few popular categories on GitHub.