I’ve been reading a lot about software testing, lately. Coming from a hardware background (CPUs and hardware accelerators), it’s interesting how different software testing is. Bugs in software are much easier to fix, so it makes sense to spend a lot less effort spent on testing. because less effort is spent on testing, methodologies differ; software testing is biased away from methods with high fixed costs, towards methods with high variable costs. But that doesn’t explain all of the differences, or even most of the differences. Most of the differences come from a cultural path dependence, which shows how non-optimally test effort is allocated in both hardware and software.
I’ve been hearing this question a lot lately, and when I do, it reminds me how much I don’t know. Here are some questions this question brings to mind.
Most real-world problems are big enough that you can’t just head for the end goal, you have to break them down into smaller parts and set up intermediate goals. For that matter, most games are that way too. “Win” is too big a goal in chess, so you might have a subgoal like don’t get forked. While creating subgoals makes intractable problems tractable, it also creates the problem of determining the relative priority of different subgoals and whether or not a subgoal is relevant to the ultimate goal at all. In chess, there are libraries worth of books written on just that.
And chess is really simple compared to a lot of real world problems. 64 squares. 32 pieces. Pretty much any analog problem you can think of contains more state than chess, and so do a lot of discrete problems. Chess is also relatively simple because you can directly measure whether or not you succeeded (won). Many real-world problems have the additional problem of not being able to measure your goal directly.
Last week, I scheduled an appointment for an MRI. The MRI is for a jaw problem which makes it painful to talk. I was hoping that the scheduling would be easy, so I wouldn’t have to spend a lot of time talking on the phone. But, as is often the case when dealing with bureaucracy, it wasn’t easy.
Here are the steps it took.
- Have jaw pain.
- See dentist. Get referral for MRI when dentist determines that it’s likely to be a joint problem.
- Dentist gets referral form from the local health provider (let’s call them Local Health), faxes it to them according to the instructions on the form, and emails me a copy of the referral.
- Call Local Health.
- Local Health tries to schedule me for an MRI of my pituitary.
It’s generally accepted that any piece of software could be compromised with a backdoor. Prominent examples include the Sony/BMG installer, which had a backdoor built-in to allow Sony to keep users from copying the CD, which also allowed malicious third-parties to take over any machine with the software installed; the Samsung Galaxy, which has a backdoor that allowed the modem to access the device’s filesytem, which also allows anyone running a fake base station to access files on the device; Lotus Notes, which had a backdoor which allowed encryption to be defeated; and Lenovo laptops, which pushed all web traffic through a proxy (including HTTPS, via a trusted root certificate) in order to push ads, which allowed anyone with the correct key (which was distributed on every laptop) to intercept HTTPS traffic.
Despite sightings of backdoors in FPGAs and networking gear, whenever someone brings up the possibility of CPU backdoors, it’s still common for people to claim that it’s impossible. I’m not going to claim that CPU backdoors exist, but I will claim that the implementation is easy, if you’ve got the right access.
Does it make sense for me to run ads on my blog? I’ve been thinking about this lately, since Carbon Ads contacted me about putting an ad up. What are the pros and cons? This isn’t a rhetorical question. I’m genuinely interested in what you think.
This is a response to the following question from David Albert:
My mental model of CPUs is stuck in the 1980s: basically boxes that do arithmetic, logic, bit twiddling and shifting, and loading and storing things in memory. I’m vaguely aware of various newer developments like vector instructions (SIMD) and the idea that newer CPUs have support for virtualization (though I have no idea what that means in practice).
What cool developments have I been missing? What can today’s CPU do that last year’s CPU couldn’t? How about a CPU from two years ago, five years ago, or ten years ago? The things I’m most interested in are things that programmers have to manually take advantage of (or programming environments have to be redesigned to take advantage of) in order to use and as a result might not be using yet. I think this excludes things like Hyper-threading/SMT, but I’m not honestly sure. I’m also interested in things that CPUs can’t do yet but will be able to do in the near future.
Here’s a language that gives near-C performance that feels like Python or Ruby with optional type annotations (that you can feed to one of two static analysis tools) that has good support for macros plus decent-ish support for FP, plus a lot more. What’s not to like? I’m mostly not going to talk about how great Julia is, though, because you can find plenty of blog posts that do that all over the internet.
The last time I used Julia (around Oct. 2014), I ran into two new (to me) bugs involving bogus exceptions when processing unicode strings. To work around those, I used a try/catch, but of course that runs into a non-deterministic bug I’ve found with try/catch. I also hit a bug where a function returned a completely wrong result if you passed it an argument of the wrong type instead of throwing a “no method” error. I spent half an hour writing a throwaway script and ran into four bugs in the core language.
How much overhead should we expect from enabling integer overflow checks? Using a compiler flag or built-in intrinsics, we should be able to do the check with a conditional branch that branches based on the overflow flag that
sub set. Code that looks like
add %esi, %edi
should turn into something like
add %esi, %edi jo <handle_overflow>
Assuming that branch is always correctly predicted (which should be the case for most code), the costs of the branch are the cost of executing that correctly predicted not-taken branch, the pollution the branch causes in the branch history table, and the cost of decoding the branch (on x86,
jno don’t fuse with
sub, which means that on the fast path, the branch will take up one of the 4 opcodes that can be come from the decoded instruction cache per cycle). That’s probably less than a 2x penalty per
sub on front-end limited in the worst case (which might happen in a tightly optimized loop, but should be rare in general), plus some nebulous penalty from branch history pollution which is really difficult to measure in microbenchmarks. Overall, we can use 2x as a pessimistic guess for the total penalty.
2x sounds like a lot, but how much time do applications spend adding and subtracting? If we look at the most commonly used benchmark of “workstation” integer workloads, specint, the composition is maybe 40% load/store ops, 10% branches, and 50% other operations. Of the 50% “other” operations, maybe 30% of those are integer add/sub ops. If we guesstimate that load/store ops are 10x as expensive as add/sub ops, and other ops are as expensive as add/sub, a 2x penalty on add/sub should result in a
(40*10+10+50 + 12) / (40*10+10+50) = 3% penalty. That the penalty for a branch is 2x, that add/sub ops are only 10x slower than load/store ops, and that add/sub ops aren’t faster than other “other” ops are all pessimistic assumptions, so this estimate should be on the high end for most workloads.
Let’s write a malloc and see how it works with existing programs!
This tutorial is going to assume that you know what pointers are, and that you know enough C to know that
*ptr dereferences a pointer,
(*ptr).foo, that malloc is used to dynamically allocate space, and that you’re familiar with the concept of a linked list. For a basic intro to C, Pointers on C is one of my favorite books. If you want to look at all of this code at once, it’s available here.
Preliminaries aside, malloc’s function signature is
void *malloc(size_t size);
It takes as input a number of bytes and returns a pointer to a block of memory of that size.
There are a number of ways we can implement this. We’re going to arbitrarily choose to use the sbrk syscall. The OS reserves stack and heap space for processes and sbrk lets us manipulate the heap.
sbrk(0) returns a pointer to the current top of the heap.
sbrk(foo) increments the heap size by
foo and returns a pointer to the previous top of the heap.