Why Hardware Development Is Hard, Part 1: Verilog Is Weird

Verilog is the most commonly used language for hardware design in America (VHDL is more common in Europe). Too bad it’s so baroque. If you ever browse the Verilog questions on Stack Overflow, you’ll find a large number of questions, usually downvoted, asking “why doesn’t my code work?”, with code that’s not just a little off, but completely wrong.


Let's look at an example: "Idea is to store value of counter at the time of reset … I get DRC violations and the memory, bufreadaddr, bufreadval are all optimized out."

```verilog
always @(negedge reset or posedge clk) begin
  if (reset == 0) begin
    d_out <= 16'h0000;
    d_out_mem[resetcount] <= d_out;   // intended: capture the count when reset hits
    laststoredvalue <= d_out;
  end else begin
    d_out <= d_out + 1'b1;
  end
end

always @(bufreadaddr)                 // note: sensitive to bufreadaddr only
  bufreadval = d_out_mem[bufreadaddr];
```

We want a counter that keeps track of how many cycles it's been since reset, and we want to store that value in an array-like structure that's indexed by resetcount. If you've read a bit about the semantics of Verilog, this is a perfectly natural way to solve the problem. Our poster knows enough about Verilog to use non-blocking assignments ('<=') for state elements, so that all of the elements are updated at the same time. Every time there's a clock edge, we'll increment d_out. When reset goes to 0, we'll store the current count and reset d_out. What could possibly go wrong?

The problem is that Verilog was originally designed as a language to describe simulations, so it has constructs to describe arbitrary interactions between events. When X transitions from 0 to 1, do Y. Great! Sounds easy enough. But then someone had the bright idea of using Verilog to represent hardware. The vast majority of statements you could write down don't translate into any meaningful hardware. Your synthesis tool, which translates from Verilog to hardware, will helpfully pattern match to the closest available thing, or produce nothing at all if you write down something untranslatable. If you're lucky, you might get some warnings.
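
To make the mismatch concrete, here's a made-up fragment (not from the post, with hypothetical signals start and pulse) that's perfectly legal simulation Verilog but describes no buildable circuit:

```verilog
// Legal in simulation, meaningless as hardware: "#5" means "wait 5
// simulation time units", and no gate implements a wait. Most synthesis
// tools silently discard the delays (with a warning, if you're lucky).
always @(posedge start) begin
  #5 pulse = 1'b1;
  #5 pulse = 1'b0;
end
```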

Looking at the code above, the synthesis tool will see that there's something called d_out that should be a clocked element: it's incremented on each clock edge when reset isn't asserted, and asynchronously reset otherwise. That's a legit hardware construct, so it will produce an N-bit [flip-flop](http://en.wikipedia.org/wiki/Flip-flop_(electronics)) and some logic to make it a counter that gets reset to 0.
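
Stripped of the extra assignments, the counter part of the block matches the textbook async-reset register idiom, which is roughly what the tool will infer; a sketch:

```verilog
// The pattern the synthesis tool recognizes: a counter with an
// asynchronous active-low reset.
always @(negedge reset or posedge clk) begin
  if (reset == 0)
    d_out <= 16'h0000;      // clears immediately when reset falls
  else
    d_out <= d_out + 1'b1;  // counts up on each rising clock edge
end
```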

Now, what about the value we're supposed to store on reset? Well, the synthesis tool will see that it's inside a clocked block, but it's not supposed to change on the clock edge, only when reset is asserted. That's pretty unusual. What's going to happen? Well, that depends on which version of which synthesis tool you're using, and how the programmers of that tool decided to implement undefined behavior.
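
One conventional way to get the intended capture-on-reset behavior is to make the reset synchronous, so the store happens on a clock edge like every other state update. A sketch (like the original snippet, it leaves out the logic that maintains resetcount):

```verilog
// With a synchronous reset, the first clock edge after reset falls still
// sees the old count, so capturing it is an ordinary clocked assignment.
always @(posedge clk) begin
  if (reset == 0) begin
    laststoredvalue       <= d_out;    // capture the final count
    d_out_mem[resetcount] <= d_out;    // store it, indexed by reset number
    d_out                 <= 16'h0000; // then clear the counter
  end else begin
    d_out <= d_out + 1'b1;
  end
end
// (If reset can be held low for several cycles, you'd also want to
// qualify these writes so only the first reset cycle stores a value.)
```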

And then there’s the block that’s supposed to read out the stored value. It looks like the intent is to create a 64:1 MUX. Putting aside the cycle time issues you’ll get with such a wide MUX, the block isn’t clocked, so the synthesis tool will have to infer some sort of combinational logic. But, the output is only supposed to change if bufreadaddr changes, and not if d_out_mem changes. It’s quite easy to describe that in our simulation language, the but the synthesis tool is going to produce something that is definitely not what the user wants here. Not to mention that laststoredvalue isn’t meaningfully connected to bufreadvalue.

How is it possible that a reasonable description of something in Verilog turns into something completely wrong in hardware? You can think of hardware as some state, with pure functions connecting the state elements. This makes it natural to think about modeling hardware in a functional programming language. Another natural way to think about it would be with OO. Classes describe how the hardware works. Instances of the class are actual hardware that will get put onto the chip. Yet another natural way to describe things would be declaratively, where you write down constraints the hardware must obey, and the synthesis tool outputs something that meets those constraints.
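
To make the first of those views concrete: each state element is a register fed by a pure next-state function, and you can at least approximate that discipline in Verilog by separating the two, as in this sketch (reusing the post's signal names, with the reset made synchronous):

```verilog
// A pure function of the current state and inputs...
wire [15:0] d_out_next = (reset == 0) ? 16'h0000 : d_out + 1'b1;

// ...feeding a plain state element. Only this register holds state.
always @(posedge clk)
  d_out <= d_out_next;
```

Nothing in Verilog enforces that separation, though; it's a discipline you have to impose on yourself.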

Verilog does none of these things. To write Verilog that will produce correct hardware, you have to first picture the hardware you want to produce. Then, you have to figure out how to describe that in this weird C-like simulation language. That will then get synthesized into something like what you were imagining in the first step.

As a software engineer, how would you feel if 99% of valid Java code ended up being translated to something that produced random results, even though tests pass on the untranslated Java code? And, by the way, to run tests on the translated Java code you have to go through a multi-day compilation process, after which your tests will run 200 million times slower than code runs in production. If you're thinking of testing on some sandboxed production machines, sure, go ahead, but it costs 8 figures to push something to any number of your production machines, and it takes 3 months. But, don't worry, you can run the untranslated code only 2 million times slower than in production [1]. People used to statically typed languages often complain that you get run-time errors about things that would be trivial to statically check in a language with stronger types. We hardware folks are so used to the vast majority of legal Verilog constructs producing unsynthesizable garbage that we don't find it the least bit surprising that not only do you not get compile-time errors, you don't even get run-time errors, from writing naive Verilog code.

I won’t even get into how Verilog is so unexpressive that many companies use an ad hoc tool to embed a scripting language in Verilog or generate Verilog from a scripting language.

There have been a number of attempts to do better than jamming an ad hoc scripting language into Verilog, but they've all fizzled out. As a functional language that's easy to add syntax to, Haskell is a natural choice for Verilog code generation; it spawned ForSyDe, Hydra, Lava, HHDL, and Bluespec. But adoption of ForSyDe, Hydra, Lava, and HHDL is pretty much zero, not because of deficiencies in the languages, but because it's politically difficult to get people to use a Haskell-based language. Bluespec has done better, but they've done it by making their language look C-like, scrapping the original Haskell syntax and introducing Bluespec SystemVerilog and Bluespec SystemC. The aversion to Haskell is so severe that when we discussed a hardware style at my new gig, one person suggested banning any Haskell-based solution, even though Bluespec has been used to good effect in a couple of projects within the company.

Scala-based solutions look more promising, not for any technical reason, but because Scala is less scary. Scala has managed to bring the modern world (in terms of type systems) to more programmers than ML, OCaml, Haskell, Agda, etc., combined. Perhaps the same will be true in the hardware world. Chisel is interesting. Like Bluespec, it simulates much more quickly than Verilog, and unsynthesizable representations are syntax errors. It's not as high-level, but it's the only hardware description language with a modern type system that I've been able to discuss with hardware folks without people objecting that Haskell is a bad idea.

Commercial vendors are mostly moving in the other direction, because C-like languages make people feel all warm and fuzzy. A number of them are pushing high-level hardware synthesis from SystemC, or even straight C or C++. These solutions are also politically difficult to sell, but this time it's the history of the industry, and not the language. Vendors pushing high-level synthesis have a decades-long track record of overpromising and underdelivering. I've lost track of the number of times I've heard people dismiss modern offerings with "Why should we believe they're for real this time?"

What’s the future? Locally, I’ve managed to convince a couple of people on my team that Chisel is worth looking at. At the moment, none of the Haskell based solutions are even on the table. I’m open to suggestions.

Part 1: Verilog is Weird

Part 2: The Physical World is Unforgiving

[1] Approximate numbers from the last chip I worked on. We had licenses for both major commercial simulators, and we were lucky to get 500Hz, pre-synthesis, on the faster of the two, for a chip that ran at 2GHz in silicon. Don't even get me started on open source simulators. The speed is at least 10x better for most ASIC work. Also, you can probably do synthesis much faster if you don't have timing/parasitics extraction baked into the process.

P.S. Dear hardware folks, sorry for oversimplifying so much. I started writing footnotes explaining everything I was glossing over until I realized that my footnotes were longer than the post. The culled footnotes may make it into their own blog posts some day. A very long footnote that I'll briefly summarize is that semantically correct Verilog simulation is inherently slower than something like Bluespec or Chisel because of the complications involved with the event model. EDA vendors have managed to get decent performance out of Verilog, but only by hiring large teams of the best simulation people in the world to hammer at the problem, the same way JavaScript is fast not because of any property of the language, but because there are amazing people working on the VMs. It should tell you something when a tiny team working on a shoestring grant-funded budget can produce a language and simulation infrastructure that smokes existing tools.

I’ve changed domains, so the auto-generated twitter and HN links for this page are off. Here are the original ones: