Here’s a language that gives near-C performance that feels like Python or Ruby with optional type annotations (that you can feed to one of two static analysis tools) that has good support for macros plus decent-ish support for FP, plus a lot more. What’s not to like? I’m mostly not going to talk about how great Julia is, though, because you can find plenty of blog posts that do that all over the internet.
The last time I used Julia (around Oct. 2014), I ran into two new (to me) bugs involving bogus exceptions when processing Unicode strings. To work around those, I used a try/catch, but of course that runs into a non-deterministic bug I’ve found with try/catch. I also hit a bug where a function returned a completely wrong result if you passed it an argument of the wrong type instead of throwing a “no method” error. I spent half an hour writing a throwaway script and ran into four bugs in the core language.
The second to last time I used Julia, I ran into too many bugs to list; the worst of them caused generating plots to take 30 seconds per plot, which caused me to switch to R/ggplot2 for plotting. First, there was a bug that caused plotting dates to stop working. When I worked around that, I ran into a regression that caused plotting to break large parts of the core language, so that data manipulation had to be done before plotting. That would have been fine if I knew exactly what I wanted, but for exploratory data analysis I want to plot some data, do something with the data, and then plot it again. Doing that required restarting the REPL for each new plot. That would have been fine, except that it takes 22 seconds to load Gadfly on my 1.7GHz Haswell (timed by using time on a file that loads Gadfly and does no work), plus another 10-ish seconds to load the other packages I was using, turning my plotting workflow into: restart REPL, wait 30 seconds, make a change, make a plot, look at a plot, repeat.
It’s not unusual to run into bugs when using a young language, but Julia has more than its share of bugs for something at its level of maturity. If you look at the test process, that’s basically inevitable.
As far as I can tell, FactCheck is the most commonly used thing resembling a modern test framework, and it’s barely used. Until quite recently, it was unmaintained and broken, but even now the vast majority of tests are written using @test, which is basically an assert. It’s theoretically possible to write good tests by having a file full of test code and asserts. But in practice, anyone who’s doing that isn’t serious about testing and isn’t going to write good tests.
Not only are existing tests not very good, most things aren’t tested at all. You might point out that the coverage stats for a lot of packages aren’t so bad, but last time I looked, there was a bug in the coverage tool that caused it to only aggregate coverage statistics for functions with non-zero coverage. That is to say, code in untested functions doesn’t count towards the coverage stats! That, plus the weak notion of test coverage that’s used (line coverage¹) make the coverage stats unhelpful for determining if packages are well tested.
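To make the effect of that aggregation bug concrete, here's a minimal Python sketch with made-up numbers. The function names and line counts are hypothetical, and line_coverage is an illustrative helper, not part of any real coverage tool; it only shows how leaving zero-coverage functions out of the total inflates the reported figure.

```python
# Hypothetical per-function data: name -> (lines_hit, total_lines).
functions = {
    "parse": (18, 20),
    "render": (9, 10),
    "export": (0, 40),   # never exercised by any test
    "migrate": (0, 30),  # never exercised by any test
}


def line_coverage(funcs, skip_untested=False):
    """Return line coverage as a fraction of lines hit.

    With skip_untested=True, functions with zero hits are left out of the
    total, mimicking the aggregation bug described above (an assumption
    about its shape, based only on the description in the text).
    """
    hit = 0
    total = 0
    for lines_hit, lines_total in funcs.values():
        if skip_untested and lines_hit == 0:
            continue
        hit += lines_hit
        total += lines_total
    return hit / total if total else 0.0


buggy = line_coverage(functions, skip_untested=True)  # 27 of 30 lines
honest = line_coverage(functions)                     # 27 of 100 lines
print(f"reported: {buggy:.0%}, actual: {honest:.0%}")
```

With these invented numbers, a package where most of the code has never been run by a test still reports 90% coverage, when only 27% of its lines were exercised.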
The lack of testing doesn’t just mean that you run into regression bugs. Features just disappear at random, too. When the REPL got rewritten a lot of existing shortcut keys and other features stopped working. As far as I can tell, that wasn’t because anyone wanted it to work differently. It was because there’s no way to re-write something that isn’t tested without losing functionality.
Something that goes hand-in-hand with the level of testing on most Julia packages (and the language itself) is the lack of a good story for error handling. Although you can easily use Nullable (the Julia equivalent of Some/None) or error codes in Julia, the most common idiom is to use exceptions. And if you use things in Base, like arrays or /, you’re stuck with exceptions. I’m not a fan, but that’s fine – plenty of reliable software uses exceptions for error handling.
The problem is that because the niche Julia occupies doesn’t care² about error handling, it’s extremely difficult to write a robust Julia program. When you’re writing smaller scripts, you often want to “fail-fast” to make debugging easier, but for some programs, you want the program to do something reasonable, keep running, and maybe log the error. It’s hard to write a robust program, even for this weak definition of robust. There are problems at multiple levels. For the sake of space, I’ll just list two.
If I’m writing something I’d like to be robust, I really want function documentation to include all exceptions the function might throw. Not only do the Julia docs not have that, it’s common to call some function and get a random exception that has to do with an implementation detail and nothing to do with the API interface. Everything I’ve written that actually has to be reliable has been exception free, so maybe that’s normal when people use exceptions? Seems pretty weird to me, though.
Another problem is that catching exceptions doesn’t work (sometimes, at random). I ran into one bug where using exceptions caused code to be incorrectly optimized out. You might say that’s not fair because it was caught using a fuzzer, and fuzzers are supposed to find bugs, but the fuzzer wasn’t fuzzing exceptions or even expressions. The implementation of the fuzzer just happens to involve eval’ing function calls, in a loop, with a try/catch to handle exceptions. Turns out, if you do that, the function might not get called. This isn’t a case of using a fuzzer to generate billions of tests, one of which failed. This was a case of trying one thing and having it fail. That bug is now fixed, but there’s still a nasty bug that causes exceptions to sometimes fail to be caught by catch, which is pretty bad news if you’re putting something in a try/catch block because you don’t want an exception to trickle up to the top level and kill your program.
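As a rough illustration of the weak notion of robustness described above (do something reasonable, keep running, log the error), here's a minimal Python sketch of a loop that calls functions inside a try/except. It's an analogue in another language, not the Julia fuzzer mentioned above; run_all and the sample tasks are hypothetical names made up for this example.

```python
import logging

logger = logging.getLogger("robust_loop")


def run_all(tasks):
    """Call each (name, task) pair; on failure, log the error and keep going.

    Returns (results, failures) so the caller can see what was skipped.
    """
    results = []
    failures = []
    for name, task in tasks:
        try:
            results.append((name, task()))
        except Exception as exc:
            # Log and continue instead of letting the error reach the top level.
            logger.error("task %s failed: %r", name, exc)
            failures.append((name, exc))
    return results, failures


# Hypothetical tasks: the middle one raises ZeroDivisionError.
tasks = [
    ("ok", lambda: 1 + 1),
    ("div", lambda: 1 / 0),
    ("also_ok", lambda: "done"),
]
results, failures = run_all(tasks)
```

The pattern only holds up if every call actually runs and every exception actually reaches the handler, which is exactly what the two bugs above call into question.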
When I grepped through Base to find instances of actually catching an exception and doing something based on the particular exception, I could only find a single one. Now, it’s me scanning grep output in less, so I might have missed some instances, but it isn’t common, and grepping through common packages finds a similar ratio of error handling code to other code. Julia folks don’t care about error handling, so it’s buggy and incomplete. I once asked about this and was told that it didn’t matter that exceptions didn’t work because you shouldn’t use exceptions anyway – you should use Erlang style error handling where you kill the entire process on an error and build transactionally robust systems that can survive having random processes killed. Putting aside the difficulty of that in a language that doesn’t have Erlang’s support for that kind of thing, you can easily spin up a million processes in Erlang. In Julia, if you load just one or two commonly used packages, firing up a single new instance of Julia can easily take half a minute or a minute. Spinning up a million independent instances one after another would take about a year at 30 seconds apiece, and about two years at a minute apiece.
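Working through that arithmetic as a quick Python sketch: it assumes the instances are started one after another, and the constant and function names are made up for illustration. At 30 seconds per instance, a million launches come to a bit under a year; the two-year figure corresponds to roughly a minute per launch.

```python
SECONDS_PER_DAY = 24 * 60 * 60
INSTANCES = 1_000_000


def sequential_startup_days(seconds_per_instance):
    """Days needed to start INSTANCES processes one after another."""
    return INSTANCES * seconds_per_instance / SECONDS_PER_DAY


days_30 = sequential_startup_days(30)  # about 347 days
days_60 = sequential_startup_days(60)  # about 694 days

print(f"30 s each: {days_30:.0f} days ({days_30 / 365:.2f} years)")
print(f"60 s each: {days_60:.0f} days ({days_60 / 365:.2f} years)")
```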
Since we’re broadly on the topic of APIs, error conditions aren’t the only place where the Base API leaves something to be desired. Conventions are inconsistent in many ways, from function naming to the order of arguments. Some methods on collections take the collection as the first argument and some don’t (e.g., replace takes the string first and the regex second, whereas match takes the regex first and the string second).
Base APIs outside of the niche Julia targets often don’t make sense. There are too many examples to list them all, but consider this one: the UDP interface throws an exception on a partial packet. This is really strange and also unhelpful. Multiple people stated that on this issue but the devs decided to throw the exception anyway. The Julia implementers have great intuition when it comes to linear algebra and other areas they’re familiar with. But they’re only human and their intuition isn’t so great in areas they’re not familiar with. The problem is that they go with their intuition anyway, even in the face of comments about how that might not be the best idea.
Another thing that’s an issue for me is that I’m not in the audience the package manager was designed for. It’s backed by git in a clever way that lets people do all sorts of things I never do. The result of all that is that it needs to do git status on each package when I run Pkg.status(), which makes it horribly slow; most other Pkg operations I care about are also slow for a similar reason.
That might be ok if it had the feature I most wanted, which is the ability to specify exact versions of packages and have multiple, conflicting, versions of packages installed³. Because of all the regressions in the core language libraries and in packages, I often need to use an old version of some package to make some function actually work, which can require old versions of its dependencies. There’s no non-hacky way to do this.
Since I’m talking about issues where I care a lot more than the core devs, there’s also benchmarking. The website shows off some impressive sounding speedup numbers over other languages. But they’re all benchmarks that are pretty far from real workloads. Even if you have a strong background in workload characterization and systems architecture (computer architecture, not software architecture), it’s difficult to generalize performance results on anything resembling a real workload from microbenchmark numbers. From what I’ve heard, performance optimization of Julia is done from a larger set of similar benchmarks, which has problems for all of the same reasons. Julia is actually pretty fast, but this sort of ad hoc benchmarking basically guarantees that performance is being left on the table. Moreover, the benchmarks are written in a way that stacks the deck against other languages. People from other language communities often get rebuffed when they submit PRs to rewrite the benchmarks in their languages idiomatically. The Julia website claims that “all of the benchmarks are written to test the performance of specific algorithms, expressed in a reasonable idiom”, and that making adjustments that are idiomatic for specific languages would be unfair. However, if you look at the Julia code, you’ll notice that it’s written in a way that avoids doing any of a number of things that would crater performance. If you follow the mailing list, you’ll see that there are quite a few intuitive ways to write Julia code that has very bad performance. The Julia benchmarks avoid those pitfalls, but the code for other languages isn’t written with anywhere near that care; in fact, it’s just the opposite.
I’ve just listed a bunch of issues with Julia. I believe the canonical response for complaints about an open source project is, why don’t you fix the bugs yourself, you entitled brat? Well, I tried that. For one thing, there are so many bugs that I often don’t file bugs, let alone fix them, because it’s too much of an interruption. But the bigger issue is the barriers to new contributors. I spent a few person-days fixing bugs (mostly debugging, not writing code) and that was almost enough to get me into the top 40 on GitHub’s list of contributors. My point isn’t that I contributed a lot. It’s that I didn’t, and that still put me right below the top 40.
There’s lots of friction that keeps people from contributing to Julia. The build is often broken or has failing tests. When I polled Travis CI stats for languages on GitHub, Julia was basically tied for last in uptime. This isn’t just a statistical curiosity: the first time I tried to fix something, the build was non-deterministically broken for the better part of a week because someone checked bad code directly into master without review. I spent maybe a week fixing a few things and then took a break. The next time I came back to fix something, tests were failing for a day because of another bad check-in and I gave up on the idea of fixing bugs. That tests fail so often is even worse than it sounds when you take into account the poor test coverage. And even when the build is “working”, it uses recursive makefiles, and often fails with a message telling you that you need to run make clean and build again, which takes half an hour. When you do so, it often fails with a message telling you that you need to make clean all and build again, which takes an hour. And then there’s some chance that will fail and you’ll have to manually clean out deps and build again, which takes even longer. And that’s the good case! The bad case is when the build fails non-deterministically. These are well-known problems that occur when using recursive make, described in Recursive Make Considered Harmful circa 1997.
And that’s not even the biggest barrier to contributing to core Julia. The biggest barrier is that the vast majority of the core code is written with no markers of intent (comments, meaningful variable names, asserts, meaningful function names, explanations of short variable or function names, design docs, etc.). There’s a tax on debugging and fixing bugs deep in core Julia because of all this. I happen to know one of the Julia core contributors (presently listed as the #2 contributor by GitHub’s ranking), and when I asked him about some of the more obtuse functions I was digging around in, he couldn’t figure it out either. His suggestion was to ask the mailing list, but for the really obscure code in the core codebase, there’s perhaps one to three people who actually understand the code, and if they’re too busy to respond, you’re out of luck.
I don’t mind spending my spare time working for free to fix other people’s bugs. In fact, I do quite a bit of that and it turns out I often enjoy it. But I’m too old and crotchety to spend my leisure time deciphering code that even the core developers can’t figure out because it’s too obscure.
None of this is to say that Julia is bad, but the concerns of the core team are pretty different from my concerns. This is the point in a complain-y blog post where you’re supposed to suggest an alternative or make a call to action, but I don’t know that either makes sense here. The purely technical problems, like slow load times or the package manager, are being fixed or will be fixed, so there’s not much to say there. As for process problems, like not writing tests, not writing internal documentation, and checking unreviewed and sometimes breaking changes directly into master, well, that’s “easy”⁴ to fix by adding a code review process that forces people to write tests and documentation for code, but that’s not free.
A small team of highly talented developers who can basically hold all of the code in their collective heads can make great progress while eschewing anything that isn’t just straight coding at the cost of making it more difficult for other people to contribute. Is that worth it? It’s hard to say. If you have to slow down Jeff, Keno, and the other super productive core contributors and all you get out of it is a couple of bums like me, that’s probably not worth it. If you get a thousand people like me, that’s probably worth it. The reality is in the ambiguous region in the middle, where it might or might not be worth it. The calculation is complicated by the fact that most of the benefit comes in the long run, whereas the costs are disproportionately paid in the short run. I once had an engineering professor who claimed that the answer to every engineering question is “it depends”. What should Julia do? It depends.
Update: this post was edited a bit to remove a sentence about how friendly the Julia community is since that no longer seemed appropriate in light of recent private and semi-private communications from one of the co-creators of Julia. They were, by far, the nastiest and most dishonest responses I’ve ever gotten to any blog post. Some of those responses were on a private discussion channel; multiple people later talked to me about how shocked they were at the sheer meanness and dishonesty of the responses. Oh, and there’s also the public mailing list. The responses there weren’t in the same league, but even so, I didn’t stick around long since I unsubscribed when one of the Julia co-creators responded with something bad enough that it prompted someone else to suggest sticking to the facts and avoiding attacks. That wasn’t the first attack, or even the first one to prompt someone to respond and ask that people stay on topic; it just happened to be the one that made me think that we weren’t going to have a productive discussion. I extended an olive branch before leaving, but who knows what happened there?
Update 2, 1 year later: The same person who previously attacked me in private is now posting heavily edited and misleading excerpts in an attempt to discredit this post. I’m not going to post the full content in part because it’s extremely long, but mostly because it’s a gross violation of that community’s norms to post internal content publicly. If you know anyone in the RC community who was there for the discussion before the edits and you want the truth, ask your RC buddy for their take. If you don’t know any RC folks, consider that my debate partner’s behavior was so egregious that multiple people asked him to stop, and many more people messaged me privately to talk about how inappropriate his behavior was. If you compare that to what’s been publicly dredged up, you can get an idea of both how representative the public excerpts are and of how honest the other person is being.
Aside from that, they also claim that the issues in this post have been addressed, and that (for example) test coverage is now good. But if you look at any thread about this, people find that they still run into a lot of bugs. I hear about this all the time because I know one of the co-authors of the O’Reilly Learning Julia book and they have to re-write examples to work around core bugs all the time. And that’s for basic examples for an intro book, stuff you’d expect to work because it’s so simple. This isn’t to say that Julia is a bad language, but there’s a trade-off between moving fast and breaking things. If you’re ok with the trade-off that Julia makes, that’s great! But you shouldn’t believe the claims that there’s no trade-off and that everything is great.
In retrospect, the initial response to my post from the Julia community is pretty amusing. Here’s a response from a core member of the Julia community that’s representative of the response at the time:
Other folks thought it was great that the community responded to criticism so well. And then one of the co-creators of Julia goes around for a year telling people that issues that haven’t been fixed have been fixed while also dredging up an extremely misleading and heavily edited set of quotes from private communications to discredit me. If that’s what it looks like when the community responds well, I’m afraid to even ask about what it looks like when it doesn’t respond well.
Thanks (or anti-thanks) to Leah Hanson for pestering me to write this for the past few months. It’s not the kind of thing I’d normally write, but the concerns here got repeatedly brushed off when I brought them up in private. For example, when I brought up testing, I was told that Julia is better tested than most projects. While that’s true in some technical sense (the median project on GitHub probably has zero tests, so any non-zero number of tests is above average), I didn’t find that to be a meaningful rebuttal (as opposed to a reply that Julia is still expected to be mostly untested because it’s in an alpha state). After getting a similar response on a wide array of topics I stopped using Julia. Normally that would be that, but Leah really wanted these concerns to stop getting ignored, so I wrote this up.
Thanks to Leah Hanson, Julia Evans, Joe Wilder, and Eddie V. for editor-esque comments that caused me to make some changes to this post. I really should have incorporated more of Julia Evans’s feedback but I got distracted by the happenings that caused the update above, and I’ve now forgotten most of it. Also, thanks to David Andrzejewski for reminding me that it was Morrow who drilled “it depends” into my brain.