I wonder how much of the speed change came from the difference in languages versus the difference in experience when writing both. The Ruby version was written prior to the Clojure one, so anything that was learned about git/programming during the Ruby writing would have been available during the Clojure writing. I can believe that Clojure would be faster; using concurrency well guarantees this somewhat. I still wonder what would have happened if they had written the Clojure version first without having written the Ruby version. Or written a Ruby version after the Clojure version.
Well, once I was done with writing code to cache a Subversion repo, it was clearly visible how much faster Clojure is.
Basically the same loop to go through all revisions in a repo was 15-20 times faster in Clojure than in Ruby. Very similar calls, pretty much the same algorithm. But something that Ruby does inside the bindings was not very efficient.
For Git the problem was a bit different. Internally Grit (the Git API library for Ruby) tries to read huge pack files (potentially hundreds of MBs) in pure Ruby. This just can't be fast.
The Clojure version also doesn't use any ORM for working with our caches, which removes some overhead. That probably contributed to the overall performance improvement, but the 20x for svn / 40x for git measurement was taken even before we got to saving the results into the DB, and it stayed that way later on.
According to the article, they were using ruby to bridge to a svn module that had git capability. They replaced that module with a native git library, called from Clojure.
In summary: benchmark, profile, find hotspots, optimize. Works in every language. >:3
This is more about the Ruby VM being much slower than the JVM than anything to do with Clojure. It would be somewhat interesting to see how JRuby 1.7 running their Ruby code but using the Java libraries would fare.
JRuby is still going to be slower than clojure, due to the semantics of ruby. All function calls are resolved at runtime, while clojure resolves as many as possible at compile time.
Foo::Bar.method() vs (foo.bar/method)
The ruby version does three loads from the heap, at runtime, while clojure does zero.
invokedynamic in JDK 7 will help narrow the gap, but the fact remains that Clojure was designed with more performance in mind than ruby.
I don't think this is quite right, although I'd be delighted to be corrected. Clojure still has to deref the var in order to call the function, and afaik this happens at runtime (hence the fact that you can redefine a function in the REPL).
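For what it's worth, the var indirection is easy to see for yourself. This is only a sketch with made-up names (greet and caller are not from the article), and it assumes default compiler settings, i.e. no direct linking:

```clojure
;; Hypothetical names; a sketch of var indirection, not code from the article.
(defn greet [] "hello")

;; caller is compiled once. It refers to the var #'greet,
;; not to the function value greet held at that moment.
(defn caller [] (greet))

(def before (caller))

;; Redefining greet changes the var's root binding; caller is NOT recompiled.
(defn greet [] "goodbye")

(def after (caller))

(println before after) ; prints: hello goodbye
```

So the call site looks the function up through the var each time it runs, which is what makes REPL redefinition work.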
Alioth benchmarks are out of date as far as Clojure is concerned. These days it's not much work to get Clojure to deliver performance identical to Java. Better versions of the same benchmarks, which show the same performance as Java and which you can verify yourself, are here - http://github.com/clojure/test.benchmark
>>out of date as far as Clojure is concerned<< What a pity that Clojure programs, written months ago to do benchmarks game tasks, have not been contributed to the benchmarks game for all to see!
Among others: because of completely arbitrary and I'd dare to say weird rules. Programs are not meant to be written in idiomatic language. They are meant to resemble the C implementation. In many cases you can see that the program representing the language in the comparison is neither the fastest nor the simplest of the submissions.
You've made a bunch of claims without showing any evidence to support them. Please point to specific examples we can all look at on the website which show "Programs are not meant to be..." and which show "They are meant to resemble the C implementation." etc.
I'm not going to back up these claims in any other way than pointing to the "interesting alternative" programs that appear in some comparisons, which tend to be faster than the chosen versions. For example here:
http://shootout.alioth.debian.org/u64q/benchmark.php?test=fa...
Both Lisp and Java alternatives are faster than the "fastest" solution (written in Fortran). Finding any more support for the years-old opinion I voiced seems too much work to be worth it.
So we can immediately dismiss your "Programs are not meant to be written in idiomatic language. They are meant to resemble the C implementation." claims as baseless.
Yes, you can. Just one last mention: my opinion was formed in the times I used Perl and looked around the actual implementations of these. But that was years ago. It was reinforced quite recently by someone whining (probably on HN) about the same things. Apart from these anecdotes, I have no basis for these words.
An algorithm that's idiomatic in one language might be terribly unidiomatic in another. (I'm sure you know this, but) Haskell's pervasive laziness makes some algorithms tractable while requiring a lot of space/time complexity for other algorithms.
The benchmarks game URL was posted by someone who has put a lot of energy into promoting Haskell - dons makes the best of the opportunity that the benchmarks game provides to promote Haskell.
>>pervasive laziness... requiring a lot of space/time complexity for other algorithms<<
And GHC provides strictness analysis and explicit strictness to avoid reducing performance.
Unfortunately the cost of developing this solution in C++ would be forbidding for us.
In our case on the pretty much same loop the difference between Ruby and Clojure was 20-30 times. Please note that using Clojure allowed us direct access to Java libraries, so essentially this is Java performance that we've been able to tap into using Clojure.
Nope. JRuby is not very efficient at calling Java code, or at least not as efficient as Clojure is, from what I've read.
Another plus for Clojure is that it allows us to easily have a multi-threaded implementation. JRuby allows native threads too, but Clojure makes concurrency really easy, while JRuby just wraps Java threads with nothing more, so I would have to use regular locking and stuff.
In my code, there are two kinds of places where I get a performance boost by using Clojure:
1. Micro-optimizations, mostly due to JVM and its excellent JIT (the garbage collectors are quite impressive, too, if what you need is predictable response time).
2. Architectural gains: thanks to Clojure's excellent concurrency support I can make much better use of multiple cores. I get more parallelism, hence better performance on the same hardware.
The first kind is cool, because you get it "for free". The second kind is the real game-changer, because non-parallel software only gets you so far in terms of performance, and writing concurrent software is Hard. Clojure makes it much, much easier.
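To give a rough idea of how little ceremony is involved, here is a minimal sketch using pmap from clojure.core. The process-revision function and the revision numbers are made up for illustration (a stand-in for slow per-revision work), not anything from the article:

```clojure
;; Hypothetical stand-in for slow per-revision work (e.g. reading one commit).
(defn process-revision [rev]
  (Thread/sleep 50)   ; simulate I/O or CPU time
  (* rev rev))

(def revisions (range 1 9))

;; Sequential version: one revision at a time.
(def sequential (time (doall (map process-revision revisions))))

;; Parallel version: pmap has the same shape as map, but runs the
;; function on several threads and still returns results in input order.
(def parallel (time (doall (pmap process-revision revisions))))

(println (= sequential parallel)) ; same results either way

;; pmap uses the future thread pool; shut it down so a script exits promptly.
(shutdown-agents)
```

No locks anywhere, because process-revision has no shared mutable state. For real workloads pmap only pays off when each item is expensive enough to outweigh the coordination cost.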
But overall I wouldn't say that Clojure is a performance demon on a single CPU. You can get performance similar to carefully written Java code. This is good, but you can always do better with C or hand-written assembly on critical sections. But that's not the main advantage: the big thing is that I can write correct Clojure code fast, it runs well enough, and I can easily make use of multiple cores. You can debate micro-benchmarks all you want, but what really counts for me is how quickly (and correctly) I can get from zero to production code that runs fast enough.
I've been playing with 4clojure[1] in my off time for weeks, and it's been a great introduction to the language, although it obviously doesn't help with learning how to package and deploy an application or service. How did you make the first steps from playing around in the REPL to writing production code?
Have you seen Leiningen[1]? It pretty much solves packaging and deploying problem. Awesome tool.
Also if you're into Web development, I can recommend playing with the Noir[2] framework and Korma[3] for SQL abstraction. Heroku also supports deploying Clojure apps out of the box, so you can easily use a free tier to get something out there.
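To make the Leiningen part concrete: a project is described by a project.clj file at its root. The skeleton below is only a sketch; the project name, namespace, and version numbers are placeholders I made up:

```clojure
;; project.clj (hypothetical project name, namespace, and versions)
(defproject repo-cache "0.1.0-SNAPSHOT"
  :description "Illustrative skeleton only"
  :dependencies [[org.clojure/clojure "1.4.0"]]
  ;; Namespace that defines -main; used by `lein run` and `lein uberjar`.
  :main repo-cache.core)
```

With that in place, `lein run` starts the app and `lein uberjar` builds a single standalone jar you can deploy wherever a JVM is available.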
I've got a question about Noir. Is it necessary for me to write HTML with Clojure, or can I separate it and write plain HTML somehow? Writing HTML in the middle of my Clojure code doesn't appeal to me coming from a MVC background...
Actually for a while I had a JS tool which I was using called "build.js" which would just build DOM components like this. (It was therefore JSON-serializable too, but I never really had an occasion to use that.)
There is something nice about HTML which is a bit lost here: HTML (and LaTeX for that matter) allow unquoted text, with fewer escape characters. They are markup languages, which s-expressions crucially lack. (On the other hand, XML & family lack the ability of Lisps to make the first token anything other than a symbol.) There was briefly a plot called NML / Enamel and some others -- DTML and TML I believe -- which would instead write:
This is actually an emulation of a C-type syntax with a Lisp-type semantics: the idea is that you have in some sense two syntactically different channels into your expressions, one which comes before the pipe character | and one that comes after. The stuff that comes after is allowed to be marked-up text; the stuff that comes before is some sort of node list, and perhaps has certain conventions (one could imagine instead using a Clojure-style `:type "text/javascript"`, which would limit you to what XML attributes can do -- short text only, symbolic keywords).
That is, one could hypothetically rewrite this in some C-ish syntax which would look like:
html {
  head { title { My page }
         script(type {text/javascript} src {/static/page.js}) {}}
  body { p { Stuff }}}
and again, if you wanted to limit yourself to XML, the parens above could then say instead `type: "text/javascript" src: "/static/page.js"` but it doesn't have to be that way.
Some crazy ideas for anyone building a new language to think about.
"The rewrite in Clojure resulted in much cleaner, faster code, totaling at only 700 lines of Clojure code (I don’t have a clear comparison with Ruby code here)."
Could you post the Clojure code somewhere? It would be very interesting to see clean, fast Clojure code written to solve a real-world problem (as opposed to some toy example).
beanstalkapp guys/girls, thanks for sharing! I'd be really interested to see some profiling results, were you bottlenecked on sha hashing? IO/execve syscalls? Memory usage?
I think the reason for the performance difference is pretty clear. According to the article and the grit documentation, all the git api calls were either done in pure ruby, or by shelling out to `git`. Also, reading between the lines, it sounds like they required some information/relationships on commits that was non-trivial to retrieve using a basic git shell command. So by switching out the runtime and the algorithm, they get a huge performance increase.
With jgit, they can more easily traverse the graph directly and efficiently. I'm really curious to see what kind of performance they could get from using FFI+jruby and raw c calls to libgit/libgit2 (http://libgit2.github.com/).
Clojure does have more sugar than Scheme, but imho some of it improves readability; for example, using brackets for grouping instead of overloading lists like Scheme does can make code easier to scan, because when you see parens in Clojure there are fewer meanings to choose from (usually only function application or a list literal). Example:
Scheme
  (let ((x 2) (y 3))
    (let ((x 7) (z (+ x y)))
      (* z x)))
Clojure
  (let [x 2 y 3]
    (let [x 7 z (+ x y)]
      (* z x)))
I think the Clojure version is easier to read without a paren-matching editor, though Scheme's rigorous minimalism does have its charm.
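One caveat for anyone pasting these into a REPL: the two snippets look alike but don't compute the same thing. Scheme's let binds in parallel, so z sees the outer x (2) and the result is 35. Clojure's let binds sequentially (like Scheme's let*), so z sees the new x (7) and the result is 70. A quick sketch, with result names of my own choosing:

```clojure
;; Clojure's let is sequential: z is computed after x has been rebound to 7.
(def clj-result
  (let [x 2 y 3]
    (let [x 7 z (+ x y)]
      (* z x))))          ; z = 7 + 3 = 10, so 10 * 7 = 70

;; To mirror the Scheme snippet, compute z before shadowing x.
(def scheme-like
  (let [x 2 y 3]
    (let [z (+ x y) x 7]
      (* z x))))          ; z = 2 + 3 = 5, so 5 * 7 = 35

(println clj-result scheme-like) ; prints: 70 35
```

The readability point still stands; it's just worth knowing the binding order differs.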
Pedantic note: #^ for metadata has been deprecated for at least 2 if not 3 versions. ^{ and ^:x aren't special. It's just ^ (reader metadata symbol) followed by a map and keyword respectively. Other things can also be metadata (for instance you can tag something with a classname to indicate a type hint).
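A small sketch of the three forms, using quoted symbols so you can inspect what the reader attached (the symbol and function names are arbitrary):

```clojure
;; ^ followed by a map: the map becomes the metadata.
(def from-map (meta '^{:doc "hi"} x))

;; ^ followed by a keyword: shorthand for {keyword true}.
(def from-keyword (meta '^:dynamic x))

;; ^ followed by a symbol (e.g. a class name): shorthand for {:tag symbol},
;; which the compiler treats as a type hint.
(def from-tag (meta '^String x))

(println from-map from-keyword from-tag)
;; prints: {:doc hi} {:dynamic true} {:tag String}

;; The type hint in use: lets the compiler avoid reflection on .toUpperCase.
(defn shout [^String s] (.toUpperCase s))
(println (shout "hi")) ; prints: HI
```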
You also missed a few of the syntax-quote features.
What is your objection to their choice here? It seems they are simply choosing the best tool for the job; it is not as if they are rewriting their whole web frontend in Clojure too.
> I’ve been looking for an excuse to use Clojure in a production environment for a while
Except in wanting to mess around with Clojure, you didn't use the best tool for the job.
The Clojure solution now uses parallel implementations of Git and SVN to solve the problem, rather than the core code of SVN and Git. And now you also have a one-off daemon written in Clojure. It doesn't have the same support structure, ops requirements, or anything, as your Ruby code. Virtually no one uses clojure, so hiring and training are different, etc. You've incurred a lot of overhead for something not that great.
The best tool for this job was to improve the Ruby version by way of C extensions, or write a new C command that does this work for you, linking in the Git and SVN code directly. This has little to no new concepts, is straightforward, and would have given you the best compatibility and performance.
The Clojure code is 700 lines, and pretty easy to grasp for anyone who's used Lisp before. We would have to implement a lot of C code to speed up both the Git and Svn bindings. And I seriously doubt that we would be able to debug this in a reasonable amount of time.
Lines of code is the new form of premature optimization. A C solution is better in performance and compatibility, but you've prematurely optimized on lines of code as your deciding factor (based on you mentioning it in the article, and again here).
Obviously, you can choose whatever you like, it's your company. I'm just explaining what the best solution was for the many students and young programmers who visit hacker news. Clojure is not it for all the reasons I mentioned. And if people think C is hard, practice it. Read Zed's book and do Project Euler problems with it until you feel comfortable with it. It's an essential tool everyone must know how to reach for.
Those same students and programmers should realize that yours is just one opinion among many. Your advice leans toward the "fungible cogs" school, where the capital crime is to expect others to learn something they don't already know.
That's still a widely held view, and maybe even a majority, but probably not on HN. (see pg's Beating the Averages essay:
http://www.paulgraham.com/avg.html)
> The best tool for this job was to improve the Ruby version by way of C extensions, or write a new C command that does this work for you, linking in the Git and SVN code directly. This has little to no new concepts, is straightforward, and would have given you the best compatibility and performance.
Favoring "little to no new concepts" at least suggests a bias against certain kinds of learning. Sometimes learning a new concept is the best way to solve a problem.
I think you are doing students and young programmers a disservice by implying that there is one "best" solution here, and that you know it. I am assuming you haven't seen the ruby or Clojure code involved, don't work at Beanstalk, don't know the skillset of their team, their budget, strategic plans, or any other one of a bunch of details that determine what is best in a particular situation. If you already know C then perhaps it would be right for you. That does not mean it is right for everyone.
> I think you are doing students and young programmers a disservice by implying that there is one "best" solution here, and that you know it.
I read their requirements in the blog post. They need fast access to SVN and Git internals to store off metadata. Based on that it's really not hard to determine what to use to solve this problem. There may not be one exact right answer (in fact, I suggested two), but there is a direction that makes sense and one that does not.
The real disservice to those coming up in this industry is reading blog post after blog post on Hacker News with people looking for "excuses" to use random languages they're interested in playing with. The answer to most of these problems is to use something really boring and straightforward, like C, C++ or Java. I guess I've worked too many places where legacy code becomes a huge burden because one guy wants to play around with X one time. The funny thing is, I see these same guys go onto the next place and do it there.
I'm failing to see why Clojure is a bad choice for this problem. It's a general-purpose programming language, just like others that you mentioned. And beyond those, it happens to have the benefits of being more expressive, testable, concise, etc.
Can you elaborate on why you think it's a bad choice?
Before Rails, no one used Ruby. The parallel implementations of svn and git you mention are heavily used in Eclipse and throughout the Java world. They gain access to all the advantages of the JVM. By using Clojure they probably attract a better standard of developer. Why bother introducing a buggy blob of C when they can do it this way?