Our experience using Clojure to speed up Beanstalk (beanstalkapp.com)
126 points by dsabanin on May 29, 2012 | 70 comments



I wonder how much of the speed change came from the difference in languages versus the difference in experience when writing both. The Ruby version was written prior to the Clojure one, so anything that was learned about git/programming during the Ruby writing would have been available during the Clojure writing. I can believe that Clojure would be faster; using concurrency well goes a long way toward guaranteeing that. I still wonder what would have happened if they had written the Clojure version first without having written the Ruby version, or written a Ruby version after the Clojure version.


Well, once I was done with writing code to cache a Subversion repo, it was clearly visible how much faster Clojure is.

Basically the same loop to go through all revisions in a repo was 15-20 times faster in Clojure than in Ruby. Very similar calls, pretty much the same algorithm. But something that Ruby does inside the bindings was not very efficient.

For Git the problem was a bit different. Internally Grit (the Git API library for Ruby) tries to read huge pack files (potentially hundreds of MBs) in pure Ruby. This just can't be fast.

The Clojure version also avoids the overhead of an ORM when operating on our caches. That probably contributed to the overall performance improvement, but the 20x for svn / 40x for git measurements were taken before we got to saving the results into the DB, and they stayed that way later on.


According to the article, they were using Ruby to bridge to a svn module that had git capability. They replaced that module with a native git library, called from Clojure.

In summary: benchmark, profile, find hotspots, optimize. Works in every language. >:3


Sorry if it sounded confusing, but we had two modules: one for svn, one for git, there was no intersection between those.

But I agree, we've used Clojure (and the JVM) at the exact point where it would bring the most speedup to our app.


This is more about the Ruby VM being much slower than the JVM than anything to do with Clojure. It would be somewhat interesting to see how JRuby 1.7 running their Ruby code but using the Java libraries would fare.


JRuby is still going to be slower than clojure, due to the semantics of ruby. All function calls are resolved at runtime, while clojure resolves as many as possible at compile time.

Foo::Bar.method() vs (foo.bar/method)

The ruby version does three loads from the heap, at runtime, while clojure does zero.

invokedynamic in JDK 7 will help narrow the gap, but the fact remains that Clojure was designed with more performance in mind than ruby.


I don't think this is quite right, although I'd be delighted to be corrected. Clojure still has to deref the var in order to call the function, and afaik this happens at runtime (hence the fact that you can redefine a function in the REPL).


In recent versions of Clojure, vars are not by default dynamically bound.
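A small REPL sketch of both points, assuming Clojure 1.3 or later with default compiler settings (direct linking, which arrived later in 1.8 and is off by default, changes the first half when enabled). The names `greet`, `caller`, `*level*` and `plain` are made up for illustration:

```clojure
;; Hypothetical names; assumes Clojure 1.3+ without direct linking.
(defn greet [] "hello")
(defn caller [] (greet))          ; the call goes through the var #'greet

(caller)                          ; "hello"

;; Redefining the root value is visible to existing callers,
;; because the var is dereferenced when `caller` runs.
(defn greet [] "goodbye")
(caller)                          ; "goodbye"

;; Thread-local rebinding with `binding` only works for ^:dynamic vars.
(def ^:dynamic *level* 1)
(binding [*level* 2] *level*)     ; 2

;; A plain var cannot be rebound this way: IllegalStateException at runtime.
(def plain 1)
(try
  (binding [plain 2] plain)
  (catch IllegalStateException _ :not-dynamic))   ; :not-dynamic
```

So "not dynamically bound by default" removes the per-thread binding lookup, but a call through a var still sees a redefinition made at the REPL.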


http://shootout.alioth.debian.org/u64q/benchmark.php?test=al...

Median speedup for the same problem in Clojure was 12x over Ruby, and 14x for Haskell ... and 25x in Java ... and 35x in C++ ...


Alioth benchmarks are out of date as far as Clojure is concerned. These days it's not much work to get Clojure to deliver performance identical to Java. Better versions of the same benchmarks, which show the same performance as Java and which you can verify yourself, are here: http://github.com/clojure/test.benchmark


>>out of date as far as Clojure is concerned<< What a pity that Clojure programs, written months ago to do benchmarks game tasks, have not been contributed to the benchmarks game for all to see!


All in good time Isaac :)


Let's hope the benchmarks game website is still being updated when that time arrives ;)


What are you two plotting?!


Programming language benchmark data.


"The Computer Language Benchmarks Game" has never been a valid benchmark for any language or implementation.


Why?


Among others: because of completely arbitrary and I'd dare to say weird rules. Programs are not meant to be written in idiomatic language. They are meant to resemble the C implementation. In many cases you can see that the program representing the language in the comparison is neither the fastest nor the simplest of the submissions.


You've made a bunch of claims without showing any evidence to support them. Please point to specific examples we can all look at on the website which show that "Programs are not meant to be..." or that "They are meant to resemble the C implementation." etc


I'm not going to back up these claims in any other way than pointing to the "interesting alternative" programs that appear in some comparisons, which tend to be faster than the chosen versions. For example here: http://shootout.alioth.debian.org/u64q/benchmark.php?test=fa... Both Lisp and Java alternatives are faster than the "fastest" solution (written in Fortran). Finding any more support for the years-old opinion I voiced seems too much work to be worth it.


So we can immediately dismiss your "Programs are not meant to be written in idiomatic language. They are meant to resemble the C implementation." claims as baseless.

The only example you chose shows programs that manually unroll loops are not accepted. (And that's stated explicitly on the Help page. http://shootout.alioth.debian.org/help.php#unroll)


Yes, you can. Just one last mention: my opinion was formed in the times I used Perl and looked around the actual implementations of these. But that was years ago. It was reinforced quite recently by someone whining (probably on HN) about the same things. Apart from these anecdotes, I have no basis for these words.


> But that was years ago.

It's a sad nuisance when people opine without checking what's actually shown on a public website.

It's sad to see fact based comments being down voted on HN.


> It's a sad nuisance when people opine without checking what's actually shown on a public website.

Well, I could simply ignore the question. Or give the answer I got a few years ago, which is what I did...

> It's sad to see fact based comments being down voted on HN.

Fact based? More like fact querying. Which is the reason I upvoted every single one of your comments...


> Which is the reason I upvoted every single one of your comments.

And yet people have down voted.


"... we ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result."

(Except for meteor-contest.)


I'm sure you wish to make a point -- please don't make everyone guess what you mean, just spell it out.


An algorithm that's idiomatic in one language might be terribly unidiomatic in another. (I'm sure you know this, but) Haskell's pervasive laziness makes some algorithms tractable while requiring a lot of space/time complexity for other algorithms.


The benchmarks game URL was posted by someone who has put a lot of energy into promoting Haskell - dons makes the best of the opportunity that the benchmarks game provides to promote Haskell.

>>pervasive laziness... requiring a lot of space/time complexity for other algorithms<<

And GHC provides strictness analysis and explicit strictness to avoid reducing performance.

http://www.haskell.org/haskellwiki/Performance/Strictness


>>I need some sort of shock collar to teach me not to argue about the worth of the Computer Language Benchmarks Game.<<

Do argue - but argue better.

Unfortunately, to argue better does require learning what the website says about the benchmarks game - and for most that's far too much effort.


Unfortunately the cost of developing this solution in C++ would be forbidding for us.

In our case, on pretty much the same loop, the difference between Ruby and Clojure was 20-30 times. Please note that using Clojure allowed us direct access to Java libraries, so essentially this is Java performance that we've been able to tap into using Clojure.
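To illustrate what "direct access to Java libraries" looks like in practice, here is a minimal sketch that calls the JDK's `MessageDigest` from Clojure. It is not taken from the Beanstalk code; `sha1-hex` is a hypothetical helper and only standard JDK classes are used:

```clojure
(import 'java.security.MessageDigest)

(defn sha1-hex
  "Hex-encoded SHA-1 of a string, computed by the JDK's MessageDigest.
   Hypothetical helper, for illustration only."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-1")
                        (.getBytes s "UTF-8"))]
    ;; %02x formats each byte as two lowercase hex digits
    (apply str (map #(format "%02x" %) digest))))

(sha1-hex "abc")   ; "a9993e364706816aba3e25717850c26c9cd0d89d"
```

The type hint on `s` lets the compiler emit a direct method call instead of a reflective one, which is a large part of why interop can run at Java speed.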


did you try doing the same using jruby?


Nope. JRuby is not very efficient at calling Java code, though; at least it's not as efficient as Clojure is, from what I've read.

Another plus for Clojure is that it allows us to easily have a multi-threaded implementation. JRuby allows native threads too, but Clojure makes concurrency really easy, while JRuby just wraps Java threads with nothing more, so I would have to use regular locking and stuff.
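A rough sketch of what "no regular locking" can look like, using only `clojure.core` (an atom for shared state and futures for threads). The names `processed` and `process-revision` are invented here and are not from our code:

```clojure
;; Shared counter updated from many threads without explicit locks.
(def processed (atom 0))

(defn process-revision
  "Stand-in for real work on one revision; bumps the shared counter."
  [rev]
  (swap! processed inc)   ; atomic compare-and-set with retry
  (* rev rev))

;; One future per revision, then block on every result with deref.
(def results
  (let [futures (doall (map #(future (process-revision %)) (range 1 101)))]
    (mapv deref futures)))

@processed        ; 100
(first results)   ; 1
```

In a standalone script you would typically call `(shutdown-agents)` at the end so the JVM does not wait on the idle thread pool before exiting.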


I wonder to what extent the speed increase was due to Clojure vs the JVM itself.

An interesting comparison might have been Clojure vs their previous code on JRuby.


In my code, there are two kinds of places where I get a performance boost by using Clojure:

1. Micro-optimizations, mostly due to JVM and its excellent JIT (the garbage collectors are quite impressive, too, if what you need is predictable response time).

2. Architectural gains: thanks to Clojure's excellent concurrency support I can make much better use of multiple cores. I get more parallelism, hence better performance on the same hardware.

The first kind is cool, because you get it "for free". The second kind is the real game-changer, because non-parallel software only gets you so far in terms of performance, and writing concurrent software is Hard. Clojure makes it much, much easier.
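As a minimal illustration of the second kind (a toy sketch, not my production code): for independent, expensive items, switching from `map` to `pmap` is often the whole change. `slow-square` is a made-up stand-in for real work:

```clojure
(defn slow-square
  "Simulates an expensive, independent computation."
  [n]
  (Thread/sleep 50)
  (* n n))

;; The sequential and parallel versions differ by one character.
(def sequential (doall (map  slow-square (range 8))))
(def parallel   (doall (pmap slow-square (range 8))))

(= sequential parallel)   ; true: same results, in the same order
```

`pmap` only pays off when each item costs noticeably more than the coordination overhead, so it is not a blanket replacement for `map`.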

But overall I wouldn't say that Clojure is a performance demon on a single CPU. You can get performance similar to carefully written Java code. This is good, but you can always do better with C or hand-written assembly on critical sections. But that's not the main advantage: the big thing is that I can write correct Clojure code fast, it runs well enough, and I can easily make use of multiple cores. You can debate micro-benchmarks all you want, but what really counts for me is how quickly (and correctly) I can get from zero to production code that runs fast enough.


I've been playing with 4clojure[1] in my off time for weeks, and it's been a great introduction to the language, although it obviously doesn't help with learning how to package and deploy an application or service. How did you make the first steps from playing around in the REPL to writing production code?

[1] http://www.4clojure.com/


Have you seen Leiningen[1]? It pretty much solves packaging and deploying problem. Awesome tool.

Also, if you're into Web development, I can recommend playing with the Noir[2] framework and Korma[3] for SQL abstraction. Heroku also supports deploying Clojure apps out of the box, so you can easily use a free tier to get something out there.

[1] https://github.com/technomancy/leiningen

[2] http://webnoir.org/

[3] http://sqlkorma.org/


I've got a question about Noir. Is it necessary for me to write HTML with Clojure, or can I separate it and write plain HTML somehow? Writing HTML in the middle of my Clojure code doesn't appeal to me, coming from an MVC background...


Of course you could probably write an s-expression serialization of XML as a macro, and then you could start writing things like:

    (html
        (head (title "My page")
            (script :type "text/javascript" :src "/static/page.js"))
        (body (p "Stuff")))

Actually for a while I had a JS tool which I was using called "build.js" which would just build DOM components like this. (It was therefore JSON-serializable too, but I never really had an occasion to use that.)

There is something nice about HTML which is a bit lost here: HTML (and LaTeX for that matter) allow unquoted text, with fewer escape characters. They are markup languages, which s-expressions crucially lack. (On the other hand, XML & family lack the ability of Lisps to make the first token anything other than a symbol.) There was briefly a plot called NML / Enamel and some others -- DTML and TML I believe -- which would instead write:

    <html | 
        <head | <title | My page >
            <script <type|text/javascript> <src|/static/page.js>>>
         <body | <p | Stuff>>>

This is actually an emulation of a C-type syntax with a Lisp-type semantics: the idea is that you have in some sense two syntactically different channels into your expressions, one which comes before the pipe character | and one that comes after. The stuff that comes after is allowed to be marked-up text; the stuff that comes before is some sort of node list, and perhaps has certain conventions (one could imagine instead using a Clojure-style `:type "text/javascript"`, which would limit you to what XML attributes can do -- short text only, symbolic keywords).

That is, one could hypothetically rewrite this in some C-ish syntax which would look like:

    html {
        head { title { My page }
            script(type {text/javascript} src {/static/page.js}) {}}
        body { p { Stuff }}}

and again, if you wanted to limit yourself to XML, the parens above could then say instead `type: "text/javascript" src: "/static/page.js"` but it doesn't have to be that way.

Some crazy ideas for anyone building a new language to think about.


Isn't that just the templating library (Hiccup iirc) that Noir defaults to? Check out Enlive, it's a nice alternative that addresses your concern:

https://github.com/cgrand/enlive

https://github.com/swannodette/enlive-tutorial/
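For anyone who hasn't seen the Hiccup style: HTML is written as nested vectors of the form `[tag attrs? & children]`. The toy renderer below is only a sketch of that idea, not Hiccup's actual implementation; `render` is a made-up name, and it does no escaping and knows nothing about void elements:

```clojure
(require '[clojure.string :as str])

(defn render
  "Renders a Hiccup-style vector [tag attrs? & children] to an HTML string.
   Toy sketch: no escaping, no void elements."
  [node]
  (if (vector? node)
    (let [[tag & more] node
          attrs    (when (map? (first more)) (first more))
          children (if attrs (rest more) more)]
      (str "<" (name tag)
           (str/join (for [[k v] attrs] (str " " (name k) "=\"" v "\"")))
           ">"
           (str/join (map render children))
           "</" (name tag) ">"))
    (str node)))

(render [:body [:p "Stuff"]])   ; "<body><p>Stuff</p></body>"
```

Enlive goes the other way: the HTML lives in plain template files and Clojure code transforms it, which is why it suits people who want markup kept out of their code.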


Noir's indifferent to how you get your HTML; as long as you return a string, it'll serve it up:

    (defpage "/welcome" []
         "<html><head></head><body>Hi there!</body></html>")


You can use anything you want for templating. I tend to use Mustache (via the stencil library) for my templating. A lot of people also use Enlive.


Thanks - I've actually played with Noir a little, but wasn't sure whether it was worth taking the time to explore fully. I'm glad to hear that it is!


"The rewrite in Clojure resulted in much cleaner, faster code, totaling at only 700 lines of Clojure code (I don’t have a clear comparison with Ruby code here). "

Could you post the Clojure code somewhere? It would be very interesting to see clean, fast Clojure code written to solve a real-world problem (as opposed to some toy example).


Yes, please. I'd like to see "clean and fast Clojure code".


beanstalkapp guys/girls, thanks for sharing! I'd be really interested to see some profiling results, were you bottlenecked on sha hashing? IO/execve syscalls? Memory usage?

I think the reason for the performance difference is pretty clear. According to the article and the grit documentation, all the git api calls were either done in pure ruby, or by shelling out to `git`. Also, reading between the lines, it sounds like they required some information/relationships on commits that was non-trivial to retrieve using a basic git shell command. So by switching out the runtime and the algorithm, they get a huge performance increase.

With jgit, they can more easily traverse the graph directly and efficiently. I'm really curious to see what kind of performance they could get from using FFI+jruby and raw c calls to libgit/libgit2 (http://libgit2.github.com/).


I use Beanstalk every work day (for the last 8-9 months) and when they rolled out their changes I could tell a difference.

Didn't know what those changes entailed, but this is interesting. Thanks!


clojure has some great features but syntax is just disgusting (not because of parentheses, i like scheme)


And because of what then?


other tokens like #{ ^{ #() ~@ #_ #^ #'x ^:x


Clojure does have more sugar than Scheme, but imho some of it improves readability; for example, using brackets for grouping instead of overloading lists like Scheme does can make code easier to scan, because when you see parens in Clojure there are fewer meanings to choose from (usually only function application or a list literal). Example:

Scheme

  (let ((x 2) (y 3))
    (let ((x 7) (z (+ x y)))
      (* z x)))

Clojure

  (let [x 2 y 3]
    (let [x 7 z (+ x y)]
      (* z x)))

I think the Clojure version is easier to read without a paren-matching editor, though Scheme's rigorous minimalism does have its charm.


Pedantic note: #^ for metadata has been deprecated for at least 2 if not 3 versions. ^{ and ^:x aren't special. It's just ^ (reader metadata symbol) followed by a map and keyword respectively. Other things can also be metadata (for instance you can tag something with a classname to indicate a type hint).

You also missed some of the syntax-quote stuff.

So, other tokens are #{} ^ #() #_ #'x ` ~ ~@

Full details: http://clojure.org/reader
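A quick tour of those tokens at the REPL, as a sketch (the helper `len` is made up; everything else is `clojure.core`):

```clojure
#{1 2 3}                     ; set literal
(map #(* % 2) [1 2 3])       ; #() anonymous fn, like (fn [x] (* x 2)): (2 4 6)
[1 #_2 3]                    ; #_ discards the next form: [1 3]
#'map                        ; var quote, same as (var map)
(meta ^:private [1 2])       ; ^ with a keyword: {:private true}
(meta ^{:doc "hi"} [1 2])    ; ^ with a map: {:doc "hi"}

;; ^ with a class name is a type hint (hypothetical helper `len`).
(defn len [^String s] (.length s))
(len "abc")                  ; 3

;; Syntax quote, unquote and unquote-splicing.
(let [xs [2 3]]
  `(1 ~(first xs) ~@xs))     ; (1 2 2 3)
```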


There seems to be a race towards the most exotic or revivalist programming languages. As if programming languages were magic bullets of some sort.


What is your objection to their choice here? It seems that they are simply choosing the best tool for the job; it is not as if they are rewriting their whole web frontend in Clojure too.


> I’ve been looking for an excuse to use Clojure in a production environment for a while

Except in wanting to mess around with Clojure, you didn't use the best tool for the job.

The Clojure solution now uses parallel implementations of Git and SVN to solve the problem, rather than the core code of SVN and Git. And now you also have a one-off daemon written in Clojure. It doesn't have the same support structure, ops requirements, or anything, as your Ruby code. Virtually no one uses clojure, so hiring and training are different, etc. You've incurred a lot of overhead for something not that great.

The best tool for this job was to improve the Ruby version by way of C extensions, or write a new C command that does this work for you, linking in the Git and SVN code directly. This has little to no new concepts, is straightforward, and would have given you the best compatibility and performance.


The Clojure code is 700 lines, and pretty easy to grasp for anyone who's used Lisp before. We would have to implement a lot of C code to speed up both the Git and Svn bindings. And I seriously doubt that we would be able to debug this in a reasonable amount of time.

Clojure caching took me like a month max.


Lines of code is the new form of premature optimization. A C solution is better in performance and compatibility, but you've prematurely optimized on lines of code as your deciding factor (based on you mentioning it in the article, and again here).

Obviously, you can choose whatever you like, it's your company. I'm just explaining what the best solution was for the many students and young programmers who visit hacker news. Clojure is not it for all the reasons I mentioned. And if people think C is hard, practice it. Read Zed's book and do Project Euler problems with it until you feel comfortable with it. It's an essential tool everyone must know how to reach for.


Those same students and programmers should realize that yours is just one opinion among many. Your advice leans toward the "fungible cogs" school, where the capital crime is to expect others to learn something they don't already know.

That's still a widely held view, and maybe even a majority, but probably not on HN. (see pg's Beating the Averages essay: http://www.paulgraham.com/avg.html)


> Your advice leans toward the "fungible cogs" school, where the capital crime is to expect others to learn something they don't already know.

Not following what you mean. Do you mean that I do expect people to learn stuff, or that I don't expect them to learn stuff in order to do their job?


> The best tool for this job was to improve the Ruby version by way of C extensions, or write a new C command that does this work for you, linking in the Git and SVN code directly. This has little to no new concepts, is straightforward, and would have given you the best compatibility and performance.

Favoring "little to no new concepts" at least suggests a bias against certain kinds of learning. Sometimes learning a new concept is the best way to solve a problem.


I think you are doing students and young programmers a disservice by implying that there is one "best" solution here, and that you know it. I am assuming you haven't seen the ruby or Clojure code involved, don't work at Beanstalk, don't know the skillset of their team, their budget, strategic plans, or any other one of a bunch of details that determine what is best in a particular situation. If you already know C then perhaps it would be right for you. That does not mean it is right for everyone.


> I think you are doing students and young programmers a disservice by implying that there is one "best" solution here, and that you know it.

I read their requirements in the blog post. They need fast access to SVN and Git internals to store off metadata. Based on that it's really not hard to determine what to use to solve this problem. There may not be one exact right answer (in fact, I suggested two), but there is a direction that makes sense and one that does not.

The real disservice to those coming up in this industry is reading blog post after blog post on Hacker News with people looking for "excuses" to use random languages they're interested in playing with. The answer to most of these problems is to use something really boring and straightforward, like C, C++ or Java. I guess I've worked too many places where legacy code becomes a huge burden because one guy wants to play around with X one time. The funny thing is, I see these same guys go onto the next place and do it there.


I'm failing to see why Clojure is a bad choice for this problem. It's a general-purpose programming language, just like others that you mentioned. And beyond those, it happens to have the benefits of being more expressive, testable, concise, etc.

Can you elaborate on why you think it's a bad choice?


They're optimizing for their business value and resources (programmers) available.

It might not impress their programmer friends, but it'll impress their accountant.


Well, in my book using Lisp in production is impressive for the best part of my programmer friends :-)


Two birds, one stone. I think that's what they call in the industry "Experienced". Keep that up.


> It might not impress their programmer friends, but it'll impress their accountant

If impressing accountants is the goal of what we do, then the whole thing should have been outsourced overseas.


Before Rails, no one used Ruby. The parallel implementations of svn and git you mention are heavily used in Eclipse and throughout the Java world. They gain access to all the advantages of the JVM. By using Clojure they probably attract a better standard of developer. Why bother introducing a buggy blob of C when they can do it this way?


Not sure how C changed in the last 20 years, but back then, C was a pain to develop, pain to debug, pain to maintain - yes I loved it though.

I'm also not sure I'd compare it to Clojure and suggest it as a better tool for this job.


Grit doesn't use the "core code" of Git. GitHub found that using Git itself was too slow, so they wrote a parallel implementation of it in Ruby.



