Hacker News

If the objective is to get a fast implementation of isotonic regression, I think the priority would be to choose the right algorithm. PAVA is indeed one of the fastest such algorithms and has a running-time bound that is linear in the number of data points. I may be wrong, but from a quick look at the code it seems neither the [P/C]ython version nor the Julia version has an implementation that ensures linear time. The files are named "Linear PAVA", so I believe the expectation was that they are indeed linear-time.

Consider applying it to this sequence:

     1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, -30.
It seems that the number of operations required by these implementations would be quadratic in the length of the sequence (they will average the tail repeatedly).
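To make the re-averaging pattern concrete, here is a rough Python sketch of that style of PAVA (not the actual code under discussion — just an illustration of the pattern, with the function name and the `ops` counter my own): on each violation it re-averages the whole preceding block, and the counter shows the work growing superlinearly on a long decreasing tail.

```python
def naive_pava(y):
    """PAVA without block bookkeeping: on each violation, pool backwards,
    re-averaging the growing block each time. Counts element visits (ops)
    to expose the quadratic blow-up on adversarial inputs."""
    y = list(map(float, y))
    ops = 0
    i = 1
    while i < len(y):
        if y[i] < y[i - 1]:
            j = i
            # pool backwards until the prefix is monotone again,
            # re-averaging the block y[j-1..i] at every step
            while j > 0 and y[j] < y[j - 1]:
                block = y[j - 1:i + 1]
                avg = sum(block) / len(block)
                ops += len(block)  # work proportional to the block size
                for k in range(j - 1, i + 1):
                    y[k] = avg
                j -= 1
        i += 1
    return y, ops
```

On the sequence above, each new element of the falling tail forces a re-average of an ever-longer block, so `ops` grows much faster than the sequence length.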

I cannot read Julia yet, so I am not sure about the Julia version, but the others do indeed look like at-least-quadratic-time algorithms. The behavior will of course depend on the input, so on certain inputs they may scale linearly. But complexity addresses worst-case behavior, and as I showed, coming up with such bad cases is not hard; they need not be very unnatural either.

   OTOH: 
If the objective was to compare the speeds of CPython, Cython and Julia, then it does not matter whether an efficient algorithm was chosen for the task. My hunch is that, because the current implementations compute more than necessary, this is going to magnify the speed difference between Cython and Julia.

Does Julia do any inlining? Or are there any plans for it?

If so, that would be one aspect where it could shine. As mentioned in the comment, the Cython version is unfactored and manually inlined for performance. If Julia inlines in its JIT pass, such measures would be unnecessary, which would allow one to write cleaner code.




Yeah, I implemented a guaranteed O(N) PAVA in Julia (see https://github.com/ajtulloch/Isotonic.jl/blob/master/src/poo...), and experimented with one in Cython - it was uniformly slower than the 'linear' PAVA across a few sample datasets I tried. See e.g. http://bit.ly/1oDF7nH for some graphs.

I'm not claiming I've written the fastest possible implementations or anything though - please feel free to improve any of this code and submit to scikit-learn if you find some improvements!
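For anyone following along: the standard way to get a guaranteed O(n) bound is a stack of (mean, size) blocks — each point is pushed once and every merge permanently removes a block, so the total work is linear. A rough Python sketch of that formulation (not the linked Julia code; names are mine):

```python
def linear_pava(y):
    """Stack-based PAVA: maintains merged blocks as (mean, size) pairs.
    Each element is pushed once and each merge pops a block, so the
    total number of merges is at most n, giving O(n) overall."""
    blocks = []  # stack of (block_mean, block_size)
    for v in y:
        mean, size = float(v), 1
        # merge with the previous block while it violates monotonicity;
        # the weighted average preserves the block sums
        while blocks and blocks[-1][0] > mean:
            m2, s2 = blocks.pop()
            mean = (mean * size + m2 * s2) / (size + s2)
            size += s2
        blocks.append((mean, size))
    # expand the blocks back into a full-length fitted sequence
    out = []
    for mean, size in blocks:
        out.extend([mean] * size)
    return out
```

On the adversarial sequence from upthread this collapses everything after the leading 1 into a single block in one left-to-right pass.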


> I'm not claiming I've written the fastest possible implementations or anything though

Oh, I never thought you were claiming so. Thanks for clarifying the implementation. Julia is indeed something I want to pick up. BTW, what data structure does this snippet invoke

    ydash = [1 => y[1]]
A fixed-length array?

> please feel free to improve any of this code and submit to scikit-learn if you find some improvements!

Ah! Then you would all hound me for my insistence on keeping https://bitbucket.org/sreangsu/libpav LGPL, a discussion I don't want to engage in. BTW, I am not even sure libpav would be faster as it stands now, because it does some really stupid stuff, like unnecessary allocation/deallocation on the heap. It's an old piece that I wrote years ago to entertain myself (proof: comments like these

   // I hereby solemnly declare
   // that 'it' is not aliased 
   // with anything that may get
   // deleted during this object's
   // lifetime. May the  runtime
   // smite me to my core otherwise.
https://bitbucket.org/sreangsu/libpav/src/dcc8411b10a0d4e220... and the perverse namespace pollution).

It's such a strange coincidence: I started hacking on it again just a few days ago, and lo and behold, a thread on HN on PAVA. Now I am motivated enough to fix my crap :) Once I get rid of that damned linked list I think it will be decent enough, and there is a fair chance it would be faster than the current Julia version.

That said, libpav does a lot more than just isotonic regression; it's more of a marriage between isotonic regression and large-margin methods like SVMs. Even for the plain isotonic-regression case, it can work with any abstraction of an iterator (so it is container-agnostic), can take an ordering operator, and is of course fully generic (meaning it can handle float32 or float64 without any change).
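I can't speak to libpav's C++ internals, but the iterator-plus-ordering-operator design translates to other languages too. As a hypothetical Python analog (my own sketch, not libpav's API): parameterising the block-merge PAVA on a comparator gives isotonic and antitonic fits from one function, and it works on any value type supporting `+`, `*` and `/`.

```python
import operator

def pava_generic(data, lt=operator.lt):
    """Container-agnostic PAVA: consumes any iterable once and takes an
    ordering operator, so an antitonic (decreasing) fit falls out of
    passing lt=operator.gt. Generic over numeric types."""
    blocks = []  # stack of (block_mean, block_size)
    for v in data:
        mean, size = v, 1
        # merge while the new block violates the requested ordering
        while blocks and lt(mean, blocks[-1][0]):
            m2, s2 = blocks.pop()
            mean = (mean * size + m2 * s2) / (size + s2)
            size += s2
        blocks.append((mean, size))
    out = []
    for mean, size in blocks:
        out.extend([mean] * size)
    return out
```

E.g. `pava_generic(iter([3, 1, 2]))` gives an increasing fit, while `pava_generic([1, 3, 2], lt=operator.gt)` gives a decreasing one, and exact types like `fractions.Fraction` pass through unchanged.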

@jamesjporter Thanks


    ydash = [1 => y[1]]
creates a Dict mapping from Int64 to Float64; `=>` is used for declaring Dict literals in Julia. cf. http://docs.julialang.org/en/latest/stdlib/base/?highlight=d...



