Hacker News

If the objective is to get a fast implementation of isotonic regression, I think the priority would be to choose the right algorithm. PAVA is indeed one of the fastest such algorithms and has a running-time bound that is linear in the number of data points. I may be wrong, but from a quick look at the code it seems neither the [P/C]ython version nor the Julia version has an implementation that ensures linear time. The files are named "Linear PAVA", so I believe the expectation was that they are indeed linear-time.

Consider applying it to this sequence:

     1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, -30.
It seems that the number of operations required by these implementations would be quadratic in the length of the sequence (they will average the tail repeatedly).
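To make the re-averaging pattern concrete, here is a rough Python sketch of that style of PAVA (not the actual code under discussion — just an illustration of the pattern, with the function name and the `ops` counter my own): on each violation it re-averages the whole preceding block, and the counter shows the work growing superlinearly on a long decreasing tail.

```python
def naive_pava(y):
    """PAVA without block bookkeeping: on each violation, pool backwards,
    re-averaging the growing block each time. Counts element visits (ops)
    to expose the quadratic blow-up on adversarial inputs."""
    y = list(map(float, y))
    ops = 0
    i = 1
    while i < len(y):
        if y[i] < y[i - 1]:
            j = i
            # pool backwards until the prefix is monotone again,
            # re-averaging the block y[j-1..i] at every step
            while j > 0 and y[j] < y[j - 1]:
                block = y[j - 1:i + 1]
                avg = sum(block) / len(block)
                ops += len(block)  # work proportional to the block size
                for k in range(j - 1, i + 1):
                    y[k] = avg
                j -= 1
        i += 1
    return y, ops
```

On the sequence above, each new element of the falling tail forces a re-average of an ever-longer block, so `ops` grows much faster than the sequence length.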

I cannot read Julia yet, so I am not sure about the Julia version, but the others do indeed look like at-least-quadratic-time algorithms. The behavior will of course depend on the input, so on certain inputs they may scale linearly. But complexity addresses worst-case behavior, and as I showed, coming up with such bad cases is not hard; they need not be very unnatural either.

   OTOH: 
If the objective was to compare the speeds of CPython, Cython and Julia, then it does not matter whether an efficient algorithm was chosen for the task. My hunch is that, because the current implementations compute more than necessary, this is going to magnify the speed difference between Cython and Julia.

Does Julia do any inlining? Or are there any plans for it?

If so, that would be one aspect where it could shine. As mentioned in the comment, the Cython version is unfactored and manually inlined for performance. If Julia inlines in its JIT pass, such measures would be unnecessary, which would allow one to write cleaner code.




Yeah, I implemented a guaranteed O(N) PAVA in Julia (see https://github.com/ajtulloch/Isotonic.jl/blob/master/src/poo...), and experimented with one in Cython - it was uniformly slower than the 'linear' PAVA across a few sample datasets I tried. See e.g. http://bit.ly/1oDF7nH for some graphs.

I'm not claiming I've written the fastest possible implementations or anything though - please feel free to improve any of this code and submit to scikit-learn if you find some improvements!
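For anyone following along: the standard way to get a guaranteed O(n) bound is a stack of (mean, size) blocks — each point is pushed once and every merge permanently removes a block, so the total work is linear. A rough Python sketch of that formulation (not the linked Julia code; names are mine):

```python
def linear_pava(y):
    """Stack-based PAVA: maintains merged blocks as (mean, size) pairs.
    Each element is pushed once and each merge pops a block, so the
    total number of merges is at most n, giving O(n) overall."""
    blocks = []  # stack of (block_mean, block_size)
    for v in y:
        mean, size = float(v), 1
        # merge with the previous block while it violates monotonicity;
        # the weighted average preserves the block sums
        while blocks and blocks[-1][0] > mean:
            m2, s2 = blocks.pop()
            mean = (mean * size + m2 * s2) / (size + s2)
            size += s2
        blocks.append((mean, size))
    # expand the blocks back into a full-length fitted sequence
    out = []
    for mean, size in blocks:
        out.extend([mean] * size)
    return out
```

On the adversarial sequence from upthread this collapses everything after the leading 1 into a single block in one left-to-right pass.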


> I'm not claiming I've written the fastest possible implementations or anything though

Oh, I never thought you were claiming so. Thanks for clarifying the implementation. Julia is indeed something I want to pick up. BTW, what data structure does this snippet invoke

    ydash = [1 => y[1]]
A fixed-length array?

> please feel free to improve any of this code and submit to scikit-learn if you find some improvements!

Ah! Then you would all hound me for my insistence on keeping https://bitbucket.org/sreangsu/libpav LGPL, a discussion I don't want to engage in. BTW, I am not even sure libpav would be faster as it stands now, because it does some really stupid stuff, like unnecessary allocation/deallocation on the heap. It's an old piece that I wrote years ago to entertain myself (proof: comments like these

   // I hereby solemnly declare
   // that 'it' is not aliased 
   // with anything that may get
   // deleted during this object's
   // lifetime. May the  runtime
   // smite me to my core otherwise.
https://bitbucket.org/sreangsu/libpav/src/dcc8411b10a0d4e220... and the perverse namespace pollution).

It's such a strange coincidence: I started hacking on it again just a few days ago, and lo and behold, a thread on HN on PAVA. Now I am motivated enough to fix my crap :) Once I get rid of that damned linked list I think it will be decent enough, and there is a fair chance it would be faster than the current Julia version.

That said, libpav does a lot more than just isotonic regression; it's more of a marriage between isotonic regression and large-margin methods like SVMs. Even for the plain isotonic-regression case, it can work with any abstraction of an iterator (so it is container-agnostic), can take an ordering operator, and is of course fully generic (meaning it can handle float32 or float64 without any change).
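I can't speak to libpav's C++ internals, but the iterator-plus-ordering-operator design translates to other languages too. As a hypothetical Python analog (my own sketch, not libpav's API): parameterising the block-merge PAVA on a comparator gives isotonic and antitonic fits from one function, and it works on any value type supporting `+`, `*` and `/`.

```python
import operator

def pava_generic(data, lt=operator.lt):
    """Container-agnostic PAVA: consumes any iterable once and takes an
    ordering operator, so an antitonic (decreasing) fit falls out of
    passing lt=operator.gt. Generic over numeric types."""
    blocks = []  # stack of (block_mean, block_size)
    for v in data:
        mean, size = v, 1
        # merge while the new block violates the requested ordering
        while blocks and lt(mean, blocks[-1][0]):
            m2, s2 = blocks.pop()
            mean = (mean * size + m2 * s2) / (size + s2)
            size += s2
        blocks.append((mean, size))
    out = []
    for mean, size in blocks:
        out.extend([mean] * size)
    return out
```

E.g. `pava_generic(iter([3, 1, 2]))` gives an increasing fit, while `pava_generic([1, 3, 2], lt=operator.gt)` gives a decreasing one, and exact types like `fractions.Fraction` pass through unchanged.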

@jamesjporter Thanks


    ydash = [1 => y[1]]
creates a Dict mapping from Int64 to Float64; `=>` is used for declaring Dict literals in Julia. cf. http://docs.julialang.org/en/latest/stdlib/base/?highlight=d...



