Hacker News

Wasn’t the old stuff good enough? Why do we need new methods? In short: because from_chars is low-level, and offers the best possible performance.

That sounds like marketing BS, especially when these functions most likely just call into, or are implemented nearly identically to, the old C functions, which already offer the best possible performance.

I did some benchmarks, and the new routines are blazing fast![...]around 4.5x faster than stoi, 2.2x faster than atoi and almost 50x faster than istringstream

Are you sure that wasn't because the compiler decided to optimise away the function directly? I can believe it being faster than istringstream, since that has a ton of additional overhead.

After all, the source is here if you want to hear it straight from the horse's mouth:

https://raw.githubusercontent.com/gcc-mirror/gcc/master/libs...

Not surprisingly, under all those layers of abstraction-hell, there's just a regular accumulation loop.




You might want to watch this relevant video from Stephan T. Lavavej (the Microsoft STL maintainer): https://www.youtube.com/watch?v=4P_kbF0EbZM


I don't need to listen to what someone says if I can look at the source myself.


I believe the implementation you link to is not fully standards compliant, and uses an approximate solution.

MSFT's one is totally standards compliant and it is a very different beast: https://github.com/microsoft/STL/blob/main/stl/inc/charconv

Apart from various nuts-and-bolts optimizations (e.g. not using locales, better cache friendliness, etc.) it also uses a novel algorithm which is an order of magnitude quicker for many floating-point tasks (https://github.com/ulfjack/ryu).

If you actually want to learn about this, then watch the video I linked earlier.


You profiled the code in your head?


Integers are simple to parse, but from_chars is a great improvement when parsing floats. It's more consistent across platforms than the old solutions (no need to worry about the locale, for example whether comma or dot is the decimal separator), and its performance is also more reliable across compilers. The most advanced approaches to parsing floats can be surprisingly faster than intermediately advanced ones. The library used by GCC since version 12 (and also used by Chrome) claims to be 4-10 times faster than old strtod implementations:

https://github.com/fastfloat/fast_float

For more historical context:

https://lemire.me/blog/2020/03/10/fast-float-parsing-in-prac...


They're locale independent, which the strtol, strtof, etc. C functions are not.


Yes, exactly. Which means that, while the speed gains are real, they only apply in cases where your libc is dangerously defective.


I agree with some of this, and the author could've made a better case for from/to_chars:

- AFAIK stoi and friends depend on the locale, so it's not hard to believe this introduces additional overhead. The implicit locale dependency is also often very surprising.

- std::stoi only accepts std::string as input, so you're forced to allocate a string to use it. std::from_chars does not.

- from/to_chars don't throw. As far as I know this won't affect performance when no exception is thrown, but it does mean you can use these functions in environments where exceptions are disabled.


Locale env stuff is inherently thread unsafe, which is the main reason to never rely on it.


There's also the new Ryu algorithm that is being used, which is probably the biggest speed up.

https://github.com/ulfjack/ryu


AFAIK the state of the art now is "dragonbox":

https://github.com/jk-jeon/dragonbox


A few months ago I optimized the parsing of a file and did some micro-benchmarks. I observed a similar speed-up compared to stoi and atoi (didn't bother to look at stringstream). Others already commented that it's probably due to not supporting locales.


For the sake of example: a "locale-aware" number conversion routine would be the worst possible choice for parsing incoming network traffic. Beyond the performance concerns, there's the significant semantic difference in number formatting across cultures: different conventions for the decimal or thousands separator easily lead to subtle data errors or even security concerns.

Lastly, having simple and narrowly specified conversion routines allows one to create a small subset of the C++ standard library fit for constrained environments like embedded systems.


I get that. However, then they should name the function differently and put highly visible disclaimers in the documentation: something like "from_ascii" instead of "from_chars". The documentation, including this blog post, should also be very clear that this function is only suitable for parsing machine-to-machine communications and should never be used for human input data. There is clearly a place for this type of function, but this blog post miscommunicates it in a potentially harmful way. When I read the post I presumed that this was a replacement for atoi(), even though it had a confusing "non-locale" bullet point.


Did you verify their claims or are you just calling BS and that's it? The new functions are in fact much faster than their C equivalent (and yes, I did verify that).


Care to explain and show the details?

"Extraordinary claims require extraordinary evidence."


Your original claim "I've not checked but this guy, and by extension the C++ standards committee who worked on this new API, are probably full of shit" was pretty extraordinary.


Look at the compiler-generated instructions yourself if you don't believe the source I linked; in the cases I've seen, all the extra new stuff just adds another layer on top of existing functions, and if the former are faster, the latter must necessarily be as well.

The standards committee's purpose is to justify their own existence by coming up with new stuff all the time. Of course they're going to try to spin it as better in some way.


How not?

It compiles from source, can be better inlined, and benefits from dead-code elimination when you don't use an unusual radix. It also doesn't do locale-based things.


I wrote this library once: https://github.com/ton/fast_int.

I removed `std::atoi` from the benchmarks since it was performing so poorly that it's not a contender. Should be easy to verify.

Rough results (last column is #iterations):

  BM_fast_int<std::int64_t>/10                  1961 ns         1958 ns       355081
  BM_fast_int<std::int64_t>/100                 2973 ns         2969 ns       233953
  BM_fast_int<std::int64_t>/1000                3636 ns         3631 ns       186585
  BM_fast_int<std::int64_t>/10000               4314 ns         4309 ns       161831
  BM_fast_int<std::int64_t>/100000              5184 ns         5179 ns       136308
  BM_fast_int<std::int64_t>/1000000             5867 ns         5859 ns       119398
  BM_fast_int_swar<std::int64_t>/10             2235 ns         2232 ns       316949
  BM_fast_int_swar<std::int64_t>/100            3446 ns         3441 ns       206437
  BM_fast_int_swar<std::int64_t>/1000           3561 ns         3556 ns       197795
  BM_fast_int_swar<std::int64_t>/10000          3650 ns         3646 ns       188613
  BM_fast_int_swar<std::int64_t>/100000         4248 ns         4243 ns       165313
  BM_fast_int_swar<std::int64_t>/1000000        4979 ns         4973 ns       140722
  BM_atoi<std::int64_t>/10                     10248 ns        10234 ns        69021
  BM_atoi<std::int64_t>/100                    10996 ns        10985 ns        63810
  BM_atoi<std::int64_t>/1000                   12238 ns        12225 ns        56556
  BM_atoi<std::int64_t>/10000                  13606 ns        13589 ns        51645
  BM_atoi<std::int64_t>/100000                 14984 ns        14964 ns        47046
  BM_atoi<std::int64_t>/1000000                16226 ns        16206 ns        43279
  BM_from_chars<std::int64_t>/10                2162 ns         2160 ns       302880
  BM_from_chars<std::int64_t>/100               2410 ns         2407 ns       282778
  BM_from_chars<std::int64_t>/1000              3309 ns         3306 ns       208070
  BM_from_chars<std::int64_t>/10000             5034 ns         5028 ns       100000
  BM_from_chars<std::int64_t>/100000            6282 ns         6275 ns       107023
  BM_from_chars<std::int64_t>/1000000           7267 ns         7259 ns        96114
  BM_fast_float<std::int64_t>/10                2670 ns         2666 ns       262721
  BM_fast_float<std::int64_t>/100               3547 ns         3542 ns       196704
  BM_fast_float<std::int64_t>/1000              4643 ns         4638 ns       154391
  BM_fast_float<std::int64_t>/10000             5056 ns         5050 ns       132722
  BM_fast_float<std::int64_t>/100000            6207 ns         6200 ns       111565
  BM_fast_float<std::int64_t>/1000000           7113 ns         7105 ns        98847


> Not surprisingly, under all those layers of abstraction-hell, there's just a regular accumulation loop.

Your dismissive answer sounds so much like that of a typical old-school C programmer who underestimates by two orders of magnitude what compiler inlining can do.

Abstraction, genericity and inlining on a function like from_chars are exactly what you want.


In my experience, inlining only looks great in microbenchmarks but is absolutely horrible for cache usage and causes other things to become slower.


Which is wrong on almost any modern architecture.

For small functions, inlining is almost always preferable because (1) the prefetcher actually loves it and (2) a cache miss due to a mispredicted jump is way more costly than anything a bit of bloat will ever cost you.


Enabling new static optimizations is a good thing, no?


Your answer shows Dunning-Kruger in full effect.



