
I don't think looking at asymptotic behavior makes a lot of sense in situations where n is small and bounded. Big O says nothing about such cases.



Sorry, do you not have trees for which the size of the tree is large? Do all your trees fit inside a few cache lines of storage?


I deal with very large tree structures (~100 GB) in my search engine, but even there the dominant factor for performance isn't big O; it's reducing block reads, improving access patterns, and keeping relevant data in the disk cache.

Big O isn't irrelevant, but it is not the full story either. There's a solid reason why hash tables are a thing in memory but aren't really a thing on disk.


Do you understand the data structure being proposed in the original post, and are you claiming that scanning 100 GB of data every time you want to perform a childof operation is acceptable? Please, use the proposed tree for your application since big O isn't the full story to you lol


I'm not sure why you're suggesting those claims were made. The parent appears to be talking about non-asymptotic behavior. Very often algorithms with worse big O perform better; it's use-case specific. Hyper-focus on big O isn't productive, but it's fairly common because of how CS curricula emphasize it. In some cases it takes unexpectedly long for the big-O term to impact performance, as other factors dominate.

The parent commenter writes a wonderful blog that covers their experience with building and optimizing a search engine, well worth a read.

https://www.marginalia.nu/log/87_absurd_success/


Yes, and I'm pointing out that non-asymptotic behavior doesn't apply when N here is the total number of nodes in the tree.


Then perhaps I'm misunderstanding you. When N is *sufficiently* large, I think we all agree that you'll prefer an algorithm with better big-O.

When N is small, the asymptotic behavior is irrelevant, and that's easy to show. Let's say we're comparing an O(N) algorithm to an O(N^2) algorithm, but each operation of the O(N) algorithm is 1000x more expensive. The O(N^2) algorithm is preferred as long as N < 1000, and choosing the O(N) algorithm will hurt performance in those situations. Real-world examples, like single-byte writes causing full pages to be rewritten on SSDs, show this isn't just a mathematical curiosity.
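
A minimal sketch of that crossover, using made-up cost models (1000 units per operation for the linear algorithm, 1 unit for the quadratic one):

    # Hypothetical cost models: the O(N) algorithm pays a 1000x constant
    # per operation, the O(N^2) algorithm pays 1 unit per operation.
    def cost_linear(n):
        return 1000 * n

    def cost_quadratic(n):
        return n * n

    # Find the first N where the quadratic algorithm stops being cheaper.
    crossover = next(n for n in range(1, 10_000)
                     if cost_quadratic(n) >= cost_linear(n))
    print(crossover)  # 1000 -- below this, O(N^2) wins despite the worse big-O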

Without benchmarks, analysis of big-O behavior, usage patterns, and known data sizes, I'd (personally) avoid guessing at the performance of Atree in an application.

Are you saying something different? It sounds like you have much more SIMD experience than I do and I'm always happy to learn something new.

https://www.wolframalpha.com/input?i=n*1000+%3D+n%5E2


> non-asymptotic behavior doesn't apply when N here is the total number of nodes in the tree

How? Non-asymptotic N stays non-asymptotic no matter how you label it.


That's not how asymptotes work.

Big O tells you that if f grows asymptotically faster than g, there exists some number N such that for every m larger than N, f(m) > g(m). In practice, N may be 10, or it may be larger than the number of atoms in the universe. It's unrelated to the number of items in your collection; it's a property of the algorithms themselves.
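
To make that concrete, a toy example with made-up constants: f(m) = m^2 / 10^9 is O(m^2) and g(m) = 100*m is O(m), yet f only overtakes g once m exceeds 10^11:

    # Made-up functions: f is O(m^2), g is O(m), but the crossover N is 10^11,
    # far beyond the size of many real-world collections.
    def f(m):
        return m * m / 1e9   # tiny constant factor

    def g(m):
        return 100 * m       # large constant factor

    for m in (10**6, 10**9, 10**11, 10**12):
        print(m, f(m) < g(m))  # True, True, False, False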


I'm not sure why what I wrote necessitated a lesson on limits and asymptotes. My point was that, given that N is the size of your tree, more often than not big-O analysis will apply in this case, since N is likely large compared to secondary fixed effects.



