Atree: A simple and efficient pointer-free tree implementation

tlack · on Dec 16, 2023

Always surprising to click through a link on HN and discover it is one's own work. For a time I was very interested in lightweight array-based implementations of common data structures and this one seemed particularly handy.

spenczar5 · on Dec 16, 2023

It sounds a little like it didn’t work out as well as you hoped. How did it fare?

I am interested because I have some scientific software (astrodynamics - propagating orbits in space) that would really benefit from a cache-friendly tree, and this seemed promising.

_eojb · on Dec 16, 2023

It does? Feels like an O(n) scan every time you need to query the children of a node is a nonstarter for most applications (the readme is strangely optimistic about this point). Plenty of cache friendly tree implementations out there, and this one seems actually cache hostile with few benefits in my mind aside from ease of implementation.

Also, I write a lot of code that runs on a gpu, and the claim that this tree is somehow gpu friendly I find particularly dubious.

a1369209993 · on Dec 16, 2023

> Feels like an O(n) scan every time you need to query the children of a node is a nonstarter for most applications (the readme is strangely optimistic about this point).

The trick is that querying the children of N different nodes is usually still a single O(N) scan, so if you operate on it array-style (which APL heavily encourages anyway), it's amortized constant time. Of course that's not always viable, but APL programmers tend to be surprisingly good at making array operations out of things you wouldn't expect to use arrays for.

> cache hostile

If you additionally enforce that all parent indexes point to lower numbers, a preorder traversal is a linear scan forward, and a postorder traversal with child order reversed (which you can usually correct for one way or another) is a linear scan backward.

(This assumes you only need dependency ordering, ie the parent node uses or supplies data to/from its children; if you need a true sequential traversal, the array has to be sorted according to that traversal (but is still a valid Apter Tree).)

> the claim that this tree is somehow gpu friendly I find particularly dubious

Yeah, array programming is generally kind of hit-or-miss at that and this does look like a miss.

toasted-subs · on Dec 17, 2023

Whenever somebody says cache friendly without additional context I assume they got code from somebody else without understanding what makes things "cache-friendly".

gavinray · on Dec 17, 2023

What?

I mean, there are specific sizes that will fit in the L-(N) cache.

And consequently, sizes that don't suffer from the false-sharing problem in concurrent scenarios due to this.

https://en.cppreference.com/w/cpp/thread/hardware_destructiv...

There's an entire field of study devoted to cache-friendly/cache-oblivious data structures:

- https://www.microsoft.com/en-us/research/wp-content/uploads/...

- https://rcoh.me/posts/cache-oblivious-datastructures/

_eojb · on Dec 16, 2023

The linear scan you are talking about I don't think gives you any sort of ordered traversal right? Unless I'm missing something.

a1369209993 · on Dec 16, 2023

For a arbitrary Apter Tree, a linear scan is unordered. You can impose additional constraints to get a ordered traversal (in the same way that, eg, you can sort a assoc-list/JSON-style key-value table by keys to get a key-order traversal), and the result is still a valid Apter Tree (respectively valid list of key-value pairs).

_eojb · on Dec 17, 2023

Yes but that is not what is presented (a B+ tree is not a B tree even with minor modifications) and it changes the complexity of your other update operations drastically. The thing that grates me (as someone that has written a dozen or so different tree structures) is that this one is presented as a particularly good one, and I think it excels at almost nothing, hence its obscurity.

spenczar5 · on Dec 16, 2023

For me, N is small. Its also N-ary, not binary, which crosses off a bunch of the first options. Anyway, I am not sure this will work, just worth trying. Empirical numbers beat theory every time :)

_eojb · on Dec 16, 2023

You are using N in a different sense than I am. Unless I'm reading the tree description incorrectly, N is the size of the tree itself, not the number of children.

spenczar5 · on Dec 16, 2023

Oh, I was being sloppy and mixed N into the ariness: I meant N elements, each with a variable number of children (as many as 8).

_eojb · on Dec 16, 2023

I would hazard a guess that a regular n-ary tree would outperform the OP tree in many usage scenarios with no extra effort, and with a number of B+ tree variants being strictly better at the cost of more effort.

jiggawatts · on Dec 16, 2023

There are use cases where that doesn’t matter, such as some compilers where it makes a full pass over the source code for every optimisation type.

mgaunard · on Dec 16, 2023

Vector programming requires you to change your way of thinking; instead of computing something for 1 element, you compute it for N elements.

_eojb · on Dec 16, 2023

I do lots of simd and shader programming, but regardless of register width, O(n) is not O(1)

mgaunard · on Dec 16, 2023

The point is that you shouldn't try to get all the children of a single node, but rather all the children of all the nodes, which is still O(n) and not O(n^2).

_eojb · on Dec 17, 2023

This doesn't make any sense to me.

marginalia_nu · on Dec 16, 2023

I don't think looking at asymptotic behavor makes a lot of sense in situations where n is small and bounded. Big O says nothing about such cases.

_eojb · on Dec 17, 2023

Sorry, do you not have trees for which the size of the tree is large. Do all your trees fit inside a few cache lines of storage?

marginalia_nu · on Dec 17, 2023

I deal with very large tree structures (~100 GB) in my search engine, but even there the dominant factor for performance isn't big O, but reducing block reads, access patterns and keeping relevant data in the disk cache.

Big O isn't irrelevant, but it is not the full story either. There's a solid reason why hash tables are a thing in memory but aren't really a thing on disk.

_eojb · on Dec 17, 2023

Do you understand the data structure being proposed in the original post, and are you claiming that scanning 100GB of data every time you want to perform a childof operation is acceptable? Please, use the proposed tree for your application since big o isn't the full story to you lol

8organicbits · on Dec 17, 2023

I'm not sure why you're suggesting those claims were made. The parent appears to be talking about non-asymptotic behavior. Very often algorithms with worse big O perform better; its use-case specific. Hyper focus on big O isnt productive, but fairly common due to how CS curriculums focus on it. In some cases it takes unexpectedly long for the big-O to impact performance, as other factors dominate.

The parent commenter writes a wonderful blog that covers their experience with building and optimizing a search engine, well worth a read.

https://www.marginalia.nu/log/87_absurd_success/

_eojb · on Dec 17, 2023

Yes and I'm pointing out that non-asymptotic behavior doesn't apply when N here is the total number of nodes in the tree.

8organicbits · on Dec 17, 2023

Then perhaps I'm misunderstanding you. When N is *sufficiently* large, I think we all agree that you'll prefer an algorithm with better big-O.

When N is small, the asymptotic behavior is irrelevant and that's easy to show. Let's say we're comparing a O(N) algorithm to a O(N^2) algorithm, but each operation of the O(N) is 1000x more expensive. The O(N^2) algorithm is preferred as long as N < 1000. Choosing the O(N) algorithm will hurt performance in those situations. Real world examples like single-byte-writes causing full pages to be re-written on SSDs shows this isn't just a mathematical curiosity.

Without benchmarks, analysis of big-O behaviors, usage patterns, and known data size I'd (personally) avoid guessing the performance of Atree in an application.

Are you saying something different? It sounds like you have much more SIMD experience than I do and I'm always happy to learn something new.

https://www.wolframalpha.com/input?i=n*1000+%3D+n%5E2

alpaca128 · on Dec 17, 2023

> non-asymptotic behavior doesn't apply when N here is the total number of nodes in the tree

How? Non-asymptotic N stays non-asymptotic no matter how you label it.

marginalia_nu · on Dec 17, 2023

That's not how asymptotes work.

Big O tells you that there exists some number N such that for each number m larger than N, if O(f(m)) > O(g(m)) then f(m) > g(m). In practice, N may be 10, or it may be larger than the number of atoms in the universe. It's unrelated to the number of items in your collection, but a property of the algorithm itself.

_eojb · on Dec 18, 2023

I'm not sure why what I wrote necessitated a lesson on limits and asymptotes. My point was that given that N was the size of your tree, more often than not, big-O analysis would apply in this case since N is likely big compared to secondary fixed effects.

tlack · on Dec 16, 2023

I still like that setup for using trees in low level languages.

But personally I’ve been working at higher levels of the stack the last few years, where these kinds of decisions seem less important.

And on another level, it seems like coders in general aren’t that interested in vector oriented languages and techniques which makes their study somewhat isolating.

loxias · on Dec 17, 2023

"Isolating" is where the performance (innovation) is.

I used a very similar setup, first time I needed to implement a tree. Now, I'm a fan of Eytzinger layout. (referenced in a previous comment in this thread)

Yeah, most coders in general don't seem to be as interested in this stuff, but it's still necessary. They'll want more performance.

caslon · on Dec 17, 2023

Why do they need more performance? Hardware gets faster all the time, and the most popular implementations of the most popular programming languages have so much low-hanging fruit you can get a 10x improvement by rolling your cat on the keyboard.

I don't think programmers actually care about performance as much as they care for convenience. Every year the stack moves a bit higher, and everyone is okay with websites taking days to load on brand new phones with Gigabit wireless connections. There are companies that care about performance on the margin, like stock trading firms, but to get into one of them, you have to get pretty lucky, or come from a pretty special background. Even the banks are using Python more and more, these days.

loxias · on Dec 17, 2023

Shrug.

People will always care about performance because they will always want more functionality.

I bet you care about the performance of your database or how fast your pages load. You want your streaming videos to be skip-free and your phone calls to sound lifelike. Performance will always matter. Because while people like me will be always trying to squeeze more, "the other side" will always be ready to use it for some new feature.

59nadir · on Dec 17, 2023

The post you're replying to sounds more like it's lamenting the complacency of most developers more than it's saying that performance isn't worth caring about, or I'm projecting my own feelings onto it.

I'm not under the impression that people care at all, to be honest, outside of certain communities. If you're not in those communities talking about performance as a corner stone feels mostly like screaming into the void.

fwsgonzo · on Dec 16, 2023

It is always like that when you venture off the well trodden path. I am studying low latency emulation and it's also isolating.

chii · on Dec 17, 2023

> working at higher levels of the stack the last few years, where these kinds of decisions seem less important.

but then accumulated outcome of this is the slowness you see in web software!

runeblaze · on Dec 17, 2023

FWIW I worked also on scientific software (phylogenetics, which is all about biological evolutionary trees) and the tree structure is like Atree (https://github.com/RuneBlaze/internode/blob/main/src/tree.rs...).

It does help (roughly ~5x vs. pointer-chasing trees, probably can be further optimized) for my workload, but at the same time quite some time was spent just making sure the tree is correct.

loxias · on Dec 17, 2023

I'm interested in the same stuff/area. There's a lot of great results to read, check out cache-oblivious B-trees by Dr. Demaine (or honestly anything by him.) This link is a good distillation of some of the concepts. https://algorithmica.org/en/eytzinger

I'm _also_ interested in scientific software, but that's more a hobby than a job. =)

For propagating a large number of orbits in space, I'm really curious what the Correct (tm) numerical algorithm is, mind sharing more? I love that space right at the intersection of lots of math + need to understand how computers really work. Once upon a time I implemented fast multipole method for the n-body problem, but I certainly don't claim to deeply understand it, anymore. :)

8372049 · on Dec 16, 2023

Friendly heads up, "pseudo" is frequently misspelled "psuedo" in the readme.

edflsafoiewq · on Dec 16, 2023

This is the representation I usually see used for the tree of nodes for skinning 3D models. There each node has a transform, and the most common operation is to recompute the world transform for all nodes, formed by composing the transform for each node with the one for all of its parents. If the array is sorted so that parents always precede their children, that's just a straight loop over the arrays

   for i = 0, num_nodes do
     if parents[i] == -1 then
       world_xforms[i] = xforms[i]
     else
       world_xforms[i] = world_xforms[parents[i]] * xforms[i]

_eojb · on Dec 17, 2023

I would hope your joint hierarchy is at least depth sorted or the loop as shown would not work. The proposed ATree has no order invariants...

rav · on Dec 16, 2023

Lots of comments about iterating over children being O(N) for this style of trees. It's actually easy to generalize the atree design by e.g. adding pointers for "first child" and "next sibling" and potentially removing the parent pointers, if that's what you need in your application. I think the "Operations in psuedocode" section should simply state that there's no O(1) way to access children of a node - instead of recommending the O(N) approach, you should recommend changing the data structure to support the operations you need.

Storing nodes in arrays and using indices for pointers is a must whenever you're implementing algorithms on trees. I typically prefer using an array of structs, putting the key and the parent index next to each other, instead of putting them in separate arrays. If you need to access the children of a node, then be sure to consider if you can save memory by having a separate structure for leaves - remember that over half of the nodes will be leaves, so using space to store a child pointer on both internal nodes and leaves can be wasteful use of memory.

akoboldfrying · on Dec 17, 2023

One way to maintain leaves efficiently (amortised O(1) time and space to add/remove) is to keep 2 extra vectors that "point at each other": L[] is a vector containing the indices of all leaves, and LP[] is a vector such that LP[i] stores the index within L[] of the value i if i is a leaf, and -1 otherwise. That is, for i a non-leaf, LP[i] == -1, while for i a leaf, L[LP[i]] == i.

To find the indices of all leaves: That's just L[].

To add a leaf i: Append i to L[], and set LP[i] to L.length - 1.

To remove a leaf i in constant time:

    j = L[L.length - 1]
    L[LP[i]] = j
    LP[j] = i
    LP[i] = -1.
    DropLastElement(L[])

bicsi · on Dec 17, 2023

I find it a bit odd how this tree representation as a parents array (which is, by the way, I think the most basic representation in any CS course), got so much traction on HN. I think this goes to show how far a good presentation can drive a trivial idea. On top of that, it just casually presents suboptimal procedures for a lot of essential operations, without diving into too much details about the impact of the suboptimality. Good PR I guess…

matt3210 · on Dec 16, 2023

An integer which references another data location is a pointer. This project only replaces system pointers with home grown pointers

8372049 · on Dec 16, 2023

I think in this context, "pointer-free" is meant to imply (spatial) locality of reference, no additional memory allocations, address-independent and presumably memory safety (but I didn't read the code and may be wrong about the last one).

nine_k · on Dec 17, 2023

It's not exactly so right on the hardware level. Many architectures, including the ubiquitous x64, have an ability to add a direct offset to a memory-access operation (like mov), or efficiently compute the address with an offset and element size (like lea). This removes an extra pointer deteference, which may be hugely expensive.

With a cache-friendly structure like array, the one dereference you may need for the array access has a high chance to be served from L2 or L1, saving you a lot of clocks, because RAM has huge latency.

naasking · on Dec 17, 2023

This is typically referred to as a handle rather than a pointer. Handles can permit more flexibility than pointers.

nilamo · on Dec 17, 2023

> An integer which references another data location is a pointer.

A more useful, storage/serializable friendly, pointer perhaps.

huhtenberg · on Dec 16, 2023

Oh, please.

There are technically correct nitpicks with some merit and there are trivial remarks like yours. It is perfectly clear what "pointer-free" was referring to in the post title.

akoboldfrying · on Dec 17, 2023

It's a legitimate criticism. The claimed benefit of "vector processing is faster than pointer chasing" will materialise only when the most common access pattern is to read or write all node values, since in that case the accessed values are contiguous -- but that's a consequence of the data structure being a struct-of-arrays (instead of the more common array-of-structs), which is wholly independent of the pointers-or-indices question. For any of the (very common) other access patterns, like a preorder traversal, "index-chasing" needs to happen, and that will be exactly as bad as pointer-chasing would be if the entire memory block had been reserved ahead of time in the same way.

The only real advantages of indices over pointers are serialisability and ease of debugging.

kitd · on Dec 17, 2023

Is it? Please tell me because I had the same question.

From the programmer's PoV, having no pointers gives little to no benefit. From the CPU's PoV, (in c/c++) indices become pointers anyway. Maybe the compiler can optimise more easily, but that's not obvious.

Cache coherence is the key goal, but you can do that with pointers easily enough.

another2another · on Dec 18, 2023

I haven't looked at the code, but my guess would be that the whole data structure lives in a single chunk of RAM, and the tree is described as offsets to a base address (so probably more compact than with embedded pointers).

Of course reading the values at each offset will require constructing a pointer address, but if the whole chunk can fit in the L2 data cache then reading those values at the calculated address will be very fast since the whole data structure is in cache.

The aim isn't a compiler optimization, but a runtime on CPU core optimization.

zeroCalories · on Dec 16, 2023

Looks nice. That said, if you want to reduce mallocs and encourage data locality, maybe you could try a more traditional tree implementation using a pool of nodes similar to thread pools. I've found that most of my problems related to those were solved by such techniques.

OnlyMortal · on Dec 16, 2023

A custom pool for objects is often a good idea. Pre-allocate “enough” memory and dish them out as appropriate, releasing back to the pool when the reference count goes to zero.

You can also use the pool to bound the maximum size you’re able to allow.

winrid · on Dec 17, 2023

Man, if only something already did this. We could call it Oak... Or maybe Java? :)

mathiasgredal · on Dec 16, 2023

Why is this better than an arena style allocator?

akoboldfrying · on Dec 17, 2023

It is an arena-style allocator.

At least, if someone asked me to explain what an arena allocator is/does, I would say essentially what they wrote -- likely without the "disappears automatically once all references disappear" part, which is IMHO just a clever nice-to-have.

OnlyMortal · on Dec 17, 2023

You allocate a large lump of memory up front and supply parts of it on demand. This would be different from an allocator which may allocate smaller amounts of memory as needed. This can help reduce memory fragmentation.

It also helps to bound your memory usage. A request could block until memory was free.

icsa · on Dec 16, 2023

See Aaron Hsu's excellent descriptions (using APL):

https://www.youtube.com/watch?v=hzPd3umu78g https://www.youtube.com/watch?v=X5_5MtOYNos

mihaic · on Dec 16, 2023

One change I'd make to this structure would be to pack all the children of a node together. That way you could find out the list of all children of a node with a binary search, and they'd have some cache-friendly properties.

Inserting a child would need a memmove in O(N), but if edits are rare after an initial build it wouldn't be that bad.

snovv_crash · on Dec 16, 2023

You could also have extra space allocated for where extra data should go. By aggressively rebalancing you should only need to do a full reallocation when the tree is actually getting pretty full.

jiggawatts · on Dec 16, 2023

I was about to comment that this reminds me of Aaron Hsu’s Dyalog compiler, but it’s mentioned right there in the README as one of the inspirations.

IMHO, compilers are about an order of magnitude slower than they could be, and this type of tree representation could be one key element for fixing that.

One interesting approach would be to combine egg[1] with this style of vectorised trees for efficient optimisation passes.

[1] https://arxiv.org/pdf/2004.03082.pdf

sinuhe69 · on Dec 17, 2023

What is a array index if not a “pointer”? In fact, access an array member involves a pointer arithmetic: array_p + index * data_size

The fact that traditional tree implementation requires malloc is solely based on the wish to dynamically provide and remove of nodes. If the tree can have a limited size with no frequent delete operation, an array implementation is fine. Otherwise, expand an array can be a very costly or even an impossible operation in some circumstances.

And full scanning is not efficient.

kleiba · on Dec 17, 2023

The difference lies in the CPU cache.

emmanueloga_ · on Dec 17, 2023

A lot of people commenting about "an index into an array is also a pointer", I thought people commonly referred to integer indexes of this kind as handles, or index-handles? (like in this article [1]).

This way of representing trees reminds me of two classic data structures: heaps [2] and disjoint-sets [3].

--

1: https://floooh.github.io/2018/06/17/handles-vs-pointers.html

2: https://en.wikipedia.org/wiki/Heap_(data_structure)

3: https://en.wikipedia.org/wiki/Disjoint-set_data_structure

amluto · on Dec 16, 2023

> Pointers are annoying anyway.

But the parent indices are pointers. Not in the index-into-all-memory sense but in the offset-into-array sense. They’re smaller, type safe, inherently well packed, and can’t point outside the instance of the data structure in question if they’re bounds checked. But I would still think of them as a sort of pointer.

So this is just a tree, with only parent pointer, in SOA form, and iterable (in no particular order). Which is maybe useful if you want that specific data structure.

And it’s utterly useless if you want, say, a tree with children and need better-than-O(n) access to children. Which you probably do need. Of course, you can add child pointers in SOA form if you want.

(SOA is Struct Of Arrays)

alexchamberlain · on Dec 16, 2023

I think you could achieve something close to (I haven't sat down and thought about it) O(logN) with sorted arrays? You'd have the downside of inserting things in the middle, but I think that would be lost in cache efficiency elsewhere for many use cases.

amluto · on Dec 16, 2023

A sorted array to less inefficiently iterate children of a node? This is getting ridiculous.

The “no pointers” thing is cache efficient for three reasons:

1. Indices can be smaller than pointers.

2. The data is packed in memory. This means that cachelines will be full of relevant data, and adjacent cache lines (which are a bit faster than faraway lines) get used.

3. SOA form is more or less cache efficient depending on what you’re doing with it.

And that’s it.

If you need child pointers, use child pointers. If you want a cache-efficient tree, use a cache-efficient tree, e.g. a B-tree or an appropriate variant.

alexchamberlain · on Dec 16, 2023

Sorted arrays can be navigated like a tree. For sufficiently randomised data, the middle element is approximately the middle of your data. You can use binary search. You can implement range iteration very efficiently. It's not wholly ridiculous.

amluto · on Dec 16, 2023

Binary search into a sorted array is slow. Intro algorithms classes might count operations, and you discover that about log_2(n) operations are needed, which is optimal in a certain sense.

But those operations are horrible operations. Until you are log_2(cache line size / element size) steps from the leaves, you are hitting a different cache line each time, and you’re doing a branch such that the next cache line you want depends on the branch result.

So this is like log_2(n)-3 serial (cache miss, compare, branch, compute address operations). (Assume 8-byte entries — it’s much worse with bigger entries.) This not even close to optimal.

You will find that, for reasonably large data sets (maybe a few hundred elements or more?), any cache-optimized searchable data structure will beat binary search by a large factor. And for small arrays (or the last stage of an optimized large structure), you want a vectorized linear search.

Binary search is mostly nice because sorted arrays are elegant and binary search can be implemented in very little code. And it’s easy to explain and analyze.

senderista · on Dec 17, 2023

Eytzinger ("heap") layout packs the pivots as efficiently as possible, so cache misses are confined to the lower levels of the tree. But ordered iteration is slower, and of course it only works for static data. I agree that for dynamic structures I have yet to find anything more cache-efficient than a B-tree.

GrumpySloth · on Dec 17, 2023

Measurements suggest that binary search is at least competitive with linear search and is often the winner, so it’s a reasonable default[1].

But naive binary search can also be improved upon by dividing the searched space into 3 subranges instead of 2[2].

[1]: <https://www.pvk.ca/Blog/2012/07/03/binary-search-star-elimin...>

[2]: <https://www.pvk.ca/Blog/2012/07/30/binary-search-is-a-pathol...>

amluto · on Dec 17, 2023

To clarify, I’m not saying that binary search is a bad way to search a sorted array. I’m saying that, if you have a bunch of data, you intend to preprocess that data and then search it repeatedly, that sorting it and binary searching is not a great solution.

catlifeonmars · on Dec 16, 2023

I feel like atree in TFA is just giving a name to something I do all the time when it makes sense to so. For example, I used something similar recently when building a ECS entity tree for a game engine.

irreducible · on Dec 17, 2023

For anyone interested, I sketched out a basic implementation of this in Rust. https://github.com/irreducible-io/apter-tree

conradludgate · on Dec 17, 2023

Petgraph uses a similar design (two vecs) for storing graphs https://docs.rs/petgraph/latest/petgraph/graph/struct.Graph....

readthenotes1 · on Dec 16, 2023

This reminds me of data structures used to represent data on the old tape devices

mamcx · on Dec 17, 2023

Oh, cool.

I made my own attempt at the same kind of idea at https://elmalabarista.com/blog/2022-flat-tree/.

Is pretty simple actually. What I have observed (and my tree exploit) is that most tree are "too much" and I only have cared by pre-order tree and "sequentially" scan is the major op.

So, the major takeaway is that if your case is simple, Go! Go! Vec!

gumby · on Dec 16, 2023

A pointer is simply an index into an array which is RAM.

lelandbatey · on Dec 16, 2023

Conceptually, yes. There are a lot of finer points to this concept though. One of the points most relevanthere is that accessing "RAM" is not _actually_ an O(1) operation, due to CPU caches which can speed up sequential access of "close together" data. Thus, if you manually ensure that data is "close together" (by putting it all in an array together), you can massively speed up your program in the real world because now all your keys/data can be squeezed into the cache at once.

gumby · on Dec 16, 2023

You can allocate your data the same way, especially using custom allocators in C++. And every reference requires some arithmetic too, if you're worrying about that.

But my point is that storing them in another array doesn't buy you a lot if you allow deallocations (you can then still have a use-after-free, or even read the wrong object) while if you don't deallocate you might as well just use pointers.

Just trying to understand the benefit of going to all this work. I can see the point in old FORTRAN code before FORTRAN 90

lelandbatey · on Dec 16, 2023

I think it's mostly just an experiment, example, proof-of-concept or other "toy" implementation demonstrating/exploring the idea.

gumby · on Dec 16, 2023

Yeah, no judgement on that.

But I’ve worked with indexed classes (mixin class) to avoid pointers in the past and have never found them worth the effort. Since someone else is mentioning it here on HN I’m asking if there is an advantage over pointers that I don’t realise — if so I might be glad to take advantage of it myself.

senderista · on Dec 17, 2023

It's all just names, all the way down.

https://www.dropbox.com/s/mmi46lxk8ipoci0/Principles%20of%20...

mrkeen · on Dec 16, 2023

But they point into your RAM, which means you can't store them on disk or pass them over the network.

edflsafoiewq · on Dec 16, 2023

SoA form also lets you trivially store additional data at every node, eg. if an algorithm needs a flag for each node, you just alloc an array for it.

gumby · on Dec 16, 2023

A minor but legitimate benefit.

zappb · on Dec 16, 2023

Pointers are a bit more abstract than that. Since each program has a virtual address space, and said address space is made up of memory pages, even raw pointers need to be translated to the physical address via TLBs. In that sense, indices into arrays are more like pointers than the other way around.

stevage · on Dec 16, 2023

Interesting the casual references to languages K, J and Q. I have never heard of them, and they are hard to google. Anyone know?

spenczar5 · on Dec 16, 2023

Oh boy, they are weird. They are all related to APL. Extremely terse, dense languges. Got some traction in actuarial and financial circles.

Arthur Whitney is the origin of much of that family of languages. Bryan Cantrill’s interview with him is good: https://dl.acm.org/doi/pdf/10.1145/1515964.1531242

mgaunard · on Dec 16, 2023

K and Q are the languages behind KDB.

K is the lower-level language and is a variant of APL. Q is the higher-level one, bringing in elements of SQL.

J is another APL-like programming language, unrelated to KDB, by the actual creator of APL.

All are easy to google by adding "programming language" to your query, and each has a wikipedia page.

laszlokorte · on Dec 16, 2023

They are array languages. If you have never heart of array languages check out the code_report youtube channel.

ufo · on Dec 17, 2023

I'm curious about the performance of separating the data from the pointers. On one hand, homogenous arrays pack better. On the other, the traditional version that packages the data and child pointers next to each other might have better cache locality.

timrobinson333 · on Dec 17, 2023

I'm surprised nobody has pointed out what seems to be a fairy obvious limitation: the children within a parent cannot be ordered (unless you want to splice them in the middle of the arrays when inserting which would be pretty inefficient)

alpaca128 · on Dec 17, 2023

If all you care about is the order and not that children are neighbours in the array it should be fine.

nurettin · on Dec 16, 2023

Wow this is pretty similar to how I used to store hierarchical data back in my qbasic days to solve minimax problems. Didn't know it was called steven or whatever.

billforsternz · on Dec 17, 2023

C++ keeps getting increasingly esoteric new stuff. What I'd really like instead is a std::tree

rewmie · on Dec 16, 2023

It might be of interest to anyone that there's an implicit binary tree data structure dubbed Eytzinger's tree/method that only requires a single vector.

https://opendatastructures.org/ods-cpp/10_1_Implicit_Binary_...

I dare say that no tree data structure beats Eytzinger's method in cache locality.

amluto · on Dec 16, 2023

> I dare say that no tree data structure beats Eytzinger's method in cache locality

Cache locality is good toward the leaves and bad everywhere else. This is why binary search on a sorted array is actually quite slow.

ORC in Linux, for example, was first prototyped as a sorted array of little structures. It got much faster by adding an auxiliary index giving O(1) access to a credible starting point for a search.

funcDropShadow · on Dec 16, 2023

> Cache locality is good toward the leaves and bad everywhere else.

1. That depends on the operation performed. I'd say cache-locality is near perfect for depth-traversal.

2. Whether the effective performance of the cache is good or bad depends on the alternative. If the alternative is adding two 64 bit pointer to every 32 bit value in the tree node. And each of those node may be spread through out the heap. Then this representation starts to look quite good.

ulatich · on Dec 16, 2023

There is also the “Binary Ostensibly-Implicit Tree”. Like Eytzinger‘s tree/method but without the need to pad memory. Generalises to n-ary trees too.

bjourne · on Dec 17, 2023

Van Emde Boas trees are asymptotically cache optimal. In practice when cache sizes are known they may be slower than other tree types however.

dgreensp · on Dec 16, 2023

[flagged]

dmw_ng · on Dec 16, 2023

It's not just cache locality, but also at least the effectiveness of cache prefetching and the ability to use wider operations.. simple array loops are something compilers can often vectorize without assistance.

dgreensp · on Dec 17, 2023

Yes, but, operations like, “Does this node have children or is it a leaf?” should not require reading the entire array. Or, how many children; get the first child. Traversing the tree in pre-order or post-order doesn’t just have less cache locality than a B tree (say), it involves sorting a vector of vectors and is at best N log N and O(N) extra space. Deleting a node requires a O(N) writes. Every little operation is massively less local and less efficient.

I stand by the lack of integrity that I got downvoted for pointing out! I’ll die on that hill.

_eojb · on Dec 17, 2023

No idea why your parent comment was flagged. The proposed data structure is truly horrendous but somehow, this post remains on the front page...