My Trees can now have arbitrary roots. It is time to grow a Forest.

When I create a new tree, which might or might not be a clone of an existing one, I need to give the tree the root node id as an initializer argument. This is awkward, since I don't know what it will be...

I think the following is a better design:

  • if root_id is None, the tree picks the next id when it creates the root node
  • if root_id is not None, it is the root of the tree we clone from
  • when the tree updates nodes, it never overwrites the root node, instead it creates a new node all the time
  • when things are committed to disk, the forest queries all trees for their current root node and stores that list in the metadata

Done.

Next big thing: trees currently generate node ids themselves. This will end badly... Fixed.

The final big thing: removing a tree. I haven't used refcounts yet at all, that's going to be needed for this. Otherwise I can't remove nodes.

In fact, refcounting is the only interesting part of this. So, the changes that need making are:

  • when creating a node, set its refcount to 1
  • when creating an index node, additionally increment the refcount of each of its children
  • instead of removing a node, decrement its refcount and if it drops to zero, only then remove it
  • when removing an index node, additionally decrement the refcount of each of its children
  • node removal happens also when root node is replaced

Hmm. This is not quite the same as what Rodeh suggests, but he's planning for a filesystem where disk blocks need to be re-used, where I do not have that.

Actually, my scheme above can be simplified:

  • when creating a leaf node, set its refcount to 0
  • when creating an index node, increment refcount of each child
  • when removing an inex node, decrement refcount of each child
  • root nodes will always have a refcount of 1

How should I deal with the root that gets replaced? It might belong to another tree, after all. I guess I should not force roots to have refcounts of 1. Instead, when I clone a tree, the refcount for a root should grow. Alternatively, I could always create a new root node when cloning a tree.

Also, even when inserting, some nodes might end up getting destroyed. This requires some careful thinking, now.

Algorithm for decrementing refcount for a node:

decrement refcount for node
if refcount dropped to 0:
    if node is index node:
        decrement refcounts for children
    remove node

Calling this algorithm at judicious locations when inserting and removing should work.

I think I got it work. Didn't write extensive tests for the refcounting, but we'll see.

Next step should probably to do some benchmarking.