Disclaimer: If you need to store and retrieve large amounts of data on disk, SQLite is probably what you're looking for. If you want to quickly look up data in memory, the Python dict class is probably what you need. However, if you want something for which neither is suitable, a B-tree might be helpful.

This is an implementation of a particular kind of B-tree, based on research by Ohad Rodeh. See "B-trees, Shadowing, and Clones" (copied here with permission of author) for details on the data structure. This is the same data structure that btrfs uses. Note that my implementation is independent from the btrfs one, and might differ from what the paper describes.

The distinctive feature of this B-tree is that a node is never modified (sort-of). Instead, all updates are done by copy-on-write. Among other things, this makes it easy to clone a tree, and modify only the clone, while other processes access the original tree. This is utterly wonderful for my backup application, and that's the reason I wrote larch in the first place.
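To make the copy-on-write idea concrete, here is a minimal toy sketch. It is not larch's code or API: it uses a plain binary search tree rather than a B-tree, and the names Node, insert, and lookup are made up for illustration. What it shows is that an update copies only the nodes on the path from the root to the change, so the old root still describes the old tree and everything else is shared.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """An immutable node (hypothetical); "changing" it means building a new one."""
    key: bytes
    value: bytes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, value):
    """Return the root of a new tree that contains key.

    Only the nodes on the path to the key are copied. The tree that
    `root` pointed to is left exactly as it was.
    """
    if root is None:
        return Node(key, value)
    if key < root.key:
        return root._replace(left=insert(root.left, key, value))
    if key > root.key:
        return root._replace(right=insert(root.right, key, value))
    return root._replace(value=value)


def lookup(root, key):
    """Return the value stored for key, or raise KeyError."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    raise KeyError(key)


# Build an "original" tree, then make a modified clone of it.
original = None
for k in (b"m", b"c", b"x"):
    original = insert(original, k, b"old")

# Cloning costs nothing: the clone starts out as the same root, and the
# first update gives it a root of its own.
clone = insert(original, b"c", b"new")
```

Anyone still holding original sees the old value for b"c", while clone sees the new one, and the untouched subtree under b"x" is the very same object in both trees.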

I have tried to keep the implementation generic and flexible, so that you may use it in a variety of situations. For example, the tree itself does not decide where its nodes are stored: you provide a class that does that for it. I have two implementations of the NodeStore class, one for in-memory and one for on-disk storage.
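As a rough sketch of what "you provide a class that does that for it" can look like, here are two interchangeable stores, one in memory and one on disk. The class names MemoryStore and DiskStore, the put and get methods, and the one-file-per-node layout are all assumptions made for this example. They are not the actual NodeStore interface or the on-disk format; see the docstrings for those.

```python
import os
import tempfile


class MemoryStore:
    """Hypothetical store that keeps encoded nodes in a dict."""

    def __init__(self):
        self._nodes = {}

    def put(self, node_id, data):
        self._nodes[node_id] = data

    def get(self, node_id):
        return self._nodes[node_id]  # raises KeyError if missing


class DiskStore:
    """Hypothetical store that keeps each encoded node in its own file."""

    def __init__(self, dirname):
        self._dirname = dirname

    def _path(self, node_id):
        return os.path.join(self._dirname, "%d.node" % node_id)

    def put(self, node_id, data):
        with open(self._path(node_id), "wb") as f:
            f.write(data)

    def get(self, node_id):
        try:
            with open(self._path(node_id), "rb") as f:
                return f.read()
        except FileNotFoundError:
            # Behave like MemoryStore when the node does not exist.
            raise KeyError(node_id)


def save_and_reload(store):
    """Code written against put/get does not care which store it is given."""
    store.put(1, b"encoded node")
    return store.get(1)


in_memory = save_and_reload(MemoryStore())
with tempfile.TemporaryDirectory() as dirname:
    on_disk = save_and_reload(DiskStore(dirname))
```

Because the caller only relies on put and get, swapping one store for the other does not require any change to the code that uses it.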

The tree attempts to guarantee the following: all modifications you make will be safely stored in the node store when the larch.Forest.commit method is called. After that, the committed tree will be safe from further modifications unless you modify it yourself. (You need to take care to create a new tree for further modifications, though.)

Documentation is sparse. Docstrings and reading the code are your best hope.

Contributions are welcome. Every level of contribution is most appreciated: bug reports, spelling and grammar fixes, code patches, questions on how to get started, etc. See the README (link below) for information on how to modify the code. And if anything's unclear, ask!

Status: in production use. The on-disk file format is not expected to change anymore.