Disclaimer:
If you need to store and retrieve large amounts of data on disk,
SQLite is probably what you're looking for.
If you want to quickly look up data in memory, the Python
dict class is probably what you need.
However, if you want something for which neither is suitable,
a B-tree might be helpful.
This is an implementation of particular kind of B-tree, based on research by Ohad Rodeh. See "B-trees, Shadowing, and Clones" (copied here with permission of author) for details on the data structure. This is the same data structure that btrfs uses. Note that my implementation is independent from the btrfs one, and might differ from what the paper describes.
The distinctive feature of this B-tree is that a node is never modified (sort-of). Instead, all updates are done by copy-on-write. Among other things, this makes it easy to clone a tree, and modify only the clone, while other processes access the original tree. This is utterly wonderful for my backup application, and that's the reason I wrote larch in the first place.
I have tried to keep the implementation generic and flexible, so that
you may use it in a variety of situations. For example, the tree itself
does not decide where its nodes are stored: you provide a class that
does that for it. I have two implementations of the NodeStore class,
one for in-memory and one for on-disk storage.
The tree attempts to guarantee this: all modifications you make will be safely stored in the node store when the larch.Forest.commit method is called. After that, unless you actually modify the committed tree yourself, it will be safe from further modifications. (You need to take care to create a new tree for further modifications, though.)
Documentation is sparse. Docstrings and reading the code are your best hope.
Contributions are welcome. Every level of contribution is most appreciated: bug reports, spelling and grammar fixes, code patches, questions on how to get started, etc. See README for information on how to modify the code. And if anything's unclear, ask!
Status: in production use. The on-disk file format is not expected to change anymore.
- README (updated at release time)
- NEWS (updated at release time)
- example.py (an example program)
- Using stuff on code.liw.fi
- Version control: bzr get http://code.liw.fi/larch/bzr/trunk/
- Release tarballs and Debian packages: http://code.liw.fi/debian/pool/main/p/python-larch/
- Benchmark results
- Release checklist
- Bug tracking page
- Blog entries: http://blog.liw.fi/tag/larch/
- Private journal entries
- Copyright license: GNU GPL3 or later.
I have several question:
Best regards
Larch works with one writer at a time. If you set up mutual exclusion (locking) between them, many writers will work too. However, Larch itself does not deal with locking, and leaves it entirely up to you.
I use clones in my backup application to implement backup generations. Clones are not sufficient for multiple writers, because some things, such as the reference counts, are shared between all clones and need mutual exclusion for updates.
I have not given much thought for supporting multiple writers directly in Larch. It's not been something that's been interesting to the case where I use Larch (my backup program).
I see no reason why an SQL implementation like Sqlite could not be built on top of Larch. However, it's not something I have any interest working on.
If a commit fails, the "forest" metadata does not get updated, and the readers do not see any changes. There may be some garbage files on the disk (nodes that got written, but whose trees got lost from the list of clones in the forest), but apart from wastig space, they should be harmless.