Open bugs in Obnam

If you have a problem with Obnam, please send mail to the mailing list rather than adding an entry to this page. See the contact page for information about the list. This wiki page is meant to help developers keep track of confirmed bugs, not to serve as a support channel. (This changed on 2012-07-08.)

See the bug-reporting page for hints on what to include in a bug report, if you're unsure.

See also bugs that are done, and bugs.


On Tue, Mar 11, 2014 at 04:34:13PM +0100, Thomas Schwinge wrote:

Alternatively, wouldn't it make sense to change to, or at least have available, a mode where verify works not based on the data in the backup, but instead on the actual user data? Wouldn't doing it this way build yet greater confidence that obnam has backed up everything alright? That is, instead of traversing the data in the backup and verifying against the corresponding user data, it would traverse the user data (as when doing a backup) and verify against the data in the backup.

I think you're right. I think we should have "obnam verify" traverse through both the backup generation and the live data, and report as follows:

  • if a file exists only in the backup, but not in the live data
  • if a file exists only in the live data, but not in the backup
    • also indicate whether the file matches (current) exclusion rules, since that matters: a file that is in live data but is not excluded is different from an excluded one
  • if a file exists in both the backup and the live data, but is different in any way

I don't have time to work on this at the moment, but I'll add it to the list of bugs in http://liw.fi/obnam/bugs/ and would be happy to review and merge a patch for this.

--liw

Posted Sat Mar 15 08:56:52 2014 Tags:

It seems obnam mount (the FUSE plugin) can't handle a client that has no non-checkpoint generations. This is unfortunate, even if it is fairly unlikely to happen. Should be easy enough to fix. --liw

Posted Wed Mar 5 08:02:41 2014

Obnam does not support the ext2/3/4 chattr attributes. It should back them up and set them on restore, when possible.

In addition, it should support the d attribute to exclude files from being backed up.

--liw
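
As an illustration (not Obnam code), reading the ext2 flags and checking for the 'd' (no-dump) flag could look roughly like this; the ioctl constant below is the usual value on 64-bit Linux and the helper names are made up:

    import array
    import fcntl
    import os

    FS_IOC_GETFLAGS = 0x80086601   # _IOR('f', 1, long) on 64-bit Linux
    FS_NODUMP_FL = 0x00000040      # the chattr 'd' (no dump) flag

    def get_ext_flags(pathname):
        fd = os.open(pathname, os.O_RDONLY)
        try:
            buf = array.array('l', [0])
            fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf, True)   # fills buf with the flags
            return buf[0]
        finally:
            os.close(fd)

    def should_skip(pathname):
        try:
            return bool(get_ext_flags(pathname) & FS_NODUMP_FL)
        except (OSError, IOError):
            return False   # not an ext* filesystem, or flags not supported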

Posted Sat Feb 15 20:55:53 2014 Tags:

Obnam seems to be storing metadata quite inefficiently. I did this:

  • create a directory tree with about 1.3 million empty files
  • back that up

The data is about 500 megs (directory entries); the repository is about 82 gigabytes.

Where is the space used? Can we store it more efficiently and if we do, does that have an impact on runtime?

--liw

Posted Fri Nov 1 08:33:13 2013

Improve Obnam's progress reporting during committing.

  • how much work will there be to do the commit?
  • how many files to upload?
  • how many files to move in journal?
Posted Wed Oct 30 15:41:46 2013 Tags:

Obnam needs tests for using every filesystem type available as live data or repository.

Not sure how to arrange that without root access, but there's a need to do that.

Posted Wed Oct 30 15:41:46 2013

Obnam has no test for what happens when the filesystem fills up.

Posted Wed Oct 30 15:41:46 2013

Obnam could do with a mode in which it backs up the data from a block device, instead of the device node. If the block device contains a filesystem, it should back up only the parts of the device that are used by the filesystem, and skip unused parts. That could be used for backing up disk images as well.

Posted Wed Oct 30 15:41:46 2013 Tags:

There need to be tools and documentation for key management with Obnam.

How does one replace one's key, or subkey, when it expires?

Posted Sat May 25 18:25:49 2013 Tags:

S.B. suggests that backup generations have an optional description.

  1. Named generations -- There are certain generations that are more important than others. Some are automatically created by Obnam itself, some are routinely scheduled, and some were explicitly created. For example, I always run Obnam immediately before traveling with my laptop in case it gets stolen or broken. The same goes for backups before major system upgrades. It would be nice to have something approximately analogous to the Windows "restore point" functionality, which has a description field. Sometimes they are only automatically created system checkpoints. But if the user explicitly creates a new restore point, he can add the description "before traveling to Europe" or "before upgrading OS" or whatever. Similarly, the automatic backup script could be programmed to label it as "cron backup".
Posted Sun Mar 24 17:16:31 2013 Tags:

S.B. suggests that generations could be tagged so they aren't automatically deleted.

  1. Unforgettable generations -- In scenarios similar to the above, I would also find it useful to be able to mark certain important generations as "unforgettable". That way, when I run an automatic time based forget command, I can be sure that it will preserve certain milestone generations, even if they weren't the last generation of the month or the week or the day or whatever.
Posted Sun Mar 24 17:16:31 2013 Tags:

Ben Kelly reported on August 31, 2012, that he's seeing crashes due to file descriptor leaks. See the list mail archive for logs and suggested patches. I have not been able to reproduce this, however. --liw

Posted Fri Feb 8 20:16:16 2013

If you accidentally back up some large or sensitive files, but don't want to delete all the generations they're in, it would be handy for Obnam to be able to delete just those specific files from the generations, and leave the rest.

Posted Sat Dec 29 15:02:42 2012 Tags:

Obnam does not currently seem to notice when the sftp connection breaks. It should, and it should then abort the backup. --liw

Posted Tue Dec 4 13:27:25 2012 Tags:

Obnam should, arguably, use ctime changes to trigger backups, so that if a file's size and mtime are the same, because whatever fool program modified the file reset its mtime, obnam will still back up the changed data.

I have code for this, but it requires a repository format change, and breaks the upgrade from format 5 to 6. --liw
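
A rough sketch of the idea, not Obnam's actual change-detection code; old_meta stands for whatever was recorded in the previous generation:

    import os

    def file_has_changed(pathname, old_meta):
        # old_meta holds the st_size, st_mtime and st_ctime recorded in the
        # previous generation.
        st = os.lstat(pathname)
        return (st.st_size != old_meta.st_size or
                st.st_mtime != old_meta.st_mtime or
                st.st_ctime != old_meta.st_ctime)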

Posted Sun Nov 25 17:59:51 2012 Tags:

Problem: If the chunk size is reasonably large (say, a megabyte), then most files will be smaller than one chunk, and the repository ends up with a very large number of small chunk files.

Idea: collect chunks into groups, called "salsa tins".

  • salsa tin = list of chunks
  • salsa tin has an id
  • chunk id = salsa tin id + suitable number of extra bits for index into list
  • chunk id may be 64 bits total, or 64+32, or whatever seems convenient
  • no chunk gets stored alone, only in salsa tins

This lets a client put things into the repository at will, without synchronisation or locking beyond what the filesystem provides (exclusive creation of files).


Having multiple chunks in a single file complicates the logic for managing files in the repository and for deleting unused chunks.

Therefore, an alternative idea: instead of shoving multiple chunks into one file, allow files to use parts of chunks. Currently a file's metadata lists the chunks that have its contents. Change this to be a list of (chunk id, offset, length) triplets, where offset and length specify a part of a chunk. This way, a client can create one chunk that contains the data of many small files, and they can all just use the relevant part of the chunk. Managing removal of those files is easy: it is the current code without modification.

--liw
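
An illustrative sketch of the second idea, with made-up names and values rather than Obnam's actual metadata format: each file's contents are a list of (chunk id, offset, length) triplets, so many small files can share one chunk:

    # Several small files share parts of chunk 4711; a larger file spans two chunks.
    files = {
        'etc/hostname': [(4711, 0, 12)],
        'etc/timezone': [(4711, 12, 14)],
        'etc/motd':     [(4711, 26, 301)],
        'var/big.log':  [(4712, 0, 65536), (4713, 0, 40000)],
    }

    def read_file_data(get_chunk, triplets):
        # get_chunk(chunk_id) returns the chunk's bytes; reassemble the file.
        return b''.join(get_chunk(chunk_id)[offset:offset + length]
                        for chunk_id, offset, length in triplets)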

Posted Fri Nov 23 16:13:10 2012 Tags:

Obnam needs a way to remove clients from the repository. The current remove-client command just deals with encryption.

Suggested-by: Daniel Silverstone

Posted Wed Nov 7 19:43:46 2012 Tags:

Obnam needs a way to rename clients in the client list.

Suggested-by: Daniel Silverstone

Posted Wed Nov 7 19:43:46 2012 Tags:

Would it be faster to use the sftp put and get methods instead of the current open/read/write/close code for transferring files to and from the repository?
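
For comparison, a minimal sketch of the two approaches, assuming an already-connected paramiko SFTPClient; this is not Obnam's transfer code:

    def upload_with_put(sftp, local_path, remote_path):
        # One call; paramiko handles buffering and pipelining internally.
        sftp.put(local_path, remote_path)

    def upload_with_open_write(sftp, local_path, remote_path, chunk_size=32768):
        # Explicit open/write/close loop, roughly the current style.
        with open(local_path, 'rb') as local_file:
            remote_file = sftp.open(remote_path, 'wb')
            try:
                while True:
                    data = local_file.read(chunk_size)
                    if not data:
                        break
                    remote_file.write(data)
            finally:
                remote_file.close()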

Posted Wed Nov 7 19:43:46 2012 Tags:

When larch is processing a journal (committing or deleting at startup, committing at end), obnam should be showing useful progress reporting for that.

Posted Sat Nov 3 13:19:21 2012 Tags:

Make obnam fsck remove extraneous files (e.g., tmp*). --liw

Posted Sat Oct 20 19:28:35 2012 Tags:
<zeri> liw: I tried out obnam yesterday and noticed you guys
    simply call gpg -c --batch for the symmetric encryption part
<zeri> liw: also if I am not mistaken you sort of have a master
    key for the repository (64 bit hex number encrypted for all keys
    in the repository that is used as passphrase)
<zeri> liw: since you do not specify the
    s2k-algo, s2k-mode, s2k-count configuration options of gpg, as
    well as the compression-algo option, a gpg.conf that is usually
    considered good will slow obnam down to a couple of bytes/sec
<zeri> liw: since key derivation is of no use if a sufficiently
    big random secret is used, you might want to consider specifying
    --s2k-mode 1 to disable most of the key strengthening in gpg and
    simply hash the password once with a salt ... speeding up the
    encryption of every block by at least one order of magnitude
<zeri> (default behaviour is to compute a hash chain of length at
    least 1024, up to 65011712, which was in my gpg.conf)
<zeri> also I didn't check whether you do compression in obnam,
    but gpg can do that for you as well; it was turned off in my
    gpg.conf (I used gpg primarily for large tarballs where the one
    time overhead doesn't matter)
<zeri> liw: then again I might be totally wrong and overlooked
    some switch to do all that without changing the code :)
<zeri> oh and I didn't emphasise this yet ... this hash chain
    I talked about before is computed for every "chunk", which were
    between a couple of bytes and 16k in my test ... the effort to
    compute this hash chain (to obtain encryption/authentication keys)
    far exceeds the effort to encrypt 16k with any block cipher,
    I suppose
<zeri> and its sole purpose is to prevent weak passwords from
    being guessed in a short time (since the computational effort to
    test a password is equivalent to computing this hash chain, thus
    slowing brute force down by a factor of the length of the chain)
<zeri> "64 bit hex number encrypted" << that should have been 64
    digits :)
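
For illustration only (not Obnam's actual encryption code), roughly the kind of invocation zeri is suggesting: pass the s2k and compression options explicitly so a user's gpg.conf cannot slow every chunk down. The helper names and the exact option set here are assumptions:

    import subprocess

    def symmetric_gpg_args(passphrase_fd):
        # The extra options stop a user's gpg.conf from adding per-chunk
        # key strengthening and compression overhead.
        return ['gpg', '-c', '--batch', '--quiet',
                '--passphrase-fd', str(passphrase_fd),
                '--s2k-mode', '1',            # salted single hash of the passphrase
                '--compress-algo', 'none']    # no gpg-level compression

    def encrypt_chunk(data, passphrase_fd):
        p = subprocess.Popen(symmetric_gpg_args(passphrase_fd),
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             close_fds=False)
        out, _ = p.communicate(data)
        return out
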
Posted Sat Oct 20 19:28:35 2012 Tags:

Instead of in-place conversions, which are error prone and clunky, a better way would be nice. Maybe some kind of dump/undump pair, using a streamable format?

Posted Sat Oct 20 19:28:35 2012 Tags:

The progress reporting for "obnam forget" seems to be broken. When forgetting more than two generations, the progress display is stuck at

forgetting generations: 2/64 done

until the very end. It's updated to 64/64 right before finishing.

-- weinzwang

Posted Thu Sep 6 09:10:11 2012 Tags:

The idea is to change or extend the --exclude-caches feature so that one can configure which filename to look for when deciding whether obnam should skip a directory.


Changing --exclude-caches seems wrong to me: it has a specific purpose (to implement the cache directory tagging spec, http://www.bford.info/cachedir/spec.html).

Adding a new option to ignore directories that contain a specific file (or directory) would be fine.

--liw


bwh points out that the owner of the directory and the tag file should be the same.

Posted Mon Jun 18 20:26:06 2012 Tags:

It would be good for Obnam to do the whole-file checksum with a different checksum algorithm, or by using a suitable salt, to catch problems with single-chunk files, e.g., when there is a hash collision. --liw
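
For illustration (not part of the report above), a salted whole-file checksum might look roughly like this, assuming a per-repository salt is available:

    import hashlib

    def whole_file_checksum(pathname, salt, chunk_size=65536):
        checksum = hashlib.sha1(salt)   # different algorithm than the per-chunk checksum
        with open(pathname, 'rb') as f:
            while True:
                data = f.read(chunk_size)
                if not data:
                    break
                checksum.update(data)
        return checksum.hexdigest()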

Posted Fri Jun 15 23:07:02 2012 Tags:

First of all, I realise that Obnam stores full paths because that is necessary for saving every file in the system, even when the files belong to different users.

However, for certain cases where just a backup of a directory is needed, this could be made more flexible, letting the backup store only the path given on the command line, in the spirit of rsync.

When does this show up? For example, when migrating from another backup system, the easiest way would be to dump all the generations from the old backup system, one by one, to a temporary location. For each generation, Obnam is run in order to replicate the same history. However, since Obnam stores full paths, the path to the temporary directory used for the migration is also stored. This can happen on production servers, where doing the conversion in the original directory where the data belongs is not possible.

Rickard Nilsson suggested on the mailing list having a "root" option that could be used for stripping the first part of the path. The stripped part would be the one not mentioned on the command line. That way, the backup will have a path computed like this: root + given_path.

~$ obnam backup --repository=/media/backups/... --root=/ mydata

...would be stored into /mydata instead of in /home/${USER}/mydata

Thank you for taking this into consideration!


If this gets implemented, I suggest the following:

  • The Repository class will provide a hook for mangling the pathnames.
  • The hook will get the pathname as it exists in live data and will return the pathname to store in the backup.
  • The hook will be called at every point where live data pathnames are used by Repository.
  • Someone writes a plugin that adds the suitable functionality.

--liw
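
An illustrative sketch of the hook idea, with hypothetical names rather than Obnam's actual Repository API; it implements the suggested --root behaviour by stripping the part of the path not given on the command line:

    import os

    def make_root_mangler(root, backup_roots):
        # Returns a hook: live-data pathname in, pathname to store in the backup out.
        def mangle(live_pathname):
            for backup_root in backup_roots:
                parent = os.path.dirname(backup_root.rstrip('/'))
                if live_pathname.startswith(parent):
                    return os.path.join(root, os.path.relpath(live_pathname, parent))
            return live_pathname
        return mangle

    # obnam backup --root=/ /home/liw/mydata
    mangle = make_root_mangler('/', ['/home/liw/mydata'])
    assert mangle('/home/liw/mydata/file.txt') == '/mydata/file.txt'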


For the fun of it I added a mangle_filename() method to Repository. What I quickly learned: if you back up /root/bar, the process backs up "/", "/root", and "/root/bar". In reality you only want "bar" in the backup.

One option is to allow the mangling hook to drop paths entirely, but that feels very crude.

I propose not to change Repository, but to think of Repository as just getting virtual paths from its callers. Instead, the functions calling into Repository should be changed; in this case, the backup command. I have stopped here.

-- Elrond

Posted Fri Jun 15 09:28:45 2012 Tags:

From Enrico: It might be good to have a way for Obnam to automatically exclude certain kinds of common stuff, such as web browser caches, Liferea caches, etc. This should be easy to enable, and should be off by default (safe defaults are important).

Posted Tue Jun 5 09:17:32 2012 Tags:

Currently, obnam fsck reports chunks that are unused:

chunk 16541095925909528379 not used by anyone

but doesn't do anything about it. There should be an option to remove those unused chunks from the repository. --weinzwang

Posted Sat May 26 22:12:59 2012 Tags:

Obnam should, at least optionally, use fsync or other methods to ensure that everything gets committed to disk by the kernel by the end of a backup run. --liw

I want this to not have a huge performance impact, though. Learning from the lessons of dpkg, sqlite/liferea/firefox, etc, and using fsync/fdatasync and sync_file_range in the right ways is going to be necessary. --liw
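
A minimal sketch of the careful-flushing idea (not Obnam code): fdatasync file data while writing, and fsync the containing directory at the end of the run; sync_file_range is not exposed by Python's os module and would need ctypes:

    import os

    def write_durably(pathname, data):
        fd = os.open(pathname, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, data)
            os.fdatasync(fd)          # flush the data (cheaper than a full fsync)
        finally:
            os.close(fd)

    def sync_directory(dirname):
        fd = os.open(dirname, os.O_RDONLY)
        try:
            os.fsync(fd)              # make new directory entries durable
        finally:
            os.close(fd)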

Posted Sun Apr 22 10:44:42 2012 Tags:

obnam force-lock currently doesn't work. As a workaround, remove the lockfiles (all files named lock inside the repository) by hand.

find [repository path] -name lock -exec rm '{}' \;

--weinzwang


I confirm that I see this too. This bug exists because I changed how Obnam uses locks: it now locks each directory properly, instead of just the per-client directory. However, I haven't fixed "force-lock" to deal with other locks, so now it's not possible to force the locks for directories other than the per-client one. This is awkward.

To fix this, Obnam needs to know that it can safely remove the locks. There are two cases:

  • the lock was created by some other client; in this case, the user (not Obnam automatically) needs to decide if it is safe to remove the lock: just running "obnam force-lock" should not do that, instead the user should provide an option like "--really-force-locks" or something
  • the lock was created by the same client, i.e., Obnam running on the same host; in this case, if the Obnam process no longer exists, the lock can be safely removed, otherwise the locks should not be removed (again, unless "--really-force-locks" is used)

To implement this, we need Obnam to store the hostname and process id of the Obnam instance that created the lock, preferably in a way that does not leak sensitive information easily (don't store the client name in cleartext, but the md5sum of it, or something).

--liw
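
For illustration only (not Obnam's actual lock format), a lock file could record a hashed hostname and the pid, so a later run on the same host can tell whether the locking process still exists:

    import hashlib
    import os
    import socket

    def lock_contents():
        # Hash the hostname so the lock file doesn't leak it in cleartext.
        host_hash = hashlib.md5(socket.gethostname().encode('utf-8')).hexdigest()
        return '%s %d\n' % (host_hash, os.getpid())

    def lock_is_stale(contents):
        host_hash, pid = contents.split()
        my_hash = hashlib.md5(socket.gethostname().encode('utf-8')).hexdigest()
        if host_hash != my_hash:
            return False          # other client: only --really-force-locks may break it
        try:
            os.kill(int(pid), 0)  # signal 0 only checks that the process exists
        except OSError:
            return True           # locking process is gone; safe to remove
        return False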


As of 0.27, force-lock unconditionally breaks locks, but the lock files will contain sufficient information to allow us to be more intelligent about the breaking of locks in the future.

--kinnison

--

This is not good enough -- I'd like obnam to be able to break locks more kindly -- but it's good enough for 1.0, I think, so removing the blocker tag. --liw


Making the lock breaking more benign and intelligent is a wishlist item. Adding tag. --liw

Posted Tue Apr 3 07:57:46 2012 Tags:

Currently, there seems to be no easy way to forget all (or all but the newest) checkpoint generations. Something like

obnam --keep 1c forget

would be nice.

-- weinzwang

Posted Tue Feb 28 14:35:03 2012 Tags:

When --one-file-system is used, it would be nice to not cross bind-mounts. No idea how to figure that out, but it must be possible. --liw

You could look at the inode numbers for . and ./foodir/.. and check they're the same? -- kinnison

The inode check will not work if foodir is a symlink. --mathstuf
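
One approach that is not mentioned above, but is known to see bind mounts: read the mount points from /proc/self/mountinfo and refuse to descend into any of them. A rough sketch, with made-up helper names:

    import os

    def mount_points():
        points = set()
        with open('/proc/self/mountinfo') as f:
            for line in f:
                # Field 5 is the mount point; spaces appear as the escape \040.
                points.add(line.split()[4].replace('\\040', ' '))
        return points

    def is_mount_point(dirname, mounts=None):
        mounts = mount_points() if mounts is None else mounts
        return os.path.realpath(dirname) in mounts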

Posted Sun Jan 15 18:39:03 2012 Tags:

Obnam should, optionally, ask for a gpg passphrase, for the key specified with --encrypt-with, so that a user without a gpg agent will be able to do encrypted backups. Obnam should read the passphrase if its ask-passphrase setting is true, and it has access to a terminal. It should not have a setting for the passphrase itself, just for reading it from a terminal (just so that people who don't know better don't put their passphrase in a config file or similar).

Those running obnam from cron will need to have a passphraseless key, since there's no way to give obnam a passphrase in that case, without storing it in the crontab or a config file, and then it's no better than not having a passphrase.

See Debian bug #649769.

--liw

From my understanding, having a symmetric passphrase stored in a config file is not useless at all. My purpose in encrypting the backup data is to prevent the remote server from having my data in plain-view; or if I back it up to an external drive, I wouldn't want it to be accessible to anyone who picks it up. But if someone gains access to my config file, he'll have direct access to all of my data anyway--he wouldn't need to access my backups.

If I use a passphrase, then if my house burns down and I lose everything, I can get a new computer and download my data and decrypt it with my passphrase--which is long enough to be unfeasible to crack, yet completely memorized by me.

If I use a key, then if my house burns down and I don't have a working copy of my key outside my house, my backups are totally useless, and I really HAVE lost everything. (Sure, I should take precautions to keep from losing my key--but things happen.)

--Adam

It's possible to get obnam to request a passphrase when running from cron:

  1. Ensure 'use-agent' is enabled in ~/.gnupg/gpg.conf.
  2. Ensure the gpg-agent is running, and GPG_AGENT_INFO is set in your regular environment. Note that if obnam already asks for an encryption passphrase when run normally, then 1 & 2 are already correctly set.
  3. Ensure the environment obnam is called from in cron is exporting GPG_AGENT_INFO correctly. This means you must set and export the GPG_AGENT_INFO environment variable in your cron script. gpg writes this information to ~/.gnupg/gpg-agent-info-$(hostname), so in your cron script you must have:

    source "~/.gnupg/gpg-agent-info-$(hostname)" && export GPG_AGENT_INFO

Then call obnam as normal.

This will only work on a desktop system where there is someone to notice that a pinentry window has popped up. However, it looks like there may be a way to forward the gpg-agent socket over ssh, and thus run obnam with encryption from cron on a headless remote machine (see here). You'd probably have to store the private key on the remote machine, though, so I'm not sure how useful that would be.

--Scott

Posted Sun Jan 1 15:33:57 2012 Tags:

This is an idea for optimizing Obnam.

Store an MD5 of a string containing the names and relevant metadata of all files in a directory, then re-compute that when backing up the directory: if the checksums are the same, then no file in the directory has changed, and there's no need to check each of them separately, saving many tree lookups.

Consider only non-directories, since a sub-subdirectory can change without that being visible at the grandparent level; recursing into subdirectories is thus always necessary.

--liw
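
A minimal sketch of the optimisation (not Obnam code), hashing the names and metadata of the non-directory entries:

    import hashlib
    import os
    import stat

    def directory_signature(dirname):
        signature = hashlib.md5()
        for name in sorted(os.listdir(dirname)):
            st = os.lstat(os.path.join(dirname, name))
            if stat.S_ISDIR(st.st_mode):
                continue   # subdirectories are recursed into anyway
            signature.update(('%s\0%d\0%r\0%r\0' %
                              (name, st.st_size, st.st_mtime, st.st_ctime)).encode('utf-8'))
        return signature.hexdigest()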

Posted Sat Dec 10 21:54:36 2011 Tags:

Obnam should support multiple repositories, to be chosen at invocation time.

  • all repositories configured in config files
  • nicknames for repositories so it's easy to choose
  • --repository should accept nicknames
  • choose many repositories for one run
  • use all available repositories by default

--liw


Could this also be achieved by running Obnam from a wrapper script that uses a different repository for each run? Could Obnam be run in parallel instances backing up the same data to different repos? Is that possible now? --Adam


Adam, it can certainly be done by using wrapper scripts (I've been doing that), and while I haven't actually tried it, there should be no problem with backing up to multiple repositories concurrently, though you may need to fiddle with the configs so that they use different log files. --liw


After some thinking, I think I don't want nicknames for repositories, I want "profiles". Here's a concrete suggestion:

[config]
encrypt-with = CAFEF00D
profile = all
log = /var/log/obnam/obnam-%(profile)s.log

[profile "online"]
repository = sftp://liw@personal.backup.server.example.com/~/repo/
use-if = ping -c1 personal.backup.server.example.com

[profile "usb-drive"]
repository = /media/Drum/obnam-repo/
use-if = test -d /media/Drum/obnam-repo/

[profile "at-work"]
repository = /mnt/backups/
use-if = ping -c1 fs.work.example.com
pre-command = sudo mount /mnt/backups
post-command = sudo umount /mnt/backups

  • if --profile=all, then iterate automatically over all profiles
  • otherwise, use only the chosen profiles
  • some day: run some/all profiles in parallel in one obnam instance; initially, user may run parallel obnam instances
  • log file should embed profile name somehow
  • should profile be selected based on user too? that can be done with "use-if = test $USER = liw"; better support can be added later, if there's a need

--liw

Posted Sat Dec 10 21:54:36 2011 Tags:

Obnam is currently using paramiko as its SFTP implementation. It is a bit more limited than the SFTP protocol itself, so some things that Obnam should be doing, such as restoring hardlinks across SFTP, are not possible. There may also be some bugs with regard to timestamp handling.

Possible fixes:

--liw

Posted Sat Dec 10 21:54:36 2011 Tags:

If a file is sparse, and has a large hole, it would be good to skip over it with SEEK_HOLE and SEEK_DATA. --liw
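
A minimal sketch (not Obnam code) of skipping holes with SEEK_DATA and SEEK_HOLE; both constants are available in Python's os module on Linux:

    import os

    def data_extents(fd):
        # Yield (offset, length) for the data-containing parts of an open file.
        offset = 0
        end = os.lseek(fd, 0, os.SEEK_END)
        while offset < end:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError:
                break   # nothing but a hole up to the end of the file
            hole = os.lseek(fd, start, os.SEEK_HOLE)
            yield (start, hole - start)
            offset = hole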

Posted Sat Dec 10 21:54:36 2011 Tags:

Joey asked:

have you done anything in obnam to deal with it needing to keep the symmetric key, decrypted, in RAM?

Yeah, it's tough. It could probably be avoided by having gpg decrypt the passphrase and pipe it to the encrypting gpg ... but then gpg would constantly be using the public key.

It might be possible to have a C extension that holds the symmetric key, locks it into RAM, and feeds it to gpg whenever necessary, via a file descriptor.

--liw

Posted Sat Dec 10 21:54:36 2011 Tags:

From Joey Hess:

My take on this is that, by choosing to use a tool that uses hashes, I am giving up (near-)absolute certainty for speed, or space, or whatever. So it's important that the hash type be good at collision resistance (for example, no two likely filenames should hash the same; "/etc/passwd" should only tend to collide with blobs that are very unlike a filename). It's also important that the tool be upfront about using hashes, and about what hash it uses. And if it's not designed to allow swapping the hash out when it gets broken, I will trust it less (hello git).

Ah, the replacement of hash functions is an interesting problem.

For pathnames, it's not at all important, I think, except perhaps for performance, since pathnames will be compared byte-by-byte instead of by hashes.

For file data, replacing is easy, if one is willing to back up everything from scratch. Supporting several hashes in the same backup store is a little bit more work, but not a whole lot: instead of having just one tree for mapping checksums to chunk identifiers, one would have one per checksum algorithm.

--liw

Posted Sat Dec 10 21:54:36 2011 Tags: