

16.4 Crash recovery

If a crash occurs, the snapshot file (even or odd) that reflects the database state as of the most recent successful gdbm_sync call is the one whose permission bits are read-only and whose last-modification timestamp is greatest. If both snapshot files are readable, we choose the one with the most recent last-modification timestamp. Modern operating systems record timestamps in nanoseconds, which gives sufficient confidence that the timestamps of the two snapshots will differ. However, one can’t rule out the possibility that the two snapshot files will be both readable and have identical timestamps (see footnote 2). To cope with this, GDBM version 1.21 introduced the new extended database format, which stores in the database file header the number of synchronizations performed so far. This number can reliably be used to select the most recent snapshot, independently of its timestamp. We strongly suggest using this new format when writing crash-tolerant applications. See Numsync, for a detailed discussion.
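The timestamp-based rule described above can be sketched with plain POSIX calls. This is only an illustration of the rule, not GDBM's actual implementation: the helper name pick_snapshot_by_mtime is hypothetical, and the sketch ignores the numsync counter entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* True if the file is a regular file that its owner can read but not
   write (mode 0400 in the owner bits). */
static int is_readonly_snapshot(const struct stat *st)
{
  return S_ISREG(st->st_mode)
         && (st->st_mode & (S_IRUSR | S_IWUSR)) == S_IRUSR;
}

/* Three-way comparison of two timestamps: <0, 0 or >0. */
static int timespec_cmp(const struct timespec *a, const struct timespec *b)
{
  if (a->tv_sec != b->tv_sec)
    return a->tv_sec < b->tv_sec ? -1 : 1;
  if (a->tv_nsec != b->tv_nsec)
    return a->tv_nsec < b->tv_nsec ? -1 : 1;
  return 0;
}

/* Hypothetical helper, not part of GDBM.  Returns the name of the
   snapshot to recover from, or NULL if no decision can be made
   (stat failed, neither file is read-only, or the timestamps tie). */
const char *pick_snapshot_by_mtime(const char *even, const char *odd)
{
  struct stat se, so;

  if (stat(even, &se) != 0 || stat(odd, &so) != 0)
    return NULL;

  int even_ok = is_readonly_snapshot(&se);
  int odd_ok = is_readonly_snapshot(&so);

  if (even_ok && !odd_ok)
    return even;
  if (odd_ok && !even_ok)
    return odd;
  if (!even_ok && !odd_ok)
    return NULL;

  /* Both are read-only: the greater modification time wins. */
  int c = timespec_cmp(&se.st_mtim, &so.st_mtim);
  if (c == 0)
    return NULL;
  return c > 0 ? even : odd;
}
```

The NULL result for a tie corresponds to the ambiguous case that the numsync counter of the extended format is meant to resolve.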

The gdbm_latest_snapshot function selects the right snapshot of the two. Invoke it as:

const char *recovery_file = NULL;
result = gdbm_latest_snapshot (even, odd, &recovery_file);

where even and odd are names of the snapshot files. On success, it stores the pointer to the most recent snapshot file name in recovery_file and returns GDBM_SNAPSHOT_OK. To finalize the recovery, rename this file to the name of your database file and re-open it using gdbm_open. The remaining snapshot can be used as a backup copy.

If an error occurs, gdbm_latest_snapshot returns one of the following error codes.

gdbm_latest_snapshot: GDBM_SNAPSHOT_BAD

Neither snapshot file is readable. This means that the crash has occurred before gdbm_failure_atomic completed. In this case, it is best to fall back on a safe backup copy of the data file.

gdbm_latest_snapshot: GDBM_SNAPSHOT_ERR

A system error occurred in gdbm_latest_snapshot. Examine the system errno variable for details. Its possible values are:

EACCES

The file mode of one of the snapshot files was incorrect. Each snapshot file can be either readable (0400) or writable (0200), but not both. This probably means that someone touched one or both snapshot files after the crash and before your attempt to recover from it. This case needs additional investigation. If you’re sure that the only change someone made to the files is altering their modes, and your database is in numsync format (see Numsync), you can reset the modes to 0400 and retry the recovery.

This error can also be returned by the underlying stat call, meaning that search permission was denied for one of the directories in the path prefix of a snapshot file name. That, again, means that someone changed the permissions after the crash.

EINVAL

Some arguments passed to gdbm_latest_snapshot were not valid. It is a programmer’s error which means that your application needs to be fixed.

ENOSYS

Function is not implemented. This means GDBM was built without crash-tolerance support.

Other values (EBADF, EFAULT, etc.)

An error occurred when trying to stat the snapshot file. See the ERRORS section of the stat(2) man page for a discussion of possible errno values.

gdbm_latest_snapshot: GDBM_SNAPSHOT_SAME

File modes and modification dates of both snapshot files are exactly the same.

gdbm_latest_snapshot: GDBM_SNAPSHOT_SUSPICIOUS

For a database in extended numsync format (see Numsync): the numsync values of the two snapshots differ by more than one. Check the arguments to the gdbm_latest_snapshot function. The most probable reason for such an error is that the even and odd parameters point to snapshot files belonging to different database files.

If you get any of these errors, we strongly suggest undertaking manual recovery.


Footnotes

(2)

This can happen, for example, if the storage is very fast and the system clock is low-resolution, or if the system administrator sets the system clock backwards. In the latter case one can end up with the most recent snapshot file having modification time earlier than that of the obsolete snapshot.

