Gadfly Recovery

Version:

1.1.1.1

In the event of a software glitch or crash, Gadfly may terminate without having stored committed updates. A recovery strategy attempts to make sure that the unapplied committed updates are applied when the database restarts. It is always assumed that only one primary (server) process controls the database (possibly with multiple clients).

Gadfly uses a simple LOG with deferred updates recovery mechanism. Recovery should be possible in the presence of non-disk failures (server crash, system crash). Recovery after a disk crash is not available for Gadfly as yet, sorry.

Due to portability problems, Gadfly does not prevent multiple processes from "controlling" the database at once. For read-only access, multiple instances are not a problem, but for access with modification, the processes may collide and corrupt the database. For a read-write database, make sure only one (server) process controls the database at any given time.

The only concurrency control mechanism that provides serializability for Gadfly as yet is the trivial one -- the server serves all clients serially. This will likely change for some variant of the system at some point.

This section explains the basic recovery mechanism.

Normal operation

Precommit

During normal operations any active tables are in memory in the process. Uncommitted updates for a transaction are kept in "shadow tables" until the transaction commits using:

connection.commit()

The shadow tables remember the mutations that have been applied to them. The permanent table copies are only modified after commit time. A commit commits all updates for all cursors for the connection. Unless the autocheckpoint feature is disabled (see below), a commit normally triggers a checkpoint too. A rollback:

connection.rollback()

explicitly discards all uncommitted updates and restores the connection to the previously committed state.
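The commit and rollback behaviour described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Gadfly's actual implementation: the class `ShadowedTable` and its methods are hypothetical names invented for illustration.

```python
class ShadowedTable:
    """Toy model (hypothetical): a committed value plus a shadow copy."""

    def __init__(self, rows=()):
        self.committed = list(rows)  # the "permanent" in-memory value
        self.shadow = list(rows)     # working copy seen by the transaction
        self.mutations = []          # operations applied since the last commit

    def insert(self, row):
        self.shadow.append(row)
        self.mutations.append(("insert", row))

    def delete(self, row):
        self.shadow.remove(row)
        self.mutations.append(("delete", row))

    def commit(self):
        # The shadow value replaces the committed value.
        self.committed = list(self.shadow)
        self.mutations.clear()

    def rollback(self):
        # Discard uncommitted work and return to the last committed state.
        self.shadow = list(self.committed)
        self.mutations.clear()


table = ShadowedTable([("alice", 1)])
table.insert(("bob", 2))
table.rollback()                     # the insert of "bob" is discarded
after_rollback = list(table.shadow)
table.insert(("carol", 3))
table.commit()                       # the insert of "carol" becomes permanent
after_commit = list(table.committed)
```

The recorded `mutations` list stands in for the remembered operations that a real system would later write to its log.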

There is a third level of shadowing for statement sequences executed by a cursor. In particular the design attempts to make sure that if:

cursor.execute(statement)

fails with an error, then the shadow database will contain no updates from the partially executed statement (which may be a sequence of statements) but will reflect other completed updates that may not have been committed.
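One way to get this all-or-nothing behaviour for a statement sequence is to run it against a scratch copy and adopt the result only if every step succeeds. The following is a minimal sketch of that idea, assuming a shadow database modelled as a plain dict. The helper `execute_sequence` is hypothetical, and Gadfly's own mechanism may differ.

```python
import copy


def execute_sequence(shadow_db, operations):
    """Apply every operation to shadow_db, or none of them (hypothetical helper)."""
    scratch = copy.deepcopy(shadow_db)  # third-level working copy
    for op in operations:
        op(scratch)                     # may raise; shadow_db is still untouched
    shadow_db.clear()
    shadow_db.update(scratch)           # adopt the result only after full success


# The shadow database already holds an earlier, still uncommitted update.
shadow_db = {"names": ["alice"]}


def add_bob(db):
    db["names"].append("bob")


def bad_statement(db):
    raise ValueError("simulated statement error")


try:
    execute_sequence(shadow_db, [add_bob, bad_statement])
except ValueError:
    pass
after_failure = copy.deepcopy(shadow_db)  # "bob" must not appear here

execute_sequence(shadow_db, [add_bob])
after_success = copy.deepcopy(shadow_db)
```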

Commit

At commit, the operations applied to the shadow tables are written to a log file, in order of application, before being permanently applied to the active database. A commit record is then written to the log and the log is flushed. At this point the transaction is considered committed and recoverable, and a new transaction begins. Finally, the values of the shadow tables replace the values of the permanent tables in the active database (but, if autocheckpoint is disabled, not in the database disk files until the next checkpoint).
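The ordering at commit (log the operations, log a commit record, flush, and only then apply) can be sketched as follows. The JSON record layout, the function name `commit`, and the use of `os.fsync` are assumptions made for illustration. The text above only says that the log is flushed and does not describe Gadfly's log format.

```python
import json
import os
import tempfile


def commit(log_path, transaction_id, operations, permanent):
    """Log the operations and a commit record, flush, then apply (sketch)."""
    with open(log_path, "a", encoding="utf-8") as log:
        for key, value in operations:
            record = {"tid": transaction_id, "op": "set", "key": key, "value": value}
            log.write(json.dumps(record) + "\n")
        log.write(json.dumps({"tid": transaction_id, "op": "commit"}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # assumption: also force the data to disk
    # From here on the transaction can be recovered from the log alone,
    # so it is safe to update the in-memory permanent tables.
    for key, value in operations:
        permanent[key] = value


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "demo.log")
permanent = {}
commit(log_path, 1, [("x", 10), ("y", 20)], permanent)

with open(log_path, encoding="utf-8") as log:
    records = [json.loads(line) for line in log]
```

Because the commit record is the last thing written for a transaction, a crash before it leaves a log whose operations can be recognised as uncommitted and ignored.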

Checkpoint

A checkpoint operation brings the persistent copies of the tables on disk in sync with the in-memory copies in the active database. Checkpoints occur at server shutdown or periodically during server operation. The checkpoint operation runs in isolation (with no database access allowed during checkpoint).

Note: database connections normally run a checkpoint after every commit, unless you set:

connection.autocheckpoint = 0

which asks that checkpoints be done explicitly by the program using:

connection.commit() # if appropriate
connection.checkpoint()

Explicit checkpoints should make the database perform better, since the disk files are written less frequently. However, to prevent unneeded (and possibly time-consuming) recovery operations after a database is shut down and restarted, it is important to always execute an explicit checkpoint at server shutdown, and periodically during long server runs.

Note that if any outstanding operations are uncommitted at the time of a checkpoint (when autocheckpoint is disabled), those updates will be lost (i.e., the checkpoint is equivalent to a rollback).
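The interaction between commits, the autocheckpoint flag, and explicit checkpoints can be illustrated with a toy model. The class `ToyConnection` below is hypothetical and only mimics the behaviour described above. It is not the Gadfly connection class, and its `disk` dict merely stands in for the table files.

```python
class ToyConnection:
    """Hypothetical stand-in for a connection with an autocheckpoint flag."""

    def __init__(self):
        self.autocheckpoint = 1
        self.pending = {}      # uncommitted updates
        self.memory = {}       # committed in-memory values
        self.disk = {}         # stand-in for the persistent table files
        self.disk_writes = 0   # how many times the "files" were rewritten

    def set(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.memory.update(self.pending)
        self.pending.clear()
        if self.autocheckpoint:
            self.checkpoint()

    def checkpoint(self):
        self.pending.clear()   # uncommitted work is lost, as in a rollback
        self.disk = dict(self.memory)
        self.disk_writes += 1


conn = ToyConnection()
conn.autocheckpoint = 0
for i in range(3):
    conn.set("k%d" % i, i)
    conn.commit()              # no disk write while autocheckpoint is off
writes_before_checkpoint = conn.disk_writes

conn.set("lost", 99)           # never committed
conn.checkpoint()              # writes committed values, drops "lost"
```

With autocheckpoint left enabled, the same three commits would have rewritten the disk copy three times instead of once.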

At checkpoint, the old persistent value of each table that has been updated since the last checkpoint is copied to a backup file, and the currently active value is written to the permanent table file. Then, if the data definitions have changed, the old definitions are stored in a backup file and the new definitions are written to the permanent data definition file. To signal a successful checkpoint, the log file is then deleted.

At this point (after log deletion) the database is considered quiescent (no recovery required). Finally, all backup table files are deleted. [Note, it might be good to keep old logs around... Comments?]
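The checkpoint sequence for table files (back up the old value, write the new value, delete the log, delete the backups) might be sketched like this. The file names and extensions (`.tbl`, `.bak`, `demo.log`) and the function `checkpoint` are assumptions for illustration only, and the data definition step is omitted for brevity.

```python
import os
import shutil
import tempfile


def checkpoint(directory, tables, dirty, log_name="demo.log"):
    """Write dirty tables to disk, keeping backups until the log is gone (sketch)."""
    backups = []
    for name in dirty:
        path = os.path.join(directory, name + ".tbl")
        if os.path.exists(path):
            backup = path + ".bak"
            shutil.copyfile(path, backup)  # preserve the old persistent value
            backups.append(backup)
        with open(path, "w", encoding="utf-8") as f:
            f.write(repr(tables[name]))    # write the currently active value
    log_path = os.path.join(directory, log_name)
    if os.path.exists(log_path):
        os.remove(log_path)                # signals a successful checkpoint
    for backup in backups:
        os.remove(backup)                  # backups are no longer needed


directory = tempfile.mkdtemp()
with open(os.path.join(directory, "t.tbl"), "w", encoding="utf-8") as f:
    f.write(repr([1]))
with open(os.path.join(directory, "demo.log"), "w", encoding="utf-8") as f:
    f.write("pending log records\n")

checkpoint(directory, {"t": [1, 2]}, ["t"])
remaining = sorted(os.listdir(directory))
with open(os.path.join(directory, "t.tbl"), encoding="utf-8") as f:
    content = f.read()
```

Deleting the log before the backups means that a crash at any point leaves either a log plus backups to recover from, or a fully written set of table files.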

Each table file representation is annotated with a checksum, so the recovery system can check that the file was stored correctly.
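A checksum annotation of this kind can be as simple as prefixing the stored bytes with a CRC. The sketch below uses `zlib.crc32` and a 4-byte prefix, which are assumptions chosen for illustration. The text above does not say which checksum or file layout Gadfly uses, and `pack` and `unpack` are hypothetical helper names.

```python
import zlib


def pack(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 as 4 big-endian bytes."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unpack(stored: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    if len(stored) < 4:
        return None
    expected = int.from_bytes(stored[:4], "big")
    payload = stored[4:]
    return payload if zlib.crc32(payload) == expected else None


stored = pack(b"table rows")
intact = unpack(stored)

# Simulate a damaged file by flipping the bits of the last byte.
corrupted = stored[:-1] + bytes([stored[-1] ^ 0xFF])
damaged = unpack(corrupted)
```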

Recovery

When a database restarts it automatically determines whether the last active instance shut down normally and whether recovery is required. Gadfly discovers the need for recovery by detecting a non-empty current log file.

To recover the system, Gadfly first scans the log file to determine which transactions committed. It then rescans the log file, applying the operations of committed transactions to the in-memory table values in the order recorded. When reading in table values for recovery, Gadfly looks for a backup file for the table first. If a backup exists and is not corrupt, its value is used; otherwise the permanent table file is used.
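The two-pass log scan described above can be sketched as follows. The JSON record layout and the function `recover` are hypothetical, chosen only to keep the example self-contained. Gadfly's real log format is not described here.

```python
import json


def recover(log_lines, tables):
    """Replay only the operations of committed transactions (sketch)."""
    records = [json.loads(line) for line in log_lines if line.strip()]
    # First pass: find the transactions that have a commit record.
    committed = {r["tid"] for r in records if r["op"] == "commit"}
    # Second pass: apply their operations in the order recorded.
    for r in records:
        if r["op"] == "set" and r["tid"] in committed:
            tables[r["key"]] = r["value"]
    return tables


log_lines = [
    '{"tid": 1, "op": "set", "key": "x", "value": 1}',
    '{"tid": 2, "op": "set", "key": "y", "value": 2}',
    '{"tid": 1, "op": "set", "key": "x", "value": 5}',
    '{"tid": 1, "op": "commit"}',
    # Transaction 2 has no commit record: the server crashed first.
]
recovered = recover(log_lines, {"x": 0})
```

Transaction 2 never reached its commit record, so its update to `y` is ignored, while both updates from transaction 1 are applied in log order.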

After recovery Gadfly runs a normal checkpoint before resuming normal operation.

Please note: Although I have attempted to provide a robust implementation for this software I do not guarantee its correctness. I hope it will work well for you but I do not assume any legal responsibility for problems anyone may have during use of these programs.