TDBM(3-LOCAL) TDBM(3-LOCAL) NAME tdbm - dbm database functions with nested atomic transac- tions SYNOPSIS #include <tdbm.h> DbmRc DbmAbort(tid) Tid *tid; DbmRc DbmBegin(parent, child) Tid *parent; Tid **child; DbmRc DbmClose(dbm) Dbm *dbm; DbmRc DbmCommit(tid) Tid *tid; DbmRc DbmDelete(dbm, tid, key) Dbm *dbm; Tid *tid; Datum key; char *DbmErrorString(rc) DbmRc rc; DbmRc DbmFetch(dbm, tid, key, value) Dbm *dbm; Tid *tid; Datum key; Datum *value; DbmRc DbmFind(dbm, tid, key, value) Dbm *dbm; Tid *tid; Datum key; Datum *value; DbmRc DbmFirstkey(dbm, tid, key) Dbm *dbm; Tid *tid; Datum *key; DbmRc DbmNextkey(dbm, tid, key) Dbm *dbm; Tid *tid; Datum *key; DbmRc DbmOpen(pathname, type, config, dbm, recovery) char *pathname; DbmFileType type; DbmConfig *config; Dbm **dbm; DbmRecovery **recovery; DbmRc DbmPrecommit(tid, tidname) Tid *tid; DbmTidName *tidname; DbmErrorClass DbmRcClass(rc) DbmRc rc; DbmRc DbmRecover(dbm, recovery, tid) Dbm *dbm; DbmRecovery **recovery; Tid **tid; DbmRc DbmStore(dbm, tid, key, value, mode) Dbm *dbm; Tid *tid; Datum key; Datum value; DbmStoreMode mode; DbmRc DbmTidName(tid, tidname) Tid *tid; TidName *tidname; DESCRIPTION Tdbm is a collection of functions that implement a simple database made up of key/content pairs. While similar to the UNIX dbm(3) and ndbm(3) packages, there are a number of significant differences: + Nested atomic transactions are supported across a single database. + Volatile (temporary and memory resident) databases can be used. + A database is implemented as a single file. + Very large objects can be stored. + In a multi-threaded environment, concurrent transactions are possible. + In many cases, performance should be improved. The usual Datum data structure has an additional compo- nent, a datum descriptor: typedef u_char EntryDesc; typedef struct { char *dptr; int dsize; EntryDesc desc; } Datum; The descriptor can be used to specify the alignment requirements of a key or value. For the value, the system preserves the requested alignment when it is read so that it can be accessed directly from the system buffer, obvi- ating the need to copy the value into properly aligned memory. For the key, the alignment is not preserved but the specified alignment can be determined later. The low-order two bits of the descriptor specify align- ment. The following constants are defined: ALIGN0 - No alignment required ALIGN2 - Align on any even address ALIGN4 - Align on any address divisible by 4 ALIGN8 - Align on any address divisible by 8 Before a database can be used, it must be opened by call- ing DbmOpen(). The pathname argument identifies the database to be used; it is created if necessary. If type is DBM_PERSISTENT, then the database will be a normal Unix file. The file will be exclusively locked using flock(2). If it is DBM_VOLATILE, then the database is temporary and will disappear when it is no longer referenced (or when the program terminates). A database can be opened multi- ple times except for one special case. For volatile databases, if pathname is NULL or the null string, then a unique internal name is effectively assigned to the database. If not NULL, config allows the user to override system defaults for database parameters: If recovery is not NULL, a list of precommitted transactions is returned (see DbmPrecommit() and DbmRecover()). typedef struct DbmConfig { int mode; /* Mode when creating new dbm */ int pagesize; /* Size of each page (bucket) in the dbm */ int allocunits; /* Allocation units, w.r.t. pagesize */ int lockoptional; /* Don't worry if dbm can't be locked */ } DbmConfig; typedef struct DbmTidName { int hostid; int start_time; int count; } DbmTidName; typedef struct DbmRecovery { char *pathname; DbmTidName tidname; struct DbmRecovery *next; } DbmRecovery; A default is overridden only if a configuration option is non-zero. The lockoptional flag, when set, indicates that the open may proceed even if the database could not be locked (probably because it is already locked). This was added because versions of SunOS as recent as 4.1.2 some- times won't unlock a database on program termination. You don't want to do this unless only a single process will use the database or all processes opening the database are read-only. An identifier associated with the opened database is returned. Most operations on a database occur within the context of a transaction. A transaction is initiated by calling DbmBegin(). If the parent argument is NULL, then this is a top-level transaction, otherwise the new transaction is a subtransaction of parent. A new transaction identifier tid is returned. In the current implementation, all oper- ations within a top-level transaction must be associated with a single database; a transaction is implicitly asso- ciated with a database based on the first I/O operation it performs. N.B. The value (or key) returned by a function must be copied if it is going to be modified. Also, a datum's dptr may no longer be valid if the transaction with which it is associated aborts. New entries are stored in the database using DbmStore(). The given value is put into the database using key. The mode argument is either DBM_REPLACE or DBM_INSERT. The former deletes an existing instance of an entry with the same key before storing the new entry while the latter insists that there be no existing entry. The key and value are copied, so they may be freed after this call. DbmFetch() is used to retrieve an entry. DbmFind() locates an entry without retrieving the value. Both func- tions set the descriptor for the key and the value. An entry is deleted from the database by calling DbmDelete(). A database can be traversed in (apparently) random order. DbmFirstkey() retrieves the key of the ``first'' entry in the database. DbmNextkey() may be called to retrieve the keys of successive entries. Both functions set the descriptor for the key. Note that these functions should be used carefully since they could end up locking the entire database while their transaction exists. A transaction is committed by calling DbmCommit() or aborted by calling DbmAbort(). Aborting a transaction rolls back all modifications made to the database by the transaction. In either case, the transaction must not have any subtransactions. A top-level transaction may be prepared for commitment, but not actually committed, by calling DbmPrecommit(). Upon successful completion, the system guarantees that a subsequent commit will succeed (modulo media failures). After this call, the only operations allowed on the trans- action are DbmCommit() and DbmAbort(). If tidname is not NULL, then it is initialized to the (globally unique) name of the transaction. If DbmOpen() returns recovery infor- mation, then one or more precommitted transactions exist. DbmRecover() should be called (until *recovery is NULL) to return a transaction identifier and the transaction name for each precommitted transaction. These functions are expected to be used by a two phase commit protocol. The transaction name (TidName) for a particular transaction can be obtained using DbmTidName(). A database is closed using DbmClose(). Closing a database aborts any active transactions using it. The given dbm identifier is invalidated, yet if the database was opened multiple times, operations on the database may continue using the other identifiers. DbmErrorString() returns an error message corresponding to the given result code. DbmRcClass() returns an indication of the type of error that has occurred (e.g., DBM_FATAL). WARNING It is unwise for a program to update a database that it is accessing via NFS; databases should always be accessed directly on the file's server. NFS file locking is not performed. Also, byte ordering is not addressed by the current implementation. FILES As part of the atomic commit operation, a transaction file is created. Its name is that of the database with a ``.trans'' appended. The mode of this file has special meaning to tdbm and therefore should not be changed by the user. SEE ALSO dbm(3), ndbm(3). The GNU gdbm library, Philip A. Nelson, Computer Science Department, Western Washington Univer- sity. The Berkeley Hash package, Margo Seltzer . See: ``A New Hash Package for UNIX'' by Margo Seltzer and Ozan Yigit in Winter 1991 Usenix Proceedings. An interactive program, tdbm(1), is available to manipu- late and inspect tdbm files. DIAGNOSTICS When an unusual error occurs, a message is likely to be printed on stderr. BUGS Because creation of a new database occurs outside of the transaction mechanism, rollback is not possible. Multithreaded operation is currently supported only for the U.B.C. Threads kernel. There are currently configuration-dependent limits on the length of a key (it must fit into a database ``page'') and the length of a value (depends on the size of the page allocation bit map and other factors). Although very large, there are limits on the maximum size of a database. A database cannot span filesystems (unless the O/S provides it transparently). A mechanism to support efficient tree locking should be added. On systems without the realpath() library routine (proba- bly any non-Sun system), be sure to use identical path- names when referring to the same database file. The space used by deleted entries is not reclaimed by the operating system but may be reused by the database. The database file may contain holes, making it appear much bigger that it really is. There is no automatic ``shrink- ing'' of the database. If desired, this must be done by traversing the old database, copying each entry to a new database. The package cannot recover from media failures. The location of the transaction file is not user config- urable. It goes in the same directory as the database file. The obvious method of deleting (or adding) items while iterating through the database using DbmFirstKey(), DbmNextKey(), is buggy. No updates should be performed until you're done iterating. The wish list of additional functionality is too long (but suggestions and bug reports are welcome). The following user-visible improvements currently top the list: + Deadlock detection (eg., via timestamps). + Performance instrumentation ``DbmStats()''. + Make DbmRc a structure that includes errno. + Arbitrary size keys. + Multi-volume dbms (e.g., make it possible to allocate pages on a different filesystem once a limit is reached (soft or hard)). + Multi-dbm transactions. AUTHOR Barry Brachman Dept. of Computer Science University of British Columbia 18 February 1994 7