TDBM(3-LOCAL)                                       TDBM(3-LOCAL)


NAME
       tdbm  - dbm database functions with nested atomic transac-
       tions

SYNOPSIS
       #include <tdbm.h>

       DbmRc DbmAbort(tid)
       Tid *tid;

       DbmRc DbmBegin(parent, child)
       Tid *parent;
       Tid **child;

       DbmRc DbmClose(dbm)
       Dbm *dbm;

       DbmRc DbmCommit(tid)
       Tid *tid;

       DbmRc DbmDelete(dbm, tid, key)
       Dbm *dbm;
       Tid *tid;
       Datum key;

       char *DbmErrorString(rc)
       DbmRc rc;

       DbmRc DbmFetch(dbm, tid, key, value)
       Dbm *dbm;
       Tid *tid;
       Datum key;
       Datum *value;

       DbmRc DbmFind(dbm, tid, key, value)
       Dbm *dbm;
       Tid *tid;
       Datum key;
       Datum *value;

       DbmRc DbmFirstkey(dbm, tid, key)
       Dbm *dbm;
       Tid *tid;
       Datum *key;

       DbmRc DbmNextkey(dbm, tid, key)
       Dbm *dbm;
       Tid *tid;
       Datum *key;

       DbmRc DbmOpen(pathname, type, config, dbm, recovery)
       char *pathname;
       DbmFileType type;
       DbmConfig *config;

       Dbm **dbm;
       DbmRecovery **recovery;

       DbmRc DbmPrecommit(tid, tidname)
       Tid *tid;
       DbmTidName *tidname;

       DbmErrorClass DbmRcClass(rc)
       DbmRc rc;

       DbmRc DbmRecover(dbm, recovery, tid)
       Dbm *dbm;
       DbmRecovery **recovery;
       Tid **tid;

       DbmRc DbmStore(dbm, tid, key, value, mode)
       Dbm *dbm;
       Tid *tid;
       Datum key;
       Datum value;
       DbmStoreMode mode;

       DbmRc DbmTidName(tid, tidname)
       Tid *tid;
       TidName *tidname;


DESCRIPTION
       Tdbm is a collection of functions that implement a  simple
       database  made  up of key/content pairs.  While similar to
       the UNIX dbm(3) and ndbm(3) packages, there are  a  number
       of significant differences:

              +  Nested  atomic transactions are supported across
                 a single database.

              +  Volatile   (temporary   and   memory   resident)
                 databases can be used.

              +  A database is implemented as a single file.

              +  Very large objects can be stored.

              +  In   a  multi-threaded  environment,  concurrent
                 transactions are possible.

              +  In many cases, performance should be improved.

       The usual Datum data structure has  an  additional  compo-
       nent, a datum descriptor:
         typedef u_char EntryDesc;

         typedef struct {
           char        *dptr;
           int         dsize;
           EntryDesc   desc;
         } Datum;

       The  descriptor  can  be  used  to  specify  the alignment
       requirements of a key or value.  For the value, the system
       preserves  the requested alignment when it is read so that
       it can be accessed directly from the system buffer,  obvi-
       ating  the  need  to  copy the value into properly aligned
       memory.  For the key, the alignment is not  preserved  but
       the specified alignment can be determined later.

       The  low-order  two  bits of the descriptor specify align-
       ment.  The following constants are defined:

              ALIGN0 - No alignment required

              ALIGN2 - Align on any even address

              ALIGN4 - Align on any address divisible by 4

              ALIGN8 - Align on any address divisible by 8

       Before a database can be used, it must be opened by  call-
       ing  DbmOpen().   The  pathname  argument  identifies  the
       database to be used; it is created if necessary.  If  type
       is DBM_PERSISTENT, then the database will be a normal Unix
       file.  The file will be exclusively locked using flock(2).
       If  it is DBM_VOLATILE, then the database is temporary and
       will disappear when it is no longer  referenced  (or  when
       the  program terminates).  A database can be opened multi-
       ple times except  for  one  special  case.   For  volatile
       databases,  if pathname is NULL or the null string, then a
       unique  internal  name  is  effectively  assigned  to  the
       database.

       If  not  NULL,  config  allows the user to override system
       defaults for database parameters: If recovery is not NULL,
       a  list  of  precommitted  transactions  is  returned (see
       DbmPrecommit() and DbmRecover()).

         typedef struct DbmConfig {
           int         mode;         /* Mode when creating new dbm */
           int         pagesize;     /* Size of each page (bucket) in the dbm */
           int         allocunits;   /* Allocation units, w.r.t. pagesize */
           int            lockoptional; /* Don't worry if dbm can't be locked */
         } DbmConfig;

         typedef struct DbmTidName {
           int         hostid;
           int         start_time;
           int         count;
         } DbmTidName;

         typedef struct DbmRecovery {
           char        *pathname;
           DbmTidName  tidname;
           struct DbmRecovery *next;
         } DbmRecovery;

       A default is overridden only if a configuration option  is
       non-zero.  The lockoptional flag, when set, indicates that
       the open may proceed even if the  database  could  not  be
       locked  (probably because it is already locked).  This was
       added because versions of SunOS as recent as  4.1.2  some-
       times won't unlock a database on program termination.  You
       don't want to do this unless only a  single  process  will
       use the database or all processes opening the database are
       read-only.   An  identifier  associated  with  the  opened
       database is returned.

       Most  operations on a database occur within the context of
       a transaction.  A  transaction  is  initiated  by  calling
       DbmBegin().   If the parent argument is NULL, then this is
       a top-level transaction, otherwise the new transaction  is
       a  subtransaction of parent.  A new transaction identifier
       tid is returned.  In the current implementation, all oper-
       ations  within  a top-level transaction must be associated
       with a single database; a transaction is implicitly  asso-
       ciated with a database based on the first I/O operation it
       performs.

       N.B.  The value (or key) returned by a  function  must  be
       copied  if  it  is  going to be modified.  Also, a datum's
       dptr may no longer be valid if the transaction with  which
       it is associated aborts.

       New  entries  are stored in the database using DbmStore().
       The given value is put into the database using  key.   The
       mode  argument  is  either DBM_REPLACE or DBM_INSERT.  The
       former deletes an existing instance of an entry  with  the
       same  key  before  storing  the new entry while the latter
       insists that there be no  existing  entry.   The  key  and
       value are copied, so they may be freed after this call.

       DbmFetch()  is  used  to  retrieve  an  entry.   DbmFind()
       locates an entry without retrieving the value.  Both func-
       tions set the descriptor for the key and the value.

       An   entry   is  deleted  from  the  database  by  calling
       DbmDelete().

       A database can be traversed in (apparently) random  order.
       DbmFirstkey()  retrieves the key of the ``first'' entry in
       the database.  DbmNextkey() may be called to retrieve  the
       keys  of  successive  entries.   Both  functions  set  the
       descriptor for the key.  Note that these functions  should
       be  used  carefully  since  they  could end up locking the
       entire database while their transaction exists.

       A transaction  is  committed  by  calling  DbmCommit()  or
       aborted  by  calling  DbmAbort().   Aborting a transaction
       rolls back all modifications made to the database  by  the
       transaction.   In  either  case,  the transaction must not
       have any subtransactions.

       A top-level transaction may be  prepared  for  commitment,
       but  not  actually  committed,  by calling DbmPrecommit().
       Upon successful completion, the system guarantees  that  a
       subsequent  commit  will  succeed (modulo media failures).
       After this call, the only operations allowed on the trans-
       action  are DbmCommit() and DbmAbort().  If tidname is not
       NULL, then it is initialized to the (globally unique) name
       of  the transaction.  If DbmOpen() returns recovery infor-
       mation, then one or more precommitted transactions  exist.
       DbmRecover() should be called (until *recovery is NULL) to
       return a transaction identifier and the  transaction  name
       for  each  precommitted  transaction.  These functions are
       expected to be used by a two phase commit  protocol.   The
       transaction  name  (TidName)  for a particular transaction
       can be obtained using DbmTidName().

       A database is closed using DbmClose().  Closing a database
       aborts  any  active  transactions using it.  The given dbm
       identifier is invalidated, yet if the database was  opened
       multiple  times,  operations  on the database may continue
       using the other identifiers.

       DbmErrorString() returns an error message corresponding to
       the given result code.  DbmRcClass() returns an indication
       of the type of error that has occurred (e.g.,  DBM_FATAL).

WARNING
       It is unwise for a program to update a database that it is
       accessing via NFS; databases  should  always  be  accessed
       directly  on  the  file's server.  NFS file locking is not
       performed.  Also, byte ordering is not  addressed  by  the
       current implementation.

FILES
       As part of the atomic commit operation, a transaction file
       is created.  Its name is  that  of  the  database  with  a
       ``.trans''  appended.   The  mode of this file has special
       meaning to tdbm and therefore should not be changed by the
       user.

SEE ALSO
       dbm(3), ndbm(3).

       The  GNU gdbm library, Philip A. Nelson ,
       Computer Science Department,  Western  Washington  Univer-
       sity.

       The     Berkeley     Hash     package,    Margo    Seltzer
       .  See: ``A New Hash  Package
       for  UNIX'' by Margo Seltzer and Ozan Yigit in Winter 1991
       Usenix Proceedings.

       An interactive program, tdbm(1), is available  to  manipu-
       late and inspect tdbm files.

DIAGNOSTICS
       When  an  unusual  error occurs, a message is likely to be
       printed on stderr.

BUGS
       Because creation of a new database occurs outside  of  the
       transaction mechanism, rollback is not possible.

       Multithreaded  operation  is  currently supported only for
       the U.B.C. Threads kernel.

       There are currently configuration-dependent limits on  the
       length of a key (it must fit into a database ``page'') and
       the length of a value (depends on the  size  of  the  page
       allocation bit map and other factors).

       Although  very large, there are limits on the maximum size
       of a database.  A database cannot span filesystems (unless
       the O/S provides it transparently).

       A  mechanism  to  support efficient tree locking should be
       added.

       On systems without the realpath() library routine  (proba-
       bly  any  non-Sun  system), be sure to use identical path-
       names when referring to the same database file.

       The space used by deleted entries is not reclaimed by  the
       operating  system  but may be reused by the database.  The
       database file may contain holes,  making  it  appear  much
       bigger that it really is.  There is no automatic ``shrink-
       ing'' of the database.  If desired, this must be  done  by
       traversing  the  old database, copying each entry to a new
       database.

       The package cannot recover from media failures.

       The location of the transaction file is not  user  config-
       urable.   It  goes  in  the same directory as the database
       file.

       The obvious method of deleting  (or  adding)  items  while
       iterating   through   the  database  using  DbmFirstKey(),
       DbmNextKey(), is buggy.  No updates  should  be  performed
       until you're done iterating.

       The wish list of additional functionality is too long (but
       suggestions and bug reports are welcome).   The  following
       user-visible improvements currently top the list:

       +  Deadlock detection (eg., via timestamps).

       +  Performance instrumentation ``DbmStats()''.

       +  Make DbmRc a structure that includes errno.

       +  Arbitrary size keys.

       +  Multi-volume  dbms  (e.g., make it possible to allocate
          pages on a different filesystem once a limit is reached
          (soft or hard)).

       +  Multi-dbm transactions.

AUTHOR
       Barry Brachman 
       Dept. of Computer Science
       University of British Columbia


                         18 February 1994                       7