[med-svn] [r-cran-filehash] 03/05: New upstream version 2.3

Andreas Tille tille at debian.org
Tue Oct 10 09:47:22 UTC 2017


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-cran-filehash.

commit 22864ed6bafeb303dff664b845ac66c61e3867ea
Author: Andreas Tille <tille at debian.org>
Date:   Tue Oct 10 11:45:42 2017 +0200

    New upstream version 2.3
---
 DESCRIPTION                |  22 +++
 MD5                        |  44 +++++
 NAMESPACE                  |  62 +++++++
 R/coerce.R                 |  43 +++++
 R/dump.R                   |  67 +++++++
 R/filehash-DB1.R           | 442 ++++++++++++++++++++++++++++++++++++++++++++
 R/filehash-RDS.R           | 183 +++++++++++++++++++
 R/filehash.R               | 306 +++++++++++++++++++++++++++++++
 R/hash.R                   |  10 +
 R/queue.R                  |  91 ++++++++++
 R/stack.R                  |  91 ++++++++++
 R/zzz.R                    |  22 +++
 build/vignette.rds         | Bin 0 -> 209 bytes
 debian/README.test         |  10 -
 debian/changelog           |   5 -
 debian/compat              |   1 -
 debian/control             |  29 ---
 debian/copyright           |  29 ---
 debian/docs                |   3 -
 debian/rules               |  15 --
 debian/source/format       |   1 -
 debian/tests/control       |   3 -
 debian/tests/run-unit-test |  37 ----
 debian/watch               |   2 -
 inst/CITATION              |  14 ++
 inst/COPYING               |  19 ++
 inst/NEWS                  |  90 +++++++++
 inst/doc/filehash.R        | 145 +++++++++++++++
 inst/doc/filehash.Rnw      | 443 +++++++++++++++++++++++++++++++++++++++++++++
 inst/doc/filehash.pdf      | Bin 0 -> 100431 bytes
 man/createQ.Rd             |  31 ++++
 man/createS.Rd             |  31 ++++
 man/db2env.Rd              |  96 ++++++++++
 man/dbInit.Rd              |  64 +++++++
 man/dump.Rd                |  65 +++++++
 man/filehash-class.Rd      | 150 +++++++++++++++
 man/filehashFormats.Rd     |  30 +++
 man/filehashOption.Rd      |  27 +++
 man/push.Rd                |  43 +++++
 man/queue-class.Rd         |  47 +++++
 man/stack-class.Rd         |  50 +++++
 src/hash.c                 |  84 +++++++++
 src/lockfile.c             |  21 +++
 src/readKeyMap.c           |  65 +++++++
 src/sha1.c                 | 371 +++++++++++++++++++++++++++++++++++++
 src/sha1.h                 |  24 +++
 tests/SHA1SUM              |   2 +
 tests/misc/create-testdb.R |  14 ++
 tests/reg-tests.R          | 183 +++++++++++++++++++
 tests/reg-tests.Rout.save  | 304 +++++++++++++++++++++++++++++++
 tests/testdb-v1.1          | Bin 0 -> 726 bytes
 tests/testdb-v2.0          | Bin 0 -> 726 bytes
 tests/versions.R           |  22 +++
 tests/versions.Rout.save   | 114 ++++++++++++
 vignettes/combined.bib     |  50 +++++
 vignettes/filehash.Rnw     | 443 +++++++++++++++++++++++++++++++++++++++++++++
 56 files changed, 4425 insertions(+), 135 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
new file mode 100644
index 0000000..264febc
--- /dev/null
+++ b/DESCRIPTION
@@ -0,0 +1,22 @@
+Package: filehash
+Date: 2015-08-12
+Version: 2.3
+Depends: R (>= 3.0.0), methods
+Collate: filehash.R filehash-DB1.R filehash-RDS.R coerce.R dump.R
+        hash.R queue.R stack.R zzz.R
+Title: Simple Key-Value Database
+Author: Roger D. Peng <rdpeng at jhu.edu>
+Maintainer: Roger D. Peng <rdpeng at jhu.edu>
+Description: Implements a simple key-value style database where character string keys
+  are associated with data values that are stored on the disk. A simple interface is provided for inserting,
+  retrieving, and deleting data from the database. Utilities are provided that allow 'filehash' databases to be
+  treated much like environments and lists are already used in R. These utilities are provided to encourage
+  interactive and exploratory analysis on large datasets. Three different file formats for representing the
+  database are currently available and new formats can easily be incorporated by third parties for use in the
+  'filehash' framework.
+License: GPL (>= 2)
+URL: http://github.com/rdpeng/filehash
+Packaged: 2015-08-12 14:57:00 UTC; rdpeng
+NeedsCompilation: yes
+Repository: CRAN
+Date/Publication: 2015-08-16 07:30:57
diff --git a/MD5 b/MD5
new file mode 100644
index 0000000..1fd5c05
--- /dev/null
+++ b/MD5
@@ -0,0 +1,44 @@
+a50b1c1bdc0c3c65e36b52d5b85b364a *DESCRIPTION
+369879e4ab19e6934e6136e2f5a86f9f *NAMESPACE
+45232e1ac4dac258e45556bd5f43123c *R/coerce.R
+90f64221f6f44767deed77323c0d4db2 *R/dump.R
+d20b08ebd20413d3c82045b78e07d662 *R/filehash-DB1.R
+792728111ce93475ae36b03871c7fb51 *R/filehash-RDS.R
+b0769cf1a3599dced35d14ce286a651c *R/filehash.R
+23024206925dc990b4acd2f5f14d18b4 *R/hash.R
+b096fd971b2464562c885b8f6d3caeab *R/queue.R
+a75a45aef48fc227a52cc6144180eed0 *R/stack.R
+5d89ecc5246dec2fa4b5b6550d00c1bc *R/zzz.R
+aface47053e12f9628bb70a828a1ad75 *build/vignette.rds
+ed14a5c660ea0f0e723a406175038eb7 *inst/CITATION
+b128d2038f8d0c5c554b72de80298e4a *inst/COPYING
+44422cadef3c067ffd43b9f00a18aa9d *inst/NEWS
+6b5ee8f3a31a761dd38bd0238efbdfc1 *inst/doc/filehash.R
+7fb6c57e7c3a9b572359b21e60e07d3f *inst/doc/filehash.Rnw
+425c7ead37800f5dcfc3af75c2d7f43d *inst/doc/filehash.pdf
+196634a9bcac54d5a8c2f30c622caf8d *man/createQ.Rd
+76d0e8fb13ec5e82d2db2fe69c153399 *man/createS.Rd
+994907132b6735cd18cb2e58fbe38246 *man/db2env.Rd
+666fa5e79d7b3eef0600c59653613c1c *man/dbInit.Rd
+7540db5f93596a70ed9a552bda857059 *man/dump.Rd
+fd3538ab1ab32963e4450a394c1970e6 *man/filehash-class.Rd
+45f7f3a21e0d0cb2afbed678f4cf1e44 *man/filehashFormats.Rd
+fe2418282768cb27bdcef448742c263e *man/filehashOption.Rd
+19f14bdb7f0a406d23e38ea6d0a22405 *man/push.Rd
+3206913d23a67ae002e0214ee2093e94 *man/queue-class.Rd
+d6937904dedc8cd4e083e1b941d78b70 *man/stack-class.Rd
+e3ae87905dbded19881cffd6c80a8be4 *src/hash.c
+1f69eebc69b381da6e96c802a8c93003 *src/lockfile.c
+8a9e91209bf674c2db625d29298caad2 *src/readKeyMap.c
+2212ffe253fda0b122c209ae4e220c71 *src/sha1.c
+7cf279b5c8a6743e49a2c37d105733f5 *src/sha1.h
+d83a9585d249e9a093cb3c545cf49834 *tests/SHA1SUM
+f00793baa7e5a70812957058ec569332 *tests/misc/create-testdb.R
+4900ff9e4624ba08887b7ca8a702df8c *tests/reg-tests.R
+4872c25b98a848e6f7ef872f9dfbf617 *tests/reg-tests.Rout.save
+5b7464763d85ba9406c9e4dddce80d97 *tests/testdb-v1.1
+5b7464763d85ba9406c9e4dddce80d97 *tests/testdb-v2.0
+40829c8958672fbc650561dde99ea5a0 *tests/versions.R
+6e36a99376f5c4281e1acc0885d9d91c *tests/versions.Rout.save
+b2631aaa4f28eae69ba9e6b9b5bbbe80 *vignettes/combined.bib
+7fb6c57e7c3a9b572359b21e60e07d3f *vignettes/filehash.Rnw
diff --git a/NAMESPACE b/NAMESPACE
new file mode 100644
index 0000000..7b05eca
--- /dev/null
+++ b/NAMESPACE
@@ -0,0 +1,62 @@
+useDynLib(filehash)
+import(methods)
+
+## Classes
+exportClasses(
+              "filehash",
+              "filehashRDS",
+              "filehashDB1"
+              )
+
+
+## Primary interface
+exportMethods(
+              "dbInsert",
+              "dbFetch",
+              "dbExists",
+              "dbList",
+              "dbDelete",
+              "dbReorganize",
+              "dbUnlink",
+              "dbMultiFetch",
+              "dbCreate",
+              "dbInit",
+              "dbLoad",
+              "dbLazyLoad"
+              )
+
+exportMethods("[[", "[", "[[<-", "$<-", "$")
+
+export(
+       "filehashOption",
+       "registerFormatDB",
+       "filehashFormats"
+       )
+
+
+## Miscellaneous functions
+exportMethods(
+              "show",
+              "with",
+              "coerce",
+              "lapply",
+              "names",
+              "length"
+              )
+
+export(
+       "dumpDF",
+       "dumpObjects",
+       "db2env",
+       "dumpImage",
+       "dumpList",
+       "dumpEnv"
+       )
+
+## Stack and Queue stuff
+exportClasses("stack", "queue")
+exportMethods("isEmpty", "top", "push", "pop")
+exportMethods("mpush")
+
+export("createQ", "initQ")
+export("createS", "initS")
diff --git a/R/coerce.R b/R/coerce.R
new file mode 100644
index 0000000..23912b0
--- /dev/null
+++ b/R/coerce.R
@@ -0,0 +1,43 @@
+######################################################################
+## Copyright (C) 2006, Roger D. Peng <rpeng at jhsph.edu>
+##     
+## This program is free software; you can redistribute it and/or modify
+## it under the terms of the GNU General Public License as published by
+## the Free Software Foundation; either version 2 of the License, or
+## (at your option) any later version.
+## 
+## This program is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+## GNU General Public License for more details.
+## 
+## You should have received a copy of the GNU General Public License
+## along with this program; if not, write to the Free Software
+## Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+## 02110-1301, USA
+#####################################################################
+
+toDBType <- function(from, type, dbpath = NULL) {
+    if(is.null(dbpath))
+        dbpath <- dbName(from)
+    if(!dbCreate(dbpath, type = type))
+        stop("could not create ", type, " database")
+    db <- dbInit(dbpath, type = type)
+    keys <- dbList(from)
+    
+    for(key in keys)
+        dbInsert(db, key, dbFetch(from, key))
+    invisible(db)
+}
+
+setAs("filehashDB1", "filehashRDS",
+      function(from) {
+          dbpath <- paste(dbName(from), "RDS", sep = "")
+          toDBType(from, "RDS", dbpath)
+      })
+      
+setAs("filehashDB1", "list",
+      function(from) {
+          keys <- dbList(from)
+          dbMultiFetch(from, keys)
+      })
diff --git a/R/dump.R b/R/dump.R
new file mode 100644
index 0000000..b381e88
--- /dev/null
+++ b/R/dump.R
@@ -0,0 +1,67 @@
+######################################################################
+## Copyright (C) 2006--2008, Roger D. Peng <rpeng at jhsph.edu>
+##     
+## This program is free software; you can redistribute it and/or modify
+## it under the terms of the GNU General Public License as published by
+## the Free Software Foundation; either version 2 of the License, or
+## (at your option) any later version.
+## 
+## This program is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+## GNU General Public License for more details.
+## 
+## You should have received a copy of the GNU General Public License
+## along with this program; if not, write to the Free Software
+## Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+## 02110-1301, USA
+#####################################################################
+
+dumpEnv <- function(env, dbName) {
+        keys <- ls(env, all.names = TRUE)
+        dumpObjects(list = keys, dbName = dbName, envir = env)
+}
+
+dumpImage <- function(dbName = "Rworkspace", type = NULL) {
+        dumpObjects(list = ls(envir = globalenv(), all.names = TRUE),
+                    dbName = dbName, type = type, envir = globalenv())
+}
+
+dumpObjects <- function(..., list = character(0), dbName, type = NULL,
+                        envir = parent.frame()) {
+        names <- as.character(substitute(list(...)))[-1]
+        list <- c(list, names)
+        if(!dbCreate(dbName, type))
+                stop("could not create database file")
+        db <- dbInit(dbName, type)
+
+        for(i in seq(along = list)) 
+                dbInsert(db, list[i], get(list[i], envir))
+        db
+}
+
+dumpDF <- function(data, dbName = NULL, type = NULL) {
+        if(is.null(dbName))
+                dbName <- as.character(substitute(data))
+        dumpList(as.list(data), dbName = dbName, type = type)
+}
+
+dumpList <- function(data, dbName = NULL, type = NULL) {
+        if(!is.list(data))
+                stop("'data' must be a list")
+        vnames <- names(data)
+        
+        if(is.null(vnames) || isTRUE("" %in% vnames))
+                stop("list must have non-empty names")
+        if(is.null(dbName))
+                dbName <- as.character(substitute(data))
+        
+        if(!dbCreate(dbName, type))
+                stop("could not create database file")
+        db <- dbInit(dbName, type)
+
+        for(i in seq(along = vnames))
+                dbInsert(db, vnames[i], data[[vnames[i]]])
+        db
+}
+
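
The argument capture used by dumpObjects() above can be sketched in base R. This is an illustrative sketch rather than code from the package: collect_objects() and its 'extra' parameter are hypothetical names, and the sketch returns a named list instead of writing to a database.

```r
## Hypothetical helper (not part of filehash): gather objects named either
## as bare symbols in '...' or as strings in 'extra', the same way
## dumpObjects() combines '...' with its 'list' argument.
collect_objects <- function(..., extra = character(0),
                            envir = parent.frame()) {
    force(envir)
    ## substitute() yields the unevaluated call list(x, y, ...);
    ## as.character() turns it into c("list", "x", "y", ...),
    ## and [-1] drops the leading "list".
    dots <- as.character(substitute(list(...)))[-1]
    keys <- c(extra, dots)

    ## Look each name up in 'envir', as dumpObjects() does with get()
    values <- lapply(keys, get, envir = envir)
    names(values) <- keys
    values
}

x <- 1:3
y <- "b"
z <- TRUE
res <- collect_objects(x, y, extra = "z")
names(res)
```

Names passed as strings come first in the result because they are concatenated ahead of the captured symbols, which matches the c(list, names) ordering in dumpObjects().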
diff --git a/R/filehash-DB1.R b/R/filehash-DB1.R
new file mode 100644
index 0000000..b516127
--- /dev/null
+++ b/R/filehash-DB1.R
@@ -0,0 +1,442 @@
+######################################################################
+## Copyright (C) 2006--2008, Roger D. Peng <rpeng at jhsph.edu>
+##     
+## This program is free software; you can redistribute it and/or modify
+## it under the terms of the GNU General Public License as published by
+## the Free Software Foundation; either version 2 of the License, or
+## (at your option) any later version.
+## 
+## This program is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+## GNU General Public License for more details.
+## 
+## You should have received a copy of the GNU General Public License
+## along with this program; if not, write to the Free Software
+## Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+## 02110-1301, USA
+#####################################################################
+
+######################################################################
+## Class 'filehashDB1'
+
+## Database entries
+##
+## File format: [key]        [nbytes data] [data]
+##              serialized   serialized    raw bytes (serialized)
+##
+
+######################################################################
+
+## 'meta' is a list of functions for updating the file size of the
+## database and the file map.
+
+setClass("filehashDB1",
+         representation(datafile = "character",
+                        meta = "list"),
+         contains = "filehash"
+         )
+
+setValidity("filehashDB1",
+            function(object) {
+                    if(!file.exists(object@datafile))
+                            return(gettextf("datafile '%s' does not exist",
+                                            object@datafile))
+                    TRUE
+            })
+
+createDB1 <- function(dbName) {
+        if(!hasWorkingFtell())
+                stop("need working 'ftell()' to use 'DB1' format")
+        if(file.exists(dbName)) {
+                message(gettextf("database '%s' already exists", dbName))
+                return(TRUE)
+        }
+        status <- file.create(dbName)
+
+        if(!status)
+                stop(gettextf("unable to create database file '%s'", dbName))
+        TRUE
+}
+
+makeMetaEnv <- function(filename) {
+        dbmap <- NULL  ## 'NULL' indicates the map needs to be read
+        dbfilesize <- file.info(filename)$size
+
+        updatesize <- function(size) {
+                dbfilesize <<- size
+        }
+        updatemap <- function(map) {
+                dbmap <<- map
+        }
+        getsize <- function() {
+                dbfilesize
+        }
+        getmap <- function() {
+                dbmap
+        }
+        list(updatesize = updatesize,
+             updatemap = updatemap,
+             getmap = getmap,
+             getsize = getsize)
+}
+
+initializeDB1 <- function(dbName) {
+        if(!hasWorkingFtell())
+                stop("need working 'ftell()' to use DB1 format")
+        dbName <- normalizePath(dbName)
+
+        new("filehashDB1",
+            datafile = dbName,
+            meta = makeMetaEnv(dbName),
+            name = basename(dbName)
+            )
+}
+
+
+readKeyMap <- function(con, map = NULL, pos = 0) {
+        if(is.null(map)) {
+                ## using 'hash = TRUE' is critical because it can have a major
+                ## impact on performance for large databases
+                map <- new.env(hash = TRUE, parent = emptyenv())
+                pos <- 0
+        }
+        if(pos < 0)
+                stop("'pos' cannot be negative")
+        filename <- path.expand(summary(con)$description)
+        filesize <- file.info(filename)$size
+
+        if(pos > filesize)
+                stop("'pos' cannot be greater than file size")
+        .Call("read_key_map", filename, map, filesize, pos)
+}
+
+readSingleKey <- function(con, map, key) {
+        start <- map[[key]]
+
+        if(is.null(start))
+                stop(gettextf("unable to obtain value for key '%s'", key))
+
+        seek(con, start, rw = "read")
+        unserialize(con)
+}
+
+readKeys <- function(con, map, keys) {
+        r <- lapply(keys, function(key) readSingleKey(con, map, key))
+        names(r) <- keys
+        r
+}
+
+gotoEndPos <- function(con) {
+        ## Move connection to the end
+        seek(con, 0, "end")
+        seek(con)
+}
+
+writeNullKeyValue <- function(con, key) {
+        writestart <- gotoEndPos(con)
+
+        handler <- function(cond) {
+                ## Rewind the file back to where writing began and truncate at
+                ## that position
+                seek(con, writestart, "start", "write")
+                truncate(con)
+                cond
+        }
+        tryCatch({
+                serialize(key, con)
+
+                len <- as.integer(-1)
+                serialize(len, con)
+        }, interrupt = handler, error = handler, finally = {
+                flush(con)
+        })
+}
+
+writeKeyValue <- function(con, key, value) {
+        writestart <- gotoEndPos(con)
+
+        handler <- function(cond) {
+                ## Rewind the file back to where writing began and
+                ## truncate at that position; this is probably a bad
+                ## idea for files > 2GB
+                seek(con, writestart, "start", "write")
+                truncate(con)
+                cond
+        }
+        tryCatch({
+                serialize(key, con)
+
+                byteData <- serialize(value, NULL)
+                len <- length(byteData)
+                serialize(len, con)
+
+                writeBin(byteData, con)
+        }, interrupt = handler, error = handler, finally = {
+                flush(con)
+        })
+}
+
+setMethod("lockFile", "file", function(db, ...) {
+        ## Use 3 underscores for lock file
+        sprintf("%s___LOCK", summary(db)$description)
+})
+
+createLockFile <- function(name) {
+        if(.Platform$OS.type != "windows") 
+                status <- .Call("lock_file", name)
+        else {
+                ## TODO: are these optimal values for max.attempts
+                ## and sleep.duration?
+                max.attempts <- 4
+                sleep.duration <- 0.5
+                attempts <- 0
+                status <- -1
+                while ((attempts <= max.attempts) && ! isTRUE(status >= 0)) {
+                        attempts <- attempts + 1
+                        status <- .Call("lock_file", name)
+
+                        if(!isTRUE(status >= 0))
+                                Sys.sleep(sleep.duration)
+                }
+        }
+        if(!isTRUE(status >= 0))
+                stop("cannot create lock file ", sQuote(name))
+        TRUE
+}
+
+deleteLockFile <- function(name) {
+        if(!file.remove(name))
+                stop(paste('cannot remove lock file "', name, '"', sep=''))
+        TRUE
+}
+
+################################################################################
+## Internal utilities
+
+filesize <- gotoEndPos
+
+setGeneric("checkMap", function(db, ...) standardGeneric("checkMap"))
+
+setMethod("checkMap", "filehashDB1",
+          function(db, filecon, ...) {
+                  old.size <- db@meta$getsize()
+                  cur.size <- tryCatch({
+                          filesize(filecon)
+                  }, error = function(err) {
+                          old.size
+                  })
+                  size.change <- old.size != cur.size
+                  map <- getMap(db)
+                  map0 <- map
+
+                  if(is.null(map))
+                          map <- readKeyMap(filecon)
+                  else if(size.change) {
+                          ## Update 'map' in place with keys added since 'old.size'
+                          map <- tryCatch({
+                                  readKeyMap(filecon, map, old.size)
+                          }, error = function(err) {
+                                  message(conditionMessage(err))
+                                  map0
+                          })
+                  }
+                  else
+                          map <- map0
+                  if(!identical(map, map0)) {
+                          db@meta$updatemap(map)
+                          db@meta$updatesize(cur.size)
+                  }
+                  invisible(db)
+          })
+
+
+setGeneric("getMap", function(db) standardGeneric("getMap"))
+
+setMethod("getMap", "filehashDB1",
+          function(db) {
+                  db@meta$getmap()
+          })
+
+################################################################################
+## Interface functions
+
+openDBConn <- function(filename, mode) {
+        con <- try({
+                file(filename, mode)
+        }, silent = TRUE)
+
+        if(inherits(con, "try-error"))
+                stop("unable to open connection to database")
+        con
+}
+
+setMethod("dbInsert",
+          signature(db = "filehashDB1", key = "character", value = "ANY"),
+          function(db, key, value, ...) {
+                  con <- openDBConn(db@datafile, "ab")
+                  on.exit(close(con))
+
+                  lockname <- lockFile(con)
+                  createLockFile(lockname)
+                  on.exit(deleteLockFile(lockname), add = TRUE)
+
+                  invisible(writeKeyValue(con, key, value))
+          })
+
+setMethod("dbFetch",
+          signature(db = "filehashDB1", key = "character"),
+          function(db, key, ...) {
+                  con <- openDBConn(db@datafile, "rb")
+                  on.exit(close(con))
+
+                  lockname <- lockFile(con)
+                  createLockFile(lockname)
+                  on.exit(deleteLockFile(lockname), add = TRUE)
+
+                  checkMap(db, con)
+                  map <- getMap(db)
+
+                  val <- readSingleKey(con, map, key)
+                  val
+          })
+
+setMethod("dbMultiFetch",
+          signature(db = "filehashDB1", key = "character"),
+          function(db, key, ...) {
+                  con <- openDBConn(db@datafile, "rb")
+                  on.exit(close(con))
+
+                  lockname <- lockFile(con)
+                  createLockFile(lockname)
+                  on.exit(deleteLockFile(lockname), add = TRUE)
+
+                  checkMap(db, con)
+                  map <- getMap(db)
+
+                  readKeys(con, map, key)
+          })
+
+setMethod("dbExists", signature(db = "filehashDB1", key = "character"),
+          function(db, key, ...) {
+                  dbkeys <- dbList(db)
+                  key %in% dbkeys
+          })
+
+setMethod("dbList", "filehashDB1",
+          function(db, ...) {
+                  con <- openDBConn(db@datafile, "rb")
+                  on.exit(close(con))
+
+                  lockname <- lockFile(con)
+                  createLockFile(lockname)
+                  on.exit(deleteLockFile(lockname), add = TRUE)
+
+                  checkMap(db, con)
+                  map <- getMap(db)
+
+                  if(length(map) == 0)
+                          character(0)
+                  else {
+                          keys <- as.list(map, all.names = TRUE)
+                          use <- !sapply(keys, is.null)
+                          names(keys[use])
+                  }
+          })
+
+setMethod("dbDelete", signature(db = "filehashDB1", key = "character"),
+          function(db, key, ...) {
+                  con <- openDBConn(db@datafile, "ab")
+                  on.exit(close(con))
+
+                  lockname <- lockFile(con)
+                  createLockFile(lockname)
+                  on.exit(deleteLockFile(lockname), add = TRUE)
+
+                  invisible(writeNullKeyValue(con, key))
+          })
+
+setMethod("dbUnlink", "filehashDB1",
+          function(db, ...) {
+                  file.remove(db@datafile)
+          })
+
+reorganizeDB <- function(db, ...) {
+        datafile <- db@datafile
+
+        ## Find a temporary file name
+        tempdata <- paste(datafile, "Tmp", sep = "")
+        i <- 0
+        while(file.exists(tempdata)) {
+                i <- i + 1
+                tempdata <- paste(datafile, "Tmp", i, sep = "")
+        }
+        if(!dbCreate(tempdata, type = "DB1")) {
+                warning("could not create temporary database")
+                return(FALSE)
+        }
+        on.exit(file.remove(tempdata))
+
+        tempdb <- dbInit(tempdata, type = "DB1")
+        keys <- dbList(db)
+
+        ## Copy all keys to temporary database
+        nkeys <- length(keys)
+        cat("Reorganizing database: ")
+
+        for(i in seq_along(keys)) {
+                key <- keys[i]
+                msg <- sprintf("%d%% (%d/%d)", round (100 * i / nkeys),
+                               i, nkeys)
+                cat(msg)
+
+                dbInsert(tempdb, key, dbFetch(db, key))
+
+                back <- paste(rep("\b", nchar(msg)), collapse = "")
+                cat(back)
+        }
+        cat("\n")
+        status <- file.rename(tempdata, datafile)
+
+        if(!isTRUE(status)) {
+                on.exit()
+                warning("temporary database could not be renamed and is left in ",
+                        tempdata)
+                return(FALSE)
+        }
+        on.exit()
+        cat("Finished; reload database with 'dbInit'\n")
+        TRUE
+}
+
+setMethod("dbReorganize", "filehashDB1", reorganizeDB)
+
+
+################################################################################
+## Test system's ftell()
+
+hasWorkingFtell <- function() {
+        tfile <- tempfile()
+        con <- file(tfile, "wb")
+
+        tryCatch({
+                bytes <- raw(10)
+                begin <- seek(con)
+
+                if(begin != 0)
+                        return(FALSE)
+                writeBin(bytes, con)
+                end <- seek(con)
+                offset <- end - begin
+                isTRUE(offset == 10)
+        }, error = function(e) {
+                FALSE
+        }, finally = {
+                close(con)
+                unlink(tfile)
+        })
+}
+
+######################################################################
+
+
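
The record layout described in the header comment of R/filehash-DB1.R ([key] [nbytes data] [data]) and the key map of byte offsets can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: append_record(), mark_deleted(), scan_offsets() and read_value() are hypothetical names, the offsets live in a plain list rather than a hashed environment, and there is no lock file or truncate-on-error handling.

```r
## Append one record: serialized key, serialized byte count, raw data bytes.
append_record <- function(path, key, value) {
    con <- file(path, "ab")
    on.exit(close(con))
    bytes <- serialize(value, NULL)     # raw vector holding the value
    serialize(key, con)                 # [key]
    serialize(length(bytes), con)       # [nbytes data]
    writeBin(bytes, con)                # [data]
    invisible(NULL)
}

## Deletion appends a key with length -1 and no data, as in
## writeNullKeyValue() above.
mark_deleted <- function(path, key) {
    con <- file(path, "ab")
    on.exit(close(con))
    serialize(key, con)
    serialize(-1L, con)
    invisible(NULL)
}

## Walk the file once and remember where each key's data starts.
## Later records for the same key replace earlier ones.
scan_offsets <- function(path) {
    size <- file.info(path)$size
    con <- file(path, "rb")
    on.exit(close(con))
    offsets <- list()
    pos <- 0
    while (pos < size) {
        key <- unserialize(con)
        len <- unserialize(con)
        pos <- seek(con)                # current position: start of [data]
        if (len < 0) {
            offsets[[key]] <- NULL      # deleted key, no data follows
        } else {
            offsets[[key]] <- pos
            seek(con, len, origin = "current")   # skip over the data
            pos <- pos + len
        }
    }
    offsets
}

## Fetch a single value by jumping straight to its offset.
read_value <- function(path, offset) {
    con <- file(path, "rb")
    on.exit(close(con))
    seek(con, offset)
    unserialize(con)
}

path <- tempfile()
append_record(path, "a", 1:3)
append_record(path, "b", "hello")
append_record(path, "a", c(2.5, 3.5))   # supersedes the first "a"
mark_deleted(path, "b")

offsets <- scan_offsets(path)
value_a <- read_value(path, offsets[["a"]])
```

Because records are only ever appended, stale and deleted entries stay in the file until it is rewritten, which is the job dbReorganize() performs in the code above.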
diff --git a/R/filehash-RDS.R b/R/filehash-RDS.R
new file mode 100644
index 0000000..fbb0e1d
--- /dev/null
+++ b/R/filehash-RDS.R
@@ -0,0 +1,183 @@
+######################################################################
+## Copyright (C) 2006, Roger D. Peng <rpeng at jhsph.edu>
+##     
+## This program is free software; you can redistribute it and/or modify
+## it under the terms of the GNU General Public License as published by
+## the Free Software Foundation; either version 2 of the License, or
+## (at your option) any later version.
+## 
+## This program is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+## GNU General Public License for more details.
+## 
+## You should have received a copy of the GNU General Public License
+## along with this program; if not, write to the Free Software
+## Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+## 02110-1301, USA
+#####################################################################
+
+################################################################################
+## Class 'filehashRDS'
+
+setClass("filehashRDS",
+         representation(dir = "character"),
+         contains = "filehash"
+         )
+
+setValidity("filehashRDS",
+            function(object) {
+                    if(length(object@dir) != 1)
+                            return("only one directory should be set in 'dir'")
+                    if(!file.exists(object@dir))
+                            return(gettextf("directory '%s' does not exist",
+                                            object@dir))
+                    TRUE
+            })
+
+createRDS <- function(dbName) {
+        if(!file.exists(dbName)) {
+                status <- dir.create(dbName)
+
+                if(!status)
+                        stop(gettextf("unable to create database directory '%s'",
+                                      dbName))
+        }
+        else
+                message(gettextf("database '%s' already exists", dbName))
+        TRUE
+}
+
+initializeRDS <- function(dbName) {
+        ## Trailing '/' causes a problem in Windows?
+        dbName <- sub("/$", "", dbName, perl = TRUE)
+        new("filehashRDS", dir = normalizePath(dbName),
+            name = basename(dbName))
+}
+
+## On case-insensitive file systems, objects whose names differ only
+## by capitalization might clobber each other.  'mangleName()'
+## inserts a "@" before each capital letter and 'unMangleName()'
+## reverses the operation.
+
+mangleName <- function(oname) {
+        if(any(grep("@",oname,fixed=TRUE))) 
+                stop("RDS format cannot cope with objects with @ characters",
+                        " in their names")
+        gsub("([A-Z])", "@\\1", oname, perl = TRUE)
+}
+
+unMangleName <- function(mname) {
+        gsub("@", "", mname, fixed = TRUE)
+}
+
+## Function for mapping a key to a path on the filesystem
+setGeneric("objectFile", function(db, key) standardGeneric("objectFile"))
+setMethod("objectFile", signature(db = "filehashRDS", key = "character"),
+          function(db, key) {
+                  file.path(db@dir, mangleName(key))
+          })
+
+################################################################################
+## Interface functions
+
+setMethod("dbInsert",
+          signature(db = "filehashRDS", key = "character", value = "ANY"),
+          function(db, key, value, safe = TRUE, ...) {
+                  writefile <- if(safe)
+                          tempfile()
+                  else
+                          objectFile(db, key)
+                  con <- gzfile(writefile, "wb")
+
+                  writestatus <- tryCatch({
+                          serialize(value, con)
+                  }, condition = function(cond) {
+                          cond
+                  }, finally = {
+                          close(con)
+                  })
+                  if(inherits(writestatus, "condition"))
+                          stop(gettextf("unable to write object '%s'", key))
+                  if(!safe)
+                          return(invisible(!inherits(writestatus, "condition")))
+
+                  cpstatus <- file.copy(writefile, objectFile(db, key),
+                                        overwrite = TRUE)
+
+                  if(!cpstatus)
+                          stop(gettextf("unable to insert object '%s'", key))
+                  else {
+                          rmstatus <- file.remove(writefile)
+
+                          if(!rmstatus)
+                                  warning("unable to remove temporary file")
+                  }
+                  invisible(cpstatus)
+          })
+
+setMethod("dbFetch", signature(db = "filehashRDS", key = "character"),
+          function(db, key, ...) {
+                  ## Create filename from key
+                  ofile <- objectFile(db, key)
+                  ## Open connection
+                  val <- tryCatch({
+                          con<-gzfile(ofile)
+                          # note it is necessary to split creating and opening
+                          # the connection into two steps so that the connection
+                          # can be closed/destroyed successfully if ofile does 
+                          # not exist (avoiding connection leaks).
+                          open(con,"rb")
+                          ## Read data
+                          unserialize(con)
+                  }, condition = function(cond) {
+                          cond
+                  }, finally = {
+                          close(con)
+                  })
+                  if(inherits(val, "condition")) 
+                          stop(gettextf("unable to obtain value for key '%s'",
+                                        key))
+                  val
+          })
+
+setMethod("dbMultiFetch",
+          signature(db = "filehashRDS", key = "character"),
+          function(db, key, ...) {
+                  r <- lapply(key, function(k) dbFetch(db, k))
+                  names(r) <- key
+                  r
+          })
+
+setMethod("dbExists", signature(db = "filehashRDS", key = "character"),
+          function(db, key, ...) {
+                  key %in% dbList(db)
+          })
+
+setMethod("dbList", "filehashRDS",
+          function(db, ...) {
+                  ## list all keys/files in the database
+                  fileList <- dir(db@dir, all.files = TRUE, full.names = TRUE)
+                  use <- !file.info(fileList)$isdir
+                  fileList <- basename(fileList[use])
+
+                  unMangleName(fileList)
+          })
+
+setMethod("dbDelete", signature(db = "filehashRDS", key = "character"),
+          function(db, key, ...) {
+                  ofile <- objectFile(db, key)
+
+                  ## remove/delete the file
+                  status <- file.remove(ofile)
+                  invisible(isTRUE(all(status)))
+          })
+
+setMethod("dbUnlink", "filehashRDS",
+          function(db, ...) {
+                  ## delete the entire database directory
+                  d <- db@dir
+                  status <- unlink(d, recursive = TRUE)
+                  invisible(status)
+          })
+
diff --git a/R/filehash.R b/R/filehash.R
new file mode 100644
index 0000000..79ffeb0
--- /dev/null
+++ b/R/filehash.R
@@ -0,0 +1,306 @@
+######################################################################
+## Copyright (C) 2006, Roger D. Peng <rpeng at jhsph.edu>
+##     
+## This program is free software; you can redistribute it and/or modify
+## it under the terms of the GNU General Public License as published by
+## the Free Software Foundation; either version 2 of the License, or
+## (at your option) any later version.
+## 
+## This program is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+## GNU General Public License for more details.
+## 
+## You should have received a copy of the GNU General Public License
+## along with this program; if not, write to the Free Software
+## Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+## 02110-1301, USA
+#####################################################################
+
+######################################################################
+## Class 'filehash'
+
+setClass("filehash", representation(name = "character"))
+
+setValidity("filehash", function(object) {
+    if(length(object@name) == 0)
+        "database name has length 0"
+    else
+        TRUE
+})
+
+setGeneric("dbName", function(db) standardGeneric("dbName"))
+setMethod("dbName", "filehash", function(db) db@name)
+
+setMethod("show", "filehash",
+          function(object) {
+              if(length(object@name) == 0)
+                  stop("database does not have a name")
+              cat(gettextf("'%s' database '%s'\n", as.character(class(object)),
+                           object@name))
+          })
+
+
+######################################################################
+
+registerFormatDB <- function(name, funlist) {
+    if(!all(c("initialize", "create") %in% names(funlist)))
+        stop("need both 'initialize' and 'create' functions in 'funlist'")
+    r <- list(list(create = funlist[["create"]],
+                   initialize = funlist[["initialize"]]))
+    names(r) <- name
+    do.call("filehashFormats", r)
+    TRUE
+}
+
+filehashFormats <- function(...) {
+    args <- list(...)
+    n <- names(args)
+
+    for(n in names(args)) 
+        assign(n, args[[n]], .filehashFormats)
+    current <- as.list(.filehashFormats)
+
+    if(length(args) == 0)
+        current
+    else
+        invisible(current)
+}
+
+######################################################################
+## Create necessary database files.  On successful creation, return
+## TRUE.  If the database already exists, don't do anything but return
+## TRUE (and print a message).  If there's any other strange
+## condition, return FALSE.
+
+dbStartup <- function(dbName, type, action = c("initialize", "create")) {
+    action <- match.arg(action)
+    validFormat <- type %in% names(filehashFormats())
+    
+    if(!validFormat) 
+        stop(gettextf("'%s' not a valid database format", type))
+    formatList <- filehashFormats()[[type]]
+    doFUN <- formatList[[action]]
+
+    if(!is.function(doFUN))
+        stop(gettextf("'%s' function for database format '%s' is not valid",
+                      action, type))
+    doFUN(dbName)
+}    
+
+setGeneric("dbCreate", function(db, ...) standardGeneric("dbCreate"))
+
+setMethod("dbCreate", "ANY",
+          function(db, type = NULL, ...) {
+              if(is.null(type))
+                  type <- filehashOption()$defaultType
+
+              dbStartup(db, type, "create")
+          })
+          
+setGeneric("dbInit", function(db, ...) standardGeneric("dbInit"))
+
+setMethod("dbInit", "ANY",
+          function(db, type = NULL, ...) {
+              if(is.null(type))
+                  type <- filehashOption()$defaultType
+              dbStartup(db, type, "initialize")
+          })
+
+######################################################################
+## Set options and retrieve list of options
+
+filehashOption <- function(...) {
+    args <- list(...)
+    n <- names(args)
+
+    for(n in names(args)) 
+        assign(n, args[[n]], .filehashOptions)
+    current <- as.list(.filehashOptions)
+
+    if(length(args) == 0)
+        current
+    else
+        invisible(current)
+}
+
+######################################################################
+## Load active bindings into an environment
+
+setGeneric("dbLoad", function(db, ...) standardGeneric("dbLoad"))
+
+setMethod("dbLoad", "filehash",
+          function(db, env = parent.frame(2), keys = NULL, ...) {
+              if(is.null(keys))
+                  keys <- dbList(db)
+              else if(!is.character(keys))
+                  stop("'keys' should be a character vector")
+              active <- sapply(keys, function(k) {
+                  exists(k, env, inherits = FALSE)
+              })
+              if(any(active)) {
+                  warning("keys with active/regular bindings ignored: ",
+                          paste(sQuote(keys[active]), collapse = ", "))
+                  keys <- keys[!active]
+              }                      
+              make.f <- function(k) {
+                  key <- k
+                  function(value) {
+                      if(!missing(value)) {
+                          dbInsert(db, key, value)
+                          invisible(value)
+                      }
+                      else {
+                          obj <- dbFetch(db, key)
+                          obj
+                      }
+                  }
+              }
+              for(k in keys) 
+                  makeActiveBinding(k, make.f(k), env)
+              invisible(keys)
+          })
+
+setGeneric("dbLazyLoad", function(db, ...) standardGeneric("dbLazyLoad"))
+
+setMethod("dbLazyLoad", "filehash",
+          function(db, env = parent.frame(2), keys = NULL, ...) {
+              if(is.null(keys))
+                  keys <- dbList(db)
+              else if(!is.character(keys))
+                  stop("'keys' should be a character vector")
+              
+              wrap <- function(x, env) {
+                  key <- x
+                  delayedAssign(x, dbFetch(db, key), environment(), env)            
+              }
+              for(k in keys) 
+                  wrap(k, env)
+              invisible(keys)
+          })
+          
+## Load active bindings into an environment and return the environment
+
+db2env <- function(db) {
+    if(is.character(db))
+        db <- dbInit(db)  ## use the default type
+    env <- new.env(hash = TRUE)
+    dbLoad(db, env)
+    env
+}
+
+######################################################################
+## Other methods
+
+setGeneric("names")
+setMethod("names", "filehash",
+          function(x) {
+                  dbList(x)
+          })
+
+setGeneric("length")
+setMethod("length", "filehash",
+          function(x) {
+                  length(dbList(x))
+          })
+
+setAs("filehash", "list",
+      function(from) {
+              env <- new.env(hash = TRUE)
+              dbLoad(from, env)
+              as.list(env, all.names = TRUE)
+      })
+
+setGeneric("with")
+setMethod("with", "filehash",
+          function(data, expr, ...) {
+              env <- db2env(data)
+              eval(substitute(expr), env, enclos = parent.frame())
+          })
+
+setGeneric("lapply")
+setMethod("lapply", signature(X = "filehash"),
+          function(X, FUN, ..., keep.names = TRUE) {
+              FUN <- match.fun(FUN)
+              keys <- dbList(X)
+              rval <- vector("list", length = length(keys))
+              
+              for(i in seq(along = keys)) {
+                  obj <- dbFetch(X, keys[i])
+                  rval[[i]] <- FUN(obj, ...)
+              }
+              if(keep.names)
+                  names(rval) <- keys
+              rval
+          })
+
+######################################################################
+## Database interface
+
+setGeneric("dbMultiFetch", function(db, key, ...) {
+        standardGeneric("dbMultiFetch")
+})
+setGeneric("dbInsert", function(db, key, value, ...) {
+        standardGeneric("dbInsert")
+})
+setGeneric("dbFetch", function(db, key, ...) standardGeneric("dbFetch"))
+setGeneric("dbExists", function(db, key, ...) standardGeneric("dbExists"))
+setGeneric("dbList", function(db, ...) standardGeneric("dbList"))
+setGeneric("dbDelete", function(db, key, ...) standardGeneric("dbDelete"))
+setGeneric("dbReorganize", function(db, ...) standardGeneric("dbReorganize"))
+setGeneric("dbUnlink", function(db, ...) standardGeneric("dbUnlink"))
+
+## Other
+setOldClass(c("file", "connection"))
+setGeneric("lockFile", function(db, ...) standardGeneric("lockFile"))
+
+######################################################################
+## Extractor/replacement
+
+setMethod("[[", signature(x = "filehash", i = "character", j = "missing"),
+          function(x, i, j) {
+              dbFetch(x, i)
+          })
+
+setMethod("$", signature(x = "filehash"),
+          function(x, name) {
+              dbFetch(x, name)
+          })
+
+setReplaceMethod("[[", signature(x = "filehash", i = "character", j = "missing"),
+                 function(x, i, j, value) {
+                     dbInsert(x, i, value)
+                     x
+                 })
+
+setReplaceMethod("$", signature(x = "filehash"),
+                 function(x, name, value) {
+                     dbInsert(x, name, value)
+                     x
+                 })
+
+
+## Need to define these because they're not automatically caught.
+## Don't need this if R >= 2.4.0.
+
+setReplaceMethod("[[", signature(x = "filehash", i = "numeric", j = "missing"),
+                 function(x, i, j, value) {
+                     stop("numeric indices not allowed")
+                 })
+
+setMethod("[[", signature(x = "filehash", i = "numeric", j = "missing"),
+          function(x, i, j) {
+              stop("numeric indices not allowed")
+          })
+
+setMethod("[", signature(x = "filehash", i = "character", j = "missing",
+                         drop = "missing"),
+          function(x, i , j, drop) {
+                  dbMultiFetch(x, i)
+          })
+
+
+
+
+
+
diff --git a/R/hash.R b/R/hash.R
new file mode 100644
index 0000000..7fb8ab2
--- /dev/null
+++ b/R/hash.R
@@ -0,0 +1,10 @@
+sha1 <- function(object, skip = 14L) {
+	## Setting 'skip = 14' gives us the same results as
+	## 'digest(object, "sha1")'
+	bytes <- serialize(object, NULL)
+	.Call("sha1_object", bytes, skip)
+}
+
+sha1_file <- function(filename, skip = 0L) {
+	.Call("sha1_file", filename, skip)
+}
diff --git a/R/queue.R b/R/queue.R
new file mode 100644
index 0000000..25ce47f
--- /dev/null
+++ b/R/queue.R
@@ -0,0 +1,91 @@
+setClass("queue",
+         representation(queue = "filehashDB1",
+                        name = "character")
+         )
+
+setMethod("show", "queue",
+          function(object) {
+                  cat(gettextf("<queue: %s>\n", object@name))
+                  invisible(object)
+          })
+
+createQ <- function(filename) {
+        dbCreate(filename, "DB1")
+        queue <- dbInit(filename, "DB1")
+        dbInsert(queue, "head", NULL)
+        dbInsert(queue, "tail", NULL)
+
+        new("queue", queue = queue, name = filename)
+}
+
+initQ <- function(filename) {
+        new("queue",
+            queue = dbInit(filename, "DB1"),
+            name = filename)
+}
+
+## Public
+setGeneric("pop", function(db, ...) standardGeneric("pop"))
+setGeneric("push", function(db, val, ...) standardGeneric("push"))
+setGeneric("isEmpty", function(db, ...) standardGeneric("isEmpty"))
+setGeneric("top", function(db, ...) standardGeneric("top"))
+
+
+################################################################################
+## Methods
+
+setMethod("lockFile", "queue",
+          function(db, ...) {
+                  paste(db@name, "qlock", sep = ".")
+          })
+
+setMethod("push", c("queue", "ANY"), function(db, val, ...) {
+        ## Create a new tail node
+        node <- list(value = val,
+                     nextkey = NULL)
+        key <- sha1(node)
+
+        createLockFile(lockFile(db))
+        on.exit(deleteLockFile(lockFile(db)))
+
+        if(isEmpty(db))
+                dbInsert(db@queue, "head", key)
+        else {
+                ## Convert tail node to regular node
+                tailkey <- dbFetch(db@queue, "tail")
+                oldtail <- dbFetch(db@queue, tailkey)
+                oldtail$nextkey <- key
+                dbInsert(db@queue, tailkey, oldtail)
+        }
+        ## Insert new node and point tail to new node
+        dbInsert(db@queue, key, node)
+        dbInsert(db@queue, "tail", key)
+})
+
+setMethod("isEmpty", "queue", function(db) {
+        is.null(dbFetch(db@queue, "head"))
+})
+
+setMethod("top", "queue", function(db, ...) {
+        createLockFile(lockFile(db))
+        on.exit(deleteLockFile(lockFile(db)))
+
+        if(isEmpty(db))
+                stop("queue is empty")
+        h <- dbFetch(db@queue, "head")
+        node <- dbFetch(db@queue, h)
+        node$value
+})
+
+setMethod("pop", "queue", function(db, ...) {
+        createLockFile(lockFile(db))
+        on.exit(deleteLockFile(lockFile(db)))
+
+        if(isEmpty(db))
+                stop("queue is empty")
+        h <- dbFetch(db@queue, "head")
+        node <- dbFetch(db@queue, h)
+        dbInsert(db@queue, "head", node$nextkey)
+        dbDelete(db@queue, h)
+        node$value
+})
diff --git a/R/stack.R b/R/stack.R
new file mode 100644
index 0000000..36d4e4a
--- /dev/null
+++ b/R/stack.R
@@ -0,0 +1,91 @@
+setClass("stack",
+         representation(stack = "filehashDB1",
+                        name = "character"))
+
+setMethod("show", "stack",
+          function(object) {
+                  cat(gettextf("<stack: %s>\n", object@name))
+                  invisible(object)
+          })
+
+createS <- function(filename) {
+        dbCreate(filename, "DB1")
+        stack <- dbInit(filename, "DB1")
+        dbInsert(stack, "top", NULL)
+
+        new("stack", stack = stack, name = filename)
+}
+
+initS <- function(filename) {
+        new("stack",
+            stack = dbInit(filename, "DB1"),
+             name = filename)
+}
+
+setMethod("lockFile", "stack",
+          function(db, ...) {
+                  paste(db@name, "slock", sep = ".")
+          })
+
+setMethod("push", c("stack", "ANY"), function(db, val, ...) {
+        node <- list(value = val,
+                    nextkey = dbFetch(db@stack, "top"))
+        topkey <- sha1(node)
+
+        createLockFile(lockFile(db))
+        on.exit(deleteLockFile(lockFile(db)))
+
+        dbInsert(db@stack, topkey, node)
+        dbInsert(db@stack, "top", topkey)
+})
+
+setGeneric("mpush", function(db, vals, ...) standardGeneric("mpush"))
+
+setMethod("mpush", c("stack", "ANY"), function(db, vals, ...) {
+        if(!is.list(vals))
+                vals <- as.list(vals)
+        createLockFile(lockFile(db))
+        on.exit(deleteLockFile(lockFile(db)))
+
+        topkey <- dbFetch(db@stack, "top")
+
+        for(i in seq_along(vals)) {
+                node <- list(value = vals[[i]],
+                             nextkey = topkey)
+                topkey <- sha1(node)
+
+                dbInsert(db@stack, topkey, node)
+                dbInsert(db@stack, "top", topkey)
+        }
+})
+
+setMethod("isEmpty", "stack", function(db, ...) {
+        h <- dbFetch(db@stack, "top")
+        is.null(h)
+})
+
+
+setMethod("top", "stack", function(db, ...) {
+        createLockFile(lockFile(db))
+        on.exit(deleteLockFile(lockFile(db)))
+
+        if(isEmpty(db))
+                stop("stack is empty")
+        h <- dbFetch(db@stack, "top")
+        node <- dbFetch(db@stack, h)
+        node$value
+})
+
+setMethod("pop", "stack", function(db, ...) {
+        createLockFile(lockFile(db))
+        on.exit(deleteLockFile(lockFile(db)))
+
+        if(isEmpty(db))
+                stop("stack is empty")
+        h <- dbFetch(db@stack, "top")
+        node <- dbFetch(db@stack, h)
+
+        dbInsert(db@stack, "top", node$nextkey)
+        dbDelete(db@stack, h)
+        node$value
+})
diff --git a/R/zzz.R b/R/zzz.R
new file mode 100644
index 0000000..2ed8cd1
--- /dev/null
+++ b/R/zzz.R
@@ -0,0 +1,22 @@
+.onLoad <- function(lib, pkg) {
+        assign("defaultType", "DB1", .filehashOptions)
+
+        for(type in c("DB1", "RDS")) {
+                cname <- paste("create", type, sep = "")
+                iname <- paste("initialize", type, sep = "")
+                r <- list(create = get(cname, mode = "function"),
+                          initialize = get(iname, mode="function"))
+                assign(type, r, .filehashFormats)
+        }
+}
+
+.onAttach <- function(lib, pkg) {
+        dcf <- read.dcf(file.path(lib, pkg, "DESCRIPTION"))
+        msg <- gettextf("%s: %s (%s %s)", dcf[, "Package"], dcf[, "Title"],
+                        as.character(dcf[, "Version"]), dcf[, "Date"])
+        packageStartupMessage(paste(strwrap(msg), collapse = "\n"))
+}
+
+.filehashOptions <- new.env()
+
+.filehashFormats <- new.env()
diff --git a/build/vignette.rds b/build/vignette.rds
new file mode 100644
index 0000000..0f52621
Binary files /dev/null and b/build/vignette.rds differ
diff --git a/debian/README.test b/debian/README.test
deleted file mode 100644
index 3d2b347..0000000
--- a/debian/README.test
+++ /dev/null
@@ -1,10 +0,0 @@
-Notes on how this package can be tested.
-────────────────────────────────────────
-
-To run the unit tests provided by the package you can do
-
-   sh run-unit-test
-
-in this directory.
-
-
diff --git a/debian/changelog b/debian/changelog
deleted file mode 100644
index f253012..0000000
--- a/debian/changelog
+++ /dev/null
@@ -1,5 +0,0 @@
-r-cran-filehash (2.3-1) unstable; urgency=low
-
-  * Initial release (closes: #837344)
-
- -- Andreas Tille <tille at debian.org>  Sat, 10 Sep 2016 21:40:05 +0200
diff --git a/debian/compat b/debian/compat
deleted file mode 100644
index ec63514..0000000
--- a/debian/compat
+++ /dev/null
@@ -1 +0,0 @@
-9
diff --git a/debian/control b/debian/control
deleted file mode 100644
index ada3a10..0000000
--- a/debian/control
+++ /dev/null
@@ -1,29 +0,0 @@
-Source: r-cran-filehash
-Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
-Uploaders: Andreas Tille <tille at debian.org>
-Section: gnu-r
-Priority: optional
-Build-Depends: debhelper (>= 9),
-               cdbs,
-               r-base-dev
-Standards-Version: 3.9.8
-Vcs-Browser: https://anonscm.debian.org/viewvc/debian-med/trunk/packages/R/r-cran-filehash/trunk/
-Vcs-Svn: svn://anonscm.debian.org/debian-med/trunk/packages/R/r-cran-filehash/trunk/
-Homepage: http://cran.r-project.org/web/packages/filehash
-
-Package: r-cran-filehash
-Architecture: any
-Depends: ${R:Depends},
-         ${misc:Depends},
-         ${shlibs:Depends}
-Description: GNU R simple key-value database
- This GNU R package implements a simple key-value style database where
- character string keys are associated with data values that are stored on
- the disk. A simple interface is provided for inserting, retrieving, and
- deleting data from the database. Utilities are provided that allow
- 'filehash' databases to be treated much like environments and lists are
- already used in R. These utilities are provided to encourage interactive
- and exploratory analysis on large datasets. Three different file formats
- for representing the database are currently available and new formats
- can easily be incorporated by third parties for use in the 'filehash'
- framework.
diff --git a/debian/copyright b/debian/copyright
deleted file mode 100644
index 0c0aa2e..0000000
--- a/debian/copyright
+++ /dev/null
@@ -1,29 +0,0 @@
-Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
-Upstream-Contact: Roger D. Peng <rdpeng at jhu.edu>
-Source: http://cran.r-project.org/web/packages/filehash
-
-Files: *
-Copyright: 2009-2016 Roger D. Peng <rdpeng at jhu.edu>
-License: GPL-2+
-
-Files: debian/*
-Copyright: 2016 Andreas Tille <tille at debian.org>
-License: GPL-2+
-
-License: GPL-2+
-    This program is free software; you can redistribute it and/or modify
-    it under the terms of the GNU General Public License as published by
-    the Free Software Foundation; either version 2 of the License, or
-    (at your option) any later version.
- .
-    This program is distributed in the hope that it will be useful,
-    but WITHOUT ANY WARRANTY; without even the implied warranty of
-    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-    GNU General Public License for more details.
- .
-    You should have received a copy of the GNU General Public License
-    along with this program; if not, write to the Free Software
-    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
- .
- On Debian systems, the complete text of the GNU General Public
- License can be found in `/usr/share/common-licenses/GPL-2'.
diff --git a/debian/docs b/debian/docs
deleted file mode 100644
index 960011c..0000000
--- a/debian/docs
+++ /dev/null
@@ -1,3 +0,0 @@
-tests
-debian/README.test
-debian/tests/run-unit-test
diff --git a/debian/rules b/debian/rules
deleted file mode 100755
index e2bc9e7..0000000
--- a/debian/rules
+++ /dev/null
@@ -1,15 +0,0 @@
-#!/usr/bin/make -f
-
-include /usr/share/R/debian/r-cran.mk
-
-install/$(package)::
-	rm -rf debian/$(package)/usr/lib/R/site-library/$(cranName)/LICENSE
-
-# if I would only know how to hook in after dh_installdocs - forget this magic
-# cdbs thingy and remove the file rather in the test sccript ...
-	# Delete tests depending from devtools since this is not (yet) packaged
-#	cd debian/$(package)/usr/share/doc/$(package)/tests/ ; \
-#	if grep -qR devtools * ; then \
-#	    rm -f `grep -lR devtools *` ; \
-#	fi
-
diff --git a/debian/source/format b/debian/source/format
deleted file mode 100644
index 163aaf8..0000000
--- a/debian/source/format
+++ /dev/null
@@ -1 +0,0 @@
-3.0 (quilt)
diff --git a/debian/tests/control b/debian/tests/control
deleted file mode 100644
index d2aa55a..0000000
--- a/debian/tests/control
+++ /dev/null
@@ -1,3 +0,0 @@
-Tests: run-unit-test
-Depends: @
-Restrictions: allow-stderr
diff --git a/debian/tests/run-unit-test b/debian/tests/run-unit-test
deleted file mode 100644
index d13482a..0000000
--- a/debian/tests/run-unit-test
+++ /dev/null
@@ -1,37 +0,0 @@
-#!/bin/sh -e
-
-pkg=r-cran-filehash
-
-# The saved result files do contain some differences in metadata and we also
-# need to ignore version differences of R
-filter() {
-    grep -v -e '^R version' \
-        -e '^Copyright (C)' \
-        -e '^R : Copyright 20' \
-        -e '^Version 2.0' \
-        -e '^Platform:' \
-        -e '^Spam version .* is loaded.' \
-        -e '^ISBN 3-900051-07-0' \
-        $1 | \
-    sed -e '/^> *proc\.time()$/,$d'
-}
-
-if [ "$ADTTMP" = "" ] ; then
-  ADTTMP=`mktemp -d /tmp/${pkg}-test.XXXXXX`
-  trap "rm -rf $ADTTMP" 0 INT QUIT ABRT PIPE TERM
-fi
-cd $ADTTMP
-cp -a /usr/share/doc/${pkg}/tests/* $ADTTMP
-find . -name "*.gz" -exec gunzip \{\} \;
-for htest in `ls *.R | sed 's/\.R$//'` ; do
-   LC_ALL=C R --no-save < ${htest}.R 2>&1 | tee > ${htest}.Rout
-   filter ${htest}.Rout.save > ${htest}.Rout.save_
-   filter ${htest}.Rout > ${htest}.Rout_
-   diff -u --ignore-all-space ${htest}.Rout.save_ ${htest}.Rout_
-   if [ ! $? ] ; then
-     echo "Test ${htest} failed"
-     exit 1
-   else
-     echo "Test ${htest} passed"
-   fi
-done
diff --git a/debian/watch b/debian/watch
deleted file mode 100644
index 4f57bbb..0000000
--- a/debian/watch
+++ /dev/null
@@ -1,2 +0,0 @@
-version=3
-http://cran.r-project.org/src/contrib/filehash_([-\d.]*)\.tar\.gz
diff --git a/inst/CITATION b/inst/CITATION
new file mode 100644
index 0000000..a38a6f5
--- /dev/null
+++ b/inst/CITATION
@@ -0,0 +1,14 @@
+citHeader("The reference for the 'filehash' package is:")
+
+citEntry(entry = "article",
+         title = "Interacting with data using the filehash package",
+         author = personList(person("Roger", "Peng", "D.")),
+	 journal = "R News",
+         year = "2006",
+	 volume = "6",
+         number = "4",
+         pages = "19--24",
+	 url = "http://CRAN.R-project.org/doc/Rnews/",
+         textVersion = paste("Peng RD (2006).", dQuote("Interacting with data using the filehash package,"), "R News, 6 (4), 19--24.")
+         )
+
diff --git a/inst/COPYING b/inst/COPYING
new file mode 100644
index 0000000..ffea8cd
--- /dev/null
+++ b/inst/COPYING
@@ -0,0 +1,19 @@
+License
+=======
+
+`filehash' is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2 of the License, or (at your
+option) any later version.
+
+This program is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301, USA
+
+
diff --git a/inst/NEWS b/inst/NEWS
new file mode 100644
index 0000000..be2e358
--- /dev/null
+++ b/inst/NEWS
@@ -0,0 +1,90 @@
+Check the 'filehash' git repository for the latest updates on the
+package at http://repo.or.cz/w/filehash.git
+
+Version 1.0
+-----------
+
+* The 'DB' format has been removed; users should use 'DB1' instead
+
+* Internals of 'DB1' format have changed so that it should be a bit
+more reliable but perhaps a little slower
+
+* The 'dbDisconnect' generic has been removed since it is no longer
+necessary for the 'DB1' format (as it was before).  It was never
+needed for the 'RDS' format and one never existed for that format.
+
+
+Version 0.9
+-----------
+
+* For 'filehashRDS' class, the 'dbDir' slot has been renamed to 'dir'.
+
+* An attempt has been made to normalize the error handling to make it
+consistent.
+
+* The various 'dump' functions have been given a 'type' argument
+
+
+Version 0.8
+-----------
+
+* Added function dbLazyLoad for lazy loading filehash databases.
+
+* dbCreate and dbInit are now generics with a method for character
+vectors.  The behavior should be the same as before, by default.
+
+* dbLoad is generic.
+
+* The second argument to dbMultiFetch is 'key', not 'keys'.
+
+* dbInitialize is deprecated
+
+* 'DB1' and 'RDS' formats use normalizePath() for resolving paths to
+directories
+
+* There is a vignette now [via vignette("filehash")]
+
+
+Version 0.6-3
+-------------
+
+* Added methods for "[[", "$", "[[<-", and "$<-" for filehash
+objects. Only character indices are allowed
+
+* filehash-DB functions use the new serialize() from R 2.4.0 so that
+numeric data will not suffer from rounding error due to previous use
+of serialize(ascii = TRUE).
+
+* New format filehash-DB1 which stores the key index/map and data in a
+single file.
+
+* New "filehash" method for lapply so that functions can be applied to
+database entries.
+
+
+Version 0.4-1
+-------------
+
+* Patch release, changed some internals for the "DB" type databases
+
+* Added test database for regression testing in future releases
+
+
+Version 0.4
+-----------
+
+* Added name mangling scheme to prevent clobbering on case-insensitive
+OSes like Windows (thanks to Bill Venables and David Brahm)
+
+* Added dumpImage, dumpObjects, dumpDF functions for dumping various
+things to filehash databases
+
+* Added filehashOption() function for setting global options; right
+now only the default database type can be set
+
+* dbLoad and db2env are regular functions now rather than
+generics/methods.  dbLoad's default 'env' is the parent frame now
+
+* Added a "filehash" method for 'with'
+
+* Added new generic dbUnlink which deletes a database from the disk
diff --git a/inst/doc/filehash.R b/inst/doc/filehash.R
new file mode 100644
index 0000000..5e3ec0f
--- /dev/null
+++ b/inst/doc/filehash.R
@@ -0,0 +1,145 @@
+### R code from vignette source 'filehash.Rnw'
+
+###################################################
+### code chunk number 1: options
+###################################################
+options(width=60)
+
+
+###################################################
+### code chunk number 2: exampleGlobalEnv
+###################################################
+x <- 1
+print(x)
+
+
+###################################################
+### code chunk number 3: create
+###################################################
+library(filehash)
+dbCreate("mydb")
+db <- dbInit("mydb")
+
+
+###################################################
+### code chunk number 4: setseed1
+###################################################
+set.seed(100)
+
+
+###################################################
+### code chunk number 5: insert
+###################################################
+dbInsert(db, "a", rnorm(100))
+
+
+###################################################
+### code chunk number 6: fetch
+###################################################
+value <- dbFetch(db, "a")
+mean(value)
+
+
+###################################################
+### code chunk number 7: delete
+###################################################
+dbInsert(db, "b", 123)
+dbDelete(db, "a")
+dbList(db)
+dbExists(db, "a")
+
+
+###################################################
+### code chunk number 8: accessors
+###################################################
+db$a <- rnorm(100, 1)
+mean(db$a)
+mean(db[["a"]])
+db$b <- rnorm(100, 2)
+dbList(db)
+
+
+###################################################
+### code chunk number 9: characteronly
+###################################################
+e <- local({
+    err <- function(e) e
+    tryCatch(db[[1]], error = err)
+})
+conditionMessage(e)
+
+
+###################################################
+### code chunk number 10: with
+###################################################
+with(db, c(a = mean(a), b = mean(b)))
+
+
+###################################################
+### code chunk number 11: sapply
+###################################################
+sapply(db[c("a", "b")], mean)
+
+
+###################################################
+### code chunk number 12: lapply
+###################################################
+unlist(lapply(db, mean))
+
+
+###################################################
+### code chunk number 13: cleanupMyDB
+###################################################
+dbUnlink(db)
+rm(list = ls(all = TRUE))
+
+
+###################################################
+### code chunk number 14: setseed2
+###################################################
+set.seed(200)
+
+
+###################################################
+### code chunk number 15: testDB
+###################################################
+dbCreate("testDB")
+db <- dbInit("testDB")
+db$x <- rnorm(100)
+db$y <- runif(100)
+db$a <- letters
+dbLoad(db)
+ls()
+
+
+###################################################
+### code chunk number 16: accessbinding
+###################################################
+mean(y)
+sort(a)
+
+
+###################################################
+### code chunk number 17: assignvalue
+###################################################
+y <- rnorm(100, 2)
+mean(y)
+
+
+###################################################
+### code chunk number 18: removeandload
+###################################################
+rm(list = ls())
+db <- dbInit("testDB")
+dbLoad(db)
+ls()
+mean(y)
+
+
+###################################################
+### code chunk number 19: cleanupTestDB
+###################################################
+dbUnlink(db)
+rm(list = ls(all = TRUE))
+
+
diff --git a/inst/doc/filehash.Rnw b/inst/doc/filehash.Rnw
new file mode 100644
index 0000000..c82aead
--- /dev/null
+++ b/inst/doc/filehash.Rnw
@@ -0,0 +1,443 @@
+\documentclass{article}
+
+%%\VignetteIndexEntry{The filehash Package}
+%%\VignetteDepends{filehash}
+
+\usepackage{charter}
+\usepackage{courier}
+\usepackage[noae]{Sweave}
+\usepackage[margin=1in]{geometry}
+\usepackage{natbib}
+
+\title{Interacting with Data using the \textbf{filehash} Package for
+R}
+
+\author{Roger D. Peng $<$rpeng at jhsph.edu$>$\\\textit{Department of
+Biostatistics}\\\textit{Johns Hopkins Bloomberg School of Public Health}}
+
+\date{}
+
+\newcommand{\pkg}{\textbf}
+\newcommand{\code}{\texttt}
+
+\begin{document}
+
+\maketitle
+
+\begin{abstract}
+The \pkg{filehash} package for R implements a simple key-value style
+database where character string keys are associated with data values
+that are stored on the disk.  A simple interface is provided for
+inserting, retrieving, and deleting data from the database.  Utilities
+are provided that allow \pkg{filehash} databases to be treated much
+like the environments and lists already used in R.  These utilities
+are provided to encourage interactive and exploratory analysis on
+large datasets.  Two different file formats for representing the
+database are currently available and new formats can easily be
+incorporated by third parties for use in the \pkg{filehash} framework.
+\end{abstract}
+
+<<options,results=hide,echo=false>>=
+options(width=60)
+@ 
+
+\section{Overview and Motivation}
+
+Working with large datasets in R can be cumbersome because of the need
+to keep objects in physical memory.  While many might generally see
+that as a feature of the system, the need to keep whole objects in
+memory creates challenges for those who might want to work
+interactively with large datasets.  Here we take a simple definition
+of ``large dataset'' to be any dataset that cannot be loaded into R as
+a single R object because of memory limitations.  For example, a very
+large data frame might be too large for all of the columns and rows to
+be loaded at once.  In such a situation, one might load only a subset
+of the rows or columns, if that is possible.
+
+In a key-value database, an arbitrary data object (a ``value'') has a
+``key'' associated with it, usually a character string.  When one
+requests the value associated with a particular key, it is the
+database's job to match up the key with the correct value and return
+the value to the requester.
+
+The most straightforward example of a key-value database in R is the
+global environment.  Every object in R has a name and a value
+associated with it.  When you execute at the R prompt
+<<exampleGlobalEnv,results=hide>>=
+x <- 1
+print(x)
+@ 
+the first line assigns the value 1 to the name/key ``x''.  The second
+line requests the value of ``x'' and prints out 1 to the console.  R
+handles the task of finding the appropriate value for ``x'' by
+searching through a series of environments, including the namespaces
+of the packages on the search list.
+
+In most cases, R stores the values associated with keys in memory, so
+that the value of \code{x} in the example above was stored in and
+retrieved from physical memory.  However, the idea of a key-value
+database can be generalized beyond this particular configuration.  For
+example, as of R 2.0.0, much of the R code for R packages is stored in
+a lazy-loaded database, where the values are initially stored on disk
+and loaded into memory on first access~\citep{Rnews:Ripley:2004}.
+Hence, when R starts up, it uses relatively little memory, while the
+memory usage increases as more objects are requested.  Data could also
+be stored on other computers (e.g. websites) and retrieved over the
+network.
+
+The general S language concept of a database is described in Chapter 5
+of the Green Book~\citep{cham:1998} and earlier in~\cite{cham:1991}.
+Although the S and R languages have different semantics with respect
+to how variable names are looked up and bound to values, the general
+concept of using a key-value database applies to both languages.
+Duncan Temple Lang has implemented this general database framework for
+R in the \pkg{RObjectTables} package of
+Omegahat~\citep{TempleLang:2002}. The \pkg{RObjectTables} package
+provides an interface for connecting R with arbitrary backend systems,
+allowing data values to be stored in potentially any format or
+location.  While the package itself does not include a specific
+implementation, some examples are provided on the package's website.
+
+The \pkg{filehash} package provides a full read-write implementation
+of a key-value database for R.  The package does not depend on any
+external packages (beyond those provided in a standard R installation)
+or software systems and is written entirely in R, making it readily
+usable on most platforms.  The \pkg{filehash} package can be thought
+of as a specific implementation of the database concept described
+in~\cite{cham:1991}, taking a slightly different approach to the
+problem.  Both~\cite{TempleLang:2002} and~\cite{cham:1991} focus on
+generalizing the notion of ``attach()-ing'' a database in an R/S
+session so that variable names can be looked up automatically via the
+search list.  The \pkg{filehash} package represents a database as an
+instance of an S4 class and operates directly on the S4 object via
+various methods.
+
+Key-value databases are sometimes called hash tables and indeed, the
+name of the package comes from the idea of having a ``file-based hash
+table''.  With \pkg{filehash} the values are stored in a file on the
+disk rather than in memory.  When a user requests the values
+associated with a key, \pkg{filehash} finds the object on the disk,
+loads the value into R and returns it to the user.  The package offers
+two formats for storing data on the disk: The values can be stored (1)
+concatenated together in a single file or (2) separately as a
+directory of files.
+
+
+
+
+\section{Related R packages}
+
+There are other packages on CRAN designed specifically to help users
+work with large datasets.  Two packages that come immediately to mind
+are the \pkg{g.data} package by David Brahm~\citep{brahm:2002} and the
+\pkg{biglm} package by Thomas Lumley.  The \pkg{g.data} package takes
+advantage of the lazy evaluation mechanism in R via the
+\code{delayedAssign} function.  Briefly, objects are loaded into R as
+promises to load the actual data associated with an object name.  The
+first time an object is requested, the promise is evaluated and the
+data are loaded.  From then on, the data reside in memory.  The
+mechanism used in \pkg{g.data} is similar to the one used by the
+lazy-loaded databases described in~\cite{Rnews:Ripley:2004}.  The
+\pkg{biglm} package allows users to fit linear models on datasets that
+are too large to fit in memory.  However, the \pkg{biglm} package does
+not provide methods for dealing with large datasets in general.  The
+\pkg{filehash} package also draws inspiration from Luke Tierney's
+experimental \pkg{gdbm} package which implements a key-value database
+via the GNU dbm (GDBM) library.  The use of GDBM creates an external
+dependence since the GDBM C library has to be compiled on each system.
+In addition, I encountered a problem where databases created on 32-bit
+machines could not be transferred to and read on 64-bit machines (and
+vice versa).  However, with the increasing use of 64-bit machines in
+the future, it seems this problem will eventually go away.
+
+The R Special Interest Group on Databases has developed a number of
+packages that provide an R interface to commonly used relational
+database management systems (RDBMS) such as MySQL (\pkg{RMySQL}),
+PostgreSQL (\pkg{RPgSQL}), and Oracle (\pkg{ROracle}).  These packages
+use the S4 classes and generics defined in the \pkg{DBI} package and
+have the advantage that they offer much better database functionality,
+inherited via the use of a true database management system.  However,
+this benefit comes with the cost of having to install and use
+third-party software.  While installing an RDBMS may not be an
+issue---many systems have them pre-installed and the \pkg{RSQLite}
+package comes bundled with the source for the RDBMS---the need for the
+RDBMS and knowledge of structured query language (SQL) nevertheless
+adds some overhead.  This overhead may serve as an impediment for
+users in need of a database for simpler applications.
+
+
+
+\section{Creating a filehash database}
+
+Databases can be created with \pkg{filehash} using the \code{dbCreate}
+function.  The one required argument is the name of the database,
+which we call here ``mydb''.  
+<<create>>=
+library(filehash)
+dbCreate("mydb")
+db <- dbInit("mydb")
+@ 
+You can also specify the \code{type} argument which controls how the
+database is represented on the backend.  We will discuss the different
+backends in further detail later.  For now, we use the default backend
+which is called ``DB1''.  
+
+Once the database is created, it must be initialized in order to be
+accessed.  The \code{dbInit} function returns an S4 object inheriting
+from class ``filehash''.  Since this is a newly created database,
+there are no objects in it.
+
+\section{Accessing a filehash database}
+
+<<setseed1,results=hide,echo=false>>=
+set.seed(100)
+@ 
+
+The primary interface to filehash databases consists of the functions
+\code{dbFetch}, \code{dbInsert}, \code{dbExists}, \code{dbList}, and
+\code{dbDelete}.  These functions are all generic---specific methods
+exist for each type of database backend.  They all take as their
+first argument an object of class ``filehash''.  To insert some data
+into the database we can simply call \code{dbInsert}
+<<insert>>=
+dbInsert(db, "a", rnorm(100))
+@ 
+Here we have associated with the key ``a'' 100 standard normal random
+variates.  We can retrieve those values with \code{dbFetch}.
+<<fetch>>=
+value <- dbFetch(db, "a")
+mean(value)
+@ 
+
+The function \code{dbList} lists all of the keys that are available in
+the database, \code{dbExists} tests to see if a given key is in the
+database, and \code{dbDelete} deletes a key-value pair from the
+database
+<<delete>>=
+dbInsert(db, "b", 123)
+dbDelete(db, "a")
+dbList(db)
+dbExists(db, "a")
+@ 
+
+While using functions like \code{dbInsert} and \code{dbFetch} is
+straightforward, it can often be easier on the fingers to use standard
+R subset and accessor functions like \code{\$}, \code{[[}, and
+\code{[}. Filehash databases have methods for these functions so that
+objects can be accessed in a more compact manner. Similarly,
+replacement methods for these functions are also available. The
+\verb+[+ function can be used to access multiple objects from the
+database, in which case a list is returned.
+
+<<accessors>>=
+db$a <- rnorm(100, 1)
+mean(db$a)
+mean(db[["a"]])
+db$b <- rnorm(100, 2)
+dbList(db)
+@ 
+For all of the accessor functions, only character indices are allowed.
+Numeric indices are caught and an error is given.
+<<characteronly>>=
+e <- local({
+    err <- function(e) e
+    tryCatch(db[[1]], error = err)
+})
+conditionMessage(e)
+@ 
+Finally, there is a method for the \code{with} generic function which
+operates much like using \code{with} on lists or environments.  
+
+The following three statements all return the same value.
+<<with>>=
+with(db, c(a = mean(a), b = mean(b)))
+@ 
+When using \code{with}, the values of ``a'' and ``b'' are looked up in
+the database.
+<<sapply>>=
+sapply(db[c("a", "b")], mean)
+@ 
+Here, using \code{[} on \code{db} returns a list with the values
+associated with ``a'' and ``b''.  Then \code{sapply} is applied in the
+usual way on the returned list.
+<<lapply>>=
+unlist(lapply(db, mean))
+@ 
+In the last statement we call \code{lapply} directly on the
+``filehash'' object.  The \pkg{filehash} package defines a method for
+\code{lapply} that allows the user to apply a function on all the
+elements of a database directly.  The method essentially loops through
+all the keys in the database, loads each object separately and applies
+the supplied function to each object.  \code{lapply} returns a named
+list with each element being the result of applying the supplied
+function to an object in the database.  There is an argument
+\code{keep.names} to the \code{lapply} method which, if set to
+\code{FALSE}, will drop all the names from the list.
+
+<<cleanupMyDB,results=hide,echo=false>>=
+dbUnlink(db)
+rm(list = ls(all = TRUE))
+@ 
+
+\section{Loading filehash databases}
+
+<<setseed2,results=hide,echo=false>>=
+set.seed(200)
+@ 
+
+An alternative way of working with a filehash database is to load it
+into an environment and access the element names directly, without
+having to use any of the accessor functions.  The \pkg{filehash}
+function \code{dbLoad} works much like the standard R \code{load}
+function except that \code{dbLoad} loads active bindings into a given
+environment rather than the actual data.  The active bindings are
+created via the \code{makeActiveBinding} function in the \pkg{base}
+package.  \code{dbLoad} takes a filehash database and creates symbols
+in an environment corresponding to the keys in the database.  It then
+calls \code{makeActiveBinding} to associate with each key a function
+which loads the data associated with a given key.  Conceptually,
+active bindings are like pointers to the database.  After calling
+\code{dbLoad}, anytime an object with an active binding is accessed
+the associated function (installed by \code{makeActiveBinding}) loads
+the data from the database.
+
+We can create a simple database to demonstrate the active binding
+mechanism.
+<<testDB>>=
+dbCreate("testDB")
+db <- dbInit("testDB")
+db$x <- rnorm(100)
+db$y <- runif(100)
+db$a <- letters
+dbLoad(db)
+ls()
+@ 
+Notice that we appear to have some additional objects in our
+workspace.  However, the values of these objects are not stored in
+memory---they are stored in the database.  When one of the objects is
+accessed, the value is automatically loaded from the database.
+<<accessbinding>>=
+mean(y)
+sort(a)
+@ 
+If I assign a different value to one of these objects, its
+associated value is updated in the database via the active binding
+mechanism.
+<<assignvalue>>=
+y <- rnorm(100, 2)
+mean(y)
+@ 
+If I subsequently clear the workspace and reload the database later,
+the updated value for ``y'' persists.
+<<removeandload>>=
+rm(list = ls())
+db <- dbInit("testDB")
+dbLoad(db)
+ls()
+mean(y)
+@ 
+
+Perhaps one disadvantage of the active binding approach taken here is
+that whenever an object is accessed, the data must be reloaded into R.
+This behavior is distinctly different from the delayed assignment
+approach taken in \pkg{g.data} where an object must only be loaded
+once and then is subsequently in memory.  However, when using delayed
+assignments, if one cycles through all of the objects in the database,
+one could eventually exhaust the available memory.
+
+<<cleanupTestDB,results=hide,echo=false>>=
+dbUnlink(db)
+rm(list = ls(all = TRUE))
+@ 
+
+\section{Other filehash utilities}
+
+There are a few other utilities included with the \pkg{filehash}
+package.  Two of the utilities, \code{dumpObjects} and
+\code{dumpImage}, are analogues of \code{save} and \code{save.image}.
+Rather than save objects to an R workspace, \code{dumpObjects} saves
+the given objects to a ``filehash'' database so that in the future,
+individual objects can be reloaded if desired.  Similarly,
+\code{dumpImage} saves the entire workspace to a ``filehash''
+database.
+
+The function \code{dumpList} takes a list and creates a ``filehash''
+database with values from the list.  The list must have a non-empty
+name for every element in order for \code{dumpList} to succeed.
+\code{dumpDF} creates a ``filehash'' database from a data frame where
+each column of the data frame is an element in the database.
+Essentially, \code{dumpDF} converts the data frame to a list and calls
+\code{dumpList}.
+
+
+\section{Filehash database backends}
+
+Currently, the \pkg{filehash} package can represent databases in two 
+different formats.  The default format is called ``DB1'' and it stores
+the keys and values in a single file.  From experience, this format
+works well overall but can be a little slow to initialize when there
+are many thousands of keys.  Briefly, the ``filehash'' object in R
+stores a map which associates keys with a byte location in the
+database file where the corresponding value is stored.  Given the byte
+location, we can \code{seek} to that location in the file and read the
+data directly.  Before reading in the data, a check is made to make
+sure that the map is up to date.  This format depends critically on
+having a working \code{ftell} at the system level and a crude check is
+made when trying to initialize a database of this format.
+
+The second format is called ``RDS'' and it stores objects as separate
+files on the disk in a directory with the same name as the database.
+This format is the most straightforward and simple of the available
+formats.  When a request is made for a specific key, \pkg{filehash}
+finds the appropriate file in the directory and reads the file into R.
+The only catch is that on operating systems that use case-insensitive
+file names, objects whose names differ only in case will collide on
+the filesystem.  To work around this, object names with capital letters
+are stored with mangled names on the disk.  An advantage of this
+format is that most of the organizational work is delegated to the
+filesystem.
+
+
+\section{Extending filehash}
+
+The \pkg{filehash} package has a mechanism for developing new backend
+formats, should the need arise.  The function \code{registerFormatDB}
+can be used to make \pkg{filehash} aware of a new database format that
+may be implemented in a separate R package or a file.
+\code{registerFormatDB} takes two arguments: a \code{name} for the new
+format (like ``DB1'' or ``RDS'') and a list of functions.  The list
+should contain two functions: one function named ``create'' for
+creating a database, given the database name, and another function
+named ``initialize'' for initializing the database.  In addition, one
+needs to define methods for \code{dbInsert}, \code{dbFetch}, etc.
+
+A list of available backend formats can be obtained via the
+\code{filehashFormats} function.  Upon registering a new backend
+format, the new format will be listed when \code{filehashFormats} is
+called.
+
+The interface for registering new backend formats is still
+experimental and could change in the future.
+
+
+\section{Discussion}
+
+The \pkg{filehash} package has been designed to be useful in both a
+programming setting and an interactive setting.  Its main purpose is
+to allow for simpler interaction with large datasets where
+simultaneous access to the full dataset is not needed.  While the
+package may not be optimal for all settings, one goal was to write a
+simple package in pure R that users could install with minimal
+overhead.  In the future I hope to add functionality for interacting
+with databases stored on remote computers and perhaps incorporate a
+``real'' database backend.  Some work has already begun on developing
+a backend based on the \pkg{RSQLite} package.
+
+
+
+\bibliographystyle{alpha}
+\bibliography{combined}
+
+
+\end{document}
+
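
The vignette's description of the "DB1" format (a map from keys to byte locations in a single file, read back with seek) can be illustrated with a minimal base R sketch. This is not the package's implementation: the names path, index, insert and fetch are hypothetical, and the real on-disk format may differ in every detail.

```r
## Hypothetical single-file key-value store; NOT filehash's actual code.
path <- tempfile(fileext = ".db")
file.create(path)
index <- list()   # in-memory map: key -> c(offset, length)

insert <- function(key, value) {
    bytes <- serialize(value, connection = NULL)
    offset <- file.size(path)          # records are appended at the end
    con <- file(path, open = "ab")
    on.exit(close(con))
    writeBin(bytes, con)
    index[[key]] <<- c(offset = offset, length = length(bytes))
    invisible(TRUE)
}

fetch <- function(key) {
    loc <- index[[key]]
    if (is.null(loc)) stop("unknown key: ", key)
    con <- file(path, open = "rb")
    on.exit(close(con))
    seek(con, where = loc[["offset"]])  # jump straight to the record
    unserialize(readBin(con, what = "raw", n = loc[["length"]]))
}

insert("a", 1:10)
insert("b", letters)
insert("a", c(2.5, 3.5))   # appended; the map now points at the new record
fetch("b")
fetch("a")
```

In this sketch, re-inserting a key appends a new record and repoints the map, so the older record stays in the file as unused space.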
diff --git a/inst/doc/filehash.pdf b/inst/doc/filehash.pdf
new file mode 100644
index 0000000..952abff
Binary files /dev/null and b/inst/doc/filehash.pdf differ
diff --git a/man/createQ.Rd b/man/createQ.Rd
new file mode 100644
index 0000000..bfd6d6d
--- /dev/null
+++ b/man/createQ.Rd
@@ -0,0 +1,31 @@
+\name{createQ}
+\alias{createQ}
+\alias{initQ}
+
+%- Also NEED an '\alias' for EACH other topic documented here.
+\title{Create/Initialize Queue}
+\description{
+  Create or initialize a queue data structure using \code{filehash}
+  databases
+}
+\usage{
+createQ(filename)
+initQ(filename)
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+  \item{filename}{character, file name for storing the queue data
+    structure}
+}
+\details{
+  A new queue can be created using \code{createQ}, which creates a file
+  for storing the queue information and returns an object of class
+  \code{"queue"}.
+}
+\value{
+  The \code{createQ} and \code{initQ} functions both return an object of
+  class \code{"queue"}.
+}
+\author{Roger D. Peng \email{rpeng at jhsph.edu}}
+
+\keyword{database}
diff --git a/man/createS.Rd b/man/createS.Rd
new file mode 100644
index 0000000..bdf5d54
--- /dev/null
+++ b/man/createS.Rd
@@ -0,0 +1,31 @@
+\name{createS}
+\alias{createS}
+\alias{initS}
+
+%- Also NEED an '\alias' for EACH other topic documented here.
+\title{Create/Initialize Stack}
+\description{
+  Create or initialize a stack data structure using \code{filehash}
+  databases
+}
+\usage{
+createS(filename)
+initS(filename)
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+  \item{filename}{character, file name for storing the stack data
+    structure}
+}
+\details{
+  A new stack can be created using \code{createS}, which creates a file
+  for storing the stack information and returns an object of class
+  \code{"stack"}.
+}
+\value{
+  The \code{createS} and \code{initS} functions both return an object of
+  class \code{"stack"}.
+}
+\author{Roger D. Peng \email{rpeng at jhsph.edu}}
+
+\keyword{database}
diff --git a/man/db2env.Rd b/man/db2env.Rd
new file mode 100644
index 0000000..45bfb9e
--- /dev/null
+++ b/man/db2env.Rd
@@ -0,0 +1,96 @@
+\name{dbLoad}
+\alias{dbLoad}
+\alias{dbLoad,filehash-method}
+\alias{dbLazyLoad}
+\alias{dbLazyLoad,filehash-method}
+\alias{db2env}
+
+\title{Load database into environment}
+\description{
+  Load entire database into an environment
+}
+\usage{
+db2env(db)
+dbLoad(db, ...)
+dbLazyLoad(db, ...)
+
+\S4method{dbLoad}{filehash}(db, env = parent.frame(2), keys = NULL, ...)
+\S4method{dbLazyLoad}{filehash}(db, env = parent.frame(2), keys = NULL, ...)
+}
+\arguments{
+  \item{db}{database object}
+  \item{env}{an environment}
+  \item{keys}{character vector of database keys to load}
+  \item{...}{other arguments passed to methods}
+}
+\details{
+  \code{db2env} loads the entire database \code{db} into an environment
+  via calls to \code{makeActiveBinding}.  Therefore, the data themselves
+  are not stored in the environment, but a function pointing to the data
+  in the database is stored.  When an element of the environment is
+  accessed, the function is called to retrieve the data from the
+  database.  If the data in the database is changed, the changes will be
+  reflected in the environment.
+
+  \code{dbLoad} loads objects in the database directly into the
+  environment specified, like \code{load} does except with active bindings.
+  \code{dbLoad} takes a second argument \code{env}, which is an
+  environment, and the default for \code{env} is \code{parent.frame()}. 
+
+  The use of \code{makeActiveBinding} in \code{db2env} and \code{dbLoad}
+  allows for potentially large databases to, at least conceptually, be
+  used in R, as long as you don't need simultaneous access to all of the
+  elements in the database.
+
+  With \code{dbLazyLoad} database objects are
+  "lazy-loaded" into the environment.  Promises to load the
+  objects are created in the environment specified by \code{env}.  Upon
+  first access, those objects are copied into the environment and will
+  from then on reside in memory.  Changes to the database will not be
+  reflected in the object residing in the environment after first
+  access.  Conversely, changes to the object in the environment will not
+  be reflected in the database.  This type of loading is useful for
+  read-only databases.
+}
+
+\value{
+  For \code{db2env}, an environment is returned, the elements of which
+  are the keys of the database.  For \code{dbLoad} and \code{dbLazyLoad}, a character vector
+  is returned (invisibly) containing the keys associated with the values
+  loaded into the environment.
+}
+
+\author{Roger D. Peng}
+
+\seealso{
+  \code{\link{dbInit}} and \code{\link{filehash-class}}
+}
+
+\examples{
+dbCreate("myDB")
+db <- dbInit("myDB")
+dbInsert(db, "a", rnorm(100))
+dbInsert(db, "b", 1:10)
+
+env <- db2env(db)
+ls(env)  ## "a", "b"
+print(env$b)
+mean(env$a)
+env$a <- rnorm(100)
+mean(env$a)
+
+env$b[1:5] <- 5:1
+print(env$b)
+
+env <- new.env()
+dbLoad(db, env)
+ls(env)
+
+env <- new.env()
+dbLazyLoad(db, env)
+ls(env)
+
+as(db, "list")
+}
+
+\keyword{database}
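
The contrast that db2env.Rd draws between dbLoad (active bindings) and dbLazyLoad (promises) can be sketched in base R alone. Only makeActiveBinding and delayedAssign are real base functions here. The names store, reads and read_from_store are hypothetical stand-ins for a database, and the package's own binding functions may look different.

```r
## Stand-in for a database read; counts how often the "disk" is touched.
reads <- 0
store <- list(y = c(1, 2, 3))
read_from_store <- function(key) {
    reads <<- reads + 1
    store[[key]]
}

env <- new.env()

## Active binding: the function runs on every access and on assignment.
makeActiveBinding("y_active", function(value) {
    if (missing(value)) {
        read_from_store("y")
    } else {
        store[["y"]] <<- value   # write through to the backing store
        invisible(value)
    }
}, env)

## Promise: evaluated once on first access, then the value stays in memory.
delayedAssign("y_lazy", read_from_store("y"), assign.env = env)

mean(env$y_active)
mean(env$y_active)          # second access reads the store again
mean(env$y_lazy)
mean(env$y_lazy)            # second access reuses the in-memory value
reads

env$y_active <- c(10, 20)   # assignment updates the store
store$y
env$y_lazy                  # unchanged: the promise was already forced
```

The counter shows the trade-off discussed in the vignette: the active binding re-reads on every access but never holds data in the environment, while the promise reads once and then keeps the value in memory.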
diff --git a/man/dbInit.Rd b/man/dbInit.Rd
new file mode 100644
index 0000000..b56eda0
--- /dev/null
+++ b/man/dbInit.Rd
@@ -0,0 +1,64 @@
+\name{dbInit}
+\alias{dbInit}
+\alias{dbInitialize}
+\alias{dbCreate}
+\alias{dbCreate,ANY-method}
+\alias{dbInit,ANY-method}
+\alias{dbReconnect}
+\alias{dbReconnect,filehashDB1-method}
+
+%\alias{dbInitialize}
+
+\title{Simple file-based hash table}
+\description{
+  Interface for creating and initializing a simple file-based hash table
+}
+\usage{
+dbCreate(db, ...)
+dbInit(db, ...)
+dbReconnect(db, ...)
+
+\S4method{dbCreate}{ANY}(db, type = NULL, ...)
+\S4method{dbInit}{ANY}(db, type = NULL, ...)
+\S4method{dbReconnect}{filehashDB1}(db, ...)
+}
+
+\arguments{
+  \item{db}{name of database or a database object}
+  \item{type}{type of database format.  If missing, the default type
+    will be used}
+  \item{...}{other arguments passed to methods}
+}
+
+\details{
+  \code{dbCreate} creates the necessary files or directory for the
+  database.  If those files already exist nothing is done.
+
+  \code{dbInit} takes a database name and returns an object
+  inheriting from class \code{"filehash"}.
+
+  The \code{type} argument specifies the format in which the database
+  should be stored on the disk.  If not specified, the default
+  type will be used (as specified by \code{filehashOption}).  
+}
+
+\note{
+  The function \code{dbInitialize} has been deprecated.  Use
+  \code{dbInit} instead.
+}
+
+\value{
+  \code{dbCreate} returns \code{TRUE} upon success and \code{FALSE} in
+  the event of an error.  \code{dbInit} returns an object
+  inheriting from class \code{"filehash"}
+}
+
+\author{Roger D. Peng}
+
+\seealso{
+  See \code{\link{filehash-class}} for more information and examples and
+  \code{\link{filehashOption}} for setting the default database type.
+}
+
+\keyword{database}% at least one, from doc/KEYWORDS
+
diff --git a/man/dump.Rd b/man/dump.Rd
new file mode 100644
index 0000000..a34849e
--- /dev/null
+++ b/man/dump.Rd
@@ -0,0 +1,65 @@
+\name{dumpObjects}
+\alias{dumpObjects}
+\alias{dumpImage}
+\alias{dumpDF}
+\alias{dumpList}
+\alias{dumpEnv}
+
+\title{Dump objects of database}
+\description{
+  Dump R objects to a filehash database
+}
+\usage{
+dumpObjects(..., list = character(0), dbName, type = NULL, envir = parent.frame())
+dumpImage(dbName = "Rworkspace", type = NULL)
+dumpDF(data, dbName = NULL, type = NULL)
+dumpList(data, dbName = NULL, type = NULL)
+dumpEnv(env, dbName)
+}
+
+\arguments{
+  \item{\dots}{R objects to dump}
+  \item{list}{character vector of names of objects to dump}
+  \item{dbName}{character, name of database to which objects should be
+    dumped}
+  \item{type}{type of database to create}
+  \item{envir}{environment from which to obtain objects}
+  \item{data}{a data frame or a list}
+  \item{env}{an environment}
+}
+\details{
+  Objects dumped to a database can later be loaded via \code{dbLoad} or
+  can be accessed with \code{dbFetch}, \code{dbList}, etc.
+  Alternatively, the \code{with} method can be used to evaluate code in
+  the context of a database.  If a database with name \code{dbName}
+  already exists, objects will be inserted into the existing database
+  (and values for already-existing keys will be overwritten).
+
+  \code{dumpDF} is different in that each variable in the data frame is
+  stored as a separate object in the database.  So each variable can be
+  read from the database separately rather than having to load the
+  entire data frame into memory.  \code{dumpList} works in a similar
+  way.
+
+  The \code{dumpEnv} function takes an environment and stores each
+  element of the environment in a \code{filehash} database.
+}
+
+\value{
+  An object of class \code{"filehash"} is returned and a database is
+  created.
+}
+
+\author{Roger D. Peng}
+
+\examples{
+data <- data.frame(y = rnorm(100), x = rnorm(100), z = rnorm(100))
+db <- dumpDF(data, dbName = "dataframe.dump")
+fit <- with(db, lm(y ~ x + z))
+summary(fit)
+
+db <- dumpList(list(a = 1, b = 2, c = 3), "list.dump")
+db$a
+}
+\keyword{database}% at least one, from doc/KEYWORDS
+
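
The per-variable storage that dump.Rd describes for dumpDF (each column kept as a separate object, so that a model can read only the variables it needs) can be imitated in base R. This is a sketch under assumptions: the directory layout and the helper read_col are hypothetical and do not reflect how filehash lays out its files.

```r
## Hypothetical per-column layout in a temporary directory; NOT filehash's format.
data <- data.frame(y = c(1, 2, 4), x = c(0, 1, 2), z = c(5, 3, 1))
dir <- tempfile("dfdump")
dir.create(dir)

## Write each variable to its own file, keyed by column name.
for (nm in names(data)) {
    saveRDS(data[[nm]], file.path(dir, nm))
}

## Later: read back only the columns a model needs ("z" is never loaded).
read_col <- function(nm) readRDS(file.path(dir, nm))
needed <- c("y", "x")
cols <- lapply(setNames(needed, needed), read_col)
fit <- lm(y ~ x, data = cols)
coef(fit)
```

Because lm accepts a list as its data argument, the two columns can be passed directly without rebuilding the full data frame.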
diff --git a/man/filehash-class.Rd b/man/filehash-class.Rd
new file mode 100644
index 0000000..4b75d63
--- /dev/null
+++ b/man/filehash-class.Rd
@@ -0,0 +1,150 @@
+\name{filehash-class}
+\docType{class}
+\alias{filehash-class}
+\alias{filehashDB-class}
+\alias{filehashRDS-class}
+\alias{filehashDB1-class}
+\alias{dbFetch}
+\alias{dbMultiFetch}
+\alias{dbInsert}
+\alias{dbExists}
+\alias{dbList}
+\alias{dbDelete}
+\alias{dbReorganize}
+\alias{dbUnlink}
+\alias{dbDelete,filehashDB,character-method}
+\alias{dbExists,filehashDB,character-method}
+\alias{dbFetch,filehashDB,character-method}
+\alias{dbInsert,filehashDB,character-method}
+\alias{dbList,filehashDB-method}
+\alias{dbUnlink,filehashDB-method}
+\alias{dbReorganize,filehashDB-method}
+\alias{dbMultiFetch,filehashDB1-method}
+\alias{dbDelete,filehashDB1,character-method}
+\alias{dbExists,filehashDB1,character-method}
+\alias{dbFetch,filehashDB1,character-method}
+\alias{dbMultiFetch,filehashDB1,character-method}
+\alias{dbInsert,filehashDB1,character-method}
+\alias{dbList,filehashDB1-method}
+\alias{dbUnlink,filehashDB1-method}
+\alias{dbReorganize,filehashDB1-method}
+\alias{dbDelete,filehashRDS,character-method}
+\alias{dbExists,filehashRDS,character-method}
+\alias{dbFetch,filehashRDS,character-method}
+\alias{dbMultiFetch,filehashRDS,character-method}
+\alias{dbInsert,filehashRDS,character-method}
+\alias{dbList,filehashRDS-method}
+\alias{dbUnlink,filehashRDS-method}
+\alias{show,filehash-method}
+\alias{with,filehash-method}
+\alias{coerce,filehashDB,filehashRDS-method}
+\alias{coerce,filehashRDS,filehashDB-method}
+\alias{coerce,filehashDB1,filehashRDS-method}
+\alias{coerce,filehashDB1,list-method}
+\alias{coerce,filehashDB,filehashDB1-method}
+\alias{coerce,filehash,list-method}
+\alias{lapply,filehash-method}
+\alias{names,filehash-method}
+\alias{length,filehash-method}
+
+\alias{[,filehash,character,missing,missing-method}
+\alias{[[,filehash,character,missing-method}
+\alias{[[,filehash,numeric,missing-method}
+\alias{[[<-,filehash,character,missing-method}
+\alias{[[<-,filehash,numeric,missing-method}
+\alias{$<-,filehash-method}
+\alias{$,filehash-method}
+
+\title{Class "filehash"}
+
+\description{
+  These functions form the interface for a simple file-based key-value
+  database (i.e. hash table).
+}
+
+\section{Objects from the Class}{
+  Objects can be created by calls of the form \code{new("filehash", ...)}.
+}
+
+\section{Slots}{
+  \describe{
+    \item{\code{name}:}{Object of class \code{"character"}, name of the
+      database.}
+  }
+}
+
+\section{Additional slots for "filehashDB1"}{
+  \describe{
+    \item{\code{datafile}:}{full path to the database file.}
+    \item{\code{meta}:}{list containing an environment for database
+      metadata.}
+  }
+}
+
+\section{Additional slots for "filehashRDS"}{
+  \describe{
+    \item{dir:}{Directory where files are stored.}
+  }
+}
+
+\section{Methods}{
+  \describe{
+    \item{dbDelete}{The \code{dbDelete} function is for deleting
+      elements, but for the \code{"DB1"} format all it does is remove the
+      key from the lookup table. 
+      The actual data are still in the database (but inaccessible).  If
+      you reinsert data for the same key, the new data are simply
+      appended on to the end of the file.  Therefore, it's possible to
+      have multiple copies of data lying around after a while,
+      potentially making the database file big.  The \code{"RDS"} format
+      does not have this problem.}
+    \item{dbExists}{check to see if a key exists.}
+    \item{dbFetch}{retrieve the value associated with a given key.}
+    \item{dbMultiFetch}{retrieve values associated with multiple keys (a
+      list of those values is returned).}
+    \item{dbInsert}{insert a key-value pair into the database.  If
+      that key already exists, its associated value is overwritten. For
+      \code{"RDS"} type databases, there is a \code{safe} option
+      (defaults to \code{TRUE}) which allows the user to insert objects
+      somewhat more safely (objects should not be lost in the event of
+      an interrupt).}
+    \item{dbList}{list all keys in the database.}
+    \item{dbReorganize}{The \code{dbReorganize} function is there for
+      the purpose of rewriting the database to remove all of the stale
+      entries.  Basically, this function creates a new copy of the
+      database and then overwrites the old copy.  This function has not
+      been tested extensively and so should be considered
+      \emph{experimental}.  \code{dbReorganize} is not needed when using
+      the \code{"RDS"} format.}
+    \item{dbUnlink}{delete an entire database from the disk}
+    \item{show}{print method}
+    \item{with}{allows \code{with} to be used with \code{"filehash"}
+      objects much like it can be used with lists or data frames}
+    \item{[[,[[<-}{elements of a database can be accessed using the \code{[[}
+      operator much like a list or environment, but only character
+      indices are allowed}
+    \item{$,$<-}{elements of a database can be accessed using the \code{$}
+      operator much like with a list or environment}
+    \item{lapply}{works much like \code{lapply} with lists; a list is
+      returned.}
+    \item{names}{returns all of the keys in the database}
+    \item{length}{returns the number of elements in the database}
+  }
+}
+
+\author{Roger D. Peng \email{rpeng at jhsph.edu}}
+
+\examples{
+dbCreate("myDB")  ## Create database 'myDB'
+db <- dbInit("myDB")
+dbInsert(db, "a", 1:10)
+dbInsert(db, "b", rnorm(1000))
+dbExists(db, "b")  ## 'TRUE'
+
+dbList(db)  ## c("a", "b")
+dbDelete(db, "a")
+dbList(db) ## "b"
+
+with(db, mean(b))
+}
+\keyword{classes}
diff --git a/man/filehashFormats.Rd b/man/filehashFormats.Rd
new file mode 100644
index 0000000..6dded7d
--- /dev/null
+++ b/man/filehashFormats.Rd
@@ -0,0 +1,30 @@
+\name{filehashFormats}
+\alias{filehashFormats}
+\alias{registerFormatDB}
+
+\title{List and register filehash formats}
+\description{
+  List and register filehash backend database formats.
+}
+\usage{
+registerFormatDB(name, funlist)
+filehashFormats(...)
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+  \item{name}{character, name of database format}
+  \item{funlist}{list of functions for creating and initializing a
+    database format}
+  \item{\dots}{list of functions for registering a new database format}
+}
+\details{
+  \code{registerFormatDB} can be used to register new filehash backend
+  database formats.  \code{filehashFormats} called with no arguments
+  lists information on available formats.
+}
+\value{
+  \code{filehashFormats} returns a list containing information on the
+  available filehash formats.
+}
+
+\keyword{utilities}% at least one, from doc/KEYWORDS
diff --git a/man/filehashOption.Rd b/man/filehashOption.Rd
new file mode 100644
index 0000000..77542ed
--- /dev/null
+++ b/man/filehashOption.Rd
@@ -0,0 +1,27 @@
+\name{filehashOption}
+\alias{filehashOption}
+
+\title{Set filehash options}
+\description{
+  Set global filehash options
+}
+\usage{
+filehashOption(...)
+}
+
+\arguments{
+  \item{\dots}{name-value pairs for options}
+}
+\details{
+  Currently, the only option that can be set is the default database
+  type (\code{defaultType}) which can be "DB1", "RDS" or "DB". 
+}
+\value{
+  \code{filehashOption} returns a list of current settings for all
+  options.
+}
+
+\author{Roger D. Peng}
+
+\keyword{database}% at least one, from doc/KEYWORDS
+
diff --git a/man/push.Rd b/man/push.Rd
new file mode 100644
index 0000000..b0b8fec
--- /dev/null
+++ b/man/push.Rd
@@ -0,0 +1,43 @@
+\name{stackqueue}
+\alias{stackqueue}
+\alias{push}
+\alias{pop}
+\alias{mpush}
+\alias{top}
+\alias{isEmpty}
+
+%- Also NEED an '\alias' for EACH other topic documented here.
+\title{Operations on Stacks/Queues}
+\description{
+  Functions for interacting with stack and queue data structures
+  implemented using \code{filehash} databases.
+}
+\usage{
+push(db, val, ...)
+mpush(db, vals, ...)
+pop(db, ...)
+top(db, ...)
+isEmpty(db, ...)
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+  \item{db}{an object of class \code{"stack"} or \code{"queue"}}
+  \item{val}{an R object}
+  \item{vals}{a list of R objects}
+  \item{\dots}{arguments passed to other methods}
+}
+\details{
+  Note that for \code{mpush}, if \code{vals} is not a list it will be
+  coerced to a list via \code{as.list}.  Currently, \code{mpush} is only
+  implemented for \code{"stack"}s.
+}
+\value{
+  \code{push} and \code{mpush} return nothing useful; \code{pop} returns
+  a value from the stack/queue and deletes that value from the
+  stack/queue; \code{top} returns the "top" value from the stack/queue;
+  \code{isEmpty} returns \code{TRUE}/\code{FALSE} depending on whether
+  the stack/queue is empty or not.  Both \code{pop} and \code{top}
+  signal an error if the stack/queue is empty.
+}
+\author{Roger D. Peng \email{rpeng at jhsph.edu}}
+\keyword{database}% __ONLY ONE__ keyword per line
diff --git a/man/queue-class.Rd b/man/queue-class.Rd
new file mode 100644
index 0000000..09f3204
--- /dev/null
+++ b/man/queue-class.Rd
@@ -0,0 +1,47 @@
+\name{queue-class}
+\docType{class}
+\alias{queue-class}
+\alias{isEmpty,queue-method}
+\alias{pop,queue-method}
+\alias{push,queue-method}
+\alias{show,queue-method}
+\alias{top,queue-method}
+
+\title{Class "queue"}
+\description{A queue implementation using a \code{filehash} database}
+\section{Objects from the Class}{
+Objects can be created by calls of the form \code{new("queue", ...)} or
+by calling \code{createQ}.  Existing queues can be initialized with
+\code{initQ}.
+}
+\section{Slots}{
+	 \describe{
+    \item{\code{queue}:}{Object of class \code{"filehashDB1"}}
+    \item{\code{name}:}{Object of class \code{"character"}: the name of
+      the queue (default is the file name in which the queue data are
+      stored)}
+  }
+}
+\section{Methods}{
+  \describe{
+    \item{isEmpty}{\code{signature(db = "queue")}: returns
+      \code{TRUE}/\code{FALSE} depending on whether there are elements
+      in the queue.}
+    \item{pop}{\code{signature(db = "queue")}: returns the value of the
+      "top" (i.e. head) of the queue and subsequently removes that
+      element from the queue; an error is signaled if the queue is empty}
+    \item{push}{\code{signature(db = "queue")}: adds an element to the
+      tail ("bottom") of the queue}
+    \item{show}{\code{signature(object = "queue")}: prints the name of
+      the queue}
+    \item{top}{\code{signature(db = "queue")}: returns the value of the
+      "top" (i.e. head) of the queue; an error is signaled if the queue
+      is empty}
+  }
+}
+\author{Roger D. Peng \email{rpeng at jhsph.edu}}
+
+\examples{
+showClass("queue")
+}
+\keyword{classes}
diff --git a/man/stack-class.Rd b/man/stack-class.Rd
new file mode 100644
index 0000000..8fbddb8
--- /dev/null
+++ b/man/stack-class.Rd
@@ -0,0 +1,50 @@
+\name{stack-class}
+\docType{class}
+\alias{stack-class}
+\alias{isEmpty,stack-method}
+\alias{mpush,stack-method}
+\alias{pop,stack-method}
+\alias{push,stack-method}
+\alias{show,stack-method}
+\alias{top,stack-method}
+
+\title{Class "stack"}
+\description{A stack implementation using a \code{filehash} database}
+\section{Objects from the Class}{
+Objects can be created by calls of the form \code{new("stack", ...)} or
+by calling \code{createS}.  Existing stacks can be initialized with
+\code{initS}.
+}
+\section{Slots}{
+  \describe{
+    \item{\code{stack}:}{Object of class \code{"filehashDB1"}}
+    \item{\code{name}:}{Object of class \code{"character"}: the name of
+      the stack (default is the file name in which the stack data are
+      stored)}
+  }
+}
+\section{Methods}{
+  \describe{
+    \item{isEmpty}{\code{signature(db = "stack")}: returns
+      \code{TRUE}/\code{FALSE} depending on whether there are elements
+      in the stack.}
+    \item{pop}{\code{signature(db = "stack")}: returns the value of the
+      top of the stack and subsequently removes that
+      element from the stack; an error is signaled if the stack is empty}
+    \item{push}{\code{signature(db = "stack")}: adds an element to the
+      top of the stack}
+    \item{show}{\code{signature(object = "stack")}: prints the name of
+      the stack}
+    \item{top}{\code{signature(db = "stack")}: returns the value of the
+      top of the stack; an error is signaled if the stack
+      is empty}
+    \item{mpush}{\code{signature(db = "stack")}: works like \code{push}
+      except it can push multiple objects in a list on to the stack}
+  }
+}
+\author{Roger D. Peng \email{rpeng at jhsph.edu}}
+
+\examples{
+showClass("stack")
+}
+\keyword{classes}
diff --git a/src/hash.c b/src/hash.c
new file mode 100644
index 0000000..7cdf4fb
--- /dev/null
+++ b/src/hash.c
@@ -0,0 +1,84 @@
+#include <R.h>
+#include <Rinternals.h>
+#include "sha1.h"
+
+/* 
+ * This code is adapted from the 'digest.c' code in the 'digest'
+ * package by Dirk Eddelbuettel <edd at debian.org> with contributions by
+ * Antoine Lucas, Jarek Tuszynski, Henrik Bengtsson and Simon Urbanek
+ */
+
+SEXP sha1_object(SEXP object, SEXP skip_bytes)
+{
+	char output[41];  /* SHA-1 hex digest is 40 chars + '\0' */
+	int i, skip;
+	SEXP result;
+	sha1_context ctx;
+	unsigned char buffer[20];
+	Rbyte *data;
+	int nChar = length(object);
+
+	PROTECT(object = coerceVector(object, RAWSXP));
+	data = RAW(object);
+	PROTECT(skip_bytes = coerceVector(skip_bytes, INTSXP));
+	skip = INTEGER(skip_bytes)[0];
+	
+	if(skip > 0) {
+		if(skip >= nChar)
+			nChar = 0;
+		else {
+			nChar -= skip;
+			data += skip;
+		}
+	}
+	sha1_starts(&ctx);
+	sha1_update(&ctx, (uint8 *) data, nChar);
+	sha1_finish(&ctx, buffer);
+
+	for(i=0; i < 20; i++)
+		sprintf(output + i * 2, "%02x", buffer[i]);
+
+	PROTECT(result = allocVector(STRSXP, 1));
+	SET_STRING_ELT(result, 0, mkChar(output));
+	UNPROTECT(3);
+
+	return result;
+}
+
+
+SEXP sha1_file(SEXP filename, SEXP skip_bytes)
+{
+	char output[41];  /* SHA-1 hex digest is 40 chars + '\0' */
+	int nChar, i, skip;
+	FILE *fp;
+	SEXP result;
+	sha1_context ctx;
+	unsigned char buf[1024];
+	unsigned char sha1sum[20];
+
+	PROTECT(skip_bytes = coerceVector(skip_bytes, INTSXP));
+	PROTECT(filename = coerceVector(filename, STRSXP));
+
+	skip = INTEGER(skip_bytes)[0];
+
+	if(!(fp = fopen(CHAR(STRING_ELT(filename, 0)), "rb"))) 
+		error("unable to open input file");
+	if (skip > 0) 
+		fseek(fp, skip, SEEK_SET);
+	sha1_starts(&ctx);
+
+	while((nChar = fread(buf, 1, sizeof(buf), fp)) > 0)
+		sha1_update(&ctx, buf, nChar);
+
+	fclose(fp);
+	sha1_finish(&ctx, sha1sum);
+	
+	for(i=0; i < 20; i++)
+		sprintf(output + i * 2, "%02x", sha1sum[i]);
+
+	PROTECT(result = allocVector(STRSXP, 1));
+	SET_STRING_ELT(result, 0, mkChar(output));
+	UNPROTECT(3);
+
+	return result;
+}
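
Both functions in src/hash.c end by expanding the 20-byte SHA-1 digest into a 40-character hex string. The following is a minimal standalone sketch of that one step, not code from the package: `digest_to_hex` is a hypothetical name, and it uses `snprintf` with an explicit size where the patch uses `sprintf`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of filehash): write the lowercase hex
 * form of a 20-byte digest into 'out', which must hold 41 bytes
 * (40 hex characters plus the terminating '\0'). */
void digest_to_hex(const unsigned char digest[20], char out[41])
{
    size_t i;

    assert(digest != NULL && out != NULL);

    /* Each byte becomes two hex characters; snprintf also writes a
     * '\0' after them, so the final iteration terminates the string. */
    for (i = 0; i < 20; i++)
        snprintf(out + i * 2, 3, "%02x", digest[i]);
}
```

The size argument of 3 covers the two hex characters and the terminator written on each pass, so the last write ends exactly at `out[40]`.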
diff --git a/src/lockfile.c b/src/lockfile.c
new file mode 100644
index 0000000..0420339
--- /dev/null
+++ b/src/lockfile.c
@@ -0,0 +1,21 @@
+#include <R.h>
+#include <Rinternals.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+SEXP lock_file(SEXP filename)
+{
+	int fd;
+	SEXP status;
+
+	if(!isString(filename))
+		error("'filename' should be character");
+	PROTECT(status = allocVector(INTSXP, 1));
+
+	fd = open(CHAR(STRING_ELT(filename, 0)),
+		  O_WRONLY | O_CREAT | O_EXCL, 0666);
+	INTEGER(status)[0] = fd;
+	close(fd);
+	UNPROTECT(1);
+	return status;
+}
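
`lock_file` above relies on `open()` with `O_CREAT | O_EXCL`, which fails if the file already exists, so the file's existence serves as the lock. As written it also calls `close(fd)` when `open()` has failed and `fd` is -1; that call simply fails, but a guard makes the intent clearer. The sketch below shows the same technique outside the R API. `try_lock` and `release_lock` are hypothetical names, not filehash functions, and the return convention (0 on success, -1 on failure) is an assumption made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to take the lock by creating 'path'
 * exclusively.  Returns 0 if the lock file was created, -1 if it
 * already exists or could not be created for another reason. */
int try_lock(const char *path)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd < 0)
        return -1;   /* lock already held, or some other open() error */

    close(fd);       /* only close a descriptor we actually obtained */
    return 0;
}

/* Hypothetical helper: release the lock by removing the file. */
int release_lock(const char *path)
{
    return unlink(path);
}
```

A caller would typically retry `try_lock` until it returns 0, do its work, and then call `release_lock` on the same path.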
diff --git a/src/readKeyMap.c b/src/readKeyMap.c
new file mode 100644
index 0000000..f312915
--- /dev/null
+++ b/src/readKeyMap.c
@@ -0,0 +1,65 @@
+#define NEED_CONNECTION_PSTREAMS
+
+#include <R.h>
+#include <Rinternals.h>
+
+SEXP read_key_map(SEXP filename, SEXP map, SEXP filesize, SEXP pos) 
+{
+	SEXP key, datalen;
+	FILE *fp;	
+	int status, len;
+	struct R_inpstream_st in;
+	
+	if(!isEnvironment(map))
+		error("'map' should be an environment");
+	if(!isString(filename))
+		error("'filename' should be character");
+
+	PROTECT(filesize = coerceVector(filesize, INTSXP));
+	PROTECT(pos = coerceVector(pos, INTSXP));
+
+	fp = fopen(CHAR(STRING_ELT(filename, 0)), "rb");
+
+	if(INTEGER(pos)[0] > 0) {
+		status = fseek(fp, INTEGER(pos)[0], SEEK_SET);
+
+		if(status < 0)
+			error("problem with initial file pointer seek");
+	}
+	
+	/* Initialize the incoming R file stream */
+	R_InitFileInPStream(&in, fp, R_pstream_any_format, NULL, NULL);
+
+	while(INTEGER(pos)[0] < INTEGER(filesize)[0]) {
+		PROTECT(key = R_Unserialize(&in));
+		PROTECT(datalen = R_Unserialize(&in));
+		len = INTEGER(datalen)[0];
+
+		/* calculate the position of file pointer */
+		INTEGER(pos)[0] = ftell(fp);
+
+		if(len <= 0) {
+			/* key has been deleted; set pos to NULL */
+			defineVar(install(CHAR(STRING_ELT(key, 0))),
+				  R_NilValue, map);
+			UNPROTECT(2);
+			continue;
+		}
+		/* create a new entry in the key map */
+		defineVar(install(CHAR(STRING_ELT(key, 0))), duplicate(pos), map);
+
+		/* advance to the next key */
+		status = fseek(fp, len, SEEK_CUR);
+
+		if(status < 0) {
+			fclose(fp);
+			error("problem with seek");
+		}
+		INTEGER(pos)[0] = INTEGER(pos)[0] + len;
+
+		UNPROTECT(2);
+	}
+	UNPROTECT(2);
+	fclose(fp);
+	return map;
+}
diff --git a/src/sha1.c b/src/sha1.c
new file mode 100644
index 0000000..082ef97
--- /dev/null
+++ b/src/sha1.c
@@ -0,0 +1,371 @@
+/*
+ * FIPS-180-1 compliant SHA-1 implementation,
+ * by Christophe Devine <devine at cr0.net>;
+ * this program is licensed under the GPL.
+ */
+
+#include <string.h>
+
+#include "sha1.h"
+
+#define GET_UINT32(n,b,i)                       \
+{                                               \
+    (n) = ( (uint32) (b)[(i)    ] << 24 )       \
+        | ( (uint32) (b)[(i) + 1] << 16 )       \
+        | ( (uint32) (b)[(i) + 2] <<  8 )       \
+        | ( (uint32) (b)[(i) + 3]       );      \
+}
+
+#define PUT_UINT32(n,b,i)                       \
+{                                               \
+    (b)[(i)    ] = (uint8) ( (n) >> 24 );       \
+    (b)[(i) + 1] = (uint8) ( (n) >> 16 );       \
+    (b)[(i) + 2] = (uint8) ( (n) >>  8 );       \
+    (b)[(i) + 3] = (uint8) ( (n)       );       \
+}
+
+void sha1_starts( sha1_context *ctx )
+{
+    ctx->total[0] = 0;
+    ctx->total[1] = 0;
+
+    ctx->state[0] = 0x67452301;
+    ctx->state[1] = 0xEFCDAB89;
+    ctx->state[2] = 0x98BADCFE;
+    ctx->state[3] = 0x10325476;
+    ctx->state[4] = 0xC3D2E1F0;
+}
+
+void sha1_process( sha1_context *ctx, uint8 data[64] )
+{
+    uint32 temp, W[16], A, B, C, D, E;
+
+    GET_UINT32( W[0],  data,  0 );
+    GET_UINT32( W[1],  data,  4 );
+    GET_UINT32( W[2],  data,  8 );
+    GET_UINT32( W[3],  data, 12 );
+    GET_UINT32( W[4],  data, 16 );
+    GET_UINT32( W[5],  data, 20 );
+    GET_UINT32( W[6],  data, 24 );
+    GET_UINT32( W[7],  data, 28 );
+    GET_UINT32( W[8],  data, 32 );
+    GET_UINT32( W[9],  data, 36 );
+    GET_UINT32( W[10], data, 40 );
+    GET_UINT32( W[11], data, 44 );
+    GET_UINT32( W[12], data, 48 );
+    GET_UINT32( W[13], data, 52 );
+    GET_UINT32( W[14], data, 56 );
+    GET_UINT32( W[15], data, 60 );
+
+#define S(x,n) ((x << n) | ((x & 0xFFFFFFFF) >> (32 - n)))
+
+#define R(t)                                            \
+(                                                       \
+    temp = W[(t -  3) & 0x0F] ^ W[(t - 8) & 0x0F] ^     \
+           W[(t - 14) & 0x0F] ^ W[ t      & 0x0F],      \
+    ( W[t & 0x0F] = S(temp,1) )                         \
+)
+
+#define P(a,b,c,d,e,x)                                  \
+{                                                       \
+    e += S(a,5) + F(b,c,d) + K + x; b = S(b,30);        \
+}
+
+    A = ctx->state[0];
+    B = ctx->state[1];
+    C = ctx->state[2];
+    D = ctx->state[3];
+    E = ctx->state[4];
+
+#define F(x,y,z) (z ^ (x & (y ^ z)))
+#define K 0x5A827999
+
+    P( A, B, C, D, E, W[0]  );
+    P( E, A, B, C, D, W[1]  );
+    P( D, E, A, B, C, W[2]  );
+    P( C, D, E, A, B, W[3]  );
+    P( B, C, D, E, A, W[4]  );
+    P( A, B, C, D, E, W[5]  );
+    P( E, A, B, C, D, W[6]  );
+    P( D, E, A, B, C, W[7]  );
+    P( C, D, E, A, B, W[8]  );
+    P( B, C, D, E, A, W[9]  );
+    P( A, B, C, D, E, W[10] );
+    P( E, A, B, C, D, W[11] );
+    P( D, E, A, B, C, W[12] );
+    P( C, D, E, A, B, W[13] );
+    P( B, C, D, E, A, W[14] );
+    P( A, B, C, D, E, W[15] );
+    P( E, A, B, C, D, R(16) );
+    P( D, E, A, B, C, R(17) );
+    P( C, D, E, A, B, R(18) );
+    P( B, C, D, E, A, R(19) );
+
+#undef K
+#undef F
+
+#define F(x,y,z) (x ^ y ^ z)
+#define K 0x6ED9EBA1
+
+    P( A, B, C, D, E, R(20) );
+    P( E, A, B, C, D, R(21) );
+    P( D, E, A, B, C, R(22) );
+    P( C, D, E, A, B, R(23) );
+    P( B, C, D, E, A, R(24) );
+    P( A, B, C, D, E, R(25) );
+    P( E, A, B, C, D, R(26) );
+    P( D, E, A, B, C, R(27) );
+    P( C, D, E, A, B, R(28) );
+    P( B, C, D, E, A, R(29) );
+    P( A, B, C, D, E, R(30) );
+    P( E, A, B, C, D, R(31) );
+    P( D, E, A, B, C, R(32) );
+    P( C, D, E, A, B, R(33) );
+    P( B, C, D, E, A, R(34) );
+    P( A, B, C, D, E, R(35) );
+    P( E, A, B, C, D, R(36) );
+    P( D, E, A, B, C, R(37) );
+    P( C, D, E, A, B, R(38) );
+    P( B, C, D, E, A, R(39) );
+
+#undef K
+#undef F
+
+#define F(x,y,z) ((x & y) | (z & (x | y)))
+#define K 0x8F1BBCDC
+
+    P( A, B, C, D, E, R(40) );
+    P( E, A, B, C, D, R(41) );
+    P( D, E, A, B, C, R(42) );
+    P( C, D, E, A, B, R(43) );
+    P( B, C, D, E, A, R(44) );
+    P( A, B, C, D, E, R(45) );
+    P( E, A, B, C, D, R(46) );
+    P( D, E, A, B, C, R(47) );
+    P( C, D, E, A, B, R(48) );
+    P( B, C, D, E, A, R(49) );
+    P( A, B, C, D, E, R(50) );
+    P( E, A, B, C, D, R(51) );
+    P( D, E, A, B, C, R(52) );
+    P( C, D, E, A, B, R(53) );
+    P( B, C, D, E, A, R(54) );
+    P( A, B, C, D, E, R(55) );
+    P( E, A, B, C, D, R(56) );
+    P( D, E, A, B, C, R(57) );
+    P( C, D, E, A, B, R(58) );
+    P( B, C, D, E, A, R(59) );
+
+#undef K
+#undef F
+
+#define F(x,y,z) (x ^ y ^ z)
+#define K 0xCA62C1D6
+
+    P( A, B, C, D, E, R(60) );
+    P( E, A, B, C, D, R(61) );
+    P( D, E, A, B, C, R(62) );
+    P( C, D, E, A, B, R(63) );
+    P( B, C, D, E, A, R(64) );
+    P( A, B, C, D, E, R(65) );
+    P( E, A, B, C, D, R(66) );
+    P( D, E, A, B, C, R(67) );
+    P( C, D, E, A, B, R(68) );
+    P( B, C, D, E, A, R(69) );
+    P( A, B, C, D, E, R(70) );
+    P( E, A, B, C, D, R(71) );
+    P( D, E, A, B, C, R(72) );
+    P( C, D, E, A, B, R(73) );
+    P( B, C, D, E, A, R(74) );
+    P( A, B, C, D, E, R(75) );
+    P( E, A, B, C, D, R(76) );
+    P( D, E, A, B, C, R(77) );
+    P( C, D, E, A, B, R(78) );
+    P( B, C, D, E, A, R(79) );
+
+#undef K
+#undef F
+
+    ctx->state[0] += A;
+    ctx->state[1] += B;
+    ctx->state[2] += C;
+    ctx->state[3] += D;
+    ctx->state[4] += E;
+}
+
+void sha1_update( sha1_context *ctx, uint8 *input, uint32 length )
+{
+    uint32 left, fill;
+
+    if( ! length ) return;
+
+    left = ctx->total[0] & 0x3F;
+    fill = 64 - left;
+
+    ctx->total[0] += length;
+    ctx->total[0] &= 0xFFFFFFFF;
+
+    if( ctx->total[0] < length )
+        ctx->total[1]++;
+
+    if( left && length >= fill )
+    {
+        memcpy( (void *) (ctx->buffer + left),
+                (void *) input, fill );
+        sha1_process( ctx, ctx->buffer );
+        length -= fill;
+        input  += fill;
+        left = 0;
+    }
+
+    while( length >= 64 )
+    {
+        sha1_process( ctx, input );
+        length -= 64;
+        input  += 64;
+    }
+
+    if( length )
+    {
+        memcpy( (void *) (ctx->buffer + left),
+                (void *) input, length );
+    }
+}
+
+static uint8 sha1_padding[64] =
+{
+ 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+};
+
+void sha1_finish( sha1_context *ctx, uint8 digest[20] )
+{
+    uint32 last, padn;
+    uint32 high, low;
+    uint8 msglen[8];
+
+    high = ( ctx->total[0] >> 29 )
+         | ( ctx->total[1] <<  3 );
+    low  = ( ctx->total[0] <<  3 );
+
+    PUT_UINT32( high, msglen, 0 );
+    PUT_UINT32( low,  msglen, 4 );
+
+    last = ctx->total[0] & 0x3F;
+    padn = ( last < 56 ) ? ( 56 - last ) : ( 120 - last );
+
+    sha1_update( ctx, sha1_padding, padn );
+    sha1_update( ctx, msglen, 8 );
+
+    PUT_UINT32( ctx->state[0], digest,  0 );
+    PUT_UINT32( ctx->state[1], digest,  4 );
+    PUT_UINT32( ctx->state[2], digest,  8 );
+    PUT_UINT32( ctx->state[3], digest, 12 );
+    PUT_UINT32( ctx->state[4], digest, 16 );
+}
+
+#ifdef TEST
+
+#include <stdlib.h>
+#include <stdio.h>
+
+/*
+ * those are the standard FIPS-180-1 test vectors
+ */
+
+static char *msg[] = 
+{
+    "abc",
+    "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
+    NULL
+};
+
+static char *val[] =
+{
+    "a9993e364706816aba3e25717850c26c9cd0d89d",
+    "84983e441c3bd26ebaae4aa1f95129e5e54670f1",
+    "34aa973cd4c4daa4f61eeb2bdbad27316534016f"
+};
+
+int main( int argc, char *argv[] )
+{
+    FILE *f;
+    int i, j;
+    char output[41];
+    sha1_context ctx;
+    unsigned char buf[1000];
+    unsigned char sha1sum[20];
+
+    if( argc < 2 )
+    {
+        printf( "\n SHA-1 Validation Tests:\n\n" );
+
+        for( i = 0; i < 3; i++ )
+        {
+            printf( " Test %d ", i + 1 );
+
+            sha1_starts( &ctx );
+
+            if( i < 2 )
+            {
+                sha1_update( &ctx, (uint8 *) msg[i],
+                             strlen( msg[i] ) );
+            }
+            else
+            {
+                memset( buf, 'a', 1000 );
+
+                for( j = 0; j < 1000; j++ )
+                {
+                    sha1_update( &ctx, (uint8 *) buf, 1000 );
+                }
+            }
+
+            sha1_finish( &ctx, sha1sum );
+
+            for( j = 0; j < 20; j++ )
+            {
+                sprintf( output + j * 2, "%02x", sha1sum[j] );
+            }
+
+            if( memcmp( output, val[i], 40 ) )
+            {
+                printf( "failed!\n" );
+                return( 1 );
+            }
+
+            printf( "passed.\n" );
+        }
+
+        printf( "\n" );
+    }
+    else
+    {
+        if( ! ( f = fopen( argv[1], "rb" ) ) )
+        {
+            perror( "fopen" );
+            return( 1 );
+        }
+
+        sha1_starts( &ctx );
+
+        while( ( i = fread( buf, 1, sizeof( buf ), f ) ) > 0 )
+        {
+            sha1_update( &ctx, buf, i );
+        }
+
+        sha1_finish( &ctx, sha1sum );
+
+        for( j = 0; j < 20; j++ )
+        {
+            printf( "%02x", sha1sum[j] );
+        }
+
+        printf( "  %s\n", argv[1] );
+    }
+
+    return( 0 );
+}
+
+#endif
diff --git a/src/sha1.h b/src/sha1.h
new file mode 100644
index 0000000..806eba1
--- /dev/null
+++ b/src/sha1.h
@@ -0,0 +1,24 @@
+#ifndef _SHA1_H
+#define _SHA1_H
+
+#ifndef uint8
+#define uint8  unsigned char
+#endif
+
+#ifndef uint32
+#define uint32 unsigned long int
+#endif
+
+typedef struct
+{
+    uint32 total[2];
+    uint32 state[5];
+    uint8 buffer[64];
+}
+sha1_context;
+
+void sha1_starts( sha1_context *ctx );
+void sha1_update( sha1_context *ctx, uint8 *input, uint32 length );
+void sha1_finish( sha1_context *ctx, uint8 digest[20] );
+
+#endif /* sha1.h */
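
Note that src/sha1.h defines `uint32` as `unsigned long int`, which is 64 bits wide on LP64 platforms; that is presumably why the `S` rotation macro in src/sha1.c masks with `0xFFFFFFFF` before shifting right. For comparison, here is a sketch of the same left rotation using the fixed-width `uint32_t` from `<stdint.h>`, where no mask is needed. `rotl32` is a hypothetical name and this is not how the package implements it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit value left by n bits.
 * n must be in 1..31 so that neither shift count reaches 32. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);

    /* Bits shifted out at the top re-enter at the bottom. */
    return (x << n) | (x >> (32 - n));
}
```

SHA-1 only uses rotation counts of 1, 5 and 30, all of which fall inside the range the assertion allows.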
diff --git a/tests/SHA1SUM b/tests/SHA1SUM
new file mode 100644
index 0000000..89b6e02
--- /dev/null
+++ b/tests/SHA1SUM
@@ -0,0 +1,2 @@
+6b1babdfa60a17a2e79cd9187ea06b3df3c46624  testdb-v1.1
+6b1babdfa60a17a2e79cd9187ea06b3df3c46624  testdb-v2.0
diff --git a/tests/misc/create-testdb.R b/tests/misc/create-testdb.R
new file mode 100644
index 0000000..f0687f2
--- /dev/null
+++ b/tests/misc/create-testdb.R
@@ -0,0 +1,14 @@
+library(filehash)
+
+name <- sprintf("testdb-v%s", packageDescription("filehash", fields = "Version"))
+dbCreate(name, "DB1")
+db <- dbInit(name, "DB1")
+
+set.seed(1)
+dbInsert(db, "a", rnorm(10))
+dbInsert(db, "b", runif(7))
+dbInsert(db, "list", list(1, 2, 3, 4, 5, 6, "a"))
+dbInsert(db, "c", 1L)
+dbInsert(db, "entry", "string")
+dbDelete(db, "b")
+         
diff --git a/tests/reg-tests.R b/tests/reg-tests.R
new file mode 100644
index 0000000..93000d2
--- /dev/null
+++ b/tests/reg-tests.R
@@ -0,0 +1,183 @@
+suppressMessages(library(filehash))
+
+######################################################################
+## Test 'filehashRDS' class
+
+dbCreate("mydbRDS", "RDS")
+db <- dbInit("mydbRDS", "RDS")
+show(db)
+
+## Put some data into it
+set.seed(1000)
+dbInsert(db, "a", 1:10)
+dbInsert(db, "b", rnorm(100))
+dbInsert(db, "c", 100:1)
+dbInsert(db, "d", runif(1000))
+dbInsert(db, "other", "hello")
+
+dbList(db)
+
+dbExists(db, "e")
+dbExists(db, "a")
+
+env <- db2env(db)
+ls(env)
+
+env$a
+env$b
+env$c
+str(env$d)
+env$other
+
+env$b <- rnorm(100)
+mean(env$b)
+
+env$a[1:5] <- 5:1
+print(env$a)
+
+dbDelete(db, "c")
+
+tryCatch(print(env$c), error = function(e) cat(as.character(e)))
+tryCatch(dbFetch(db, "c"), error = function(e) cat(as.character(e)))
+
+## Check trailing '/' problem
+dbCreate("testRDSdb", "RDS")
+db <- dbInit("testRDSdb/", "RDS")
+print(db)
+
+######################################################################
+## test filehashDB1 class
+
+dbCreate("mydb", "DB1")
+db <- dbInit("mydb", "DB1")
+
+## Put some data into it
+set.seed(1000)
+dbInsert(db, "a", 1:10)
+dbInsert(db, "b", rnorm(100))
+dbInsert(db, "c", 100:1)
+dbInsert(db, "d", runif(1000))
+dbInsert(db, "other", "hello")
+
+dbList(db)
+
+env <- db2env(db)
+ls(env)
+
+env$a
+env$b
+env$c
+str(env$d)
+env$other
+
+env$b <- rnorm(100)
+mean(env$b)
+
+env$a[1:5] <- 5:1
+print(env$a)
+
+dbDelete(db, "c")
+
+tryCatch(print(env$c), error = function(e) cat(as.character(e)))
+tryCatch(dbFetch(db, "c"), error = function(e) cat(as.character(e)))
+
+numbers <- rnorm(100)
+dbInsert(db, "numbers", numbers)
+b <- dbFetch(db, "numbers")
+stopifnot(all.equal(numbers, b))
+stopifnot(identical(numbers, b))
+
+################################################################################
+## Other tests
+
+rm(list = ls())
+
+
+dbCreate("testLoadingDB", "DB1")
+db <- dbInit("testLoadingDB", "DB1")
+
+set.seed(234)
+
+db$a <- rnorm(100)
+db$b <- runif(1000)
+
+dbLoad(db)  ## 'a', 'b'
+summary(a)
+summary(b)
+
+rm(list = ls())
+db <- dbInit("testLoadingDB", "DB1")
+
+dbLazyLoad(db)
+
+summary(a)
+summary(b)
+
+
+
+################################################################################
+## Check dbReorganize
+
+dbCreate("test_reorg", "DB1")
+db <- dbInit("test_reorg", "DB1")
+
+set.seed(1000)
+dbInsert(db, "a", 1)
+dbInsert(db, "a", 1)
+dbInsert(db, "a", 1)
+dbInsert(db, "a", 1)
+dbInsert(db, "b", rnorm(1000))
+dbInsert(db, "b", rnorm(1000))
+dbInsert(db, "b", rnorm(1000))
+dbInsert(db, "b", rnorm(1000))
+dbInsert(db, "c", runif(1000))
+dbInsert(db, "c", runif(1000))
+dbInsert(db, "c", runif(1000))
+dbInsert(db, "c", runif(1000))
+
+summary(db$b)
+summary(db$c)
+
+print(file.info(db@datafile)$size)
+
+dbReorganize(db)
+
+db <- dbInit("test_reorg", "DB1")
+
+print(file.info(db@datafile)$size)
+
+summary(db$b)
+summary(db$c)
+
+
+################################################################################
+## Taken from the vignette
+
+file.remove("mydb")
+
+dbCreate("mydb")
+db <- dbInit("mydb")
+
+set.seed(100)
+
+dbInsert(db, "a", rnorm(100))
+value <- dbFetch(db, "a")
+mean(value)
+
+dbInsert(db, "b", 123)
+dbDelete(db, "a")
+dbList(db)
+dbExists(db, "a")
+
+file.remove("mydb")
+
+################################################################################
+## Check queue
+
+db <- createQ("testq")
+push(db, 1)
+push(db, 2)
+top(db)
+
+pop(db)
+top(db)
diff --git a/tests/reg-tests.Rout.save b/tests/reg-tests.Rout.save
new file mode 100644
index 0000000..32f4b48
--- /dev/null
+++ b/tests/reg-tests.Rout.save
@@ -0,0 +1,304 @@
+
+R version 2.10.1 Patched (--)
+Copyright (C)  The R Foundation for Statistical Computing
+ISBN 3-900051-07-0
+
+R is free software and comes with ABSOLUTELY NO WARRANTY.
+You are welcome to redistribute it under certain conditions.
+Type 'license()' or 'licence()' for distribution details.
+
+R is a collaborative project with many contributors.
+Type 'contributors()' for more information and
+'citation()' on how to cite R or R packages in publications.
+
+Type 'demo()' for some demos, 'help()' for on-line help, or
+'help.start()' for an HTML browser interface to help.
+Type 'q()' to quit R.
+
+> suppressMessages(library(filehash))
+> 
+> ######################################################################
+> ## Test 'filehashRDS' class
+> 
+> dbCreate("mydbRDS", "RDS")
+[1] TRUE
+> db <- dbInit("mydbRDS", "RDS")
+> show(db)
+'filehashRDS' database 'mydbRDS'
+> 
+> ## Put some data into it
+> set.seed(1000)
+> dbInsert(db, "a", 1:10)
+> dbInsert(db, "b", rnorm(100))
+> dbInsert(db, "c", 100:1)
+> dbInsert(db, "d", runif(1000))
+> dbInsert(db, "other", "hello")
+> 
+> dbList(db)
+[1] "a"     "b"     "c"     "d"     "other"
+> 
+> dbExists(db, "e")
+[1] FALSE
+> dbExists(db, "a")
+[1] TRUE
+> 
+> env <- db2env(db)
+> ls(env)
+[1] "a"     "b"     "c"     "d"     "other"
+> 
+> env$a
+ [1]  1  2  3  4  5  6  7  8  9 10
+> env$b
+  [1] -0.44577826 -1.20585657  0.04112631  0.63938841 -0.78655436 -0.38548930
+  [7] -0.47586788  0.71975069 -0.01850562 -1.37311776 -0.98242783 -0.55448870
+ [13]  0.12138119 -0.12087232 -1.33604105  0.17005748  0.15507872  0.02493187
+ [19] -2.04658541  0.21315411  2.67007166 -1.22701601  0.83424733  0.53257175
+ [25] -0.64682496  0.60316126 -1.78384414  0.33494217  0.56097572  1.22093565
+ [31] -0.21145359  0.69942953 -0.70643668 -0.46515095 -1.76619861  0.18928860
+ [37] -0.36618068  1.05760118 -0.74162146 -1.34835905 -0.51730643  1.41173570
+ [43]  0.18546503 -0.04369144 -0.21591338  1.46377535  0.22966664  0.10762363
+ [49] -1.37810256 -0.96818288  0.25171138 -1.09469370  0.39764284 -0.99630200
+ [55]  0.10057801  0.95368028 -1.79032293  0.31170122  2.55398801 -0.86083776
+ [61]  0.54392844 -0.39233804  1.23544190  1.19608644 -0.49574690 -0.29434122
+ [67] -0.57349748  1.61920873 -0.95692767  0.04123712 -1.49831044  0.66095916
+ [73]  0.28545762  1.38886629 -0.15934361 -0.46091890  0.16843807  1.39549302
+ [79]  0.72842626  0.33508995  1.16927649  0.24796682 -0.35814947  1.38349332
+ [85]  0.41206917 -0.12300786 -0.06622931 -2.32249088 -1.04565650  2.05787502
+ [91]  1.97153237 -1.92099520  0.46212607 -0.16072406 -0.10421153  0.46783940
+ [97]  0.44392082  0.82855281 -0.38705012  2.01893816
+> env$c
+  [1] 100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83
+ [19]  82  81  80  79  78  77  76  75  74  73  72  71  70  69  68  67  66  65
+ [37]  64  63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47
+ [55]  46  45  44  43  42  41  40  39  38  37  36  35  34  33  32  31  30  29
+ [73]  28  27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11
+ [91]  10   9   8   7   6   5   4   3   2   1
+> str(env$d)
+ num [1:1000] 0.0854 0.3317 0.5647 0.4989 0.4549 ...
+> env$other
+[1] "hello"
+> 
+> env$b <- rnorm(100)
+> mean(env$b)
+[1] -0.02208835
+> 
+> env$a[1:5] <- 5:1
+> print(env$a)
+ [1]  5  4  3  2  1  6  7  8  9 10
+> 
+> dbDelete(db, "c")
+> 
+> tryCatch(print(env$c), error = function(e) cat(as.character(e)))
+Error in dbFetch(db, key): unable to obtain value for key 'c'
+> tryCatch(dbFetch(db, "c"), error = function(e) cat(as.character(e)))
+Error in dbFetch(db, "c"): unable to obtain value for key 'c'
+> 
+> ## Check trailing '/' problem
+> dbCreate("testRDSdb", "RDS")
+[1] TRUE
+> db <- dbInit("testRDSdb/", "RDS")
+> print(db)
+'filehashRDS' database 'testRDSdb'
+> 
+> ######################################################################
+> ## test filehashDB1 class
+> 
+> dbCreate("mydb", "DB1")
+[1] TRUE
+> db <- dbInit("mydb", "DB1")
+> 
+> ## Put some data into it
+> set.seed(1000)
+> dbInsert(db, "a", 1:10)
+> dbInsert(db, "b", rnorm(100))
+> dbInsert(db, "c", 100:1)
+> dbInsert(db, "d", runif(1000))
+> dbInsert(db, "other", "hello")
+> 
+> dbList(db)
+[1] "a"     "b"     "other" "c"     "d"    
+> 
+> env <- db2env(db)
+> ls(env)
+[1] "a"     "b"     "c"     "d"     "other"
+> 
+> env$a
+ [1]  1  2  3  4  5  6  7  8  9 10
+> env$b
+  [1] -0.44577826 -1.20585657  0.04112631  0.63938841 -0.78655436 -0.38548930
+  [7] -0.47586788  0.71975069 -0.01850562 -1.37311776 -0.98242783 -0.55448870
+ [13]  0.12138119 -0.12087232 -1.33604105  0.17005748  0.15507872  0.02493187
+ [19] -2.04658541  0.21315411  2.67007166 -1.22701601  0.83424733  0.53257175
+ [25] -0.64682496  0.60316126 -1.78384414  0.33494217  0.56097572  1.22093565
+ [31] -0.21145359  0.69942953 -0.70643668 -0.46515095 -1.76619861  0.18928860
+ [37] -0.36618068  1.05760118 -0.74162146 -1.34835905 -0.51730643  1.41173570
+ [43]  0.18546503 -0.04369144 -0.21591338  1.46377535  0.22966664  0.10762363
+ [49] -1.37810256 -0.96818288  0.25171138 -1.09469370  0.39764284 -0.99630200
+ [55]  0.10057801  0.95368028 -1.79032293  0.31170122  2.55398801 -0.86083776
+ [61]  0.54392844 -0.39233804  1.23544190  1.19608644 -0.49574690 -0.29434122
+ [67] -0.57349748  1.61920873 -0.95692767  0.04123712 -1.49831044  0.66095916
+ [73]  0.28545762  1.38886629 -0.15934361 -0.46091890  0.16843807  1.39549302
+ [79]  0.72842626  0.33508995  1.16927649  0.24796682 -0.35814947  1.38349332
+ [85]  0.41206917 -0.12300786 -0.06622931 -2.32249088 -1.04565650  2.05787502
+ [91]  1.97153237 -1.92099520  0.46212607 -0.16072406 -0.10421153  0.46783940
+ [97]  0.44392082  0.82855281 -0.38705012  2.01893816
+> env$c
+  [1] 100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83
+ [19]  82  81  80  79  78  77  76  75  74  73  72  71  70  69  68  67  66  65
+ [37]  64  63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47
+ [55]  46  45  44  43  42  41  40  39  38  37  36  35  34  33  32  31  30  29
+ [73]  28  27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11
+ [91]  10   9   8   7   6   5   4   3   2   1
+> str(env$d)
+ num [1:1000] 0.0854 0.3317 0.5647 0.4989 0.4549 ...
+> env$other
+[1] "hello"
+> 
+> env$b <- rnorm(100)
+> mean(env$b)
+[1] -0.02208835
+> 
+> env$a[1:5] <- 5:1
+> print(env$a)
+ [1]  5  4  3  2  1  6  7  8  9 10
+> 
+> dbDelete(db, "c")
+> 
+> tryCatch(print(env$c), error = function(e) cat(as.character(e)))
+Error in readSingleKey(con, map, key): unable to obtain value for key 'c'
+> tryCatch(dbFetch(db, "c"), error = function(e) cat(as.character(e)))
+Error in readSingleKey(con, map, key): unable to obtain value for key 'c'
+> 
+> numbers <- rnorm(100)
+> dbInsert(db, "numbers", numbers)
+> b <- dbFetch(db, "numbers")
+> stopifnot(all.equal(numbers, b))
+> stopifnot(identical(numbers, b))
+> 
+> ################################################################################
+> ## Other tests
+> 
+> rm(list = ls())
+> 
+> 
+> dbCreate("testLoadingDB", "DB1")
+[1] TRUE
+> db <- dbInit("testLoadingDB", "DB1")
+> 
+> set.seed(234)
+> 
+> db$a <- rnorm(100)
+> db$b <- runif(1000)
+> 
+> dbLoad(db)  ## 'a', 'b'
+> summary(a)
+     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
+-3.036000 -0.642100  0.172000  0.004131  0.614100  2.107000 
+> summary(b)
+    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
+0.004583 0.229900 0.478600 0.482200 0.729200 0.999800 
+> 
+> rm(list = ls())
+> db <- dbInit("testLoadingDB", "DB1")
+> 
+> dbLazyLoad(db)
+> 
+> summary(a)
+     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
+-3.036000 -0.642100  0.172000  0.004131  0.614100  2.107000 
+> summary(b)
+    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
+0.004583 0.229900 0.478600 0.482200 0.729200 0.999800 
+> 
+> 
+> 
+> ################################################################################
+> ## Check dbReorganize
+> 
+> dbCreate("test_reorg", "DB1")
+[1] TRUE
+> db <- dbInit("test_reorg", "DB1")
+> 
+> set.seed(1000)
+> dbInsert(db, "a", 1)
+> dbInsert(db, "a", 1)
+> dbInsert(db, "a", 1)
+> dbInsert(db, "a", 1)
+> dbInsert(db, "b", rnorm(1000))
+> dbInsert(db, "b", rnorm(1000))
+> dbInsert(db, "b", rnorm(1000))
+> dbInsert(db, "b", rnorm(1000))
+> dbInsert(db, "c", runif(1000))
+> dbInsert(db, "c", runif(1000))
+> dbInsert(db, "c", runif(1000))
+> dbInsert(db, "c", runif(1000))
+> 
+> summary(db$b)
+    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
+-2.76800 -0.65520 -0.06100 -0.01269  0.65240  3.73900 
+> summary(db$c)
+     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
+0.0002346 0.2416000 0.4813000 0.4938000 0.7492000 0.9992000 
+> 
+> print(file.info(db@datafile)$size)
+[1] 64980
+> 
+> dbReorganize(db)
+Reorganizing database: 33% (1/3)67% (2/3)100% (3/3)
+Finished; reload database with 'dbInit'
+[1] TRUE
+> 
+> db <- dbInit("test_reorg", "DB1")
+> 
+> print(file.info(db@datafile)$size)
+[1] 16245
+> 
+> summary(db$b)
+    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
+-2.76800 -0.65520 -0.06100 -0.01269  0.65240  3.73900 
+> summary(db$c)
+     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
+0.0002346 0.2416000 0.4813000 0.4938000 0.7492000 0.9992000 
+> 
+> 
+> ################################################################################
+> ## Taken from the vignette
+> 
+> file.remove("mydb")
+[1] TRUE
+> 
+> dbCreate("mydb")
+[1] TRUE
+> db <- dbInit("mydb")
+> 
+> set.seed(100)
+> 
+> dbInsert(db, "a", rnorm(100))
+> value <- dbFetch(db, "a")
+> mean(value)
+[1] 0.002912563
+> 
+> dbInsert(db, "b", 123)
+> dbDelete(db, "a")
+> dbList(db)
+[1] "b"
+> dbExists(db, "a")
+[1] FALSE
+> 
+> file.remove("mydb")
+[1] TRUE
+> 
+> ################################################################################
+> ## Check queue
+> 
+> db <- createQ("testq")
+> push(db, 1)
+> push(db, 2)
+> top(db)
+[1] 1
+> 
+> pop(db)
+[1] 1
+> top(db)
+[1] 2
+> 
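
Note on the 'Check dbReorganize' part of the transcript above: the data
file shrinks from 64980 to 16245 bytes, exactly a factor of four, after
every key has been inserted four times. Below is a minimal base-R sketch
of why an append-only single-file store behaves this way. It is not
filehash's actual DB1 code; the helpers 'store_new', 'store_insert',
'store_fetch' and 'store_reorganize' are hypothetical names introduced
only for illustration.

```r
## Hypothetical append-only key-value store: every insert appends the
## serialized value to one file and records its byte offset in a map.
store_new <- function(path) {
    file.create(path)
    list(path = path, map = list())
}

store_insert <- function(store, key, value) {
    bytes <- serialize(value, NULL)
    offset <- file.info(store$path)$size   # new record starts at end of file
    con <- file(store$path, "ab")
    on.exit(close(con))
    writeBin(bytes, con)
    ## Older records for 'key' stay in the file but are no longer referenced
    store$map[[key]] <- c(offset = offset, length = length(bytes))
    store
}

store_fetch <- function(store, key) {
    loc <- store$map[[key]]
    if (is.null(loc))
        stop("unable to obtain value for key '", key, "'")
    con <- file(store$path, "rb")
    on.exit(close(con))
    seek(con, where = loc[["offset"]])     # jump straight to the record
    unserialize(readBin(con, "raw", n = loc[["length"]]))
}

## Copy only the records still referenced by the map into a fresh file
store_reorganize <- function(store) {
    tmp <- tempfile()
    fresh <- store_new(tmp)
    for (key in names(store$map))
        fresh <- store_insert(fresh, key, store_fetch(store, key))
    file.copy(tmp, store$path, overwrite = TRUE)
    unlink(tmp)
    fresh$path <- store$path
    fresh
}

store <- store_new(tempfile())
for (i in 1:4) store <- store_insert(store, "b", rnorm(1000))
size_before <- file.info(store$path)$size
store <- store_reorganize(store)
size_after <- file.info(store$path)$size
c(before = size_before, after = size_after)
```

In this sketch the stale copies of "b" are what the reorganizing step
drops, which is the same effect the transcript shows for the real
database.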
diff --git a/tests/testdb-v1.1 b/tests/testdb-v1.1
new file mode 100644
index 0000000..ebeaf0d
Binary files /dev/null and b/tests/testdb-v1.1 differ
diff --git a/tests/testdb-v2.0 b/tests/testdb-v2.0
new file mode 100644
index 0000000..ebeaf0d
Binary files /dev/null and b/tests/testdb-v2.0 differ
diff --git a/tests/versions.R b/tests/versions.R
new file mode 100644
index 0000000..7c53cc1
--- /dev/null
+++ b/tests/versions.R
@@ -0,0 +1,22 @@
+## Test databases
+
+suppressMessages(library(filehash))
+
+testdblist <- dir(pattern = glob2rx("testdb-v*"))
+
+for(testname in testdblist) {
+        msg <- sprintf("DATABASE: %s\n", testname)
+        cat(paste(rep("=", nchar(msg)), collapse = ""), "\n")
+        cat(msg)
+        cat(paste(rep("=", nchar(msg)), collapse = ""), "\n")
+        db <- dbInit(testname, "DB1")
+        keys <- dbList(db)
+        print(keys)
+
+        for(k in keys) {
+                cat("key:", k, "\n")
+                val <- dbFetch(db, k)
+                print(val)
+                cat("\n")
+        }
+}
diff --git a/tests/versions.Rout.save b/tests/versions.Rout.save
new file mode 100644
index 0000000..20219f6
--- /dev/null
+++ b/tests/versions.Rout.save
@@ -0,0 +1,114 @@
+
+R version 2.10.1 Patched (--)
+Copyright (C)  The R Foundation for Statistical Computing
+ISBN 3-900051-07-0
+
+R is free software and comes with ABSOLUTELY NO WARRANTY.
+You are welcome to redistribute it under certain conditions.
+Type 'license()' or 'licence()' for distribution details.
+
+R is a collaborative project with many contributors.
+Type 'contributors()' for more information and
+'citation()' on how to cite R or R packages in publications.
+
+Type 'demo()' for some demos, 'help()' for on-line help, or
+'help.start()' for an HTML browser interface to help.
+Type 'q()' to quit R.
+
+> ## Test databases
+> 
+> suppressMessages(library(filehash))
+> 
+> testdblist <- dir(pattern = glob2rx("testdb-v*"))
+> 
+> for(testname in testdblist) {
++         msg <- sprintf("DATABASE: %s\n", testname)
++         cat(paste(rep("=", nchar(msg)), collapse = ""), "\n")
++         cat(msg)
++         cat(paste(rep("=", nchar(msg)), collapse = ""), "\n")
++         db <- dbInit(testname, "DB1")
++         keys <- dbList(db)
++         print(keys)
++ 
++         for(k in keys) {
++                 cat("key:", k, "\n")
++                 val <- dbFetch(db, k)
++                 print(val)
++                 cat("\n")
++         }
++ }
+====================== 
+DATABASE: testdb-v1.1
+====================== 
+[1] "a"     "c"     "list"  "entry"
+key: a 
+ [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
+ [7]  0.4874291  0.7383247  0.5757814 -0.3053884
+
+key: c 
+[1] 1
+
+key: list 
+[[1]]
+[1] 1
+
+[[2]]
+[1] 2
+
+[[3]]
+[1] 3
+
+[[4]]
+[1] 4
+
+[[5]]
+[1] 5
+
+[[6]]
+[1] 6
+
+[[7]]
+[1] "a"
+
+
+key: entry 
+[1] "string"
+
+====================== 
+DATABASE: testdb-v2.0
+====================== 
+[1] "a"     "c"     "list"  "entry"
+key: a 
+ [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
+ [7]  0.4874291  0.7383247  0.5757814 -0.3053884
+
+key: c 
+[1] 1
+
+key: list 
+[[1]]
+[1] 1
+
+[[2]]
+[1] 2
+
+[[3]]
+[1] 3
+
+[[4]]
+[1] 4
+
+[[5]]
+[1] 5
+
+[[6]]
+[1] 6
+
+[[7]]
+[1] "a"
+
+
+key: entry 
+[1] "string"
+
+> 
diff --git a/vignettes/combined.bib b/vignettes/combined.bib
new file mode 100644
index 0000000..b142fd3
--- /dev/null
+++ b/vignettes/combined.bib
@@ -0,0 +1,50 @@
+
+@Manual{	  templelang:2002,
+  title		= {RObjectTables: User-level attach()'able table support},
+  author	= {Duncan {Temple Lang}},
+  year		= {2002},
+  note		= {{R} package version 0.3-1},
+  url		= {http://www.omegahat.org/RObjectTables}
+}
+
+@Article{	  rnews:ripley:2004,
+  author	= {Brian D. Ripley},
+  title		= {Lazy Loading and Packages in {R} 2.0.0},
+  journal	= {R News},
+  year		= 2004,
+  volume	= 4,
+  number	= 2,
+  pages		= {2--4},
+  month		= {September},
+  url		= http,
+  pdf		= rnews2004-2
+}
+
+@TechReport{	  cham:1991,
+  author	= {John M. Chambers},
+  title		= {Data Management in {S}},
+  institution	= {AT\&T Bell Laboratories Statistics Research},
+  year		= {1991},
+  number	= {99},
+  month		= {December},
+  note		= {http://stat.bell-labs.com/doc/93.15.ps}
+}
+
+@Book{		  cham:1998,
+  author	= {John M. Chambers},
+  title		= {Programming with Data: A Guide to the {S} Language},
+  publisher	= {Springer},
+  year		= {1998}
+}
+
+@Article{	  brahm:2002,
+  author	= {David E. Brahm},
+  title		= {Delayed Data Packages},
+  journal	= {R News},
+  year		= 2002,
+  volume	= 2,
+  number	= 3,
+  pages		= {11--12},
+  month		= {December},
+  url		= {http://CRAN.R-project.org/doc/Rnews/}
+}
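
Note on the 'RDS' format described in the vignette added by this commit
(vignettes/filehash.Rnw): values are stored as separate files in one
directory, and names with capital letters are mangled so that they do
not collide on case-insensitive filesystems. The following base-R sketch
illustrates the idea only. The function names and the mangling scheme
(prefixing each capital letter with '@') are assumptions made for this
example and should not be read as the package's own code.

```r
## Hypothetical one-file-per-key store in the spirit of the "RDS" format.
## Assumed mangling scheme: prefix each capital letter with '@' so that
## "A" and "a" map to different file names on case-insensitive systems.
## Keys that themselves contain '@' are not handled by this sketch.
mangle <- function(key) gsub("([A-Z])", "@\\1", key, perl = TRUE)
unmangle <- function(fname) gsub("@", "", fname, fixed = TRUE)

dir_insert <- function(dir, key, value)
    saveRDS(value, file.path(dir, mangle(key)))

dir_fetch <- function(dir, key) {
    path <- file.path(dir, mangle(key))
    if (!file.exists(path))
        stop("unable to obtain value for key '", key, "'")
    readRDS(path)
}

## The filesystem itself serves as the index of keys
dir_list <- function(dir) unmangle(list.files(dir))

dir_delete <- function(dir, key)
    invisible(file.remove(file.path(dir, mangle(key))))

dbdir <- tempfile("rdsdb")
dir.create(dbdir)
dir_insert(dbdir, "a", 1:10)
dir_insert(dbdir, "A", "upper-case key")
dir_list(dbdir)
dir_fetch(dbdir, "A")
dir_delete(dbdir, "a")
dir_list(dbdir)
```

Listing, lookup and deletion are each a single filesystem call here,
which is the sense in which the vignette says the organizational work
is delegated to the filesystem.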
diff --git a/vignettes/filehash.Rnw b/vignettes/filehash.Rnw
new file mode 100644
index 0000000..c82aead
--- /dev/null
+++ b/vignettes/filehash.Rnw
@@ -0,0 +1,443 @@
+\documentclass{article}
+
+%%\VignetteIndexEntry{The filehash Package}
+%%\VignetteDepends{filehash}
+
+\usepackage{charter}
+\usepackage{courier}
+\usepackage[noae]{Sweave}
+\usepackage[margin=1in]{geometry}
+\usepackage{natbib}
+
+\title{Interacting with Data using the \textbf{filehash} Package for
+R}
+
+\author{Roger D. Peng $<$rpeng at jhsph.edu$>$\\\textit{Department of
+Biostatistics}\\\textit{Johns Hopkins Bloomberg School of Public Health}}
+
+\date{}
+
+\newcommand{\pkg}{\textbf}
+\newcommand{\code}{\texttt}
+
+\begin{document}
+
+\maketitle
+
+\begin{abstract}
+The \pkg{filehash} package for R implements a simple key-value style
+database where character string keys are associated with data values
+that are stored on the disk.  A simple interface is provided for
+inserting, retrieving, and deleting data from the database.  Utilities
+are provided that allow \pkg{filehash} databases to be used much
+as environments and lists are already used in R.  These utilities
+are provided to encourage interactive and exploratory analysis on
+large datasets.  Two different file formats for representing the
+database are currently available and new formats can easily be
+incorporated by third parties for use in the \pkg{filehash} framework.
+\end{abstract}
+
+<<options,results=hide,echo=false>>=
+options(width=60)
+@ 
+
+\section{Overview and Motivation}
+
+Working with large datasets in R can be cumbersome because of the need
+to keep objects in physical memory.  While many might generally see
+that as a feature of the system, the need to keep whole objects in
+memory creates challenges to those who might want to work
+interactively with large datasets.  Here we take a simple definition
+of ``large dataset'' to be any dataset that cannot be loaded into R as
+a single R object because of memory limitations.  For example, a very
+large data frame might be too large for all of the columns and rows to
+be loaded at once.  In such a situation, one might load only a subset
+of the rows or columns, if that is possible.
+
+In a key-value database, an arbitrary data object (a ``value'') has a
+``key'' associated with it, usually a character string.  When one
+requests the value associated with a particular key, it is the
+database's job to match up the key with the correct value and return
+the value to the requester.
+
+The most straightforward example of a key-value database in R is the
+global environment.  Every object in R has a name and a value
+associated with it.  When you execute at the R prompt
+<<exampleGlobalEnv,results=hide>>=
+x <- 1
+print(x)
+@ 
+the first line assigns the value 1 to the name/key ``x''.  The second
+line requests the value of ``x'' and prints out 1 to the console.  R
+handles the task of finding the appropriate value for ``x'' by
+searching through a series of environments, including the namespaces
+of the packages on the search list.
+
+In most cases, R stores the values associated with keys in memory, so
+that the value of \code{x} in the example above was stored in and
+retrieved from physical memory.  However, the idea of a key-value
+database can be generalized beyond this particular configuration.  For
+example, as of R 2.0.0, much of the R code for R packages is stored in
+a lazy-loaded database, where the values are initially stored on disk
+and loaded into memory on first access~\citep{Rnews:Ripley:2004}.
+Hence, when R starts up, it uses relatively little memory, while the
+memory usage increases as more objects are requested.  Data could also
+be stored on other computers (e.g. websites) and retrieved over the
+network.
+
+The general S language concept of a database is described in Chapter 5
+of the Green Book~\citep{cham:1998} and earlier in~\cite{cham:1991}.
+Although the S and R languages have different semantics with respect
+to how variable names are looked up and bound to values, the general
+concept of using a key-value database applies to both languages.
+Duncan Temple Lang has implemented this general database framework for
+R in the \pkg{RObjectTables} package of
+Omegahat~\citep{TempleLang:2002}. The \pkg{RObjectTables} package
+provides an interface for connecting R with arbitrary backend systems,
+allowing data values to be stored in potentially any format or
+location.  While the package itself does not include a specific
+implementation, some examples are provided on the package's website.
+
+The \pkg{filehash} package provides a full read-write implementation
+of a key-value database for R.  The package does not depend on any
+external packages (beyond those provided in a standard R installation)
+or software systems and is written entirely in R, making it readily
+usable on most platforms.  The \pkg{filehash} package can be thought
+of as a specific implementation of the database concept described
+in~\cite{cham:1991}, taking a slightly different approach to the
+problem.  Both~\cite{TempleLang:2002} and~\cite{cham:1991} focus on
+generalizing the notion of ``attach()-ing'' a database in an R/S
+session so that variable names can be looked up automatically via the
+search list.  The \pkg{filehash} package represents a database as an
+instance of an S4 class and operates directly on the S4 object via
+various methods.
+
+Key-value databases are sometimes called hash tables and indeed, the
+name of the package comes from the idea of having a ``file-based hash
+table''.  With \pkg{filehash} the values are stored in a file on the
+disk rather than in memory.  When a user requests the values
+associated with a key, \pkg{filehash} finds the object on the disk,
+loads the value into R and returns it to the user.  The package offers
+two formats for storing data on the disk: The values can be stored (1)
+concatenated together in a single file or (2) separately as a
+directory of files.
+
+
+
+
+\section{Related R packages}
+
+There are other packages on CRAN designed specifically to help users
+work with large datasets.  Two packages that come immediately to mind
+are the \pkg{g.data} package by David Brahm~\citep{brahm:2002} and the
+\pkg{biglm} package by Thomas Lumley.  The \pkg{g.data} package takes
+advantage of the lazy evaluation mechanism in R via the
+\code{delayedAssign} function.  Briefly, objects are loaded into R as
+promises to load the actual data associated with an object name.  The
+first time an object is requested, the promise is evaluated and the
+data are loaded.  From then on, the data reside in memory.  The
+mechanism used in \pkg{g.data} is similar to the one used by the
+lazy-loaded databases described in~\cite{Rnews:Ripley:2004}.  The
+\pkg{biglm} package allows users to fit linear models on datasets that
+are too large to fit in memory.  However, the \pkg{biglm} package does
+not provide methods for dealing with large datasets in general.  The
+\pkg{filehash} package also draws inspiration from Luke Tierney's
+experimental \pkg{gdbm} package which implements a key-value database
+via the GNU dbm (GDBM) library.  The use of GDBM creates an external
+dependence since the GDBM C library has to be compiled on each system.
+In addition, I encountered a problem where databases created on 32-bit
+machines could not be transferred to and read on 64-bit machines (and
+vice versa).  However, with the increasing use of 64-bit machines in
+the future, it seems this problem will eventually go away.
+
+The R Special Interest Group on Databases has developed a number of
+packages that provide an R interface to commonly used relational
+database management systems (RDBMS) such as MySQL (\pkg{RMySQL}),
+PostgreSQL (\pkg{RPgSQL}), and Oracle (\pkg{ROracle}).  These packages
+use the S4 classes and generics defined in the \pkg{DBI} package and
+have the advantage that they offer much better database functionality,
+inherited via the use of a true database management system.  However,
+this benefit comes with the cost of having to install and use
+third-party software.  While installing an RDBMS may not be an
+issue---many systems have them pre-installed and the \pkg{RSQLite}
+package comes bundled with the source for the RDBMS---the need for the
+RDBMS and knowledge of structured query language (SQL) nevertheless
+adds some overhead.  This overhead may serve as an impediment for
+users in need of a database for simpler applications.
+
+
+
+\section{Creating a filehash database}
+
+Databases can be created with \pkg{filehash} using the \code{dbCreate}
+function.  The one required argument is the name of the database,
+which we call here ``mydb''.  
+<<create>>=
+library(filehash)
+dbCreate("mydb")
+db <- dbInit("mydb")
+@ 
+You can also specify the \code{type} argument which controls how the
+database is represented on the backend.  We will discuss the different
+backends in further detail later.  For now, we use the default backend
+which is called ``DB1''.  
+
+Once the database is created, it must be initialized in order to be
+accessed.  The \code{dbInit} function returns an S4 object inheriting
+from class ``filehash''.  Since this is a newly created database,
+there are no objects in it.
+
+\section{Accessing a filehash database}
+
+<<setseed1,results=hide,echo=false>>=
+set.seed(100)
+@ 
+
+The primary interface to filehash databases consists of the functions
+\code{dbFetch}, \code{dbInsert}, \code{dbExists}, \code{dbList}, and
+\code{dbDelete}.  These functions are all generic---specific methods
+exist for each type of database backend.  They all take as their
+first argument an object of class ``filehash''.  To insert some data
+into the database we can simply call \code{dbInsert}
+<<insert>>=
+dbInsert(db, "a", rnorm(100))
+@ 
+Here we have associated with the key ``a'' 100 standard normal random
+variates.  We can retrieve those values with \code{dbFetch}.
+<<fetch>>=
+value <- dbFetch(db, "a")
+mean(value)
+@ 
+
+The function \code{dbList} lists all of the keys that are available in
+the database, \code{dbExists} tests to see if a given key is in the
+database, and \code{dbDelete} deletes a key-value pair from the
+database.
+<<delete>>=
+dbInsert(db, "b", 123)
+dbDelete(db, "a")
+dbList(db)
+dbExists(db, "a")
+@ 
+
+While using functions like \code{dbInsert} and \code{dbFetch} is
+straightforward, it can often be easier on the fingers to use standard
+R subset and accessor functions like \code{\$}, \code{[[}, and
+\code{[}. Filehash databases have methods for these functions so that
+objects can be accessed in a more compact manner. Similarly,
+replacement methods for these functions are also available. The
+\verb+[+ function can be used to access multiple objects from the
+database, in which case a list is returned.
+
+<<accessors>>=
+db$a <- rnorm(100, 1)
+mean(db$a)
+mean(db[["a"]])
+db$b <- rnorm(100, 2)
+dbList(db)
+@ 
+For all of the accessor functions, only character indices are allowed.
+Numeric indices are caught and an error is given.
+<<characteronly>>=
+e <- local({
+    err <- function(e) e
+    tryCatch(db[[1]], error = err)
+})
+conditionMessage(e)
+@ 
+Finally, there is a method for the \code{with} generic function which
+operates much like using \code{with} on lists or environments.  
+
+The following three statements all return the same value.
+<<with>>=
+with(db, c(a = mean(a), b = mean(b)))
+@ 
+When using \code{with}, the values of ``a'' and ``b'' are looked up in
+the database.
+<<sapply>>=
+sapply(db[c("a", "b")], mean)
+@ 
+Here, using \code{[} on \code{db} returns a list with the values
+associated with ``a'' and ``b''.  Then \code{sapply} is applied in the
+usual way on the returned list.
+<<lapply>>=
+unlist(lapply(db, mean))
+@ 
+In the last statement we call \code{lapply} directly on the
+``filehash'' object.  The \pkg{filehash} package defines a method for
+\code{lapply} that allows the user to apply a function on all the
+elements of a database directly.  The method essentially loops through
+all the keys in the database, loads each object separately and applies
+the supplied function to each object.  \code{lapply} returns a named
+list with each element being the result of applying the supplied
+function to an object in the database.  There is an argument
+\code{keep.names} to the \code{lapply} method which, if set to
+\code{FALSE}, will drop all the names from the list.
+
+<<cleanupMyDB,results=hide,echo=false>>=
+dbUnlink(db)
+rm(list = ls(all = TRUE))
+@ 
+
+\section{Loading filehash databases}
+
+<<setseed2,results=hide,echo=false>>=
+set.seed(200)
+@ 
+
+An alternative way of working with a filehash database is to load it
+into an environment and access the element names directly, without
+having to use any of the accessor functions.  The \pkg{filehash}
+function \code{dbLoad} works much like the standard R \code{load}
+function except that \code{dbLoad} loads active bindings into a given
+environment rather than the actual data.  The active bindings are
+created via the \code{makeActiveBinding} function in the \pkg{base}
+package.  \code{dbLoad} takes a filehash database and creates symbols
+in an environment corresponding to the keys in the database.  It then
+calls \code{makeActiveBinding} to associate with each key a function
+which loads the data associated with a given key.  Conceptually,
+active bindings are like pointers to the database.  After calling
+\code{dbLoad}, anytime an object with an active binding is accessed
+the associated function (installed by \code{makeActiveBinding}) loads
+the data from the database.
+
+We can create a simple database to demonstrate the active binding
+mechanism.
+<<testDB>>=
+dbCreate("testDB")
+db <- dbInit("testDB")
+db$x <- rnorm(100)
+db$y <- runif(100)
+db$a <- letters
+dbLoad(db)
+ls()
+@ 
+Notice that we appear to have some additional objects in our
+workspace.  However, the values of these objects are not stored in
+memory---they are stored in the database.  When one of the objects is
+accessed, the value is automatically loaded from the database.
+<<accessbinding>>=
+mean(y)
+sort(a)
+@ 
+If I assign a different value to one of these objects, its
+associated value is updated in the database via the active binding
+mechanism.
+<<assignvalue>>=
+y <- rnorm(100, 2)
+mean(y)
+@ 
+If I subsequently clear the workspace and reload the database later,
+the updated value for ``y'' persists.
+<<removeandload>>=
+rm(list = ls())
+db <- dbInit("testDB")
+dbLoad(db)
+ls()
+mean(y)
+@ 
+
+Perhaps one disadvantage of the active binding approach taken here is
+that whenever an object is accessed, the data must be reloaded into R.
+This behavior is distinctly different from the delayed assignment
+approach taken in \pkg{g.data} where an object need only be loaded
+once and then is subsequently in memory.  However, when using delayed
+assignments, if one cycles through all of the objects in the database,
+one could eventually exhaust the available memory.
+
+<<cleanupTestDB,results=hide,echo=false>>=
+dbUnlink(db)
+rm(list = ls(all = TRUE))
+@ 
+
+\section{Other filehash utilities}
+
+There are a few other utilities included with the \pkg{filehash}
+package.  Two of the utilities, \code{dumpObjects} and
+\code{dumpImage}, are analogues of \code{save} and \code{save.image}.
+Rather than save objects to an R workspace, \code{dumpObjects} saves
+the given objects to a ``filehash'' database so that in the future,
+individual objects can be reloaded if desired.  Similarly,
+\code{dumpImage} saves the entire workspace to a ``filehash''
+database.
+
+The function \code{dumpList} takes a list and creates a ``filehash''
+database with values from the list.  The list must have a non-empty
+name for every element in order for \code{dumpList} to succeed.
+\code{dumpDF} creates a ``filehash'' database from a data frame where
+each column of the data frame is an element in the database.
+Essentially, \code{dumpDF} converts the data frame to a list and calls
+\code{dumpList}.
+
+
+\section{Filehash database backends}
+
+Currently, the \pkg{filehash} package can represent databases in two 
+different formats.  The default format is called ``DB1'' and it stores
+the keys and values in a single file.  From experience, this format
+works well overall but can be a little slow to initialize when there
+are many thousands of keys.  Briefly, the ``filehash'' object in R
+stores a map which associates keys with a byte location in the
+database file where the corresponding value is stored.  Given the byte
+location, we can \code{seek} to that location in the file and read the
+data directly.  Before reading in the data, a check is made to make
+sure that the map is up to date.  This format depends critically on
+having a working \code{ftell} at the system level and a crude check is
+made when trying to initialize a database of this format.
+
+The second format is called ``RDS'' and it stores objects as separate
+files on the disk in a directory with the same name as the database.
+This format is the most straightforward and simple of the available
+formats.  When a request is made for a specific key, \pkg{filehash}
+finds the appropriate file in the directory and reads the file into R.
+The only catch is that on operating systems that use case-insensitive
+file names, objects whose names differ only in case will collide on
+the filesystem.  To work around this, object names with capital letters
+are stored with mangled names on the disk.  An advantage of this
+format is that most of the organizational work is delegated to the
+filesystem.
+
+
+\section{Extending filehash}
+
+The \pkg{filehash} package has a mechanism for developing new backend
+formats, should the need arise.  The function \code{registerFormatDB}
+can be used to make \pkg{filehash} aware of a new database format that
+may be implemented in a separate R package or a file.
+\code{registerFormatDB} takes two arguments: a \code{name} for the new
+format (like ``DB1'' or ``RDS'') and a list of functions.  The list
+should contain two functions: one function named ``create'' for
+creating a database, given the database name, and another function
+named ``initialize'' for initializing the database.  In addition, one
+needs to define methods for \code{dbInsert}, \code{dbFetch}, etc.
+
+A list of available backend formats can be obtained via the
+\code{filehashFormats} function.  Upon registering a new backend
+format, the new format will be listed when \code{filehashFormats} is
+called.
+
+The interface for registering new backend formats is still
+experimental and could change in the future.
+
+
+\section{Discussion}
+
+The \pkg{filehash} package has been designed to be useful in both a
+programming setting and an interactive setting.  Its main purpose is
+to allow for simpler interaction with large datasets where
+simultaneous access to the full dataset is not needed.  While the
+package may not be optimal for all settings, one goal was to write a
+simple package in pure R that users could install with minimal
+overhead.  In the future I hope to add functionality for interacting
+with databases stored on remote computers and perhaps incorporate a
+``real'' database backend.  Some work has already begun on developing
+a backend based on the \pkg{RSQLite} package.
+
+
+
+\bibliographystyle{alpha}
+\bibliography{combined}
+
+
+\end{document}
+
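
Note on the vignette's section 'Loading filehash databases': it
contrasts the active bindings created by 'dbLoad' with the delayed
assignment used in g.data. A minimal base-R sketch of that difference
follows, with an in-memory list standing in for the on-disk database.
The names 'backing', 'reads' and 'read_value' are hypothetical, and the
sketch is not how filehash itself is written.

```r
## An in-memory list stands in for the database; 'reads' counts how often
## a value is pulled from it.
backing <- list(y = c(1, 2, 3))
reads <- 0
read_value <- function(key) {
    reads <<- reads + 1
    backing[[key]]
}

env <- new.env()

## Active binding: the function runs on every access; on assignment it
## receives the new value, which is written back to the store.
makeActiveBinding("y", function(value) {
    if (missing(value)) {
        read_value("y")
    } else {
        backing[["y"]] <<- value
        invisible(value)
    }
}, env)

## Delayed assignment: the expression is evaluated once, on first access,
## and the result then stays in memory.
delayedAssign("y_lazy", read_value("y"), assign.env = env)

mean(env$y); mean(env$y)            # each access reads from the store again
reads_active <- reads

mean(env$y_lazy); mean(env$y_lazy)  # the store is read only on first access
reads_lazy <- reads - reads_active

env$y <- c(10, 20, 30)              # assignment goes through to the store
backing$y
```

The trade-off is the one the vignette names: the active binding pays
for a read on every access but never keeps the value in memory, while
the delayed assignment reads once and then holds the value.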

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-cran-filehash.git


