[med-svn] [r-bioc-ensembldb] 01/05: New upstream version 2.0.4

Andreas Tille tille at debian.org
Thu Oct 19 20:57:50 UTC 2017


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-bioc-ensembldb.

commit cc1e361604fdf4ddc49fac5c884eab27411a3896
Author: Andreas Tille <tille at debian.org>
Date:   Thu Oct 19 19:08:44 2017 +0200

    New upstream version 2.0.4
---
 DESCRIPTION                                        |   39 +-
 NAMESPACE                                          |  275 ++++-
 R/Classes.R                                        |  722 ++++++------
 R/Deprecated.R                                     |  182 +++
 R/Generics.R                                       |  211 ++--
 R/Methods-Filter.R                                 | 1173 +++++--------------
 R/Methods.R                                        | 1207 ++++++++++++--------
 R/dbhelpers.R                                      |  629 ++++++----
 R/functions-Filter.R                               |  324 ++++++
 R/{EnsDbFromGTF.R => functions-create-EnsDb.R}     |  880 +++++++++++---
 R/functions-utils.R                                |  176 ++-
 R/loadEnsDb.R                                      |    5 -
 R/makeEnsemblDbPackage.R                           |  213 ----
 R/runEnsDbApp.R                                    |   10 -
 R/select-methods.R                                 |  220 ++--
 build/vignette.rds                                 |  Bin 325 -> 367 bytes
 inst/NEWS                                          |  121 +-
 inst/doc/MySQL-backend.R                           |    4 +-
 inst/doc/MySQL-backend.Rmd                         |   18 +-
 inst/doc/MySQL-backend.html                        |  137 ++-
 inst/doc/ensembldb.R                               |  204 ++--
 inst/doc/ensembldb.Rmd                             |  569 +++++----
 inst/doc/ensembldb.html                            |  744 +++++++-----
 inst/doc/proteins.R                                |   94 ++
 inst/doc/proteins.Rmd                              |  273 +++++
 inst/doc/proteins.html                             |  369 ++++++
 inst/extended_tests/extended_tests.R               |  855 ++++++++++++++
 inst/extended_tests/performance_tests.R            |  173 +++
 inst/gff/Devosia_geojensis.ASM96941v1.32.gff3.gz   |  Bin 0 -> 272773 bytes
 inst/gtf/Devosia_geojensis.ASM96941v1.32.gtf.gz    |  Bin 0 -> 269508 bytes
 inst/perl/get_gene_transcript_exon_tables.pl       |  122 +-
 inst/perl/test_script.pl                           |   78 ++
 inst/scripts/checkEnsDbs.R                         |   22 +
 inst/scripts/generate-EnsDBs.R                     |  321 ++++++
 inst/shinyHappyPeople/server.R                     |    6 +-
 inst/test/testFunctionality.R                      |  293 -----
 inst/test/testInternals.R                          |  146 ---
 inst/unitTests/test_Filters.R                      |  241 ----
 inst/unitTests/test_Functionality.R                |  507 --------
 inst/unitTests/test_GFF.R                          |  179 ---
 inst/unitTests/test_GRangeFilter.R                 |  102 --
 inst/unitTests/test_SymbolFilter.R                 |   58 -
 inst/unitTests/test_buildEdb.R                     |   45 -
 inst/unitTests/test_getGenomeFaFile.R              |   49 -
 inst/unitTests/test_get_sequence.R                 |  189 ---
 inst/unitTests/test_mysql.R                        |   24 -
 inst/unitTests/test_ordering.R                     |  280 -----
 inst/unitTests/test_performance.R                  |   62 -
 inst/unitTests/test_select.R                       |  229 ----
 inst/unitTests/test_transcript_lengths.R           |  140 ---
 inst/unitTests/test_ucscChromosomeNames.R          |  508 --------
 inst/unitTests/test_validity.R                     |   11 -
 inst/unitTests/test_xByOverlap.R                   |  102 --
 man/Deprecated.Rd                                  |   92 ++
 man/EnsDb-AnnotationDbi.Rd                         |   62 +-
 man/EnsDb-class.Rd                                 |   69 +-
 man/EnsDb-exonsBy.Rd                               |  217 ++--
 man/EnsDb-lengths.Rd                               |   40 +-
 man/EnsDb-seqlevels.Rd                             |   11 +-
 man/EnsDb-sequences.Rd                             |    3 +-
 man/EnsDb-utils.Rd                                 |   28 +-
 man/EnsDb.Rd                                       |    1 -
 man/Filter-classes.Rd                              |  350 ++++++
 man/GeneidFilter-class.Rd                          |  451 --------
 man/ProteinFunctionality.Rd                        |  115 ++
 man/SeqendFilter.Rd                                |  237 ----
 man/hasProteinData-EnsDb-method.Rd                 |   32 +
 man/listEnsDbs.Rd                                  |    7 +-
 man/makeEnsemblDbPackage.Rd                        |   10 +-
 man/useMySQL-EnsDb-method.Rd                       |    3 +-
 readme.md                                          |   16 +
 tests/runTests.R                                   |    1 -
 tests/testthat.R                                   |    6 +
 tests/testthat/test_Classes.R                      |   85 ++
 tests/testthat/test_Methods-Filter.R               |  515 +++++++++
 .../test_Methods-with-returnFilterColumns.R        |  276 ++---
 tests/testthat/test_Methods.R                      |  893 +++++++++++++++
 tests/testthat/test_Protein-related-tests.R        |  253 ++++
 tests/testthat/test_SymbolFilter.R                 |   99 ++
 tests/testthat/test_dbhelpers.R                    |  405 +++++++
 tests/testthat/test_extractTranscriptSeqs.R        |   65 ++
 tests/testthat/test_functions-Filter.R             |  226 ++++
 tests/testthat/test_functions-create-EnsDb.R       |  234 ++++
 tests/testthat/test_functions-utils.R              |  106 ++
 tests/testthat/test_select-methods.R               |  414 +++++++
 tests/testthat/test_seqLevelStyle.R                |  445 ++++++++
 tests/testthat/test_validity.R                     |   20 +
 vignettes/MySQL-backend.Rmd                        |   18 +-
 vignettes/MySQL-backend.org                        |   29 +-
 vignettes/ensembldb.Rmd                            |  569 +++++----
 vignettes/ensembldb.org                            | 1187 +++++++++++++++----
 vignettes/images/dblayout.png                      |  Bin 444031 -> 204300 bytes
 vignettes/proteins.Rmd                             |  273 +++++
 vignettes/proteins.org                             |  485 ++++++++
 94 files changed, 13852 insertions(+), 8037 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 094a900..3d9115c 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,19 +1,29 @@
 Package: ensembldb
 Type: Package
-Title: Utilities to create and use an Ensembl based annotation database
-Version: 1.6.2
-Author: Johannes Rainer <johannes.rainer at eurac.edu>,
-    Tim Triche <tim.triche at usc.edu>
+Title: Utilities to create and use Ensembl-based annotation databases
+Version: 2.0.4
+Authors@R: c(person(given = "Johannes", family = "Rainer",
+	   email = "johannes.rainer at eurac.edu",
+	   role = c("aut", "cre")),
+	   person(given = "Tim", family = "Triche",
+	   email = "tim.triche at usc.edu",
+	   role = "ctb"),
+	   person(given = "Christian", family = "Weichenberger",
+	   email = "christian.weichenberger at eurac.edu", role = "ctb"))
+Author: Johannes Rainer <johannes.rainer at eurac.edu> with contributions
+	from Tim Triche and Christian Weichenberger.
 Maintainer: Johannes Rainer <johannes.rainer at eurac.edu>
 URL: https://github.com/jotsetung/ensembldb
 BugReports: https://github.com/jotsetung/ensembldb/issues
-Imports: methods, RSQLite, DBI, Biobase, GenomeInfoDb, AnnotationDbi
-        (>= 1.31.19), rtracklayer, S4Vectors, AnnotationHub, Rsamtools,
-        IRanges
+Imports: methods, RSQLite (>= 1.1), DBI, Biobase, GenomeInfoDb,
+        AnnotationDbi (>= 1.31.19), rtracklayer, S4Vectors,
+        AnnotationHub, Rsamtools, IRanges, ProtGenerics, Biostrings,
+        curl
 Depends: BiocGenerics (>= 0.15.10), GenomicRanges (>= 1.23.21),
-        GenomicFeatures (>= 1.23.18)
-Suggests: BiocStyle, knitr, rmarkdown, EnsDb.Hsapiens.v75 (>= 0.99.7),
-        RUnit, shiny, Gviz, BSgenome.Hsapiens.UCSC.hg19
+        GenomicFeatures (>= 1.23.18), AnnotationFilter (>= 0.99.7)
+Suggests: BiocStyle, knitr, rmarkdown, EnsDb.Hsapiens.v75 (>= 0.99.8),
+        shiny, testthat, BSgenome.Hsapiens.UCSC.hg19, ggbio (>=
+        1.24.0), Gviz (>= 1.20.0)
 Enhances: RMySQL
 VignetteBuilder: knitr
 Description: The package provides functions to create and use
@@ -27,11 +37,10 @@ Description: The package provides functions to create and use
     specific entries like genes encoded on a chromosome region or
     transcript models of lincRNA genes.
 Collate: Classes.R Generics.R functions-utils.R dbhelpers.R Methods.R
-        Methods-Filter.R loadEnsDb.R makeEnsemblDbPackage.R
-        EnsDbFromGTF.R runEnsDbApp.R select-methods.R seqname-utils.R
-        zzz.R
+        functions-Filter.R Methods-Filter.R functions-create-EnsDb.R
+        select-methods.R seqname-utils.R Deprecated.R zzz.R
 biocViews: Genetics, AnnotationData, Sequencing, Coverage
 License: LGPL
-RoxygenNote: 5.0.1
+RoxygenNote: 6.0.1
 NeedsCompilation: no
-Packaged: 2016-11-17 00:52:31 UTC; biocbuild
+Packaged: 2017-08-04 23:59:15 UTC; biocbuild
diff --git a/NAMESPACE b/NAMESPACE
index 02aa5b2..011bef1 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -1,63 +1,234 @@
 ## ensembldb NAMESPACE
 import(methods)
 
-importFrom("utils", "read.table", "str")
-import(BiocGenerics)
-import(S4Vectors)
-importFrom(DBI, dbDriver)
-importFrom(Biobase, createPackage)
-importFrom(GenomeInfoDb, Seqinfo, isCircular, genome, seqlengths, seqnames, seqlevels,
-           keepSeqlevels, seqlevelsStyle, "seqlevelsStyle<-", genomeStyles)
-importMethodsFrom(AnnotationDbi, dbconn, columns, keytypes, keys, select, mapIds)
-importFrom(rtracklayer, import)
-import(RSQLite)
-import(GenomicFeatures)
-##importMethodsFrom(GenomicFeatures, extractTranscriptSeqs)
-import(GenomicRanges)
-importFrom(IRanges, IRanges)
-importMethodsFrom(IRanges,subsetByOverlaps)
+importFrom("utils",
+           "read.table",
+           "str",
+           "download.file")
+import("BiocGenerics")
+importFrom("Biobase",
+           "createPackage",
+           "validMsg")
+importMethodsFrom("ProtGenerics",
+                  "proteins")
+import("S4Vectors")
+importFrom("DBI",
+           "dbDriver")
+import("RSQLite")
+importMethodsFrom("AnnotationDbi",
+                  "columns",
+                  "dbconn",
+                  "keys",
+                  "keytypes",
+                  "mapIds",
+                  "select")
+importFrom("GenomeInfoDb",
+           "Seqinfo",
+           "isCircular",
+           "genome",
+           "seqlengths",
+           "seqnames",
+           "seqlevels",
+           "keepSeqlevels",
+           "seqlevelsStyle",
+           "seqlevelsStyle<-",
+           "genomeStyles")
+importFrom("rtracklayer",
+           "import")
+
+## Ranges and stuff
+importFrom("IRanges",
+           "IRanges",
+           "IRangesList")
+importMethodsFrom("IRanges",
+                  "subsetByOverlaps")
+import("GenomicRanges")
+import("GenomicFeatures")
+
 ## AnnotationHub
-importFrom(AnnotationHub, AnnotationHub)
-importClassesFrom(AnnotationHub, AnnotationHub)
-importMethodsFrom(AnnotationHub, query, mcols)
+importFrom("AnnotationHub",
+           "AnnotationHub")
+importClassesFrom("AnnotationHub",
+                  "AnnotationHub")
+importMethodsFrom("AnnotationHub",
+                  "query",
+                  "mcols")
 ## Rsamtools
-importClassesFrom(Rsamtools, FaFile, RsamtoolsFile)
-importFrom(Rsamtools, FaFile)
-importMethodsFrom(Rsamtools, getSeq, indexFa, path)
-importFrom(Rsamtools, index)
-
-## biovizBase
-##importMethodsFrom(biovizBase, crunch)
-
-#exportPattern("^[[:alpha:]]+")
-export(fetchTablesFromEnsembl, makeEnsemblSQLiteFromTables, makeEnsembldbPackage,
-       ensDbFromGtf, ensDbFromGff, ensDbFromGRanges, ensDbFromAH, runEnsDbApp,
-       listEnsDbs)
-exportClasses(EnsDb, BasicFilter, EntrezidFilter, GeneidFilter, GenebiotypeFilter,
-              GenenameFilter, TxidFilter, TxbiotypeFilter, ExonidFilter,
-              SeqnameFilter, SeqstrandFilter, SeqstartFilter, SeqendFilter,
-              GRangesFilter, ExonrankFilter, SymbolFilter)
-## for EnsFilter
-exportMethods(column, print, show, value, where, "condition<-", "value<-",
-              seqnames, start, end, strand, seqlevels)
-## for class EnsDb:
-exportMethods(dbconn, condition, buildQuery, ensemblVersion, exons, exonsBy, genes,
-              getGenomeFaFile, lengthOf, listColumns, listGenebiotypes, listTxbiotypes,
-              listTables, organism, seqinfo, toSAF, transcripts, transcriptsBy,
-              disjointExons, metadata, promoters, cdsBy, fiveUTRsByTranscript,
-              threeUTRsByTranscript, getGeneRegionTrackForGviz, updateEnsDb,
-              transcriptsByOverlaps, exonsByOverlaps, returnFilterColumns,
-              "returnFilterColumns<-", useMySQL)
+importClassesFrom("Rsamtools",
+                  "FaFile",
+                  "RsamtoolsFile")
+importFrom("Rsamtools",
+           "FaFile")
+importMethodsFrom("Rsamtools",
+                  "getSeq",
+                  "indexFa",
+                  "path")
+importFrom("Rsamtools",
+           "index")
+
+## Stuff needed for protein annotations.
+importClassesFrom("Biostrings",
+                  "AAStringSet")
+importFrom("Biostrings",
+           "AAStringSet")
+
+importFrom("curl",
+           "curl")
+
+## AnnotationFilter
+importClassesFrom("AnnotationFilter",
+                  "AnnotationFilter",
+                  "CharacterFilter",
+                  "IntegerFilter",
+                  "ExonIdFilter",
+                  "ExonRankFilter",
+                  "ExonStartFilter",
+                  "ExonEndFilter",
+                  "GeneIdFilter",
+                  "GenenameFilter",
+                  "GeneBiotypeFilter",
+                  "GeneStartFilter",
+                  "GeneEndFilter",
+                  "EntrezFilter",
+                  "SymbolFilter",
+                  "TxIdFilter",
+                  "TxNameFilter",
+                  "TxBiotypeFilter",
+                  "TxStartFilter",
+                  "TxEndFilter",
+                  "ProteinIdFilter",
+                  "UniprotFilter",
+                  "SeqNameFilter",
+                  "SeqStrandFilter",
+                  "GRangesFilter",
+                  "AnnotationFilterList"
+                  )
+importMethodsFrom("AnnotationFilter",
+                  "field",
+                  "value",
+                  "condition",
+                  "supportedFilters")
+importFrom("AnnotationFilter",
+           "AnnotationFilter",
+           "ExonIdFilter",
+           "ExonRankFilter",
+           "ExonStartFilter",
+           "ExonEndFilter",
+           "GeneIdFilter",
+           "GenenameFilter",
+           "GeneBiotypeFilter",
+           "GeneStartFilter",
+           "GeneEndFilter",
+           "EntrezFilter",
+           "SymbolFilter",
+           "TxIdFilter",
+           "TxNameFilter",
+           "TxBiotypeFilter",
+           "TxStartFilter",
+           "TxEndFilter",
+           "ProteinIdFilter",
+           "UniprotFilter",
+           "SeqNameFilter",
+           "SeqStrandFilter",
+           "GRangesFilter",
+           "AnnotationFilterList",
+           "feature"
+           )
+
+## Functions
+export("ensDbFromAH",
+       "ensDbFromGff",
+       "ensDbFromGRanges",
+       "ensDbFromGtf",
+       "fetchTablesFromEnsembl",
+       "listEnsDbs",
+       "makeEnsemblSQLiteFromTables",
+       "makeEnsembldbPackage",
+       "runEnsDbApp"
+       )
+## Classes
+exportClasses(
+              "EnsDb",
+              "ProtDomIdFilter",
+              "UniprotDbFilter",
+              "UniprotMappingTypeFilter",
+              "OnlyCodingTxFilter"
+              )
+## Methods for EnsFilter
+exportMethods(
+    "seqnames",
+    "seqlevels",
+    "show"
+)
+## Methods for class EnsDb:
+exportMethods("cdsBy",
+              "dbconn",
+              "disjointExons",
+              "ensemblVersion",
+              "exons",
+              "exonsBy",
+              "exonsByOverlaps",
+              "fiveUTRsByTranscript",
+              "genes",
+              "getGeneRegionTrackForGviz",
+              "getGenomeFaFile",
+              "lengthOf",
+              "listColumns",
+              "listGenebiotypes",
+              "listTxbiotypes",
+              "listTables",
+              "metadata",
+              "organism",
+              "promoters",
+              "returnFilterColumns",
+              "returnFilterColumns<-",
+              "seqinfo",
+              "threeUTRsByTranscript",
+              "toSAF",
+              "transcripts",
+              "transcriptsBy",
+              "transcriptsByOverlaps",
+              "updateEnsDb",
+              "useMySQL",
+              "supportedFilters"
+              )
+## Protein data related stuff
+exportMethods("hasProteinData",
+              "proteins",
+              "listUniprotDbs",
+              "listUniprotMappingTypes")
+export("listProteinColumns")
 ## Methods for AnnotationDbi
-exportMethods(columns, keytypes, keys, select, mapIds)
+exportMethods("columns",
+              "keytypes",
+              "keys",
+              "select",
+              "mapIds")
 ## Methods for GenomeInfoDb and related stuff
-exportMethods("seqlevelsStyle", "seqlevelsStyle<-", "supportedSeqlevelsStyles",
-              seqlevels)
+exportMethods("seqlevelsStyle",
+              "seqlevelsStyle<-",
+              "supportedSeqlevelsStyles",
+              "seqlevels")
 
-## constructors
-export(EntrezidFilter, GeneidFilter, GenenameFilter, GenebiotypeFilter, TxidFilter,
-       TxbiotypeFilter, ExonidFilter, SeqnameFilter, SeqstrandFilter, SeqstartFilter,
-       SeqendFilter, EnsDb, GRangesFilter, ExonrankFilter, SymbolFilter)
+## Constructor functions:
+export(
+    "EnsDb",
+    "EntrezidFilter",
+    "ExonidFilter",
+    "ExonrankFilter",
+    "GeneidFilter",
+    "GenebiotypeFilter",
+    "SeqnameFilter",
+    "SeqstrandFilter",
+    "SeqstartFilter",
+    "SeqendFilter",
+    "TxidFilter",
+    "TxbiotypeFilter",
+    "UniprotDbFilter",
+    "UniprotMappingTypeFilter",
+    "OnlyCodingTxFilter",
+    "ProtDomIdFilter"
+)
 
 
 
diff --git a/R/Classes.R b/R/Classes.R
index ffa266a..1ca0314 100644
--- a/R/Classes.R
+++ b/R/Classes.R
@@ -8,397 +8,361 @@
 setClass("EnsDb",
          representation(ensdb="DBIConnection", tables="list", .properties="list"),
          prototype=list(ensdb=NULL, tables=list(), .properties=list())
-        )
+         )
 
+#' @title Filters supported by ensembldb
+#'
+#' @description \code{ensembldb} supports most of the filters from the
+#'     \code{\link{AnnotationFilter}} package to retrieve specific content from
+#'     \code{\linkS4class{EnsDb}} databases.
+#'
+#' @note For users of \code{ensembldb} version < 2.0: in the
+#'     \code{\link[AnnotationFilter]{GRangesFilter}} from the
+#'     \code{AnnotationFilter} package the \code{condition} parameter was
+#'     renamed to \code{type} (to be consistent with the \code{IRanges}
+#'     package). In addition, \code{condition = "overlapping"} is no longer
+#'     recognized. To retrieve all features overlapping the range,
+#'     \code{type = "any"} has to be used.
+#'     
+#' @details \code{ensembldb} supports the following filters from the
+#' \code{AnnotationFilter} package:
+#' 
+#' \describe{
+#' 
+#' \item{GeneIdFilter}{
+#'     filter based on the Ensembl gene ID.
+#' }
+#'
+#' \item{GenenameFilter}{
+#'     filter based on the name of the gene as provided by Ensembl. In most cases
+#'     this will correspond to the official gene symbol.
+#' }
+#'
+#' \item{SymbolFilter}{
+#'     filter based on the gene names. \code{\linkS4class{EnsDb}} objects don't
+#'     have a dedicated \emph{symbol} column; the filtering is hence based on the
+#'     gene names.
+#' }
+#'
+#' \item{GeneBiotypeFilter}{
+#'     filter based on the biotype of genes (e.g. \code{"protein_coding"}).
+#' }
+#'
+#' \item{GeneStartFilter}{
+#'     filter based on the genomic start coordinate of genes.
+#' }
+#' 
+#' \item{GeneEndFilter}{
+#'     filter based on the genomic end coordinate of genes.
+#' }
+#' 
+#' \item{EntrezFilter}{
+#'     filter based on the genes' NCBI Entrezgene ID.
+#' }
+#' 
+#' \item{TxIdFilter}{
+#'     filter based on the Ensembl transcript ID.
+#' }
+#' 
+#' \item{TxNameFilter}{
+#'     filter based on the Ensembl transcript ID; no transcript names are
+#'     provided in \code{\linkS4class{EnsDb}} databases.
+#' }
+#' 
+#' \item{TxBiotypeFilter}{
+#'     filter based on the transcripts' biotype.
+#' }
+#' 
+#' \item{TxStartFilter}{
+#'     filter based on the genomic start coordinate of the transcripts.
+#' }
+#' 
+#' \item{TxEndFilter}{
+#'     filter based on the genomic end coordinates of the transcripts.
+#' }
+#' 
+#' \item{ExonIdFilter}{
+#'     filter based on Ensembl exon IDs.
+#' }
+#'
+#' \item{ExonRankFilter}{
+#'     filter based on the index/rank of the exon within the transcripts.
+#' }
+#'
+#' \item{ExonStartFilter}{
+#'     filter based on the genomic start coordinates of the exons.
+#' }
+#' 
+#' \item{ExonEndFilter}{
+#'     filter based on the genomic end coordinates of the exons.
+#' }
+#'
+#' \item{GRangesFilter}{
+#'     Allows fetching features within or overlapping the specified genomic region(s)/
+#'     range(s). This filter takes a \code{\link[GenomicRanges]{GRanges}} object
+#'     as input and, if \code{type = "any"} (the default), will restrict
+#'     results to features (genes, transcripts or exons) that at least partially
+#'     overlap the region. Alternatively, by specifying
+#'     \code{type = "within"} it will return features located within the
+#'     range. In addition, the \code{\link[AnnotationFilter]{GRangesFilter}}
+#'     supports \code{type = "start"}, \code{type = "end"} and
+#'     \code{type = "equal"} filtering for features with the same start or
+#'     end coordinate or that are equal to the \code{GRanges}.
+#'
+#'     Note that the type of feature on which the filter is applied depends on
+#'     the method that is called, i.e. \code{\link{genes}} will filter on the
+#'     genomic coordinates of genes, \code{\link{transcripts}} on those of
+#'     transcripts and \code{\link{exons}} on exon coordinates.
+#'
+#'     Calls to the methods \code{\link{exonsBy}}, \code{\link{cdsBy}} and
+#'     \code{\link{transcriptsBy}} use the start and end coordinates of the
+#'     feature type specified with argument \code{by} (i.e. \code{"gene"},
+#'     \code{"transcript"} or \code{"exon"}) for the filtering.
+#'
+#'     If the specified \code{GRanges} object defines multiple regions, all
+#'     features within (or overlapping) any of these regions are returned.
+#'
+#'     Chromosome names/seqnames can be provided in UCSC format (e.g.
+#'     \code{"chrX"}) or Ensembl format (e.g. \code{"X"}); see
+#'     \code{\link{seqlevelsStyle}} for more information. 
+#' }
+#'
+#' \item{SeqNameFilter}{
+#'     filter based on chromosome names.
+#' }
+#'
+#' \item{SeqStrandFilter}{
+#'     filter based on the chromosome strand. The strand can be specified with
+#'     \code{value = "+"}, \code{value = "-"}, \code{value = -1} or
+#'     \code{value = 1}.
+#' }
+#' 
+#' \item{ProteinIdFilter}{
+#'     filter based on Ensembl protein IDs. This filter is only supported if the
+#'     \code{\linkS4class{EnsDb}} provides protein annotations; use the
+#'     \code{\link{hasProteinData}} method to evaluate.
+#' }
+#'
+#' \item{UniprotFilter}{
+#'     filter based on Uniprot IDs. This filter is only supported if the
+#'     \code{\linkS4class{EnsDb}} provides protein annotations; use the
+#'     \code{\link{hasProteinData}} method to evaluate.
+#' }
+#'
+#' }
+#'
+#' In addition, the following filters are defined by \code{ensembldb}:
+#' \describe{
+#' 
+#' \item{UniprotDbFilter}{
+#'     filters results based on the specified Uniprot database name(s).
+#' }
+#' 
+#' \item{UniprotMappingTypeFilter}{
+#'     filters results based on the mapping method/type that was used
+#'     to assign Uniprot IDs to Ensembl protein IDs.
+#' }
+#'
+#' \item{ProtDomIdFilter}{
+#'     retrieves entries from the database matching the provided filter
+#'     criteria based on their protein domain ID (\emph{protein_domain_id}).
+#' }
+#'
+#' \item{OnlyCodingTxFilter}{
+#'     retrieves entries only for protein coding transcripts, i.e.
+#'     transcripts with a CDS. This filter does not take any input arguments.
+#' }
+#' 
+#' }
+#'
+#' @param condition \code{character(1)} specifying the \emph{condition} of the
+#'     filter. For \code{character}-based filters (such as
+#'     \code{\link[AnnotationFilter]{GeneIdFilter}}) \code{"=="}, \code{"!="},
+#'     \code{"startsWith"} and \code{"endsWith"} are supported. Allowed values
+#'     for \code{integer}-based filters (such as
+#'     \code{\link[AnnotationFilter]{GeneStartFilter}}) are \code{"=="},
+#'     \code{"!="}, \code{"<"}, \code{"<="}, \code{">"} and \code{">="}.
+#' 
+#' @param value The value(s) for the filter. For
+#'     \code{\link[AnnotationFilter]{GRangesFilter}} it has to be a
+#'     \code{\link[GenomicRanges]{GRanges}} object.
+#' 
+#' @note Protein annotation based filters can only be used if the
+#'     \code{\linkS4class{EnsDb}} database contains protein annotations, i.e.
+#'     if \code{\link{hasProteinData}} is \code{TRUE}. Also, only protein coding
+#'     transcripts will have protein annotations available, thus, non-coding
+#'     transcripts/genes will not be returned by the queries using protein
+#'     annotation filters.
+#' 
+#' @name Filter-classes
+#' @seealso
+#' \code{\link{supportedFilters}} to list all filters supported for \code{EnsDb}
+#'     objects.
+#'     \code{\link{listUniprotDbs}} and \code{\link{listUniprotMappingTypes}} to
+#'     list all Uniprot database names and mapping method types, respectively,
+#'     from the database.
+#'
+#'     \code{\link[AnnotationFilter]{GeneIdFilter}} for more details on the
+#'     filter objects.
+#'
+#'     \code{\link{genes}}, \code{\link{transcripts}}, \code{\link{exons}},
+#'     \code{\link{listGenebiotypes}}, \code{\link{listTxbiotypes}}.
+#' 
+#' @author Johannes Rainer
+#' @examples
+#'
+#' ## Create a filter that could be used to retrieve all information for
+#' ## the respective gene.
+#' gif <- GeneIdFilter("ENSG00000012817")
+#' gif
+#' 
+#' ## Create a filter for a chromosomal end position of a gene
+#' sef <- GeneEndFilter(10000, condition = ">")
+#' sef
+#' 
+#' ## For additional examples see the help page of "genes".
+#' 
+#' 
+#' ## Example for GRangesFilter:
+#' ## retrieve all genes overlapping the specified region
+#' grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
+#'                              strand = "+"), type = "any")
+#' library(EnsDb.Hsapiens.v75)
+#' edb <- EnsDb.Hsapiens.v75
+#' genes(edb, filter = grf)
+#' 
+#' ## Get also all transcripts overlapping that region.
+#' transcripts(edb, filter = grf)
+#' 
+#' ## Retrieve all transcripts for the above gene
+#' gn <- genes(edb, filter = grf)
+#' txs <- transcripts(edb, filter = GenenameFilter(gn$gene_name))
+#' ## Next we simply plot their start and end coordinates.
+#' plot(3, 3, pch=NA, xlim=c(start(gn), end(gn)), ylim=c(0, length(txs)),
+#' yaxt="n", ylab="")
+#' ## Highlight the GRangesFilter region
+#' rect(xleft=start(grf), xright=end(grf), ybottom=0, ytop=length(txs),
+#' col="red", border="red")
+#' for(i in 1:length(txs)){
+#'     current <- txs[i]
+#'     rect(xleft=start(current), xright=end(current), ybottom=i-0.975, ytop=i-0.125, border="grey")
+#'     text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
+#' }
+#' ## Thus, we can see that only 4 transcripts of that gene are indeed
+#' ## overlapping the region.
+#' 
+#' 
+#' ## No exon overlaps that region, thus nothing is returned.
+#' exons(edb, filter = grf)
+#' 
+#' 
+#' ## Example for ExonRankFilter
+#' ## Extract exons 1 and (if present) 2 of all transcripts encoded on the
+#' ## Y chromosome
+#' exons(edb, columns = c("tx_id", "exon_idx"),
+#'       filter=list(SeqNameFilter("Y"),
+#'                   ExonRankFilter(3, condition = "<")))
+#' 
+#' 
+#' ## Get all transcripts for the gene SKA2
+#' transcripts(edb, filter = GenenameFilter("SKA2"))
+#' 
+#' ## Which is the same as using a SymbolFilter
+#' transcripts(edb, filter = SymbolFilter("SKA2"))
+#' 
+#' 
+#' ## Create a ProteinIdFilter:
+#' pf <- ProteinIdFilter("ENSP00000362111")
+#' pf
+#' ## Using this filter would retrieve all database entries that are associated
+#' ## with a protein with the ID "ENSP00000362111"
+#' if (hasProteinData(edb)) {
+#'     res <- genes(edb, filter = pf)
+#'     res
+#' }
+#'
+#' ## UniprotFilter:
+#' uf <- UniprotFilter("O60762")
+#' ## Get the transcripts encoding that protein:
+#' if (hasProteinData(edb)) {
+#'     transcripts(edb, filter = uf)
+#'     ## The mapping Ensembl protein ID to Uniprot ID can however be 1:n:
+#'     transcripts(edb, filter = TxIdFilter("ENST00000371588"),
+#'         columns = c("protein_id", "uniprot_id"))
+#' }
+#'
+#' ## ProtDomIdFilter:
+#' pdf <- ProtDomIdFilter("PF00335")
+#' ## Also here we could get all transcripts related to that protein domain
+#' if (hasProteinData(edb)) {
+#'     transcripts(edb, filter = pdf, columns = "protein_id")
+#' }
+#'
+NULL
 
-##***********************************************************************
-##
-##     BasicFilter classes
-##
-##     Allow to filter the results fetched from the database.
-##
-##     gene:
-##     - GeneidFilter
-##     - GenebiotypeFilter
-##     - GenenameFilter
-##     - EntrezidFilter
-##
-##     transcript:
-##     - TxidFilter
-##     - TxbiotypeFilter
-##
-##     exon:
-##     - ExonidFilter
-##
-##     chrom position (using info from exon):
-##     - SeqnameFilter
-##     - SeqstartFilter
-##     - SeqendFilter
-##     - SeqstrandFilter
-##     alternative: GRangesFilter. See below.
+############################################################
+## OnlyCodingTxFilter
 ##
-##***********************************************************************
-setClass("BasicFilter",
-         representation(
-             "VIRTUAL",
-             condition="character",
-             value="character",
-             .valueIsCharacter="logical"
-            ),
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=TRUE
-            )
-        )
-
-## Table gene
-## filter for gene_id
-setClass("GeneidFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=TRUE
-            )
-        )
-GeneidFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        if(condition=="=")
-            condition="in"
-        if(condition=="!=")
-            condition="not in"
-    }
-    return(new("GeneidFilter", condition=condition, value=as.character(value)))
-}
-## filter for gene_biotype
-setClass("GenebiotypeFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=TRUE
-            )
-        )
-GenebiotypeFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        if(condition=="=")
-            condition="in"
-        if(condition=="!=")
-            condition="not in"
-    }
-    return(new("GenebiotypeFilter", condition=condition, value=as.character(value)))
-}
-## filter for gene_name
-setClass("GenenameFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=TRUE
-            )
-        )
-GenenameFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        if(condition=="=")
-            condition="in"
-        if(condition=="!=")
-            condition="not in"
-    }
-    return(new("GenenameFilter", condition=condition, value=as.character(value)))
-}
-## filter for entrezid
-setClass("EntrezidFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=TRUE
-            )
-        )
-EntrezidFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        if(condition=="=")
-            condition="in"
-        if(condition=="!=")
-            condition="not in"
-    }
-    return(new("EntrezidFilter", condition=condition, value=as.character(value)))
-}
-
-
-## Table transcript
-## filter for tx_id
-setClass("TxidFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=TRUE
-            )
-        )
-TxidFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        if(condition=="=")
-            condition="in"
-        if(condition=="!=")
-            condition="not in"
-    }
-    return(new("TxidFilter", condition=condition, value=as.character(value)))
-}
-## filter for gene_biotype
-setClass("TxbiotypeFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=TRUE
-            )
-        )
-TxbiotypeFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        if(condition=="=")
-            condition="in"
-        if(condition=="!=")
-            condition="not in"
-    }
-    return(new("TxbiotypeFilter", condition=condition, value=as.character(value)))
-}
-
-## Table exon
-## filter for exon_id
-setClass("ExonidFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=TRUE
-            )
-        )
-ExonidFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        if(condition=="=")
-            condition="in"
-        if(condition=="!=")
-            condition="not in"
-    }
-    return(new("ExonidFilter", condition=condition, value=as.character(value)))
-}
-
-## Table tx2exon
-## filter for exon_idx
-setClass("ExonrankFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=FALSE
-            )
-        )
-ExonrankFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(any(is.na(as.numeric(value))))
-        stop("Argument 'value' has to be numeric!")
-    if(length(value) > 1){
-        if(condition=="=")
-            condition="in"
-        if(condition=="!=")
-            condition="not in"
-    }
-    return(new("ExonrankFilter", condition=condition, value=as.character(value)))
-}
-
-
-## chromosome positions
-## basic chromosome/seqname filter.
-setClass("SeqnameFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=TRUE
-            )
-        )
-## builder...
-SeqnameFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        if(condition=="=")
-            condition="in"
-        if(condition=="!=")
-            condition="not in"
-    }
-    return(new("SeqnameFilter", condition=condition, value=as.character(value)))
-}
-
-## basic chromosome strand filter.
-setClass("SeqstrandFilter", contains="BasicFilter",
-         prototype=list(
-             condition="=",
-             value="",
-             .valueIsCharacter=FALSE
-            )
-        )
-## builder...
-SeqstrandFilter <- function(value, condition="="){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    ## checking value: should be +, -, will however be translated to -1, 1
-    if(class(value)=="character"){
-        value <- match.arg(value, c("1", "-1", "+1", "-", "+"))
-        if(value=="-")
-            value <- "-1"
-        if(value=="+")
-            value <- "+1"
-        ## OK, now transforming to number
-        value <- as.numeric(value)
-    }
-    if(!(value==1 | value==-1))
-        stop("The strand has to be either 1 or -1 (or \"+\" or \"-\")")
-    return(new("SeqstrandFilter", condition=condition, value=as.character(value)))
-}
-
-## chromstart filter
-setClass("SeqstartFilter", contains="BasicFilter",
-         representation(
-             feature="character"
-            ),
-         prototype=list(
-             condition=">",
-             value="",
-             .valueIsCharacter=FALSE,
-             feature="gene"
-            )
-        )
-SeqstartFilter <- function(value, condition="=", feature="gene"){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        value <- value[ 1 ]
-        warning("Multiple values provided, but only the first (", value,") will be considered")
-    }
-    return(new("SeqstartFilter", condition=condition, value=as.character(value),
-                feature=feature))
-}
-
-## chromend filter
-setClass("SeqendFilter", contains="BasicFilter",
-         representation(
-             feature="character"
-            ),
-         prototype=list(
-             condition="<",
-             value="",
-             .valueIsCharacter=FALSE,
-             feature="gene"
-            )
-        )
-SeqendFilter <- function(value, condition="=", feature="gene"){
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1){
-        value <- value[ 1 ]
-        warning("Multiple values provided, but only the first (", value,") will be considered")
-    }
-    return(new("SeqendFilter", condition=condition, value=as.character(value),
-                feature=feature))
+## That's a special case filter that just returns transcripts
+## that have tx_cds_seq_start defined (i.e. not NULL).
+#' @rdname Filter-classes
+setClass("OnlyCodingTxFilter", contains = "CharacterFilter",
+         prototype = list(
+             condition = "==",
+             value = character(),
+             field = "empty"
+         ))
+#' @rdname Filter-classes
+OnlyCodingTxFilter <- function() {
+    new("OnlyCodingTxFilter")
 }
 
-
-###============================================================
-##  GRangesFilter
-##  adding new arguments since we can not overwrite the data type
-##  of the BasicFilter class... unfortunately.
-##  + grange <- value
-##  + location <- condition
-###------------------------------------------------------------
-setClass("GRangesFilter", contains="BasicFilter",
-         representation(grange="GRanges",
-                        feature="character",
-                        location="character"),
-         prototype=list(
-             grange=GRanges(),
-             .valueIsCharacter=FALSE,
-             condition="=",
-             location="within",
-             feature="gene",
-             value=""
+############################################################
+## ProtDomIdFilter
+#' @rdname Filter-classes
+setClass("ProtDomIdFilter", contains = "CharacterFilter",
+         prototype = list(
+             condition = "==",
+             value = "",
+             field = "prot_dom_id"
          ))
-## Constructor
-GRangesFilter <- function(value, condition="within", feature="gene"){
-    if(missing(value))
-        stop("No value provided for the filter!")
-    if(!is(value, "GRanges"))
-        stop("'value' has to be a GRanges object!")
-    if(length(value) == 0)
-        stop("No value provided for the filter!")
-    ## if(length(value) > 1){
-    ##     warning(paste0("GRanges in 'value' has length ", length(value),
-    ##                    "! Using only the first element!"))
-    ##     value <- value[1]
-    ## }
-    grf <- new("GRangesFilter", grange=value, location=condition,
-               feature=feature)
-    ##validObject(grf)
-    return(grf)
+#' @return For \code{ProtDomIdFilter}: A \code{ProtDomIdFilter} object.
+#' @rdname Filter-classes
+ProtDomIdFilter <- function(value, condition = "==") {
+    new("ProtDomIdFilter", condition = condition,
+        value = as.character(value))
 }
-###------------------------------------------------------------
-
 
-###============================================================
-##  SymbolFilter
-###------------------------------------------------------------
-setClass("SymbolFilter", contains = "BasicFilter",
+############################################################
+## UniprotDbFilter
+#' @rdname Filter-classes
+setClass("UniprotDbFilter", contains = "CharacterFilter",
          prototype = list(
-             condition = "=",
-             value = "",
-             .valueIsCharacter = TRUE
-         )
-         )
-SymbolFilter <- function(value, condition = "=") {
-    if(missing(value)){
-        stop("A filter without a value makes no sense!")
-    }
-    if(length(value) > 1) {
-        if(condition == "=")
-            condition = "in"
-        if(condition == "!=")
-            condition = "not in"
-    }
-    return(new("SymbolFilter", condition = condition,
-               value = as.character(value)))
+             condition = "==",
+             value = "",
+             field = "uniprot_db"
+         ))
+#' @return For \code{UniprotDbFilter}: A \code{UniprotDbFilter} object.
+#' @rdname Filter-classes
+UniprotDbFilter <- function(value, condition = "==") {
+    new("UniprotDbFilter", condition = condition,
+        value = as.character(value))
 }
 
 ############################################################
-## OnlyCodingTx
-##
-## That's a special case filter that just returns transcripts
-## that have tx_cds_seq_start defined (i.e. not NULL).
-setClass("OnlyCodingTx", contains = "BasicFilter",
+## UniprotMappingTypeFilter
+#' @rdname Filter-classes
+setClass("UniprotMappingTypeFilter", contains = "CharacterFilter",
          prototype = list(
-             condition = "=",
-             value = "",
-             .valueIsCharacter = TRUE
+             condition = "==",
+             value = "",
+             field = "uniprot_mapping_type"
          ))
-OnlyCodingTx <- function() {
-    return(new("OnlyCodingTx"))
+#' @return For \code{UniprotMappingTypeFilter}: A
+#' \code{UniprotMappingTypeFilter} object.
+#' @rdname Filter-classes
+UniprotMappingTypeFilter <- function(value, condition = "==") {
+    new("UniprotMappingTypeFilter", condition = condition,
+        value = as.character(value))
 }
+
diff --git a/R/Deprecated.R b/R/Deprecated.R
new file mode 100644
index 0000000..5b5456f
--- /dev/null
+++ b/R/Deprecated.R
@@ -0,0 +1,182 @@
+## Deprecated functions.
+
+#' @aliases ensembldb-deprecated
+#' 
+#' @title Deprecated functionality
+#'
+#' @description All functions, methods and classes listed on this page are
+#' deprecated and might be removed in future releases.
+#'
+#' @param value The value for the filter.
+#' @param condition The condition for the filter.
+#' 
+#' @name Deprecated 
+NULL
+#> NULL
+
+#' @description \code{GeneidFilter} creates a \code{GeneIdFilter}. Use
+#' \code{\link[AnnotationFilter]{GeneIdFilter}} instead.
+#' 
+#' @rdname Deprecated
+GeneidFilter <- function(value, condition = "==") {
+    .Deprecated("GeneIdFilter")
+    if (missing(value))
+        stop("A filter without a value makes no sense!")
+    ## if(length(value) > 1){
+    ##     if(condition=="=")
+    ##         condition="in"
+    ##     if(condition=="!=")
+    ##         condition="not in"
+    ## }
+    return(new("GeneIdFilter", condition = condition,
+               value = as.character(value), field = "gene_id"))
+}
+
+#' @description \code{GenebiotypeFilter} creates a \code{GeneBiotypeFilter}. Use
+#' \code{\link[AnnotationFilter]{GeneBiotypeFilter}} instead.
+#' 
+#' @rdname Deprecated
+GenebiotypeFilter <- function(value, condition = "=="){
+    .Deprecated("GeneBiotypeFilter")
+    if(missing(value))
+        stop("A filter without a value makes no sense!")
+    return(new("GeneBiotypeFilter", condition = condition,
+               value = as.character(value), field = "gene_biotype"))
+}
+
+#' @description \code{EntrezidFilter} creates an \code{EntrezFilter}. Use
+#' \code{\link[AnnotationFilter]{EntrezFilter}} instead.
+#' 
+#' @rdname Deprecated
+EntrezidFilter <- function(value, condition = "=="){
+    .Deprecated("EntrezFilter")
+    if(missing(value))
+        stop("A filter without a value makes no sense!")
+    return(new("EntrezFilter", condition = condition,
+               value = as.character(value), field = "entrez"))
+}
+
+#' @description \code{TxidFilter} creates a \code{TxIdFilter}. Use
+#' \code{\link[AnnotationFilter]{TxIdFilter}} instead.
+#' 
+#' @rdname Deprecated
+TxidFilter <- function(value, condition = "=="){
+    .Deprecated("TxIdFilter")
+    if(missing(value))
+        stop("A filter without a value makes no sense!")
+    return(new("TxIdFilter", condition = condition,
+               value = as.character(value), field = "tx_id"))
+}
+
+#' @description \code{TxbiotypeFilter} creates a \code{TxBiotypeFilter}. Use
+#' \code{\link[AnnotationFilter]{TxBiotypeFilter}} instead.
+#' 
+#' @rdname Deprecated
+TxbiotypeFilter <- function(value, condition="=="){
+    .Deprecated("TxBiotypeFilter")
+    if(missing(value))
+        stop("A filter without a value makes no sense!")
+    return(new("TxBiotypeFilter", condition=condition,
+               value=as.character(value), field = "tx_biotype"))
+}
+
+#' @description \code{ExonidFilter} creates an \code{ExonIdFilter}. Use
+#' \code{\link[AnnotationFilter]{ExonIdFilter}} instead.
+#' 
+#' @rdname Deprecated
+ExonidFilter <- function(value, condition="=="){
+    .Deprecated("ExonIdFilter")
+    if(missing(value))
+        stop("A filter without a value makes no sense!")
+    return(new("ExonIdFilter", condition=condition,
+               value=as.character(value), field = "exon_id"))
+}
+
+#' @description \code{ExonrankFilter} creates an \code{ExonRankFilter}. Use
+#' \code{\link[AnnotationFilter]{ExonRankFilter}} instead.
+#' 
+#' @rdname Deprecated
+ExonrankFilter <- function(value, condition="=="){
+    .Deprecated("ExonRankFilter")
+    if(missing(value))
+        stop("A filter without a value makes no sense!")
+    return(new("ExonRankFilter", condition=condition, value=as.integer(value),
+               field = "exon_rank"))
+}
+
+#' @description \code{SeqnameFilter} creates a \code{SeqNameFilter}. Use
+#' \code{\link[AnnotationFilter]{SeqNameFilter}} instead.
+#' 
+#' @rdname Deprecated
+SeqnameFilter <- function(value, condition="=="){
+    .Deprecated("SeqNameFilter")
+    if(missing(value))
+        stop("A filter without a value makes no sense!")
+    return(new("SeqNameFilter", condition=condition, value=as.character(value),
+               field = "seq_name"))
+}
+
+#' @description \code{SeqstrandFilter} creates a \code{SeqStrandFilter}. Use
+#' \code{\link[AnnotationFilter]{SeqStrandFilter}} instead.
+#' 
+#' @rdname Deprecated
+SeqstrandFilter <- function(value, condition="=="){
+    .Deprecated("SeqStrandFilter")
+    if(missing(value)){
+        stop("A filter without a value makes no sense!")
+    }
+    ## checking value: should be +, -, will however be translated to -1, 1
+    if(class(value)=="character"){
+        value <- match.arg(value, c("1", "-1", "+1", "-", "+"))
+        if(value=="-")
+            value <- "-1"
+        if(value=="+")
+            value <- "+1"
+        ## OK, now transforming to number
+        value <- as.numeric(value)
+    }
+    if(!(value==1 | value==-1))
+        stop("The strand has to be either 1 or -1 (or \"+\" or \"-\")")
+    return(new("SeqStrandFilter", condition=condition, value=as.character(value),
+               field = "seq_strand"))
+}
+
+#' @description \code{SeqstartFilter} creates a \code{GeneStartFilter},
+#' \code{TxStartFilter} or \code{ExonStartFilter} depending on the value of the
+#' parameter \code{feature}. Use \code{\link[AnnotationFilter]{GeneStartFilter}},
+#' \code{\link[AnnotationFilter]{TxStartFilter}} and
+#' \code{\link[AnnotationFilter]{ExonStartFilter}} instead.
+#'
+#' @param feature For \code{SeqstartFilter} and \code{SeqendFilter}: on what type
+#' of feature should the filter be applied? Supported are \code{"gene"},
+#' \code{"tx"} and \code{"exon"}.
+#' 
+#' @rdname Deprecated
+SeqstartFilter <- function(value, condition=">", feature="gene"){
+    .Deprecated(msg = paste0("The use of 'SeqstartFilter' is deprecated. Use ",
+                             "one of 'GeneStartFilter', 'ExonStartFilter'",
+                             " or 'TxStartFilter' instead."))
+    feature <- match.arg(feature, c("gene", "exon", "tx"))
+    return(new(paste0(sub("^([[:alpha:]])", "\\U\\1", feature, perl=TRUE),
+                      "StartFilter"), value = as.integer(value),
+               condition = condition,
+               field = paste0(feature, "_start")))
+}
+
+#' @description \code{SeqendFilter} creates a \code{GeneEndFilter},
+#' \code{TxEndFilter} or \code{ExonEndFilter} depending on the value of the
+#' parameter \code{feature}. Use \code{\link[AnnotationFilter]{GeneEndFilter}},
+#' \code{\link[AnnotationFilter]{TxEndFilter}} and
+#' \code{\link[AnnotationFilter]{ExonEndFilter}} instead.
+#'
+#' @rdname Deprecated
+SeqendFilter <- function(value, condition="<", feature="gene"){
+    .Deprecated(msg = paste0("The use of 'SeqendFilter' is deprecated. Use ",
+                             "one of 'GeneEndFilter', 'ExonEndFilter'",
+                             " or 'TxEndFilter' instead."))
+    feature <- match.arg(feature, c("gene", "exon", "tx"))
+    return(new(paste0(sub("^([[:alpha:]])", "\\U\\1", feature, perl=TRUE),
+                      "EndFilter"), value = as.integer(value),
+               condition = condition,
+               field = paste0(feature, "_end")))
+}
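A minimal base-R sketch of how the deprecated SeqstartFilter() and SeqendFilter() wrappers derive the AnnotationFilter class name from their feature argument. featureFilterClass() is a hypothetical helper introduced only for illustration; it is not part of ensembldb or AnnotationFilter. The perl = TRUE argument is what makes the \U (upper-case) escape in the replacement string take effect.

```r
## Hypothetical helper (not part of ensembldb): mirrors the class-name
## construction in SeqstartFilter()/SeqendFilter().
featureFilterClass <- function(feature, suffix = c("StartFilter", "EndFilter")) {
    ## match.arg() with no explicit choices uses the default vector above
    suffix <- match.arg(suffix)
    feature <- match.arg(feature, c("gene", "exon", "tx"))
    ## "\\U\\1" upper-cases the captured first letter; requires perl = TRUE
    paste0(sub("^([[:alpha:]])", "\\U\\1", feature, perl = TRUE), suffix)
}

featureFilterClass("exon", "EndFilter")   # "ExonEndFilter"
featureFilterClass("tx")                  # "TxStartFilter"
```

Keeping the name construction in a single place would presumably stop the start and end wrappers from drifting apart in how they build the class name.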
diff --git a/R/Generics.R b/R/Generics.R
index d420a43..859e1bc 100644
--- a/R/Generics.R
+++ b/R/Generics.R
@@ -3,145 +3,112 @@
 ##     Generic methods
 ##
 ##***********************************************************************
-if(!isGeneric("column"))
-    setGeneric("column", function(object, db, with.tables, ...)
-        standardGeneric("column"))
-if(!isGeneric("buildQuery"))
-    setGeneric("buildQuery", function(x, ...)
-        standardGeneric("buildQuery"))
-if(!isGeneric("cleanColumns"))
-    setGeneric("cleanColumns", function(x, columns, ...)
-        starndardGeneric("cleanColumns"))
-if(!isGeneric("condition"))
-    setGeneric("condition", function(x, ...)
-        standardGeneric("condition"))
-setGeneric("condition<-", function(x, value)
-        standardGeneric("condition<-"))
+## A
+
+## B
+setGeneric("buildQuery", function(x, ...)
+    standardGeneric("buildQuery"))
+
+## C
+setGeneric("cleanColumns", function(x, columns, ...)
+    standardGeneric("cleanColumns"))
+
+## D
 setGeneric("dbSeqlevelsStyle", function(x, ...)
     standardGeneric("dbSeqlevelsStyle"))
 
-if(!isGeneric("genes"))
-    setGeneric("genes", function(x, ...)
-        standardGeneric("genes"))
-if(!isGeneric("getWhat"))
-    setGeneric("getWhat", function(x, ...)
-        standardGeneric("getWhat"))
-if(!isGeneric("ensemblVersion"))
-    setGeneric("ensemblVersion", function(x)
-        standardGeneric("ensemblVersion"))
-if(!isGeneric("exons"))
-    setGeneric("exons", function(x, ...)
-        standardGeneric("exons"))
-if(!isGeneric("exonsBy"))
-    setGeneric("exonsBy", function(x, ...)
-        standardGeneric("exonsBy"))
+## E
+setGeneric("ensemblVersion", function(x)
+    standardGeneric("ensemblVersion"))
+setGeneric("ensDbColumn", function(object, ...)
+    standardGeneric("ensDbColumn"))
+setGeneric("ensDbQuery", function(object, ...)
+    standardGeneric("ensDbQuery"))
 
+## F
+setGeneric("formatSeqnamesForQuery", function(x, sn, ...)
+    standardGeneric("formatSeqnamesForQuery"))
+setGeneric("formatSeqnamesFromQuery", function(x, sn, ...)
+    standardGeneric("formatSeqnamesFromQuery"))
+
+## G
 setGeneric("getGeneRegionTrackForGviz", function(x, ...)
     standardGeneric("getGeneRegionTrackForGviz"))
+setGeneric("getGenomeFaFile", function(x, ...)
+    standardGeneric("getGenomeFaFile"))
+setGeneric("getGenomeTwoBitFile", function(x, ...)
+    standardGeneric("getGenomeTwoBitFile"))
+setGeneric("getMetadataValue", function(x, name)
+    standardGeneric("getMetadataValue"))
+setGeneric("getProperty", function(x, name=NULL, ...)
+    standardGeneric("getProperty"))
+setGeneric("getWhat", function(x, ...)
+    standardGeneric("getWhat"))
 
-if(!isGeneric("getGenomeFaFile"))
-    setGeneric("getGenomeFaFile", function(x, ...)
-        standardGeneric("getGenomeFaFile"))
-if(!isGeneric("getGenomeTwoBitFile"))
-    setGeneric("getGenomeTwoBitFile", function(x, ...)
-        standardGeneric("getGenomeTwoBitFile"))
-if(!isGeneric("getMetadataValue"))
-    setGeneric("getMetadataValue", function(x, name)
-        standardGeneric("getMetadataValue"))
-if(!isGeneric("listColumns")){
-    setGeneric("listColumns", function(x, ...)
-        standardGeneric("listColumns"))
-}
-if(!isGeneric("listGenebiotypes")){
-    setGeneric("listGenebiotypes", function(x, ...)
-        standardGeneric("listGenebiotypes"))
-}
-if(!isGeneric("listTxbiotypes")){
-    setGeneric("listTxbiotypes", function(x, ...)
-        standardGeneric("listTxbiotypes"))
-}
-if(!isGeneric("lengthOf"))
-    setGeneric("lengthOf", function(x, ...)
-        standardGeneric("lengthOf"))
-if(!isGeneric("print"))
-    setGeneric("print", function(x, ...)
-        standardGeneric("print"))
-if(!isGeneric("requireTable"))
-    setGeneric("requireTable", function(x, db, ...)
-        standardGeneric("requireTable"))
+## H
+setGeneric("hasProteinData", function(x)
+    standardGeneric("hasProteinData"))
 
-setGeneric("supportedSeqlevelsStyles", function(x)
-           standardGeneric("supportedSeqlevelsStyles"))
+## L
+setGeneric("lengthOf", function(x, ...)
+    standardGeneric("lengthOf"))
+setGeneric("listColumns", function(x, ...)
+    standardGeneric("listColumns"))
+setGeneric("listGenebiotypes", function(x, ...)
+    standardGeneric("listGenebiotypes"))
+setGeneric("listTables", function(x, ...)
+    standardGeneric("listTables"))
+setGeneric("listTxbiotypes", function(x, ...)
+    standardGeneric("listTxbiotypes"))
+setGeneric("listUniprotDbs", function(object, ...)
+    standardGeneric("listUniprotDbs"))
+setGeneric("listUniprotMappingTypes", function(object, ...)
+    standardGeneric("listUniprotMappingTypes"))
+
+## O
+setGeneric("orderResultsInR", function(x)
+    standardGeneric("orderResultsInR"))
+setGeneric("orderResultsInR<-", function(x, value)
+    standardGeneric("orderResultsInR<-"))
 
-if(!isGeneric("seqinfo"))
-    setGeneric("seqinfo", function(x)
-        standardGeneric("seqinfo"))
-if(!isGeneric("show"))
-    setGeneric("show", function(object, ...)
-        standardGeneric("show"))
-if(!isGeneric("toSAF"))
-    setGeneric("toSAF", function(x, ...)
-        standardGeneric("toSAF"))
-if(!isGeneric("listTables")){
-    setGeneric("listTables", function(x, ...)
-        standardGeneric("listTables"))
-}
+## P
+setGeneric("print", function(x, ...)
+    standardGeneric("print"))
+setGeneric("properties", function(x, ...)
+    standardGeneric("properties"))
 
+## R
+setGeneric("requireTable", function(x, db, ...)
+    standardGeneric("requireTable"))
 setGeneric("returnFilterColumns", function(x)
     standardGeneric("returnFilterColumns"))
 setGeneric("returnFilterColumns<-", function(x, value)
     standardGeneric("returnFilterColumns<-"))
 
-if(!isGeneric("tablesByDegree")){
-    setGeneric("tablesByDegree", function(x, ...)
-        standardGeneric("tablesByDegree"))
-}
-if(!isGeneric("tablesForColumns"))
-    setGeneric("tablesForColumns", function(x, attributes, ...)
-        standardGeneric("tablesForColumns"))
+## S
+setGeneric("seqinfo", function(x)
+    standardGeneric("seqinfo"))
+setGeneric("setProperty", function(x, value=NULL, ...)
+    standardGeneric("setProperty"))
+## setGeneric("show", function(object, ...)
+##     standardGeneric("show"))
+setGeneric("supportedSeqlevelsStyles", function(x)
+    standardGeneric("supportedSeqlevelsStyles"))
 
-if(!isGeneric("transcriptLengths"))
-    setGeneric("transcriptLengths", function(x, with.cds_len=FALSE,
-                                             with.utr5_len=FALSE,
-                                             with.utr3_len=FALSE, ...)
-        standardGeneric("transcriptLengths"))
+## T
+setGeneric("tablesByDegree", function(x, ...)
+    standardGeneric("tablesByDegree"))
+setGeneric("tablesForColumns", function(x, attributes, ...)
+    standardGeneric("tablesForColumns"))
+setGeneric("toSAF", function(x, ...)
+    standardGeneric("toSAF"))
+## setGeneric("transcriptLengths", function(x, with.cds_len=FALSE,
+##                                          with.utr5_len=FALSE,
+##                                          with.utr3_len=FALSE, ...)
+##     standardGeneric("transcriptLengths"))
 
-if(!isGeneric("transcripts"))
-    setGeneric("transcripts", function(x, ...)
-        standardGeneric("transcripts"))
-if(!isGeneric("transcriptsBy"))
-    setGeneric("transcriptsBy", function(x, ...)
-        standardGeneric("transcriptsBy"))
+## U
 setGeneric("updateEnsDb", function(x, ...)
     standardGeneric("updateEnsDb"))
-##if(!isGeneric("value"))
-    setGeneric("value", function(x, db, ...)
-        standardGeneric("value"))
-setGeneric("value<-", function(x, value)
-    standardGeneric("value<-"))
-if(!isGeneric("where"))
-    setGeneric("where", function(object, db, with.tables, ...)
-        standardGeneric("where"))
-
-####============================================================
-##  Private methods
-##
-####------------------------------------------------------------
-setGeneric("properties", function(x, ...)
-    standardGeneric("properties"))
-## setGeneric("properties<-", function(x, name, value, ...)
-##             standardGeneric("properties<-"))
-setGeneric("getProperty", function(x, name=NULL, ...)
-    standardGeneric("getProperty"))
-setGeneric("setProperty", function(x, value=NULL, ...)
-    standardGeneric("setProperty"))
-setGeneric("formatSeqnamesForQuery", function(x, sn, ...)
-    standardGeneric("formatSeqnamesForQuery"))
-setGeneric("formatSeqnamesFromQuery", function(x, sn, ...)
-    standardGeneric("formatSeqnamesFromQuery"))
-setGeneric("orderResultsInR", function(x)
-           standardGeneric("orderResultsInR"))
-setGeneric("orderResultsInR<-", function(x, value)
-           standardGeneric("orderResultsInR<-"))
 setGeneric("useMySQL", function(x, host = "localhost", port = 3306, user, pass)
-           standardGeneric("useMySQL"))
+    standardGeneric("useMySQL"))
diff --git a/R/Methods-Filter.R b/R/Methods-Filter.R
index ebae8c0..edb08b2 100644
--- a/R/Methods-Filter.R
+++ b/R/Methods-Filter.R
@@ -1,933 +1,292 @@
-##***********************************************************************
-##
-##     Methods for BasicFilter classes.
-##
-##***********************************************************************
-validateConditionFilter <- function(object){
-    if(object@.valueIsCharacter){
-        ## condition has to be either = or in
-        if(!any(c("=", "in", "not in", "like", "!=")==object@condition)){
-            return(paste("only \"=\", \"!=\", \"in\" , \"not in\" and \"like\"",
-                         "allowed for condition",
-                         ", I've got", object@condition))
-        }
-    }else{
-        ## condition has to be = < > >= <=
-        if(!any(c("=", ">", "<", ">=", "<=", "in", "not in")==object@condition)){
-            return(paste("only \"=\", \">\", \"<\", \">=\", \"<=\" , \"in\" and \"not in\"",
-                         " are allowed for condition, I've got", object@condition))
-        }
-    }
-    if(length(object@value) > 1){
-        if(any(!object@condition %in% c("in", "not in")))
-            return(paste("only \"in\" and \"not in\" are allowed if value",
-                         "is a vector with more than one value!"))
-    }
-    if(!object@.valueIsCharacter){
-        vals <- object@value
-        if(length(vals) == 1){
-            if(vals == ""){
-                vals <- "0"
-            }
-        }
-        ## value has to be numeric!!!
-        suppressWarnings(
-            if(any(is.na(is.numeric(vals))))
-                return(paste("value has to be numeric!!!"))
-        )
-    }
-    return(TRUE)
-}
-setValidity("BasicFilter", validateConditionFilter)
-setMethod("initialize", "BasicFilter", function(.Object, ...){
-    OK <- validateConditionFilter(.Object)
-    if(class(OK)=="character"){
-        stop(OK)
-    }
-    callNextMethod(.Object, ...)
-})
-
-.where <- function(object, db=NULL){
-    if(is.null(db)){
-        Vals <- value(object)
-    }else{
-        Vals <- value(object, db)
-    }
-    ## if not a number we have to single quote!
-    if(object@.valueIsCharacter){
-        Vals <- sQuote(gsub(unique(Vals),pattern="'",replacement="''"))
-    }else{
-        Vals <- unique(Vals)
-    }
-    ## check, if there are more than one, concatenate in that case on put () aroung
-    if(length(Vals) > 1){
-        Vals <- paste0("(", paste(Vals, collapse=",") ,")")
-    }
-    return(paste(condition(object), Vals))
-}
-setMethod("where", signature(object="BasicFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-    return(.where(object))
-})
-setMethod("where", signature(object="BasicFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-    return(.where(object, db=db))
-})
-setMethod("where", signature(object="BasicFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-    return(.where(object, db=db))
-})
-setMethod("condition", "BasicFilter", function(x, ...){
-    if(length(unique(value(x))) > 1){
-        if(x@condition=="in" | x@condition=="not in")
-            return(x@condition)
-        if(x@condition=="!="){
-            return("not in")
-        }else if(x at condition=="="){
-            return("in")
-        }else{
-            stop("With more than 1 value only conditions \"=\" and \"!=\" are allowed!")
-        }
-    }else{
-        ## check first if we do have "in" or "not in" and if
-        ## cast it to a = and != respectively
-        if(x@condition=="in")
-            return("=")
-        if(x@condition=="not in")
-            return("!=")
-        return(x@condition)
-    }
-})
-setReplaceMethod("condition", "BasicFilter", function(x, value){
-    if(x@.valueIsCharacter){
-        allowed <- c("=", "!=", "in", "not in", "like")
-        if(!any(allowed == value)){
-            stop("Only ", paste(allowed, collapse=", "), " are allowed if the value from",
-                 " the filter is of type character!")
-        }
-        if(value == "=" & length(x@value) > 1)
-            value <- "in"
-        if(value == "!=" & length(x@value) > 1)
-            value <- "not in"
-        if(value == "in" & length(x@value) == 1)
-            value <- "="
-        if(value == "not in" & length(x@value) == 1)
-            value <- "!="
-    }else{
-        allowed <- c("=", ">", "<", ">=", "<=")
-        if(!any(allowed == value)){
-            stop("Only ", paste(allowed, collapse=", "), " are allowed if the value from",
-                 " the filter is numeric!")
-        }
-    }
-    x@condition <- value
-    validObject(x)
-    return(x)
-})
-setMethod("value", signature(x="BasicFilter", db="missing"),
-          function(x, db, ...){
-              return(x@value)
-          })
-setMethod("value", signature(x="BasicFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(x@value)
-          })
-setReplaceMethod("value", "BasicFilter", function(x, value){
-    if(is.numeric(value)){
-        x@.valueIsCharacter <- FALSE
-    }else{
-        x@.valueIsCharacter <- TRUE
-    }
-    x@value <- as.character(value)
-    ## Checking if condition matches the value.
-    if(length(value) > 1){
-        if(x@condition == "=")
-            x@condition <- "in"
-        if(x@condition == "!=")
-            x@condition <- "not in"
-    }else{
-        if(x@condition == "in")
-            x@condition <- "="
-        if(x@condition == "not in")
-            x@condition <- "!="
-    }
-    ## Test validity
-    validObject(x)
-    return(x)
-})
-## setMethod("requireTable", "EnsFilter", function(object, ...){
-##     return(object@required.table)
-## })
-setMethod("print", "BasicFilter", function(x, ...){
-    show(x)
-})
-setMethod("show", "BasicFilter", function(object){
-    cat("| Object of class:", class(object), "\n")
-    cat("| condition:", object@condition, "\n")
-    cat("| value:", value(object), "\n")
-})
-
-##***********************************************************************
-##
-##     where for a list.
-##
-##***********************************************************************
-setMethod("where", signature(object="list",db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              wherequery <- paste(" where", paste(unlist(lapply(object, where)),
-                                                  collapse=" and "))
-              return(wherequery)
-          })
-setMethod("where", signature(object="list",db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              wherequery <- paste(" where", paste(unlist(lapply(object, where, db)),
-                                                  collapse=" and "))
-              return(wherequery)
-          })
-setMethod("where", signature(object="list",db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              wherequery <- paste(" where", paste(unlist(lapply(object, where, db,
-                                                                with.tables=with.tables)),
-                                                  collapse=" and "))
-              return(wherequery)
-          })
-
-
-
-##***********************************************************************
-##
-##     Methods for GeneidFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="GeneidFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="GeneidFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              return("gene_id")
-          })
-setMethod("where", signature(object="GeneidFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="GeneidFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="GeneidFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables), suff))
-          })
-setMethod("column", signature("GeneidFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-
-
-
-##***********************************************************************
-##
-##     Methods for EntrezidFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="EntrezidFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="EntrezidFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              return("entrezid")
-          })
-setMethod("where", signature(object="EntrezidFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="EntrezidFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="EntrezidFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature("EntrezidFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-
-
-##***********************************************************************
-##
-##     Methods for GenebiotypeFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="GenebiotypeFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="GenebiotypeFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              return("gene_biotype")
-          })
-setMethod("where", signature(object="GenebiotypeFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="GenebiotypeFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="GenebiotypeFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="GenebiotypeFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-
-
-
-##***********************************************************************
-##
-##     Methods for GenenameFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="GenenameFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="GenenameFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              return("gene_name")
-          })
-setMethod("where", signature(object="GenenameFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="GenenameFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="GenenameFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables="character", ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="GenenameFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-
-
-
-
-##***********************************************************************
-##
-##     Methods for TxidFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="TxidFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="TxidFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              return("tx_id")
-          })
-setMethod("where", signature(object="TxidFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="TxidFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="TxidFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="TxidFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-
-
-
-
-##***********************************************************************
-##
-##     Methods for TxbiotypeFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="TxbiotypeFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="TxbiotypeFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables...){
-              return("tx_biotype")
-          })
-setMethod("where", signature(object="TxbiotypeFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="TxbiotypeFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="TxbiotypeFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="TxbiotypeFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-
-
-
-
-##***********************************************************************
-##
-##     Methods for ExonidFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="ExonidFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="ExonidFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              return("exon_id")
-          })
-setMethod("where", signature(object="ExonidFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="ExonidFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="ExonidFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="ExonidFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-
-
-##***********************************************************************
-##
-##     Methods for ExonrankFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="ExonrankFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="ExonrankFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              return("exon_idx")
-          })
-setMethod("where", signature(object="ExonrankFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="ExonrankFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="ExonrankFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="ExonrankFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-setReplaceMethod("value", "ExonrankFilter", function(x, value){
-    if(any(is.na(as.numeric(value))))
-        stop("Argument 'value' has to be numeric!")
-    x@value <- value
-    validObject(x)
-    return(x)
-})
-
-
-##***********************************************************************
-##
-##     Methods for SeqnameFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="SeqnameFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="SeqnameFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              return("seq_name")
-          })
-setMethod("where", signature(object="SeqnameFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="SeqnameFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="SeqnameFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="SeqnameFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-## Overwriting the value method allows us to fix chromosome names (e.g. with prefix chr)
-## to be usable for EnsDb and Ensembl based chromosome names (i.e. without chr).
-setMethod("value", signature(x="SeqnameFilter", db="EnsDb"),
-          function(x, db, ...){
-              val <- formatSeqnamesForQuery(db, value(x))
-              if(any(is.na(val))){
-                  stop("A value of <NA> is not allowed for a SeqnameFilter!")
+## Methods for filter classes.
+
+#' @description Extract the field/column name from an AnnotationFilter and
+#'     ensure that it matches the correct database column name. Depending on
+#'     whether argument \code{db} is present, the column names are also
+#'     prefixed with the name of the corresponding table.
+#'
+#' @param object An \code{AnnotationFilter} object.
+#'
+#' @param db An \code{EnsDb} object.
+#'
+#' @param with.tables \code{character} specifying the tables that should be
+#'     considered when prefixing the column name.
+#' 
+#' @noRd
+setMethod("ensDbColumn", "AnnotationFilter",
+          function(object, db, with.tables = character()) {
+              clmn <- .fieldInEnsDb(object@field)
+              if (missing(db))
+                  return(clmn)
+              if (length(with.tables) == 0)
+                  with.tables <- names(listTables(db))
+              unlist(prefixColumns(db, clmn, with.tables = with.tables),
+                     use.names = FALSE)
+          })
+
+setMethod("ensDbColumn", "AnnotationFilterList",
+          function(object, db, with.tables = character()) {
+              if (length(object) == 0)
+                  return(character())
+              unique(unlist(lapply(object, ensDbColumn, db,
+                                   with.tables = with.tables)))
+          })
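The `ensDbColumn` methods above map a filter field to a database column and, when an `EnsDb` is supplied, qualify it with a table name via `prefixColumns()`. The following is a minimal, self-contained sketch of that qualification step. It assumes that the first table (in `with.tables` order) containing the column is the one used; `qualify_column()` and the `schema` list are hypothetical illustrations, not ensembldb functions or the real EnsDb schema.

```r
## Hypothetical sketch: qualify a column name with the table that holds it.
## `tables` mimics a schema (table name -> column names); it is NOT the real
## EnsDb schema, and qualify_column() is not part of ensembldb.
qualify_column <- function(column, tables, with.tables = character()) {
    ## Default to all known tables, mirroring the
    ## `length(with.tables) == 0` branch of the ensDbColumn method.
    if (length(with.tables) == 0)
        with.tables <- names(tables)
    ## Assumption: the first table (in with.tables order) containing the
    ## column is used to prefix it.
    for (tbl in with.tables) {
        if (column %in% tables[[tbl]])
            return(paste0(tbl, ".", column))
    }
    stop("Column '", column, "' not found in tables: ",
         paste(with.tables, collapse = ", "))
}

schema <- list(gene = c("gene_id", "gene_name", "seq_name"),
               tx   = c("tx_id", "gene_id", "tx_biotype"))
qualify_column("gene_id", schema)                      # "gene.gene_id"
qualify_column("gene_id", schema, with.tables = "tx")  # "tx.gene_id"
```

Restricting `with.tables` changes which table a shared column such as `gene_id` is attributed to, which is why the methods pass it through rather than always using every table.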
+
+#' @description Build the \emph{where} query for an \code{AnnotationFilter} or
+#'     \code{AnnotationFilterList}.
+#'
+#' @noRd
+setMethod("ensDbQuery", "AnnotationFilter",
+          function(object, db, with.tables = character()) {
+              .queryForEnsDbWithTables(object, db, with.tables)
+          })
+
+setMethod("ensDbQuery", "AnnotationFilterList",
+          function(object, db, with.tables = character()) {
+              wq <- NULL
+              if (length(object)) {
+                  wq <- ensDbQuery(object[[1]], db, with.tables = with.tables)
+                  if (length(object) > 1) {
+                      for (i in 2:length(object)) {
+                          wq <- paste(wq, .logOp2SQL(object@logOp[(i - 1)]),
+                                      ensDbQuery(object[[i]], db,
+                                                 with.tables = with.tables))
+                      }
+                  }
+                  ## Wrap the whole expression in parentheses.
+                  wq <- paste0("(", wq, ")")
               }
-              return(val)
-              ##return(ucscToEns(value(x)))
-          })
-
-
-##***********************************************************************
-##
-##     Methods for SeqstrandFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="SeqstrandFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="SeqstrandFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              return("seq_strand")
-          })
-setMethod("where", signature(object="SeqstrandFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="SeqstrandFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="SeqstrandFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="SeqstrandFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-
-
-
-##***********************************************************************
-##
-##     Methods for SeqstartFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="SeqstartFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="SeqstartFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              ## assuming that we follow the naming convention:
-              ## <feature>_seq_end for the naming of the database columns.
-              feature <- object@feature
-              feature <- match.arg(feature, c("gene", "transcript", "exon", "tx"))
-              if(object@feature=="transcript")
-                  feature <- "tx"
-              return(paste0(feature, "_seq_start"))
-          })
-setMethod("where", signature(object="SeqstartFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="SeqstartFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="SeqstartFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="SeqstartFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
-          })
-
-
-
-##***********************************************************************
-##
-##     Methods for SeqendFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object="SeqendFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object), suff))
-          })
-setMethod("column", signature(object="SeqendFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              ## assuming that we follow the naming convention:
-              ## <feature>_seq_end for the naming of the database columns.
-              feature <- object@feature
-              feature <- match.arg(feature, c("gene", "transcript", "exon", "tx"))
-              if(object@feature=="transcript")
-                  feature <- "tx"
-              return(paste0(feature, "_seq_end"))
-          })
-setMethod("where", signature(object="SeqendFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("column", signature(object="SeqendFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="SeqendFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              suff <- callNextMethod()
-              return(paste(column(object, db, with.tables=with.tables), suff))
-          })
-setMethod("column", signature(object="SeqendFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                            use.names=FALSE))
+              wq
+          })
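The `AnnotationFilterList` method for `ensDbQuery` chains the SQL fragment of each filter with the SQL form of the list's logical operators and wraps the result in parentheses. The sketch below reproduces that chaining in isolation. `log_op_to_sql()` is a hypothetical stand-in for the internal `.logOp2SQL()` helper, and its mapping of `"&"`/`"|"` to `and`/`or` is an assumption.

```r
## Hypothetical stand-in for .logOp2SQL(); the "&" -> "and", "|" -> "or"
## mapping is an assumption.
log_op_to_sql <- function(op) {
    switch(op, "&" = "and", "|" = "or",
           stop("Unsupported logical operator: ", op))
}

## Chain per-filter SQL fragments left to right, as the method does.
combine_conditions <- function(conditions, logOp) {
    if (length(conditions) == 0)
        return(NULL)
    ## One operator is expected between each pair of consecutive conditions.
    stopifnot(length(logOp) == length(conditions) - 1)
    wq <- conditions[1]
    for (i in seq_along(logOp))
        wq <- paste(wq, log_op_to_sql(logOp[i]), conditions[i + 1])
    ## Parentheses keep the expression intact when it is nested inside a
    ## larger where clause.
    paste0("(", wq, ")")
}

combine_conditions(c("gene.gene_id = 'ENSG00000000003'",
                     "gene.seq_name = 'X'"), "&")
## "(gene.gene_id = 'ENSG00000000003' and gene.seq_name = 'X')"
```

Because SQL gives `and` higher precedence than `or`, the outer parentheses are what allow a nested `AnnotationFilterList` to behave as a single term in the enclosing query.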
+
+#' A dedicated ensDbQuery method for SeqNameFilter is needed to support
+#' different chromosome naming styles.
+#' 
+#' @noRd
+setMethod("ensDbQuery", "SeqNameFilter",
+          function(object, db, with.tables = character()) {
+              ## val <- sQuote(value(object, db))
+              ## Doing all the stuff in here:
+              vals <- value(object)
+              clmn <- .fieldInEnsDb(field(object))
+              if (!missing(db)) {
+                  ## o If necessary, rename the seqnames based on the seqlevelsStyle.
+                  vals <- formatSeqnamesForQuery(db, vals)
+                  if (length(with.tables) == 0)
+                      with.tables <- names(listTables(db))
+                  clmn <- unlist(prefixColumns(db, clmn,
+                                               with.tables = with.tables))
+              }
+              ## o Quote the values.
+              vals <- sQuote(vals)
+              ## o Concatenate values.
+              if (length(vals) > 1)
+                  vals <- paste0("(", paste0(vals, collapse = ","), ")")
+              paste(clmn, .conditionForEnsDb(object), vals)
+          })
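The `SeqNameFilter` method converts chromosome names to the database style, quotes them and joins multiple values into a parenthesised list. The sketch below shows those steps under stated assumptions. Stripping a `"chr"` prefix is only a crude stand-in for `formatSeqnamesForQuery()`, whose actual behaviour depends on `seqlevelsStyle()`; for example, it does not map `"chrM"` to Ensembl's `"MT"`. The mapping of `"=="`/`"!="` to `=`/`in`/`!=`/`not in` is likewise an assumption about what `.conditionForEnsDb()` returns. Quotes are built explicitly because base R's `sQuote()` follows `getOption("useFancyQuotes")` and can emit typographic quotes. `seqname_condition()` is hypothetical.

```r
## Hypothetical sketch of building the seq_name condition for a query.
seqname_condition <- function(column, values, condition = "==") {
    stopifnot(condition %in% c("==", "!="))
    ## Assumption: Ensembl-style names lack the UCSC "chr" prefix.
    values <- sub("^chr", "", values)
    ## Plain ASCII single quotes; embedded quotes are doubled per SQL rules.
    values <- paste0("'", gsub("'", "''", values, fixed = TRUE), "'")
    if (length(values) > 1) {
        ## Several values: build an (a,b,...) list for "in" / "not in".
        values <- paste0("(", paste0(values, collapse = ","), ")")
        op <- if (condition == "==") "in" else "not in"
    } else {
        op <- if (condition == "==") "=" else "!="
    }
    paste(column, op, values)
}

seqname_condition("gene.seq_name", c("chrX", "chrY"))  # "gene.seq_name in ('X','Y')"
seqname_condition("gene.seq_name", "chr1", "!=")       # "gene.seq_name != '1'"
```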
+
+setMethod("ensDbQuery", "SeqStrandFilter",
+          function(object, db, with.tables = character()) {
+              ## We have to ensure that value is converted to +1, -1.
+              val <- strand2num(value(object))
+              clmn <- .fieldInEnsDb(field(object))
+              if (!missing(db)) {
+                  if (length(with.tables) == 0)
+                      with.tables <- names(listTables(db))
+                  clmn <- unlist(prefixColumns(db, clmn,
+                                               with.tables = with.tables))
+              }
+              paste(clmn, .conditionForEnsDb(object), val)
           })
 
 
-###============================================================
-##    Methods for GRangesFilter
-##    + show
-##    + condition
-##    + value
-##    + where
-##    + column
-##    + start
-##    + end
-##    + seqnames
-##    + strand
-###------------------------------------------------------------
-## Overwrite the validation method.
-setValidity("GRangesFilter", function(object){
-    if(!any(object@location == c("within", "overlapping") )){
-        return(paste0("Argument condition should be either 'within' or 'overlapping'! Got ",
-                      object@location, "!"))
-    }
-    ## GRanges has to have valid values for start, end and seqnames!
-    if(length(start(object)) == 0)
-        return("start coordinate of the range is missing!")
-    if(length(end(object)) == 0)
-        return("end coordinate of the range is missing!")
-    if(length(seqnames(object)) == 0)
-        return("A valid seqname is required from the submitted GRanges!")
-    return(TRUE)
-})
-setMethod("show", "GRangesFilter", function(object){
-    cat("| Object of class:" , class(object), "\n")
-    cat("| region:\n")
-    cat("| + start:", paste0(start(object), collapse=", "), "\n")
-    cat("| + end:  ", paste0(end(object), collapse=", "), "\n")
-    cat("| + seqname:", paste0(seqnames(object), collapse=", "), "\n")
-    cat("| + strand: ", paste0(strand(object), collapse=", "), "\n")
-    cat("| condition:", condition(object), "\n")
-})
-setMethod("condition", "GRangesFilter", function(x, ...){
-    return(x@location)
-})
-setReplaceMethod("condition", "GRangesFilter", function(x, value){
-    value <- match.arg(value, c("within", "overlapping"))
-    x@location <- value
-    validObject(x)
-    return(x)
-})
-setMethod("value", signature(x="GRangesFilter", db="missing"),
-          function(x, db, ...){
-              return(x@grange)
-          })
-setMethod("value", signature(x="GRangesFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(x@grange)
-          })
 setMethod("start", signature(x="GRangesFilter"),
           function(x, ...){
-              return(start(value(x)))
+              start(value(x))
           })
+
 setMethod("end", signature(x="GRangesFilter"),
           function(x, ...){
-              return(end(value(x)))
+              end(value(x))
           })
+
 setMethod("strand", signature(x="GRangesFilter"),
           function(x, ...){
-              strnd <- as.character(strand(value(x)))
-              return(strnd)
+              as.character(strand(value(x)))
           })
+
+#' @description \code{seqnames}: accessor for the sequence names of the
+#' \code{GRanges} object within a \code{GRangesFilter}
+#' @param x For \code{seqnames}, \code{seqlevels}: a \code{GRangesFilter} object.
+#' 
+#' @rdname Filter-classes
 setMethod("seqnames", signature(x="GRangesFilter"),
           function(x){
-              return(as.character(seqnames(value(x))))
+              as.character(seqnames(value(x)))
           })
+
+#' @description \code{seqlevels}: accessor for the \code{seqlevels} of the
+#' \code{GRanges} object within a \code{GRangesFilter}
+#' 
+#' @rdname Filter-classes
 setMethod("seqlevels", signature(x="GRangesFilter"),
           function(x){
-              return(seqlevels(value(x)))
+              seqlevels(value(x))
           })
-## The column method for GRangesFilter returns all columns required for the query, i.e.
-## the _seq_start, _seq_end for the feature, seq_name and seq_strand.
-## Note: this method has to return a named vector!
-setMethod("column", signature(object="GRangesFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              ## assuming that we follow the naming convention:
-              ## <feature>_seq_end for the naming of the database columns.
+
+setMethod("ensDbColumn", "GRangesFilter",
+          function(object, db, with.tables = character(), ...){
               feature <- object@feature
-              feature <- match.arg(feature, c("gene", "transcript", "exon", "tx"))
-              if(object@feature=="transcript")
+              feature <- match.arg(feature, c("gene", "transcript", "exon",
+                                              "tx"))
+              if(object@feature == "transcript")
                   feature <- "tx"
-              cols <- c(start=paste0(feature, "_seq_start"),
-                        end=paste0(feature, "_seq_end"),
-                        seqname="seq_name",
-                        strand="seq_strand")
-              return(cols)
-          })
-setMethod("column", signature(object="GRangesFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(column(object, db, with.tables=tn))
-          })
-## Providing also the columns.
-setMethod("column", signature(object="GRangesFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              cols <- unlist(prefixColumns(db, column(object), with.tables=with.tables),
-                             use.names=FALSE)
-              ## We have to give the vector the required names!
-              names(cols) <- 1:length(cols)
-              names(cols)[grep(cols, pattern="seq_name")] <- "seqname"
-              names(cols)[grep(cols, pattern="seq_strand")] <- "strand"
-              names(cols)[grep(cols, pattern="seq_start")] <- "start"
-              names(cols)[grep(cols, pattern="seq_end")] <- "end"
-              return(cols[c("start", "end", "seqname", "strand")])
-          })
-## Where for GRangesFilter only.
-setMethod("where", signature(object="GRangesFilter", db="missing", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              ## Get the names of the columns we're going to query.
-              cols <- column(object)
-              query <- buildWhereForGRanges(object, cols)
-              return(query)
-          })
-setMethod("where", signature(object="GRangesFilter", db="EnsDb", with.tables="missing"),
-          function(object, db, with.tables, ...){
-              tn <- names(listTables(db))
-              return(where(object, db, with.tables=tn))
-          })
-setMethod("where", signature(object="GRangesFilter", db="EnsDb", with.tables="character"),
-          function(object, db, with.tables, ...){
-              cols <- column(object, db, with.tables)
-              query <- buildWhereForGRanges(object, cols, db=db)
-              return(query)
+              cols <- c(start = paste0(feature, "_seq_start"),
+                        end = paste0(feature, "_seq_end"),
+                        seqname = "seq_name",
+                        strand = "seq_strand")
+
+              if (!missing(db)) {
+                  if (length(with.tables) == 0)
+                      with.tables <- names(listTables(db))
+                  cols <- unlist(prefixColumns(db, cols,
+                                               with.tables = with.tables),
+                                 use.names = FALSE)
+                  ## We have to give the vector the required names!
+                  names(cols) <- seq_along(cols)
+                  names(cols)[grep(cols, pattern = "seq_name")] <- "seqname"
+                  names(cols)[grep(cols, pattern = "seq_strand")] <- "strand"
+                  names(cols)[grep(cols, pattern = "seq_start")] <- "start"
+                  names(cols)[grep(cols, pattern = "seq_end")] <- "end"
+              }
+              cols[c("start", "end", "seqname", "strand")]
           })
 
+setMethod("ensDbQuery", "GRangesFilter",
+          function(object, db, with.tables = character()) {
+              cols <- ensDbColumn(object, db, with.tables)
+              if (missing(db))
+                  db <- NULL
+              buildWhereForGRanges(object, cols, db = db)
+          })
 
-## grf: GRangesFilter
-buildWhereForGRanges <- function(grf, columns, db=NULL){
-    condition <- condition(grf)
-    if(!any(condition == c("within", "overlapping")))
-        stop(paste0("'condition' for GRangesFilter should either be ",
-                    "'within' or 'overlapping', got ", condition, "."))
-    if(is.null(names(columns))){
-        stop(paste0("The vector with the required column names for the",
-                    " GRangesFilter query has to have names!"))
-    }
-    if(!all(c("start", "end", "seqname", "strand") %in% names(columns)))
-        stop(paste0("'columns' has to be a named vector with names being ",
-                    "'start', 'end', 'seqname', 'strand'!"))
-    ## Build the query to fetch all features that are located within the range
-    quers <- sapply(value(grf), function(z){
-        if(!is.null(db)){
-            seqn <- formatSeqnamesForQuery(db, as.character(seqnames(z)))
-        }else{
-            seqn <- as.character(seqnames(z))
-        }
-        if(condition == "within"){
-            query <- paste0(columns["start"], " >= ", start(z), " and ",
-                            columns["end"], " <= ", end(z), " and ",
-                            columns["seqname"], " == '", seqn, "'")
-        }
-        ## Build the query to fetch all features (partially) overlapping the range. This
-        ## includes also all features (genes or transcripts) that have an intron at that
-        ## position.
-        if(condition == "overlapping"){
-            query <- paste0(columns["start"], " <= ", end(z), " and ",
-                            columns["end"], " >= ", start(z), " and ",
-                            columns["seqname"], " = '", seqn, "'")
-        }
-        ## Include the strand, if it's not "*"
-        if(as.character(strand(z)) != "*"){
-            query <- paste0(query, " and ", columns["strand"], " = ",
-                            strand2num(as.character(strand(z))))
-        }
-        return(query)
-    })
-    if(length(quers) > 1)
-        quers <- paste0("(", quers, ")")
-    query <- paste0(quers, collapse=" or ")
-    ## Collapse now the queries.
-    return(query)
-}
-
-
-
-## map chromosome strand...
-strand2num <- function(x){
-    if(x == "+" | x == "-"){
-        return(as.numeric(paste0(x, 1)))
-    }else{
-        stop("Only '+' and '-' supported!")
-    }
-}
-num2strand <- function(x){
-    if(x < 0){
-        return("-")
-    }else{
-        return("+")
-    }
-}
+setMethod("ensDbColumn", signature(object = "OnlyCodingTxFilter"),
+          function(object, db, ...) {
+              "tx.tx_cds_seq_start"
+          })
 
-##***********************************************************************
-##
-##     Methods for SymbolFilter classes.
-##
-##***********************************************************************
-setMethod("where", signature(object = "SymbolFilter", db = "missing",
-                             with.tables = "missing"),
-          function(object, db, with.tables, ...) {
-    suff <- callNextMethod()
-    return(paste(column(object), suff))
-})
-setMethod("column", signature(object = "SymbolFilter", db = "missing",
-                              with.tables = "missing"),
-          function(object, db, with.tables, ...) {
-    return("symbol")
-})
-setMethod("where", signature(object = "SymbolFilter", db = "EnsDb",
-                             with.tables = "missing"),
-          function(object, db, with.tables, ...) {
-    tn <- names(listTables(db))
-    return(where(object, db, with.tables = tn))
-})
-setMethod("column", signature(object = "SymbolFilter", db = "EnsDb",
-                              with.tables = "missing"),
-          function(object, db, with.tables, ...) {
-    tn <- names(listTables(db))
-    return(column(object, db, with.tables = tn))
-})
-setMethod("where", signature(object = "SymbolFilter", db = "EnsDb",
-                             with.tables="character"),
-          function(object, db, with.tables = "character", ...) {
-    suff <- callNextMethod()
-    return(paste(column(object, db, with.tables = with.tables), suff))
-})
-setMethod("column", signature(object = "SymbolFilter", db = "EnsDb",
-                              with.tables = "character"),
-          function(object, db, with.tables, ...) {
-    return(unlist(prefixColumns(db, "gene_name",
-                                with.tables = with.tables),
-                  use.names = FALSE))
-})
+setMethod("ensDbQuery", "OnlyCodingTxFilter",
+          function(object, db, with.tables = character()) {
+              "tx.tx_cds_seq_start is not null"
+          })
 
-##***********************************************************************
-##
-##     Methods for OnlyCodingTx classes.
-##
-##***********************************************************************
-setMethod("where", signature(object = "OnlyCodingTx", db = "EnsDb",
-                             with.tables = "missing"),
-          function(object, db, with.tables, ...) {
-    tn <- names(listTables(db))
-    return(where(object, db, with.tables = tn))
-})
-setMethod("column", signature(object = "OnlyCodingTx", db = "EnsDb",
-                              with.tables = "missing"),
-          function(object, db, with.tables, ...) {
-    tn <- names(listTables(db))
-    return(column(object, db, with.tables = tn))
-})
-setMethod("where", signature(object = "OnlyCodingTx", db = "EnsDb",
-                             with.tables="character"),
-          function(object, db, with.tables = "character", ...) {
-              ## Hard coded.
-              return("tx.tx_cds_seq_start is not null")
-})
-setMethod("column", signature(object = "OnlyCodingTx", db = "EnsDb",
-                              with.tables = "character"),
-          function(object, db, with.tables, ...) {
-              return("tx.tx_cds_seq_start")
-})
+setMethod("ensDbColumn", "ProteinIdFilter",
+          function(object, db, with.tables = character(), ...) {
+              if (missing(db)) {
+                  return(callNextMethod())
+              }
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'ProteinIdFilter' can not",
+                       " be used.")
+              callNextMethod()
+          })
+
+setMethod("ensDbQuery", "ProteinIdFilter",
+          function(object, db, with.tables = character()) {
+              if (missing(db))
+                  return(callNextMethod())
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'ProteinIdFilter' can not",
+                       " be used.")
+              .queryForEnsDbWithTables(object, db, with.tables)
+          })
+
+setMethod("ensDbColumn", "UniprotFilter",
+          function(object, db, with.tables = character(), ...) {
+              if (missing(db))
+                  return(callNextMethod())
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'UniprotFilter' can not",
+                       " be used.")
+              callNextMethod()
+          })
+
+setMethod("ensDbQuery", "UniprotFilter",
+          function(object, db, with.tables = character()) {
+              if (missing(db))
+                  return(callNextMethod())
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'UniprotFilter' can not",
+                       " be used.")
+              .queryForEnsDbWithTables(object, db, with.tables)
+          })
+
+setMethod("ensDbColumn", "ProtDomIdFilter",
+          function(object, db, with.tables = character(), ...) {
+              if (missing(db))
+                  return(callNextMethod())
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'ProtDomIdFilter' can not",
+                       " be used.")
+              callNextMethod()
+          })
+
+setMethod("ensDbQuery", "ProtDomIdFilter",
+          function(object, db, with.tables = character()) {
+              if (missing(db))
+                  return(callNextMethod())
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'ProtDomIdFilter' can not",
+                       " be used.")
+              .queryForEnsDbWithTables(object, db, with.tables)
+          })
+
+setMethod("ensDbColumn", "UniprotDbFilter",
+          function(object, db, with.tables = character(), ...) {
+              if (missing(db))
+                  return(callNextMethod())
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'UniprotDbFilter' can not",
+                       " be used.")
+              callNextMethod()
+          })
+
+setMethod("ensDbQuery", "UniprotDbFilter",
+          function(object, db, with.tables = character()) {
+              if (missing(db))
+                  return(callNextMethod())
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'UniprotDbFilter' can not",
+                       " be used.")
+              .queryForEnsDbWithTables(object, db, with.tables)
+          })
+
+setMethod("ensDbColumn", "UniprotMappingTypeFilter",
+          function(object, db, with.tables = character(), ...) {
+              if (missing(db))
+                  return(callNextMethod())
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'UniprotMappingTypeFilter' ",
+                       "can not be used.")
+              callNextMethod()
+          })
+
+setMethod("ensDbQuery", "UniprotMappingTypeFilter",
+          function(object, db, with.tables = character()) {
+              if (missing(db))
+                  return(callNextMethod())
+              if (!hasProteinData(db))
+                  stop("The 'EnsDb' database used does not provide",
+                       " protein annotations! A 'UniprotMappingTypeFilter' ",
+                       "can not be used.")
+              .queryForEnsDbWithTables(object, db, with.tables)
+          })
diff --git a/R/Methods.R b/R/Methods.R
index bb7c255..35b279c 100644
--- a/R/Methods.R
+++ b/R/Methods.R
@@ -20,13 +20,19 @@ setMethod("show", "EnsDb", function(object) {
         ## gene and transcript info.
         cat(paste0("| No. of genes: ",
                    dbGetQuery(object@ensdb,
-                              "select count(distinct gene_id) from gene")[1, 1], ".\n"))
+                              "select count(distinct gene_id) from gene")[1, 1],
+                   ".\n"))
         cat(paste0("| No. of transcripts: ",
                    dbGetQuery(object@ensdb,
-                              "select count(distinct tx_id) from tx")[1, 1], ".\n"))
+                              "select count(distinct tx_id) from tx")[1, 1],
+                   ".\n"))
+        if (hasProteinData(object))
+            cat("| Protein data available.\n")
     }
 })
 
+############################################################
+## organism
 setMethod("organism", "EnsDb", function(object){
     Species <- .getMetaDataValue(object at ensdb, "Organism")
     ## reformat the e.g. homo_sapiens string into Homo sapiens
@@ -36,22 +42,57 @@ setMethod("organism", "EnsDb", function(object){
     return(Species)
 })
 
+############################################################
+## metadata
 setMethod("metadata", "EnsDb", function(x, ...){
     Res <- dbGetQuery(dbconn(x), "select * from metadata")
     return(Res)
 })
-#####
+
+############################################################
 ## Validation
 ##
 validateEnsDb <- function(object){
     ## check if the database contains all required tables...
     if(!is.null(object@ensdb)){
+        msg <- validMsg(NULL, NULL)
         OK <- dbHasRequiredTables(object@ensdb)
         if (is.character(OK))
-            return(OK)
+            msg <- validMsg(msg, OK)
         OK <- dbHasValidTables(object@ensdb)
         if (is.character(OK))
-            return(OK)
+            msg <- validMsg(msg, OK)
+        if (hasProteinData(object)) {
+            OK <- dbHasRequiredTables(
+                object@ensdb,
+                tables = .ensdb_protein_tables(dbSchemaVersion(dbconn(object))))
+            if (is.character(OK))
+                msg <- validMsg(msg, OK)
+            OK <- dbHasValidTables(
+                object@ensdb,
+                tables = .ensdb_protein_tables(dbSchemaVersion(dbconn(object))))
+            if (is.character(OK))
+                msg <- validMsg(msg, OK)
+            cdsTx <- dbGetQuery(dbconn(object),
+                                "select tx_id, tx_cds_seq_start from tx")
+            if (is.character(cdsTx$tx_cds_seq_start)) {
+                suppressWarnings(
+                    cdsTx[, "tx_cds_seq_start"] <- as.numeric(cdsTx$tx_cds_seq_start)
+                )
+            }
+            cdsTx <- cdsTx[!is.na(cdsTx$tx_cds_seq_start), "tx_id"]
+            protTx <- dbGetQuery(dbconn(object),
+                                 "select distinct tx_id from protein")$tx_id
+            if (!all(cdsTx %in% protTx))
+                msg <- validMsg(msg, paste0("Not all transcripts with a CDS ",
+                                            "are assigned to a protein ID!"))
+            if (!all(protTx %in% cdsTx))
+                msg <- validMsg(msg, paste0("Not all proteins are assigned to ",
+                                            "a transcript with a CDS!"))
+
+        }
+        if (!is.null(msg))
+            return(msg)
     }
     return(TRUE)
 }
@@ -64,19 +105,24 @@ setMethod("initialize", "EnsDb", function(.Object,...){
     callNextMethod(.Object, ...)
 })
 
-### connection:
-## returns the connection object to the SQL database
+############################################################
+## dbconn
 setMethod("dbconn", "EnsDb", function(x){
     return(x@ensdb)
 })
 
-### ensemblVersion
+############################################################
+## ensemblVersion
+##
 ## returns the ensembl version of the package.
 setMethod("ensemblVersion", "EnsDb", function(x){
     eVersion <- getMetadataValue(x, "ensembl_version")
     return(eVersion)
 })
-### getMetadataValue
+
+############################################################
+## getMetadataValue
+##
 ## returns the metadata value for the specified name/key
 setMethod("getMetadataValue", "EnsDb", function(x, name){
     if(missing(name))
@@ -84,8 +130,8 @@ setMethod("getMetadataValue", "EnsDb", function(x, name){
     return(metadata(x)[metadata(x)$name==name, "value"])
 })
 
-### seqinfo
-## returns the sequence/chromosome information from the database.
+############################################################
+## seqinfo
 setMethod("seqinfo", "EnsDb", function(x){
     Chrs <- dbGetQuery(dbconn(x), "select * from chromosome")
     Chr.build <- .getMetaDataValue(dbconn(x), "genome_build")
@@ -96,14 +142,17 @@ setMethod("seqinfo", "EnsDb", function(x){
     return(SI)
 })
 
-### seqlevels
+############################################################
+## seqlevels
 setMethod("seqlevels", "EnsDb", function(x){
     Chrs <- dbGetQuery(dbconn(x), "select distinct seq_name from chromosome")
     Chrs <- formatSeqnamesFromQuery(x, Chrs$seq_name)
     return(Chrs)
 })
 
-### getGenomeFaFile
+############################################################
+## getGenomeFaFile
+##
 ## queries the dna.toplevel.fa file from AnnotationHub matching the current
 ## Ensembl version
 ## Update: if we can't find a FaFile matching the Ensembl version we suggest ones
@@ -177,12 +226,11 @@ setMethod("getGenomeFaFile", "EnsDb", function(x, pattern="dna.toplevel.fa"){
     return(ensVers)
 }
 
-####============================================================
+############################################################
 ##  getGenomeTwoBitFile
 ##
 ##  Search and retrieve a genomic DNA resource through a TwoBitFile
 ##  from AnnotationHub.
-####------------------------------------------------------------
 setMethod("getGenomeTwoBitFile", "EnsDb", function(x){
     ah <- AnnotationHub()
     ## Reduce the AnnotationHub to species, provider and genome version.
@@ -224,16 +272,11 @@ setMethod("getGenomeTwoBitFile", "EnsDb", function(x){
     return(Dna)
 })
 
-
-
-### listTables
-## returns a named list with database table columns
+############################################################
+## listTables
 setMethod("listTables", "EnsDb", function(x, ...){
     if(length(x@tables)==0){
         tables <- dbListTables(dbconn(x))
-        ## Quick fix for EnsDbs containing also protein data (issue #30):
-        tables <- tables[!(tables %in% c("protein", "uniprot",
-                                         "protein_domain"))]
         ## read the columns for these tables.
         Tables <- vector(length=length(tables), "list")
         for(i in 1:length(Tables)){
@@ -254,16 +297,13 @@ setMethod("listTables", "EnsDb", function(x, ...){
     return(Tab)
 })
 
-### listColumns
-## lists all columns.
+############################################################
+## listColumns
 setMethod("listColumns", "EnsDb", function(x,
                                            table,
                                            skip.keys=TRUE, ...){
     if(length(x@tables)==0){
         tables <- dbListTables(dbconn(x))
-        ## Quick fix for EnsDbs containing also protein data (issue #30):
-        tables <- tables[!(tables %in% c("protein", "uniprot",
-                                         "protein_domain"))]
         ## read the columns for these tables.
         Tables <- vector(length=length(tables), "list")
         for(i in 1:length(Tables)){
@@ -276,12 +316,13 @@ setMethod("listColumns", "EnsDb", function(x,
         x@tables <- Tables
     }
     Tab <- x@tables
-    ## Manually add tx_name as a "virtual" column; getWhat will insert the tx_id into that.
+    ## Manually add tx_name as a "virtual" column; getWhat will insert
+    ## the tx_id into that.
     Tab$tx <- unique(c(Tab$tx, "tx_name"))
     ## Manually add the symbol as a "virtual" column.
     Tab$gene <- unique(c(Tab$gene, "symbol"))
     if(!missing(table)){
-        columns <- Tab[[ table ]]
+        columns <- unlist(Tab[names(Tab) %in% table], use.names = FALSE)
     }else{
         columns <- unlist(Tab, use.names=FALSE)
     }
@@ -294,49 +335,62 @@ setMethod("listColumns", "EnsDb", function(x,
         if(length(idx) > 0)
             columns <- columns[ -idx ]
     }
-    return(columns)
+    return(unique(columns))
 })
 
+############################################################
+## listGenebiotypes
 setMethod("listGenebiotypes", "EnsDb", function(x, ...){
     return(dbGetQuery(dbconn(x), "select distinct gene_biotype from gene")[,1])
 })
+
+############################################################
+## listTxbiotypes
 setMethod("listTxbiotypes", "EnsDb", function(x, ...){
     return(dbGetQuery(dbconn(x), "select distinct tx_biotype from tx")[,1])
 })
 
-### cleanColumns
+############################################################
+## cleanColumns
+##
 ## checks columns and removes all that are not present in database tables
 ## the method checks internally whether the columns are in the full form,
 ## i.e. gene.gene_id (<table name>.<column name>)
-setMethod("cleanColumns", "EnsDb", function(x,
-                                            columns, ...){
+setMethod("cleanColumns", "EnsDb", function(x, columns, ...){
     if(missing(columns))
         stop("No columns submitted!")
     ## vote of the majority
     full.name <- length(grep(columns, pattern=".", fixed=TRUE)) >
-        floor(length(columns) /2)
-    if(full.name){
+        floor(length(columns) / 2)
+    if (full.name) {
         suppressWarnings(
             full.columns <- unlist(prefixColumns(x,
                                                  unlist(listTables(x)),
-                                                 clean=FALSE),
+                                                 clean = FALSE),
                                    use.names=TRUE)
-          )
+        )
         bm <- columns %in% full.columns
         removed <- columns[ !bm ]
-    }else{
-        bm <- columns %in% unlist(listTables(x)[ c("gene", "tx", "exon",
-                                                   "tx2exon", "chromosome") ])
-        removed <- columns[ !bm ]
+    } else {
+        dbtabs <- names(listTables(x))
+        dbtabs <- dbtabs[dbtabs != "metadata"]
+        bm <- columns %in% unlist(listTables(x)[dbtabs])
+        removed <- columns[!bm]
     }
     if(length(removed) > 0){
-        warning("Columns ", paste(sQuote(removed), collapse=", "),
-                " are not valid and have been removed")
-    }
-    return(columns[ bm ])
+        if (length(removed) == 1)
+            warning("Column ", paste(sQuote(removed), collapse=", "),
+                    " is not present in the database and has been removed")
+        else
+            warning("Columns ", paste(sQuote(removed), collapse=", "),
+                    " are not present in the database and have been removed")
+    }
+    return(columns[bm])
 })
 
-### tablesForColumns
+############################################################
+## tablesForColumns
+##
 ## returns the tables for the specified columns.
 setMethod("tablesForColumns", "EnsDb", function(x, columns, ...){
     if(missing(columns))
@@ -351,48 +405,72 @@ setMethod("tablesForColumns", "EnsDb", function(x, columns, ...){
     return(Tables)
 })
 
+############################################################
+## tablesByDegree
+##
 ## returns the table names ordered by degree, i.e. edges to other tables
 setMethod("tablesByDegree", "EnsDb", function(x,
                                               tab=names(listTables(x)),
                                               ...){
-    ## ## to do this with a graph:
-    ## DBgraph <- graphNEL(nodes=c("gene", "tx", "tx2exon", "exon", "chromosome", "information"),
-    ##                  edgeL=list(gene=c("tx", "chromosome"),
-    ##                      tx=c("gene", "tx2exon"),
-    ##                      tx2exon=c("tx", "exon"),
-    ##                      exon="tx2exon",
-    ##                      chromosome="gene"
-    ##                          ))
-    ## Tab <- names(sort(degree(DBgraph), decreasing=TRUE))
-    Table.order <- c(gene=1, tx=2, tx2exon=3, exon=4, chromosome=5, metadata=6)
-    ##Table.order <- c(gene=2, tx=1, tx2exon=3, exon=4, chromosome=5, metadata=6)
+    Table.order <- c(gene = 1, tx = 2, tx2exon = 3, exon = 4, chromosome = 5,
+                     protein = 6, uniprot = 7, protein_domain = 8,
+                     entrezgene = 9,
+                     metadata = 99)
     Tab <- tab[ order(Table.order[ tab ]) ]
     return(Tab)
 })
 
+############################################################
+## hasProteinData
+##
+## Check whether the database has the required tables protein, uniprot
+## and protein_domain.
+#' @title Determine whether protein data is available in the database
+#' 
+#' @aliases hasProteinData
+#' 
+#' @description Determines whether the \code{\linkS4class{EnsDb}}
+#'     provides protein annotation data.
+#' 
+#' @param x The \code{\linkS4class{EnsDb}} object.
+#' 
+#' @return A logical of length one, \code{TRUE} if protein annotations are
+#'     available and \code{FALSE} otherwise.
+#' 
+#' @author Johannes Rainer
+#' 
+#' @seealso \code{\link{listTables}}
+#' 
+#' @examples
+#' library(EnsDb.Hsapiens.v75)
+#' ## Does this database/package have protein annotations?
+#' hasProteinData(EnsDb.Hsapiens.v75)
+setMethod("hasProteinData", "EnsDb", function(x) {
+    tabs <- listTables(x)
+    return(all(c("protein", "uniprot", "protein_domain") %in%
+               names(tabs)))
+})
 
-
-
-### genes:
+############################################################
+## genes
+##
 ## get genes from the database.
 setMethod("genes", "EnsDb", function(x,
-                                     columns=listColumns(x, "gene"),
-                                     filter, order.by="",
-                                     order.type="asc",
-                                     return.type="GRanges"){
+                                     columns = c(listColumns(x, "gene"),
+                                                 "entrezid"),
+                                     filter = AnnotationFilterList(),
+                                     order.by = "",
+                                     order.type = "asc",
+                                     return.type = "GRanges"){
     return.type <- match.arg(return.type, c("data.frame", "GRanges", "DataFrame"))
-    columns <- unique(c(columns, "gene_id"))
+    columns <- cleanColumns(x, unique(c(columns, "gene_id")))
     ## if return.type is GRanges we require columns: seq_name, gene_seq_start
     ## and gene_seq_end and seq_strand
     if(return.type=="GRanges"){
         columns <- unique(c(columns, c("gene_seq_start", "gene_seq_end",
                                        "seq_name", "seq_strand")))
     }
-    if(missing(filter)){
-        filter=list()
-    }else{
-        filter <- checkFilter(filter)
-    }
+    filter <- .processFilterParam(filter, x)
     filter <- setFeatureInGRangesFilter(filter, "gene")
     ## Eventually add columns for the filters:
     columns <- addFilterColumns(columns, filter, x)
@@ -408,19 +486,24 @@ setMethod("genes", "EnsDb", function(x,
             order.by <- ""
     }
     Res <- getWhat(x, columns=columns, filter=filter,
-                   order.by=order.by, order.type=order.type)
-    if(return.type=="data.frame" | return.type=="DataFrame"){
+                   order.by=order.by, order.type=order.type,
+                   startWith = "gene", join = "suggested")
+    ## issue #48: collapse entrezid column if dbschema 2.0 is used.
+    if (as.numeric(dbSchemaVersion(x)) > 1 && any(columns == "entrezid"))
+        Res <- .collapseEntrezidInTable(Res, by = "gene_id")
+    if (return.type=="data.frame" || return.type=="DataFrame") {
         notThere <- !(retColumns %in% colnames(Res))
         if(any(notThere))
-            warning(paste0("Columns ", paste(retColumns[notThere], collapse=", "),
-                           " not present in the result data.frame!"))
+            warning("Columns ",
+                    paste0("'", retColumns[notThere], "'", collapse=", "),
+                    " not found in the database!")
         retColumns <- retColumns[!notThere]
-        Res <- Res[, retColumns]
+        Res <- Res[, retColumns, drop = FALSE]
         if(return.type=="DataFrame")
             Res <- DataFrame(Res)
         return(Res)
     }
-    if(return.type=="GRanges"){
+    if (return.type=="GRanges") {
         metacols <- columns[ !(columns %in% c("seq_name",
                                               "seq_strand",
                                               "gene_seq_start",
@@ -440,13 +523,16 @@ setMethod("genes", "EnsDb", function(x,
     }
 })
 
-### transcripts:
+############################################################
+## transcripts:
+##
 ## get transcripts from the database.
-setMethod("transcripts", "EnsDb", function(x, columns=listColumns(x, "tx"),
-                                           filter, order.by="", order.type="asc",
-                                           return.type="GRanges"){
+setMethod("transcripts", "EnsDb", function(x, columns = listColumns(x, "tx"),
+                                           filter = AnnotationFilterList(),
+                                           order.by = "", order.type = "asc",
+                                           return.type = "GRanges"){
     return.type <- match.arg(return.type, c("data.frame", "GRanges", "DataFrame"))
-    columns <- unique(c(columns, "tx_id"))
+    columns <- cleanColumns(x, unique(c(columns, "tx_id")))
     ## if return.type is GRanges we require columns: seq_name, gene_seq_start
     ## and gene_seq_end and seq_strand
     if(return.type=="GRanges"){
@@ -455,11 +541,7 @@ setMethod("transcripts", "EnsDb", function(x, columns=listColumns(x, "tx"),
                                        "seq_name",
                                        "seq_strand")))
     }
-    if(missing(filter)){
-        filter=list()
-    }else{
-        filter <- checkFilter(filter)
-    }
+    filter <- .processFilterParam(filter, x)
     filter <- setFeatureInGRangesFilter(filter, "tx")
     ## Eventually add columns for the filters:
     columns <- addFilterColumns(columns, filter, x)
@@ -474,15 +556,20 @@ setMethod("transcripts", "EnsDb", function(x, columns=listColumns(x, "tx"),
         if(is.null(order.by))
             order.by <- ""
     }
-    Res <- getWhat(x, columns=columns, filter=filter,
-                   order.by=order.by, order.type=order.type)
+    Res <- getWhat(x, columns=columns, filter = filter,
+                   order.by=order.by, order.type=order.type,
+                   startWith = "tx", join = "suggested")
+    ## issue #48: collapse entrezid column if dbschema 2.0 is used.
+    if (as.numeric(dbSchemaVersion(x)) > 1 && any(columns == "entrezid"))
+        Res <- .collapseEntrezidInTable(Res, by = "tx_id")
     if(return.type=="data.frame" | return.type=="DataFrame"){
         notThere <- !(retColumns %in% colnames(Res))
         if(any(notThere))
-            warning(paste0("Columns ", paste(retColumns[notThere], collapse=", "),
-                           " not present in the result data.frame!"))
+            warning("Columns ", paste0("'", retColumns[notThere], "'",
+                                       collapse=", "),
+                    " not found in the database!")
         retColumns <- retColumns[!notThere]
-        Res <- Res[, retColumns]
+        Res <- Res[, retColumns, drop = FALSE]
         if(return.type=="DataFrame")
             Res <- DataFrame(Res)
         return(Res)
@@ -512,8 +599,9 @@ setMethod("transcripts", "EnsDb", function(x, columns=listColumns(x, "tx"),
     }
 })
 
-### promoters:
-## get promoter regions from the database.
+############################################################
+## promoters:
+##
 setMethod("promoters", "EnsDb",
           function(x, upstream=2000, downstream=200, ...)
           {
@@ -524,17 +612,20 @@ setMethod("promoters", "EnsDb",
           }
 )
 
-### exons:
+############################################################
+## exons
+##
 ## get exons from the database.
-setMethod("exons", "EnsDb", function(x, columns=listColumns(x, "exon"), filter,
-                                     order.by="", order.type="asc",
-                                     return.type="GRanges"){
+setMethod("exons", "EnsDb", function(x, columns = listColumns(x, "exon"),
+                                     filter = AnnotationFilterList(),
+                                     order.by = "", order.type = "asc",
+                                     return.type = "GRanges"){
     return.type <- match.arg(return.type, c("data.frame", "GRanges", "DataFrame"))
     if(!any(columns %in% c(listColumns(x, "exon"), "exon_idx"))){
         ## have to have at least one column from the gene table...
         columns <- c(columns, "exon_id")
     }
-    columns <- unique(c(columns, "exon_id"))
+    columns <- cleanColumns(x, unique(c(columns, "exon_id")))
     ## if return.type is GRanges we require columns: seq_name, gene_seq_start
     ## and gene_seq_end and seq_strand
     if(return.type=="GRanges"){
@@ -543,11 +634,7 @@ setMethod("exons", "EnsDb", function(x, columns=listColumns(x, "exon"), filter,
                                        "seq_name",
                                        "seq_strand")))
     }
-    if(missing(filter)){
-        filter=list()
-    }else{
-        filter <- checkFilter(filter)
-    }
+    filter <- .processFilterParam(filter, x)
     filter <- setFeatureInGRangesFilter(filter, "exon")
     ## Eventually add columns for the filters:
     columns <- addFilterColumns(columns, filter, x)
@@ -563,14 +650,19 @@ setMethod("exons", "EnsDb", function(x, columns=listColumns(x, "exon"), filter,
             order.by <- ""
     }
     Res <- getWhat(x, columns=columns, filter=filter,
-                   order.by=order.by, order.type=order.type)
+                   order.by=order.by, order.type=order.type,
+                   startWith = "exon", join = "suggested")
+    ## issue #48: collapse entrezid column if dbschema 2.0 is used.
+    if (as.numeric(dbSchemaVersion(x)) > 1 && any(columns == "entrezid"))
+        Res <- .collapseEntrezidInTable(Res, by = "exon_id")
     if(return.type=="data.frame" | return.type=="DataFrame"){
         notThere <- !(retColumns %in% colnames(Res))
         if(any(notThere))
-            warning(paste0("Columns ", paste(retColumns[notThere], collapse=", "),
-                           " not present in the result data.frame!"))
+            warning("Columns ", paste0("'", retColumns[notThere], "'",
+                                       collapse=", "),
+                    " not found in the database!")
         retColumns <- retColumns[!notThere]
-        Res <- Res[, retColumns]
+        Res <- Res[, retColumns, drop = FALSE]
         if(return.type=="DataFrame")
             Res <- DataFrame(Res)
         return(Res)
@@ -600,12 +692,14 @@ setMethod("exons", "EnsDb", function(x, columns=listColumns(x, "exon"), filter,
     }
 })
 
-
+############################################################
+## exonsBy
+##
 ## should return a GRangesList
-## still considerably slower than the corresponding call in the GenomicFeatures package.
 setMethod("exonsBy", "EnsDb", function(x, by = c("tx", "gene"),
                                        columns = listColumns(x, "exon"),
-                                       filter, use.names = FALSE) {
+                                       filter = AnnotationFilterList(),
+                                       use.names = FALSE) {
     by <- match.arg(by, c("tx", "gene"))
     bySuff <- "_id"
     if (use.names) {
@@ -617,15 +711,11 @@ setMethod("exonsBy", "EnsDb", function(x, by = c("tx", "gene"),
             bySuff <- "_name"
         }
     }
-    if (missing(filter)) {
-        filter <- list()
-    } else {
-        filter <- checkFilter(filter)
-    }
+    filter <- .processFilterParam(filter, x)
     ## We're applying eventual GRangesFilter to either gene or tx.
     filter <- setFeatureInGRangesFilter(filter, by)
     ## Eventually add columns for the filters:
-    columns <- unique(c(columns, "exon_id"))
+    columns <- cleanColumns(x, unique(c(columns, "exon_id")))
     columns <- addFilterColumns(columns, filter, x)
     ## Quick fix; rename any exon_rank to exon_idx.
     columns[columns == "exon_rank"] <- "exon_idx"
@@ -634,7 +724,7 @@ setMethod("exonsBy", "EnsDb", function(x, by = c("tx", "gene"),
     min.columns <- c(paste0(by, "_id"), "seq_name","exon_seq_start",
                      "exon_seq_end", "exon_id", "seq_strand")
     by.id.full <- unlist(prefixColumns(x, columns = paste0(by, "_id"),
-                                       clean = FALSE),
+                                        clean = FALSE),
                          use.names = FALSE)
     if (by == "gene") {
         ## tx columns have to be removed, since the same exon can be part of
@@ -675,7 +765,11 @@ setMethod("exonsBy", "EnsDb", function(x, by = c("tx", "gene"),
         }
     }
     Res <- getWhat(x, columns = columns, filter = filter,
-                   order.by = order.by, skip.order.check = TRUE)
+                   order.by = order.by, skip.order.check = TRUE,
+                   startWith = by, join = "suggested")
+    ## issue #48: collapse entrezid column if dbschema 2.0 is used.
+    if (as.numeric(dbSchemaVersion(x)) > 1 && any(columns == "entrezid"))
+        Res <- .collapseEntrezidInTable(Res, by = "exon_id")
     ## Now, order in R, if not already done in SQL.
     if (orderR) {
         if (by == "gene") {
@@ -695,8 +789,9 @@ setMethod("exonsBy", "EnsDb", function(x, by = c("tx", "gene"),
     ret_cols[ret_cols == "exon_idx"] <- "exon_rank"
     notThere <- !(ret_cols %in% colnames(Res))
     if (any(notThere))
-        warning(paste0("Columns ", paste(ret_cols[notThere], collapse = ", "),
-                       " not present in the result data.frame!"))
+        warning("Columns ", paste0("'", ret_cols[notThere], "'",
+                                   collapse = ", "),
+                " not found in the database!")
     ret_cols <- ret_cols[!notThere]
     columns.metadata <- ret_cols[!(ret_cols %in% c("seq_name", "seq_strand",
                                                    "exon_seq_start",
@@ -712,12 +807,12 @@ setMethod("exonsBy", "EnsDb", function(x, by = c("tx", "gene"),
     return(split(GR, Res[, paste0(by, bySuff)]))
 })
 
-
 ############################################################
 ## transcriptsBy
+##
 setMethod("transcriptsBy", "EnsDb", function(x, by = c("gene", "exon"),
                                              columns = listColumns(x, "tx"),
-                                             filter){
+                                             filter = AnnotationFilterList()) {
     if (any(by == "cds"))
         stop("fetching transcripts by cds is not (yet) implemented.")
     by <- match.arg(by, c("gene", "exon"))
@@ -735,18 +830,14 @@ setMethod("transcriptsBy", "EnsDb", function(x, by = c("gene", "exon"),
                 " transcripts are fetched.")
     columns <- columns[!torem]
     ## Process filters
-    if (missing(filter)) {
-        filter <- list()
-    } else {
-        filter <- checkFilter(filter)
-    }
+    filter <- .processFilterParam(filter, x)
     ## GRanges filter should be based on either gene or exon coors.
     filter <- setFeatureInGRangesFilter(filter, by)
     ## Eventually add columns for the filters:
     columns <- addFilterColumns(columns, filter, x)
     ret_cols <- unique(columns)
     ## define the minimal columns that we need...
-    columns <- unique(c(columns, min.columns))
+    columns <- cleanColumns(x, unique(c(columns, min.columns)))
     ## get the seqinfo:
     suppressWarnings(
         SI <- seqinfo(x)
@@ -762,7 +853,11 @@ setMethod("transcriptsBy", "EnsDb", function(x, by = c("gene", "exon"),
                            " when seq_strand = -1 then (tx_seq_end * -1) end")
     }
     Res <- getWhat(x, columns=columns, filter=filter,
-                   order.by=order.by, skip.order.check=TRUE)
+                   order.by=order.by, skip.order.check=TRUE,
+                   startWith = by, join = "suggested")
+    ## issue #48: collapse entrezid column if dbschema 2.0 is used.
+    if (as.numeric(dbSchemaVersion(x)) > 1 && any(columns == "entrezid"))
+        Res <- .collapseEntrezidInTable(Res, by = "tx_id")
     if (orderR) {
         startEnd <- (Res$seq_strand == 1) * Res$tx_seq_start +
             (Res$seq_strand == -1) * (Res$tx_seq_end * -1)
@@ -775,13 +870,13 @@ setMethod("transcriptsBy", "EnsDb", function(x, by = c("gene", "exon"),
     ret_cols[ret_cols == "exon_idx"] <- "exon_rank"
     notThere <- !(ret_cols %in% colnames(Res))
     if(any(notThere))
-        warning(paste0("Columns ", paste(ret_cols[notThere], collapse=", "),
-                       " not present in the result data.frame!"))
+        warning("Columns ", paste0("'", ret_cols[notThere], "'", collapse=", "),
+                " not found in the database!")
     ret_cols <- ret_cols[!notThere]
     columns.metadata <- ret_cols[!(ret_cols %in% c("seq_name", "seq_strand",
                                                    "tx_seq_start",
                                                    "tx_seq_end"))]
-    columns.metadata <- match(columns.metadata, colnames(Res))   ## presumably faster...
+    columns.metadata <- match(columns.metadata, colnames(Res))
     GR <- GRanges(seqnames=Rle(Res$seq_name),
                   strand=Rle(Res$seq_strand),
                   ranges=IRanges(start=Res$tx_seq_start, end=Res$tx_seq_end),
@@ -791,15 +886,16 @@ setMethod("transcriptsBy", "EnsDb", function(x, by = c("gene", "exon"),
     return(split(GR, Res[ , byId]))
 })
 
-
+############################################################
+## lengthOf
 ## for GRangesList...
 setMethod("lengthOf", "GRangesList", function(x, ...){
     return(sum(width(reduce(x))))
 ##    return(unlist(lapply(width(reduce(x)), sum)))
 })
-
 ## return the length of genes or transcripts
-setMethod("lengthOf", "EnsDb", function(x, of="gene", filter=list()){
+setMethod("lengthOf", "EnsDb", function(x, of="gene",
+                                        filter=AnnotationFilterList()){
     of <- match.arg(of, c("gene", "tx"))
     ## get the exons by gene or transcript from the database...
     suppressWarnings(
@@ -828,10 +924,12 @@ setMethod("lengthOf", "EnsDb", function(x, of="gene", filter=list()){
 ## })
 ## implement the method from the GenomicFeatures package
 .transcriptLengths <- function(x, with.cds_len=FALSE, with.utr5_len=FALSE,
-                               with.utr3_len=FALSE, filter=list()){
+                               with.utr3_len=FALSE,
+                               filter = AnnotationFilterList()){
     ## First we're going to fetch the exonsBy.
     ## Or use getWhat???
     ## Dash, have to make two queries!
+    filter <- .processFilterParam(filter, x)
     allTxs <- transcripts(x, filter=filter)
     exns <- exonsBy(x, filter=filter)
     ## Match ordering
@@ -901,27 +999,29 @@ setMethod("lengthOf", "EnsDb", function(x, of="gene", filter=list()){
     return(Res)
 }
 
-## cdsBy... return coding region ranges by tx or by gene.
+############################################################
+## cdsBy
+##
+## Return coding region ranges by tx or by gene.
 setMethod("cdsBy", "EnsDb", function(x, by = c("tx", "gene"),
-                                     columns = NULL, filter,
+                                     columns = NULL,
+                                     filter = AnnotationFilterList(),
                                      use.names = FALSE){
     by <- match.arg(by, c("tx", "gene"))
-    if (missing(filter)) {
-        filter = list()
-    } else {
-        filter <- checkFilter(filter)
-    }
+    filter <- .processFilterParam(filter, x)
     filter <- setFeatureInGRangesFilter(filter, by)
+    columns <- cleanColumns(x, columns)
     ## Eventually add columns for the filters:
     columns <- addFilterColumns(columns, filter, x)
     ## Add a filter ensuring that only coding transcripts are queried.
-    filter <- c(list(OnlyCodingTx()), filter)
+    filter <- AnnotationFilterList(OnlyCodingTxFilter(), filter)
     bySuff <- "_id"
     if (by == "tx") {
         ## adding exon_id, exon_idx to the columns.
         columns <- unique(c(columns, "exon_id", "exon_idx"))
         if (use.names)
-            warning("Not considering use.names as no transcript names are available.")
+            warning("Not considering use.names as no transcript names are",
+                    " available.")
     } else {
         columns <- unique(c("gene_id", columns))
         if( use.names) {
@@ -946,14 +1046,21 @@ setMethod("cdsBy", "EnsDb", function(x, by = c("tx", "gene"),
             order.by <- "tx.tx_id, tx2exon.exon_idx"
         } else {
             ## Here we want to sort the transcripts by tx start.
-            order.by <- "gene.gene_id, case when seq_strand = 1 then tx_cds_seq_start when seq_strand = -1 then (tx_cds_seq_end * -1) end"
+            order.by <- paste0("gene.gene_id, case when seq_strand = 1 then",
+                               " tx_cds_seq_start when seq_strand = -1 then",
+                               " (tx_cds_seq_end * -1) end")
         }
     }
     Res <- getWhat(x, columns = fetchCols,
                    filter = filter,
                    order.by = order.by,
-                   skip.order.check = TRUE)
-    ## Remove rows with NA in tx_cds_seq_start; that's the case for "old" databases.
+                   skip.order.check = TRUE,
+                   startWith = by, join = "suggested")
+    ## issue #48: collapse entrezid column if dbschema 2.0 is used.
+    if (as.numeric(dbSchemaVersion(x)) > 1 && any(columns == "entrezid"))
+        Res <- .collapseEntrezidInTable(Res, by = "exon_id")
+    ## Remove rows with NA in tx_cds_seq_start; that's the case for "old"
+    ## databases.
     nas <- is.na(Res$tx_cds_seq_start)
     if (any(nas))
         Res <- Res[!nas, ]
@@ -1016,18 +1123,17 @@ setMethod("cdsBy", "EnsDb", function(x, by = c("tx", "gene"),
 
 ############################################################
 ## getUTRsByTranscript
-getUTRsByTranscript <- function(x, what, columns = NULL, filter) {
-    if (missing(filter)) {
-        filter <- list()
-    } else {
-        filter <- checkFilter(filter)
-    }
+##
+getUTRsByTranscript <- function(x, what, columns = NULL,
+                                filter = AnnotationFilterList()) {
+    filter <- .processFilterParam(filter, x)
+    columns <- cleanColumns(x, columns)
     filter <- setFeatureInGRangesFilter(filter, "tx")
     ## Eventually add columns for the filters:
     columns <- addFilterColumns(columns, filter, x)
     columns <- unique(c(columns, "exon_id", "exon_idx"))
     ## Add the filter for coding tx only.
-    filter <- c(list(OnlyCodingTx()), filter)
+    filter <- AnnotationFilterList(OnlyCodingTxFilter(), filter)
     ## what do we need: tx_cds_seq_start, tx_cds_seq_end and exon_idx
     fetchCols <- unique(c("tx_id", columns, "tx_cds_seq_start",
                           "tx_cds_seq_end", "seq_name", "seq_strand",
@@ -1042,7 +1148,11 @@ getUTRsByTranscript <- function(x, what, columns = NULL, filter) {
     Res <- getWhat(x, columns=fetchCols,
                    filter=filter,
                    order.by=order.by,
-                   skip.order.check=TRUE)
+                   skip.order.check=TRUE,
+                   startWith = "tx", join = "suggested")
+    ## issue #48: collapse entrezid column if dbschema 2.0 is used.
+    if (as.numeric(dbSchemaVersion(x)) > 1 && any(columns == "entrezid"))
+        Res <- .collapseEntrezidInTable(Res, by = "exon_id")
     nas <- is.na(Res$tx_cds_seq_start)
     if (any(nas))
         Res <- Res[!nas, ]
@@ -1139,27 +1249,27 @@ getUTRsByTranscript <- function(x, what, columns = NULL, filter) {
     return(GR)
 }
 
+############################################################
 ## threeUTRsByTranscript
-setMethod("threeUTRsByTranscript", "EnsDb", function(x, columns=NULL, filter){
-    if(missing(filter)){
-        filter=list()
-    }else{
-        filter <- checkFilter(filter)
-    }
-    return(getUTRsByTranscript(x=x, what="three", columns=columns, filter=filter))
+##
+setMethod("threeUTRsByTranscript", "EnsDb",
+          function(x, columns = NULL, filter = AnnotationFilterList()) {
+              filter <- .processFilterParam(filter, x)
+              getUTRsByTranscript(x = x, what = "three", columns = columns,
+                                  filter = filter)
 })
 
+############################################################
 ## fiveUTRsByTranscript
-setMethod("fiveUTRsByTranscript", "EnsDb", function(x, columns=NULL, filter){
-    if(missing(filter)){
-        filter=list()
-    }else{
-        filter <- checkFilter(filter)
-    }
-    return(getUTRsByTranscript(x=x, what="five", columns=columns, filter=filter))
+##
+setMethod("fiveUTRsByTranscript", "EnsDb",
+          function(x, columns = NULL, filter = AnnotationFilterList()) {
+              filter <- .processFilterParam(filter, x)
+              getUTRsByTranscript(x = x, what = "five", columns = columns,
+                                  filter = filter)
 })
 
-
+############################################################
 ## toSAF... function to transform a GRangesList into a data.frame
 ## corresponding to the SAF format.
 ## assuming the names of the GRangesList to be the GeneID and the
@@ -1174,79 +1284,16 @@ setMethod("fiveUTRsByTranscript", "EnsDb", function(x, columns=NULL, filter){
     colnames(DF)[ colnames(DF)=="strand" ] <- "Strand"
     return(DF[ , c("GeneID", "Chr", "Start", "End", "Strand")])
 }
-
 ## for GRangesList...
 setMethod("toSAF", "GRangesList", function(x, ...){
     return(.toSaf(x))
 })
 
-.requireTable <- function(db, attr){
-    return(names(prefixColumns(db, columns=attr)))
-}
-## these function determine which tables we need for the submitted filters.
-setMethod("requireTable", signature(x="GeneidFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(.requireTable(db=db, attr="gene_id"))
-          })
-setMethod("requireTable", signature(x="EntrezidFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(.requireTable(db=db, attr="entrezid"))
-          })
-setMethod("requireTable", signature(x="GenebiotypeFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(.requireTable(db=db, attr="gene_biotype"))
-          })
-setMethod("requireTable", signature(x="GenenameFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(.requireTable(db=db, attr="gene_name"))
-          })
-setMethod("requireTable", signature(x="TxidFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(.requireTable(db=db, attr="tx_id"))
-          })
-setMethod("requireTable", signature(x="TxbiotypeFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(.requireTable(db=db, attr="tx_biotype"))
-          })
-setMethod("requireTable", signature(x="ExonidFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(.requireTable(db=db, attr="exon_id"))
-          })
-setMethod("requireTable", signature(x="SeqnameFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(.requireTable(db=db, attr="seq_name"))
-          })
-setMethod("requireTable", signature(x="SeqstrandFilter", db="EnsDb"),
-          function(x, db, ...){
-              return(.requireTable(db=db, attr="seq_name"))
-          })
-setMethod("requireTable", signature(x="SeqstartFilter", db="EnsDb"),
-          function(x, db, ...){
-              if(x at feature=="gene")
-                  return(.requireTable(db=db, attr="gene_seq_start"))
-              if(x at feature=="transcript" | x at feature=="tx")
-                  return(.requireTable(db=db, attr="tx_seq_start"))
-              if(x at feature=="exon")
-                  return(.requireTable(db=db, attr="exon_seq_start"))
-              return(NA)
-          })
-setMethod("requireTable", signature(x="SeqendFilter", db="EnsDb"),
-          function(x, db, ...){
-              if(x at feature=="gene")
-                  return(.requireTable(db=db, attr="gene_seq_end"))
-              if(x at feature=="transcript" | x at feature=="tx")
-                  return(.requireTable(db=db, attr="tx_seq_end"))
-              if(x at feature=="exon")
-                  return(.requireTable(db=db, attr="exon_seq_end"))
-              return(NA)
-          })
-setMethod("requireTable", signature(x = "SymbolFilter", db = "EnsDb"),
-          function(x, db, ...) {
-    return(.requireTable(db = db, attr = "gene_name"))
-})
+############################################################
+## buildQuery
 setMethod("buildQuery", "EnsDb",
           function(x, columns=c("gene_id", "gene_biotype", "gene_name"),
-                   filter=list(), order.by="",
+                   filter = AnnotationFilterList(), order.by="",
                    order.type="asc",
                    skip.order.check=FALSE){
               return(.buildQuery(x=x,
@@ -1256,38 +1303,50 @@ setMethod("buildQuery", "EnsDb",
                                  order.type=order.type,
                                  skip.order.check=skip.order.check))
           })
-####
+
+############################################################
+## getWhat
+##
 ## Method that wraps the internal .getWhat function to retrieve data from the
 ## database. In addition, if present, we're renaming chromosome names depending
 ## on the ucscChromosomeNames option.
+## Additional parameters:
+## o startWith: the name of the database table from which the join should start
+##   or NULL for the default behaviour (i.e. gene -> tx etc.).
+## o join: the type of join that should be used; one of "join",
+##   "left outer join" or "suggested".
 setMethod("getWhat", "EnsDb",
           function(x, columns = c("gene_id", "gene_biotype", "gene_name"),
-                   filter = list(), order.by = "", order.type = "asc",
-                   group.by = NULL, skip.order.check = FALSE) {
+                   filter = AnnotationFilterList(), order.by = "",
+                   order.type = "asc", group.by = NULL,
+                   skip.order.check = FALSE, startWith = NULL,
+                   join = "suggested") {
               Res <- .getWhat(x = x,
                               columns = columns,
                               filter = filter,
                               order.by = order.by,
                               order.type = order.type,
                               group.by = group.by,
-                              skip.order.check = skip.order.check)
+                              skip.order.check = skip.order.check,
+                              startWith = startWith,
+                              join = join)
               ## Eventually renaming seqnames according to the specified style.
               if(any(colnames(Res) == "seq_name"))
                   Res$seq_name <- formatSeqnamesFromQuery(x, Res$seq_name)
               return(Res)
           })
 
-## that's basically a copy of the code from the GenomicFeatures package.
+############################################################
+## disjointExons
+##
+## This is similar to the code in the GenomicFeatures package.
 setMethod("disjointExons", "EnsDb",
-          function(x, aggregateGenes=FALSE, includeTranscripts=TRUE, filter, ...){
-              if(missing(filter)){
-                  filter <- list()
-              }else{
-                  filter <- checkFilter(filter)
-              }
+          function(x, aggregateGenes = FALSE, includeTranscripts = TRUE,
+                   filter = AnnotationFilterList(), ...){
+              filter <- .processFilterParam(filter, x)
 
-              exonsByGene <- exonsBy(x, by="gene", filter=filter)
-              exonicParts <- disjoin(unlist(exonsByGene, use.names=FALSE))
+              exonsByGene <- exonsBy(x, by = "gene", filter = filter)
+              exonicParts <- disjoin(unlist(exonsByGene, use.names = FALSE))
 
               if (aggregateGenes) {
                   foGG <- findOverlaps(exonsByGene, exonsByGene)
@@ -1329,183 +1388,153 @@ setMethod("disjointExons", "EnsDb",
           }
          )
 
-
-### utility functions
-## checkFilter:
-## checks the filter argument and ensures that a list of Filter object is returned
-checkFilter <- function(x){
-    if(is(x, "list")){
-        if(length(x)==0)
-            return(x)
-        ## check if all elements are Filter classes.
-        IsAFilter <- unlist(lapply(x, function(z){
-                                        return(is(z, "BasicFilter"))
-                                    }))
-        if(any(!IsAFilter))
-            stop("One of more elements in filter are not filter objects!")
-    }else{
-        if(is(x, "BasicFilter")){
-            x <- list(x)
-        }else{
-            stop("filter has to be a filter object or a list of filter objects!")
-        }
-    }
-    return(x)
-}
-
+############################################################
+## getGeneRegionTrackForGviz
 ## Fetch data to add as a GeneTrack.
 ## filter ...                 Used to filter the result.
-## chromosome, start, end ... Either all or none has to be specified. If specified, the function
-##                            first retrieves all transcripts that have an exon in the specified
-##                            range and adds them as a TranscriptidFilter to the filters. The
-##                            query to fetch the "real" data is performed after.
-## featureIs ...              Wheter gene_biotype or tx_biotype should be mapped to the column
-##                            feature.
-setMethod("getGeneRegionTrackForGviz", "EnsDb", function(x, filter=list(),
-                                                         chromosome=NULL,
-                                                         start=NULL,
-                                                         end=NULL,
-                                                         featureIs="gene_biotype"){
-    featureIs <- match.arg(featureIs, c("gene_biotype", "tx_biotype"))
-    filter <- checkFilter(filter)
-    if(missing(chromosome))
-        chromosome <- NULL
-    if(missing(start))
-        start <- NULL
-    if(missing(end))
-        end <- NULL
-    ## if only chromosome is specified, create a SeqnameFilter and add it to the filter
-    if(is.null(start) & is.null(end) & !is.null(chromosome)){
-        filter <- c(filter, list(SeqnameFilter(chromosome)))
-        chromosome <- NULL
-    }
-    if(any(c(!is.null(chromosome), !is.null(start), !is.null(end)))){
-        ## Require however that all are defined!!!
-        if(all(c(!is.null(chromosome), !is.null(start), !is.null(end)))){
-            ## Fix eventually provided UCSC chromosome names:
-            chromosome <- ucscToEns(chromosome)
-            ## Fetch all transcripts in that region:
-            tids <- dbGetQuery(dbconn(x),
-                               paste0("select distinct tx.tx_id from tx join gene on",
-                                      " (tx.gene_id=gene.gene_id)",
-                                      " where seq_name='", chromosome, "' and (",
-                                      "(tx_seq_start >=",start," and tx_seq_start <=",end,") or ",
-                                      "(tx_seq_end >=",start," and tx_seq_end <=",end,") or ",
-                                      "(tx_seq_start <=",start," and tx_seq_end >=",end,")",
-                                      ")"))[, "tx_id"]
-            if(length(tids) == 0)
-                stop(paste0("Did not find any transcript on chromosome ", chromosome,
-                            " from ", start, " to ", end, "!"))
-            filter <- c(filter, TxidFilter(tids))
-        }else{
-            stop(paste0("Either all or none of arguments 'chromosome', 'start' and 'end' ",
-                        " have to be specified!"))
+## chromosome, start, end ... Either all or none has to be specified. If
+##                            specified, the function first retrieves all
+##                            transcripts that overlap the specified range
+##                            and adds them as a TxIdFilter to the
+##                            filters. The query to fetch the "real" data
+##                            is performed afterwards.
+## featureIs ...              Whether gene_biotype or tx_biotype should be
+##                            mapped to the column feature.
+setMethod(
+    "getGeneRegionTrackForGviz",
+    "EnsDb",
+    function(x, filter = AnnotationFilterList(), chromosome = NULL,
+             start = NULL, end = NULL, featureIs = "gene_biotype")
+    {
+        featureIs <- match.arg(featureIs, c("gene_biotype", "tx_biotype"))
+        filter <- .processFilterParam(filter, x)
+        if(missing(chromosome))
+            chromosome <- NULL
+        if(missing(start))
+            start <- NULL
+        if(missing(end))
+            end <- NULL
+        ## If only chromosome is specified, create a SeqNameFilter and
+        ## add it to the filter
+        if(is.null(start) & is.null(end) & !is.null(chromosome)){
+            filter <- AnnotationFilterList(filter, SeqNameFilter(chromosome))
+            chromosome <- NULL
         }
-    }
-    ## Return a data.frame with columns: chromosome, start, end, width, strand, feature,
-    ## gene, exon, transcript and symbol.
-    ## 1) Query the data as we usually would.
-    ## 2) Perform an additional query to get cds and utr, remove all entries from the
-    ##    first result for the same transcripts and rbind the data.frames.
-    needCols <- c("seq_name", "exon_seq_start", "exon_seq_end", "seq_strand",
-                  featureIs, "gene_id", "exon_id",
-                  "exon_idx", "tx_id", "gene_name")
-    ## That's the names to which we map the original columns from the EnsDb.
-    names(needCols) <- c("chromosome", "start", "end", "strand",
-                         "feature", "gene", "exon", "exon_rank", "transcript",
-                         "symbol")
-    txs <- transcripts(x, filter=filter,
-                       columns=needCols, return.type="data.frame")
-    ## Rename columns
-    idx <- match(needCols, colnames(txs))
-    notThere <- is.na(idx)
-    idx <- idx[!notThere]
-    colnames(txs)[idx] <- names(needCols)[!notThere]
-    ## now processing the 5utr
-    fUtr <- fiveUTRsByTranscript(x, filter=filter, columns=needCols)
-    if(length(fUtr) > 0){
-        fUtr <- as(unlist(fUtr, use.names=FALSE), "data.frame")
-        fUtr <- fUtr[, !(colnames(fUtr) %in% c("width", "seq_name", "exon_seq_start",
-                                               "exon_seq_end", "strand"))]
-        colnames(fUtr)[1] <- "chromosome"
-        idx <- match(needCols, colnames(fUtr))
-        notThere <- is.na(idx)
-        idx <- idx[!notThere]
-        colnames(fUtr)[idx] <- names(needCols)[!notThere]
-        ## Force being in the correct ordering:
-        fUtr <- fUtr[, names(needCols)]
-        fUtr$feature <- "utr5"
-        ## Remove transcripts from the txs data.frame
-        txs <- txs[!(txs$transcript %in% fUtr$transcript), , drop=FALSE]
-    }
-    tUtr <- threeUTRsByTranscript(x, filter=filter, columns=needCols)
-    if(length(tUtr) > 0){
-        tUtr <- as(unlist(tUtr, use.names=FALSE), "data.frame")
-        tUtr <- tUtr[, !(colnames(tUtr) %in% c("width", "seq_name", "exon_seq_start",
-                                               "exon_seq_end", "strand"))]
-        colnames(tUtr)[1] <- "chromosome"
-        idx <- match(needCols, colnames(tUtr))
-        notThere <- is.na(idx)
-        idx <- idx[!notThere]
-        colnames(tUtr)[idx] <- names(needCols)[!notThere]
-        ## Force being in the correct ordering:
-        tUtr <- tUtr[, names(needCols)]
-        tUtr$feature <- "utr3"
-        ## Remove transcripts from the txs data.frame
-        if(nrow(txs) > 0){
-            txs <- txs[!(txs$transcript %in% tUtr$transcript), , drop=FALSE]
+        if(any(c(!is.null(chromosome), !is.null(start), !is.null(end)))){
+            ## Require however that all are defined!!!
+            if(all(c(!is.null(chromosome), !is.null(start), !is.null(end)))){
+                ## Convert possibly provided UCSC chromosome names:
+                chromosome <- ucscToEns(chromosome)
+                ## Define a GRangesFilter to include all features that overlap
+                ## that region.
+                grg <- GRangesFilter(GRanges(seqnames = chromosome,
+                                             ranges = IRanges(start, end)),
+                                     feature = "tx", type = "any")
+                tids <- transcripts(x, filter = grg, columns = "tx_id")$tx_id
+                filter <- AnnotationFilterList(filter, TxIdFilter(tids))
+            }else{
+                stop("Either all or none of arguments 'chromosome', 'start' and",
+                     " 'end' have to be specified!")
+            }
         }
-    }
-    cds <- cdsBy(x, filter=filter, columns=needCols)
-    if(length(cds) > 0){
-        cds <- as(unlist(cds, use.names=FALSE), "data.frame")
-        cds <- cds[, !(colnames(cds) %in% c("width", "seq_name", "exon_seq_start",
-                                            "exon_seq_end", "strand"))]
-        colnames(cds)[1] <- "chromosome"
-        idx <- match(needCols, colnames(cds))
+        ## Build a data.frame with columns chromosome, start, end, strand,
+        ## feature, gene, exon, exon_rank, transcript and symbol, which is
+        ## converted into a GRanges at the end.
+        ## 1) Query the data as we usually would.
+        ## 2) Perform an additional query to get cds and utr, remove all entries
+        ##    from the first result for the same transcripts and rbind the
+        ##    data.frames.
+        needCols <- c("seq_name", "exon_seq_start", "exon_seq_end", "seq_strand",
+                      featureIs, "gene_id", "exon_id",
+                      "exon_idx", "tx_id", "gene_name")
+        ## That's the names to which we map the original columns from the EnsDb.
+        names(needCols) <- c("chromosome", "start", "end", "strand",
+                             "feature", "gene", "exon", "exon_rank", "transcript",
+                             "symbol")
+        txs <- transcripts(x, filter = filter,
+                           columns = needCols, return.type="data.frame")
+        ## Rename columns
+        idx <- match(needCols, colnames(txs))
         notThere <- is.na(idx)
         idx <- idx[!notThere]
-        colnames(cds)[idx] <- names(needCols)[!notThere]
-        ## Force being in the correct ordering:
-        cds <- cds[, names(needCols)]
-        ## Remove transcripts from the txs data.frame
-        if(nrow(txs) > 0){
-            txs <- txs[!(txs$transcript %in% cds$transcript), , drop=FALSE]
+        colnames(txs)[idx] <- names(needCols)[!notThere]
+        ## now processing the 5utr
+        fUtr <- fiveUTRsByTranscript(x, filter = filter, columns=needCols)
+        if(length(fUtr) > 0){
+            fUtr <- as(unlist(fUtr, use.names=FALSE), "data.frame")
+            fUtr <- fUtr[, !(colnames(fUtr) %in% c("width", "seq_name",
+                                                   "exon_seq_start",
+                                                   "exon_seq_end", "strand"))]
+            colnames(fUtr)[1] <- "chromosome"
+            idx <- match(needCols, colnames(fUtr))
+            notThere <- is.na(idx)
+            idx <- idx[!notThere]
+            colnames(fUtr)[idx] <- names(needCols)[!notThere]
+            ## Force being in the correct ordering:
+            fUtr <- fUtr[, names(needCols)]
+            fUtr$feature <- "utr5"
+            ## Remove transcripts from the txs data.frame
+            txs <- txs[!(txs$transcript %in% fUtr$transcript), , drop=FALSE]
         }
-    }
-    if(length(fUtr) > 0){
-        txs <- rbind(txs, fUtr)
-    }
-    if(length(tUtr) > 0){
-        txs <- rbind(txs, tUtr)
-    }
-    if(length(cds) > 0){
-        txs <- rbind(txs, cds)
-    }
-    ## Convert into GRanges.
-    suppressWarnings(
-        SI <- seqinfo(x)
-    )
-    SI <- SI[as.character(unique(txs$chromosome))]
-    GR <- GRanges(seqnames=Rle(txs$chromosome),
-                  strand=Rle(txs$strand),
-                  ranges=IRanges(start=txs$start, end=txs$end),
-                  seqinfo=SI,
-                  txs[, c("feature", "gene", "exon", "exon_rank",
-                          "transcript", "symbol"), drop=FALSE])
-    return(GR)
-})
-
-
-## Simple helper function to set the @feature in GRangesFilter depending on the calling method.
-setFeatureInGRangesFilter <- function(x, feature){
-    for(i in seq(along.with=x)){
-        if(is(x[[i]], "GRangesFilter")){
-            x[[i]]@feature <- feature
+        tUtr <- threeUTRsByTranscript(x, filter = filter, columns=needCols)
+        if(length(tUtr) > 0){
+            tUtr <- as(unlist(tUtr, use.names=FALSE), "data.frame")
+            tUtr <- tUtr[, !(colnames(tUtr) %in% c("width", "seq_name",
+                                                   "exon_seq_start",
+                                                   "exon_seq_end", "strand"))]
+            colnames(tUtr)[1] <- "chromosome"
+            idx <- match(needCols, colnames(tUtr))
+            notThere <- is.na(idx)
+            idx <- idx[!notThere]
+            colnames(tUtr)[idx] <- names(needCols)[!notThere]
+            ## Force being in the correct ordering:
+            tUtr <- tUtr[, names(needCols)]
+            tUtr$feature <- "utr3"
+            ## Remove transcripts from the txs data.frame
+            if(nrow(txs) > 0){
+                txs <- txs[!(txs$transcript %in% tUtr$transcript), , drop=FALSE]
+            }
         }
-    }
-    return(x)
-}
+        cds <- cdsBy(x, filter = filter, columns = needCols)
+        if(length(cds) > 0){
+            cds <- as(unlist(cds, use.names=FALSE), "data.frame")
+            cds <- cds[, !(colnames(cds) %in% c("width", "seq_name",
+                                                "exon_seq_start",
+                                                "exon_seq_end", "strand"))]
+            colnames(cds)[1] <- "chromosome"
+            idx <- match(needCols, colnames(cds))
+            notThere <- is.na(idx)
+            idx <- idx[!notThere]
+            colnames(cds)[idx] <- names(needCols)[!notThere]
+            ## Force being in the correct ordering:
+            cds <- cds[, names(needCols)]
+            ## Remove transcripts from the txs data.frame
+            if(nrow(txs) > 0){
+                txs <- txs[!(txs$transcript %in% cds$transcript), , drop=FALSE]
+            }
+        }
+        if(length(fUtr) > 0){
+            txs <- rbind(txs, fUtr)
+        }
+        if(length(tUtr) > 0){
+            txs <- rbind(txs, tUtr)
+        }
+        if(length(cds) > 0){
+            txs <- rbind(txs, cds)
+        }
+        ## Convert into GRanges.
+        suppressWarnings(
+            SI <- seqinfo(x)
+        )
+        SI <- SI[as.character(unique(txs$chromosome))]
+        GR <- GRanges(seqnames=Rle(txs$chromosome),
+                      strand=Rle(txs$strand),
+                      ranges=IRanges(start=txs$start, end=txs$end),
+                      seqinfo=SI,
+                      txs[, c("feature", "gene", "exon", "exon_rank",
+                              "transcript", "symbol"), drop=FALSE])
+        return(GR)
+    })
 
 ####============================================================
 ##  properties
@@ -1567,6 +1596,18 @@ setMethod("setProperty", "EnsDb", function(x, ...){
     return(x)
 })
 
+#' Remove the property with the specified name.
+#' @noRd
+dropProperty <- function(x, name) {
+    if (missing(name))
+        return(x)
+    prps <- x@.properties
+    if (any(names(prps) == name))
+        prps <- prps[names(prps) != name]
+    x@.properties <- prps
+    x
+}
+
 ####============================================================
 ##  updateEnsDb
 ##
@@ -1589,19 +1630,17 @@ setMethod("updateEnsDb", "EnsDb", function(x, ...){
 setMethod("transcriptsByOverlaps", "EnsDb",
           function(x, ranges, maxgap = 0L, minoverlap = 1L,
                    type = c("any", "start", "end"),
-                   columns=listColumns(x, "tx"),
-                   filter) {
-    if(missing(ranges))
+                   columns = listColumns(x, "tx"),
+                   filter = AnnotationFilterList()) {
+    if (missing(ranges))
         stop("Parameter 'ranges' is missing!")
-    if(missing(filter)){
-        filter <- list()
-    }else{
-        filter <- checkFilter(filter)
-    }
+    filter <- .processFilterParam(filter, x)
     SLs <- unique(as.character(seqnames(ranges)))
-    filter <- c(filter, SeqnameFilter(SLs))
-    return(subsetByOverlaps(transcripts(x, columns=columns, filter=filter),
-           ranges, maxgap=maxgap, minoverlap=minoverlap, type=match.arg(type)))
+    filter <- AnnotationFilterList(filter, SeqNameFilter(SLs))
+    columns <- cleanColumns(x, columns)
+    subsetByOverlaps(transcripts(x, columns = columns, filter = filter),
+                     ranges, maxgap = maxgap, minoverlap = minoverlap,
+                     type = match.arg(type))
 })
 
 ####============================================================
@@ -1609,21 +1648,19 @@ setMethod("transcriptsByOverlaps", "EnsDb",
 ##
 ####------------------------------------------------------------
 setMethod("exonsByOverlaps", "EnsDb",
-          function(x, ranges, maxgap=0L, minoverlap=1L,
-                   type=c("any", "start", "end"),
-                   columns=listColumns(x, "exon"),
-                   filter) {
+          function(x, ranges, maxgap = 0L, minoverlap = 1L,
+                   type = c("any", "start", "end"),
+                   columns = listColumns(x, "exon"),
+                   filter = AnnotationFilterList()) {
     if(missing(ranges))
         stop("Parameter 'ranges' is missing!")
-    if(missing(filter)){
-        filter <- list()
-    }else{
-        filter <- checkFilter(filter)
-    }
+    filter <- .processFilterParam(filter, x)
     SLs <- unique(as.character(seqnames(ranges)))
-    filter <- c(filter, SeqnameFilter(SLs))
-    return(subsetByOverlaps(exons(x, columns=columns, filter=filter),
-           ranges, maxgap=maxgap, minoverlap=minoverlap, type=match.arg(type)))
+    filter <- AnnotationFilterList(filter, SeqNameFilter(SLs))
+    columns <- cleanColumns(x, columns)
+    subsetByOverlaps(exons(x, columns = columns, filter = filter),
+                     ranges, maxgap = maxgap, minoverlap = minoverlap,
+                     type = match.arg(type))
 })
 
 ############################################################
@@ -1664,40 +1701,48 @@ setReplaceMethod("orderResultsInR", "EnsDb", function(x, value) {
 ## useMySQL
 ##
 ## Switch from RSQlite backend to a MySQL backend.
-##' @title Use a MySQL backend
-##' @aliases useMySQL
-##'
-##' @description Change the SQL backend from \emph{SQLite} to \emph{MySQL}.
-##' When first called on an \code{\linkS4class{EnsDb}} object, the function
-##' tries to create and save all of the data into a MySQL database. All
-##' subsequent calls will connect to the already existing MySQL database.
-##'
-##' @details This functionality requires that the \code{RMySQL} package is
-##' installed and that the user has (write) access to a running MySQL server.
-##' If the corresponding database does already exist users without write access
-##' can use this functionality.
-##'
-##' @note At present the function does not evaluate whether the versions
-##' between the SQLite and MySQL database differ.
-##'
-##' @param x The \code{\linkS4class{EnsDb}} object.
-##' @param host Character vector specifying the host on which the MySQL
-##' server runs.
-##' @param port The port on which the MySQL server can be accessed.
-##' @param user The user name for the MySQL server.
-##' @param pass The password for the MySQL server.
-##' @return A \code{\linkS4class{EnsDb}} object providing access to the
-##' data stored in the MySQL backend.
-##' @author Johannes Rainer
-##' @examples
-##' ## Load the EnsDb database (SQLite backend).
-##' library(EnsDb.Hsapiens.v75)
-##' edb <- EnsDb.Hsapiens.v75
-##' ## Now change the backend to MySQL; my_user and my_pass should
-##' ## be the user name and password to access the MySQL server.
-##' \dontrun{
-##' edb_mysql <- useMySQL(edb, host = "localhost", user = my_user, pass = my_pass)
-##' }
+#' @title Use a MySQL backend
+#' 
+#' @aliases useMySQL
+#'
+#' @description Change the SQL backend from \emph{SQLite} to \emph{MySQL}.
+#'     When first called on an \code{\linkS4class{EnsDb}} object, the function
+#'     tries to create and save all of the data into a MySQL database. All
+#'     subsequent calls will connect to the already existing MySQL database.
+#'
+#' @details This functionality requires that the \code{RMySQL} package is
+#'     installed and that the user has (write) access to a running MySQL server.
+#'     If the corresponding database already exists, users without write
+#'     access can also use this functionality.
+#'
+#' @note At present the function does not evaluate whether the versions
+#'     between the SQLite and MySQL database differ.
+#'
+#' @param x The \code{\linkS4class{EnsDb}} object.
+#' 
+#' @param host Character vector specifying the host on which the MySQL
+#'     server runs.
+#' 
+#' @param port The port on which the MySQL server can be accessed.
+#'
+#' @param user The user name for the MySQL server.
+#'
+#' @param pass The password for the MySQL server.
+#'
+#' @return A \code{\linkS4class{EnsDb}} object providing access to the
+#'     data stored in the MySQL backend.
+#'
+#' @author Johannes Rainer
+#'
+#' @examples
+#' ## Load the EnsDb database (SQLite backend).
+#' library(EnsDb.Hsapiens.v75)
+#' edb <- EnsDb.Hsapiens.v75
+#' ## Now change the backend to MySQL; my_user and my_pass should
+#' ## be the user name and password to access the MySQL server.
+#' \dontrun{
+#' edb_mysql <- useMySQL(edb, host = "localhost", user = my_user, pass = my_pass)
+#' }
 setMethod("useMySQL", "EnsDb", function(x, host = "localhost",
                                         port = 3306, user, pass) {
     if (missing(user))
@@ -1712,9 +1757,11 @@ setMethod("useMySQL", "EnsDb", function(x, host = "localhost",
                          port = port)
         ## Check if database is available.
         dbs <- dbGetQuery(con, "show databases;")
-        sqliteName <- sub(basename(dbfile(dbconn(x))),
-                          pattern = ".sqlite", replacement = "",
-                          fixed = TRUE)
+        ## sqliteName should be in the format EnsDb.Hsapiens.v75!
+        sqliteName <- .makePackageName(dbconn(x))
+        ## sqliteName <- sub(basename(dbfile(dbconn(x))),
+        ##                   pattern = ".sqlite", replacement = "",
+        ##                   fixed = TRUE)
         mysqlName <- SQLiteName2MySQL(sqliteName)
         if (nrow(dbs) == 0 | !any(dbs$Database == mysqlName)) {
             message("Database not available, trying to create it...",
@@ -1756,3 +1803,205 @@ setMethod("useMySQL", "EnsDb", function(x, host = "localhost",
         stop("Package 'RMySQL' not available.")
     }
 })
+
+############################################################
+## proteins
+##
+## If return type is GRanges, make a seqlevel and seqinfo for each protein, i.e.
+## put each protein on its own sequence.
+#' @title Protein related functionality
+#' 
+#' @aliases proteins
+#'
+#' @description This help page provides information about most of the
+#'     functionality related to protein annotations in \code{ensembldb}.
+#'
+#'     The \code{proteins} method retrieves protein related annotations from
+#'     an \code{\linkS4class{EnsDb}} database.
+#'
+#' @details The \code{proteins} method performs the query starting from the
+#'     \code{protein} tables and can hence return all annotations in the
+#'     database that are related to proteins and to the transcripts encoding
+#'     them. Since \code{proteins} thus only queries annotations for protein
+#'     coding transcripts, the \code{\link{genes}} or
+#'     \code{\link{transcripts}} methods have to be used to retrieve annotations
+#'     for non-coding transcripts.
+#' 
+#' @param object The \code{\linkS4class{EnsDb}} object.
+#'
+#' @param columns For \code{proteins}: character vector defining the columns to
+#'     be extracted from the database. Can be any column(s) listed by the
+#'     \code{\link{listColumns}} method.
+#'
+#' @param filter For \code{proteins}: A filter object extending
+#'     \code{AnnotationFilter} or a list of such objects to select
+#'     specific entries from the database. See \code{\link{Filter-classes}} for
+#'     documentation of the available filters and use
+#'     \code{\link{supportedFilters}} to get the full list of supported filters.
+#'
+#' @param order.by For \code{proteins}: a character vector specifying the
+#'     column(s) by which the result should be ordered.
+#'
+#' @param order.type For \code{proteins}: if the results should be ordered
+#'     ascending (\code{order.type = "asc"}) or descending
+#'     (\code{order.type = "desc"}).
+#'
+#' @param return.type For \code{proteins}: character of length one specifying
+#'     the type of the returned object. Can be either \code{"DataFrame"},
+#'     \code{"data.frame"} or \code{"AAStringSet"}.
+#'
+#' @return The \code{proteins} method returns protein related annotations from
+#'     an \code{\linkS4class{EnsDb}} object with its \code{return.type} argument
+#'     allowing to define the type of the returned object. Note that if
+#'     \code{return.type = "AAStringSet"} additional annotation columns are
+#'     stored in a \code{DataFrame} that can be accessed with the \code{mcols}
+#'     method on the returned object.
+#'
+#' @rdname ProteinFunctionality
+#' 
+#' @author Johannes Rainer
+#'
+#' @examples
+#' library(ensembldb)
+#' library(EnsDb.Hsapiens.v75)
+#' edb <- EnsDb.Hsapiens.v75
+#' ## Get all proteins from the database for the gene ZBTB16, if protein
+#' ## annotations are available
+#' if (hasProteinData(edb))
+#'     proteins(edb, filter = GenenameFilter("ZBTB16"))
+setMethod("proteins", "EnsDb", function(object,
+                                        columns = listColumns(object, "protein"),
+                                        filter = AnnotationFilterList(),
+                                        order.by = "",
+                                        order.type = "asc",
+                                        return.type = "DataFrame") {
+    if (!hasProteinData(object))
+        stop("The used EnsDb does not provide protein annotations!",
+             " Thus, 'proteins' cannot be used.")
+    return.type <- match.arg(return.type, c("DataFrame", "AAStringSet",
+                                            "data.frame"))
+    columns <- cleanColumns(object, unique(c(columns, "protein_id")))
+    filter <- .processFilterParam(filter, object)
+    filter <- setFeatureInGRangesFilter(filter, "tx")
+    ## Eventually add columns for the filters:
+    columns <- addFilterColumns(columns, filter, object)
+    ## Check that we don't have any exon columns here.
+    ex_cols <- unique(listColumns(object, c("exon", "tx2exon")))
+    ex_cols <- ex_cols[ex_cols != "tx_id"]
+    if (any(columns %in% ex_cols)) {
+        warning("Exon specific columns are not allowed for proteins. Columns ",
+                paste0("'", columns[columns %in% ex_cols], "'", collapse = ", "),
+                " have been removed.")
+        columns <- columns[!(columns %in% ex_cols)]
+    }
+    retColumns <- columns
+    ## Process order.by:
+    ## If not specified we might want to order them by seq_name or tx_seq_start
+    ## if present in parameter columns
+    if (all(order.by == "")) {
+        order.by <- NULL
+        if (any(columns == "seq_name"))
+            order.by <- "seq_name"
+        seq_col_idx <- grep(columns, pattern = "_seq_")
+        if (length(seq_col_idx) > 0)
+            order.by <- c(order.by, columns[seq_col_idx[1]])
+        if (is.null(order.by))
+            order.by <- ""
+    }
+    ## If we're going to return an AAStringSet we need the peptide sequence
+    ## as well.
+    if (return.type == "AAStringSet") {
+        columns <- unique(c(columns, "protein_sequence"))
+    }
+    ## protein_id is *always* required
+    columns <- unique(c(columns, "protein_id"))
+    ## Get the data
+    Res <- getWhat(object, columns = columns, filter = filter,
+                   order.by = order.by, order.type = order.type,
+                   startWith = "protein", join = "suggested")
+    ## issue #48: collapse entrezid column if dbschema 2.0 is used.
+    if (as.numeric(dbSchemaVersion(object)) > 1 & any(columns == "entrezid"))
+        Res <- .collapseEntrezidInTable(Res, by = "protein_id")
+    ## Now process the result.
+    cols_not_found <- !(retColumns %in% colnames(Res))
+    if (any(cols_not_found))
+        warning("Columns ", paste0("'", retColumns[cols_not_found], "'",
+                                   collapse = ", "),
+                " not found in the database!")
+    retColumns <- retColumns[!cols_not_found]
+    if (return.type == "AAStringSet") {
+        aass <- AAStringSet(Res$protein_sequence)
+        names(aass) <- Res$protein_id
+        ## Add the mcols:
+        retColumns <- retColumns[retColumns != "protein_sequence"]
+        if (length(retColumns) > 0)
+            mcols(aass) <- DataFrame(Res[, retColumns, drop = FALSE])
+        return(aass)
+    } else {
+        Res <- Res[, retColumns, drop = FALSE]
+        if (return.type == "DataFrame")
+            Res <- DataFrame(Res)
+        return(Res)
+    }
+    return(NULL)
+})
+
+############################################################
+## listUniprotDbs
+#' @aliases listUniprotDbs
+#' 
+#' @description The \code{listUniprotDbs} method lists all Uniprot database
+#'     names in the \code{EnsDb}.
+#' 
+#' @examples
+#'
+#' ## List the names of all Uniprot databases from which Uniprot IDs are
+#' ## available in the EnsDb
+#' if (hasProteinData(edb))
+#'     listUniprotDbs(edb)
+#'
+#' @rdname ProteinFunctionality
+setMethod("listUniprotDbs", "EnsDb", function(object) {
+    if (!hasProteinData(object))
+        stop("The provided EnsDb database does not provide protein annotations!")
+    res <- dbGetQuery(dbconn(object), "select distinct uniprot_db from uniprot")
+    return(res$uniprot_db)
+})
+
+############################################################
+## listUniprotMappingTypes
+#' @aliases listUniprotMappingTypes
+#' 
+#' @description The \code{listUniprotMappingTypes} method lists all methods
+#'     that were used for the mapping of Uniprot IDs to Ensembl protein IDs.
+#'
+#' @examples
+#'
+#' ## List the type of all methods that were used to map Uniprot IDs to Ensembl
+#' ## protein IDs
+#' if (hasProteinData(edb))
+#'     listUniprotMappingTypes(edb)
+#'
+#' @rdname ProteinFunctionality
+setMethod("listUniprotMappingTypes", "EnsDb", function(object) {
+    if (!hasProteinData(object))
+        stop("The provided EnsDb database does not provide protein annotations!")
+    res <- dbGetQuery(dbconn(object),
+                      "select distinct uniprot_mapping_type from uniprot")
+    return(res$uniprot_mapping_type)
+})
+
+#' @description \code{supportedFilters} returns the names of all supported
+#'     filters for the \code{EnsDb} object.
+#'
+#' @param object For \code{supportedFilters}: an \code{EnsDb} object.
+#'
+#' @param ... For \code{supportedFilters}: currently not used.
+#'
+#' @return For \code{supportedFilters}: the names of the supported filter
+#'     classes.
+#' 
+#' @rdname Filter-classes
+setMethod("supportedFilters", "EnsDb", function(object, ...) {
+    .supportedFilters(object)
+})
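The 'chromosome', 'start' and 'end' handling in getGeneRegionTrackForGviz above used a hand-written SQL overlap query. That query selected transcripts whose start lies inside the region, whose end lies inside the region, or which span the region. The new code replaces it with a GRangesFilter using type = "any".

The minimal base-R sketch below assumes closed intervals with start <= end. It shows that the three SQL conditions reduce to the single test tx_start <= end & tx_end >= start, which is assumed here to be what type = "any" expresses. The helper names overlaps_three_way and overlaps_any, and the example coordinates, are hypothetical and not part of ensembldb.

```r
## Hypothetical helpers for illustration only; not ensembldb functions.
## Removed SQL-style test: transcript starts inside the region, ends inside
## it, or spans it completely.
overlaps_three_way <- function(tx_start, tx_end, start, end) {
    (tx_start >= start & tx_start <= end) |
        (tx_end >= start & tx_end <= end) |
        (tx_start <= start & tx_end >= end)
}

## Single closed-interval overlap test (assumed equivalent of type = "any").
overlaps_any <- function(tx_start, tx_end, start, end) {
    tx_start <= end & tx_end >= start
}

## Made-up transcripts relative to the query region 100-200.
tx <- data.frame(
    tx_id    = c("inside", "left", "right", "spanning", "before", "after"),
    tx_start = c(120, 50, 150, 10, 1, 250),
    tx_end   = c(180, 110, 300, 400, 99, 260),
    stringsAsFactors = FALSE
)
region_start <- 100
region_end <- 200

hits_old <- overlaps_three_way(tx$tx_start, tx$tx_end,
                               region_start, region_end)
hits_new <- overlaps_any(tx$tx_start, tx$tx_end, region_start, region_end)

## Both tests keep the inside, left, right and spanning transcripts
tx$tx_id[hits_new]
identical(hits_old, hits_new)
## [1] TRUE
```

Both tests keep the transcripts that lie inside, overlap either edge, or span the region. Both drop the two transcripts that lie entirely outside it.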
diff --git a/R/dbhelpers.R b/R/dbhelpers.R
index 0bf090d..4ec5b9c 100644
--- a/R/dbhelpers.R
+++ b/R/dbhelpers.R
@@ -1,39 +1,44 @@
 ############################################################
 ## EnsDb
 ## Constructor function.
-##' @title Connect to an EnsDb object
-##'
-##' @description The \code{EnsDb} constructor function connects to the database
-##' specified with argument \code{x} and returns a corresponding
-##' \code{\linkS4class{EnsDb}} object.
-##'
-##' @details By providing the connection to a MySQL database, it is possible
-##' to use MySQL as the database backend and queries will be performed on that
-##' database. Note however that this requires the package \code{RMySQL} to be
-##' installed. In addition, the user needs to have access to a MySQL server
-##' providing already an EnsDb database, or must have write privileges on a
-##' MySQL server, in which case the \code{\link{useMySQL}} method can be used
-##' to insert the annotations from an EnsDB package into a MySQL database.
-##' @param x Either a character specifying the \emph{SQLite} database file, or
-##' a \code{DBIConnection} to e.g. a MySQL database.
-##' @return A \code{\linkS4class{EnsDb}} object.
-##' @author Johannes Rainer
-##' @examples
-##' ## "Standard" way to create an EnsDb object:
-##' library(EnsDb.Hsapiens.v75)
-##' EnsDb.Hsapiens.v75
-##'
-##' ## Alternatively, provide the full file name of a SQLite database file
-##' dbfile <- system.file("extdata/EnsDb.Hsapiens.v75.sqlite", package = "EnsDb.Hsapiens.v75")
-##' edb <- EnsDb(dbfile)
-##' edb
-##'
-##' ## Third way: connect to a MySQL database
-##' \dontrun{
-##' library(RMySQL)
-##' dbcon <- dbConnect(MySQL(), user = my_user, pass = my_pass, host = my_host, dbname = "ensdb_hsapiens_v75")
-##' edb <- EnsDb(dbcon)
-##' }
+#' @title Connect to an EnsDb object
+#'
+#' @description The \code{EnsDb} constructor function connects to the database
+#'     specified with argument \code{x} and returns a corresponding
+#'     \code{\linkS4class{EnsDb}} object.
+#'
+#' @details By providing the connection to a MySQL database, it is possible
+#'     to use MySQL as the database backend and queries will be performed on
+#'     that database. Note however that this requires the package \code{RMySQL}
+#'     to be installed. In addition, the user needs to have access to a MySQL
+#'     server that already provides an EnsDb database, or must have write
+#'     privileges on a MySQL server, in which case the \code{\link{useMySQL}}
+#'     method can be used to insert the annotations from an EnsDb package into
+#'     a MySQL database.
+#' 
+#' @param x Either a character specifying the \emph{SQLite} database file, or
+#'     a \code{DBIConnection} to e.g. a MySQL database.
+#' 
+#' @return A \code{\linkS4class{EnsDb}} object.
+#' 
+#' @author Johannes Rainer
+#' 
+#' @examples
+#' ## "Standard" way to create an EnsDb object:
+#' library(EnsDb.Hsapiens.v75)
+#' EnsDb.Hsapiens.v75
+#'
+#' ## Alternatively, provide the full file name of a SQLite database file
+#' dbfile <- system.file("extdata/EnsDb.Hsapiens.v75.sqlite", package = "EnsDb.Hsapiens.v75")
+#' edb <- EnsDb(dbfile)
+#' edb
+#'
+#' ## Third way: connect to a MySQL database
+#' \dontrun{
+#' library(RMySQL)
+#' dbcon <- dbConnect(MySQL(), user = my_user, pass = my_pass, host = my_host, dbname = "ensdb_hsapiens_v75")
+#' edb <- EnsDb(dbcon)
+#' }
 EnsDb <- function(x){
     options(useFancyQuotes=FALSE)
     if(missing(x)){
@@ -58,8 +63,6 @@ EnsDb <- function(x){
     if (is.character(OK))
         stop(OK)
     tables <- dbListTables(con)
-    ## Quick fix for EnsDbs containing also protein data (issue #30):
-    tables <- tables[!(tables %in% c("protein", "uniprot", "protein_domain"))]
     ## read the columns for these tables.
     Tables <- vector(length=length(tables), "list")
     for(i in 1:length(Tables)){
@@ -67,32 +70,46 @@ EnsDb <- function(x){
                                                          tables[ i ], " limit 1")))
     }
     names(Tables) <- tables
-    EDB <- new("EnsDb", ensdb=con, tables=Tables)
+    EDB <- new("EnsDb", ensdb = con, tables = Tables)
     EDB <- setProperty(EDB, dbSeqlevelsStyle="Ensembl")
+    ## Add the db schema version to the properties; dbSchemaVersion() falls
+    ## back to "1.0" for databases whose metadata lacks this entry.
+    EDB <- setProperty(EDB, DBSCHEMAVERSION = dbSchemaVersion(con))
     ## Setting the default for the returnFilterColumns
     returnFilterColumns(EDB) <- TRUE
     ## Defining the default for the ordering
     orderResultsInR(EDB) <- FALSE
-    return(EDB)
+    ## Validate the fully initialized object once more.
+    OK <- validateEnsDb(EDB)
+    if (is.character(OK))
+        stop(OK)
+    EDB
 }
 
+## loadEnsDb <- function(x) {
+##     ## con <- ensDb( x )
+##     ## EDB <- new( "EnsDb", ensdb=con )
+##     return(EnsDb(x))
+## }
+
+
 ## x is the connection to the database, name is the name of the entry to fetch
 .getMetaDataValue <- function(x, name){
-    return(dbGetQuery(x, paste0("select value from metadata where name='", name, "'"))[ 1, 1])
+    return(dbGetQuery(x, paste0("select value from metadata where name='",
+                                name, "'"))[ 1, 1])
 }
 
-####
-## Note: that's the central function that checks which tables are needed for the
-## least expensive join!!! The names of the tables should then also be submitted
-## to any other method that calls prefixColumns (e.g. where of the Filter classes)
+############################################################
+## prefixColumns
 ##
-## this function checks:
-## a) for multi-table columns, selects the table with the highest degree
-## b) pre-pend (inverse of append ;)) the table name to the column name.
-## returns a list, names being the tables and the values being the columns
-## named: <table name>.<column name>
-## clean: whether a cleanColumns should be called on the submitted columns.
-## with.tables: force the prefix to be specifically on the submitted tables.
+## Determines which tables (along with their columns) are required for the
+## join and prefixes each column with the name of its table.
+## Updated version of prefixColumns:
+## o Uses the order of the tables returned by listTables and assigns each
+##   column to the first table in which it is found. This differs from the
+##   previous approach of joining as few tables as possible, but avoids
+##   problems with joins between e.g. tx and protein, where not all tx_id
+##   values are present in the protein table.
 prefixColumns <- function(x, columns, clean = TRUE, with.tables){
     if (missing(columns))
         stop("columns is empty! No columns provided!")
@@ -103,53 +120,38 @@ prefixColumns <- function(x, columns, clean = TRUE, with.tables){
         if (length(with.tables) > 0) {
             Tab <- Tab[ with.tables ]
         } else {
-            warning("The submitted table names are not valid in the database and were thus dropped.")
+            warning("The submitted table names are not valid in the database",
+                    " and were thus dropped.")
         }
         if (length(Tab) == 0)
-            stop("None of the tables submitted with with.tables is present in the database!")
+            stop("None of the tables submitted with with.tables is present",
+                 " in the database!")
     }
     if (clean)
         columns <- cleanColumns(x, columns)
     if (length(columns) == 0) {
         return(NULL)
     }
-    ## group the columns by table.
-    columns.bytable <- sapply(Tab, function(z){
-        return(z[ z %in% columns ])
-    }, simplify=FALSE, USE.NAMES=TRUE)
-    ## kick out empty tables...
-    columns.bytable <- columns.bytable[ unlist(lapply(columns.bytable, function(z){
-        return(length(z) > 0)
-    })) ]
-    if(length(columns.bytable)==0)
-        stop("No columns available!")
-    have.columns <- NULL
-    ## new approach! order the tables by number of elements, and after that, re-order them.
-    columns.bytable <- columns.bytable[ order(unlist(lapply(columns.bytable, length)),
-                                              decreasing=TRUE) ]
-    ## has to be a for loop!!!
-    ## loop throught the columns by table and sequentially kick out columns for the current table if they where already
-    ## in a previous (more relevant) table
-    ## however, prefer also cases were fewer tables are returned.
-    for(i in 1:length(columns.bytable)){
-        bm <- columns.bytable[[ i ]] %in% have.columns
-        keepvals <- columns.bytable[[ i  ]][ !bm ]   ## keep those
-        if(length(keepvals) > 0){
-            have.columns <- c(have.columns, keepvals)
-        }
-        if(length(keepvals) > 0){
-            columns.bytable[[ i ]] <- paste(names(columns.bytable)[ i ], keepvals, sep=".")
-        }else{
-            columns.bytable[[ i ]] <- keepvals
+    getCols <- columns
+    result <- lapply(Tab, function(z) {
+        if (length(getCols) > 0) {
+            gotIt <- z[z %in% getCols]
+            if (length(gotIt) > 0) {
+                getCols <<- getCols[!(getCols %in% gotIt)]
+                return(gotIt)
+            } else {
+                return(character())
+            }
         }
-    }
-    ## kick out those tables with no elements left...
-    columns.bytable <- columns.bytable[ unlist(lapply(columns.bytable, function(z){
-        return(length(z) > 0)
-    })) ]
-    ## re-order by degree.
-    columns.bytable <- columns.bytable[ tablesByDegree(x, names(columns.bytable)) ]
-    return(columns.bytable)
+    })
+    ## If getCols length > 0 it contains columns not present in the db.
+    result <- result[lengths(result) > 0]
+    if (length(result) == 0)
+        stop("None of the columns could be found in the database!")
+    result <- mapply(result, names(result), FUN = function(z, y) {
+        paste0(y, ".", z)
+    }, SIMPLIFY = FALSE)
+    return(result)
 }
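The "first table wins" rule described in the updated prefixColumns comment can be illustrated with a small base-R sketch. It is a simplified stand-in, not the package code: the toy schema and the helper name assign_first_table are hypothetical, and the real function additionally calls cleanColumns() and honours the with.tables argument.

```r
## Toy schema (hypothetical): the list order stands in for the table order
## that prefixColumns() takes from listTables(); not the real EnsDb schema.
tabs <- list(gene = c("gene_id", "gene_name", "seq_name"),
             tx = c("tx_id", "gene_id", "tx_biotype"),
             protein = c("tx_id", "protein_id"))

## Hypothetical helper: assign each requested column to the first table that
## contains it, then prefix it with "<table>.".
assign_first_table <- function(tabs, columns) {
    remaining <- columns
    res <- lapply(tabs, function(cols) {
        hit <- cols[cols %in% remaining]
        ## Columns claimed by an earlier table are not offered to later ones.
        remaining <<- setdiff(remaining, hit)
        hit
    })
    ## Drop tables that did not contribute any column.
    res <- res[lengths(res) > 0]
    if (length(res) == 0)
        stop("None of the columns could be found!")
    mapply(function(cols, tab) paste0(tab, ".", cols), res, names(res),
           SIMPLIFY = FALSE)
}

assign_first_table(tabs, c("tx_id", "gene_id", "protein_id"))
## tx_id goes to tx (listed before protein), gene_id to gene.
```

Because tx precedes protein in the table order, tx_id is always taken from tx, so a later left outer join to protein cannot drop transcripts lacking a protein entry.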
 
 ############################################################
@@ -167,50 +169,96 @@ prefixColumnsKeepOrder <- function(x, columns, clean = TRUE, with.tables) {
     return(res_order[!is.null(res_order)])
 }
 
-
-
-## define a function to create a join query based on columns
+############################################################
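+## ** NEW JOIN ENGINE **
+##
+## Each row of .JOINS2 defines a direct join between two tables:
+## column 1: first table; column 2: second table;
+## column 3: the "on" clause linking them;
+## column 4: the suggested join type.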
+.JOINS2 <- rbind(
+    c("gene", "tx", "on (gene.gene_id=tx.gene_id)", "join"),
+    c("gene", "chromosome", "on (gene.seq_name=chromosome.seq_name)", "join"),
+    c("tx", "tx2exon", "on (tx.tx_id=tx2exon.tx_id)", "join"),
+    c("tx2exon", "exon", "on (tx2exon.exon_id=exon.exon_id)", "join"),
+    c("tx", "protein", "on (tx.tx_id=protein.tx_id)", "left outer join"),
+    c("gene", "entrezgene", "on (gene.gene_id=entrezgene.gene_id)",
+      "left outer join"),
+    c("protein", "protein_domain",
+      "on (protein.protein_id=protein_domain.protein_id)", "left outer join"),
+    c("protein", "uniprot", "on (protein.protein_id=uniprot.protein_id)",
+      "left outer join"),
+    c("uniprot", "protein_domain",
+      "on (uniprot.protein_id=protein_domain.protein_id)", "left outer join")
+)
+## Takes two sets of table names and returns the first .JOINS2 row that links
+## a table in 'a' with a table in 'b'; throws an error if there is none.
+joinTwoTables <- function(a, b) {
+    gotIt <- which((.JOINS2[, 1] %in% a & .JOINS2[, 2] %in% b) |
+                   (.JOINS2[, 2] %in% a & .JOINS2[, 1] %in% b))
+    if (length(gotIt) == 0) {
+        stop("Table(s) ", paste(a, collapse = ", "), " cannot be joined with ",
+             paste(b, collapse = ", "), "!")
+    } else {
+        return(.JOINS2[gotIt[1], ])
+    }
+}
+## x: EnsDb.
+## tab: tables to join.
+## join: which type of join should be used?
+## startWith: optional name of the table from which the join should start.
+## This matters for left outer joins, whose result depends on the left table.
+joinQueryOnTables2 <- function(x, tab, join = "suggested", startWith = NULL) {
+    ## join can be "join", "left join", "left outer join" or "suggested"; for
+    ## "suggested", the join type defined in the .JOINS2 table is used.
+    join <- match.arg(join, c("join", "left join", "left outer join",
+                              "suggested"))
+    ## Order the tables.
+    ## Start with startWith, or with the first one.
+    if (missing(tab))
+        stop("Argument 'tab' missing! Need some tables to make a join!")
+    if (!is.null(startWith)) {
+        if (!any(tab == startWith))
+            stop("If provided, 'startWith' has to be the name of one of the",
+                 " tables that should be joined!")
+    }
+    ## Add any tables needed to link the ones provided; the result is
+    ## ordered by degree.
+    tab <- addRequiredTables(x, tab)
+    if (!is.null(startWith)) {
+        alreadyUsed <- startWith
+        tab <- tab[tab != startWith]
+    } else {
+        alreadyUsed <- tab[1]
+        tab <- tab[-1]
+    }
+    Query <- alreadyUsed
+    ## Iteratively build the query.
+    while (length(tab) > 0) {
+        res <- joinTwoTables(a = alreadyUsed, b = tab)
+        newTab <- res[1:2][!(res[1:2] %in% alreadyUsed)]
+        ## Use the suggested join type (element 4) unless one was specified.
+        Query <- paste(Query, ifelse(join == "suggested", res[4], join),
+                       newTab, res[3])
+        alreadyUsed <- c(alreadyUsed, newTab)
+        tab <- tab[tab != newTab]
+    }
+    return(Query)
+}
 ## this function has to first get all tables that contain the columns,
 ## and then select, for columns present in more than one
 ## x... EnsDb
 ## columns... the columns
-joinQueryOnColumns <- function(x, columns){
+## NOTE: if 'startWith' is not NULL, it is added to the tables to join.
+joinQueryOnColumns2 <- function(x, columns, join = "suggested",
+                                startWith = NULL) {
     columns.bytable <- prefixColumns(x, columns)
-    ## based on that we can build the query based on the tables we've got. Note that the
-    ## function internally
+    ## Build the join query from the tables that contain these columns.
+    ## Note that the function internally
     ## adds tables that might be needed for the join.
-    Query <- joinQueryOnTables(x, names(columns.bytable))
-    return(Query)
-}
-
-
-## only list direct joins!!!
-.JOINS <- rbind(
-    c("gene", "tx", "join tx on (gene.gene_id=tx.gene_id)"),
-    c("gene", "chromosome", "join chromosome on (gene.seq_name=chromosome.seq_name)"),
-    c("tx", "tx2exon", "join tx2exon on (tx.tx_id=tx2exon.tx_id)"),
-    c("tx2exon", "exon", "join exon on (tx2exon.exon_id=exon.exon_id)")
-)
-## tx is now no 1:
-## .JOINS <- rbind(
-##     c("tx", "gene", "join gene on (tx.gene_id=gene.gene_id)"),
-##     c("gene", "chromosome", "join chromosome on (gene.seq_name=chromosome.seq_name)"),
-##     c("tx", "tx2exon", "join tx2exon on (tx.tx_id=tx2exon.tx_id)"),
-##     c("tx2exon", "exon", "join exon on (tx2exon.exon_id=exon.exon_id)")
-##    )
-
-
-joinQueryOnTables <- function(x, tab){
-    ## just to be on the save side: evaluate whether we have all required tables to join;
-    ## this will also ensure that the order is by degree.
-    tab <- addRequiredTables(x, tab)
-    Query <- tab[ 1 ]
-    previous.table <- tab[ 1 ]
-    for(i in 1:length(tab)){
-        if(i > 1){
-            Query <- paste(Query, .JOINS[ .JOINS[ , 2 ]==tab[ i ], 3 ])
-        }
-    }
+    Query <- joinQueryOnTables2(x, tab = c(names(columns.bytable), startWith),
+                                join = join,
+                                startWith = startWith)
     return(Query)
 }
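The iterative loop in joinQueryOnTables2() can be illustrated with a small, self-contained sketch: starting from one table, it repeatedly takes the first join row that connects an already used table with a remaining one and appends that table to the query. The three-row join table and the helper names join_two and build_join are hypothetical simplifications; unlike the package code, the sketch does not call addRequiredTables(), so the input tables must already be directly connected.

```r
## Hypothetical join table in the .JOINS2 layout: table 1, table 2,
## "on" clause, suggested join type.
joins <- rbind(
    c("gene", "tx", "on (gene.gene_id=tx.gene_id)", "join"),
    c("tx", "protein", "on (tx.tx_id=protein.tx_id)", "left outer join"),
    c("protein", "uniprot", "on (protein.protein_id=uniprot.protein_id)",
      "left outer join"))

## Return the first row linking any table in 'a' with any table in 'b'.
join_two <- function(a, b) {
    hit <- which((joins[, 1] %in% a & joins[, 2] %in% b) |
                 (joins[, 2] %in% a & joins[, 1] %in% b))
    if (length(hit) == 0)
        stop("Tables can not be joined")
    joins[hit[1], ]
}

## Build the join part of the query, one newly attached table at a time.
build_join <- function(tabs, join = "suggested") {
    used <- tabs[1]
    tabs <- tabs[-1]
    query <- used
    while (length(tabs) > 0) {
        link <- join_two(used, tabs)
        new_tab <- link[1:2][!(link[1:2] %in% used)]
        query <- paste(query, if (join == "suggested") link[4] else join,
                       new_tab, link[3])
        used <- c(used, new_tab)
        tabs <- tabs[tabs != new_tab]
    }
    query
}

build_join(c("gene", "tx", "protein"))
```

Here the call yields "gene join tx on (gene.gene_id=tx.gene_id) left outer join protein on (tx.tx_id=protein.tx_id)". Without the bridging tables that addRequiredTables() adds, a request such as c("gene", "uniprot") fails, because no single row links those two tables.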
 
@@ -219,11 +267,6 @@ joinQueryOnTables <- function(x, tab){
 ## Add additional tables in case the submitted tables are not directly connected
 ## and can thus not be joined. That's however not so complicated, since the database
 ## layout is pretty simple.
-## The tables are:
-##
-##  exon -(exon_id=t2e_exon_id)- tx2exon -(t2e_tx_id=tx_id)- tx -(gene_id=gene_id)- gene
-##                                                                                   |
-##                                                   chromosome -(seq_name=seq_name)-´
 addRequiredTables <- function(x, tab){
     ## dash it, as long as I can't find a way to get connected objects in a
     ## graph I'll do it manually...
@@ -239,6 +282,21 @@ addRequiredTables <- function(x, tab){
     if((any(tab=="exon") | (any(tab=="tx2exon"))) & any(tab=="gene")){
         tab <- unique(c(tab, "tx"))
     }
+    if (hasProteinData(x)) {
+        ## Resolve the proteins: need tx to map between proteome and genome
+        if (any(tab %in% c("uniprot", "protein_domain", "protein")) &
+            any(tab %in% c("exon", "tx2exon", "gene",
+                           "chromosome", "entrezgene")))
+            tab <- unique(c(tab, "tx"))
+        ## uniprot and protein_domain are only linked via protein.
+        if (any(tab %in% c("uniprot", "protein_domain")) &
+            any(tab %in% c("exon", "tx2exon", "tx", "gene", "chromosome",
+                           "entrezgene")))
+            tab <- unique(c(tab, "protein"))
+    }
+    ## entrezgene is only linked via gene
+    if (any(tab == "entrezgene") & length(tab) > 1)
+        tab <- unique(c(tab, "gene"))
     return(tablesByDegree(x, tab))
 }
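The protein-related rules added to addRequiredTables() can be sketched as a plain function on a character vector of table names. This is a simplification under stated assumptions: the helper add_bridge_tables and its has_protein flag (standing in for hasProteinData()) are hypothetical, only the rules visible in this hunk are reproduced, and the final ordering by tablesByDegree() is omitted.

```r
## Hypothetical helper mirroring the bridging rules shown in this hunk.
add_bridge_tables <- function(tab, has_protein = TRUE) {
    genome <- c("exon", "tx2exon", "gene", "chromosome", "entrezgene")
    ## exon/tx2exon and gene are connected through tx.
    if (any(tab %in% c("exon", "tx2exon")) && any(tab == "gene"))
        tab <- unique(c(tab, "tx"))
    if (has_protein) {
        ## tx maps between proteome and genome.
        if (any(tab %in% c("uniprot", "protein_domain", "protein")) &&
            any(tab %in% genome))
            tab <- unique(c(tab, "tx"))
        ## uniprot and protein_domain are only reachable through protein.
        if (any(tab %in% c("uniprot", "protein_domain")) &&
            any(tab %in% c(genome, "tx")))
            tab <- unique(c(tab, "protein"))
    }
    ## entrezgene is linked via gene only.
    if (any(tab == "entrezgene") && length(tab) > 1)
        tab <- unique(c(tab, "gene"))
    tab
}

add_bridge_tables(c("uniprot", "gene"))
```

The order of the checks matters: adding tx in the first protein rule is what makes the second rule add protein, so a uniprot-to-gene request ends up with the chain uniprot, protein, tx, gene.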
 
@@ -249,17 +307,19 @@ addRequiredTables <- function(x, tab){
 ## The "backbone" function that builds the SQL query based on the specified
 ## columns, the provided filters etc.
 ## x an EnsDb object
-.buildQuery <- function(x, columns, filter = list(), order.by = "",
-                        order.type = "asc", group.by, skip.order.check=FALSE,
-                        return.all.columns = TRUE) {
+## startWith: optional table from which the join should start.
+.buildQuery <- function(x, columns, filter = AnnotationFilterList(),
+                        order.by = "", order.type = "asc", group.by,
+                        skip.order.check=FALSE, return.all.columns = TRUE,
+                        join = "suggested", startWith = NULL) {
     resultcolumns <- columns    ## just to remember what we really want to give back
     ## 1) get all column names from the filters also removing the prefix.
-    if (class(filter)!="list")
-        stop("parameter filter has to be a list of BasicFilter classes!")
+    if (!is(filter, "AnnotationFilterList"))
+        stop("parameter 'filter' has to be an 'AnnotationFilterList'!")
     if (length(filter) > 0) {
         ## check filter!
         ## add the columns needed for the filter
-        filtercolumns <- unlist(lapply(filter, column, x))
+        filtercolumns <- unlist(lapply(filter, ensDbColumn, x))
         ## remove the prefix (column name for these)
         filtercolumns <- sapply(filtercolumns, removePrefix, USE.NAMES = FALSE)
         columns <- unique(c(columns, filtercolumns))
@@ -281,22 +341,19 @@ addRequiredTables <- function(x, tab){
     ##
     ## Now we can begin to build the query parts!
     ## a) the query part that joins all required tables.
-    joinquery <- joinQueryOnColumns(x, columns=columns)
+    joinquery <- joinQueryOnColumns2(x, columns=columns, join = join,
+                                     startWith = startWith)
     ## b) the filter part of the query
     if (length(filter) > 0) {
-        filterquery <- paste(" where",
-                             paste(unlist(lapply(filter, where, x,
-                                                 with.tables = need.tables)),
-                                   collapse=" and "))
+        ## Build the where clause with the ensDbQuery method.
+        filterquery <- paste0(" where ", ensDbQuery(filter, x,
+                                                    with.tables = need.tables))
     } else {
         filterquery <- ""
     }
     ## c) the order part of the query
     if (any(order.by != "")) {
         if (!skip.order.check) {
-            ## order.by <- paste(unlist(prefixColumns(x=x, columns=order.by,
-            ##                                        with.tables=need.tables),
-            ##                          use.names=FALSE), collapse=",")
             order.by <- paste(prefixColumnsKeepOrder(x = x, columns = order.by,
                                                      with.tables = need.tables),
                               collapse=",")
@@ -310,10 +367,6 @@ addRequiredTables <- function(x, tab){
         resultcolumns <- columns
     }
     finalquery <- paste0("select distinct ",
-                         ## paste(unlist(prefixColumns(x,
-                         ##                            resultcolumns,
-                         ##                            with.tables=need.tables),
-                         ##              use.names=FALSE), collapse=","),
                          paste(prefixColumnsKeepOrder(x,
                                                       resultcolumns,
                                                       with.tables = need.tables),
@@ -336,19 +389,24 @@ removePrefix <- function(x, split=".", fixed=TRUE){
 }
 
 
-## just to add another layer; basically just calls buildQuery and executes the query
-.getWhat <- function(x, columns, filter = list(), order.by = "",
+## just to add another layer; basically just calls buildQuery and executes the
+## query
+## join: what type of join should be performed.
+## startWith: the name of the table from which the query should be started.
+.getWhat <- function(x, columns, filter = AnnotationFilterList(), order.by = "",
                      order.type = "asc", group.by = NULL,
-                     skip.order.check = FALSE) {
+                     skip.order.check = FALSE, join = "suggested",
+                     startWith = NULL) {
     ## That's nasty stuff; for now we support the column tx_name, which we however
     ## don't have in the database. Thus, we are querying everything except that
     ## column and filling it later with the values from tx_id.
     fetchColumns <- columns
     if(any(columns == "tx_name"))
-        fetchColumns <- unique(c("tx_id", fetchColumns[fetchColumns != "tx_name"]))
-    if (class(filter) != "list")
-        stop("parameter filter has to be a list of BasicFilter classes!")
-    ## If any of the filter is a SymbolFilter, add "symbol" to the return columns.
+        fetchColumns <- unique(c("tx_id",
+                                 fetchColumns[fetchColumns != "tx_name"]))
+    if (!is(filter, "AnnotationFilterList"))
+        stop("parameter 'filter' has to be an 'AnnotationFilterList'!")
+    ## If any filter is a SymbolFilter, add "symbol" to the return columns.
     if (length(filter) > 0) {
         if (any(unlist(lapply(filter, function(z) {
             return(is(z, "SymbolFilter"))
@@ -365,8 +423,10 @@ removePrefix <- function(x, split=".", fixed=TRUE){
         Q <- .buildQuery(x = x, columns = fetchColumns, filter = filter,
                          order.by = "", order.type = order.type,
                          group.by = group.by,
-                         skip.order.check = skip.order.check)
+                         skip.order.check = skip.order.check, join = join,
+                         startWith = startWith)
         ## Get the data
+        ## cat("Query: ", Q, "\n")
         Res <- dbGetQuery(dbconn(x), Q)
         ## Note: we can only order by the columns that we did get back from the
         ## database; that might be different for the SQL sorting!
@@ -377,8 +437,10 @@ removePrefix <- function(x, split=".", fixed=TRUE){
         Q <- .buildQuery(x = x, columns = fetchColumns, filter = filter,
                          order.by = order.by, order.type = order.type,
                          group.by = group.by,
-                         skip.order.check = skip.order.check)
+                         skip.order.check = skip.order.check, join = join,
+                         startWith = startWith)
         ## Get the data
+        ## cat("Query: ", Q, "\n")
         Res <- dbGetQuery(dbconn(x), Q)
     }
     ## cat("Query:\n", Q, "\n")
@@ -421,30 +483,106 @@ removePrefix <- function(x, split=".", fixed=TRUE){
     }
     ## Ensure that the ordering is as requested.
     Res <- Res[, columns, drop=FALSE]
-    return(Res)
+    Res
 }
 
 ############################################################
 ## Check database validity.
-.ENSDB_TABLES <- list(gene = c("gene_id", "gene_name", "entrezid",
-                               "gene_biotype", "gene_seq_start",
-                               "gene_seq_end", "seq_name", "seq_strand",
-                               "seq_coord_system"),
-                      tx = c("tx_id", "tx_biotype", "tx_seq_start",
-                             "tx_seq_end", "tx_cds_seq_start",
-                             "tx_cds_seq_end", "gene_id"),
-                      tx2exon = c("tx_id", "exon_id", "exon_idx"),
-                      exon = c("exon_id", "exon_seq_start", "exon_seq_end"),
-                      chromosome = c("seq_name", "seq_length", "is_circular"),
-                      metadata = c("name", "value"))
-dbHasRequiredTables <- function(con, returnError = TRUE) {
+#' @description Return required tables and columns for a schema version.
+#'
+#' @noRd
+.ensdb_tables <- function(version = "1.0") {
+    .ENSDB_TABLES <- list(`1.0` = list(
+                              gene = c("gene_id", "gene_name", "entrezid",
+                                       "gene_biotype", "gene_seq_start",
+                                       "gene_seq_end", "seq_name", "seq_strand",
+                                       "seq_coord_system"),
+                              tx = c("tx_id", "tx_biotype", "tx_seq_start",
+                                     "tx_seq_end", "tx_cds_seq_start",
+                                     "tx_cds_seq_end", "gene_id"),
+                              tx2exon = c("tx_id", "exon_id", "exon_idx"),
+                              exon = c("exon_id", "exon_seq_start",
+                                       "exon_seq_end"),
+                              chromosome = c("seq_name", "seq_length",
+                                             "is_circular"),
+                              metadata = c("name", "value")),
+                          `2.0` = list(
+                              gene = c("gene_id", "gene_name",
+                                       "gene_biotype", "gene_seq_start",
+                                       "gene_seq_end", "seq_name", "seq_strand",
+                                       "seq_coord_system"),
+                              tx = c("tx_id", "tx_biotype", "tx_seq_start",
+                                     "tx_seq_end", "tx_cds_seq_start",
+                                     "tx_cds_seq_end", "gene_id"),
+                              tx2exon = c("tx_id", "exon_id", "exon_idx"),
+                              exon = c("exon_id", "exon_seq_start",
+                                       "exon_seq_end"),
+                              chromosome = c("seq_name", "seq_length",
+                                             "is_circular"),
+                              entrezgene = c("gene_id", "entrezid"),
+                              metadata = c("name", "value"))
+                          )
+    .ENSDB_TABLES[[version]]
+}
+.ensdb_protein_tables <- function(version = "1.0") {
+    .ENSDB_PROTEIN_TABLES <- list(`1.0` = list(
+                                      protein = c("tx_id", "protein_id",
+                                                  "protein_sequence"),
+                                      uniprot = c("protein_id", "uniprot_id",
+                                                  "uniprot_db",
+                                                  "uniprot_mapping_type"),
+                                      protein_domain = c("protein_id",
+                                                         "protein_domain_id",
+                                                         "protein_domain_source",
+                                                         "interpro_accession",
+                                                         "prot_dom_start",
+                                                         "prot_dom_end")),
+                                  `2.0` = list(
+                                      protein = c("tx_id", "protein_id",
+                                                  "protein_sequence"),
+                                      uniprot = c("protein_id", "uniprot_id",
+                                                  "uniprot_db",
+                                                  "uniprot_mapping_type"),
+                                      protein_domain = c("protein_id",
+                                                         "protein_domain_id",
+                                                         "protein_domain_source",
+                                                         "interpro_accession",
+                                                         "prot_dom_start",
+                                                         "prot_dom_end"))
+                                  )
+    .ENSDB_PROTEIN_TABLES[[version]]
+}
+    
+#' @description Extract the database schema version from the metadata table,
+#'     if available; defaults to \code{"1.0"} otherwise.
+#'
+#' @param x Can be either a connection object or an \code{EnsDb} object.
+#' 
+#' @noRd
+dbSchemaVersion <- function(x) {
+    if (is(x, "EnsDb")) {
+        return(getProperty(x, "DBSCHEMAVERSION"))
+    } else {
+        tabs <- dbListTables(x)
+        if (any(tabs == "metadata")) {
+            res <- dbGetQuery(x, "select * from metadata")
+            if (any(res$name == "DBSCHEMAVERSION") &
+                any(colnames(res) == "value"))
+                return(res[res$name == "DBSCHEMAVERSION", "value"])
+        }
+    }
+    return("1.0")
+}
+
+dbHasRequiredTables <- function(con, returnError = TRUE,
+                                tables = .ensdb_tables(dbSchemaVersion(con))) {
     tabs <- dbListTables(con)
     if (length(tabs) == 0) {
         if (returnError)
             return("Database does not have any tables!")
         return(FALSE)
     }
-    not_there <- names(.ENSDB_TABLES)[!(names(.ENSDB_TABLES) %in% tabs)]
+    not_there <- names(tables)[!(names(tables) %in% tabs)]
     if (length(not_there) > 0) {
         if (returnError)
             return(paste0("Required tables ", paste(not_there, collapse = ", "),
@@ -453,9 +591,10 @@ dbHasRequiredTables <- function(con, returnError = TRUE) {
     }
     return(TRUE)
 }
-dbHasValidTables <- function(con, returnError = TRUE) {
-    for (tab in names(.ENSDB_TABLES)) {
-        cols <- .ENSDB_TABLES[[tab]]
+dbHasValidTables <- function(con, returnError = TRUE,
+                             tables = .ensdb_tables(dbSchemaVersion(con))) {
+    for (tab in names(tables)) {
+        cols <- tables[[tab]]
         from_db <- colnames(dbGetQuery(con, paste0("select * from ", tab,
                                                    " limit 1")))
         not_there <- cols[!(cols %in% from_db)]
@@ -501,66 +640,97 @@ feedEnsDb2MySQL <- function(x, mysql, verbose = TRUE) {
     ## Create the indices.
     if (verbose)
         message("Creating indices...", appendLF = FALSE)
-    .createEnsDbIndices(mysql, indexLength = "(20)")
+    ## Derive an index length from the longest gene ID.
+    indexLength <- max(nchar(
+        dbGetQuery(sqlite_con, "select distinct gene_id from gene")$gene_id
+    ))
+    .createEnsDbIndices(mysql, indexLength = paste0("(", indexLength, ")"),
+                        proteins = hasProteinData(x))
     if (verbose)
         message("OK")
     return(TRUE)
 }
 ## Small helper function to cfeate all the indices.
-.createEnsDbIndices <- function(con, indexLength = "") {
-    dbGetQuery(con, paste0("create index seq_name_idx on chromosome (seq_name",
-                           indexLength, ");"))
-    dbGetQuery(con, paste0("create index gene_gene_id_idx on gene (gene_id",
-                           indexLength, ");"))
-    dbGetQuery(con, paste0("create index gene_gene_name_idx on gene (gene_name",
-                           indexLength, ");"))
-    dbGetQuery(con, paste0("create index gene_seq_name_idx on gene (seq_name",
-                           indexLength, ");"))
-    dbGetQuery(con, paste0("create index tx_tx_id_idx on tx (tx_id",
-                           indexLength, ");"))
-    dbGetQuery(con, paste0("create index tx_gene_id_idx on tx (gene_id",
-                           indexLength, ");"))
-    dbGetQuery(con, paste0("create index exon_exon_id_idx on exon (exon_id",
-                           indexLength, ");"))
-    dbGetQuery(con, paste0("create index t2e_tx_id_idx on tx2exon (tx_id",
-                           indexLength, ");"))
-    dbGetQuery(con, paste0("create index t2e_exon_id_idx on tx2exon (exon_id",
-                           indexLength, ");"))
-    dbGetQuery(con, "create index t2e_exon_idx_idx on tx2exon (exon_idx);")
+.createEnsDbIndices <- function(con, indexLength = "", proteins = FALSE) {
+    indexCols <- c(chromosome = "seq_name", gene = "gene_id", gene = "gene_name",
+                   gene = "seq_name", tx = "tx_id", tx = "gene_id",
+                   exon = "exon_id", tx2exon = "tx_id", tx2exon = "exon_id")
+    if (as.numeric(dbSchemaVersion(con)) > 1)
+        indexCols <- c(indexCols,
+                       entrezgene = "gene_id", entrezgene = "entrezid")
+    if (proteins) {
+        indexCols <- c(indexCols,
+                       protein = "tx_id",
+                       protein = "protein_id",
+                       uniprot = "protein_id",
+                       uniprot = "uniprot_id",
+                       protein_domain = "protein_domain_id",
+                       protein_domain = "protein_id")
+    }
+    for (i in 1:length(indexCols)) {
+        tabname <- names(indexCols)[i]
+        colname <- indexCols[i]
+        ## Only create an index if the column contains at least one
+        ## non-missing value.
+        ids <- dbGetQuery(con, paste0("select distinct ", colname,
+                                      " from ", tabname))[, colname]
+        if (length(ids) == 0 | all(is.na(ids))) {
+            ## No need to make an index here!
+        } else {
+            if (indexLength != "")
+                idxL <- paste0("(", min(c(max(nchar(ids)), 20)), ")")
+            else
+                idxL <- ""
+            dbGetQuery(con, paste0("create index ", tabname, "_", colname, "_idx ",
+                                   "on ", tabname, " (",colname, idxL,")"))
+        }
+    }
+    ## Add the index on the numeric exon_idx column:
+    dbGetQuery(con, "create index tx2exon_exon_idx_idx on tx2exon (exon_idx);")
 }
 
 ############################################################
 ## listEnsDbs
 ## list databases
-##' @title List EnsDb databases in a MySQL server
-##' @description The \code{listEnsDbs} function lists EnsDb databases in a
-##' MySQL server.
-##'
-##' @details The use of this function requires that the \code{RMySQL} package
-##' is installed and that the user has either access to a MySQL server with
-##' already installed EnsDb databases, or write access to a MySQL server in
-##' which case EnsDb databases could be added with the \code{\link{useMySQL}}
-##' method. EnsDb databases follow the same naming conventions than the EnsDb
-##' packages, with the exception that the name is all lower case and that
-##' \code{"."} is replaced by \code{"_"}.
-##' @param dbcon A \code{DBIConnection} object providing access to a MySQL
-##' database. Either \code{dbcon} or all of the other arguments have to be
-##' specified.
-##' @param host Character specifying the host on which the MySQL server is
-##' running.
-##' @param port The port of the MySQL server (usually \code{3306}).
-##' @param user The username for the MySQL server.
-##' @param pass The password for the MySQL server.
-##' @return A \code{data.frame} listing the database names, organism name
-##' and Ensembl version of the EnsDb databases found on the server.
-##' @author Johannes Rainer
-##' @seealso \code{\link{useMySQL}}
-##' @examples
-##' \dontrun{
-##' library(RMySQL)
-##' dbcon <- dbConnect(MySQL(), host = "localhost", user = my_user, pass = my_pass)
-##' listEnsDbs(dbcon)
-##' }
+#' @title List EnsDb databases in a MySQL server
+#'
+#' @description The \code{listEnsDbs} function lists EnsDb databases in a
+#'     MySQL server.
+#'
+#' @details The use of this function requires that the \code{RMySQL} package
+#'     is installed and that the user has either access to a MySQL server with
+#'     already installed EnsDb databases, or write access to a MySQL server in
+#'     which case EnsDb databases could be added with the \code{\link{useMySQL}}
+#'     method. EnsDb databases follow the same naming conventions as the EnsDb
+#'     packages, with the exception that the name is all lower case and that
+#'     \code{"."} is replaced by \code{"_"}.
+#' 
+#' @param dbcon A \code{DBIConnection} object providing access to a MySQL
+#'     database. Either \code{dbcon} or all of the other arguments have to be
+#'     specified.
+#' 
+#' @param host Character specifying the host on which the MySQL server is
+#'     running.
+#' 
+#' @param port The port of the MySQL server (usually \code{3306}).
+#' 
+#' @param user The username for the MySQL server.
+#' 
+#' @param pass The password for the MySQL server.
+#' 
+#' @return A \code{data.frame} listing the database names, organism name
+#'     and Ensembl version of the EnsDb databases found on the server.
+#' 
+#' @author Johannes Rainer
+#' 
+#' @seealso \code{\link{useMySQL}}
+#' 
+#' @examples
+#' \dontrun{
+#' library(RMySQL)
+#' dbcon <- dbConnect(MySQL(), host = "localhost", user = my_user, pass = my_pass)
+#' listEnsDbs(dbcon)
+#' }
 listEnsDbs <- function(dbcon, host, port, user, pass) {
     if(requireNamespace("RMySQL", quietly = TRUE)) {
         if (missing(dbcon)) {
@@ -587,3 +757,14 @@ listEnsDbs <- function(dbcon, host, port, user, pass) {
         stop("Required package 'RMySQL' is not installed.")
     }
 }
+
+#' Simple helper that "translates" R logical operators to SQL.
+#' @noRd
+.logOp2SQL <- function(x) {
+    if (x == "|")
+        return("or")
+    if (x == "&")
+        return("and")
+    return(NULL)
+}
+
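The index-creation loop in .createEnsDbIndices can be illustrated without a database connection. The R sketch below uses a hypothetical helper, build_index_sql, which is not part of ensembldb. It takes the same table = column vector as the original. In place of the dbGetQuery() lookup, it takes an assumed list of the distinct values found in each column. It skips columns that hold no non-missing values and emits one statement per pair, using the <table>_<column>_idx naming scheme with an optional key prefix length capped at 20 characters.

Unlike the original, the sketch drops NA values before computing the prefix length. This matters because nchar(NA_character_) is NA, which would otherwise propagate into the statement as "(NA)".

```r
## Hypothetical helper, not part of ensembldb: builds the "create index"
## statements that .createEnsDbIndices sends to the database.
## 'ids' replaces the dbGetQuery() lookup: a named list with the distinct
## values of each column, keyed as "<table>.<column>".
build_index_sql <- function(indexCols, ids, prefixLength = FALSE) {
    stmts <- character()
    for (i in seq_along(indexCols)) {
        tabname <- names(indexCols)[i]
        colname <- unname(indexCols[i])
        vals <- ids[[paste0(tabname, ".", colname)]]
        ## No (non-missing) values: skip the index, as the original does.
        if (length(vals) == 0 || all(is.na(vals)))
            next
        idxL <- ""
        ## Key prefix length: longest value, capped at 20 characters.
        if (prefixLength)
            idxL <- paste0("(", min(max(nchar(vals), na.rm = TRUE), 20), ")")
        stmts <- c(stmts, paste0("create index ", tabname, "_", colname,
                                 "_idx on ", tabname, " (", colname, idxL, ")"))
    }
    stmts
}

cols <- c(gene = "gene_id", tx = "tx_id", entrezgene = "entrezid")
ids <- list(gene.gene_id = c("ENSG00000139618", "ENSG00000141510"),
            tx.tx_id = "ENST00000380152",
            entrezgene.entrezid = NA_character_)
build_index_sql(cols, ids, prefixLength = TRUE)
## "create index gene_gene_id_idx on gene (gene_id(15))"
## "create index tx_tx_id_idx on tx (tx_id(15))"
```

The prefixLength = TRUE branch corresponds to indexLength != "" in the original. The "(n)" key prefix syntax is MySQL's; presumably it is only requested for the MySQL backend, since SQLite does not accept it.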
diff --git a/R/functions-Filter.R b/R/functions-Filter.R
new file mode 100644
index 0000000..592e897
--- /dev/null
+++ b/R/functions-Filter.R
@@ -0,0 +1,324 @@
+## Some utility functions for Filters.
+
+## Vector to map AnnotationFilter fields to actual database columns.
+## Format: field name = database column name
+.ENSDB_FIELDS <- c(
+    ## gene
+    entrez = "entrezid",
+    gene_biotype = "gene_biotype",
+    gene_id = "gene_id",
+    genename = "gene_name",
+    symbol = "gene_name",
+    seq_name = "seq_name",
+    seq_strand = "seq_strand",
+    gene_start = "gene_seq_start",
+    gene_end = "gene_seq_end",
+    ## tx
+    tx_id = "tx_id",
+    tx_biotype = "tx_biotype",
+    tx_name = "tx_id",
+    tx_start = "tx_seq_start",
+    tx_end = "tx_seq_end",
+    ## exon
+    exon_id = "exon_id",
+    exon_rank = "exon_idx",
+    exon_start = "exon_seq_start",
+    exon_end = "exon_seq_end",
+    ## protein
+    protein_id = "protein_id",
+    uniprot = "uniprot_id",
+    uniprot_db = "uniprot_db",
+    uniprot_mapping_type = "uniprot_mapping_type",
+    prot_dom_id = "protein_domain_id"
+)
+
+.supportedFilters <- function(x) {
+    flts <- c(
+        "EntrezFilter", "GeneBiotypeFilter", "GeneIdFilter", "GenenameFilter",
+        "SymbolFilter", "SeqNameFilter", "SeqStrandFilter", "GeneStartFilter",
+        "GeneEndFilter", "TxIdFilter", "TxBiotypeFilter", "TxNameFilter",
+        "TxStartFilter", "TxEndFilter", "ExonIdFilter", "ExonRankFilter",
+        "ExonStartFilter", "ExonEndFilter", "GRangesFilter"
+    )
+    if (hasProteinData(x))
+        flts <- c(flts, "ProteinIdFilter", "UniprotFilter", "UniprotDbFilter",
+                  "UniprotMappingTypeFilter", "ProtDomIdFilter")
+    return(sort(flts))
+}
+
+#' Utility function to map the default AnnotationFilter fields to the
+#' database columns used in ensembldb.
+#'
+#' @param x The field name to be \emph{translated}.
+#' @return The column name in the EnsDb database.
+#' @noRd
+.fieldInEnsDb <- function(x) {
+    if (missing(x) || length(x) == 0)
+        stop("Error in .fieldInEnsDb: got empty input argument!")
+    if (is.na(.ENSDB_FIELDS[x]))
+        stop("Unable to map field '", x, "'!")
+    else
+        .ENSDB_FIELDS[x]
+}
+
+
+#' Utility function to map the condition of an AnnotationFilter to the SQL
+#' condition to be used in the EnsDb database.
+#'
+#' @param x An \code{AnnotationFilter}.
+#'
+#' @return A character representing the condition for the SQL call.
+#' @noRd
+.conditionForEnsDb <- function(x) {
+    cond <- condition(x)
+    if (length(unique(value(x))) > 1) {
+        if (cond == "==")
+            cond <- "in"
+        if (cond == "!=")
+            cond <- "not in"
+    }
+    if (cond == "==")
+        cond <- "="
+    if (cond %in% c("startsWith", "endsWith"))
+        cond <- "like"
+    cond
+}
+
+#' Single-quote character values and collapse multiple values into a parenthesized, comma-separated list.
+#'
+#' @param x An \code{AnnotationFilter} object.
+#' @noRd
+.valueForEnsDb <- function(x) {
+    vals <- unique(value(x))
+    if (is(x, "CharacterFilter")) {
+        vals <- sQuote(gsub(unique(vals), pattern = "'", replacement = "''"))
+    }
+    if (length(vals) > 1)
+        vals <- paste0("(",  paste0(vals, collapse = ","), ")")
+    ## Process the like/startsWith/endsWith
+    if (condition(x) == "startsWith")
+        vals <- paste0("'", unique(x@value), "%'")
+    if (condition(x) == "endsWith")
+        vals <- paste0("'%", unique(x@value), "'")
+    vals
+}
+
+#' Build the standard SQL condition for EnsDb from an AnnotationFilter.
+#'
+#' @param x An \code{AnnotationFilter}.
+#' @noRd
+.queryForEnsDb <- function(x) {
+    paste(.fieldInEnsDb(field(x)), .conditionForEnsDb(x), .valueForEnsDb(x))
+}
+
+#' Like .queryForEnsDb, but additionally prefixes the column name with the
+#' name of the table it is taken from.
+#' @noRd
+.queryForEnsDbWithTables <- function(x, db, tables = character()) {
+    clmn <- .fieldInEnsDb(field(x))
+    if (!missing(db)) {
+        if (length(tables) == 0)
+            tables <- names(listTables(db))
+        clmn <- unlist(prefixColumns(db, clmn, with.tables = tables))
+    }
+    res <- paste(clmn, .conditionForEnsDb(x), .valueForEnsDb(x))
+    ## cat("  ", res, "\n")
+    return(res)
+}
+
+#' Simple helper function to convert expressions to AnnotationFilter or
+#' AnnotationFilterList.
+#'
+#' @param x Can be an \code{AnnotationFilter}, an \code{AnnotationFilterList},
+#' a \code{list} or a filter \code{expression}. This should NOT be empty!
+#' 
+#' @return Returns an \code{AnnotationFilterList} with all filters.
+#' 
+#' @noRd
+.processFilterParam <- function(x, db) {
+    if (missing(db))
+        stop("Argument 'db' missing.")
+    ## Check if x is a formula and, if so, translate it.
+    if (is(x, "formula"))
+        res <- AnnotationFilter(x)
+    else res <- x
+    if (is(res, "AnnotationFilter"))
+        res <- AnnotationFilterList(res)
+    if (!is(res, "AnnotationFilterList")) {
+        ## Did not get a filter expression, thus checking what we've got.
+        if (is(res, "list")) {
+            if (length(res)) {
+                ## Check that all elements are AnnotationFilter objects!
+                if (!all(unlist(lapply(res, function(z) {
+                    inherits(z, "AnnotationFilter")
+                }), use.names = FALSE)))
+                    stop("One or more elements in 'filter' are not ",
+                         "'AnnotationFilter' objects!")
+                res <- as(res, "AnnotationFilterList")
+                res@logOp <- rep("&", (length(res) - 1))
+            } else {
+                res <- AnnotationFilterList()
+            }
+        } else {
+            stop("'filter' has to be an 'AnnotationFilter', a list of ",
+                 "'AnnotationFilter' objects, an 'AnnotationFilterList' ",
+                 "or a valid filter expression!")
+        }
+    }
+    supp_filters <- supportedFilters(db)
+    have_filters <- unique(.AnnotationFilterClassNames(res))
+    if (!all(have_filters %in% supp_filters))
+        stop("AnnotationFilter classes: ",
+             paste(have_filters[!(have_filters %in% supp_filters)], collapse = ", "),
+             " are not supported by EnsDb databases.")
+    res
+}
+
+
+############################################################
+## setFeatureInGRangesFilter
+##
+## Simple helper function to set the @feature in GRangesFilter
+## depending on the calling method.
+setFeatureInGRangesFilter <- function(x, feature){
+    for (i in seq(along.with = x)){
+        if (is(x[[i]], "GRangesFilter"))
+            x[[i]]@feature <- feature
+        if (is(x[[i]], "AnnotationFilterList"))
+            x[[i]] <- setFeatureInGRangesFilter(x[[i]], feature = feature)
+    }
+    x
+}
+
+############################################################
+## isProteinFilter
+##' Evaluates whether the filter is a protein annotation related filter.
+##' @param x The object that should be evaluated. Can be an AnnotationFilter or
+##'     an AnnotationFilterList.
+##' @return Returns TRUE if 'x' is a filter for protein annotation tables and
+##' FALSE otherwise.
+##' @noRd
+isProteinFilter <- function(x) {
+    if (is(x, "AnnotationFilterList"))
+        return(unlist(lapply(x, isProteinFilter)))
+    else
+        return(is(x, "ProteinIdFilter") | is(x, "UniprotFilter") |
+               is(x, "ProtDomIdFilter") | is(x, "UniprotDbFilter") |
+               is(x, "UniprotMappingTypeFilter"))
+}
+
+## ############################################################
+## ## checkFilter:
+## ##
+## ## checks the filter argument and ensures that a list of Filter
+## ## object is returned
+## checkFilter <- function(x){
+##     if(is(x, "list")){
+##         if(length(x) == 0)
+##             return(x)
+##         ## check if all elements are Filter classes.
+##         if(!all(unlist(lapply(x, function(z){
+##             return((is(z, "AnnotationFilter") | is(z, "GRangesFilter")))
+##         }), use.names = FALSE)))
+##             stop("One of more elements in 'filter' are not filter objects!")
+##     }else{
+##         if(is(x, "AnnotationFilter") | is(x, "GRangesFilter")){
+##             x <- list(x)
+##         }else{
+##             stop("'filter' has to be a filter object or a list of",
+##                  " filter objects!")
+##         }
+##     }
+##     return(x)
+## }
+
+#' Build the \emph{where} query for a \code{GRangesFilter}. Supported conditions
+#' are: \code{"start"}, \code{"end"}, \code{"equal"}, \code{"within"},
+#' \code{"any"}.
+#'
+#' @param grf \code{GRangesFilter}.
+#'
+#' @param columns named character vector with the column names for start, end,
+#'     strand and seq_name.
+#'
+#' @param db An optional \code{EnsDb} instance. Used to \emph{translate}
+#'     seqnames depending on the specified seqlevels style.
+#'
+#' @return A character with the corresponding \emph{where} query.
+#' @noRd
+buildWhereForGRanges <- function(grf, columns, db = NULL){
+    condition <- condition(grf)
+    if (!(condition %in% c("start", "end", "within", "equal", "any")))
+        stop("'condition' ", condition, " not supported. Condition (type) can ",
+             "be one of 'any', 'start', 'end', 'equal', 'within'.")
+    if( is.null(names(columns)))
+        stop("The vector with the required column names for the",
+             " GRangesFilter query has to have names!")
+    if (!all(c("start", "end", "seqname", "strand") %in% names(columns)))
+        stop("'columns' has to be a named vector with names being ",
+             "'start', 'end', 'seqname', 'strand'!")
+    ## Build the query to fetch all features that are located within the range
+    quers <- sapply(value(grf), function(z) {
+        if (!is.null(db)) {
+            seqn <- formatSeqnamesForQuery(db, as.character(seqnames(z)))
+        } else {
+            seqn <- as.character(seqnames(z))
+        }
+        ## start: start, seqname and strand have to match.
+        if (condition == "start") {
+            query <- paste0(columns["start"], "=", start(z), " and ",
+                            columns["seqname"], "='", seqn, "'")
+        }
+        ## end: end, seqname and strand have to match.
+        if (condition == "end") {
+            query <- paste0(columns["end"], "=", end(z), " and ",
+                            columns["seqname"], "='", seqn, "'")
+        }
+        ## equal: start, end, seqname and strand have to match.
+        if (condition == "equal") {
+            query <- paste0(columns["start"], "=", start(z), " and ",
+                            columns["end"], "=", end(z), " and ",
+                            columns["seqname"], "='", seqn, "'")
+        }
+        ## within: start has to be >= start, end <= end, seqname and strand
+        ##         have to match.
+        if (condition == "within") {
+            query <- paste0(columns["start"], ">=", start(z), " and ",
+                            columns["end"], "<=", end(z), " and ",
+                            columns["seqname"], "='", seqn, "'")
+        }
+        ## any: essentially the overlapping.
+        if (condition == "any") {
+            query <- paste0(columns["start"], "<=", end(z), " and ",
+                            columns["end"], ">=", start(z), " and ",
+                            columns["seqname"], "='", seqn, "'")
+        }
+        ## Include the strand, if it's not "*"
+        if(as.character(strand(z)) != "*"){
+            query <- paste0(query, " and ", columns["strand"], " = ",
+                            strand2num(as.character(strand(z))))
+        }
+        return(query)
+    })
+    if(length(quers) > 1)
+        quers <- paste0("(", quers, ")")
+    ## Collapse now the queries.
+    query <- paste0(quers, collapse=" or ")
+    paste0("(", query, ")")
+}
+
+#' @description Helper to extract all AnnotationFilter class names from an
+#'     AnnotationFilterList (recursively!)
+#'
+#' @param x The \code{AnnotationFilterList}.
+#'
+#' @return A \code{character} with the names of the classes.
+#' @noRd
+.AnnotationFilterClassNames <- function(x) {
+    classes <- lapply(x, function(z) {
+        if (is(z, "AnnotationFilterList"))
+            return(.AnnotationFilterClassNames(z))
+        class(z)
+    })
+    unlist(classes, use.names = FALSE)
+}
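The helpers in functions-Filter.R turn each AnnotationFilter into one SQL condition:

- the field is mapped to a column through .ENSDB_FIELDS;
- == and != become in and not in when there are several values;
- startsWith/endsWith become like patterns;
- character values are single-quoted with embedded quotes doubled.

The base-R sketch below reproduces that translation for illustration only. It deliberately avoids the AnnotationFilter package, so a filter is represented by a hypothetical plain list with field, condition and value elements. filter_to_sql and overlap_sql are made-up names, and only a few fields are mapped. Quotes are built with paste0() rather than sQuote(), whose output depends on the useFancyQuotes option.

overlap_sql mirrors the "any" branch of buildWhereForGRanges: it matches features whose start lies at or before the range end and whose end lies at or after the range start. It ignores strand.

```r
## Illustrative only: a filter is a plain list(field, condition, value),
## not an AnnotationFilter object. The field map is a hypothetical subset
## of .ENSDB_FIELDS.
ENSDB_FIELDS <- c(gene_id = "gene_id", symbol = "gene_name",
                  tx_biotype = "tx_biotype", gene_start = "gene_seq_start")

filter_to_sql <- function(flt) {
    col <- ENSDB_FIELDS[[flt$field]]  # errors for an unmapped field
    vals <- unique(flt$value)
    cond <- flt$condition
    if (is.character(vals)) {
        ## Escape embedded single quotes by doubling them.
        vals <- gsub("'", "''", vals, fixed = TRUE)
        if (cond %in% c("startsWith", "endsWith")) {
            ## One pattern only; like wildcards (% and _) are not escaped.
            stopifnot(length(vals) == 1)
            pattern <- if (cond == "startsWith") paste0(vals, "%") else paste0("%", vals)
            return(paste0(col, " like '", pattern, "'"))
        }
        vals <- paste0("'", vals, "'")
    }
    if (length(vals) > 1) {
        ## Several values: "==" becomes "in", "!=" becomes "not in".
        cond <- switch(cond, "==" = "in", "!=" = "not in", cond)
        vals <- paste0("(", paste(vals, collapse = ","), ")")
    }
    if (cond == "==")
        cond <- "="
    paste(col, cond, vals)
}

## Overlap ("any") condition for one genomic range; strand is ignored here.
overlap_sql <- function(seqname, start, end,
                        cols = c(start = "gene_seq_start",
                                 end = "gene_seq_end", seqname = "seq_name")) {
    paste0("(", cols[["start"]], "<=", end, " and ", cols[["end"]], ">=",
           start, " and ", cols[["seqname"]], "='", seqname, "')")
}

filter_to_sql(list(field = "symbol", condition = "==",
                   value = c("BCL2", "BCL2L11")))
## "gene_name in ('BCL2','BCL2L11')"
overlap_sql("13", 32315474L, 32400266L)
## "(gene_seq_start<=32400266 and gene_seq_end>=32315474 and seq_name='13')"
```

The conditions produced for the members of an AnnotationFilterList would then be joined with the and/or keywords that .logOp2SQL returns. Integer positions are used in the example because paste0() would render a double such as 100000 as "1e+05".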
diff --git a/R/EnsDbFromGTF.R b/R/functions-create-EnsDb.R
similarity index 55%
rename from R/EnsDbFromGTF.R
rename to R/functions-create-EnsDb.R
index c4eb887..043f009 100644
--- a/R/EnsDbFromGTF.R
+++ b/R/functions-create-EnsDb.R
@@ -1,3 +1,273 @@
+############################################################
+## Functions related to the creation of EnsDb databases.
+
+## Helper that capitalizes the first character of the organism name,
+## e.g. "homo_sapiens" becomes "Homo_sapiens".
+.organismName <- function(x){
+    substring(x, 1, 1) <- toupper(substring(x, 1, 1))
+    return(x)
+}
+
+.abbrevOrganismName <- function(organism){
+  spc <- unlist(strsplit(organism, "_"))
+  ## this assumes a binomial nomenclature has been maintained.
+  return(paste0(substr(spc[[1]], 1, 1), spc[[2]]))
+}
+
+## x has to be the connection to the database.
+.makePackageName <- function(x){
+    species <- .getMetaDataValue(x, "Organism")
+    ensembl_version <- .getMetaDataValue(x, "ensembl_version")
+    pkgName <- paste0("EnsDb.",.abbrevOrganismName(.organismName(species)),
+                      ".v", ensembl_version)
+    return(pkgName)
+}
+
+.makeObjectName <- function(pkgName){
+  strs <- unlist(strsplit(pkgName, "\\."))
+  paste(c(strs[2:length(strs)],strs[1]), collapse="_")
+}
+
+
+## retrieve Ensembl data
+## save all files to local folder.
+## returns the path where files have been saved to.
+fetchTablesFromEnsembl <- function(version, ensemblapi, user="anonymous",
+                                   host="ensembldb.ensembl.org", pass="",
+                                   port=5306, species="human"){
+    if(missing(version))
+        stop("The version of the Ensembl database has to be provided!")
+    ## setting the stage for perl:
+    fn <- system.file("perl", "get_gene_transcript_exon_tables.pl",
+                      package="ensembldb")
+    ## parameters: s, U, H, P, e
+    ## replacing white spaces with _
+    species <- gsub(species, pattern=" ", replacement="_")
+
+    cmd <- paste0("perl ", fn, " -s ", species," -e ", version,
+                  " -U ", user, " -H ", host, " -p ", port, " -P ", pass)
+    if(!missing(ensemblapi)){
+        Sys.setenv(ENS=ensemblapi)
+    }
+    system(cmd)
+    if(!missing(ensemblapi)){
+        Sys.unsetenv("ENS")
+    }
+
+    ## we should now have the files:
+    in_files <- c("ens_gene.txt", "ens_tx.txt", "ens_exon.txt",
+                  "ens_tx2exon.txt", "ens_chromosome.txt", "ens_metadata.txt")
+    ## check if we have all files...
+    all_files <- dir(pattern="txt")
+    if(sum(in_files %in% all_files)!=length(in_files))
+        stop("Something went wrong! I'm missing some of the txt files the perl script should have generated.")
+}
+
+
+####
+##
+## create a SQLite database containing the information defined in the txt files.
+makeEnsemblSQLiteFromTables <- function(path=".", dbname){
+    ## check if we have all files...
+    in_files <- c("ens_gene.txt", "ens_tx.txt", "ens_exon.txt",
+                  "ens_tx2exon.txt", "ens_chromosome.txt", "ens_metadata.txt")
+    ## check if we have all files...
+    all_files <- dir(path, pattern="txt")
+    if(sum(in_files %in% all_files)!=length(in_files))
+        stop("Something went wrong! I'm missing some of the txt files the",
+             " perl script should have generated.")
+
+    ## read information
+    info <- read.table(paste0(path, .Platform$file.sep ,"ens_metadata.txt"),
+                       sep="\t", as.is=TRUE, header=TRUE)
+    species <- .organismName(info[ info$name=="Organism", "value" ])
+    ##substring(species, 1, 1) <- toupper(substring(species, 1, 1))
+    if(missing(dbname)){
+        dbname <- paste0("EnsDb.",substring(species, 1, 1),
+                         unlist(strsplit(species, split="_"))[ 2 ], ".v",
+                         info[ info$name=="ensembl_version", "value" ], ".sqlite")
+    }
+    con <- dbConnect(dbDriver("SQLite"), dbname=dbname)
+
+    ## write information table
+    dbWriteTable(con, name="metadata", info, row.names=FALSE)
+
+    ## process chromosome
+    message("Processing 'chromosome' table ... ", appendLF = FALSE)
+    tmp <- read.table(paste0(path, .Platform$file.sep ,"ens_chromosome.txt"),
+                      sep="\t", as.is=TRUE, header=TRUE)
+    tmp[, "seq_name"] <- as.character(tmp[, "seq_name"])
+    dbWriteTable(con, name="chromosome", tmp, row.names=FALSE)
+    rm(tmp)
+    message("OK")
+
+    message("Processing 'gene' table ... ", appendLF = FALSE)
+    ## process genes: some gene names might have fancy names...
+    tmp <- read.table(paste0(path, .Platform$file.sep, "ens_gene.txt"),
+                      sep="\t", as.is=TRUE, header=TRUE,
+                      quote="", comment.char="" )
+    OK <- .checkIntegerCols(tmp)
+    dbWriteTable(con, name="gene", tmp, row.names=FALSE)
+    rm(tmp)
+    message("OK")
+
+    if (as.numeric(info[info$name == "DBSCHEMAVERSION", "value"]) > 1) {
+        message("Processing 'entrezgene' table ... ", appendLF = FALSE)
+        ## process the Entrez gene mapping table.
+        tmp <- read.table(paste0(path, .Platform$file.sep, "ens_entrezgene.txt"),
+                          sep="\t", as.is=TRUE, header=TRUE,
+                          quote="", comment.char="" )
+        dbWriteTable(con, name="entrezgene", tmp, row.names=FALSE)
+        rm(tmp)
+        message("OK")
+    }
+    
+    message("Processing 'transcript' table ... ", appendLF = FALSE)
+    ## process transcripts:
+    tmp <- read.table(paste0(path, .Platform$file.sep, "ens_tx.txt"),
+                      sep="\t", as.is=TRUE, header=TRUE)
+    ## Fix the tx_cds_seq_start and tx_cds_seq_end columns: these should be integer!
+    suppressWarnings(
+        tmp[, "tx_cds_seq_start"] <- as.integer(tmp[, "tx_cds_seq_start"])
+    )
+    suppressWarnings(
+        tmp[, "tx_cds_seq_end"] <- as.integer(tmp[, "tx_cds_seq_end"])
+    )
+    OK <- .checkIntegerCols(tmp)
+    dbWriteTable(con, name="tx", tmp, row.names=FALSE)
+    rm(tmp)
+    message("OK")
+
+    ## process exons:
+    message("Processing 'exon' table ... ", appendLF = FALSE)
+    tmp <- read.table(paste0(path, .Platform$file.sep, "ens_exon.txt"),
+                      sep = "\t", as.is = TRUE, header = TRUE)
+    OK <- .checkIntegerCols(tmp)
+    dbWriteTable(con, name="exon", tmp, row.names=FALSE)
+    rm(tmp)
+    message("OK")
+    message("Processing 'tx2exon' table ... ", appendLF = FALSE)
+    tmp <- read.table(paste0(path, .Platform$file.sep, "ens_tx2exon.txt"),
+                      sep = "\t", as.is = TRUE, header = TRUE)
+    OK <- .checkIntegerCols(tmp)
+    dbWriteTable(con, name="tx2exon", tmp, row.names = FALSE)
+    rm(tmp)
+    message("OK")
+
+    ## process proteins; if available.
+    prot_file <- paste0(path, .Platform$file.sep, "ens_protein.txt")
+    if (file.exists(prot_file)) {
+        message("Processing 'protein' table ... ", appendLF = FALSE)
+        tmp <- read.table(prot_file, sep = "\t", as.is = TRUE, header = TRUE)
+        OK <- .checkIntegerCols(tmp)
+        dbWriteTable(con, name = "protein", tmp, row.names = FALSE)
+        message("OK")
+        message("Processing 'uniprot' table ... ", appendLF = FALSE)
+        tmp <- read.table(paste0(path, .Platform$file.sep, "ens_uniprot.txt"),
+                          sep = "\t", as.is = TRUE, header = TRUE)
+        OK <- .checkIntegerCols(tmp)
+        dbWriteTable(con, name = "uniprot", tmp, row.names = FALSE)
+        message("OK")
+        message("Processing 'protein_domain' table ... ", appendLF = FALSE)
+        tmp <- read.table(paste0(path, .Platform$file.sep, "ens_protein_domain.txt"),
+                          sep = "\t", as.is = TRUE, header = TRUE)
+        OK <- .checkIntegerCols(tmp)
+        dbWriteTable(con, name = "protein_domain", tmp, row.names = FALSE)
+        message("OK")
+    }
+
+    ## Create indices
+    message("Creating indices ... ", appendLF = FALSE)
+    .createEnsDbIndices(con, proteins = file.exists(prot_file))
+    message("OK")
+    dbDisconnect(con)
+    ## Check if the data could be loaded.
+    message("Checking validity of the database ... ", appendLF = FALSE)
+    msg <- validObject(EnsDb(dbname))
+    if (!is.logical(msg))
+        stop(msg)
+    message("OK")
+    ## done.
+    return(dbname)
+}
+
+############################################################
+## Simply checking that some columns are integer
+.checkIntegerCols <- function(x, columns = c("gene_seq_start", "gene_seq_end",
+                                             "tx_seq_start", "tx_seq_end",
+                                             "exon_seq_start", "exon_seq_end",
+                                             "exon_idx", "tx_cds_seq_start",
+                                             "tx_cds_seq_end", "prot_dom_start",
+                                             "prot_dom_end")) {
+    cols <- columns[columns %in% colnames(x)]
+    if(length(cols) > 0) {
+        sapply(cols, function(z) {
+            if(!is.integer(x[, z]))
+                stop("Column '", z,"' is not of type integer!")
+        })
+    }
+    return(TRUE)
+}
+
+
+####
+## the function that creates the annotation package.
+## ensdb should be a connection to an SQLite database, or a character string...
+makeEnsembldbPackage <- function(ensdb,
+                                 version,
+                                 maintainer,
+                                 author,
+                                 destDir=".",
+                                 license="Artistic-2.0"){
+    if(!is.character(ensdb))
+        stop("ensdb has to be the name of the SQLite database!")
+    ensdbfile <- ensdb
+    ensdb <- EnsDb(x=ensdbfile)
+    con <- dbconn(ensdb)
+    pkgName <- .makePackageName(con)
+    ensembl_version <- .getMetaDataValue(con, "ensembl_version")
+    ## there should only be one template
+    template_path <- system.file("pkg-template",package="ensembldb")
+    ## We need to define some symbols in order to have the
+    ## template filled out correctly.
+    symvals <- list(
+        PKGTITLE=paste("Ensembl based annotation package"),
+        PKGDESCRIPTION="Exposes an annotation database generated from Ensembl.",
+        PKGVERSION=version,
+        AUTHOR=author,
+        MAINTAINER=maintainer,
+        LIC=license,
+        ORGANISM=.organismName(.getMetaDataValue(con ,'Organism')),
+        SPECIES=.organismName(.getMetaDataValue(con,'Organism')),
+        PROVIDER="Ensembl",
+        PROVIDERVERSION=as.character(ensembl_version),
+        RELEASEDATE= .getMetaDataValue(con ,'Creation time'),
+        SOURCEURL= .getMetaDataValue(con ,'ensembl_host'),
+        ORGANISMBIOCVIEW=gsub(" ","_",
+                              .organismName(.getMetaDataValue(con ,'Organism'))),
+        TXDBOBJNAME=pkgName ## .makeObjectName(pkgName)
+       )
+    ## Should never happen
+    if (any(duplicated(names(symvals)))) {
+        str(symvals)
+        stop("'symvals' contains duplicated symbols")
+    }
+    createPackage(pkgname=pkgName,
+                  destinationDir=destDir,
+                  originDir=template_path,
+                  symbolValues=symvals)
+    ## then copy the contents of the database into the extdata dir
+    sqlfilename <- unlist(strsplit(ensdbfile, split=.Platform$file.sep))
+    sqlfilename <- sqlfilename[ length(sqlfilename) ]
+    dir.create(paste(c(destDir, pkgName, "inst", "extdata"),
+                     collapse=.Platform$file.sep),
+               showWarnings=FALSE, recursive=TRUE)
+    db_path <- file.path(destDir, pkgName, "inst", "extdata",
+                         paste(pkgName,"sqlite",sep="."))
+    file.copy(ensdbfile, to=db_path)
+}
+
+
 ####
 ## function to create a EnsDb object (or rather the SQLite database) from
 ## a Ensembl GTF file.
@@ -8,9 +278,10 @@
 ## + The CDS features in the GTF are somewhat problematic, while we're used to get just the
 ##   coding start and end for a transcript from the Ensembl perl API, here we get the coding
 ##   start and end for each exon.
-ensDbFromGtf <- function(gtf, outfile, path, organism, genomeVersion, version){
+ensDbFromGtf <- function(gtf, outfile, path, organism, genomeVersion,
+                         version, ...){
     options(useFancyQuotes=FALSE)
-    message("Importing GTF file...", appendLF=FALSE)
+    message("Importing GTF file ... ", appendLF=FALSE)
     ## wanted.features <- c("gene", "transcript", "exon", "CDS")
     wanted.features <- c("exon")
     ## GTF <- import(con=gtf, format="gtf", feature.type=wanted.features)
@@ -21,7 +292,8 @@ ensDbFromGtf <- function(gtf, outfile, path, organism, genomeVersion, version){
     if(any(!(wanted.features %in% levels(GTF$type)))){
         stop(paste0("One or more required types are not in the gtf file. Need ",
                     paste(wanted.features, collapse=","), " but got only ",
-                    paste(wanted.features[wanted.features %in% levels(GTF$type)], collapse=","),
+                    paste(wanted.features[wanted.features %in% levels(GTF$type)],
+                          collapse=","),
                     "."))
     }
     ## transcript biotype?
@@ -37,11 +309,18 @@ ensDbFromGtf <- function(gtf, outfile, path, organism, genomeVersion, version){
     tmp <- readLines(gtf, n=10)
     tmp <- tmp[grep(tmp, pattern="^#")]
     haveHeader <- FALSE
-    if(length(tmp) > 0){
+    if (length(tmp) > 0) {
         ##message("GTF file has a header.")
-        tmp <- gsub(tmp, pattern="^#", replacement="")
-        tmp <- gsub(tmp, pattern="^!", replacement="")
-        Header <- do.call(rbind, strsplit(tmp, split=" ", fixed=TRUE))
+        tmp <- gsub(tmp, pattern = "^#", replacement = "")
+        tmp <- gsub(tmp, pattern = "^!", replacement = "")
+        ## Split on " ", keeping everything after the first space as the value.
+        hdr <- strsplit(tmp, split = " ", fixed = TRUE)
+        hdr <- lapply(hdr, function(z) {
+            if (length(z) > 2)
+                z[2] <- paste(z[2:length(z)], collapse = " ")
+            z[1:2]
+        })
+        Header <- do.call(rbind, hdr)
         colnames(Header) <- c("name", "value")
         haveHeader <- TRUE
     }
@@ -64,8 +343,10 @@ ensDbFromGtf <- function(gtf, outfile, path, organism, genomeVersion, version){
 
     GTF <- fixCDStypeInEnsemblGTF(GTF)
     ## here on -> call ensDbFromGRanges.
-    dbname <- ensDbFromGRanges(GTF, outfile=outfile, path=path, organism=organism,
-                               genomeVersion=genomeVersion, version=ensemblVersion)
+    dbname <- ensDbFromGRanges(GTF, outfile = outfile, path = path,
+                               organism = organism,
+                               genomeVersion = genomeVersion,
+                               version = ensemblVersion, ...)
 
     gtfFilename <- unlist(strsplit(gtf, split=.Platform$file.sep))
     gtfFilename <- gtfFilename[length(gtfFilename)]
@@ -118,7 +399,7 @@ ensDbFromAH <- function(ah, outfile, path, organism, genomeVersion, version){
     orgFromAH <- Parms["organism"]
     genFromAH <- Parms["genomeVersion"]
     gtfFilename <- ah$title
-    message("Fetching data ...", appendLF=FALSE)
+    message("Fetching data ... ", appendLF=FALSE)
     suppressMessages(
         gff <- ah[[1]]
     )
@@ -150,16 +431,19 @@ ensDbFromAH <- function(ah, outfile, path, organism, genomeVersion, version){
         orgFromFile <- NA
         genFromFile <- NA
         if(missing(organism) | missing(genomeVersion) | missing(version))
-            stop("The file name does not match the expected naming scheme of Ensembl",
-                 " files hence I cannot extract any information from it! Parameters",
-                 " 'organism', 'genomeVersion' and 'version' are thus required!")
+            stop("The file name does not match the expected naming scheme",
+                 " of Ensembl files hence I cannot extract any information",
+                 " from it! Parameters 'organism', 'genomeVersion' and",
+                 " 'version' are thus required!")
     }
     ## Do some more testing with versions provided from the user.
     if(!missing(organism)){
         if(!is.na(orgFromFile)){
             if(organism != orgFromFile){
-                warning("User specified organism (", organism, ") is different to the one extracted",
-                        " from the file name (", orgFromFile, ")! Using the one defined by the user.")
+                warning("User specified organism (", organism,
+                        ") is different to the one extracted",
+                        " from the file name (", orgFromFile,
+                        ")! Using the one defined by the user.")
             }
         }
         orgFromFile <- organism
@@ -167,8 +451,10 @@ ensDbFromAH <- function(ah, outfile, path, organism, genomeVersion, version){
     if(!missing(genomeVersion)){
         if(!is.na(genFromFile)){
             if(genomeVersion != genFromFile){
-                warning("User specified genome version (", genomeVersion, ") is different to the one extracted",
-                        " from the file name (", genFromFile, ")! Using the one defined by the user.")
+                warning("User specified genome version (", genomeVersion,
+                        ") is different to the one extracted",
+                        " from the file name (", genFromFile,
+                        ")! Using the one defined by the user.")
             }
         }
         genFromFile <- genomeVersion
@@ -176,8 +462,10 @@ ensDbFromAH <- function(ah, outfile, path, organism, genomeVersion, version){
     if(!missing(version)){
         if(!is.na(ensFromFile)){
             if(version != ensFromFile){
-            warning("User specified Ensembl version (", version, ") is different to the one extracted",
-                    " from the file name (", ensFromFile, ")! Using the one defined by the user.")
+                warning("User specified Ensembl version (", version,
+                        ") is different to the one extracted",
+                        " from the file name (", ensFromFile,
+                        ")! Using the one defined by the user.")
             }
         }
         ensFromFile <- version
@@ -194,7 +482,8 @@ ensDbFromAH <- function(ah, outfile, path, organism, genomeVersion, version){
 ##  ensDbFromGff
 ##
 ####------------------------------------------------------------
-ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion, version){
+ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion,
+                         version, ...){
     options(useFancyQuotes=FALSE)
 
     ## Check parameters
@@ -212,21 +501,23 @@ ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion, version){
         stop("This function supports only GFF version 3 files!")
     tmp <- tmp[grep(tmp, pattern="^#!")]
     if(length(tmp) > 0){
-        tmp <- gsub(tmp, pattern="^#!", replacement="")
-        Header <- do.call(rbind, strsplit(tmp, split="[ ]+"))
-        colnames(Header) <- c("name", "value")
-        if(any(Header[, "name"] == "genome-version")){
-            genFromHeader <- Header[Header[, "name"] == "genome-version", "value"]
-            if(genFromHeader != genFromFile){
-                warning("Genome version extracted from file name (", genFromFile,
-                        ") does not match the genome version specified inside the file (",
-                        genFromHeader, "). Will consider the one defined inside the file.")
-                genFromFile <- genFromHeader
+        ## Check if I can extract the genome-version
+        idx <- grep(tmp, pattern = "^#!genome-version")
+        if (length(idx) > 0) {
+            genFromHeader <- sub(tmp[idx], pattern = "^#!genome-version",
+                                 replacement = "")
+            genFromHeader <- gsub(" ", "", genFromHeader, fixed = TRUE)
+            if (genFromHeader != genFromFile) {
+                warning("Genome version extracted from file name (",
+                        genFromFile, ") does not match genome version",
+                        " defined within the gff file (", genFromHeader,
+                        "). Will use the version defined within the gff.")
+                genFromFile <- genFromHeader
             }
         }
     }
 
-    message("Importing GFF...", appendLF=FALSE)
+    message("Importing GFF ... ", appendLF=FALSE)
     suppressWarnings(
         theGff <- import(gff, format=paste0("gff", gffVersion))
     )
@@ -240,12 +531,14 @@ ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion, version){
     gffcols <- c("type", "ID", "Name", "Parent")
     if(!all(gffcols %in% colnames(mcols(theGff))))
         stop("Required columns/fields ",
-             paste(gffcols[!(gffcols %in% colnames(mcols(theGff)))], collapse=";"),
+             paste(gffcols[!(gffcols %in% colnames(mcols(theGff)))],
+                   collapse=";"),
              " not present in the GFF file!")
     enscols <- c("gene_id", "transcript_id", "exon_id", "rank", "biotype")
     if(!all(enscols %in% colnames(mcols(theGff))))
         stop("Required columns/fields ",
-             paste(enscols[!(enscols %in% colnames(mcols(theGff)))], collapse=";"),
+             paste(enscols[!(enscols %in% colnames(mcols(theGff)))],
+                   collapse=";"),
              " not present in the GFF file!")
     ## Subsetting to eventually speed up further processing.
     theGff <- theGff[, c(gffcols, enscols)]
@@ -259,7 +552,7 @@ ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion, version){
     ## Processing that stuff...
     ## Replace the ID format type:ID.
     ids <- strsplit(theGff$ID, split=":")
-    message("Fixing IDs...", appendLF=FALSE)
+    message("Fixing IDs ... ", appendLF=FALSE)
     ## For those that have length > 1 use the second element.
     theGff$ID <- unlist(lapply(ids, function(z){
         if(length(z) > 1)
@@ -268,7 +561,7 @@ ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion, version){
     }))
     message("OK")
     ## Process genes...
-    message("Processing genes...", appendLF=FALSE)
+    message("Processing genes ... ", appendLF=FALSE)
     ## Bring the GFF into the correct format for EnsDb/ensDbFromGRanges.
     idx <- which(!is.na(theGff$gene_id))
     theGff$type[idx] <- "gene"
@@ -283,29 +576,35 @@ ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion, version){
     ## message("OK")
 
     ## Process transcripts...
-    message("Processing transcripts...", appendLF=FALSE)
+    message("Processing transcripts ... ", appendLF=FALSE)
     idx <- which(!is.na(theGff$transcript_id))
     ## Check if I've got multiple parents...
     parentGenes <- theGff$Parent[idx]
     if(any(lengths(parentGenes) > 1))
-        stop("Transcripts with multiple parents in GFF element 'Parent' not (yet) supported!")
+        stop("Transcripts with multiple parents in GFF element 'Parent'",
+             " not (yet) supported!")
     theGff$type[idx] <- "transcript"
     ## Setting the gene_id for these guys...
-    theGff$gene_id[idx] <- unlist(sub(parentGenes, pattern="gene:", replacement="", fixed=TRUE))
+    theGff$gene_id[idx] <- unlist(sub(parentGenes, pattern="gene:",
+                                      replacement="", fixed=TRUE))
     ## The CDS:
     idx <- which(theGff$type == "CDS")
     parentTx <- theGff$Parent[idx]
     if(any(lengths(parentTx) > 1))
-        stop("CDS with multiple parent transcripts in GFF element 'Parent' not (yet) supported!")
-    theGff$transcript_id[idx] <- unlist(sub(parentTx, pattern="transcript:", replacement="", fixed=TRUE))
+        stop("CDS with multiple parent transcripts in GFF element 'Parent'",
+             " not (yet) supported!")
+    theGff$transcript_id[idx] <- unlist(sub(parentTx, pattern="transcript:",
+                                            replacement="", fixed=TRUE))
     message("OK")
 
-    message("Processing exons...", appendLF=FALSE)
+    message("Processing exons ... ", appendLF=FALSE)
     idx <- which(!is.na(theGff$exon_id))
     parentTx <- theGff$Parent[idx]
     if(any(lengths(parentTx) > 1))
-        stop("Exons with multiple parent transcripts in GFF element 'Parent' not (yet) supported!")
-    theGff$transcript_id[idx] <- unlist(sub(parentTx, pattern="transcript:", replacement="", fixed=TRUE))
+        stop("Exons with multiple parent transcripts in GFF element 'Parent'",
+             " not (yet) supported!")
+    theGff$transcript_id[idx] <- unlist(sub(parentTx, pattern="transcript:",
+                                            replacement="", fixed=TRUE))
     message("OK")
 
     theGff <- theGff[theGff$type %in% c("gene", "transcript", "exon", "CDS")]
@@ -316,8 +615,10 @@ ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion, version){
     message("Proceeding to create the database.")
 
     ## Proceed.
-    dbname <- ensDbFromGRanges(theGff, outfile=outfile, path=path, organism=orgFromFile,
-                               genomeVersion=genFromFile, version=ensFromFile)
+    dbname <- ensDbFromGRanges(theGff, outfile = outfile, path = path,
+                               organism = orgFromFile,
+                               genomeVersion = genFromFile,
+                               version = ensFromFile, ...)
 
     gtfFilename <- unlist(strsplit(gff, split=.Platform$file.sep))
     gtfFilename <- gtfFilename[length(gtfFilename)]
@@ -342,12 +643,14 @@ ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion, version){
 ##    the organism, genome build and ensembl version from the file name, if not
 ##    provided.
 ##
-ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version){
+ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion,
+                             version, ...){
     if(!is(x, "GRanges"))
         stop("This method can only be called on GRanges objects!")
     ## check for missing parameters
     if(missing(organism)){
-        stop("The organism has to be specified (e.g. using organism=\"Homo_sapiens\")")
+        stop("The organism has to be specified (e.g. using",
+             " organism=\"Homo_sapiens\")")
     }
     if(missing(version)){
         stop("The Ensembl version has to be specified!")
@@ -374,7 +677,8 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
     }
     if(missing(outfile)){
         ## use the organism, genome version and ensembl version as the file name.
-        outfile <- paste0(c(organism, genomeVersion, version, "sqlite"), collapse=".")
+        outfile <- paste0(c(organism, genomeVersion, version, "sqlite"),
+                          collapse=".")
         if(missing(path))
             path <- "."
         dbname <- paste0(path, .Platform$file.sep, outfile)
@@ -398,10 +702,12 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
     on.exit(dbDisconnect(con))
     ## ----------------------------
     ## metadata table:
-    message("Processing metadata...", appendLF=FALSE)
+    message("Processing metadata ... ", appendLF=FALSE)
     Metadata <- buildMetadata(organism, version, host="unknown",
-                              sourceFile="GRanges object", genomeVersion=genomeVersion)
-    dbWriteTable(con, name="metadata", Metadata, overwrite=TRUE, row.names=FALSE)
+                              sourceFile="GRanges object",
+                              genomeVersion=genomeVersion)
+    dbWriteTable(con, name="metadata", Metadata, overwrite=TRUE,
+                 row.names=FALSE)
     message("OK")
     ## Check if we've got column "type"
     if(!any(colnames(mcols(x)) == "type"))
@@ -413,7 +719,7 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
     ## process genes
     ## we're lacking NCBI Entrezids and also the coord system, but these are not
     ## required columns anyway...
-    message("Processing genes...")
+    message("Processing genes ... ")
     ## want to have: gene_id, gene_name, entrezid, gene_biotype, gene_seq_start,
     ##               gene_seq_end, seq_name, seq_strand, seq_coord_system.
     wouldBeNice <- c("gene_id", "gene_name", "entrezid", "gene_biotype")
@@ -422,14 +728,15 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
     ## Just really require the gene_id...
     reqCols <- c("gene_id")
     if(length(dontHave) > 0){
-        mess <- paste0(" I'm missing column(s): ", paste0(sQuote(dontHave), collapse=","),
+        mess <- paste0(" I'm missing column(s): ", paste0(sQuote(dontHave),
+                                                          collapse=","),
                        ".")
         warning(mess, " The corresponding database column(s) will be empty!")
     }
     message(" Attribute availability:", appendLF=TRUE)
     for(i in 1:length(wouldBeNice)){
-        message("  o ", wouldBeNice[i], "...",
-                ifelse(any(gotColumns == wouldBeNice[i]), yes=" OK", no=" Nope"))
+        message("  o ", wouldBeNice[i], " ... ",
+                ifelse(any(gotColumns == wouldBeNice[i]), yes="OK", no="Nope"))
     }
     if(!any(reqCols %in% haveGot))
         stop(paste0("One or more required fields are not defined in the",
@@ -481,21 +788,21 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
     ## ----------------------------
     ##
     ## process transcripts
-    message("Processing transcripts...", appendLF=TRUE)
+    message("Processing transcripts ... ", appendLF=TRUE)
     ## want to have: tx_id, tx_biotype, tx_seq_start, tx_seq_end, tx_cds_seq_start,
     ##               tx_cds_seq_end, gene_id
     wouldBeNice <- c("transcript_id", "gene_id", txBiotypeCol)
     dontHave <- wouldBeNice[!(wouldBeNice %in% gotColumns)]
     if(length(dontHave) > 0){
-        mess <- paste0("I'm missing column(s): ", paste0(sQuote(dontHave), collapse=","),
-                       ".")
+        mess <- paste0("I'm missing column(s): ", paste0(sQuote(dontHave),
+                                                         collapse=","), ".")
         warning(mess, " The corresponding database columns will be empty!")
     }
     haveGot <- wouldBeNice[wouldBeNice %in% gotColumns]
     message(" Attribute availability:", appendLF=TRUE)
     for(i in 1:length(wouldBeNice)){
-        message("  o ", wouldBeNice[i], "...",
-                ifelse(any(gotColumns == wouldBeNice[i]), yes=" OK", no=" Nope"))
+        message("  o ", wouldBeNice[i], " ... ",
+                ifelse(any(gotColumns == wouldBeNice[i]), yes="OK", no="Nope"))
     }
     reqCols <- c("transcript_id", "gene_id")
     if(!any(reqCols %in% gotColumns))
@@ -526,7 +833,8 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
         colnames(tx) <- c(cn, dontHave)
     }
     ## Add columns for UTR
-    tx <- cbind(tx, tx_cds_seq_start=rep(NA, nrow(tx)), tx_cds_seq_end=rep(NA, nrow(tx)))
+    tx <- cbind(tx, tx_cds_seq_start=rep(NA, nrow(tx)),
+                tx_cds_seq_end=rep(NA, nrow(tx)))
     ## Process CDS...
     if(any(gotTypes == "CDS")){
         ## Only do that if we've got type == "CDS"!
@@ -534,9 +842,11 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
         CDS <- as.data.frame(x[x$type == "CDS", "transcript_id"])
         ##
         startByTx <- split(CDS$start, f=CDS$transcript_id)
-        cdsStarts <- unlist(lapply(startByTx, function(z){return(min(z, na.rm=TRUE))}))
+        cdsStarts <- unlist(lapply(startByTx,
+                                   function(z){return(min(z, na.rm=TRUE))}))
         endByTx <- split(CDS$end, f=CDS$transcript_id)
-        cdsEnds <- unlist(lapply(endByTx, function(z){return(max(z, na.rm=TRUE))}))
+        cdsEnds <- unlist(lapply(endByTx,
+                                 function(z){return(max(z, na.rm=TRUE))}))
         idx <- match(names(cdsStarts), tx$transcript_id)
         areNas <- is.na(idx)
         idx <- idx[!areNas]
@@ -545,12 +855,13 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
         tx[idx, "tx_cds_seq_start"] <- cdsStarts
         tx[idx, "tx_cds_seq_end"] <- cdsEnds
     }else{
-        mess <- " I can't find type=='CDS'! The resulting database will lack CDS information!"
+        mess <- paste0(" I can't find type=='CDS'! The resulting database",
+                       " will lack CDS information!")
         message(mess, appendLF = TRUE)
         warning(mess)
     }
-    colnames(tx) <- c("tx_seq_start", "tx_seq_end", "tx_id", "gene_id", "tx_biotype",
-                      "tx_cds_seq_start", "tx_cds_seq_end")
+    colnames(tx) <- c("tx_seq_start", "tx_seq_end", "tx_id", "gene_id",
+                      "tx_biotype", "tx_cds_seq_start", "tx_cds_seq_end")
     ## rearranging data.frame:
     tx <- tx[ , c("tx_id", "tx_biotype", "tx_seq_start", "tx_seq_end",
                   "tx_cds_seq_start", "tx_cds_seq_end", "gene_id")]
@@ -565,7 +876,7 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
     ## ----------------------------
     ##
     ## process exons
-    message("Processing exons...", appendLF=FALSE)
+    message("Processing exons ... ", appendLF=FALSE)
     reqCols <- c("exon_id", "transcript_id", "exon_number")
     if(!any(reqCols %in% gotColumns))
         stop(paste0("One or more required fields are not defined in",
@@ -594,31 +905,31 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
     ## ----------------------------
     ##
     ## process chromosomes
-    message("Processing chromosomes...", appendLF=FALSE)
-    if(fetchSeqinfo){
+    message("Processing chromosomes ... ", appendLF=FALSE)
+    if (fetchSeqinfo) {
         ## problem is I don't have these available...
-        chroms <- data.frame(seq_name=unique(as.character(genes$seq_name)))
-        chroms <- cbind(chroms, seq_length=rep(NA, nrow(chroms)),
-                        is_circular=rep(NA, nrow(chroms)))
+        chroms <- data.frame(seq_name = unique(as.character(genes$seq_name)))
+        chroms <- cbind(chroms, seq_length = rep(NA, nrow(chroms)),
+                        is_circular = rep(NA, nrow(chroms)))
         rownames(chroms) <- chroms$seq_name
-        ## now trying to get the sequence lengths directly from Ensembl using internal
-        ## functions from the GenomicFeatures package. I will use "try" to not break
-        ## the call if no seqlengths are available.
-        seqlengths <- tryGetSeqinfoFromEnsembl(organism, version, seqnames=chroms$seq_name)
-        if(nrow(seqlengths)>0){
-            seqlengths <- seqlengths[seqlengths[, "name"] %in% rownames(chroms), ]
-            chroms[seqlengths[, "name"], "seq_length"] <- seqlengths[, "length"]
+        ## Try to get sequence lengths from Ensembl or Ensemblgenomes.
+        sl <- tryGetSeqinfoFromEnsembl(organism, version,
+                                       seqnames = chroms$seq_name)
+        if (nrow(sl) > 0) {
+            sl <- sl[sl[, "name"] %in% rownames(chroms), ]
+            chroms[sl[, "name"], "seq_length"] <- sl[, "length"]
         }
-    }else{
+    } else {
         ## have seqinfo available.
-        chroms <- data.frame(seq_name=seqnames(Seqinfo), seq_length=seqlengths(Seqinfo),
-                             is_circular=isCircular(Seqinfo))
+        chroms <- data.frame(seq_name = seqnames(Seqinfo),
+                             seq_length = seqlengths(Seqinfo),
+                             is_circular = isCircular(Seqinfo))
     }
     ## write the table.
     dbWriteTable(con, name="chromosome", chroms, overwrite=TRUE, row.names=FALSE)
     rm(genes)
     message("OK")
-    message("Generating index...", appendLF=FALSE)
+    message("Generating index ... ", appendLF=FALSE)
     ## generating all indices...
     .createEnsDbIndices(con)
     message("OK")
@@ -633,28 +944,32 @@ ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version)
 ## EnsDb database is correct (i.e. transcript within gene coordinates, exons within
 ## transcript coordinates, cds within transcript)
 checkValidEnsDb <- function(x){
-    message("Checking transcripts...", appendLF=FALSE)
-    tx <- transcripts(x, columns=c("gene_id", "tx_id", "gene_seq_start", "gene_seq_end",
-                             "tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
-                             "tx_cds_seq_end"), return.type="DataFrame")
+    message("Checking transcripts ... ", appendLF=FALSE)
+    tx <- transcripts(x, columns=c("gene_id", "tx_id", "gene_seq_start",
+                                   "gene_seq_end", "tx_seq_start",
+                                   "tx_seq_end", "tx_cds_seq_start",
+                                   "tx_cds_seq_end"), return.type="DataFrame")
     ## check if the tx are inside the genes...
-    isInside <- tx$tx_seq_start >= tx$gene_seq_start & tx$tx_seq_end <= tx$gene_seq_end
+    isInside <- tx$tx_seq_start >= tx$gene_seq_start &
+        tx$tx_seq_end <= tx$gene_seq_end
     if(any(!isInside))
         stop("Start and end coordinates for ", sum(!isInside),
              "transcripts are not within the gene coordinates!")
     ## check cds coordinates
-    notInside <- which(!(tx$tx_cds_seq_start >= tx$tx_seq_start & tx$tx_cds_seq_end <= tx$tx_seq_end))
+    notInside <- which(!(tx$tx_cds_seq_start >= tx$tx_seq_start &
+                         tx$tx_cds_seq_end <= tx$tx_seq_end))
     if(length(notInside) > 0){
         stop("The CDS start and end coordinates for ", length(notInside),
              " transcripts are not within the transcript coordinates!")
     }
     rm(tx)
-    message("OK\nChecking exons...", appendLF=FALSE)
+    message("OK\nChecking exons ... ", appendLF=FALSE)
     ex <- exons(x, columns=c("exon_id", "tx_id", "exon_seq_start", "exon_seq_end",
-                       "tx_seq_start", "tx_seq_end", "seq_strand", "exon_idx"),
+                             "tx_seq_start", "tx_seq_end", "seq_strand", "exon_idx"),
                 return.type="data.frame")
     ## check if exons are within tx
-    isInside <- ex$exon_seq_start >= ex$tx_seq_start & ex$exon_seq_end <= ex$tx_seq_end
+    isInside <- ex$exon_seq_start >= ex$tx_seq_start &
+        ex$exon_seq_end <= ex$tx_seq_end
     if(any(!isInside))
         stop("Start and end coordinates for ", sum(!isInside),
              " exons are not within the transcript coordinates!")
@@ -666,8 +981,8 @@ checkValidEnsDb <- function(x){
                                    return(any(z != seq(1, length(z))))
                                }))
     if(any(Different)){
-        stop(paste0("Provided exon index in transcript does not match with ordering",
-                    " of the exons by chromosomal coordinates for",
+        stop(paste0("Provided exon index in transcript does not match with",
+                    " ordering of the exons by chromosomal coordinates for",
                     sum(Different), "of the", length(Different),
                     "transcripts encoded on the + strand!"))
     }
@@ -678,53 +993,43 @@ checkValidEnsDb <- function(x){
                                    return(any(z != seq(1, length(z))))
                                }))
     if(any(Different)){
-        stop(paste0("Provided exon index in transcript does not match with ordering",
-                    " of the exons by chromosomal coordinates for",
+        stop(paste0("Provided exon index in transcript does not match with",
+                    " ordering of the exons by chromosomal coordinates for",
                     sum(Different), "of the", length(Different),
                     "transcripts encoded on the - strand!"))
     }
     message("OK")
+    return(TRUE)
 }
 
 
-## organism is expected to be e.g. Homo_sapiens, so the full organism name, with
-## _ as a separator
-tryGetSeqinfoFromEnsembl <- function(organism, ensemblVersion, seqnames){
-    ## Quick fix if organism contains whitespace instead of _:
-    organism <- gsub(organism, pattern=" ", replacement="_", fixed=TRUE)
-    Dataset <- paste0(c(tolower(.abbrevOrganismName(organism)), "gene_ensembl"),
-                      collapse="_")
-    message("Fetch seqlengths from ensembl, dataset ", Dataset, " version ",
-            ensemblVersion, "...", appendLF=FALSE)
-    ## get it all from the ensemblgenomes.org host???
-    tmp <- try(
-        GenomicFeatures:::fetchChromLengthsFromEnsembl(dataset=Dataset,
-                                                       release=ensemblVersion,
-                                                       extra_seqnames=seqnames),
-        silent=TRUE)
-    if(class(tmp)=="try-error"){
-        message(paste0("Unable to get sequence lengths from Ensembl for dataset: ",
-                       Dataset, ". Error was: ", message(tmp), "\n"))
-    }else{
-        message("OK")
-        return(tmp)
-    }
-    ## try plant genomes...
+############################################################
+##' Fetch chromosome sequence lengths from Ensembl.
+##' @param organism The organism. Has to be in the form "Homo sapiens"
+##' @param ensemblVersion The Ensembl version.
+##' @param seqnames The names of the chromosomes/sequences; optional.
+##' @return A matrix with two columns, name and length.
+##' @noRd
+tryGetSeqinfoFromEnsembl <- function(organism, ensemblVersion, seqnames,
+                                     skip = FALSE){
+    if (skip)
+        return(matrix(nrow = 0, ncol = 2))
+    message("Fetch seqlengths from ensembl ... ", appendLF=FALSE)
     tmp <- try(
-        GenomicFeatures:::fetchChromLengthsFromEnsemblPlants(dataset=Dataset,
-                                                             extra_seqnames=seqnames),
-        silent=TRUE)
-    if(class(tmp)=="try-error"){
-        message(paste0("Unable to get sequence lengths from Ensembl plants for dataset: ",
-                       Dataset, ". Error was: ", message(tmp), "\n"))
-    }else{
-        message("OK")
-        return(tmp)
+        .getSeqlengthsFromMysqlFolder(organism = organism,
+                                      ensembl = ensemblVersion,
+                                      seqnames = seqnames)
+    , silent = TRUE)
+    if (is(tmp, "try-error") || is.null(tmp)) {
+        message("FAIL")
+        warning("Unable to retrieve sequence lengths from Ensembl.")
+        return(matrix(nrow = 0, ncol = 2))
     }
-    message("FAIL")
-    return(matrix(ncol=2, nrow=0))
+    colnames(tmp) <- c("name", "length")
+    return(tmp)
 }
 
+
 buildMetadata <- function(organism="", ensemblVersion="", genomeVersion="",
                           host="", sourceFile=""){
     MetaData <- data.frame(matrix(ncol=2, nrow=11))
@@ -763,24 +1068,27 @@ compareEnsDbs <- function(x, y){
     if(length(idx)>0)
         Messages["metadata"] <- "NOTE"
     ## check ensembl version
-    if(metadataX["ensembl_version", "value"] == metadataY["ensembl_version", "value"]){
+    if(metadataX["ensembl_version", "value"] ==
+       metadataY["ensembl_version", "value"]){
         cat(" Ensembl versions match.\n")
     }else{
-        cat(" WARNING: databases base on different Ensembl versions! Expect considerable differences!\n")
+        cat(" WARNING: databases are based on different Ensembl versions!",
+            " Expect considerable differences!\n", sep = "")
         Messages["metadata"] <- "WARN"
     }
     ## genome build
     if(metadataX["genome_build", "value"] == metadataY["genome_build", "value"]){
         cat(" Genome builds match.\n")
     }else{
-        cat(" WARNING: databases base on different Genome builds! Expect considerable differences!\n")
+        cat(" WARNING: databases are based on different Genome builds!",
+            " Expect considerable differences!\n", sep = "")
         Messages["metadata"] <- "WARN"
     }
     if(length(idx)>0){
         cat(" All differences: <name>: <value x> != <value y>\n")
         for(i in idx){
-            cat(paste("  - ", metadataX[i, "name"], ":", metadataX[i, "value"], " != ",
-                      metadataY[i, "value"], "\n"))
+            cat(paste("  - ", metadataX[i, "name"], ":", metadataX[i, "value"],
+                      " != ", metadataY[i, "value"], "\n"))
         }
     }
     cat(paste0("Done. Result: ", Messages["metadata"],"\n"))
@@ -792,6 +1100,11 @@ compareEnsDbs <- function(x, y){
     Messages["transcript"] <- compareTx(x, y)
     ## comparing exons
     Messages["exon"] <- compareExons(x, y)
+    ## If we've got protein data in one of the two:
+    if (hasProteinData(x) | hasProteinData(y)) {
+        Messages <- c(Messages, protein = "OK")
+        Messages["protein"] <- compareProteins(x, y)
+    }
     return(Messages)
 }
 
@@ -808,10 +1121,20 @@ compareChromosomes <- function(x, y){
     if(length(onlyX) > 0 | length(onlyY) > 0)
         Ret <- "WARN"
     cat(paste0( " Sequence names: (", length(inboth), ") common, (",
-               length(onlyX), ") only in x, (", length(onlyY), ") only in y.\n" ))
-    same <- length(which(chromX[inboth, "seqlengths"]==chromY[inboth, "seqlengths"]))
-    different <- length(inboth) - same
-    cat(paste0( " Sequence lengths: (",same, ") identical, (", different, ") different.\n" ))
+               length(onlyX), ") only in x, (", length(onlyY),
+               ") only in y.\n" ))
+    ## seqlengths:
+    if (!isTRUE(all.equal(chromX[inboth, "seqlengths"],
+                          chromY[inboth, "seqlengths"]))) {
+        same <- length(which(chromX[inboth, "seqlengths"] ==
+                             chromY[inboth, "seqlengths"]))
+        different <- length(inboth) - same
+    } else {
+        same <- length(inboth)
+        different <- 0
+    }
+    cat(paste0( " Sequence lengths: (",same, ") identical, (",
+               different, ") different.\n" ))
     if(different > 0)
         Ret <- "WARN"
     cat(paste0("Done. Result: ", Ret,"\n"))
@@ -832,12 +1155,14 @@ compareGenes <- function(x, y){
                length(onlyX), ") only in x, (", length(onlyY), ") only in y.\n"))
     ## seq names
     same <- length(
-        which(as.character(seqnames(genesX[inboth]))==as.character(seqnames(genesY[inboth])))
+        which(as.character(seqnames(genesX[inboth])) ==
+              as.character(seqnames(genesY[inboth])))
         )
     different <- length(inboth) - same
     if(different > 0)
         Ret <- "ERROR"
-    cat(paste0( " Sequence names: (",same, ") identical, (", different, ") different.\n" ))
+    cat(paste0( " Sequence names: (",same, ") identical, (",
+               different, ") different.\n" ))
     ## start
     same <- length(
         which(start(genesX[inboth]) == start(genesY[inboth]))
@@ -908,7 +1233,8 @@ compareTx <- function(x, y){
     if(length(onlyX) > 0 | length(onlyY) > 0)
         Ret <- "WARN"
     cat(paste0(" transcript IDs: (", length(inboth), ") common, (",
-               length(onlyX), ") only in x, (", length(onlyY), ") only in y.\n"))
+               length(onlyX), ") only in x, (", length(onlyY),
+               ") only in y.\n"))
     ## start
     same <- length(
         which(start(txX[inboth]) == start(txY[inboth]))
@@ -947,8 +1273,9 @@ compareTx <- function(x, y){
     cdsOnlyY <- txCdsY[!(txCdsY %in% txCdsX)]
     if((length(cdsOnlyX) > 0 | length(cdsOnlyY)) & Ret!="ERROR")
         Ret <- "ERROR"
-    cat(paste0(" Common transcripts with defined CDS: (",length(cdsInBoth), ") common, (",
-               length(cdsOnlyX), ") only in x, (", length(cdsOnlyY), ") only in y.\n"))
+    cat(paste0(" Common transcripts with defined CDS: (", length(cdsInBoth),
+               ") common, (", length(cdsOnlyX), ") only in x, (",
+               length(cdsOnlyY), ") only in y.\n"))
     same <- length(
         which(txX[cdsInBoth]$tx_cds_seq_start == txY[cdsInBoth]$tx_cds_seq_start)
     )
@@ -979,6 +1306,48 @@ compareTx <- function(x, y){
     return(Ret)
 }
 
+compareProteins <- function(x, y){
+    cat("\nComparing protein data:\n")
+    Ret <- "OK"
+    if (!hasProteinData(x) | !hasProteinData(y)) {
+        Ret <- "WARN"
+        cat("No protein data available for one or both EnsDbs.\n")
+        return(Ret)
+    }
+    X <- proteins(x)
+    Y <- proteins(y)
+    inboth <- X$protein_id[X$protein_id %in% Y$protein_id]
+    onlyX <- X$protein_id[!(X$protein_id %in% Y$protein_id)]
+    onlyY <- Y$protein_id[!(Y$protein_id %in% X$protein_id)]
+    if(length(onlyX) > 0 | length(onlyY) > 0)
+        Ret <- "WARN"
+    cat(paste0(" protein IDs: (", length(inboth), ") common, (",
+               length(onlyX), ") only in x, (", length(onlyY),
+               ") only in y.\n"))
+    X <- X[X$protein_id %in% inboth, ]
+    Y <- Y[Y$protein_id %in% inboth, ]
+    ## sorting both by protein_id should be enough.
+    X <- X[order(X$protein_id), ]
+    Y <- Y[order(Y$protein_id), ]
+
+    ## tx_id
+    same <- length(which(X$tx_id == Y$tx_id))
+    different <- length(inboth) - same
+    if(different > 0)
+        Ret <- "ERROR"
+    cat(paste0( " Transcript IDs: (",same,
+               ") identical, (", different, ") different.\n" ))
+    ## sequence
+    same <- length(which(X$protein_sequence == Y$protein_sequence))
+    different <- length(inboth) - same
+    if(different > 0)
+        Ret <- "ERROR"
+    cat(paste0( " Protein sequence: (",same,
+               ") identical, (", different, ") different.\n" ))
+    cat(paste0("Done. Result: ", Ret,"\n"))
+    return(Ret)
+}
+
 compareExons <- function(x, y){
     cat("\nComparing exon data:\n")
     Ret <- "OK"
@@ -1042,7 +1411,7 @@ compareExons <- function(x, y){
 ##  The problem is that the genome version can also be . separated.
 ####------------------------------------------------------------
 isEnsemblFileName <- function(x){
-    x <- file.name(x)
+    x <- basename(x)
     ## If we split by ., do we get at least 4 elements?
     els <- unlist(strsplit(x, split=".", fixed=TRUE))
     if(length(els) < 4)
@@ -1071,7 +1440,7 @@ organismFromGtfFileName <- function(x){
 ##  finds a numeric value it returns it, otherwise it returns NA.
 ####------------------------------------------------------------
 ensemblVersionFromGtfFileName <- function(x){
-    x <- file.name(x)
+    x <- basename(x)
     els <- unlist(strsplit(x, split=".", fixed=TRUE))
     ## Ensembl version is the last numeric value in the file name.
     for(elm in rev(els)){
@@ -1090,7 +1459,7 @@ ensemblVersionFromGtfFileName <- function(x){
 ## the first element (i.e. organism), or the ensembl version, that is one left of
 ## the gtf.
 genomeVersionFromGtfFileName <- function(x){
-    x <- file.name(x)
+    x <- basename(x)
     els <- unlist(strsplit(x, split=".", fixed=TRUE))
     ensVer <- ensemblVersionFromGtfFileName(x)
     if(is.na(ensVer)){
@@ -1104,21 +1473,6 @@ genomeVersionFromGtfFileName <- function(x){
              " The file name does not follow the expected naming convention from Ensembl!")
     return(paste(els[2:(idx-1)], collapse="."))
 }
-old_ensemblVersionFromGtfFileName <- function(x){
-    tmp <- unlist(strsplit(x, split=.Platform$file.sep, fixed=TRUE))
-    splitty <- unlist(strsplit(tmp[length(tmp)], split=".", fixed=TRUE))
-    return(splitty[(grep(splitty, pattern="gtf")-1)])
-}
-
-## the genome build can also contain .! thus, I return everything which is not
-## the first element (i.e. organism), or the ensembl version, that is one left of
-## the gtf.
-old_genomeVersionFromGtfFileName <- function(x){
-    tmp <- unlist(strsplit(x, split=.Platform$file.sep, fixed=TRUE))
-    splitty <- unlist(strsplit(tmp[length(tmp)], split=".", fixed=TRUE))
-    gvparts <- splitty[2:(grep(splitty, pattern="gtf")-2)]
-    return(paste(gvparts, collapse="."))
-}
 
 ## Returns NULL if there was a problem.
 elementFromEnsemblFilename <- function(x, which=1){
@@ -1131,8 +1485,202 @@ elementFromEnsemblFilename <- function(x, which=1){
     return(splitty[which])
 }
 
-file.name <- function(x){
-    fn <- unlist(strsplit(x, split=.Platform$file.sep, fixed=TRUE))
-    fn <- fn[length(fn)]
-    return(fn)
+############################################################
+## Utilities to fetch sequence lengths from Ensembl's ftp server, more
+## specifically from the MySQL tables there.
+## These replace the (unexported) functions from GenomicFeatures used thus far.
+.ENSEMBL_URL <- "ftp://ftp.ensembl.org/pub/"
+.ENSEMBLGENOMES_URL <- "ftp://ftp.ensemblgenomes.org/pub/"
+##' Get the base url containing the mysql database for the specified host,
+##' organism and Ensembl version.
+##' @details The function will first build an approximate database name (without
+##' the trailing <_genome version number> as this is not easy to guess).
+##' Next all directories in the base MySQL folder will be scanned for the best
+##' matching folder.
+##' @param type Either "ensembl" or "ensemblgenomes"
+##' @param organism Character specifying the organism. Has to be the full name,
+##' i.e "homo_sapiens" or "Homo sapiens".
+##' @param ensembl The Ensembl version.
+##' @param genome The Genome version.
+##' @noRd
+.getEnsemblMysqlUrl <- function(type = "ensembl", organism, ensembl, genome) {
+    type <- match.arg(type, c("ensembl", "ensemblgenomes"))
+    if (type == "ensembl") {
+        my_url <- paste0(.ENSEMBL_URL, "release-", ensembl, "/mysql/")
+        db_name <- .guessDatabaseName(organism, ensembl)
+        ## List folders; GenomicFeatures does it without 'dirlistonly',
+        ## possibly that is what breaks on Windows?
+        ## res <- getURL(my_url, dirlistonly = TRUE)
+        res <- readLines(curl(my_url))
+        res <- gsub(res, pattern = "\r", replacement = "", fixed = TRUE)
+        if (length(res) > 0) {
+            ## dirs <- unlist(strsplit(res, split = "\n"))
+            ## ## Remove the \r on Windows.
+            ## dirs <- sub(dirs, pattern = "\r", replacement = "", fixed = TRUE)
+            ## idx <- grep(dirs, pattern = db_name)
+            idx <- grep(res, pattern = db_name)
+            if (length(idx) > 1)
+                stop("Found more than one database matching '", db_name,
+                     "' in Ensembl's ftp server!")
+            if (length(idx) == 0)
+                stop("No database matching '", db_name, "' found in Ensembl's",
+                     " ftp server.")
+            db_dir <- unlist(strsplit(res[idx], split = " ", fixed = TRUE))
+            db_dir <- db_dir[length(db_dir)]
+            return(paste0(my_url, db_dir))
+        }
+    } else {
+        ## That's tricky! Have to find out whether the species is in plants,
+        ## fungi, bacteria etc.
+        ## List dirs of bacteria, fungi, metazoa, plants, protists
+        sub_folders <- c("bacteria", "fungi", "metazoa", "plants", "protists")
+        db_name <- .guessDatabaseName(organism, ensembl)
+        for (fold in sub_folders) {
+            my_url <- paste0(.ENSEMBLGENOMES_URL, "release-", ensembl, "/",
+                             fold, "/mysql/")
+            res <- try(readLines(curl(my_url)))
+            if (is(res, "try-error") || length(res) == 0)
+                next
+            if (length(res) > 0) {
+                res <- gsub(res, pattern = "\r", replacement = "", fixed = TRUE)
+                idx <- grep(res, pattern = db_name)
+                if (length(idx) > 1)
+                    stop("Found more than one database matching '", db_name,
+                         "' in Ensemblgenomes' ftp server!")
+                if (length(idx) == 1) {
+                    db_dir <- unlist(strsplit(res[idx], split = " ",
+                                              fixed = TRUE))
+                    db_dir <- db_dir[length(db_dir)]
+                    return(paste0(my_url, db_dir))
+                }
+            }
+            ## ## Catch eventual errors
+            ## res <- try(getURL(my_url, dirlistonly = TRUE), silent = TRUE)
+            ## if (is(res, "try-error") | length(res) == 0)
+            ##     next
+            ## if (length(res) > 0) {
+            ##     dirs <- unlist(strsplit(res, split = "\n"))
+            ##     ## Remove the \r on Windows.
+            ##     dirs <- sub(dirs, pattern = "\r", replacement = "", fixed = TRUE)
+            ##     idx <- grep(dirs, pattern = db_name)
+            ##     if (length(idx) == 1)
+            ##         return(paste0(my_url, dirs[idx]))
+            ##     if (length(idx) > 1)
+            ##         stop("Found more than one database matching '", db_name,
+            ##              "' in Ensembl's ftp server!")
+            ##     ## Well, then let's go to the next one.
+            ## }
+        }
+        stop("No database matching '", db_name, "' found in Ensemblgenomes'",
+             " ftp server.")
+    }
+}
+
+############################################################
+## .guessDatabaseName
+##' Build the database name from the species, Ensembl version and genome
+##' version. The latter is particularly difficult, as it is not clear how
+##' Ensembl defines the genome version number.
+##' @param organism Character specifying the organism. Has to be in the format
+##' "homo_sapiens" or "Homo sapiens", i.e. the full name.
+##' @param ensembl The Ensembl version number.
+##' @param genome The Genome version, e.g. GRCh38 (optional!).
+##' @return A character representing the guessed database name in Ensembl.
+##' @noRd
+.guessDatabaseName <- function(organism, ensembl, genome) {
+    if (missing(organism) & missing(ensembl))
+        stop("'organism' and 'ensembl' are required!")
+    ## Organism: all lower case, replace . with _
+    organism <- tolower(gsub(organism, pattern = ".", replacement = "_",
+                             fixed = TRUE))
+    organism <- gsub(organism, pattern = " ", replacement = "_",
+                     fixed = TRUE)
+    dbname <- paste0(organism, "_core_", ensembl)
+    ## Genome: remove all letters and keep just the numbers.
+    if (!missing(genome)) {
+        genome <- gsub(genome, pattern = "[a-zA-Z]", replacement = "")
+        ## remove a trailing ".0" (escape the dot so it matches literally)
+        genome <- gsub(genome, pattern = "\\.0$", replacement = "")
+        genome <- gsub(genome, pattern = ".", replacement = "", fixed = TRUE)
+        genome <- gsub(genome, pattern = "_", replacement = "", fixed = TRUE)
+        dbname <- paste0(dbname, "_", genome)
+    }
+    return(dbname)
 }
+
+############################################################
+## .getSeqlengthsFromMysqlFolder
+##' Fetch the coord_system.txt.gz and seq_region.txt.gz and extract the
+##' seqlengths from there.
+##' @noRd
+.getSeqlengthsFromMysqlFolder <- function(organism, ensembl, seqnames) {
+    ## Test whether we have the database in ensembl
+    mysql_url <- try(.getEnsemblMysqlUrl(type = "ensembl", organism = organism,
+                                         ensembl = ensembl), silent = TRUE)
+    if (is(mysql_url, "try-error")) {
+        mysql_url <- try(.getEnsemblMysqlUrl(type = "ensemblgenomes",
+                                             organism = organism,
+                                             ensembl = ensembl), silent = TRUE)
+    }
+    if (is(mysql_url, "try-error")) {
+        warning("Can not get the sequence lengths from Ensembl or",
+                " Ensemblgenomes. Seqinfo will lack the sequence lengths.")
+        return(NULL)
+    }
+    ## Get the coord_system table
+    coord_syst <- .getReadMysqlTable(mysql_url, "coord_system.txt.gz",
+                                     colnames = c("coord_system_id",
+                                                  "species_id", "name",
+                                                  "version", "rank", "attrib"))
+    ## Subset to the ones with "default" in "attrib"
+    coord_syst <- coord_syst[grep(coord_syst$attrib, pattern = "default"),
+                           , drop = FALSE]
+    rownames(coord_syst) <- as.character(coord_syst$coord_system_id)
+    ## Get the seq_region table
+    seq_region <- .getReadMysqlTable(mysql_url, "seq_region.txt.gz",
+                                     colnames = c("seq_region_id", "name",
+                                                  "coord_system_id", "length"))
+    ## Sub-set to the ones matching the coord_syst_ids and from these, select
+    ## the one entry with the smallest rank, if more than one present.
+    seq_region <- seq_region[seq_region$coord_system_id %in%
+                             coord_syst$coord_system_id,
+                           , drop = FALSE]
+    seq_region <- cbind(seq_region,
+                        rank = coord_syst[as.character(seq_region$coord_system_id),
+                                          "rank"])
+    ## Sub-set to the seqlevels we've got.
+    if (!missing(seqnames)) {
+        seq_region <- seq_region[seq_region$name %in% seqnames, , drop = FALSE]
+        if (!all(seqnames %in% seq_region$name))
+            warning("Could not determine length for all seqnames.")
+    }
+    sr <- split(seq_region, f = seq_region$name)
+    if (length(sr) == 0)
+        return(NULL)
+    sr <- lapply(sr, function(z) {
+        if (nrow(z) == 1)
+            return(z)
+        z <- z[order(z$rank), ]
+        return(z[1, , drop = FALSE])
+    })
+    sr <- do.call(rbind, sr)
+    rownames(sr) <- sr$name
+    return(sr[, c("name", "length")])
+}
+
+##' Download and read a table from Ensembl
+##' @param base_url the base url to the mysql folder on the server.
+##' @param file_name the file name of the table.
+##' @param colnames the column names.
+##' @return A data.frame with the table's content.
+##' @noRd
+.getReadMysqlTable <- function(base_url, file_name, colnames) {
+    tmp_file <- tempfile()
+    download.file(url = paste0(base_url, "/", file_name), destfile = tmp_file,
+                  quiet = TRUE)
+    tmp <- read.table(tmp_file, sep = "\t", quote = "", comment.char = "",
+                      as.is = TRUE)
+    colnames(tmp) <- colnames
+    return(tmp)
+}
+
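The name guessing done by `.guessDatabaseName()` above can be illustrated with a small standalone sketch. `guess_db_name()` is a hypothetical stand-in written for this note, not a function of ensembldb; it follows the same steps on plain strings. It escapes the dot in the trailing-`.0` pattern, because an unescaped `".0$"` matches any character followed by `0` and would, for example, reduce the `"10"` left over from `GRCz10` to an empty string. The results shown are what this sketch returns, not names looked up on the Ensembl server.

```r
## Hypothetical helper mirroring .guessDatabaseName(); not part of ensembldb.
guess_db_name <- function(organism, ensembl, genome) {
    ## Organism: lower case, "." and " " replaced by "_"
    organism <- tolower(gsub(".", "_", organism, fixed = TRUE))
    organism <- gsub(" ", "_", organism, fixed = TRUE)
    dbname <- paste0(organism, "_core_", ensembl)
    if (!missing(genome)) {
        genome <- gsub("[a-zA-Z]", "", genome)  # keep digits and separators
        genome <- sub("\\.0$", "", genome)      # drop a literal trailing ".0"
        genome <- gsub("[._]", "", genome)      # drop remaining "." and "_"
        dbname <- paste0(dbname, "_", genome)
    }
    dbname
}

guess_db_name("Homo sapiens", 75, "GRCh37")
## [1] "homo_sapiens_core_75_37"
guess_db_name("danio_rerio", 86, "GRCz10")
## [1] "danio_rerio_core_86_10"
```

Without a `genome` argument (as in the call from `.getEnsemblMysqlUrl()`), only the `<organism>_core_<release>` prefix is built and the best-matching folder is then found with `grep()`.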
diff --git a/R/functions-utils.R b/R/functions-utils.R
index bf108e7..1634c61 100644
--- a/R/functions-utils.R
+++ b/R/functions-utils.R
@@ -53,22 +53,23 @@ checkOrderBy <- function(orderBy, supported = character()) {
 ##    column call without db are added).
 ## b) GRangesFilter: the feature is set based on the specified feature parameter
 ## Args:
-addFilterColumns <- function(cols, filter = list(), edb) {
+addFilterColumns <- function(cols, filter = AnnotationFilterList(), edb) {
+    if (missing(cols))
+        cols <- NULL
     gimmeAll <- returnFilterColumns(edb)
-    if (!missing(filter)) {
-        if(!is.list(filter))
-            filter <- list(filter)
-    } else {
-        return(cols)
-    }
     if (!gimmeAll)
         return(cols)
+    ## Put filter into an AnnotationFilterList if it's not already one
+    if (is(filter, "AnnotationFilter"))
+        filter <- AnnotationFilterList(filter)
     ## Or alternatively process the filters and add columns.
     symFilts <- c("SymbolFilter")
     addC <- unlist(lapply(filter, function(z) {
         if(class(z) %in% symFilts)
-            return(column(z))
-        return(column(z))
+            return(z@field)
+        if (is(z, "AnnotationFilterList"))
+            return(addFilterColumns(cols = cols, filter = z, edb))
+        return(ensDbColumn(z))
     }))
     return(unique(c(cols, addC)))
 }
@@ -81,3 +82,160 @@ addFilterColumns <- function(cols, filter = list(), edb) {
 SQLiteName2MySQL <- function(x) {
     return(tolower(gsub(x, pattern = ".", replacement = "_", fixed = TRUE)))
 }
+
+
+## running the shiny web app.
+runEnsDbApp <- function(...){
+    if(requireNamespace("shiny", quietly=TRUE)){
+        message("Starting the EnsDb shiny web app. Use Ctrl-C to stop.")
+        shiny::runApp(appDir=system.file("shinyHappyPeople",
+                                         package="ensembldb"), ...)
+    }else{
+        stop("Package shiny not installed!")
+    }
+}
+
+############################################################
+## anyProteinColumns
+##
+## Check if any of 'x' are protein columns.
+anyProteinColumns <- function(x){
+    return(any(x %in% unlist(.ensdb_protein_tables(), use.names = FALSE)))
+}
+
+############################################################
+## listProteinColumns
+##
+#' @description The \code{listProteinColumns} function conveniently
+#'     extracts all database columns containing protein annotations from
+#'     an \code{\linkS4class{EnsDb}} database.
+#' 
+#' @return The \code{listProteinColumns} function returns a character vector
+#'     with the column names containing protein annotations or throws an error
+#'     if no such annotations are available.
+#' 
+#' @rdname ProteinFunctionality
+#' 
+#' @examples
+#'
+#' ## List all columns containing protein annotations
+#' library(EnsDb.Hsapiens.v75)
+#' edb <- EnsDb.Hsapiens.v75
+#' if (hasProteinData(edb))
+#'     listProteinColumns(edb)
+listProteinColumns <- function(object) {
+    if (missing(object))
+        stop("'object' is missing with no default.")
+    if (!is(object, "EnsDb"))
+        stop("'object' has to be an instance of an 'EnsDb' object.")
+    if (!hasProteinData(object))
+        stop("The provided EnsDb database does not contain protein annotations!")
+    return(listColumns(object, c("protein", "uniprot", "protein_domain")))
+}
+
+############################################################
+## .ProteinsFromDataframe
+#' @param x \code{EnsDb} object.
+#' 
+#' @param data \code{data.frame} with the results from a call to the
+#'     \code{proteins} method; has to have required columns \code{"protein_id"}
+#'     and \code{"protein_sequence"}.
+#' 
+#' @noRd
+.ProteinsFromDataframe <- function(x, data) {
+    if (!all(c("protein_id", "protein_sequence") %in% colnames(data)))
+        stop("Required columns 'protein_id' and 'protein_sequence' not in 'data'!")
+    ## Get the column names for uniprot and protein_domain
+    uniprot_cols <- listColumns(x, "uniprot")
+    uniprot_cols <- uniprot_cols[uniprot_cols != "protein_id"]
+    uniprot_cols <- uniprot_cols[uniprot_cols %in% colnames(data)]
+    if (length(uniprot_cols) > 0)
+        warning("Don't know yet how to handle the 1:n mapping between",
+                " protein_id and uniprot_id!")
+
+    prot_dom_cols <- listColumns(x, "protein_domain")
+    prot_dom_cols <- prot_dom_cols[prot_dom_cols != "protein_id"]
+    prot_dom_cols <- prot_dom_cols[prot_dom_cols %in% colnames(data)]
+
+    ## Create the protein part of the object, i.e. the AAStringSet.
+    ## Use all columns other than protein_id, protein_sequence
+    prot_cols <- colnames(data)
+    prot_cols <- prot_cols[!(prot_cols %in% c(uniprot_cols, prot_dom_cols))]
+    protein_sub <- unique(data[, prot_cols, drop = FALSE])
+    aass <- AAStringSet(protein_sub$protein_sequence)
+    names(aass) <- protein_sub$protein_id
+    prot_cols <- prot_cols[!(prot_cols %in% c("protein_id", "protein_sequence"))]
+    if (length(prot_cols) > 0) {
+        mcols(aass) <- DataFrame(protein_sub[, prot_cols, drop = FALSE])
+        ## drop these columns from data to possibly speed up later splits
+        data <- data[, !(colnames(data) %in% prot_cols), drop = FALSE]
+    }
+
+    ## How should the Uniprot IDs be processed here? They map 1:n to protein IDs!
+
+    ## Create the protein domain part
+    if (length(prot_dom_cols) > 0) {
+        message("Processing protein domains not yet implemented!")
+        ## Split the dataframe by protein_id
+        ## process this list to create the IRangesList.
+        ## pranges should have the same order and the same names
+    } else {
+        pranges <- IRangesList(replicate(length(aass), IRanges()))
+        names(pranges) <- names(aass)
+    }
+    metadata <- list(created = date())
+
+    ##return(new("Proteins", aa = aass, pranges = pranges, metadata = metadata))
+}
+
+## map chromosome strand...
+strand2num <- function(x){
+    if (is.numeric(x)) {
+        if (x >= 0) return(1)
+        else return(-1)
+    }
+    xm <- x
+    if(xm == "+" | xm == "-")
+        xm <- paste0(xm, 1)
+    xm <- as.numeric(xm)
+    if (is.na(xm))
+        stop("'", x, "' can not be converted to a strand!")
+    return(xm)
+}
+
+num2strand <- function(x){
+    if(x < 0){
+        return("-")
+    }else{
+        return("+")
+    }
+}
+
+#' @description Collapses entries in the \code{"entrezid"} column of a
+#'     \code{data.frame} or \code{DataFrame} making the rest of \code{x} unique.
+#'
+#' @param x Either a \code{data.frame} or a \code{DataFrame}.
+#'
+#' @param by \code{character(1)} defining the column by which the
+#'     \code{"entrezid"} column should be split.
+#' 
+#' @author Johannes Rainer
+#' 
+#' @noRd
+.collapseEntrezidInTable <- function(x, by = "gene_id") {
+    ## Slow version: use unique call.
+    eg_idx <- which(colnames(x) == "entrezid")
+    if (length(eg_idx)) {
+        ## Avoid an additional lapply unique call.
+        tmp <- unique(x[, c(by, "entrezid")])
+        egs <- split(tmp[, "entrezid"],
+                     f = factor(tmp[, by], levels = unique(tmp[, by])))
+        ## Use a unique call.
+        ## x_sub <- x[match(names(egs), x[, by]), , drop = FALSE] would be much
+        ## faster but does not work e.g. for exons or transcripts.
+        x_sub <- unique(x[, -eg_idx, drop = FALSE])
+        x_sub$entrezid <- egs[x_sub[, by]]
+        return(x_sub)
+    }
+    x
+}
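What `.collapseEntrezidInTable()` does is easiest to see on a toy table. The sketch below is a self-contained copy of its logic under a hypothetical name (`collapse_entrezid()`, not an ensembldb function), applied to made-up data where one gene maps to two Entrez IDs. The repeated rows are collapsed, and `"entrezid"` becomes a list column holding all IDs per gene.

```r
## Hypothetical copy of .collapseEntrezidInTable() for illustration only.
collapse_entrezid <- function(x, by = "gene_id") {
    eg_idx <- which(colnames(x) == "entrezid")
    if (!length(eg_idx))
        return(x)
    ## One entry per unique (by, entrezid) pair, split by the 'by' column
    tmp <- unique(x[, c(by, "entrezid")])
    egs <- split(tmp[, "entrezid"],
                 f = factor(tmp[, by], levels = unique(tmp[, by])))
    ## Make the remaining columns unique and attach the IDs as a list column
    x_sub <- unique(x[, -eg_idx, drop = FALSE])
    x_sub$entrezid <- egs[x_sub[, by]]
    x_sub
}

## Made-up input: gene G1 maps to two Entrez IDs.
df <- data.frame(gene_id = c("G1", "G1", "G2"),
                 gene_name = c("A", "A", "B"),
                 entrezid = c(101L, 102L, 201L),
                 stringsAsFactors = FALSE)
res <- collapse_entrezid(df)
nrow(res)           # 2 rows, one per gene
res$entrezid[[1]]   # 101 102
```

As the comments in the original note, `unique()` on the remaining columns is slower than a `match()`-based lookup, but it stays correct when `by` is not the only distinguishing column (e.g. for exon or transcript tables).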
diff --git a/R/loadEnsDb.R b/R/loadEnsDb.R
deleted file mode 100644
index 2a9832c..0000000
--- a/R/loadEnsDb.R
+++ /dev/null
@@ -1,5 +0,0 @@
-loadEnsDb <- function( x ){
-    ## con <- ensDb( x )
-    ## EDB <- new( "EnsDb", ensdb=con )
-    return( EnsDb( x ) )
-}
diff --git a/R/makeEnsemblDbPackage.R b/R/makeEnsemblDbPackage.R
deleted file mode 100644
index d40c750..0000000
--- a/R/makeEnsemblDbPackage.R
+++ /dev/null
@@ -1,213 +0,0 @@
-## part of this code is from GenomicFeatures makeTxDbPackage.R
-## So to make a package we need a couple things:
-## 1) we need a method called makeTxDbPackage (that will take a txdb object)
-## 2) we will need a package template to use
-
-
-
-## Separate helper function for abbreviating the genus and species name strings
-## this simply makes the first character uppercase
-.organismName <- function(x){
-    substring(x, 1, 1) <- toupper(substring(x, 1, 1))
-    return(x)
-}
-
-.abbrevOrganismName <- function(organism){
-  spc <- unlist(strsplit(organism, "_"))
-  ## this assumes a binomial nomenclature has been maintained.
-  return(paste0(substr(spc[[1]], 1, 1), spc[[2]]))
-}
-
-
-
-## x has to be the connection to the database.
-.makePackageName <- function(x){
-    species <- .getMetaDataValue(x, "Organism")
-    ensembl_version <- .getMetaDataValue(x, "ensembl_version")
-    pkgName <- paste0("EnsDb.",.abbrevOrganismName(.organismName(species)),
-                      ".v", ensembl_version)
-    return(pkgName)
-}
-
-.makeObjectName <- function(pkgName){
-  strs <- unlist(strsplit(pkgName, "\\."))
-  paste(c(strs[2:length(strs)],strs[1]), collapse="_")
-}
-
-
-## retrieve Ensembl data
-## save all files to local folder.
-## returns the path where files have been saved to.
-fetchTablesFromEnsembl <- function(version, ensemblapi, user="anonymous",
-                                   host="ensembldb.ensembl.org", pass="",
-                                   port=5306, species="human"){
-    if(missing(version))
-        stop("The version of the Ensembl database has to be provided!")
-    ## setting the stage for perl:
-    fn <- system.file("perl", "get_gene_transcript_exon_tables.pl", package="ensembldb")
-    ## parameters: s, U, H, P, e
-    ## replacing white spaces with _
-    species <- gsub(species, pattern=" ", replacement="_")
-
-    cmd <- paste0("perl ", fn, " -s ", species," -e ", version,
-                  " -U ", user, " -H ", host, " -p ", port, " -P ", pass)
-    if(!missing(ensemblapi)){
-        Sys.setenv(ENS=ensemblapi)
-    }
-    system(cmd)
-    if(!missing(ensemblapi)){
-        Sys.unsetenv("ENS")
-    }
-
-    ## we should now have the files:
-    in_files <- c("ens_gene.txt", "ens_tx.txt", "ens_exon.txt",
-                  "ens_tx2exon.txt", "ens_chromosome.txt", "ens_metadata.txt")
-    ## check if we have all files...
-    all_files <- dir(pattern="txt")
-    if(sum(in_files %in% all_files)!=length(in_files))
-        stop("Something went wrong! I'm missing some of the txt files the perl script should have generated.")
-}
-
-
-####
-##
-## create a SQLite database containing the information defined in the txt files.
-makeEnsemblSQLiteFromTables <- function(path=".", dbname){
-    ## check if we have all files...
-    in_files <- c("ens_gene.txt", "ens_tx.txt", "ens_exon.txt",
-                  "ens_tx2exon.txt", "ens_chromosome.txt", "ens_metadata.txt")
-    ## check if we have all files...
-    all_files <- dir(path, pattern="txt")
-    if(sum(in_files %in% all_files)!=length(in_files))
-        stop("Something went wrong! I'm missing some of the txt files the perl script should have generated.")
-
-    ## read information
-    info <- read.table(paste0(path, .Platform$file.sep ,"ens_metadata.txt"), sep="\t",
-                       as.is=TRUE, header=TRUE)
-    species <- .organismName(info[ info$name=="Organism", "value" ])
-    ##substring(species, 1, 1) <- toupper(substring(species, 1, 1))
-    if(missing(dbname)){
-        dbname <- paste0("EnsDb.",substring(species, 1, 1),
-                         unlist(strsplit(species, split="_"))[ 2 ], ".v",
-                         info[ info$name=="ensembl_version", "value" ], ".sqlite")
-    }
-    con <- dbConnect(dbDriver("SQLite"), dbname=dbname)
-
-    ## write information table
-    dbWriteTable(con, name="metadata", info, row.names=FALSE)
-
-    ## process chromosome
-    tmp <- read.table(paste0(path, .Platform$file.sep ,"ens_chromosome.txt"), sep="\t", as.is=TRUE, header=TRUE)
-    tmp[, "seq_name"] <- as.character(tmp[, "seq_name"])
-    dbWriteTable(con, name="chromosome", tmp, row.names=FALSE)
-    rm(tmp)
-
-    ## process genes: some gene names might have fancy names...
-    tmp <- read.table(paste0(path, .Platform$file.sep, "ens_gene.txt"), sep="\t", as.is=TRUE, header=TRUE,
-                      quote="", comment.char="" )
-    OK <- .checkIntegerCols(tmp)
-    dbWriteTable(con, name="gene", tmp, row.names=FALSE)
-    rm(tmp)
-
-    ## process transcripts:
-    tmp <- read.table(paste0(path, .Platform$file.sep, "ens_tx.txt"), sep="\t", as.is=TRUE, header=TRUE)
-    ## Fix the tx_cds_seq_start and tx_cds_seq_end columns: these should be integer!
-    suppressWarnings(
-        tmp[, "tx_cds_seq_start"] <- as.integer(tmp[, "tx_cds_seq_start"])
-    )
-    suppressWarnings(
-        tmp[, "tx_cds_seq_end"] <- as.integer(tmp[, "tx_cds_seq_end"])
-    )
-    OK <- .checkIntegerCols(tmp)
-    dbWriteTable(con, name="tx", tmp, row.names=FALSE)
-    rm(tmp)
-
-    ## process exons:
-    tmp <- read.table(paste0(path, .Platform$file.sep, "ens_exon.txt"), sep="\t", as.is=TRUE, header=TRUE)
-    OK <- .checkIntegerCols(tmp)
-    dbWriteTable(con, name="exon", tmp, row.names=FALSE)
-    rm(tmp)
-    tmp <- read.table(paste0(path, .Platform$file.sep, "ens_tx2exon.txt"), sep="\t", as.is=TRUE, header=TRUE)
-    OK <- .checkIntegerCols(tmp)
-    dbWriteTable(con, name="tx2exon", tmp, row.names=FALSE)
-    rm(tmp)
-    ## Create indices
-    .createEnsDbIndices(con)
-    dbDisconnect(con)
-    ## done.
-    return(dbname)
-}
-
-############################################################
-## Simply checking that some columns are integer
-.checkIntegerCols <- function(x, columns = c("gene_seq_start", "gene_seq_end",
-                                             "tx_seq_start", "tx_seq_start",
-                                             "exon_seq_start", "exon_seq_end",
-                                             "exon_idx", "tx_cds_seq_start",
-                                             "tx_cds_seq_end")) {
-    cols <- columns[columns %in% colnames(x)]
-    if(length(cols) > 0) {
-        sapply(cols, function(z) {
-            if(!is.integer(x[, z]))
-                stop("Column '", z,"' is not of type integer!")
-        })
-    }
-    return(TRUE)
-}
-
-
-####
-## the function that creates the annotation package.
-## ensdb should be a connection to an SQLite database, or a character string...
-makeEnsembldbPackage <- function(ensdb,
-                                 version,
-                                 maintainer,
-                                 author,
-                                 destDir=".",
-                                 license="Artistic-2.0"){
-    if(class(ensdb)!="character")
-        stop("ensdb has to be the name of the SQLite database!")
-    ensdbfile <- ensdb
-    ensdb <- EnsDb(x=ensdbfile)
-    con <- dbconn(ensdb)
-    pkgName <- .makePackageName(con)
-    ensembl_version <- .getMetaDataValue(con, "ensembl_version")
-    ## there should only be one template
-    template_path <- system.file("pkg-template",package="ensembldb")
-    ## We need to define some symbols in order to have the
-    ## template filled out correctly.
-    symvals <- list(
-        PKGTITLE=paste("Ensembl based annotation package"),
-        PKGDESCRIPTION=paste("Exposes an annotation databases generated from Ensembl."),
-        PKGVERSION=version,
-        AUTHOR=author,
-        MAINTAINER=maintainer,
-        LIC=license,
-        ORGANISM=.organismName(.getMetaDataValue(con ,'Organism')),
-        SPECIES=.organismName(.getMetaDataValue(con,'Organism')),
-        PROVIDER="Ensembl",
-        PROVIDERVERSION=as.character(ensembl_version),
-        RELEASEDATE= .getMetaDataValue(con ,'Creation time'),
-        SOURCEURL= .getMetaDataValue(con ,'ensembl_host'),
-        ORGANISMBIOCVIEW=gsub(" ","_",.organismName(.getMetaDataValue(con ,'Organism'))),
-        TXDBOBJNAME=pkgName ## .makeObjectName(pkgName)
-       )
-    ## Should never happen
-    if (any(duplicated(names(symvals)))) {
-        str(symvals)
-        stop("'symvals' contains duplicated symbols")
-    }
-    createPackage(pkgname=pkgName,
-                  destinationDir=destDir,
-                  originDir=template_path,
-                  symbolValues=symvals)
-    ## then copy the contents of the database into the extdata dir
-    sqlfilename <- unlist(strsplit(ensdbfile, split=.Platform$file.sep))
-    sqlfilename <- sqlfilename[ length(sqlfilename) ]
-    dir.create(paste(c(destDir, pkgName, "inst", "extdata"),
-                      collapse=.Platform$file.sep), showWarnings=FALSE, recursive=TRUE)
-    db_path <- file.path(destDir, pkgName, "inst", "extdata",
-                         paste(pkgName,"sqlite",sep="."))
-    file.copy(ensdbfile, to=db_path)
-}
-
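The removed `.organismName()`, `.abbrevOrganismName()` and `.makePackageName()` helpers derived annotation package names such as `EnsDb.Hsapiens.v75` (the package used in the `listProteinColumns` example) from the organism and Ensembl version stored in the database metadata. The sketch below reproduces that string manipulation under hypothetical names; it takes plain strings instead of a database connection and, like the original, assumes a binomial `genus_species` organism name. The diff alone does not show where this logic lives after the file's removal.

```r
## Hypothetical re-implementation of the removed naming helpers.
organism_name <- function(x) {
    ## Upper-case the first character only
    substring(x, 1, 1) <- toupper(substring(x, 1, 1))
    x
}
abbrev_organism_name <- function(organism) {
    ## Assumes binomial nomenclature, e.g. "Homo_sapiens" -> "Hsapiens"
    spc <- unlist(strsplit(organism, "_"))
    paste0(substr(spc[[1]], 1, 1), spc[[2]])
}
make_package_name <- function(species, ensembl_version) {
    paste0("EnsDb.", abbrev_organism_name(organism_name(species)),
           ".v", ensembl_version)
}

make_package_name("homo_sapiens", 86)
## [1] "EnsDb.Hsapiens.v86"
```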
diff --git a/R/runEnsDbApp.R b/R/runEnsDbApp.R
deleted file mode 100644
index 7d74396..0000000
--- a/R/runEnsDbApp.R
+++ /dev/null
@@ -1,10 +0,0 @@
-## running the shiny web app.
-runEnsDbApp <- function(...){
-    if(requireNamespace("shiny", quietly=TRUE)){
-        message("Starting the EnsDb shiny web app. Use Ctrl-C to stop.")
-        shiny::runApp(appDir=system.file("shinyHappyPeople", package="ensembldb"), ...)
-    }else{
-        stop("Package shiny not installed!")
-    }
-}
-
diff --git a/R/select-methods.R b/R/select-methods.R
index 3f9cc18..1d04c0d 100644
--- a/R/select-methods.R
+++ b/R/select-methods.R
@@ -65,30 +65,49 @@ setMethod("columns", "EnsDb",
 ####------------------------------------------------------------
 setMethod("keytypes", "EnsDb",
           function(x){
-              return(.filterKeytypes())
+              return(.filterKeytypes(withProteins = hasProteinData(x)))
           }
 )
 ## This just returns some (eventually) usefull names for keys
 .simpleKeytypes <- function(x){
     return(c("GENEID","TXID","TXNAME","EXONID","EXONNAME","CDSID","CDSNAME"))
 }
-.filterKeytypes <- function(x){
-    return(names(.keytype2FilterMapping()))
+.filterKeytypes <- function(withProteins = FALSE){
+    return(names(.keytype2FilterMapping(withProteins = withProteins)))
 }
 ## returns a vector mapping keytypes (names of vector) to filter names (elements).
-.keytype2FilterMapping <- function(){
-    filters <- c("EntrezidFilter", "GeneidFilter", "GenebiotypeFilter", "GenenameFilter",
-                 "TxidFilter", "TxbiotypeFilter", "ExonidFilter", "SeqnameFilter",
-                 "SeqstrandFilter", "TxidFilter", "SymbolFilter")
-    names(filters) <- c("ENTREZID", "GENEID", "GENEBIOTYPE", "GENENAME", "TXID",
-                        "TXBIOTYPE", "EXONID", "SEQNAME", "SEQSTRAND", "TXNAME",
-                        "SYMBOL")
+.keytype2FilterMapping <- function(withProteins = FALSE){
+    filters <- c(ENTREZID = "EntrezFilter",
+                 GENEID = "GeneIdFilter",
+                 GENEBIOTYPE = "GeneBiotypeFilter",
+                 GENENAME = "GenenameFilter",
+                 TXID = "TxIdFilter",
+                 TXBIOTYPE = "TxBiotypeFilter",
+                 EXONID = "ExonIdFilter",
+                 SEQNAME = "SeqNameFilter",
+                 SEQSTRAND = "SeqStrandFilter",
+                 TXNAME = "TxIdFilter",
+                 SYMBOL = "SymbolFilter")
+    if (withProteins) {
+        filters <- c(filters,
+                     PROTEINID = "ProteinIdFilter",
+                     UNIPROTID = "UniprotFilter",
+                     PROTEINDOMAINID = "ProtDomIdFilter")
+    }
     return(filters)
 }
-filterForKeytype <- function(keytype){
-    filters <- .keytype2FilterMapping()
+filterForKeytype <- function(keytype, x, vals){
+    if (missing(vals))
+        vals <- 1
+    if (!missing(x)) {
+        withProts <- hasProteinData(x)
+    } else {
+        withProts <- FALSE
+    }
+    filters <- .keytype2FilterMapping(withProts)
     if(any(names(filters) == keytype)){
-        filt <- new(filters[keytype])
+        filt <- do.call(filters[keytype], args = list(value = vals))
+        ## filt <- new(filters[keytype])
         return(filt)
     }else{
         stop("No filter for that keytype!")
@@ -104,28 +123,45 @@ filterForKeytype <- function(keytype){
 ##
 ####------------------------------------------------------------
 setMethod("keys", "EnsDb",
-          function(x, keytype, filter,...){
+          function(x, keytype, filter, ...){
               if(missing(keytype))
                   keytype <- "GENEID"
               if(missing(filter))
-                  filter <- list()
-              if(is(filter, "BasicFilter"))
-                  filter <- list(filter)
+                  filter <- AnnotationFilterList()
+              filter <- .processFilterParam(filter, x)
               keyt <- keytypes(x)
+              if (length(keytype) > 1) {
+                  keytype <- keytype[1]
+                  warning("Using only first provided keytype.")
+              }
+              if (!any(keyt == keytype))
+                  stop("keytype '", keytype, "' not supported! ",
+                       "Allowed choices are: ",
+                       paste0("'", keyt ,"'", collapse = ", "), ".")
               keytype <- match.arg(keytype, keyt)
               ## Map the keytype to the appropriate column name.
               dbColumn <- ensDbColumnForColumn(x, keytype)
               ## Perform the query.
-              res <- getWhat(x, columns=dbColumn, filter=filter)[, dbColumn]
+              res <- getWhat(x, columns = dbColumn, filter = filter)[, dbColumn]
               return(res)
           })
 
 
-####============================================================
-##  select method
-##
+############################################################
+## select method
 ##
-####------------------------------------------------------------
+##  We have to be careful if the database also contains protein annotations:
+##  o If the keys are DNA/RNA related, start from a DNA/RNA related table.
+##  o If the keys are protein related, start from a protein table.
+##  The reason is that protein annotations are only available for protein
+##  coding genes and not for the remaining ones. Thus the type of the join
+##  (left join, left outer join) is crucial, as well as the table with which
+##  we start the query!
+##  What if we provide more than one filter?
+##  a) GenenameFilter and ProteinIdFilter: it doesn't really matter from which
+##     table we start, because the query will only return results with protein
+##     annotations. -> If there is one DNA/RNA related filter: don't do anything.
+##  b) Only protein filters: start from the highest protein table.
 setMethod("select", "EnsDb",
           function(x, keys, columns, keytype, ...) {
               if (missing(keys))
@@ -147,8 +183,9 @@ setMethod("select", "EnsDb",
     if (all(notAvailable))
         stop("None of the specified columns are avaliable in the database!")
     if (any(notAvailable)){
-        warning("The following columns are not available in the database and have",
-                " thus been removed: ", paste(columns[notAvailable], collapse = ", "))
+        warning("The following columns are not available in the database and",
+                " have thus been removed: ",
+                paste(columns[notAvailable], collapse = ", "))
         columns <- columns[!notAvailable]
     }
     ## keys:
@@ -156,79 +193,92 @@ setMethod("select", "EnsDb",
         ## Get everything from the database...
         keys <- list()
     } else {
-        if (!(is(keys, "character") | is(keys, "list") | is(keys, "BasicFilter")))
-            stop("Argument keys should be a character vector, an object extending BasicFilter ",
-                 "or a list of objects extending BasicFilter.")
-        if (is(keys, "list")) {
-            if (!all(vapply(keys, is, logical(1L), "BasicFilter")))
-                stop("If keys is a list it should be a list of objects extending BasicFilter!")
-        }
-        if (is(keys, "BasicFilter")) {
-            keys <- list(keys)
-        }
+        if (!(is(keys, "character") | is(keys, "list") | is(keys, "formula") |
+              is(keys, "AnnotationFilter") | is(keys, "AnnotationFilterList")))
+            stop("Argument keys should be a character vector, an object",
+                 " extending AnnotationFilter, a filter expression",
+                 " or an AnnotationFilterList.")
         if (is(keys, "character")) {
             if (is.null(keytype)) {
-                stop("Argument keytype is mandatory if keys is a character vector!")
+                stop("Argument keytype is mandatory if keys is a",
+                     " character vector!")
             }
             ## Check also keytype:
             if (!(keytype %in% keytypes(x)))
                 stop("keytype ", keytype, " not available in the database.",
                      " Use keytypes method to list all available keytypes.")
             ## Generate a filter object for the filters.
-            keyFilter <- filterForKeytype(keytype)
-            value(keyFilter) <- keys
+            keyFilter <- filterForKeytype(keytype, x, vals = keys)
+            ## value(keyFilter) <- keys
+            ## keyFilter@value <- keys
             keys <- list(keyFilter)
             ## Add also the keytype itself to the columns.
             if (!any(columns == keytype))
                 columns <- c(keytype, columns)
         }
+        ## Check and fix filter.
+        keys <- .processFilterParam(keys, x)
     }
-    ## Map the columns to column names we have in the database and add filter columns too.
+    ## Map the columns to column names we have in the database and
+    ## add filter columns too.
     ensCols <- unique(c(ensDbColumnForColumn(x, columns),
                         addFilterColumns(character(), filter = keys, x)))
+    ## TODO @jo: Do we have to check that we are allowed to have protein filters
+    ##           or columns?
     ## OK, now perform the query given the filters we've got.
-    res <- getWhat(x, columns = ensCols, filter = keys)
+    ## Check whether keys contain only protein annotation filters; in that case
+    ## start from table "protein", "uniprot" or "protein_domain" (in that order).
+    ## if (all(unlist(lapply(keys, isProteinFilter)))) {
+    if (all(isProteinFilter(keys))) {
+        startWith <- "protein_domain"
+        if (any(unlist(lapply(keys, function(z) is(z, "UniprotFilter")))))
+            startWith <- "uniprot"
+        if (any(unlist(lapply(keys, function(z) is(z, "ProteinIdFilter")))))
+            startWith <- "protein"
+    } else {
+        startWith <- NULL
+    }
+    ## Otherwise set startWith to NULL
+    res <- getWhat(x, columns = ensCols, filter = keys, startWith = startWith)
     ## Order results if length of filters is 1.
     if (length(keys) == 1) {
         ## Define the filters on which we could sort.
-        sortFilts <- c("GenenameFilter", "GeneidFilter", "EntrezidFilter", "GenebiotypeFilter",
-                       "SymbolFilter", "TxidFilter", "TxbiotypeFilter", "ExonidFilter",
-                       "ExonrankFilter", "SeqnameFilter")
+        sortFilts <- c("GenenameFilter", "GeneIdFilter", "EntrezFilter",
+                       "GeneBiotypeFilter", "SymbolFilter", "TxIdFilter",
+                       "TxBiotypeFilter", "ExonIdFilter", "ExonRankFilter",
+                       "SeqNameFilter")
         if (class(keys[[1]]) %in% sortFilts) {
             keyvals <- value(keys[[1]])
             ## Handle symlink Filter differently:
             if (is(keys[[1]], "SymbolFilter")) {
-                sortCol <- column(keys[[1]])
+                ## sortCol <- ensDbColumn(keys[[1]])
+                sortCol <- keys[[1]]@field
             } else {
-                sortCol <- removePrefix(column(keys[[1]], x))
+                sortCol <- ensDbColumn(keys[[1]])
+                ## sortCol <- removePrefix(ensDbColumn(keys[[1]], x))
             }
             res <- res[order(match(res[, sortCol], keyvals)), ]
         }
     } else {
         ## Show a mild warning message
-        message("Note: ordering of the results might not match ordering of keys!")
+        message(paste0("Note: ordering of the results might not match ordering",
+                       " of keys!"))
     }
     colMap <- .getColMappings(x)
     colnames(res) <- colMap[colnames(res)]
     rownames(res) <- NULL
     if (returnFilterColumns(x))
         return(res)
-    ## ## Now, if we've got a "TXNAME" in columns, we have to replace at least one of the "TXID"s
-    ## ## in the colnames...
-    ## if(any(columns == "TXNAME"))
-    ##     colnames(res)[match("TXID", colnames(res))] <- "TXNAME"
-    return(res[, columns])
+    res[, columns]
 }
 
-
-####============================================================
+############################################################
 ##  mapIds method
 ##
 ##  maps the submitted keys (names of the returned vector) to values
 ##  of the column specified by column.
 ##  x, key, column, keytype, ..., multiVals
-####------------------------------------------------------------
-setMethod("mapIds", "EnsDb", function(x, keys, column, keytype, ..., multiVals){
+setMethod("mapIds", "EnsDb", function(x, keys, column, keytype, ..., multiVals) {
     if(missing(keys))
         keys <- NULL
     if(missing(column))
@@ -237,38 +287,53 @@ setMethod("mapIds", "EnsDb", function(x, keys, column, keytype, ..., multiVals){
         keytype <- NULL
     if(missing(multiVals))
         multiVals <- NULL
-    return(.mapIds(x=x, keys=keys, column=column, keytype=keytype, multiVals=multiVals, ...))
+    return(.mapIds(x = x, keys = keys, column = column, keytype = keytype,
+                   multiVals = multiVals, ...))
 })
 ## Other methods: saveDb, species, dbfile, dbconn, taxonomyId
-.mapIds <- function(x, keys=NULL, column=NULL, keytype=NULL, ..., multiVals=NULL){
-    if(is.null(keys))
+.mapIds <- function(x, keys = NULL, column = NULL, keytype = NULL, ...,
+                    multiVals = NULL) {
+    if (is.null(keys))
         stop("Argument keys has to be provided!")
-    if(!(is(keys, "character") | is(keys, "list") | is(keys, "BasicFilter")))
-        stop("Argument keys should be a character vector, an object extending BasicFilter ",
-             "or a list of objects extending BasicFilter.")
-    if(is.null(column))
+    ## if (!(is(keys, "character") | is(keys, "list") |
+    ##       is(keys, "AnnotationFilter")))
+    ##     stop("Argument keys should be a character vector, an object extending",
+    ##          " AnnotationFilter or a list of objects extending AnnotationFilter.")
+    if (is.null(column))
         column <- "GENEID"
     ## Have to specify the columns argument. Has to be keytype and column.
-    if(is(keys, "character")){
-        if(is.null(keytype))
+    if (is(keys, "character")){
+        if (is.null(keytype))
             stop("Argument keytype is mandatory if keys is a character vector!")
         columns <- c(keytype, column)
-    }
-    if(is(keys, "list") | is(keys, "BasicFilter")){
-        if(is(keys, "list")){
-            if(length(keys) > 1)
-                warning("Got ", length(keys), " filter objects.",
-                        " Will use the keys of the first for the mapping!")
-            cn <- class(keys[[1]])[1]
-        }else{
-            cn <- class(keys)[1]
-        }
+    } else {
+        ## Test if we can convert the filter. Returns ALWAYS an
+        ## AnnotationFilterList
+        keys <- .processFilterParam(keys, x)
+        if(length(keys) > 1)
+            warning("Got ", length(keys), " filter objects.",
+                    " Will use the keys of the first for the mapping!")
+        cn <- class(keys[[1]])[1]
         ## Use the first element to determine the keytype...
         mapping <- .keytype2FilterMapping()
         columns <- c(names(mapping)[mapping == cn], column)
         keytype <- NULL
     }
-    res <- select(x, keys=keys, columns=columns, keytype=keytype)
+    ## if(is(keys, "list") | is(keys, "AnnotationFilter")){
+    ##     if(is(keys, "list")){
+    ##         if(length(keys) > 1)
+    ##             warning("Got ", length(keys), " filter objects.",
+    ##                     " Will use the keys of the first for the mapping!")
+    ##         cn <- class(keys[[1]])[1]
+    ##     }else{
+    ##         cn <- class(keys)[1]
+    ##     }
+    ##     ## Use the first element to determine the keytype...
+    ##     mapping <- .keytype2FilterMapping()
+    ##     columns <- c(names(mapping)[mapping == cn], column)
+    ##     keytype <- NULL
+    ## }
+    res <- select(x, keys = keys, columns = columns, keytype = keytype)
     if(nrow(res) == 0)
         return(character())
     ## Handling multiVals.
@@ -276,13 +341,6 @@ setMethod("mapIds", "EnsDb", function(x, keys, column, keytype, ..., multiVals){
         multiVals <- "first"
     if(is(multiVals, "function"))
         stop("Not yet implemented!")
-    ## Eventually re-order the data.frame in the same order than the keys...
-    ## That's amazingly slow!!!
-    ## if(is.character(keys)){
-    ##     res <- split(res, f=factor(res[, 1], levels=keys))
-    ##     res <- do.call(rbind, res)
-    ##     rownames(res) <- NULL
-    ## }
     if(is.character(keys)){
         theNames <- keys
     }else{
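A minimal base-R sketch of two pieces of logic introduced in R/select-methods.R above, assuming simplified inputs: (1) building a filter for a keytype by calling, via do.call(), the constructor named in the keytype-to-filter mapping, and (2) choosing the table a query starts from when all filters are protein related ("protein" before "uniprot" before "protein_domain"). The constructors here are hypothetical toy stand-ins that return plain lists, not the AnnotationFilter API, and keytype2Filter(), filterFor() and startTable() are illustrative names. Only a subset of keytypes and protein filters is covered.

```r
## Hypothetical toy stand-ins for AnnotationFilter constructors. The real
## GeneIdFilter(), TxIdFilter(), ... return S4 objects, not lists.
toyFilter <- function(type) {
    force(type)
    function(value) list(type = type, value = value)
}
GeneIdFilter <- toyFilter("GeneIdFilter")
TxIdFilter <- toyFilter("TxIdFilter")
ProteinIdFilter <- toyFilter("ProteinIdFilter")
UniprotFilter <- toyFilter("UniprotFilter")
ProtDomIdFilter <- toyFilter("ProtDomIdFilter")

## Reduced keytype -> filter class mapping (a subset of the real one).
## Protein keytypes are only added if the database has protein data.
keytype2Filter <- function(withProteins = FALSE) {
    filters <- c(GENEID = "GeneIdFilter", TXID = "TxIdFilter",
                 TXNAME = "TxIdFilter")
    if (withProteins)
        filters <- c(filters, PROTEINID = "ProteinIdFilter",
                     UNIPROTID = "UniprotFilter",
                     PROTEINDOMAINID = "ProtDomIdFilter")
    filters
}

## Build a filter for a keytype by calling its constructor by name.
filterFor <- function(keytype, vals = 1, withProteins = FALSE) {
    filters <- keytype2Filter(withProteins)
    if (!keytype %in% names(filters))
        stop("No filter for that keytype!")
    do.call(filters[[keytype]], list(value = vals))
}

## Return the table to start the query from, or NULL if there are no
## filters or any filter is not protein related (simplified: only three
## protein filter classes are recognized here).
startTable <- function(filterTypes) {
    protTypes <- c("ProteinIdFilter", "UniprotFilter", "ProtDomIdFilter")
    if (length(filterTypes) == 0 || !all(filterTypes %in% protTypes))
        return(NULL)
    if ("ProteinIdFilter" %in% filterTypes) return("protein")
    if ("UniprotFilter" %in% filterTypes) return("uniprot")
    "protein_domain"
}

f <- filterFor("TXNAME", vals = c("tx1", "tx2"))
f$type                                              # "TxIdFilter"
startTable(c("UniprotFilter", "ProtDomIdFilter"))   # "uniprot"
startTable(c("GeneIdFilter", "ProteinIdFilter"))    # NULL
```

Because both TXID and TXNAME map to TxIdFilter, a reverse lookup such as `names(m)[m == "TxIdFilter"]` returns both keytype names. This is the same lookup `.mapIds` performs when it derives the keytype from the class of the first filter.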
diff --git a/build/vignette.rds b/build/vignette.rds
index 6e57dc9..256aa5c 100644
Binary files a/build/vignette.rds and b/build/vignette.rds differ
diff --git a/inst/NEWS b/inst/NEWS
index 5391de0..23d3646 100644
--- a/inst/NEWS
+++ b/inst/NEWS
@@ -1,15 +1,116 @@
-CHANGES IN VERSION 1.6.2
-------------------------
+CHANGES IN VERSION 2.0.4
+--------------------------
 
 BUG FIXES:
-    o Avoid errors when using EnsDbs with protein annotations.
+    o ensDbFromGtf failed to parse header for GTF files with more than one
+      white space.
 
 
-CHANGES IN VERSION 1.6.1
-------------------------
+CHANGES IN VERSION 1.99.13
+--------------------------
+
+USER VISIBLE CHANGES:
+    o Most filter classes are now imported from the AnnotationFilter package.
+    o Parameter 'filter' now supports filter expressions.
+    o Multiple filters can be combined with & and |.
+    o buildQuery is no longer exported.
+
+
+CHANGES IN VERSION 1.99.11
+--------------------------
+
+BUG FIXES:
+    o ensDbFromGtf failed to fetch sequence length for some ensemblgenomes
+      versions.
+    
+
+CHANGES IN VERSION 1.99.11
+--------------------------
+
+NEW FEATURES
+    o The taxonomy ID is now also retrieved from the Ensembl databases and
+      stored in the metadata table.
+
+
+CHANGES IN VERSION 1.99.10
+--------------------------
+
+BUG FIXES:
+    o Fix a problem causing file downloads from Ensembl servers to fail on
+      Windows systems.
+
+CHANGES IN VERSION 1.99.6
+-------------------------
+
+BUG FIXES:
+    o MySQL database name for useMySQL was not created as expected for GTF/GFF
+      based EnsDbs.
+
+
+CHANGES IN VERSION 1.99.5
+-------------------------
+
+NEW FEATURES:
+    o OnlyCodingTxFilter is now exported. This filter allows querying for
+      protein coding genes.
+
+
+CHANGES IN VERSION 1.99.3
+-------------------------
+
+BUG FIXES:
+   o Add two additional uniprot table columns to internal variable and fix
+     failing unit test.
+
+
+CHANGES IN VERSION 1.99.3
+-------------------------
 
 BUG FIXES:
-    o Fix plain return statements in shiny server.R.
+   o Add two additional uniprot table columns to internal variable and fix
+     failing unit test.
+
+
+CHANGES IN VERSION 1.99.3
+-------------------------
+
+NEW FEATURES:
+    o UniprotdbFilter and UniprotmappingtypeFilter.
+
+USER VISIBLE CHANGES:
+    o Fetching Uniprot database and the type of mapping method for
+      Uniprot IDs to Ensembl protein IDs: database columns uniprot_db and
+      uniprot_mapping_type.
+
+
+CHANGES IN VERSION 1.99.2
+-------------------------
+
+BUG FIXES:
+    o Perl script is no longer failing if no chromosome info is available.
+
+
+CHANGES IN VERSION 1.99.1
+-------------------------
+
+BUG FIXES:
+    o No protein table indices were created when inserting an EnsDb with protein
+      data to MySQL.
+
+
+CHANGES IN VERSION 1.99.0
+-------------------------
+
+NEW FEATURES:
+    o The perl script to create EnsDb databases also fetches protein annotations.
+    o Added functionality to extract protein annotations from the database
+      (if available) ensuring backward compatibility.
+    o Add proteins vignette.
+
+USER VISIBLE CHANGES:
+    o Improved functionality to fetch sequence lengths for chromosomes from
+      Ensembl or ensemblgenomes.
+
 
 CHANGES IN VERSION 1.5.14
 -------------------------
@@ -102,10 +203,10 @@ BUG FIXES
 CHANGES IN VERSION 1.5.4
 -------------------------
 
-Bug fixes
-    o Column tx_id was always removed from exonsBy result even if in the
-      columns argument.
-    o exon_idx was of type character if database generated from a GTF file.
+BUG FIXES
+    o tx_id was removed from metadata columns in txBy.
+    o Fixed a bug that caused the exon_idx column to be of type character if the
+      database was created from a GTF file.
 
 
 CHANGES IN VERSION 1.5.2
diff --git a/inst/doc/MySQL-backend.R b/inst/doc/MySQL-backend.R
index df94d72..9003464 100644
--- a/inst/doc/MySQL-backend.R
+++ b/inst/doc/MySQL-backend.R
@@ -1,4 +1,4 @@
-## ----eval=FALSE----------------------------------------------------------
+## ----eval = FALSE----------------------------------------------------------
 #  library(ensembldb)
 #  ## Load the EnsDb package that should be installed on the MySQL server
 #  library(EnsDb.Hsapiens.v75)
@@ -11,7 +11,7 @@
 #  ## Use this EnsDb object
 #  genes(edb_mysql)
 
-## ----eval=FALSE----------------------------------------------------------
+## ----eval = FALSE----------------------------------------------------------
 #  library(ensembldb)
 #  library(RMySQL)
 #  
diff --git a/inst/doc/MySQL-backend.Rmd b/inst/doc/MySQL-backend.Rmd
index 0acd514..de512cb 100644
--- a/inst/doc/MySQL-backend.Rmd
+++ b/inst/doc/MySQL-backend.Rmd
@@ -1,21 +1,17 @@
 ---
 title: "Using a MySQL server backend"
-graphics: yes
+author: "Johannes Rainer"
+package: ensembldb
 output:
-  BiocStyle::html_document2
+  BiocStyle::html_document2:
+    toc_float: true
 vignette: >
   %\VignetteIndexEntry{Using a MySQL server backend}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
   %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle}
-  %\VignettePackage{ensembldb}
-  %\VignetteKeywords{annotation,database}
 ---
 
-**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
-**Authors**: `r packageDescription("ensembldb")$Author`<br />
-**Modified**: 20 September, 2016<br />
-**Compiled**: `r date()`
 
 # Introduction
 
@@ -31,12 +27,13 @@ the individual clients.
 **Note** the code in this document is not executed during vignette generation as
 this would require access to a MySQL server.
 
+
 # Using `ensembldb` with a MySQL server
 
 Installation of `EnsDb` databases in a MySQL server is straight forward - given
 that the user has write access to the server:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 library(ensembldb)
 ## Load the EnsDb package that should be installed on the MySQL server
 library(EnsDb.Hsapiens.v75)
@@ -55,7 +52,7 @@ R-package, the connection to the database can be passed to the `EnsDb` construct
 function. With the resulting `EnsDb` object annotations can be retrieved from the
 MySQL database.
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 library(ensembldb)
 library(RMySQL)
 
@@ -72,3 +69,4 @@ dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
 edb <- EnsDb(dbcon)
 edb
 ```
+
diff --git a/inst/doc/MySQL-backend.html b/inst/doc/MySQL-backend.html
index 0d6d27b..8850745 100644
--- a/inst/doc/MySQL-backend.html
+++ b/inst/doc/MySQL-backend.html
@@ -4,25 +4,30 @@
 
 <head>
 
-<meta charset="utf-8">
+<meta charset="utf-8" />
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <meta name="generator" content="pandoc" />
 
 
+<meta name="author" content="Johannes Rainer" />
 
 
 <title>Using a MySQL server backend</title>
 
 <script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4zIHwgKGMpIDIwMDUsIDIwMTUgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dG [...]
 <meta name="viewport" content="width=device-width, initial-scale=1" />
-<link href="data:text/css;charset=utf-8,html%7Bfont%2Dfamily%3Asans%2Dserif%3B%2Dwebkit%2Dtext%2Dsize%2Dadjust%3A100%25%3B%2Dms%2Dtext%2Dsize%2Dadjust%3A100%25%7Dbody%7Bmargin%3A0%7Darticle%2Caside%2Cdetails%2Cfigcaption%2Cfigure%2Cfooter%2Cheader%2Chgroup%2Cmain%2Cmenu%2Cnav%2Csection%2Csummary%7Bdisplay%3Ablock%7Daudio%2Ccanvas%2Cprogress%2Cvideo%7Bdisplay%3Ainline%2Dblock%3Bvertical%2Dalign%3Abaseline%7Daudio%3Anot%28%5Bcontrols%5D%29%7Bdisplay%3Anone%3Bheight%3A0%7D%5Bhidden%5D%2Ctem [...]
+<link href="data:text/css;charset=utf-8,html%7Bfont%2Dfamily%3Asans%2Dserif%3B%2Dwebkit%2Dtext%2Dsize%2Dadjust%3A100%25%3B%2Dms%2Dtext%2Dsize%2Dadjust%3A100%25%7Dbody%7Bmargin%3A0%7Darticle%2Caside%2Cdetails%2Cfigcaption%2Cfigure%2Cfooter%2Cheader%2Chgroup%2Cmain%2Cmenu%2Cnav%2Csection%2Csummary%7Bdisplay%3Ablock%7Daudio%2Ccanvas%2Cprogress%2Cvideo%7Bdisplay%3Ainline%2Dblock%3Bvertical%2Dalign%3Abaseline%7Daudio%3Anot%28%5Bcontrols%5D%29%7Bdisplay%3Anone%3Bheight%3A0%7D%5Bhidden%5D%2Ctem [...]
 <script src="data:application/x-javascript;base64,LyohCiAqIEJvb3RzdHJhcCB2My4zLjUgKGh0dHA6Ly9nZXRib290c3RyYXAuY29tKQogKiBDb3B5cmlnaHQgMjAxMS0yMDE1IFR3aXR0ZXIsIEluYy4KICogTGljZW5zZWQgdW5kZXIgdGhlIE1JVCBsaWNlbnNlCiAqLwppZigidW5kZWZpbmVkIj09dHlwZW9mIGpRdWVyeSl0aHJvdyBuZXcgRXJyb3IoIkJvb3RzdHJhcCdzIEphdmFTY3JpcHQgcmVxdWlyZXMgalF1ZXJ5Iik7K2Z1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0Ijt2YXIgYj1hLmZuLmpxdWVyeS5zcGxpdCgiICIpWzBdLnNwbGl0KCIuIik7aWYoYlswXTwyJiZiWzFdPDl8fDE9PWJbMF0mJjk9PWJbMV0mJmJbMl08MSl0aHJvdy [...]
 <script src="data:application/x-javascript;base64,LyoqCiogQHByZXNlcnZlIEhUTUw1IFNoaXYgMy43LjIgfCBAYWZhcmthcyBAamRhbHRvbiBAam9uX25lYWwgQHJlbSB8IE1JVC9HUEwyIExpY2Vuc2VkCiovCi8vIE9ubHkgcnVuIHRoaXMgY29kZSBpbiBJRSA4CmlmICghIXdpbmRvdy5uYXZpZ2F0b3IudXNlckFnZW50Lm1hdGNoKCJNU0lFIDgiKSkgewohZnVuY3Rpb24oYSxiKXtmdW5jdGlvbiBjKGEsYil7dmFyIGM9YS5jcmVhdGVFbGVtZW50KCJwIiksZD1hLmdldEVsZW1lbnRzQnlUYWdOYW1lKCJoZWFkIilbMF18fGEuZG9jdW1lbnRFbGVtZW50O3JldHVybiBjLmlubmVySFRNTD0ieDxzdHlsZT4iK2IrIjwvc3R5bGU+IixkLm [...]
 <script src="data:application/x-javascript;base64,LyohIFJlc3BvbmQuanMgdjEuNC4yOiBtaW4vbWF4LXdpZHRoIG1lZGlhIHF1ZXJ5IHBvbHlmaWxsICogQ29weXJpZ2h0IDIwMTMgU2NvdHQgSmVobAogKiBMaWNlbnNlZCB1bmRlciBodHRwczovL2dpdGh1Yi5jb20vc2NvdHRqZWhsL1Jlc3BvbmQvYmxvYi9tYXN0ZXIvTElDRU5TRS1NSVQKICogICovCgovLyBPbmx5IHJ1biB0aGlzIGNvZGUgaW4gSUUgOAppZiAoISF3aW5kb3cubmF2aWdhdG9yLnVzZXJBZ2VudC5tYXRjaCgiTVNJRSA4IikpIHsKIWZ1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0IjthLm1hdGNoTWVkaWE9YS5tYXRjaE1lZGlhfHxmdW5jdGlvbihhKXt2YXIgYixjPWEuZG [...]
+<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSBVSSAtIHYxLjExLjQgLSAyMDE2LTAxLTA1CiogaHR0cDovL2pxdWVyeXVpLmNvbQoqIEluY2x1ZGVzOiBjb3JlLmpzLCB3aWRnZXQuanMsIG1vdXNlLmpzLCBwb3NpdGlvbi5qcywgZHJhZ2dhYmxlLmpzLCBkcm9wcGFibGUuanMsIHJlc2l6YWJsZS5qcywgc2VsZWN0YWJsZS5qcywgc29ydGFibGUuanMsIGFjY29yZGlvbi5qcywgYXV0b2NvbXBsZXRlLmpzLCBidXR0b24uanMsIGRpYWxvZy5qcywgbWVudS5qcywgcHJvZ3Jlc3NiYXIuanMsIHNlbGVjdG1lbnUuanMsIHNsaWRlci5qcywgc3Bpbm5lci5qcywgdGFicy5qcywgdG9vbHRpcC5qcywgZWZmZWN0LmpzLC [...]
+<link href="data:text/css;charset=utf-8,%0A%0A%2Etocify%20%7B%0Awidth%3A%2020%25%3B%0Amax%2Dheight%3A%2090%25%3B%0Aoverflow%3A%20auto%3B%0Amargin%2Dleft%3A%202%25%3B%0Aposition%3A%20fixed%3B%0Aborder%3A%201px%20solid%20%23ccc%3B%0Awebkit%2Dborder%2Dradius%3A%206px%3B%0Amoz%2Dborder%2Dradius%3A%206px%3B%0Aborder%2Dradius%3A%206px%3B%0A%7D%0A%0A%2Etocify%20ul%2C%20%2Etocify%20li%20%7B%0Alist%2Dstyle%3A%20none%3B%0Amargin%3A%200%3B%0Apadding%3A%200%3B%0Aborder%3A%20none%3B%0Aline%2Dheight%3 [...]
+<script src="data:application/x-javascript;base64,LyoganF1ZXJ5IFRvY2lmeSAtIHYxLjkuMSAtIDIwMTMtMTAtMjIKICogaHR0cDovL3d3dy5ncmVnZnJhbmtvLmNvbS9qcXVlcnkudG9jaWZ5LmpzLwogKiBDb3B5cmlnaHQgKGMpIDIwMTMgR3JlZyBGcmFua287IExpY2Vuc2VkIE1JVCAqLwoKLy8gSW1tZWRpYXRlbHktSW52b2tlZCBGdW5jdGlvbiBFeHByZXNzaW9uIChJSUZFKSBbQmVuIEFsbWFuIEJsb2cgUG9zdF0oaHR0cDovL2JlbmFsbWFuLmNvbS9uZXdzLzIwMTAvMTEvaW1tZWRpYXRlbHktaW52b2tlZC1mdW5jdGlvbi1leHByZXNzaW9uLykgdGhhdCBjYWxscyBhbm90aGVyIElJRkUgdGhhdCBjb250YWlucyBhbGwgb2YgdG [...]
+<script src="data:application/x-javascript;base64,CgovKioKICogalF1ZXJ5IFBsdWdpbjogU3RpY2t5IFRhYnMKICoKICogQGF1dGhvciBBaWRhbiBMaXN0ZXIgPGFpZGFuQHBocC5uZXQ+CiAqIGFkYXB0ZWQgYnkgUnViZW4gQXJzbGFuIHRvIGFjdGl2YXRlIHBhcmVudCB0YWJzIHRvbwogKiBodHRwOi8vd3d3LmFpZGFubGlzdGVyLmNvbS8yMDE0LzAzL3BlcnNpc3RpbmctdGhlLXRhYi1zdGF0ZS1pbi1ib290c3RyYXAvCiAqLwooZnVuY3Rpb24oJCkgewogICJ1c2Ugc3RyaWN0IjsKICAkLmZuLnJtYXJrZG93blN0aWNreVRhYnMgPSBmdW5jdGlvbigpIHsKICAgIHZhciBjb250ZXh0ID0gdGhpczsKICAgIC8vIFNob3cgdGhlIHRhYi [...]
+<link href="data:text/css;charset=utf-8,pre%20%2Eoperator%2C%0Apre%20%2Eparen%20%7B%0Acolor%3A%20rgb%28104%2C%20118%2C%20135%29%0A%7D%0Apre%20%2Eliteral%20%7B%0Acolor%3A%20%23990073%0A%7D%0Apre%20%2Enumber%20%7B%0Acolor%3A%20%23099%3B%0A%7D%0Apre%20%2Ecomment%20%7B%0Acolor%3A%20%23998%3B%0Afont%2Dstyle%3A%20italic%0A%7D%0Apre%20%2Ekeyword%20%7B%0Acolor%3A%20%23900%3B%0Afont%2Dweight%3A%20bold%0A%7D%0Apre%20%2Eidentifier%20%7B%0Acolor%3A%20rgb%280%2C%200%2C%200%29%3B%0A%7D%0Apre%20%2Estri [...]
+<script src="data:application/x-javascript;base64,dmFyIGhsanM9bmV3IGZ1bmN0aW9uKCl7ZnVuY3Rpb24gbShwKXtyZXR1cm4gcC5yZXBsYWNlKC8mL2dtLCImYW1wOyIpLnJlcGxhY2UoLzwvZ20sIiZsdDsiKX1mdW5jdGlvbiBmKHIscSxwKXtyZXR1cm4gUmVnRXhwKHEsIm0iKyhyLmNJPyJpIjoiIikrKHA/ImciOiIiKSl9ZnVuY3Rpb24gYihyKXtmb3IodmFyIHA9MDtwPHIuY2hpbGROb2Rlcy5sZW5ndGg7cCsrKXt2YXIgcT1yLmNoaWxkTm9kZXNbcF07aWYocS5ub2RlTmFtZT09IkNPREUiKXtyZXR1cm4gcX1pZighKHEubm9kZVR5cGU9PTMmJnEubm9kZVZhbHVlLm1hdGNoKC9ccysvKSkpe2JyZWFrfX19ZnVuY3Rpb24gaCh0LH [...]
 
 <style type="text/css">code{white-space: pre;}</style>
-<link href="data:text/css;charset=utf-8,pre%20%2Eoperator%2C%0Apre%20%2Eparen%20%7B%0Acolor%3A%20rgb%28104%2C%20118%2C%20135%29%0A%7D%0Apre%20%2Eliteral%20%7B%0Acolor%3A%20%23990073%0A%7D%0Apre%20%2Enumber%20%7B%0Acolor%3A%20%23099%3B%0A%7D%0Apre%20%2Ecomment%20%7B%0Acolor%3A%20%23998%3B%0Afont%2Dstyle%3A%20italic%0A%7D%0Apre%20%2Ekeyword%20%7B%0Acolor%3A%20%23900%3B%0Afont%2Dweight%3A%20bold%0A%7D%0Apre%20%2Eidentifier%20%7B%0Acolor%3A%20rgb%280%2C%200%2C%200%29%3B%0A%7D%0Apre%20%2Estri [...]
-<script src="data:application/x-javascript;base64,dmFyIGhsanM9bmV3IGZ1bmN0aW9uKCl7ZnVuY3Rpb24gbShwKXtyZXR1cm4gcC5yZXBsYWNlKC8mL2dtLCImYW1wOyIpLnJlcGxhY2UoLzwvZ20sIiZsdDsiKX1mdW5jdGlvbiBmKHIscSxwKXtyZXR1cm4gUmVnRXhwKHEsIm0iKyhyLmNJPyJpIjoiIikrKHA/ImciOiIiKSl9ZnVuY3Rpb24gYihyKXtmb3IodmFyIHA9MDtwPHIuY2hpbGROb2Rlcy5sZW5ndGg7cCsrKXt2YXIgcT1yLmNoaWxkTm9kZXNbcF07aWYocS5ub2RlTmFtZT09IkNPREUiKXtyZXR1cm4gcX1pZighKHEubm9kZVR5cGU9PTMmJnEubm9kZVZhbHVlLm1hdGNoKC9ccysvKSkpe2JyZWFrfX19ZnVuY3Rpb24gaCh0LH [...]
 <style type="text/css">
 
 </style>
@@ -63,7 +68,7 @@ h6 {
 }
 </style>
 
-<link href="data:text/css;charset=utf-8,body%20%7B%0Amax%2Dwidth%3A%201054px%3B%0Amargin%3A%200px%20auto%3B%0A%7D%0Abody%2C%20td%20%7B%0Afont%2Dfamily%3A%20sans%2Dserif%3B%0Afont%2Dsize%3A%2010pt%3B%0A%7D%0A%0Adiv%23TOC%20ul%20%7B%0Apadding%3A%200px%200px%200px%2045px%3B%0Alist%2Dstyle%3A%20none%3B%0Abackground%2Dimage%3A%20none%3B%0Abackground%2Drepeat%3A%20none%3B%0Abackground%2Dposition%3A%200%3B%0Afont%2Dsize%3A%2010pt%3B%0Afont%2Dfamily%3A%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B [...]
+<link href="data:text/css;charset=utf-8,body%20%7B%0Amargin%3A%200px%20auto%3B%0Amax%2Dwidth%3A%201134px%3B%0A%7D%0Abody%2C%20td%20%7B%0Afont%2Dfamily%3A%20sans%2Dserif%3B%0Afont%2Dsize%3A%2010pt%3B%0A%7D%0A%0Adiv%23TOC%20ul%20%7B%0Apadding%3A%200px%200px%200px%2045px%3B%0Alist%2Dstyle%3A%20none%3B%0Abackground%2Dimage%3A%20none%3B%0Abackground%2Drepeat%3A%20none%3B%0Abackground%2Dposition%3A%200%3B%0Afont%2Dsize%3A%2010pt%3B%0Afont%2Dfamily%3A%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B [...]
 
 </head>
 
@@ -71,7 +76,7 @@ h6 {
 
 <style type="text/css">
 .main-container {
-  max-width: 768px;
+  max-width: 828px;
   margin-left: auto;
   margin-right: auto;
 }
@@ -93,7 +98,6 @@ button.code-folding-btn:focus {
 <div class="container-fluid main-container">
 
 <!-- tabsets -->
-<script src="data:application/x-javascript;base64,Cgp3aW5kb3cuYnVpbGRUYWJzZXRzID0gZnVuY3Rpb24odG9jSUQpIHsKCiAgLy8gYnVpbGQgYSB0YWJzZXQgZnJvbSBhIHNlY3Rpb24gZGl2IHdpdGggdGhlIC50YWJzZXQgY2xhc3MKICBmdW5jdGlvbiBidWlsZFRhYnNldCh0YWJzZXQpIHsKCiAgICAvLyBjaGVjayBmb3IgZmFkZSBhbmQgcGlsbHMgb3B0aW9ucwogICAgdmFyIGZhZGUgPSB0YWJzZXQuaGFzQ2xhc3MoInRhYnNldC1mYWRlIik7CiAgICB2YXIgcGlsbHMgPSB0YWJzZXQuaGFzQ2xhc3MoInRhYnNldC1waWxscyIpOwogICAgdmFyIG5hdkNsYXNzID0gcGlsbHMgPyAibmF2LXBpbGxzIiA6ICJuYXYtdGFicyI7CgogIC [...]
 <script>
 $(document).ready(function () {
   window.buildTabsets("TOC");
@@ -105,6 +109,98 @@ $(document).ready(function () {
 
 
 
+<script>
+$(document).ready(function ()  {
+
+    // move toc-ignore selectors from section div to header
+    $('div.section.toc-ignore')
+        .removeClass('toc-ignore')
+        .children('h1,h2,h3,h4,h5').addClass('toc-ignore');
+
+    // establish options
+    var options = {
+      selectors: "h1,h2,h3",
+      theme: "bootstrap3",
+      context: '.toc-content',
+      hashGenerator: function (text) {
+        return text.replace(/[.\\/?&!#<>]/g, '').replace(/\s/g, '_').toLowerCase();
+      },
+      ignoreSelector: ".toc-ignore",
+      scrollTo: 0
+    };
+    options.showAndHide = true;
+    options.smoothScroll = true;
+
+    // tocify
+    var toc = $("#TOC").tocify(options).data("toc-tocify");
+});
+</script>
+
+<style type="text/css">
+
+#TOC {
+  margin: 25px 0px 20px 0px;
+}
+@media (max-width: 768px) {
+#TOC {
+  position: relative;
+  width: 100%;
+}
+}
+
+
+
+
+div.main-container {
+  max-width: 1200px;
+}
+
+div.tocify {
+  width: 20%;
+  max-width: 246px;
+  max-height: 85%;
+}
+
+@media (min-width: 768px) and (max-width: 991px) {
+  div.tocify {
+    width: 25%;
+  }
+}
+
+@media (max-width: 767px) {
+  div.tocify {
+    width: 100%;
+    max-width: none;
+  }
+}
+
+.tocify ul, .tocify li {
+  line-height: 20px;
+}
+
+.tocify-subheader .tocify-item {
+  font-size: 0.90em;
+  padding-left: 25px;
+  text-indent: 0;
+}
+
+.tocify .list-group-item {
+  border-radius: 0px;
+}
+
+
+</style>
+
+<!-- setup 3col/9col grid for toc_float and main content  -->
+<div class="row-fluid">
+<div class="col-xs-12 col-sm-4 col-md-3">
+<div id="TOC" class="tocify">
+</div>
+</div>
+
+<div class="toc-content col-xs-12 col-sm-8 col-md-9">
+
+
 
 
 <div class="fluid-row" id="header">
@@ -112,18 +208,14 @@ $(document).ready(function () {
 
 
 <h1 class="title toc-ignore">Using a MySQL server backend</h1>
+<p class="author-name">Johannes Rainer</p>
+<h4 class="date"><em>4 August 2017</em></h4>
+<h4 class="package">Package</h4>
+<p>ensembldb 2.0.4</p>
 
 </div>
 
-<h1>Contents</h1>
-<div id="TOC">
-<ul>
-<li><a href="#introduction"><span class="toc-section-number">1</span> Introduction</a></li>
-<li><a href="#using-ensembldb-with-a-mysql-server"><span class="toc-section-number">2</span> Using <code>ensembldb</code> with a MySQL server</a></li>
-</ul>
-</div>
 
-<p><strong>Package</strong>: <em><a href="http://bioconductor.org/packages/ensembldb">ensembldb</a></em><br /> <strong>Authors</strong>: Johannes Rainer <a href="mailto:johannes.rainer at eurac.edu">johannes.rainer at eurac.edu</a>, Tim Triche <a href="mailto:tim.triche at usc.edu">tim.triche at usc.edu</a><br /> <strong>Modified</strong>: 20 September, 2016<br /> <strong>Compiled</strong>: Wed Nov 16 19:52:05 2016</p>
 <div id="introduction" class="section level1">
 <h1><span class="header-section-number">1</span> Introduction</h1>
 <p><code>ensembldb</code> uses by default, similar to other annotation packages in Bioconductor, a SQLite database backend, i.e. annotations are retrieved from file-based SQLite databases that are provided <em>via</em> packages, such as the <code>EnsDb.Hsapiens.v75</code> package. In addition, <code>ensembldb</code> allows to switch the backend from SQLite to MySQL and thus to retrieve annotations from a MySQL server instead. Such a setup might be useful for a lab running a well-configur [...]
@@ -163,14 +255,19 @@ edb</code></pre>
 
 
 
+</div>
+</div>
 
 </div>
 
 <script>
 
 // add bootstrap table styles to pandoc tables
-$(document).ready(function () {
+function bootstrapStylePandocTables() {
   $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
+}
+$(document).ready(function () {
+  bootstrapStylePandocTables();
 });
 
 
@@ -178,12 +275,6 @@ $(document).ready(function () {
 
 <script type="text/x-mathjax-config">
   MathJax.Hub.Config({
-    TeX: {
-      TagSide: "right",
-      equationNumbers: {
-        autoNumber: "AMS"
-      }
-    },
     "HTML-CSS": {
       styles: {
         ".MathJax_Display": {
@@ -200,7 +291,7 @@ $(document).ready(function () {
   (function () {
     var script = document.createElement("script");
     script.type = "text/javascript";
-    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
     document.getElementsByTagName("head")[0].appendChild(script);
   })();
 </script>
diff --git a/inst/doc/ensembldb.R b/inst/doc/ensembldb.R
index 2efa4be..76f79c2 100644
--- a/inst/doc/ensembldb.R
+++ b/inst/doc/ensembldb.R
@@ -1,4 +1,4 @@
-## ----warning=FALSE, message=FALSE----------------------------------------
+## ----load-libs, warning=FALSE, message=FALSE-------------------------------
 library(EnsDb.Hsapiens.v75)
 
 ## Making a "short cut"
@@ -6,10 +6,19 @@ edb <- EnsDb.Hsapiens.v75
 ## print some information for this package
 edb
 
-## for what organism was the database generated?
+## For what organism was the database generated?
 organism(edb)
 
-## ------------------------------------------------------------------------
+## ----no-network, echo = FALSE, results = "hide"----------------------------
+## Disable code chunks that require network connection - conditionally
+## disable this on Windows only. This is to avoid TIMEOUT errors on the
+## Bioconductor Windows build machine (issue #47).
+use_network <- FALSE
+
+## ----filters---------------------------------------------------------------
+supportedFilters(edb)
+
+## ----transcripts-----------------------------------------------------------
 Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
 
 Tx
@@ -20,29 +29,33 @@ head(start(Tx))
 ## or extract the biotype with
 head(Tx$tx_biotype)
 
-## ------------------------------------------------------------------------
+## ----transcripts-filter-expression-----------------------------------------
+## Use a filter expression to perform the filtering.
+transcripts(edb, filter = ~ genename == "ZBTB16")
+
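The filter expression `~ genename == "ZBTB16"` above is an ordinary one-sided R formula: R keeps its right-hand side as an unevaluated call, which is what makes it possible to turn the expression into a filter object. The base-R sketch below only illustrates that idea. `parse_filter_expr` is a hypothetical helper, not the parser used by `AnnotationFilter` or `ensembldb`, and it handles a single comparison only.

```r
## Hypothetical helper (not part of ensembldb or AnnotationFilter): split a
## one-sided formula holding a single comparison into its three parts.
parse_filter_expr <- function(expr) {
    stopifnot(inherits(expr, "formula"), length(expr) == 2L)
    call <- expr[[2]]                                    # right-hand side, unevaluated
    list(field     = as.character(call[[2]]),            # left operand, e.g. genename
         condition = as.character(call[[1]]),            # operator, e.g. ==
         value     = eval(call[[3]], environment(expr))) # right operand, evaluated
}

str(parse_filter_expr(~ genename == "ZBTB16"))
str(parse_filter_expr(~ seq_name %in% c("X", "Y")))
```

Combined expressions such as `~ a == 1 & b == 2` would additionally need a recursive walk over the `&` and `|` calls.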
+## ----list-columns----------------------------------------------------------
 ## list all database tables along with their columns
 listTables(edb)
 
 ## list columns from a specific table
 listColumns(edb, "tx")
 
-## ------------------------------------------------------------------------
+## ----transcripts-example2--------------------------------------------------
 Tx <- transcripts(edb,
 		  columns = c(listColumns(edb , "tx"), "gene_name"),
-		  filter = TxbiotypeFilter("nonsense_mediated_decay"),
+		  filter = TxBiotypeFilter("nonsense_mediated_decay"),
 		  return.type = "DataFrame")
 nrow(Tx)
 Tx
 
-## ------------------------------------------------------------------------
-yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+## ----cdsBy-----------------------------------------------------------------
+yCds <- cdsBy(edb, filter = SeqNameFilter("Y"))
 yCds
 
-## ------------------------------------------------------------------------
+## ----genes-GRangesFilter---------------------------------------------------
 ## Define the filter
 grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
-			     strand = "+"), condition = "overlapping")
+			     strand = "+"), type = "any")
 
 ## Query genes:
 gn <- genes(edb, filter = grf)
@@ -64,72 +77,69 @@ for(i in 1:length(txs)) {
     text(start(current), y = i-0.5, pos = 4, cex = 0.75, labels = current$tx_id)
 }
 
-## ------------------------------------------------------------------------
+## ----transcripts-GRangesFilter---------------------------------------------
 transcripts(edb, filter = grf)
 
-## ------------------------------------------------------------------------
-## Get all gene biotypes from the database. The GenebiotypeFilter
+## ----biotypes--------------------------------------------------------------
+## Get all gene biotypes from the database. The GeneBiotypeFilter
 ## allows filtering on these values.
 listGenebiotypes(edb)
 
 ## Get all transcript biotypes from the database.
 listTxbiotypes(edb)
 
-## ------------------------------------------------------------------------
+## ----genes-BCL2------------------------------------------------------------
 ## We're going to fetch all genes whose names start with BCL. To this end
 ## we define a GenenameFilter with the condition "startsWith", i.e. a
 ## prefix match on the gene name.
 BCLs <- genes(edb,
 	      columns = c("gene_name", "entrezid", "gene_biotype"),
-	      filter = list(GenenameFilter("BCL%", condition = "like")),
+	      filter = GenenameFilter("BCL", condition = "startsWith"),
 	      return.type = "DataFrame")
 nrow(BCLs)
 BCLs
 
-## ------------------------------------------------------------------------
+## ----example-AnnotationFilterList------------------------------------------
 ## determine the average length of snRNA, snoRNA and rRNA genes encoded on
 ## chromosomes X and Y.
-mean(lengthOf(edb, of = "tx",
-	      filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
-			    SeqnameFilter(c("X", "Y")))))
+mean(lengthOf(edb, of = "tx", filter = AnnotationFilterList(
+				  GeneBiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+				  SeqNameFilter(c("X", "Y")))))
 
 ## determine the average length of protein coding genes encoded on the same
 ## chromosomes.
-mean(lengthOf(edb, of = "tx",
-	      filter = list(GenebiotypeFilter("protein_coding"),
-			    SeqnameFilter(c("X", "Y")))))
+mean(lengthOf(edb, of = "tx", filter = ~ gene_biotype == "protein_coding" &
+				  seq_name %in% c("X", "Y")))
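Both calls above restrict the query with two conditions at once: `AnnotationFilterList` combines its filters with a logical AND unless another operator is requested, and the filter expression uses `&` explicitly. A minimal base-R sketch of that logic on a small, made-up annotation table (the values and the `len` column are illustrative, not taken from `EnsDb.Hsapiens.v75`):

```r
## Made-up transcript annotations; `len` is a hypothetical length column.
tx <- data.frame(
    seq_name     = c("X", "Y", "1", "X"),
    gene_biotype = c("snRNA", "rRNA", "snRNA", "protein_coding"),
    len          = c(150L, 120L, 160L, 2400L),
    stringsAsFactors = FALSE
)

## Each filter evaluates to one logical value per row ...
filters <- list(
    tx$gene_biotype %in% c("snRNA", "snoRNA", "rRNA"),
    tx$seq_name %in% c("X", "Y")
)
## ... and combining them with AND keeps the rows that pass every filter.
keep <- Reduce(`&`, filters)
mean(tx$len[keep])   # 135
```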
 
-## ------------------------------------------------------------------------
+## ----example-first-two-exons-----------------------------------------------
 ## Extract all exons 1 and (if present) 2 for all genes encoded on the
 ## Y chromosome
 exons(edb, columns = c("tx_id", "exon_idx"),
-      filter = list(SeqnameFilter("Y"),
-		    ExonrankFilter(3, condition = "<")))
+      filter = list(SeqNameFilter("Y"),
+		    ExonRankFilter(3, condition = "<")))
 
-## ------------------------------------------------------------------------
-TxByGns <- transcriptsBy(edb, by = "gene",
-			 filter = list(SeqnameFilter(c("X", "Y")))
-			 )
+## ----transcriptsBy-X-Y-----------------------------------------------------
+TxByGns <- transcriptsBy(edb, by = "gene", filter = SeqNameFilter(c("X", "Y")))
 TxByGns
 
-## ----eval=FALSE----------------------------------------------------------
+## ----exonsBy-RNAseq, message = FALSE, eval = FALSE-------------------------
 #  ## will just get exons for all genes on chromosomes 1 to 22, X and Y.
 #  ## Note: requiring IDs that start with "ENSG" excludes the "LRG" genes.
-#  EnsGenes <- exonsBy(edb, by = "gene",
-#  		    filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-#  				  GeneidFilter("ENSG%", "like")))
+#  EnsGenes <- exonsBy(edb, by = "gene", filter = AnnotationFilterList(
+#  					  SeqNameFilter(c(1:22, "X", "Y")),
+#  					  GeneIdFilter("ENSG", "startsWith")))
 
-## ----eval=FALSE----------------------------------------------------------
+## ----toSAF-RNAseq, message = FALSE, eval=FALSE-----------------------------
 #  ## Transforming the GRangesList into a data.frame in SAF format
 #  EnsGenes.SAF <- toSAF(EnsGenes)
 
-## ----eval=FALSE----------------------------------------------------------
+## ----disjointExons, message = FALSE, eval=FALSE----------------------------
 #  ## Create a GRanges of non-overlapping exon parts.
-#  DJE <- disjointExons(edb,
-#  		     filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-#  				   GeneidFilter("ENSG%", "like")))
+#  DJE <- disjointExons(edb, filter = AnnotationFilterList(
+#  			      SeqNameFilter(c(1:22, "X", "Y")),
+#  			      GeneIdFilter("ENSG", "startsWith")))
 
-## ----eval=FALSE----------------------------------------------------------
+## ----transcript-sequence-AnnotationHub, message = FALSE, eval = FALSE------
 #  library(EnsDb.Hsapiens.v75)
 #  library(Rsamtools)
 #  edb <- EnsDb.Hsapiens.v75
@@ -148,9 +158,9 @@ TxByGns
 #  ## all of the gene's exons and introns.
 #  geneSeqs <- getSeq(Dna, genes)
 
-## ----eval=FALSE----------------------------------------------------------
+## ----transcript-sequence-extractTranscriptSeqs, message = FALSE, eval = FALSE----
 #  ## get all exons of all transcripts encoded on chromosome Y
-#  yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+#  yTx <- exonsBy(edb, filter = SeqNameFilter("Y"))
 #  
 #  ## Retrieve the sequences for these transcripts from the FaFile.
 #  library(GenomicFeatures)
@@ -158,23 +168,23 @@ TxByGns
 #  yTxSeqs
 #  
 #  ## Extract the sequences of all transcripts encoded on chromosome Y.
-#  yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+#  yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqNameFilter("Y"))
 #  
 #  ## Along these lines, we could use the method also to retrieve the coding sequence
 #  ## of all transcripts on the Y chromosome.
-#  cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+#  cdsY <- cdsBy(edb, filter = SeqNameFilter("Y"))
 #  extractTranscriptSeqs(Dna, cdsY)
 
-## ----message=FALSE-------------------------------------------------------
+## ----seqlevelsStyle, message = FALSE---------------------------------------
 ## Change the seqlevels style from Ensembl (default) to UCSC:
 seqlevelsStyle(edb) <- "UCSC"
 
-## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
-genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+## Now we can use UCSC style seqnames in SeqNameFilters or GRangesFilter:
+genesY <- genes(edb, filter = ~ seq_name == "chrY")
 ## The seqlevels of the returned GRanges are also in UCSC style
 seqlevels(genesY)
 
-## ------------------------------------------------------------------------
+## ----seqlevelsStyle-2, message = FALSE-------------------------------------
 seqlevelsStyle(edb) <- "UCSC"
 
 ## Getting the default option:
@@ -191,7 +201,7 @@ seqlevels(edb)[1:30]
 ## Resetting the option.
 options(ensembldb.seqnameNotFound = "ORIGINAL")
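The `ensembldb.seqnameNotFound` option controls what happens to sequence names that have no counterpart in the requested naming style; setting it back to "ORIGINAL" keeps such names unchanged. A small base-R sketch of that fallback; the mapping table and `rename_seqlevels` are hypothetical and hand-written for illustration only. The package itself does not rely on a table like this one.

```r
## Hypothetical, hand-written Ensembl -> UCSC table for a few sequences.
ens2ucsc <- c("1" = "chr1", "X" = "chrX", "Y" = "chrY", "MT" = "chrM")

## Hypothetical helper mimicking the "not found" behaviour: keep the
## original name when not_found is "ORIGINAL", otherwise use not_found
## (e.g. NA) for names missing from the table.
rename_seqlevels <- function(x, map, not_found = "ORIGINAL") {
    out <- unname(map[x])          # NA for names without a mapping
    miss <- is.na(out)
    out[miss] <- if (identical(not_found, "ORIGINAL")) x[miss] else not_found
    out
}

rename_seqlevels(c("1", "MT", "GL000192.1"), ens2ucsc)
rename_seqlevels(c("1", "MT", "GL000192.1"), ens2ucsc, not_found = NA)
```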
 
-## ----warning=FALSE, message=FALSE----------------------------------------
+## ----extractTranscriptSeqs-BSGenome, warning = FALSE, message = FALSE------
 library(BSgenome.Hsapiens.UCSC.hg19)
 bsg <- BSgenome.Hsapiens.UCSC.hg19
 
@@ -201,19 +211,21 @@ unique(genome(edb))
 ## Although differently named, both represent genome build GRCh37.
 
 ## Extract the full transcript sequences.
-yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx",
+					      filter = SeqNameFilter("chrY")))
 
 yTxSeqs
 
 ## Extract just the CDS
-Test <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
-yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+Test <- cdsBy(edb, "tx", filter = SeqNameFilter("chrY"))
+yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx",
+					   filter = SeqNameFilter("chrY")))
 yTxCds
 
-## ------------------------------------------------------------------------
+## ----seqlevelsStyle-restore------------------------------------------------
 seqlevelsStyle(edb) <- "Ensembl"
 
-## ----gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25----
+## ----gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.3----
 ## Loading the Gviz library
 library(Gviz)
 library(EnsDb.Hsapiens.v75)
@@ -234,7 +246,7 @@ plotTracks(list(gat, GeneRegionTrack(gr)))
 
 options(ucscChromosomeNames = TRUE)
 
-## ----message=FALSE-------------------------------------------------------
+## ----message=FALSE---------------------------------------------------------
 seqlevelsStyle(edb) <- "UCSC"
 ## Retrieving the GRanges objects with seqnames corresponding to UCSC chromosome names.
 gr <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
@@ -247,10 +259,10 @@ plotTracks(list(gat, GeneRegionTrack(gr)))
 ## ----gviz-separate-tracks, message=FALSE, warning=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25----
 protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
 				     start = 20400000, end = 21400000,
-				     filter = GenebiotypeFilter("protein_coding"))
+				     filter = GeneBiotypeFilter("protein_coding"))
 lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
 				   start = 20400000, end = 21400000,
-				   filter = GenebiotypeFilter("lincRNA"))
+				   filter = GeneBiotypeFilter("lincRNA"))
 
 plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
 		GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")
@@ -258,7 +270,19 @@ plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
 ## Finally, we change the seqlevels style back to Ensembl
 seqlevelsStyle(edb) <- "Ensembl"
 
-## ------------------------------------------------------------------------
+## ----pplot-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=4----
+library(ggbio)
+
+## Create a plot for all transcripts of the gene SKA2
+autoplot(edb, ~ genename == "SKA2")
+
+## ----pplot-plot-2, message=FALSE, fig.align='center', fig.width=7.5, fig.height=4----
+## Get the chromosomal region in which the gene is encoded
+ska2 <- genes(edb, filter = ~ genename == "SKA2")
+strand(ska2) <- "*"
+autoplot(edb, GRangesFilter(ska2), names.expr = "gene_name")
+
+## ----AnnotationDbi, message = FALSE----------------------------------------
 library(EnsDb.Hsapiens.v75)
 edb <- EnsDb.Hsapiens.v75
 
@@ -279,20 +303,20 @@ gids <- keys(edb, keytype = "GENEID")
 length(gids)
 
 ## Get all gene names for genes encoded on chromosome Y.
-gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+gnames <- keys(edb, keytype = "GENENAME", filter = SeqNameFilter("Y"))
 head(gnames)
 
-## ----warning=FALSE-------------------------------------------------------
+## ----select, message = FALSE, warning=FALSE--------------------------------
 ## Use the /standard/ way to fetch data.
 select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GENENAME",
        columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
 
 ## Use the filtering system of ensembldb
-select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-			TxbiotypeFilter("protein_coding")),
+select(edb, keys = ~ genename %in% c("BCL2", "BCL2L11") &
+		tx_biotype == "protein_coding",
        columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
 
-## ------------------------------------------------------------------------
+## ----mapIds, message = FALSE-----------------------------------------------
 ## Use the default method, which just returns the first value for multi mappings.
 mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME")
 
@@ -302,10 +326,29 @@ mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME",
 
 ## And, just like before, we can use filters to map only to protein coding transcripts.
 mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-			TxbiotypeFilter("protein_coding")), column = "TXID",
+			TxBiotypeFilter("protein_coding")), column = "TXID",
        multiVals = "list")
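`mapIds` has to decide what to return when one key maps to several values, as gene names do to transcript IDs: the default keeps the first value, whereas `multiVals = "list"` keeps all of them. A base-R sketch of the two behaviours on made-up identifiers (`GENE_A`, `TX_1`, ... are placeholders, not real annotations):

```r
## Made-up one-to-many mapping from gene names to transcript IDs.
map <- data.frame(GENENAME = c("GENE_A", "GENE_A", "GENE_B"),
                  TXID     = c("TX_1", "TX_2", "TX_3"),
                  stringsAsFactors = FALSE)
keys <- c("GENE_A", "GENE_B")

## Group the transcript IDs by key, preserving the order of `keys`.
by_key <- split(map$TXID, factor(map$GENENAME, levels = keys))

## Like multiVals = "first": a named character vector, one value per key.
first <- vapply(by_key, function(v) v[1], character(1))

## Like multiVals = "list": a named list with all values per key.
all_vals <- by_key
```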
 
-## ----eval=FALSE----------------------------------------------------------
+## ----AnnotationHub-query, message = FALSE, eval = use_network--------------
+#  library(AnnotationHub)
+#  ## Load the annotation resource.
+#  ah <- AnnotationHub()
+#  
+#  ## Query for all available EnsDb databases
+#  query(ah, "EnsDb")
+
+## ----AnnotationHub-query-2, message = FALSE, eval = use_network------------
+#  ahDb <- query(ah, pattern = c("Xiphophorus Maculatus", "EnsDb", 87))
+#  ## What have we got
+#  ahDb
+
+## ----AnnotationHub-fetch, message = FALSE, eval = FALSE--------------------
+#  ahEdb <- ahDb[[1]]
+#  
+#  ## retrieve all genes
+#  gns <- genes(ahEdb)
+
+## ----edb-from-ensembl, message = FALSE, eval = FALSE-----------------------
 #  library(ensembldb)
 #  
 #  ## get all human gene/transcript/exon annotations from Ensembl (75)
@@ -323,7 +366,7 @@ mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
 #  		     maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
 #  		     author = "J Rainer")
 
-## ----eval=FALSE----------------------------------------------------------
+## ----gtf-gff-edb, message = FALSE, eval = FALSE----------------------------
 #  ## Load the AnnotationHub data.
 #  library(AnnotationHub)
 #  ah <- AnnotationHub()
@@ -345,28 +388,27 @@ mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
 #  Dna <- getGenomeFaFile(edb)
 #  library(Rsamtools)
 #  ## We next retrieve the sequence of all exons on chromosome Y.
-#  exons <- exons(edb, filter = SeqnameFilter("Y"))
+#  exons <- exons(edb, filter = SeqNameFilter("Y"))
 #  exonSeq <- getSeq(Dna, exons)
 #  
 #  ## Alternatively, look up and retrieve the toplevel DNA sequence manually.
 #  Dna <- ah[["AH22042"]]
 
-## ----message=FALSE-------------------------------------------------------
-## Generate a sqlite database from a GRanges object specifying
-## genes encoded on chromosome Y
-load(system.file("YGRanges.RData", package = "ensembldb"))
-Y
-
-DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
-		       organism = "Homo_sapiens")
-
-edb <- EnsDb(DB)
-edb
-
-## As shown in the example below, we could make an EnsDb package on
-## this DB object using the makeEnsembldbPackage function.
+## ----EnsDb-from-Y-GRanges, message = FALSE, eval = use_network-------------
+#  ## Generate a sqlite database from a GRanges object specifying
+#  ## genes encoded on chromosome Y
+#  load(system.file("YGRanges.RData", package = "ensembldb"))
+#  Y
+#  
+#  ## Create the EnsDb database file
+#  DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
+#  		       organism = "Homo_sapiens")
+#  
+#  ## Load the database
+#  edb <- EnsDb(DB)
+#  edb
 
-## ----eval=FALSE----------------------------------------------------------
+## ----EnsDb-from-GTF, message = FALSE, eval = FALSE-------------------------
 #  library(ensembldb)
 #  
 #  ## the GTF file can be downloaded from
diff --git a/inst/doc/ensembldb.Rmd b/inst/doc/ensembldb.Rmd
index 44420d6..7bbf10c 100644
--- a/inst/doc/ensembldb.Rmd
+++ b/inst/doc/ensembldb.Rmd
@@ -1,37 +1,33 @@
 ---
 title: "Generating and using Ensembl based annotation packages"
+author: "Johannes Rainer"
 graphics: yes
+package: ensembldb
 output:
-  BiocStyle::html_document2
+  BiocStyle::html_document2:
+    toc_float: true
 vignette: >
   %\VignetteIndexEntry{Generating and using Ensembl based annotation packages}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
-  %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,Gviz,BiocStyle}
-  %\VignettePackage{ensembldb}
-  %\VignetteKeywords{annotation,database}
+  %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle,AnnotationHub,ggbio,Gviz}
 ---
 
-**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
-**Authors**: `r packageDescription("ensembldb")$Author`<br />
-**Modified**: 12 September, 2016<br />
-**Compiled**: `r date()`
 
 # Introduction
 
 The `ensembldb` package provides functions to create and use transcript centric
 annotation databases/packages. The annotations for the databases are directly
-fetched from Ensembl <sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> using their Perl API.  The functionality and data is
-similar to that of the `TxDb` packages from the `GenomicFeatures` package, but,
-in addition to retrieve all gene/transcript models and annotations from the
+fetched from Ensembl <sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> using their Perl API. The functionality and data are
+similar to those of the `TxDb` packages from the `GenomicFeatures` package, but, in
+addition to retrieving all gene/transcript models and annotations from the
 database, the `ensembldb` package also provides a filter framework to
 retrieve annotations for specific entries like genes encoded on a chromosome
-region or transcript models of lincRNA genes.  In the databases, along with the
-gene and transcript models and their chromosomal coordinates, additional
-annotations including the gene name (symbol) and NCBI Entrezgene identifiers as
-well as the gene and transcript biotypes are stored too (see Section
-[11](#orgtarget1) for the database layout and an overview of available
-attributes/columns).
+region or transcript models of lincRNA genes. From version 1.7 on, `EnsDb`
+databases created by the `ensembldb` package also contain protein annotation data
+(see Section [11](#org35014ed) for the database layout and an overview of
+available attributes/columns). For more information on the use of the protein
+annotations, refer to the *proteins* vignette.
 
 Another main goal of this package is to generate *versioned* annotation
 packages, i.e. annotation packages that are built for a specific Ensembl
@@ -43,10 +39,10 @@ also allows to load multiple annotation packages at the same time in order to
 e.g. compare gene models between Ensembl releases.
 
 In the example below we load an Ensembl based annotation package for Homo
-sapiens, Ensembl version 75. The connection to the database is bound to the
-variable `EnsDb.Hsapiens.v75`.
+sapiens, Ensembl version 75. The `EnsDb` object providing access to the underlying
+SQLite database is bound to the variable name `EnsDb.Hsapiens.v75`.
 
-```{r warning=FALSE, message=FALSE}
+```{r load-libs, warning=FALSE, message=FALSE}
 library(EnsDb.Hsapiens.v75)
 
 ## Making a "short cut"
@@ -54,72 +50,107 @@ edb <- EnsDb.Hsapiens.v75
 ## print some information for this package
 edb
 
-## for what organism was the database generated?
+## For what organism was the database generated?
 organism(edb)
 ```
 
+```{r no-network, echo = FALSE, results = "hide"}
+## Disable code chunks that require network connection - conditionally
+## disable this on Windows only. This is to avoid TIMEOUT errors on the
+## Bioconductor Windows build machine (issue #47).
+use_network <- FALSE
+```
+
+
 # Using `ensembldb` annotation packages to retrieve specific annotations
 
-The `ensembldb` package provides a set of filter objects allowing to specify
-which entries should be fetched from the database. The complete list of filters,
-which can be used individually or can be combined, is shown below (in
-alphabetical order):
+One of the strengths of the `ensembldb` package and the related `EnsDb` databases is
+their filter framework, which allows data subsets to be extracted efficiently
+from the databases. The `ensembldb` package supports most of the
+filters defined in the `AnnotationFilter` Bioconductor package and defines some
+additional filters specific to the data stored in `EnsDb` databases. The
+`supportedFilters` method lists all supported filter classes; each of them
+(except the `GRangesFilter`) works on a single column/field in the database.
+
+```{r filters}
+supportedFilters(edb)
+```
+
+These filters can be divided into 3 main filter types:
 
--   `ExonidFilter`: allows to filter the result based on the (Ensembl) exon
-    identifiers.
--   `ExonrankFilter`: filter results on the rank (index) of an exon within the
+-   `IntegerFilter`: filter classes extending this basic class take a single
+    numeric value as input and support the conditions `==`, `!=`, `>`, `<`, `>=`
+    and `<=`. All filters that work on chromosomal coordinates, such as the
+    `GeneEndFilter`, extend `IntegerFilter`.
+-   `CharacterFilter`: filter classes extending this class take one or more
+    character values as input and support the conditions `==`, `!=`, "startsWith"
+    and "endsWith". All filters working on IDs extend this class.
+-   `GRangesFilter`: takes a `GRanges` object as input and supports the overlap
+    types of `findOverlaps` from the `IRanges` package ("any", "start", "end",
+    "within", "equal"). Note that the type has to be passed to the constructor
+    function with the parameter `type`.
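To make the conditions of the three filter types concrete, here is a base-R sketch that applies them to plain vectors. `apply_condition`, `overlaps_any` and `is_within` are hypothetical helpers that mirror the semantics described above; they are not functions of `ensembldb` or `AnnotationFilter`, and the overlap helpers ignore strands. The query range is the one used in the `GRangesFilter` example of the script. The example also shows that "startsWith" treats `%` literally, unlike the SQL-style "like" pattern used with earlier versions of the filters.

```r
## Hypothetical helper: apply a filter condition to a plain vector.
## "==", "!=", "startsWith" and "endsWith" accept several values (as for
## CharacterFilter); ">", "<", ">=" and "<=" expect a single number (as for
## IntegerFilter).
apply_condition <- function(x, condition, value) {
    any_of <- function(f) Reduce(`|`, lapply(value, function(v) f(x, v)))
    switch(condition,
           "=="         = x %in% value,
           "!="         = !(x %in% value),
           "startsWith" = any_of(startsWith),
           "endsWith"   = any_of(endsWith),
           ">"  = x > value,  "<"  = x < value,
           ">=" = x >= value, "<=" = x <= value,
           stop("unsupported condition: ", condition))
}

## Hypothetical helpers for the GRangesFilter types "any" and "within",
## on closed 1-based intervals [start, end] of a single chromosome.
overlaps_any <- function(start, end, qstart, qend) start <= qend & end >= qstart
is_within    <- function(start, end, qstart, qend) start >= qstart & end <= qend

gene_ids <- c("ENSG00000141510", "LRG_321", "ENSG00000012048")
apply_condition(gene_ids, "startsWith", "ENSG")   # TRUE FALSE TRUE
apply_condition(gene_ids, "startsWith", "ENSG%")  # all FALSE: "%" is literal
apply_condition(c(5, 10, 15), ">=", 10)           # FALSE TRUE TRUE

starts <- c(113990000, 114000010, 114000040)
ends   <- c(114000020, 114000030, 114000100)
overlaps_any(starts, ends, 114000000, 114000050)  # TRUE TRUE TRUE
is_within(starts, ends, 114000000, 114000050)     # FALSE TRUE FALSE
```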
+
+The supported filters are:
+
+-   `EntrezFilter`: filter based on the NCBI Entrezgene identifiers of the
+    genes.
+-   `ExonEndFilter`: filter using the chromosomal end coordinate of exons.
+-   `ExonIdFilter`: filter based on the (Ensembl) exon identifiers.
+-   `ExonRankFilter`: filter based on the rank (index) of an exon within the
     transcript model. Exons are always numbered from 5' to 3' end of the
     transcript, thus, also on the reverse strand, the exon 1 is the most 5' exon
     of the transcript.
--   `EntrezidFilter`: allows to filter results based on NCBI Entrezgene
-    identifiers of the genes.
--   `GenebiotypeFilter`: allows to filter for the gene biotypes defined in the
-    Ensembl database; use the `listGenebiotypes` method to list all available
-    biotypes.
--   `GeneidFilter`: allows to filter based on the Ensembl gene IDs.
--   `GenenameFilter`: allows to filter based on the names (symbols) of the genes.
--   `SymbolFilter`: allows to filter on gene symbols; note that no database columns
-    *symbol* is available in an `EnsDb` database and hence the gene name is used for
-    filtering.
+-   `ExonStartFilter`: filter using the chromosomal start coordinate of exons.
+-   `GeneBiotypeFilter`: filter using the gene biotypes defined in the Ensembl
+    database; use the `listGenebiotypes` method to list all available biotypes.
+-   `GeneEndFilter`: filter using the chromosomal end coordinate of genes.
+-   `GeneIdFilter`: filter based on the Ensembl gene IDs.
+-   `GenenameFilter`: filter based on the names (symbols) of the genes.
+-   `GeneStartFilter`: filter using the chromosomal start coordinate of genes.
 -   `GRangesFilter`: allows to retrieve all features (genes, transcripts or exons)
-    that are either within (setting `condition` to "within") or partially
-    overlapping (setting `condition` to "overlapping") the defined genomic
-    region/range. Note that, depending on the called method (`genes`, `transcripts`
-    or `exons`) the start and end coordinates of either the genes, transcripts or
-    exons are used for the filter. For methods `exonsBy`, `cdsBy` and `txBy` the
-    coordinates of `by` are used.
--   `SeqendFilter`: filter based on the chromosomal end coordinate of the exons,
-    transcripts or genes (correspondingly set =feature = "exon"=, =feature = "tx"= or
-    =feature = "gene"=).
--   `SeqnameFilter`: filter by the name of the chromosomes the genes are encoded
+    that are either within (setting parameter `type` to "within") or partially
+    overlapping (setting `type` to "any") the defined genomic region/range. Note
+    that, depending on the called method (`genes`, `transcripts` or `exons`), the start
+    and end coordinates of either the genes, transcripts or exons are used for the
+    filter. For the methods `exonsBy`, `cdsBy` and `transcriptsBy`, the coordinates of
+    the features specified with `by` are used.
+-   `SeqNameFilter`: filter by the name of the chromosomes the genes are encoded
     on.
--   `SeqstartFilter`: filter based on the chromosomal start coordinates of the
-    exons, transcripts or genes (correspondingly set =feature = "exon"=,
-    =feature = "tx"= or =feature = "gene"=).
--   `SeqstrandFilter`: filter for the chromosome strand on which the genes are
+-   `SeqStrandFilter`: filter for the chromosome strand on which the genes are
     encoded.
--   `TxbiotypeFilter`: filter on the transcript biotype defined in Ensembl; use
+-   `SymbolFilter`: filter on gene symbols; note that no database column *symbol* is
+    available in an `EnsDb` database, hence the gene name is used for filtering.
+-   `TxBiotypeFilter`: filter on the transcript biotype defined in Ensembl; use
     the `listTxbiotypes` method to list all available biotypes.
--   `TxidFilter`: filter on the Ensembl transcript identifiers.
-
-Each of the filter classes can take a single value or a vector of values (with
-the exception of the `SeqendFilter` and `SeqstartFilter`) for comparison. In
-addition, it is possible to specify the *condition* for the filter,
-e.g. setting `condition` to = to retrieve all entries matching the filter value,
-to != to negate the filter or setting `condition = "like"= to allow
-partial matching. The =condition` parameter for `SeqendFilter` and
-`SeqendFilter` can take the values = , >, >=, < and <= (since these
-filters base on numeric values).
-
-A simple example would be to get all transcripts for the gene *BCL2L11*. To this
-end we specify a `GenenameFilter` with the value *BCL2L11*. As a result we get
-a `GRanges` object with `start`, `end`, `strand` and `seqname` of the `GRanges`
-object being the start coordinate, end coordinate, chromosome name and strand
-for the respective transcripts. All additional annotations are available as
-metadata columns. Alternatively, by setting `return.type` to "DataFrame", or
-"data.frame" the method would return a `DataFrame` or `data.frame` object.
-
-```{r }
+-   `TxEndFilter`: filter using the chromosomal end coordinate of transcripts.
+-   `TxIdFilter`: filter on the Ensembl transcript identifiers.
+-   `TxNameFilter`: filter on the Ensembl transcript names (currently identical to
+    the transcript IDs).
+-   `TxStartFilter`: filter using the chromosomal start coordinate of transcripts.
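The `ExonRankFilter` entry relies on exons being numbered from the 5' to the 3' end of the transcript, so on the reverse strand exon 1 is the exon with the highest genomic coordinates. A base-R sketch of that numbering; `exon_rank` is a hypothetical helper and the coordinates are made up. The last line mimics `ExonRankFilter(3, condition = "<")` from the script, which keeps exons 1 and 2.

```r
## Hypothetical helper: rank exons from the 5' to the 3' end of a transcript.
## On the "-" strand the 5'-most exon has the largest start coordinate.
exon_rank <- function(exon_start, strand = c("+", "-")) {
    strand <- match.arg(strand)
    ord <- order(exon_start, decreasing = (strand == "-"))
    rank <- integer(length(exon_start))
    rank[ord] <- seq_along(ord)     # position in 5' -> 3' order
    rank
}

## Three exons of a made-up transcript, listed in genomic order.
starts <- c(1000L, 2000L, 3000L)
exon_rank(starts, "+")   # 1 2 3
exon_rank(starts, "-")   # 3 2 1

## Keep only exons with rank < 3 on the reverse strand.
starts[exon_rank(starts, "-") < 3]   # 2000 3000
```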
+
+In addition to the above listed *DNA-RNA-based* filters, *protein-specific*
+filters are also available: 
+
+-   `ProtDomIdFilter`: filter by the protein domain ID.
+-   `ProteinIdFilter`: filter by the Ensembl protein ID.
+-   `UniprotDbFilter`: filter by the name of the Uniprot database.
+-   `UniprotFilter`: filter by the Uniprot ID.
+-   `UniprotMappingTypeFilter`: filter by the mapping type of Ensembl protein IDs to
+    Uniprot IDs.
+
+However, these can only be used on `EnsDb` databases that provide protein
+annotations, i.e. for which a call to `hasProteinData` returns `TRUE`.
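+
+Before using one of these filters, it can help to check which filters a database
+supports and whether its protein tables are present. The chunk below is a
+minimal sketch. It assumes that the `EnsDb.Hsapiens.v75` package used throughout
+this vignette is installed, and it relies on the `supportedFilters`,
+`hasProteinData` and `proteins` methods. The exact layout of the
+`supportedFilters` result might differ between `ensembldb` versions.
+
+```{r supportedFilters-hasProteinData, message = FALSE}
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## List the filters (and their field names) supported by this database.
+flts <- supportedFilters(edb)
+head(flts)
+
+## Protein-specific filters and columns require the protein tables.
+if (hasProteinData(edb)) {
+    ## Fetch the protein annotations for the gene BCL2.
+    prts <- proteins(edb, filter = GenenameFilter("BCL2"))
+    prts
+}
+```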
+
+A simple use case for the filter framework is to get all transcripts of the
+gene *BCL2L11*. To this end, we specify a `GenenameFilter` with the value
+*BCL2L11*. As a result we get a `GRanges` object whose `start`, `end`, `seqnames`
+and `strand` are the start coordinate, end coordinate, chromosome name and strand
+of the respective transcripts. All additional annotations are available as
+metadata columns. Alternatively, by setting `return.type` to `"DataFrame"` or
+`"data.frame"`, the method returns a `DataFrame` or `data.frame` object instead
+of the default `GRanges`.
+
+```{r transcripts}
 Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
 
 Tx
@@ -131,22 +162,34 @@ head(start(Tx))
 head(Tx$tx_biotype)
 ```
 
-The parameter `columns` of the `exons`, `genes` and `transcripts` method allows
-to specify which database attributes (columns) should be retrieved. The `exons`
-method returns by default all exon-related columns, the `transcripts` all columns
-from the transcript database table and the `genes` all from the gene table. Note
-however that in the example above we got also a column `gene_name` although this
-column is not present in the transcript database table. By default the methods
-return also all columns that are used by any of the filters submitted with the
-`filter` argument (thus, because a `GenenameFilter` was used, the column `gene_name`
-is also returned). Setting `returnFilterColumns(edb) <- FALSE` disables this
-option and only the columns specified by the `columns` parameter are retrieved.
+The `columns` parameter of the extractor methods (such as `exons`, `genes` or
+`transcripts`) specifies which database attributes (columns) should be
+retrieved. By default, the `exons` method returns all exon-related columns, the
+`transcripts` method all columns from the transcript database table and the
+`genes` method all columns from the gene table. Note, however, that in the
+example above we also got a column `gene_name`, although this column is not
+present in the transcript database table. By default, the methods also return
+all columns used by any of the filters submitted with the `filter` argument
+(thus, because a `GenenameFilter` was used, the column `gene_name` is also
+returned). Setting `returnFilterColumns(edb) <- FALSE` disables this option, and
+only the columns specified by the `columns` parameter are retrieved.
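+
+The effect of `returnFilterColumns` can be sketched as below, assuming that
+`EnsDb.Hsapiens.v75` is installed. With the option disabled, only the requested
+`tx_id` and `tx_biotype` columns should be returned, even though the filter is
+defined on the gene name. The option is changed on the local copy `edb` and
+restored at the end.
+
+```{r returnFilterColumns-example, message = FALSE}
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Do not add the columns used by the filters to the result.
+returnFilterColumns(edb) <- FALSE
+tx <- transcripts(edb, columns = c("tx_id", "tx_biotype"),
+                  filter = GenenameFilter("BCL2L11"),
+                  return.type = "DataFrame")
+colnames(tx)
+
+## Restore the default behaviour.
+returnFilterColumns(edb) <- TRUE
+```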
+
+Instead of passing a filter *object* to the method, it is also possible to
+provide a filter *expression* written as a `formula`.
+
+```{r transcripts-filter-expression}
+## Use a filter expression to perform the filtering.
+transcripts(edb, filter = ~ genename == "ZBTB16")
+```
+
+Filter expressions have to be written as a formula (i.e. starting with a `~`)
+in the form *filter field name* (e.g. `genename`), followed by the logical
+condition and the value.
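+
+A filter expression is translated into the corresponding filter objects. The
+sketch below uses the `AnnotationFilter` function from the `AnnotationFilter`
+package to show this translation. It assumes that this package is installed,
+and the printed output may vary between versions.
+
+```{r AnnotationFilter-translation, message = FALSE}
+library(AnnotationFilter)
+
+## A single condition is translated into a single filter object.
+flt <- AnnotationFilter(~ genename == "ZBTB16")
+flt
+
+## Conditions combined with & are translated into an AnnotationFilterList.
+flts <- AnnotationFilter(~ gene_biotype == "protein_coding" &
+                             seq_name %in% c("X", "Y"))
+flts
+```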
 
 To get an overview of database tables and available columns the function
 `listTables` can be used. The method `listColumns` on the other hand lists columns
 for the specified database table.
 
-```{r }
+```{r list-columns}
 ## list all database tables along with their columns
 listTables(edb)
 
@@ -161,10 +204,10 @@ the name of the gene for each transcript. Note that we are changing here the
 `return.type` to `DataFrame`, so the method will return a `DataFrame` with the
 results instead of the default `GRanges`.
 
-```{r }
+```{r transcripts-example2}
 Tx <- transcripts(edb,
 		  columns = c(listColumns(edb , "tx"), "gene_name"),
-		  filter = TxbiotypeFilter("nonsense_mediated_decay"),
+		  filter = TxBiotypeFilter("nonsense_mediated_decay"),
 		  return.type = "DataFrame")
 nrow(Tx)
 Tx
@@ -174,8 +217,8 @@ For protein coding transcripts, we can also specifically extract their coding
 region. In the example below we extract the CDS for all transcripts encoded on
 chromosome Y.
 
-```{r }
-yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+```{r cdsBy}
+yCds <- cdsBy(edb, filter = SeqNameFilter("Y"))
 yCds
 ```
 
@@ -185,10 +228,10 @@ below we query all genes that are partially overlapping with a small region on
 chromosome 11. The filter restricts to all genes for which either an exon or an
 intron is partially overlapping with the region.
 
-```{r }
+```{r genes-GRangesFilter}
 ## Define the filter
 grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
-			     strand = "+"), condition = "overlapping")
+			     strand = "+"), type = "any")
 
 ## Query genes:
 gn <- genes(edb, filter = grf)
@@ -217,7 +260,7 @@ region. Below we fetch these 4 transcripts. Note, that a call to `exons` will
 not return any features from the database, as no exon is overlapping with the
 region.
 
-```{r }
+```{r transcripts-GRangesFilter}
 transcripts(edb, filter = grf)
 ```
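+
+For comparison, the sketch below uses the same region with `type = "within"`.
+This assumes the semantics documented for `GRangesFilter` in the
+`AnnotationFilter` package, under which only features completely contained in
+the region are kept. Such a query can therefore return at most the genes found
+with `type = "any"`.
+
+```{r GRangesFilter-within, message = FALSE}
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Same region as above, but require features to lie completely within it.
+grw <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
+                             strand = "+"), type = "within")
+gnw <- genes(edb, filter = grw)
+gnw
+```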
 
@@ -229,11 +272,11 @@ overlapping genomic regions using the `exonsByOverlaps` or
 implementation of these methods for `EnsDb` objects supports also to use filters
 to further fine-tune the query.
 
-To get an overview of allowed/available gene and transcript biotype the
-functions `listGenebiotypes` and `listTxbiotypes` can be used.
+The functions `listGenebiotypes` and `listTxbiotypes` can be used to get an
+overview of the allowed/available gene and transcript biotypes.
 
-```{r }
-## Get all gene biotypes from the database. The GenebiotypeFilter
+```{r biotypes}
+## Get all gene biotypes from the database. The GeneBiotypeFilter
 ## allows to filter on these values.
 listGenebiotypes(edb)
 
@@ -245,13 +288,13 @@ Data can be fetched in an analogous way using the `exons` and `genes`
 methods. In the example below we retrieve `gene_name`, `entrezid` and the
 `gene_biotype` of all genes in the database which names start with "BCL2".
 
-```{r }
+```{r genes-BCL2}
 ## We're going to fetch all genes which names start with BCL. To this end
 ## we define a GenenameFilter with partial matching, i.e. condition "like"
 ## and a % for any character/string.
 BCLs <- genes(edb,
 	      columns = c("gene_name", "entrezid", "gene_biotype"),
-	      filter = list(GenenameFilter("BCL%", condition = "like")),
+	      filter = GenenameFilter("BCL", condition = "startsWith"),
 	      return.type = "DataFrame")
 nrow(BCLs)
 BCLs
@@ -261,20 +304,21 @@ Sometimes it might be useful to know the length of genes or transcripts
 (i.e. the total sum of nucleotides covered by their exons). Below we calculate
 the mean length of transcripts from protein coding genes on chromosomes X and Y
 as well as the average length of snoRNA, snRNA and rRNA transcripts encoded on
-these chromosomes.
+these chromosomes. For the first query we combine two `AnnotationFilter` objects
+using an `AnnotationFilterList` object; in the second we define the query using
+a filter expression.
 
-```{r }
+```{r example-AnnotationFilterList}
 ## determine the average length of snRNA, snoRNA and rRNA genes encoded on
 ## chromosomes X and Y.
-mean(lengthOf(edb, of = "tx",
-	      filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
-			    SeqnameFilter(c("X", "Y")))))
+mean(lengthOf(edb, of = "tx", filter = AnnotationFilterList(
+				  GeneBiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+				  SeqNameFilter(c("X", "Y")))))
 
 ## determine the average length of protein coding genes encoded on the same
 ## chromosomes.
-mean(lengthOf(edb, of = "tx",
-	      filter = list(GenebiotypeFilter("protein_coding"),
-			    SeqnameFilter(c("X", "Y")))))
+mean(lengthOf(edb, of = "tx", filter = ~ gene_biotype == "protein_coding" &
+				  seq_name %in% c("X", "Y")))
 ```
 
 Not unexpectedly, transcripts of protein coding genes are longer than those of
@@ -283,14 +327,15 @@ snRNA, snoRNA or rRNA genes.
 At last we extract the first two exons of each transcript model from the
 database.
 
-```{r }
+```{r example-first-two-exons}
 ## Extract all exons 1 and (if present) 2 for all genes encoded on the
 ## Y chromosome
 exons(edb, columns = c("tx_id", "exon_idx"),
-      filter = list(SeqnameFilter("Y"),
-		    ExonrankFilter(3, condition = "<")))
+      filter = list(SeqNameFilter("Y"),
+		    ExonRankFilter(3, condition = "<")))
 ```
 
+
 # Extracting gene/transcript/exon models for RNASeq feature counting
 
 For the feature counting step of an RNAseq experiment, the gene or transcript
@@ -307,10 +352,8 @@ CDS.
 A simple use case is to retrieve all genes encoded on chromosomes X and Y from
 the database.
 
-```{r }
-TxByGns <- transcriptsBy(edb, by = "gene",
-			 filter = list(SeqnameFilter(c("X", "Y")))
-			 )
+```{r transcriptsBy-X-Y}
+TxByGns <- transcriptsBy(edb, by = "gene", filter = SeqNameFilter(c("X", "Y")))
 TxByGns
 ```
 
@@ -319,17 +362,17 @@ Since Ensembl contains also definitions of genes that are on chromosome variants
 gene models should be returned.
 
 In a real use case, we might thus want to retrieve all genes encoded on the
-*standard* chromosomes. In addition it is advisable to use a `GeneidFilter` to
+*standard* chromosomes. In addition it is advisable to use a `GeneIdFilter` to
 restrict to Ensembl genes only, as also *LRG* (Locus Reference Genomic)
 genes<sup><a id="fnr.2" class="footref" href="#fn.2">2</a></sup> are defined in the database, which are partially redundant with
 Ensembl genes.
 
-```{r eval=FALSE}
+```{r exonsBy-RNAseq, message = FALSE, eval = FALSE}
 ## will just get exons for all genes on chromosomes 1 to 22, X and Y.
 ## Note: want to get rid of the "LRG" genes!!!
-EnsGenes <- exonsBy(edb, by = "gene",
-		    filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-				  GeneidFilter("ENSG%", "like")))
+EnsGenes <- exonsBy(edb, by = "gene", filter = AnnotationFilterList(
+					  SeqNameFilter(c(1:22, "X", "Y")),
+					  GeneIdFilter("ENSG", "startsWith")))
 ```
 
 The code above returns a `GRangesList` that can be used directly as an input for
@@ -339,7 +382,7 @@ Alternatively, the above `GRangesList` can be transformed to a `data.frame` in
 *SAF* format that can be used as an input to the `featureCounts` function of the
 `Rsubread` package <sup><a id="fnr.4" class="footref" href="#fn.4">4</a></sup>.
 
-```{r eval=FALSE}
+```{r toSAF-RNAseq, message = FALSE, eval=FALSE}
 ## Transforming the GRangesList into a data.frame in SAF format
 EnsGenes.SAF <- toSAF(EnsGenes)
 ```
@@ -353,13 +396,14 @@ In addition, the `disjointExons` function (similar to the one defined in
 `GenomicFeatures`) can be used to generate a `GRanges` of non-overlapping exon
 parts which can be used in the `DEXSeq` package.
 
-```{r eval=FALSE}
+```{r disjointExons, message = FALSE, eval=FALSE}
 ## Create a GRanges of non-overlapping exon parts.
-DJE <- disjointExons(edb,
-		     filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-				   GeneidFilter("ENSG%", "like")))
+DJE <- disjointExons(edb, filter = AnnotationFilterList(
+			      SeqNameFilter(c(1:22, "X", "Y")),
+			      GeneIdFilter("ENSG", "startsWith")))
 ```
 
+
 # Retrieving sequences for gene/transcript/exon models
 
 The methods to retrieve exons, transcripts and genes (i.e. `exons`, `transcripts`
@@ -381,7 +425,7 @@ the package, subset to genes encoded on sequences available in the `FaFile` and
 extract all of their sequences. Note: these sequences represent the sequence
 between the chromosomal start and end coordinates of the gene.
 
-```{r eval=FALSE}
+```{r transcript-sequence-AnnotationHub, message = FALSE, eval = FALSE}
 library(EnsDb.Hsapiens.v75)
 library(Rsamtools)
 edb <- EnsDb.Hsapiens.v75
@@ -405,9 +449,9 @@ To retrieve the (exonic) sequence of transcripts (i.e. without introns) we can
 use directly the `extractTranscriptSeqs` method defined in the `GenomicFeatures` on
 the `EnsDb` object, eventually using a filter to restrict the query.
 
-```{r eval=FALSE}
+```{r transcript-sequence-extractTranscriptSeqs, message = FALSE, eval = FALSE}
 ## get all exons of all transcripts encoded on chromosome Y
-yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+yTx <- exonsBy(edb, filter = SeqNameFilter("Y"))
 
 ## Retrieve the sequences for these transcripts from the FaFile.
 library(GenomicFeatures)
@@ -415,17 +459,18 @@ yTxSeqs <- extractTranscriptSeqs(Dna, yTx)
 yTxSeqs
 
 ## Extract the sequences of all transcripts encoded on chromosome Y.
-yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqNameFilter("Y"))
 
 ## Along these lines, we could use the method also to retrieve the coding sequence
 ## of all transcripts on the Y chromosome.
-cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+cdsY <- cdsBy(edb, filter = SeqNameFilter("Y"))
 extractTranscriptSeqs(Dna, cdsY)
 ```
 
 Note: in the next section we describe how transcript sequences can be retrieved
 from a `BSgenome` package that is based on UCSC, not Ensembl.
 
+
 # Integrating annotations from Ensembl based  `EnsDb` packages with UCSC based annotations
 
 Sometimes it might be useful to combine (Ensembl based) annotations from `EnsDb`
@@ -440,12 +485,12 @@ UCSC, NCBI and Ensembl chromosome names for the *main* chromosomes).
 
 In the example below we change the seqnames style to UCSC.
 
-```{r message=FALSE}
+```{r seqlevelsStyle, message = FALSE}
 ## Change the seqlevels style form Ensembl (default) to UCSC:
 seqlevelsStyle(edb) <- "UCSC"
 
-## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
-genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+## Now we can use UCSC style seqnames in SeqNameFilters or GRangesFilter:
+genesY <- genes(edb, filter = ~ seq_name == "chrY")
 ## The seqlevels of the returned GRanges are also in UCSC style
 seqlevels(genesY)
 ```
@@ -459,7 +504,7 @@ ones from Ensembl) are returned. With `ensembldb.seqnameNotFound` "MISSING" each
 time a seqname can not be found an error is thrown. For all other cases
 (e.g. `ensembldb.seqnameNotFound = NA`) the value of the option is returned.
 
-```{r }
+```{r seqlevelsStyle-2, message = FALSE}
 seqlevelsStyle(edb) <- "UCSC"
 
 ## Getting the default option:
@@ -483,7 +528,7 @@ the `BSGenome` package for the human genome from UCSC. The specified version
 while we changed the style of the seqnames to UCSC we did not change the naming
 of the genome release.
 
-```{r warning=FALSE, message=FALSE}
+```{r extractTranscriptSeqs-BSGenome, warning = FALSE, message = FALSE}
 library(BSgenome.Hsapiens.UCSC.hg19)
 bsg <- BSgenome.Hsapiens.UCSC.hg19
 
@@ -493,22 +538,25 @@ unique(genome(edb))
 ## Although differently named, both represent genome build GRCh37.
 
 ## Extract the full transcript sequences.
-yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx",
+					      filter = SeqNameFilter("chrY")))
 
 yTxSeqs
 
 ## Extract just the CDS
-Test <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
-yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+Test <- cdsBy(edb, "tx", filter = SeqNameFilter("chrY"))
+yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx",
+					   filter = SeqNameFilter("chrY")))
 yTxCds
 ```
 
-At last changing the seqname style to the default value ="Ensembl"=.
+Finally, we change the seqname style back to the default value `"Ensembl"`.
 
-```{r }
+```{r seqlevelsStyle-restore}
 seqlevelsStyle(edb) <- "Ensembl"
 ```
 
+
 # Interactive annotation lookup using the `shiny` web app
 
 In addition to the `genes`, `transcripts` and `exons` methods it is possibly to
@@ -517,7 +565,8 @@ search interactively for gene/transcript/exon annotations using the internal,
 `runEnsDbApp()` function. The search results from this app can also be returned
 to the R workspace either as a `data.frame` or `GRanges` object.
 
-# Plotting gene/transcript features using `ensembldb` and `Gviz`
+
+# Plotting gene/transcript features using `ensembldb`, `Gviz` and `ggbio`
 
 The `Gviz` package provides functions to plot genes and transcripts along with
 other data on a genomic scale. Gene models can be provided either as a
@@ -535,7 +584,7 @@ not necessary if we just want to retrieve gene models from an `EnsDb` object, as
 the `ensembldb` package internally checks the `ucscChromosomeNames` option and,
 depending on that, maps Ensembl chromosome names to UCSC chromosome names.
 
-```{r gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25}
+```{r gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.3}
 ## Loading the Gviz library
 library(Gviz)
 library(EnsDb.Hsapiens.v75)
@@ -560,7 +609,7 @@ options(ucscChromosomeNames = TRUE)
 Above we had to change the option `ucscChromosomeNames` to `FALSE` in order to
 use it with non-UCSC chromosome names. Alternatively, we could however also
 change the `seqnamesStyle` of the `EnsDb` object to `UCSC`. Note that we have to
-use now also chromosome names in the *UCSC style* in the `SeqnameFilter`
+use now also chromosome names in the *UCSC style* in the `SeqNameFilter`
 (i.e. "chrY" instead of `Y`).
 
 ```{r message=FALSE}
@@ -581,10 +630,10 @@ different gene region tracks, one for protein coding genes and one for lincRNAs.
 ```{r gviz-separate-tracks, message=FALSE, warning=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25}
 protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
 				     start = 20400000, end = 21400000,
-				     filter = GenebiotypeFilter("protein_coding"))
+				     filter = GeneBiotypeFilter("protein_coding"))
 lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
 				   start = 20400000, end = 21400000,
-				   filter = GenebiotypeFilter("lincRNA"))
+				   filter = GeneBiotypeFilter("lincRNA"))
 
 plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
 		GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")
@@ -593,6 +642,28 @@ plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
 seqlevelsStyle <- "Ensembl"
 ```
 
+Alternatively, we can use `ggbio` for plotting. Its `autoplot` method accepts
+the `EnsDb` object directly, along with optional filters (or, as in the example
+below, a filter expression written as a `formula`).
+
+```{r pplot-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=4}
+library(ggbio)
+
+## Create a plot for all transcripts of the gene SKA2
+autoplot(edb, ~ genename == "SKA2")
+```
+
+To plot the whole genomic region, including genes encoded on both strands, we
+can use a `GRangesFilter` on the gene's region with its strand set to `"*"`.
+
+```{r pplot-plot-2, message=FALSE, fig.align='center', fig.width=7.5, fig.height=4}
+## Get the chromosomal region in which the gene is encoded
+ska2 <- genes(edb, filter = ~ genename == "SKA2")
+strand(ska2) <- "*"
+autoplot(edb, GRangesFilter(ska2), names.expr = "gene_name")
+```
+
+
 # Using `EnsDb` objects in the `AnnotationDbi` framework
 
 Most of the methods defined for objects extending the basic annotation package
@@ -605,7 +676,7 @@ In the example below we first evaluate all the available columns and keytypes in
 the database and extract then the gene names for all genes encoded on chromosome
 X.
 
-```{r }
+```{r AnnotationDbi, message = FALSE}
 library(EnsDb.Hsapiens.v75)
 edb <- EnsDb.Hsapiens.v75
 
@@ -626,7 +697,7 @@ gids <- keys(edb, keytype = "GENEID")
 length(gids)
 
 ## Get all gene names for genes encoded on chromosome Y.
-gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+gnames <- keys(edb, keytype = "GENENAME", filter = SeqNameFilter("Y"))
 head(gnames)
 ```
 
@@ -636,14 +707,14 @@ In the next example we retrieve specific information from the database using the
 we employ the filtering system to perform a more fine-grained query to fetch
 only the protein coding transcripts for these genes.
 
-```{r warning=FALSE}
+```{r select, message = FALSE, warning=FALSE}
 ## Use the /standard/ way to fetch data.
 select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GENENAME",
        columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
 
 ## Use the filtering system of ensembldb
-select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-			TxbiotypeFilter("protein_coding")),
+select(edb, keys = ~ genename %in% c("BCL2", "BCL2L11") &
+		tx_biotype == "protein_coding",
        columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
 ```
 
@@ -651,7 +722,7 @@ Finally, we use the `mapIds` method to establish a mapping between ids and
 values. In the example below we fetch transcript ids for the two genes from the
 example above.
 
-```{r }
+```{r mapIds, message = FALSE}
 ## Use the default method, which just returns the first value for multi mappings.
 mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME")
 
@@ -661,13 +732,14 @@ mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME",
 
 ## And, just like before, we can use filters to map only to protein coding transcripts.
 mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-			TxbiotypeFilter("protein_coding")), column = "TXID",
+			TxBiotypeFilter("protein_coding")), column = "TXID",
        multiVals = "list")
 ```
 
 Note that, if the filters are used, the ordering of the result does no longer
 match the ordering of the genes.
 
+
 # Important notes
 
 These notes might explain eventually unexpected results (and, more importantly,
@@ -691,38 +763,79 @@ help avoiding them):
 -   At present, `EnsDb` support only genes/transcripts for which all of their
     exons are encoded on the same chromosome and the same strand.
 
-# Building an transcript-centric database package based on Ensembl annotation
+-   Since a single Ensembl gene ID might be mapped to multiple NCBI Entrezgene
+    IDs, methods such as `genes` or `transcripts` return the `"entrezid"` column
+    of the result object as a `list`.
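+
+The list-type `entrezid` column can be inspected as sketched below, assuming
+that `EnsDb.Hsapiens.v75` is installed. `lengths` gives the number of Entrezgene
+IDs mapped to each gene, and `unlist` can be used to flatten the column if a
+single vector is needed.
+
+```{r entrezid-list, message = FALSE}
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+gns <- genes(edb, columns = c("gene_id", "gene_name", "entrezid"),
+             filter = GenenameFilter(c("BCL2", "BCL2L11")))
+## Each element holds all Entrezgene IDs mapped to one gene.
+gns$entrezid
+## Number of Entrezgene IDs per gene.
+lengths(gns$entrezid)
+```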
 
-The code in this section is not supposed to be automatically executed when the
-vignette is built, as this would require a working installation of the Ensembl
-Perl API, which is not expected to be available on each system. Also, building
-`EnsDb` from alternative sources, like GFF or GTF files takes some time and
-thus also these examples are not directly executed when the vignette is build.
 
-## Requirements
+# Getting or building `EnsDb` databases/packages
 
-The `fetchTablesFromEnsembl` function of the package uses the Ensembl Perl API
-to retrieve the required annotations from an Ensembl database (e.g. from the
-main site *ensembldb.ensembl.org*). Thus, to use the functionality to built
-databases, the Ensembl Perl API needs to be installed (see <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup> for details).
+Some of the code in this section is not executed automatically when the
+vignette is built, as this would require a working installation of the Ensembl
+Perl API, which is not expected to be available on every system. Also, building
+`EnsDb` databases from alternative sources, like GFF or GTF files, takes some
+time, and thus these examples are also not executed when the vignette is
+built.
+
+
+## Getting `EnsDb` databases
+
+Some `EnsDb` databases are available as `R` packages from Bioconductor and can
+simply be installed with the `biocLite` function from the `BiocInstaller`
+package. The names of such annotation packages start with *EnsDb*, followed by
+the abbreviation of the organism and the Ensembl version on which the
+annotation is based. `EnsDb.Hsapiens.v86`, for example, provides an `EnsDb`
+database for *Homo sapiens* with annotations from Ensembl version 86.
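+
+The organism and Ensembl version encoded in such a package name can also be
+queried from the loaded database. The chunk below is a minimal sketch, assuming
+that `EnsDb.Hsapiens.v75` (the package used elsewhere in this vignette) is
+installed.
+
+```{r EnsDb-organism-version, message = FALSE}
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Organism and Ensembl release of the annotation.
+organism(edb)
+ensemblVersion(edb)
+```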
+
+Since Bioconductor version 3.5, `EnsDb` databases can also be retrieved directly
+from `AnnotationHub`.
+
+```{r AnnotationHub-query, message = FALSE, eval = use_network}
+library(AnnotationHub)
+## Load the annotation resource.
+ah <- AnnotationHub()
+
+## Query for all available EnsDb databases
+query(ah, "EnsDb")
+```
+
+We can narrow the query down to the database of a single organism and Ensembl release.
+
+```{r AnnotationHub-query-2, message = FALSE, eval = use_network}
+ahDb <- query(ah, pattern = c("Xiphophorus Maculatus", "EnsDb", 87))
+## What have we got
+ahDb
+```
+
+Fetch the `EnsDb` and use it.
+
+```{r AnnotationHub-fetch, message = FALSE, eval = FALSE}
+ahEdb <- ahDb[[1]]
+
+## Retrieve all genes
+gns <- genes(ahEdb)
+```
+
+We could even make an annotation package from this `EnsDb` object using the
+`makeEnsembldbPackage` function, passing `dbfile(dbconn(ahEdb))` as the `ensdb`
+argument.
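+
+The same approach should work for any `EnsDb` backed by an SQLite file, not only
+for those from `AnnotationHub`. The sketch below builds a package skeleton from
+the locally installed `EnsDb.Hsapiens.v75` into a temporary directory. The
+maintainer and author values are placeholders.
+
+```{r makeEnsembldbPackage-local, message = FALSE, eval = FALSE}
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Path to the SQLite file backing the EnsDb object.
+dbf <- dbfile(dbconn(edb))
+
+## Create the package in a temporary directory; maintainer and author are
+## placeholder values.
+outDir <- file.path(tempdir(), "ensdb-pkg")
+dir.create(outDir, showWarnings = FALSE)
+makeEnsembldbPackage(ensdb = dbf, version = "0.0.1",
+                     maintainer = "Jane Doe <jane.doe@example.org>",
+                     author = "Jane Doe", destDir = outDir)
+list.files(outDir)
+```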
 
-Alternatively, the `ensDbFromAH`, `ensDbFromGff`, `ensDbFromGRanges` and `ensDbFromGtf`
-functions allow to build EnsDb SQLite files from a `GRanges` object or GFF/GTF
-files from Ensembl (either provided as files or *via* `AnnotationHub`). These
-functions do not depend on the Ensembl Perl API, but require a working internet
-connection to fetch the chromosome lengths from Ensembl as these are not
-provided within GTF or GFF files.
 
 ## Building annotation packages
 
-The functions below use the Ensembl Perl API to fetch the required data directly
-from the Ensembl core databases. Thus, the path to the Perl API specific for the
-desired Ensembl version needs to be added to the `PERL5LIB` environment variable.
 
-An annotation package containing all human genes for Ensembl version 75 can be
-created using the code in the block below.
+### Directly from Ensembl databases
+
+The `fetchTablesFromEnsembl` function uses the Ensembl Perl API
+to retrieve the required annotations from an Ensembl database (e.g. from the
+main site *ensembldb.ensembl.org*). Thus, to use this functionality to build
+databases, the Ensembl Perl API needs to be installed (see <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup> for details).
+
+Below we create an `EnsDb` database by fetching the required data directly from
+the Ensembl core databases. The `makeEnsembldbPackage` function is then used to
+create an annotation package from this `EnsDb` containing all human genes for
+Ensembl version 75.
 
-```{r eval=FALSE}
+```{r edb-from-ensembl, message = FALSE, eval = FALSE}
 library(ensembldb)
 
 ## get all human gene/transcript/exon annotations from Ensembl (75)
@@ -751,6 +864,20 @@ thaliana), the *Ensembl genomes* should be specified as a host, i.e. setting
 `host` to "mysql-eg-publicsql.ebi.ac.uk", `port` to `4157` and `species` to
 e.g. "arabidopsis thaliana".
 
+
+### From a GTF or GFF file
+
+Alternatively, the `ensDbFromAH`, `ensDbFromGff`, `ensDbFromGRanges` and
+`ensDbFromGtf` functions allow building `EnsDb` SQLite files from a `GRanges`
+object or from GFF/GTF files from Ensembl (either provided as files or *via*
+`AnnotationHub`). These functions do not depend on the Ensembl Perl API, but
+require a working internet connection to fetch the chromosome lengths from
+Ensembl, as these are not provided within GTF or GFF files. Also note that GTF
+and GFF files usually do not contain protein annotations; such annotations will
+thus not be included in the generated `EnsDb` database. Protein annotations are
+only available in `EnsDb` databases created with the Ensembl Perl API (such as
+the ones provided through `AnnotationHub` or as Bioconductor packages).
+
 In the next example we create an `EnsDb` database using the `AnnotationHub`
 package and load also the corresponding genomic DNA sequence matching the
 Ensembl version. We thus first query the `AnnotationHub` package for all
@@ -760,7 +887,7 @@ then use the `getGenomeFaFile` method on the `EnsDb` to directly look up and
 retrieve the correct or best matching `FaFile` with the genomic DNA sequence. At
 last we retrieve the sequences of all exons using the `getSeq` method.
 
-```{r eval=FALSE}
+```{r gtf-gff-edb, message = FALSE, eval = FALSE}
 ## Load the AnnotationHub data.
 library(AnnotationHub)
 ah <- AnnotationHub()
@@ -782,7 +909,7 @@ edb <- EnsDb(DbFile)
 Dna <- getGenomeFaFile(edb)
 library(Rsamtools)
 ## We next retrieve the sequence of all exons on chromosome Y.
-exons <- exons(edb, filter = SeqnameFilter("Y"))
+exons <- exons(edb, filter = SeqNameFilter("Y"))
 exonSeq <- getSeq(Dna, exons)
 
 ## Alternatively, look up and retrieve the toplevel DNA sequence manually.
@@ -790,37 +917,37 @@ Dna <- ah[["AH22042"]]
 ```
 
 In the example below we load a `GRanges` containing gene definitions for genes
-encoded on chromosome Y and generate a EnsDb SQLite database from that
+encoded on chromosome Y and generate an `EnsDb` SQLite database from that
 information.
 
-```{r message=FALSE}
+```{r EnsDb-from-Y-GRanges, message = FALSE, eval = use_network}
 ## Generate a sqlite database from a GRanges object specifying
 ## genes encoded on chromosome Y
 load(system.file("YGRanges.RData", package = "ensembldb"))
 Y
 
+## Create the EnsDb database file
 DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
 		       organism = "Homo_sapiens")
 
+## Load the database
 edb <- EnsDb(DB)
 edb
-
-## As shown in the example below, we could make an EnsDb package on
-## this DB object using the makeEnsembldbPackage function.
 ```
 
 Alternatively we can build the annotation database using the `ensDbFromGtf`
-`ensDbFromGff` functions, that extracts most of the required data from a GTF
-respectively GFF (version 3) file which can be downloaded from Ensembl (e.g. from
-<ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens> for human gene definitions
-from Ensembl version 75; for plant genomes etc files can be retrieved from
-<ftp://ftp.ensemblgenomes.org>). All information except the chromosome lengths and
-the NCBI Entrezgene IDs can be extracted from these GTF files. The function also
-tries to retrieve chromosome length information automatically from Ensembl.
+or `ensDbFromGff` functions, which extract most of the required data from a GTF
+or GFF (version 3) file, respectively, that can be downloaded from Ensembl
+(e.g. from <ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens> for human gene
+definitions from Ensembl version 75; for plant genomes etc., files can be
+retrieved from <ftp://ftp.ensemblgenomes.org>). All information except the
+chromosome lengths, the NCBI Entrezgene IDs and protein annotations can be
+extracted from these files. The functions also try to retrieve chromosome
+length information automatically from Ensembl.
 
 Below we create the annotation from a gtf file that we fetch directly from Ensembl.
 
-```{r eval=FALSE}
+```{r EnsDb-from-GTF, message = FALSE, eval = FALSE}
 library(ensembldb)
 
 ## the GTF file can be downloaded from
@@ -839,17 +966,23 @@ makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
 		     author = "J Rainer")
 ```
 
-# Database layout<a id="orgtarget1"></a>
+
+# Database layout<a id="org35014ed"></a>
 
 The database consists of the following tables and attributes (the layout is also
-shown in Figure [115](#orgparagraph1)):
+shown in Figure [159](#org6a42233)). Note that the protein-specific annotations
+might not be available in all `EnsDb` databases (e.g. those created with
+`ensembldb` version < 1.7 or from GTF or GFF files).
 
 -   **gene**: all gene specific annotations.
     -   `gene_id`: the Ensembl ID of the gene.
     -   `gene_name`: the name (symbol) of the gene.
     -   `entrezid`: the NCBI Entrezgene ID(s) of the gene. Note that this can be a
         `;` separated list of IDs for genes that are mapped to more than one
         Entrezgene.
     -   `gene_biotype`: the biotype of the gene.
     -   `gene_seq_start`: the start coordinate of the gene on the sequence (usually
         a chromosome).
@@ -858,6 +991,11 @@ shown in Figure [115](#orgparagraph1)):
     -   `seq_strand`: the strand on which the gene is encoded.
     -   `seq_coord_system`: the coordinate system of the sequence.
 
+-   **entrezgene**: mapping of Ensembl genes to NCBI Entrezgene identifiers. Note that
+    this mapping can be a one-to-many mapping.
+    -   `gene_id`: the Ensembl gene ID.
+    -   `entrezid`: the NCBI Entrezgene ID.
+
 -   **tx**: all transcript related annotations. Note that while no `tx_name` column
     is available in this database column, all methods to retrieve data from the
     database support also this column. The returned values are however the ID of
@@ -887,9 +1025,36 @@ shown in Figure [115](#orgparagraph1)):
     -   `seq_length`: the length of the sequence.
     -   `is_circular`: whether the sequence in circular.
 
--   **information**: some additional, internal, informations (Genome build, Ensembl
+-   **protein**: provides protein annotation for a (coding) transcript.
+    -   `protein_id`: the Ensembl protein ID.
+    -   `tx_id`: the transcript ID which CDS encodes the protein.
+    -   `protein_sequence`: the peptide sequence of the protein (translated from the
+        transcript's coding sequence after applying eventual RNA editing).
+
+-   **uniprot**: provides the mapping from Ensembl protein ID(s) to Uniprot ID(s). Not
+    all Ensembl proteins are annotated with Uniprot IDs, and each Ensembl protein
+    might be mapped to multiple Uniprot IDs.
+    -   `protein_id`: the Ensembl protein ID.
+    -   `uniprot_id`: the Uniprot ID.
+    -   `uniprot_db`: the Uniprot database in which the ID is defined.
+    -   `uniprot_mapping_type`: the type of the mapping method that was used to assign
+        the Uniprot ID to an Ensembl protein ID.
+
+-   **protein\_domain**: provides protein domain annotations and mapping to proteins.
+    -   `protein_id`: the Ensembl protein ID on which the protein domain is present.
+    -   `protein_domain_id`: the ID of the protein domain (from the protein domain
+        source).
+    -   `protein_domain_source`: the source/analysis method in/by which the protein
+        domain was defined (such as pfam).
+    -   `interpro_accession`: the Interpro accession ID of the protein domain.
+    -   `prot_dom_start`: the start position of the protein domain within the
+        protein's sequence.
+    -   `prot_dom_end`: the end position of the protein domain within the protein's
+        sequence.
+
+-   **metadata**: some additional internal information (genome build, Ensembl
     version etc).
-    -   `key`
+    -   `name`
     -   `value`
 
 -   *virtual* columns:
@@ -897,24 +1062,22 @@ shown in Figure [115](#orgparagraph1)):
         possible to use it in the `columns` parameter. This column is *symlinked* to the
         `gene_name` column.
     -   `tx_name`: similar to the `symbol` column, this column is *symlinked* to the `tx_id`
-            column.
+        column.
 
-![img](images/dblayout.png "Database layout.")
+The database layout: as already described above, protein-related annotations
+(green) might not be available in every `EnsDb` database.
 
-<div id="footnotes">
-<h2 class="footnotes">Footnotes: </h2>
-<div id="text-footnotes">
+![img](images/dblayout.png "Database layout.")
 
-<div class="footdef"><sup><a id="fn.1" class="footnum" href="#fnr.1">1</a></sup> <div class="footpara"><http://www.ensembl.org></div></div>
 
-<div class="footdef"><sup><a id="fn.2" class="footnum" href="#fnr.2">2</a></sup> <div class="footpara"><http://www.lrg-sequence.org></div></div>
+# Footnotes
 
-<div class="footdef"><sup><a id="fn.3" class="footnum" href="#fnr.3">3</a></sup> <div class="footpara"><http://www.ncbi.nlm.nih.gov/pubmed/23950696></div></div>
+<sup><a id="fn.1" href="#fnr.1">1</a></sup> <http://www.ensembl.org>
 
-<div class="footdef"><sup><a id="fn.4" class="footnum" href="#fnr.4">4</a></sup> <div class="footpara"><http://www.ncbi.nlm.nih.gov/pubmed/24227677></div></div>
+<sup><a id="fn.2" href="#fnr.2">2</a></sup> <http://www.lrg-sequence.org>
 
-<div class="footdef"><sup><a id="fn.5" class="footnum" href="#fnr.5">5</a></sup> <div class="footpara"><http://www.ensembl.org/info/docs/api/api_installation.html></div></div>
+<sup><a id="fn.3" href="#fnr.3">3</a></sup> <http://www.ncbi.nlm.nih.gov/pubmed/23950696>
 
+<sup><a id="fn.4" href="#fnr.4">4</a></sup> <http://www.ncbi.nlm.nih.gov/pubmed/24227677>
 
-</div>
-</div>
+<sup><a id="fn.5" href="#fnr.5">5</a></sup> <http://www.ensembl.org/info/docs/api/api_installation.html>
diff --git a/inst/doc/ensembldb.html b/inst/doc/ensembldb.html
index 4b49ca3..8952374 100644
--- a/inst/doc/ensembldb.html
+++ b/inst/doc/ensembldb.html
@@ -4,25 +4,30 @@
 
 <head>
 
-<meta charset="utf-8">
+<meta charset="utf-8" />
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <meta name="generator" content="pandoc" />
 
 
+<meta name="author" content="Johannes Rainer" />
 
 
 <title>Generating an using Ensembl based annotation packages</title>
 
 <script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4zIHwgKGMpIDIwMDUsIDIwMTUgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dG [...]
 <meta name="viewport" content="width=device-width, initial-scale=1" />
-<link href="data:text/css;charset=utf-8,html%7Bfont%2Dfamily%3Asans%2Dserif%3B%2Dwebkit%2Dtext%2Dsize%2Dadjust%3A100%25%3B%2Dms%2Dtext%2Dsize%2Dadjust%3A100%25%7Dbody%7Bmargin%3A0%7Darticle%2Caside%2Cdetails%2Cfigcaption%2Cfigure%2Cfooter%2Cheader%2Chgroup%2Cmain%2Cmenu%2Cnav%2Csection%2Csummary%7Bdisplay%3Ablock%7Daudio%2Ccanvas%2Cprogress%2Cvideo%7Bdisplay%3Ainline%2Dblock%3Bvertical%2Dalign%3Abaseline%7Daudio%3Anot%28%5Bcontrols%5D%29%7Bdisplay%3Anone%3Bheight%3A0%7D%5Bhidden%5D%2Ctem [...]
+<link href="data:text/css;charset=utf-8,html%7Bfont%2Dfamily%3Asans%2Dserif%3B%2Dwebkit%2Dtext%2Dsize%2Dadjust%3A100%25%3B%2Dms%2Dtext%2Dsize%2Dadjust%3A100%25%7Dbody%7Bmargin%3A0%7Darticle%2Caside%2Cdetails%2Cfigcaption%2Cfigure%2Cfooter%2Cheader%2Chgroup%2Cmain%2Cmenu%2Cnav%2Csection%2Csummary%7Bdisplay%3Ablock%7Daudio%2Ccanvas%2Cprogress%2Cvideo%7Bdisplay%3Ainline%2Dblock%3Bvertical%2Dalign%3Abaseline%7Daudio%3Anot%28%5Bcontrols%5D%29%7Bdisplay%3Anone%3Bheight%3A0%7D%5Bhidden%5D%2Ctem [...]
 <script src="data:application/x-javascript;base64,LyohCiAqIEJvb3RzdHJhcCB2My4zLjUgKGh0dHA6Ly9nZXRib290c3RyYXAuY29tKQogKiBDb3B5cmlnaHQgMjAxMS0yMDE1IFR3aXR0ZXIsIEluYy4KICogTGljZW5zZWQgdW5kZXIgdGhlIE1JVCBsaWNlbnNlCiAqLwppZigidW5kZWZpbmVkIj09dHlwZW9mIGpRdWVyeSl0aHJvdyBuZXcgRXJyb3IoIkJvb3RzdHJhcCdzIEphdmFTY3JpcHQgcmVxdWlyZXMgalF1ZXJ5Iik7K2Z1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0Ijt2YXIgYj1hLmZuLmpxdWVyeS5zcGxpdCgiICIpWzBdLnNwbGl0KCIuIik7aWYoYlswXTwyJiZiWzFdPDl8fDE9PWJbMF0mJjk9PWJbMV0mJmJbMl08MSl0aHJvdy [...]
 <script src="data:application/x-javascript;base64,LyoqCiogQHByZXNlcnZlIEhUTUw1IFNoaXYgMy43LjIgfCBAYWZhcmthcyBAamRhbHRvbiBAam9uX25lYWwgQHJlbSB8IE1JVC9HUEwyIExpY2Vuc2VkCiovCi8vIE9ubHkgcnVuIHRoaXMgY29kZSBpbiBJRSA4CmlmICghIXdpbmRvdy5uYXZpZ2F0b3IudXNlckFnZW50Lm1hdGNoKCJNU0lFIDgiKSkgewohZnVuY3Rpb24oYSxiKXtmdW5jdGlvbiBjKGEsYil7dmFyIGM9YS5jcmVhdGVFbGVtZW50KCJwIiksZD1hLmdldEVsZW1lbnRzQnlUYWdOYW1lKCJoZWFkIilbMF18fGEuZG9jdW1lbnRFbGVtZW50O3JldHVybiBjLmlubmVySFRNTD0ieDxzdHlsZT4iK2IrIjwvc3R5bGU+IixkLm [...]
 <script src="data:application/x-javascript;base64,LyohIFJlc3BvbmQuanMgdjEuNC4yOiBtaW4vbWF4LXdpZHRoIG1lZGlhIHF1ZXJ5IHBvbHlmaWxsICogQ29weXJpZ2h0IDIwMTMgU2NvdHQgSmVobAogKiBMaWNlbnNlZCB1bmRlciBodHRwczovL2dpdGh1Yi5jb20vc2NvdHRqZWhsL1Jlc3BvbmQvYmxvYi9tYXN0ZXIvTElDRU5TRS1NSVQKICogICovCgovLyBPbmx5IHJ1biB0aGlzIGNvZGUgaW4gSUUgOAppZiAoISF3aW5kb3cubmF2aWdhdG9yLnVzZXJBZ2VudC5tYXRjaCgiTVNJRSA4IikpIHsKIWZ1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0IjthLm1hdGNoTWVkaWE9YS5tYXRjaE1lZGlhfHxmdW5jdGlvbihhKXt2YXIgYixjPWEuZG [...]
+<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSBVSSAtIHYxLjExLjQgLSAyMDE2LTAxLTA1CiogaHR0cDovL2pxdWVyeXVpLmNvbQoqIEluY2x1ZGVzOiBjb3JlLmpzLCB3aWRnZXQuanMsIG1vdXNlLmpzLCBwb3NpdGlvbi5qcywgZHJhZ2dhYmxlLmpzLCBkcm9wcGFibGUuanMsIHJlc2l6YWJsZS5qcywgc2VsZWN0YWJsZS5qcywgc29ydGFibGUuanMsIGFjY29yZGlvbi5qcywgYXV0b2NvbXBsZXRlLmpzLCBidXR0b24uanMsIGRpYWxvZy5qcywgbWVudS5qcywgcHJvZ3Jlc3NiYXIuanMsIHNlbGVjdG1lbnUuanMsIHNsaWRlci5qcywgc3Bpbm5lci5qcywgdGFicy5qcywgdG9vbHRpcC5qcywgZWZmZWN0LmpzLC [...]
+<link href="data:text/css;charset=utf-8,%0A%0A%2Etocify%20%7B%0Awidth%3A%2020%25%3B%0Amax%2Dheight%3A%2090%25%3B%0Aoverflow%3A%20auto%3B%0Amargin%2Dleft%3A%202%25%3B%0Aposition%3A%20fixed%3B%0Aborder%3A%201px%20solid%20%23ccc%3B%0Awebkit%2Dborder%2Dradius%3A%206px%3B%0Amoz%2Dborder%2Dradius%3A%206px%3B%0Aborder%2Dradius%3A%206px%3B%0A%7D%0A%0A%2Etocify%20ul%2C%20%2Etocify%20li%20%7B%0Alist%2Dstyle%3A%20none%3B%0Amargin%3A%200%3B%0Apadding%3A%200%3B%0Aborder%3A%20none%3B%0Aline%2Dheight%3 [...]
+<script src="data:application/x-javascript;base64,LyoganF1ZXJ5IFRvY2lmeSAtIHYxLjkuMSAtIDIwMTMtMTAtMjIKICogaHR0cDovL3d3dy5ncmVnZnJhbmtvLmNvbS9qcXVlcnkudG9jaWZ5LmpzLwogKiBDb3B5cmlnaHQgKGMpIDIwMTMgR3JlZyBGcmFua287IExpY2Vuc2VkIE1JVCAqLwoKLy8gSW1tZWRpYXRlbHktSW52b2tlZCBGdW5jdGlvbiBFeHByZXNzaW9uIChJSUZFKSBbQmVuIEFsbWFuIEJsb2cgUG9zdF0oaHR0cDovL2JlbmFsbWFuLmNvbS9uZXdzLzIwMTAvMTEvaW1tZWRpYXRlbHktaW52b2tlZC1mdW5jdGlvbi1leHByZXNzaW9uLykgdGhhdCBjYWxscyBhbm90aGVyIElJRkUgdGhhdCBjb250YWlucyBhbGwgb2YgdG [...]
+<script src="data:application/x-javascript;base64,CgovKioKICogalF1ZXJ5IFBsdWdpbjogU3RpY2t5IFRhYnMKICoKICogQGF1dGhvciBBaWRhbiBMaXN0ZXIgPGFpZGFuQHBocC5uZXQ+CiAqIGFkYXB0ZWQgYnkgUnViZW4gQXJzbGFuIHRvIGFjdGl2YXRlIHBhcmVudCB0YWJzIHRvbwogKiBodHRwOi8vd3d3LmFpZGFubGlzdGVyLmNvbS8yMDE0LzAzL3BlcnNpc3RpbmctdGhlLXRhYi1zdGF0ZS1pbi1ib290c3RyYXAvCiAqLwooZnVuY3Rpb24oJCkgewogICJ1c2Ugc3RyaWN0IjsKICAkLmZuLnJtYXJrZG93blN0aWNreVRhYnMgPSBmdW5jdGlvbigpIHsKICAgIHZhciBjb250ZXh0ID0gdGhpczsKICAgIC8vIFNob3cgdGhlIHRhYi [...]
+<link href="data:text/css;charset=utf-8,pre%20%2Eoperator%2C%0Apre%20%2Eparen%20%7B%0Acolor%3A%20rgb%28104%2C%20118%2C%20135%29%0A%7D%0Apre%20%2Eliteral%20%7B%0Acolor%3A%20%23990073%0A%7D%0Apre%20%2Enumber%20%7B%0Acolor%3A%20%23099%3B%0A%7D%0Apre%20%2Ecomment%20%7B%0Acolor%3A%20%23998%3B%0Afont%2Dstyle%3A%20italic%0A%7D%0Apre%20%2Ekeyword%20%7B%0Acolor%3A%20%23900%3B%0Afont%2Dweight%3A%20bold%0A%7D%0Apre%20%2Eidentifier%20%7B%0Acolor%3A%20rgb%280%2C%200%2C%200%29%3B%0A%7D%0Apre%20%2Estri [...]
+<script src="data:application/x-javascript;base64,dmFyIGhsanM9bmV3IGZ1bmN0aW9uKCl7ZnVuY3Rpb24gbShwKXtyZXR1cm4gcC5yZXBsYWNlKC8mL2dtLCImYW1wOyIpLnJlcGxhY2UoLzwvZ20sIiZsdDsiKX1mdW5jdGlvbiBmKHIscSxwKXtyZXR1cm4gUmVnRXhwKHEsIm0iKyhyLmNJPyJpIjoiIikrKHA/ImciOiIiKSl9ZnVuY3Rpb24gYihyKXtmb3IodmFyIHA9MDtwPHIuY2hpbGROb2Rlcy5sZW5ndGg7cCsrKXt2YXIgcT1yLmNoaWxkTm9kZXNbcF07aWYocS5ub2RlTmFtZT09IkNPREUiKXtyZXR1cm4gcX1pZighKHEubm9kZVR5cGU9PTMmJnEubm9kZVZhbHVlLm1hdGNoKC9ccysvKSkpe2JyZWFrfX19ZnVuY3Rpb24gaCh0LH [...]
 
 <style type="text/css">code{white-space: pre;}</style>
-<link href="data:text/css;charset=utf-8,pre%20%2Eoperator%2C%0Apre%20%2Eparen%20%7B%0Acolor%3A%20rgb%28104%2C%20118%2C%20135%29%0A%7D%0Apre%20%2Eliteral%20%7B%0Acolor%3A%20%23990073%0A%7D%0Apre%20%2Enumber%20%7B%0Acolor%3A%20%23099%3B%0A%7D%0Apre%20%2Ecomment%20%7B%0Acolor%3A%20%23998%3B%0Afont%2Dstyle%3A%20italic%0A%7D%0Apre%20%2Ekeyword%20%7B%0Acolor%3A%20%23900%3B%0Afont%2Dweight%3A%20bold%0A%7D%0Apre%20%2Eidentifier%20%7B%0Acolor%3A%20rgb%280%2C%200%2C%200%29%3B%0A%7D%0Apre%20%2Estri [...]
-<script src="data:application/x-javascript;base64,dmFyIGhsanM9bmV3IGZ1bmN0aW9uKCl7ZnVuY3Rpb24gbShwKXtyZXR1cm4gcC5yZXBsYWNlKC8mL2dtLCImYW1wOyIpLnJlcGxhY2UoLzwvZ20sIiZsdDsiKX1mdW5jdGlvbiBmKHIscSxwKXtyZXR1cm4gUmVnRXhwKHEsIm0iKyhyLmNJPyJpIjoiIikrKHA/ImciOiIiKSl9ZnVuY3Rpb24gYihyKXtmb3IodmFyIHA9MDtwPHIuY2hpbGROb2Rlcy5sZW5ndGg7cCsrKXt2YXIgcT1yLmNoaWxkTm9kZXNbcF07aWYocS5ub2RlTmFtZT09IkNPREUiKXtyZXR1cm4gcX1pZighKHEubm9kZVR5cGU9PTMmJnEubm9kZVZhbHVlLm1hdGNoKC9ccysvKSkpe2JyZWFrfX19ZnVuY3Rpb24gaCh0LH [...]
 <style type="text/css">
 
 </style>
@@ -63,7 +68,7 @@ h6 {
 }
 </style>
 
-<link href="data:text/css;charset=utf-8,body%20%7B%0Amax%2Dwidth%3A%201054px%3B%0Amargin%3A%200px%20auto%3B%0A%7D%0Abody%2C%20td%20%7B%0Afont%2Dfamily%3A%20sans%2Dserif%3B%0Afont%2Dsize%3A%2010pt%3B%0A%7D%0A%0Adiv%23TOC%20ul%20%7B%0Apadding%3A%200px%200px%200px%2045px%3B%0Alist%2Dstyle%3A%20none%3B%0Abackground%2Dimage%3A%20none%3B%0Abackground%2Drepeat%3A%20none%3B%0Abackground%2Dposition%3A%200%3B%0Afont%2Dsize%3A%2010pt%3B%0Afont%2Dfamily%3A%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B [...]
+<link href="data:text/css;charset=utf-8,body%20%7B%0Amargin%3A%200px%20auto%3B%0Amax%2Dwidth%3A%201134px%3B%0A%7D%0Abody%2C%20td%20%7B%0Afont%2Dfamily%3A%20sans%2Dserif%3B%0Afont%2Dsize%3A%2010pt%3B%0A%7D%0A%0Adiv%23TOC%20ul%20%7B%0Apadding%3A%200px%200px%200px%2045px%3B%0Alist%2Dstyle%3A%20none%3B%0Abackground%2Dimage%3A%20none%3B%0Abackground%2Drepeat%3A%20none%3B%0Abackground%2Dposition%3A%200%3B%0Afont%2Dsize%3A%2010pt%3B%0Afont%2Dfamily%3A%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B [...]
 
 </head>
 
@@ -71,7 +76,7 @@ h6 {
 
 <style type="text/css">
 .main-container {
-  max-width: 768px;
+  max-width: 828px;
   margin-left: auto;
   margin-right: auto;
 }
@@ -93,7 +98,6 @@ button.code-folding-btn:focus {
 <div class="container-fluid main-container">
 
 <!-- tabsets -->
-<script src="data:application/x-javascript;base64,Cgp3aW5kb3cuYnVpbGRUYWJzZXRzID0gZnVuY3Rpb24odG9jSUQpIHsKCiAgLy8gYnVpbGQgYSB0YWJzZXQgZnJvbSBhIHNlY3Rpb24gZGl2IHdpdGggdGhlIC50YWJzZXQgY2xhc3MKICBmdW5jdGlvbiBidWlsZFRhYnNldCh0YWJzZXQpIHsKCiAgICAvLyBjaGVjayBmb3IgZmFkZSBhbmQgcGlsbHMgb3B0aW9ucwogICAgdmFyIGZhZGUgPSB0YWJzZXQuaGFzQ2xhc3MoInRhYnNldC1mYWRlIik7CiAgICB2YXIgcGlsbHMgPSB0YWJzZXQuaGFzQ2xhc3MoInRhYnNldC1waWxscyIpOwogICAgdmFyIG5hdkNsYXNzID0gcGlsbHMgPyAibmF2LXBpbGxzIiA6ICJuYXYtdGFicyI7CgogIC [...]
 <script>
 $(document).ready(function () {
   window.buildTabsets("TOC");
@@ -105,6 +109,98 @@ $(document).ready(function () {
 
 
 
+<script>
+$(document).ready(function ()  {
+
+    // move toc-ignore selectors from section div to header
+    $('div.section.toc-ignore')
+        .removeClass('toc-ignore')
+        .children('h1,h2,h3,h4,h5').addClass('toc-ignore');
+
+    // establish options
+    var options = {
+      selectors: "h1,h2,h3",
+      theme: "bootstrap3",
+      context: '.toc-content',
+      hashGenerator: function (text) {
+        return text.replace(/[.\\/?&!#<>]/g, '').replace(/\s/g, '_').toLowerCase();
+      },
+      ignoreSelector: ".toc-ignore",
+      scrollTo: 0
+    };
+    options.showAndHide = true;
+    options.smoothScroll = true;
+
+    // tocify
+    var toc = $("#TOC").tocify(options).data("toc-tocify");
+});
+</script>
+
+<style type="text/css">
+
+#TOC {
+  margin: 25px 0px 20px 0px;
+}
+@media (max-width: 768px) {
+#TOC {
+  position: relative;
+  width: 100%;
+}
+}
+
+
+
+
+div.main-container {
+  max-width: 1200px;
+}
+
+div.tocify {
+  width: 20%;
+  max-width: 246px;
+  max-height: 85%;
+}
+
+@media (min-width: 768px) and (max-width: 991px) {
+  div.tocify {
+    width: 25%;
+  }
+}
+
+@media (max-width: 767px) {
+  div.tocify {
+    width: 100%;
+    max-width: none;
+  }
+}
+
+.tocify ul, .tocify li {
+  line-height: 20px;
+}
+
+.tocify-subheader .tocify-item {
+  font-size: 0.90em;
+  padding-left: 25px;
+  text-indent: 0;
+}
+
+.tocify .list-group-item {
+  border-radius: 0px;
+}
+
+
+</style>
+
+<!-- setup 3col/9col grid for toc_float and main content  -->
+<div class="row-fluid">
+<div class="col-xs-12 col-sm-4 col-md-3">
+<div id="TOC" class="tocify">
+</div>
+</div>
+
+<div class="toc-content col-xs-12 col-sm-8 col-md-9">
+
+
 
 
 <div class="fluid-row" id="header">
@@ -112,35 +208,19 @@ $(document).ready(function () {
 
 
 <h1 class="title toc-ignore">Generating an using Ensembl based annotation packages</h1>
+<p class="author-name">Johannes Rainer</p>
+<h4 class="date"><em>4 August 2017</em></h4>
+<h4 class="package">Package</h4>
+<p>ensembldb 2.0.4</p>
 
 </div>
 
-<h1>Contents</h1>
-<div id="TOC">
-<ul>
-<li><a href="#introduction"><span class="toc-section-number">1</span> Introduction</a></li>
-<li><a href="#using-ensembldb-annotation-packages-to-retrieve-specific-annotations"><span class="toc-section-number">2</span> Using <code>ensembldb</code> annotation packages to retrieve specific annotations</a></li>
-<li><a href="#extracting-genetranscriptexon-models-for-rnaseq-feature-counting"><span class="toc-section-number">3</span> Extracting gene/transcript/exon models for RNASeq feature counting</a></li>
-<li><a href="#retrieving-sequences-for-genetranscriptexon-models"><span class="toc-section-number">4</span> Retrieving sequences for gene/transcript/exon models</a></li>
-<li><a href="#integrating-annotations-from-ensembl-based-ensdb-packages-with-ucsc-based-annotations"><span class="toc-section-number">5</span> Integrating annotations from Ensembl based <code>EnsDb</code> packages with UCSC based annotations</a></li>
-<li><a href="#interactive-annotation-lookup-using-the-shiny-web-app"><span class="toc-section-number">6</span> Interactive annotation lookup using the <code>shiny</code> web app</a></li>
-<li><a href="#plotting-genetranscript-features-using-ensembldb-and-gviz"><span class="toc-section-number">7</span> Plotting gene/transcript features using <code>ensembldb</code> and <code>Gviz</code></a></li>
-<li><a href="#using-ensdb-objects-in-the-annotationdbi-framework"><span class="toc-section-number">8</span> Using <code>EnsDb</code> objects in the <code>AnnotationDbi</code> framework</a></li>
-<li><a href="#important-notes"><span class="toc-section-number">9</span> Important notes</a></li>
-<li><a href="#building-an-transcript-centric-database-package-based-on-ensembl-annotation"><span class="toc-section-number">10</span> Building an transcript-centric database package based on Ensembl annotation</a><ul>
-<li><a href="#requirements"><span class="toc-section-number">10.1</span> Requirements</a></li>
-<li><a href="#building-annotation-packages"><span class="toc-section-number">10.2</span> Building annotation packages</a></li>
-</ul></li>
-<li><a href="#database-layout"><span class="toc-section-number">11</span> Database layout<a id="orgtarget1"></a></a></li>
-</ul>
-</div>
 
-<p><strong>Package</strong>: <em><a href="http://bioconductor.org/packages/ensembldb">ensembldb</a></em><br /> <strong>Authors</strong>: Johannes Rainer <a href="mailto:johannes.rainer at eurac.edu">johannes.rainer at eurac.edu</a>, Tim Triche <a href="mailto:tim.triche at usc.edu">tim.triche at usc.edu</a><br /> <strong>Modified</strong>: 12 September, 2016<br /> <strong>Compiled</strong>: Wed Nov 16 19:52:05 2016</p>
 <div id="introduction" class="section level1">
 <h1><span class="header-section-number">1</span> Introduction</h1>
-<p>The <code>ensembldb</code> package provides functions to create and use transcript centric annotation databases/packages. The annotation for the databases are directly fetched from Ensembl <sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> using their Perl API. The functionality and data is similar to that of the <code>TxDb</code> packages from the <code>GenomicFeatures</code> package, but, in addition to retrieve all gene/transcript models and annotations from the database,  [...]
+<p>The <code>ensembldb</code> package provides functions to create and use transcript centric annotation databases/packages. The annotation for the databases are directly fetched from Ensembl <sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> using their Perl API. The functionality and data is similar to that of the <code>TxDb</code> packages from the <code>GenomicFeatures</code> package, but, in addition to retrieve all gene/transcript models and annotations from the database,  [...]
 <p>Another main goal of this package is to generate <em>versioned</em> annotation packages, i.e. annotation packages that are build for a specific Ensembl release, and are also named according to that (e.g. <code>EnsDb.Hsapiens.v75</code> for human gene definitions of the Ensembl code database version 75). This ensures reproducibility, as it allows to load annotations from a specific Ensembl release also if newer versions of annotation packages/releases are available. It also allows to l [...]
-<p>In the example below we load an Ensembl based annotation package for Homo sapiens, Ensembl version 75. The connection to the database is bound to the variable <code>EnsDb.Hsapiens.v75</code>.</p>
+<p>In the example below we load an Ensembl-based annotation package for Homo sapiens, Ensembl version 75. The <code>EnsDb</code> object providing access to the underlying SQLite database is bound to the variable name <code>EnsDb.Hsapiens.v75</code>.</p>
 <pre class="r"><code>library(EnsDb.Hsapiens.v75)
 
 ## Making a "short cut"
@@ -153,40 +233,74 @@ edb</code></pre>
 ## |Type of Gene ID: Ensembl Gene ID
 ## |Supporting package: ensembldb
 ## |Db created by: ensembldb package from Bioconductor
-## |script_version: 0.1.3
-## |Creation time: Thu Sep 15 13:16:58 2016
+## |script_version: 0.2.3
+## |Creation time: Tue Nov 15 23:35:19 2016
 ## |ensembl_version: 75
 ## |ensembl_host: localhost
 ## |Organism: homo_sapiens
 ## |genome_build: GRCh37
 ## |DBSCHEMAVERSION: 1.0
 ## | No. of genes: 64102.
-## | No. of transcripts: 215647.</code></pre>
-<pre class="r"><code>## for what organism was the database generated?
+## | No. of transcripts: 215647.
+## |Protein data available.</code></pre>
+<pre class="r"><code>## For what organism was the database generated?
 organism(edb)</code></pre>
 <pre><code>## [1] "Homo sapiens"</code></pre>
 </div>
 <div id="using-ensembldb-annotation-packages-to-retrieve-specific-annotations" class="section level1">
 <h1><span class="header-section-number">2</span> Using <code>ensembldb</code> annotation packages to retrieve specific annotations</h1>
-<p>The <code>ensembldb</code> package provides a set of filter objects allowing to specify which entries should be fetched from the database. The complete list of filters, which can be used individually or can be combined, is shown below (in alphabetical order):</p>
+<p>One of the strengths of the <code>ensembldb</code> package and the related <code>EnsDb</code> databases is the implementation of a filter framework that enables efficient extraction of data subsets from the databases. The <code>ensembldb</code> package supports most of the filters defined in the <code>AnnotationFilter</code> Bioconductor package and defines some additional filters specific to the data stored in <code>EnsDb</code> databases. The <code>supportedFilters</code> method can [...]
+<pre class="r"><code>supportedFilters(edb)</code></pre>
+<pre><code>##  [1] "EntrezFilter"             "ExonEndFilter"           
+##  [3] "ExonIdFilter"             "ExonRankFilter"          
+##  [5] "ExonStartFilter"          "GRangesFilter"           
+##  [7] "GeneBiotypeFilter"        "GeneEndFilter"           
+##  [9] "GeneIdFilter"             "GeneStartFilter"         
+## [11] "GenenameFilter"           "ProtDomIdFilter"         
+## [13] "ProteinIdFilter"          "SeqNameFilter"           
+## [15] "SeqStrandFilter"          "SymbolFilter"            
+## [17] "TxBiotypeFilter"          "TxEndFilter"             
+## [19] "TxIdFilter"               "TxNameFilter"            
+## [21] "TxStartFilter"            "UniprotDbFilter"         
+## [23] "UniprotFilter"            "UniprotMappingTypeFilter"</code></pre>
+<p>These filters can be divided into 3 main filter types:</p>
 <ul>
-<li><code>ExonidFilter</code>: allows to filter the result based on the (Ensembl) exon identifiers.</li>
-<li><code>ExonrankFilter</code>: filter results on the rank (index) of an exon within the transcript model. Exons are always numbered from 5’ to 3’ end of the transcript, thus, also on the reverse strand, the exon 1 is the most 5’ exon of the transcript.</li>
-<li><code>EntrezidFilter</code>: allows to filter results based on NCBI Entrezgene identifiers of the genes.</li>
-<li><code>GenebiotypeFilter</code>: allows to filter for the gene biotypes defined in the Ensembl database; use the <code>listGenebiotypes</code> method to list all available biotypes.</li>
-<li><code>GeneidFilter</code>: allows to filter based on the Ensembl gene IDs.</li>
-<li><code>GenenameFilter</code>: allows to filter based on the names (symbols) of the genes.</li>
-<li><code>SymbolFilter</code>: allows to filter on gene symbols; note that no database columns <em>symbol</em> is available in an <code>EnsDb</code> database and hence the gene name is used for filtering.</li>
-<li><code>GRangesFilter</code>: allows to retrieve all features (genes, transcripts or exons) that are either within (setting <code>condition</code> to “within”) or partially overlapping (setting <code>condition</code> to “overlapping”) the defined genomic region/range. Note that, depending on the called method (<code>genes</code>, <code>transcripts</code> or <code>exons</code>) the start and end coordinates of either the genes, transcripts or exons are used for the filter. For methods < [...]
-<li><code>SeqendFilter</code>: filter based on the chromosomal end coordinate of the exons, transcripts or genes (correspondingly set =feature = “exon”=, =feature = “tx”= or =feature = “gene”=).</li>
-<li><code>SeqnameFilter</code>: filter by the name of the chromosomes the genes are encoded on.</li>
-<li><code>SeqstartFilter</code>: filter based on the chromosomal start coordinates of the exons, transcripts or genes (correspondingly set =feature = “exon”=, =feature = “tx”= or =feature = “gene”=).</li>
-<li><code>SeqstrandFilter</code>: filter for the chromosome strand on which the genes are encoded.</li>
-<li><code>TxbiotypeFilter</code>: filter on the transcript biotype defined in Ensembl; use the <code>listTxbiotypes</code> method to list all available biotypes.</li>
-<li><code>TxidFilter</code>: filter on the Ensembl transcript identifiers.</li>
+<li><code>IntegerFilter</code>: filter classes extending this basic class take a single numeric value as input and support the conditions <code>==</code>, <code>!=</code>, <code>&gt;</code>, <code>&lt;</code>, <code>&gt;=</code> and <code>&lt;=</code>. All filters that work on chromosomal coordinates, such as <code>GeneEndFilter</code>, extend <code>IntegerFilter</code>.</li>
+<li><code>CharacterFilter</code>: filter classes extending this class can take one or more character values as input and support the conditions <code>==</code>, <code>!=</code>, “startsWith” and “endsWith”. All filters working on IDs extend this class.</li>
+<li><code>GRangesFilter</code>: takes a <code>GRanges</code> object as input and supports all conditions that <code>findOverlaps</code> from the <code>IRanges</code> package supports (“any”, “start”, “end”, “within”, “equal”). Note that these have to be passed using the parameter <code>type</code> to the constructor function.</li>
 </ul>
-<p>Each of the filter classes can take a single value or a vector of values (with the exception of the <code>SeqendFilter</code> and <code>SeqstartFilter</code>) for comparison. In addition, it is possible to specify the <em>condition</em> for the filter, e.g. setting <code>condition</code> to = to retrieve all entries matching the filter value, to != to negate the filter or setting <code>condition = "like"= to allow partial matching. The =condition</code> parameter for <code>S [...]
-<p>A simple example would be to get all transcripts for the gene <em>BCL2L11</em>. To this end we specify a <code>GenenameFilter</code> with the value <em>BCL2L11</em>. As a result we get a <code>GRanges</code> object with <code>start</code>, <code>end</code>, <code>strand</code> and <code>seqname</code> of the <code>GRanges</code> object being the start coordinate, end coordinate, chromosome name and strand for the respective transcripts. All additional annotations are available as meta [...]
+<p>The supported filters are:</p>
+<ul>
+<li><code>EntrezFilter</code>: filter results based on the NCBI Entrezgene identifiers of the genes.</li>
+<li><code>ExonEndFilter</code>: filter using the chromosomal end coordinate of exons.</li>
+<li><code>ExonIdFilter</code>: filter based on the (Ensembl) exon identifiers.</li>
+<li><code>ExonRankFilter</code>: filter based on the rank (index) of an exon within the transcript model. Exons are always numbered from 5’ to 3’ end of the transcript, thus, also on the reverse strand, the exon 1 is the most 5’ exon of the transcript.</li>
+<li><code>ExonStartFilter</code>: filter using the chromosomal start coordinate of exons.</li>
+<li><code>GeneBiotypeFilter</code>: filter using the gene biotypes defined in the Ensembl database; use the <code>listGenebiotypes</code> method to list all available biotypes.</li>
+<li><code>GeneEndFilter</code>: filter using the chromosomal end coordinate of genes.</li>
+<li><code>GeneIdFilter</code>: filter based on the Ensembl gene IDs.</li>
+<li><code>GenenameFilter</code>: filter based on the names (symbols) of the genes.</li>
+<li><code>GeneStartFilter</code>: filter using the chromosomal start coordinate of genes.</li>
+<li><code>GRangesFilter</code>: retrieve all features (genes, transcripts or exons) that are either within (setting parameter <code>type</code> to “within”) or partially overlapping (setting <code>type</code> to “any”) the defined genomic region/range. Note that, depending on the called method (<code>genes</code>, <code>transcripts</code> or <code>exons</code>), the start and end coordinates of either the genes, transcripts or exons are used for the filter. For methods <code>exo [...]
+<li><code>SeqNameFilter</code>: filter by the name of the chromosomes the genes are encoded on.</li>
+<li><code>SeqStrandFilter</code>: filter for the chromosome strand on which the genes are encoded.</li>
+<li><code>SymbolFilter</code>: filter on gene symbols; note that no database column <em>symbol</em> is available in an <code>EnsDb</code> database and hence the gene name is used for filtering.</li>
+<li><code>TxBiotypeFilter</code>: filter on the transcript biotype defined in Ensembl; use the <code>listTxbiotypes</code> method to list all available biotypes.</li>
+<li><code>TxEndFilter</code>: filter using the chromosomal end coordinate of transcripts.</li>
+<li><code>TxIdFilter</code>: filter on the Ensembl transcript identifiers.</li>
+<li><code>TxNameFilter</code>: filter on the Ensembl transcript names (currently identical to the transcript IDs).</li>
+<li><code>TxStartFilter</code>: filter using the chromosomal start coordinate of transcripts.</li>
+</ul>
+<p>In addition to the above listed <em>DNA-RNA-based</em> filters, <em>protein-specific</em> filters are also available:</p>
+<ul>
+<li><code>ProtDomIdFilter</code>: filter by the protein domain ID.</li>
+<li><code>ProteinIdFilter</code>: filter by the Ensembl protein ID.</li>
+<li><code>UniprotDbFilter</code>: filter by the name of the Uniprot database.</li>
+<li><code>UniprotFilter</code>: filter by the Uniprot ID.</li>
+<li><code>UniprotMappingTypeFilter</code>: filter by the mapping type of Ensembl protein IDs to Uniprot IDs.</li>
+</ul>
+<p>These can however only be used on <code>EnsDb</code> databases that provide protein annotations, i.e. for which a call to <code>hasProteinData</code> returns <code>TRUE</code>.</p>
+<p>A simple use case for the filter framework would be to get all transcripts for the gene <em>BCL2L11</em>. To this end we specify a <code>GenenameFilter</code> with the value <em>BCL2L11</em>. As a result we get a <code>GRanges</code> object with <code>start</code>, <code>end</code>, <code>seqname</code> and <code>strand</code> being the start coordinate, end coordinate, chromosome name and strand of the respective transcripts. All additional annotations are available as metadata colu [...]
 <pre class="r"><code>Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
 
 Tx</code></pre>
@@ -239,7 +353,47 @@ head(start(Tx))</code></pre>
 head(Tx$tx_biotype)</code></pre>
 <pre><code>## [1] "protein_coding" "protein_coding" "protein_coding" "protein_coding"
 ## [5] "protein_coding" "protein_coding"</code></pre>
-<p>The parameter <code>columns</code> of the <code>exons</code>, <code>genes</code> and <code>transcripts</code> method allows to specify which database attributes (columns) should be retrieved. The <code>exons</code> method returns by default all exon-related columns, the <code>transcripts</code> all columns from the transcript database table and the <code>genes</code> all from the gene table. Note however that in the example above we got also a column <code>gene_name</code> although th [...]
+<p>The parameter <code>columns</code> of the extractor methods (such as <code>exons</code>, <code>genes</code> or <code>transcripts</code>) makes it possible to specify which database attributes (columns) should be retrieved. By default, the <code>exons</code> method returns all exon-related columns, the <code>transcripts</code> method all columns from the transcript database table and the <code>genes</code> method all columns from the gene table. Note, however, that in the example above we also got a column <code>gene_nam [...]
+<p>Instead of passing a filter <em>object</em> to the method it is also possible to provide a filter <em>expression</em> written as a <code>formula</code>.</p>
+<pre class="r"><code>## Use a filter expression to perform the filtering.
+transcripts(edb, filter = ~ genename == "ZBTB16")</code></pre>
+<pre><code>## GRanges object with 9 ranges and 7 metadata columns:
+##                   seqnames                 ranges strand |           tx_id
+##                      <Rle>              <IRanges>  <Rle> |     <character>
+##   ENST00000335953       11 [113930315, 114121398]      + | ENST00000335953
+##   ENST00000541602       11 [113930447, 114060486]      + | ENST00000541602
+##   ENST00000544220       11 [113930459, 113934368]      + | ENST00000544220
+##   ENST00000535700       11 [113930979, 113934466]      + | ENST00000535700
+##   ENST00000392996       11 [113931229, 114121374]      + | ENST00000392996
+##   ENST00000539918       11 [113935134, 114118066]      + | ENST00000539918
+##   ENST00000545851       11 [114051488, 114118018]      + | ENST00000545851
+##   ENST00000535379       11 [114107929, 114121279]      + | ENST00000535379
+##   ENST00000535509       11 [114117512, 114121198]      + | ENST00000535509
+##                                tx_biotype tx_cds_seq_start tx_cds_seq_end
+##                               <character>        <integer>      <integer>
+##   ENST00000335953          protein_coding        113934023      114121277
+##   ENST00000541602         retained_intron             <NA>           <NA>
+##   ENST00000544220          protein_coding        113934023      113934368
+##   ENST00000535700          protein_coding        113934023      113934466
+##   ENST00000392996          protein_coding        113934023      114121277
+##   ENST00000539918 nonsense_mediated_decay        113935134      113992549
+##   ENST00000545851    processed_transcript             <NA>           <NA>
+##   ENST00000535379    processed_transcript             <NA>           <NA>
+##   ENST00000535509         retained_intron             <NA>           <NA>
+##                           gene_id         tx_name   gene_name
+##                       <character>     <character> <character>
+##   ENST00000335953 ENSG00000109906 ENST00000335953      ZBTB16
+##   ENST00000541602 ENSG00000109906 ENST00000541602      ZBTB16
+##   ENST00000544220 ENSG00000109906 ENST00000544220      ZBTB16
+##   ENST00000535700 ENSG00000109906 ENST00000535700      ZBTB16
+##   ENST00000392996 ENSG00000109906 ENST00000392996      ZBTB16
+##   ENST00000539918 ENSG00000109906 ENST00000539918      ZBTB16
+##   ENST00000545851 ENSG00000109906 ENST00000545851      ZBTB16
+##   ENST00000535379 ENSG00000109906 ENST00000535379      ZBTB16
+##   ENST00000535509 ENSG00000109906 ENST00000535509      ZBTB16
+##   -------
+##   seqinfo: 1 sequence from GRCh37 genome</code></pre>
+<p>Filter expressions have to be written as a formula (i.e. starting with a <code>~</code>), in the form <em>filter field name</em> (e.g. <code>genename</code> for a <code>GenenameFilter</code>) followed by the logical condition.</p>
 <p>To get an overview of database tables and available columns the function <code>listTables</code> can be used. The method <code>listColumns</code> on the other hand lists columns for the specified database table.</p>
 <pre class="r"><code>## list all database tables along with their columns
 listTables(edb)</code></pre>
@@ -263,6 +417,17 @@ listTables(edb)</code></pre>
 ## $chromosome
 ## [1] "seq_name"    "seq_length"  "is_circular"
 ## 
+## $protein
+## [1] "tx_id"            "protein_id"       "protein_sequence"
+## 
+## $uniprot
+## [1] "protein_id"           "uniprot_id"           "uniprot_db"          
+## [4] "uniprot_mapping_type"
+## 
+## $protein_domain
+## [1] "protein_id"            "protein_domain_id"     "protein_domain_source"
+## [4] "interpro_accession"    "prot_dom_start"        "prot_dom_end"         
+## 
 ## $metadata
 ## [1] "name"  "value"</code></pre>
 <pre class="r"><code>## list columns from a specific table
@@ -273,7 +438,7 @@ listColumns(edb, "tx")</code></pre>
 <p>Thus, we could retrieve all transcripts of the biotype <em>nonsense_mediated_decay</em> (which, according to the definitions by Ensembl are transcribed, but most likely not translated in a protein, but rather degraded after transcription) along with the name of the gene for each transcript. Note that we are changing here the <code>return.type</code> to <code>DataFrame</code>, so the method will return a <code>DataFrame</code> with the results instead of the default <code>GRanges</code>.</p>
 <pre class="r"><code>Tx <- transcripts(edb,
           columns = c(listColumns(edb , "tx"), "gene_name"),
-          filter = TxbiotypeFilter("nonsense_mediated_decay"),
+          filter = TxBiotypeFilter("nonsense_mediated_decay"),
           return.type = "DataFrame")
 nrow(Tx)</code></pre>
 <pre><code>## [1] 13812</code></pre>
@@ -319,7 +484,7 @@ nrow(Tx)</code></pre>
 ## 13811      ZNF692
 ## 13812      ZNF692</code></pre>
 <p>For protein coding transcripts, we can also specifically extract their coding region. In the example below we extract the CDS for all transcripts encoded on chromosome Y.</p>
-<pre class="r"><code>yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+<pre class="r"><code>yCds <- cdsBy(edb, filter = SeqNameFilter("Y"))
 yCds</code></pre>
 <pre><code>## GRangesList object of length 160:
 ## $ENST00000155093 
@@ -382,7 +547,7 @@ yCds</code></pre>
 <p>Using a <code>GRangesFilter</code> we can retrieve all features from the database that are either within or overlapping the specified genomic region. In the example below we query all genes that are partially overlapping with a small region on chromosome 11. The filter restricts to all genes for which either an exon or an intron is partially overlapping with the region.</p>
 <pre class="r"><code>## Define the filter
 grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
-                 strand = "+"), condition = "overlapping")
+                 strand = "+"), type = "any")
 
 ## Query genes:
 gn <- genes(edb, filter = grf)
@@ -412,7 +577,7 @@ for(i in 1:length(txs)) {
      ytop = i-0.125, border = "grey")
     text(start(current), y = i-0.5, pos = 4, cex = 0.75, labels = current$tx_id)
 }</code></pre>
-<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABaAAAAPACAIAAABM5kn/AAAACXBIWXMAAB2HAAAdhwGP5fFlAAAgAElEQVR4nOzde3RV5Z0//ieHhMM1yE0FRJk6CChIB/AyigvEUVGkM1214CCWm4I1dSxUqf0qikKdqiy8FBWkWmahdgBdlSlfAVdJdYSKkAoCXkDUUUSRACGQQIAk5/fH/jXfTEIwUOBkw+v1h+vk2Z/9nM/eSPS8z7P3zkilUgEAAAAgzhLpbgAAAADgbyXgAAAAAGJPwAEAAADEnoADAAAAiD0BBwAAABB7Ag4AAAAg9gQcAAAAQOwJOAAAAIDYE3AAAAAAsSfgAAAAAGJPwAEAAADEnoADAAAAiD0BBwAAABB7Ag4AAAAg9gQcAAAAQOwJOAAAAIDYE3AAAAAAsSfgAAAAAGJPwAEAAADEnoADAAAAiD0BBwAAABB7A [...]
+<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABLUAAALyCAIAAACjKnGBAAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAABmJLR0QA/wD/AP+gvaeTAAAACW9GRnMAAACcAAAAnABteqXgAAAACXBIWXMAAB2HAAAdhwGP5fFlAAAAB3RJTUUH4QgEFzozLc60iwAAAAl2cEFnAAAFoAAAA8AAl0rqVwAAgABJREFUeNrs3XtclHX+///XDAPDQQZR8AAeKFPEE+WxTVlPa2lqu/5q0UxTPJbUFmbUrue0toOLlpaY6bJLhxX1trm5efgmWmF5IMFj4jFDPHBwBDnDzPz+eG/zmYaDA2Yz4OP+x+74ntf1ntd1zUz49H1xXZoLFy6sWbPGZDIJAAAAAOCO5OPj88ILL0hsbKyzO [...]
 <p>As we can see, 4 transcripts of the gene ZBTB16 are also overlapping the region. Below we fetch these 4 transcripts. Note, that a call to <code>exons</code> will not return any features from the database, as no exon is overlapping with the region.</p>
 <pre class="r"><code>transcripts(edb, filter = grf)</code></pre>
 <pre><code>## GRanges object with 4 ranges and 6 metadata columns:
@@ -437,8 +602,8 @@ for(i in 1:length(txs)) {
 ##   -------
 ##   seqinfo: 1 sequence from GRCh37 genome</code></pre>
 <p>The <code>GRangesFilter</code> supports also <code>GRanges</code> defining multiple regions and a query will return all features overlapping any of these regions. Besides using the <code>GRangesFilter</code> it is also possible to search for transcripts or exons overlapping genomic regions using the <code>exonsByOverlaps</code> or <code>transcriptsByOverlaps</code> known from the <code>GenomicFeatures</code> package. Note that the implementation of these methods for <code>EnsDb</code> [...]
-<p>To get an overview of allowed/available gene and transcript biotype the functions <code>listGenebiotypes</code> and <code>listTxbiotypes</code> can be used.</p>
-<pre class="r"><code>## Get all gene biotypes from the database. The GenebiotypeFilter
+<p>The functions <code>listGenebiotypes</code> and <code>listTxbiotypes</code> can be used to get an overview of the allowed/available gene and transcript biotypes.</p>
+<pre class="r"><code>## Get all gene biotypes from the database. The GeneBiotypeFilter
 ## allows to filter on these values.
 listGenebiotypes(edb)</code></pre>
 <pre><code>##  [1] "protein_coding"           "pseudogene"              
@@ -504,7 +669,7 @@ listTxbiotypes(edb)</code></pre>
 ## and a % for any character/string.
 BCLs <- genes(edb,
           columns = c("gene_name", "entrezid", "gene_biotype"),
-          filter = list(GenenameFilter("BCL%", condition = "like")),
+          filter = GenenameFilter("BCL", condition = "startsWith"),
           return.type = "DataFrame")
 nrow(BCLs)</code></pre>
 <pre><code>## [1] 25</code></pre>
@@ -523,26 +688,25 @@ nrow(BCLs)</code></pre>
 ## 23         BCL9         607 protein_coding ENSG00000266095
 ## 24        BCL9L      283149 protein_coding ENSG00000186174
 ## 25       BCLAF1        9774 protein_coding ENSG00000029363</code></pre>
-<p>Sometimes it might be useful to know the length of genes or transcripts (i.e. the total sum of nucleotides covered by their exons). Below we calculate the mean length of transcripts from protein coding genes on chromosomes X and Y as well as the average length of snoRNA, snRNA and rRNA transcripts encoded on these chromosomes.</p>
+<p>Sometimes it might be useful to know the length of genes or transcripts (i.e. the total number of nucleotides covered by their exons). Below we calculate the average length of snoRNA, snRNA and rRNA transcripts encoded on chromosomes X and Y as well as the mean length of transcripts from protein coding genes on these chromosomes. For the first query we combine two <code>AnnotationFilter</code> objects using an <code>AnnotationFilterList</code> object; in the second we define the query us [...]
 <pre class="r"><code>## determine the average length of snRNA, snoRNA and rRNA genes encoded on
 ## chromosomes X and Y.
-mean(lengthOf(edb, of = "tx",
-          filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
-                SeqnameFilter(c("X", "Y")))))</code></pre>
+mean(lengthOf(edb, of = "tx", filter = AnnotationFilterList(
+                  GeneBiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+                  SeqNameFilter(c("X", "Y")))))</code></pre>
 <pre><code>## [1] 116.3046</code></pre>
 <pre class="r"><code>## determine the average length of protein coding genes encoded on the same
 ## chromosomes.
-mean(lengthOf(edb, of = "tx",
-          filter = list(GenebiotypeFilter("protein_coding"),
-                SeqnameFilter(c("X", "Y")))))</code></pre>
+mean(lengthOf(edb, of = "tx", filter = ~ gene_biotype == "protein_coding" &
+                  seq_name %in% c("X", "Y")))</code></pre>
 <pre><code>## [1] 1920</code></pre>
 <p>Not unexpectedly, transcripts of protein coding genes are longer than those of snRNA, snoRNA or rRNA genes.</p>
 <p>At last we extract the first two exons of each transcript model from the database.</p>
 <pre class="r"><code>## Extract all exons 1 and (if present) 2 for all genes encoded on the
 ## Y chromosome
 exons(edb, columns = c("tx_id", "exon_idx"),
-      filter = list(SeqnameFilter("Y"),
-            ExonrankFilter(3, condition = "<")))</code></pre>
+      filter = list(SeqNameFilter("Y"),
+            ExonRankFilter(3, condition = "<")))</code></pre>
 <pre><code>## GRanges object with 1287 ranges and 3 metadata columns:
 ##                   seqnames               ranges strand |           tx_id
 ##                      <Rle>            <IRanges>  <Rle> |     <character>
@@ -577,9 +741,7 @@ exons(edb, columns = c("tx_id", "exon_idx"),
 <h1><span class="header-section-number">3</span> Extracting gene/transcript/exon models for RNASeq feature counting</h1>
 <p>For the feature counting step of an RNAseq experiment, the gene or transcript models (defined by the chromosomal start and end positions of their exons) have to be known. To extract these from an Ensembl based annotation package, the <code>exonsBy</code>, <code>genesBy</code> and <code>transcriptsBy</code> methods can be used in an analogous way as in <code>TxDb</code> packages generated by the <code>GenomicFeatures</code> package. However, the <code>transcriptsBy</code> method does n [...]
 <p>A simple use case is to retrieve all genes encoded on chromosomes X and Y from the database.</p>
-<pre class="r"><code>TxByGns <- transcriptsBy(edb, by = "gene",
-             filter = list(SeqnameFilter(c("X", "Y")))
-             )
+<pre class="r"><code>TxByGns <- transcriptsBy(edb, by = "gene", filter = SeqNameFilter(c("X", "Y")))
 TxByGns</code></pre>
 <pre><code>## GRangesList object of length 2908:
 ## $ENSG00000000003 
@@ -641,12 +803,12 @@ TxByGns</code></pre>
 ## -------
 ## seqinfo: 2 sequences from GRCh37 genome</code></pre>
 <p>Since Ensembl contains also definitions of genes that are on chromosome variants (supercontigs), it is advisable to specify the chromosome names for which the gene models should be returned.</p>
-<p>In a real use case, we might thus want to retrieve all genes encoded on the <em>standard</em> chromosomes. In addition it is advisable to use a <code>GeneidFilter</code> to restrict to Ensembl genes only, as also <em>LRG</em> (Locus Reference Genomic) genes<sup><a id="fnr.2" class="footref" href="#fn.2">2</a></sup> are defined in the database, which are partially redundant with Ensembl genes.</p>
+<p>In a real use case, we might thus want to retrieve all genes encoded on the <em>standard</em> chromosomes. In addition, it is advisable to use a <code>GeneIdFilter</code> to restrict the query to Ensembl genes only, since the database also defines <em>LRG</em> (Locus Reference Genomic) genes<sup><a id="fnr.2" class="footref" href="#fn.2">2</a></sup>, which are partially redundant with Ensembl genes.</p>
 <pre class="r"><code>## will just get exons for all genes on chromosomes 1 to 22, X and Y.
 ## Note: want to get rid of the "LRG" genes!!!
-EnsGenes <- exonsBy(edb, by = "gene",
-            filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-                  GeneidFilter("ENSG%", "like")))</code></pre>
+EnsGenes <- exonsBy(edb, by = "gene", filter = AnnotationFilterList(
+                      SeqNameFilter(c(1:22, "X", "Y")),
+                      GeneIdFilter("ENSG", "startsWith")))</code></pre>
 <p>The code above returns a <code>GRangesList</code> that can be used directly as an input for the <code>summarizeOverlaps</code> function from the <code>GenomicAlignments</code> package <sup><a id="fnr.3" class="footref" href="#fn.3">3</a></sup>.</p>
 <p>Alternatively, the above <code>GRangesList</code> can be transformed to a <code>data.frame</code> in <em>SAF</em> format that can be used as an input to the <code>featureCounts</code> function of the <code>Rsubread</code> package <sup><a id="fnr.4" class="footref" href="#fn.4">4</a></sup>.</p>
 <pre class="r"><code>## Transforming the GRangesList into a data.frame in SAF format
@@ -654,9 +816,9 @@ EnsGenes.SAF <- toSAF(EnsGenes)</code></pre>
 <p>Note that the ID by which the <code>GRangesList</code> is split is used in the SAF formatted <code>data.frame</code> as the <code>GeneID</code>. In the example below this would be the Ensembl gene IDs, while the start, end coordinates (along with the strand and chromosomes) are those of the the exons.</p>
 <p>In addition, the <code>disjointExons</code> function (similar to the one defined in <code>GenomicFeatures</code>) can be used to generate a <code>GRanges</code> of non-overlapping exon parts which can be used in the <code>DEXSeq</code> package.</p>
 <pre class="r"><code>## Create a GRanges of non-overlapping exon parts.
-DJE <- disjointExons(edb,
-             filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-                   GeneidFilter("ENSG%", "like")))</code></pre>
+DJE <- disjointExons(edb, filter = AnnotationFilterList(
+                  SeqNameFilter(c(1:22, "X", "Y")),
+                  GeneIdFilter("ENSG", "startsWith")))</code></pre>
 </div>
 <div id="retrieving-sequences-for-genetranscriptexon-models" class="section level1">
 <h1><span class="header-section-number">4</span> Retrieving sequences for gene/transcript/exon models</h1>
@@ -681,7 +843,7 @@ genes <- genes[seqnames(genes) %in% seqnames(seqinfo(Dna))]
 geneSeqs <- getSeq(Dna, genes)</code></pre>
 <p>To retrieve the (exonic) sequence of transcripts (i.e. without introns) we can use directly the <code>extractTranscriptSeqs</code> method defined in the <code>GenomicFeatures</code> on the <code>EnsDb</code> object, eventually using a filter to restrict the query.</p>
 <pre class="r"><code>## get all exons of all transcripts encoded on chromosome Y
-yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+yTx <- exonsBy(edb, filter = SeqNameFilter("Y"))
 
 ## Retrieve the sequences for these transcripts from the FaFile.
 library(GenomicFeatures)
@@ -689,11 +851,11 @@ yTxSeqs <- extractTranscriptSeqs(Dna, yTx)
 yTxSeqs
 
 ## Extract the sequences of all transcripts encoded on chromosome Y.
-yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqNameFilter("Y"))
 
 ## Along these lines, we could use the method also to retrieve the coding sequence
 ## of all transcripts on the Y chromosome.
-cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+cdsY <- cdsBy(edb, filter = SeqNameFilter("Y"))
 extractTranscriptSeqs(Dna, cdsY)</code></pre>
 <p>Note: in the next section we describe how transcript sequences can be retrieved from a <code>BSgenome</code> package that is based on UCSC, not Ensembl.</p>
 </div>
@@ -704,8 +866,8 @@ extractTranscriptSeqs(Dna, cdsY)</code></pre>
 <pre class="r"><code>## Change the seqlevels style form Ensembl (default) to UCSC:
 seqlevelsStyle(edb) <- "UCSC"
 
-## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
-genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+## Now we can use UCSC style seqnames in SeqNameFilters or GRangesFilter:
+genesY <- genes(edb, filter = ~ seq_name == "chrY")
 ## The seqlevels of the returned GRanges are also in UCSC style
 seqlevels(genesY)</code></pre>
 <pre><code>## [1] "chrY"</code></pre>
@@ -733,10 +895,10 @@ seqlevels(edb)[1:30]</code></pre>
 <pre><code>## Warning in .formatSeqnameByStyleFromQuery(x, sn, ifNotFound): More than 5
 ## seqnames with seqlevels style of the database (Ensembl) could not be mapped
 ## to the seqlevels style: UCSC!) Returning NA for these.</code></pre>
-<pre><code>##  [1] "chr1"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16"
-##  [9] "chr17" "chr18" "chr19" "chr2"  "chr20" "chr21" "chr22" "chr3" 
-## [17] "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9"  NA      NA     
-## [25] NA      NA      NA      NA      NA      NA</code></pre>
+<pre><code>##  [1] "chr1"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17"
+## [10] "chr18" "chr19" "chr2"  "chr20" "chr21" "chr22" "chr3"  "chr4"  "chr5" 
+## [19] "chr6"  "chr7"  "chr8"  "chr9"  NA      NA      NA      NA      NA     
+## [28] NA      NA      NA</code></pre>
 <pre class="r"><code>## Resetting the option.
 options(ensembldb.seqnameNotFound = "ORIGINAL")</code></pre>
 <p>Next we retrieve transcript sequences from genes encoded on chromosome Y using the <code>BSGenome</code> package for the human genome from UCSC. The specified version <code>hg19</code> matches the genome build of Ensembl version 75, i.e. <code>GRCh37</code>. Note that while we changed the style of the seqnames to UCSC we did not change the naming of the genome release.</p>
@@ -751,48 +913,50 @@ unique(genome(bsg))</code></pre>
 <pre class="r"><code>## Although differently named, both represent genome build GRCh37.
 
 ## Extract the full transcript sequences.
-yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx",
+                          filter = SeqNameFilter("chrY")))
 
 yTxSeqs</code></pre>
 <pre><code>##   A DNAStringSet instance of length 731
-##       width seq                                        names               
-##   [1]  5239 GCCTAGTGCGCGCGCAGTAA...AAATGTTTACTTGTATATG ENST00000155093
-##   [2]  4023 ATGTTTAGGGTTGGCTTCTT...GGAAACACATCCCTTGTAA ENST00000215473
-##   [3]   802 AGAGGACCAAGCCTCCCTGT...TAAAATGTTTTAAAAATCA ENST00000215479
-##   [4]   910 TGTCTGTCAGAGCTGTCAGC...ACACTGGTATATTTCTGTT ENST00000250776
-##   [5]  1305 TTCCAGGATATGAACTCTAC...ATCCTGTGGCTGTAGGAAA ENST00000250784
+##       width seq                                          names               
+##   [1]  5239 GCCTAGTGCGCGCGCAGTAAC...TAAATGTTTACTTGTATATG ENST00000155093
+##   [2]  4023 ATGTTTAGGGTTGGCTTCTTA...TGGAAACACATCCCTTGTAA ENST00000215473
+##   [3]   802 AGAGGACCAAGCCTCCCTGTG...ATAAAATGTTTTAAAAATCA ENST00000215479
+##   [4]   910 TGTCTGTCAGAGCTGTCAGCC...AACACTGGTATATTTCTGTT ENST00000250776
+##   [5]  1305 TTCCAGGATATGAACTCTACA...AATCCTGTGGCTGTAGGAAA ENST00000250784
 ##   ...   ... ...
-## [727]   333 ATGGATGAAGAAGAGAAAAC...TGAACTTTCTAGATTGCAT ENST00000604924
-## [728]  1247 CATGGCGGGGTTCCTGCCTT...TTTGGAGTAATGTCTTAGT ENST00000605584
-## [729]   199 CAGTTCTCGCTCCTGTGCAG...GGTCTGGGTGGCTTCTGGA ENST00000605663
-## [730]   276 GCCCCAGGAGGAAAGGGGGA...AATAAAGAACAGCGCATTC ENST00000606439
-## [731]   444 ATGGGAGCCACTGGGCTTGG...CGTTCATGAAGAAGACTAA ENST00000607210</code></pre>
+## [727]   333 ATGGATGAAGAAGAGAAAACC...GTGAACTTTCTAGATTGCAT ENST00000604924
+## [728]  1247 CATGGCGGGGTTCCTGCCTTC...CTTTGGAGTAATGTCTTAGT ENST00000605584
+## [729]   199 CAGTTCTCGCTCCTGTGCAGC...TGGTCTGGGTGGCTTCTGGA ENST00000605663
+## [730]   276 GCCCCAGGAGGAAAGGGGGAC...AAATAAAGAACAGCGCATTC ENST00000606439
+## [731]   444 ATGGGAGCCACTGGGCTTGGC...ACGTTCATGAAGAAGACTAA ENST00000607210</code></pre>
 <pre class="r"><code>## Extract just the CDS
-Test <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
-yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+cdsTx <- cdsBy(edb, "tx", filter = SeqNameFilter("chrY"))
+## Retrieve the CDS sequences from the BSgenome.
+yTxCds <- extractTranscriptSeqs(bsg, cdsTx)
 yTxCds</code></pre>
 <pre><code>##   A DNAStringSet instance of length 160
-##       width seq                                        names               
-##   [1]  2406 ATGGATGAAGATGAATTTGA...AGAAGTTGGTCTGCCCTAA ENST00000155093
-##   [2]  4023 ATGTTTAGGGTTGGCTTCTT...GGAAACACATCCCTTGTAA ENST00000215473
-##   [3]   579 ATGGGGACCTGGATTTTGTT...GCAGGAGGAAGTGGATTAA ENST00000215479
-##   [4]   792 ATGGCCCGGGGCCCCAAGAA...CAAACAGAGCAGTGGCTAA ENST00000250784
-##   [5]   378 ATGAGTCCAAAGCCGAGAGC...TACTCCCCTATCTCCCTGA ENST00000250823
+##       width seq                                          names               
+##   [1]  2406 ATGGATGAAGATGAATTTGAA...AAGAAGTTGGTCTGCCCTAA ENST00000155093
+##   [2]  4023 ATGTTTAGGGTTGGCTTCTTA...TGGAAACACATCCCTTGTAA ENST00000215473
+##   [3]   579 ATGGGGACCTGGATTTTGTTT...AGCAGGAGGAAGTGGATTAA ENST00000215479
+##   [4]   792 ATGGCCCGGGGCCCCAAGAAG...CCAAACAGAGCAGTGGCTAA ENST00000250784
+##   [5]   378 ATGAGTCCAAAGCCGAGAGCC...CTACTCCCCTATCTCCCTGA ENST00000250823
 ##   ...   ... ...
-## [156]    63 CGCAAGGATTTAAAAGAGAT...ACCCTGTTGGCCAGGCTAG ENST00000601700
-## [157]    42 CTTGATACAAAGAATCAATTTAATTTTAAGATTGTCTATCTT ENST00000601705
-## [158]    33 ATGATGACGCTTGTCCCCAGAGCCAGGACACGT          ENST00000602680
-## [159]    33 ATGATGACGCTTGTCCCCAGAGCCAGGACACGT          ENST00000602732
-## [160]    33 ATGATGACGCTTGTCCCCAGAGCCAGGACACGT          ENST00000602770</code></pre>
-<p>At last changing the seqname style to the default value =“Ensembl”=.</p>
+## [156]    63 CGCAAGGATTTAAAAGAGATG...CACCCTGTTGGCCAGGCTAG ENST00000601700
+## [157]    42 CTTGATACAAAGAATCAATTTAATTTTAAGATTGTCTATCTT   ENST00000601705
+## [158]    33 ATGATGACGCTTGTCCCCAGAGCCAGGACACGT            ENST00000602680
+## [159]    33 ATGATGACGCTTGTCCCCAGAGCCAGGACACGT            ENST00000602732
+## [160]    33 ATGATGACGCTTGTCCCCAGAGCCAGGACACGT            ENST00000602770</code></pre>
+<p>Finally, we change the seqname style back to the default value <code>"Ensembl"</code>.</p>
 <pre class="r"><code>seqlevelsStyle(edb) <- "Ensembl"</code></pre>
 </div>
 <div id="interactive-annotation-lookup-using-the-shiny-web-app" class="section level1">
 <h1><span class="header-section-number">6</span> Interactive annotation lookup using the <code>shiny</code> web app</h1>
 <p>In addition to the <code>genes</code>, <code>transcripts</code> and <code>exons</code> methods it is possibly to search interactively for gene/transcript/exon annotations using the internal, <code>shiny</code> based, web application. The application can be started with the <code>runEnsDbApp()</code> function. The search results from this app can also be returned to the R workspace either as a <code>data.frame</code> or <code>GRanges</code> object.</p>
 </div>
-<div id="plotting-genetranscript-features-using-ensembldb-and-gviz" class="section level1">
-<h1><span class="header-section-number">7</span> Plotting gene/transcript features using <code>ensembldb</code> and <code>Gviz</code></h1>
+<div id="plotting-genetranscript-features-using-ensembldb-and-gviz-and-ggbio" class="section level1">
+<h1><span class="header-section-number">7</span> Plotting gene/transcript features using <code>ensembldb</code>, <code>Gviz</code> and <code>ggbio</code></h1>
 <p>The <code>Gviz</code> package provides functions to plot genes and transcripts along with other data on a genomic scale. Gene models can be provided either as a <code>data.frame</code>, <code>GRanges</code>, <code>TxDB</code> database, can be fetched from biomart and can also be retrieved from <code>ensembldb</code>.</p>
 <p>Below we generate a <code>GeneRegionTrack</code> fetching all transcripts from a certain region on chromosome Y.</p>
 <p>Note that if we want in addition to work also with BAM files that were aligned against DNA sequences retrieved from Ensembl or FASTA files representing genomic DNA sequences from Ensembl we should change the <code>ucscChromosomeNames</code> option from <code>Gviz</code> to <code>FALSE</code> (i.e. by calling <code>options(ucscChromosomeNames = FALSE)</code>). This is not necessary if we just want to retrieve gene models from an <code>EnsDb</code> object, as the <code>ensembldb</code>  [...]
@@ -813,14 +977,17 @@ gat <- GenomeAxisTrack()
 options(ucscChromosomeNames = FALSE)
 
 plotTracks(list(gat, GeneRegionTrack(gr)))</code></pre>
-<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABaAAAAGwCAMAAABo5zJyAAABuVBMVEWAgICBgYGCgoKDg4OEhISFhYWGhoaHh4eIiIiJiYmKioqLi4uMjIyNjY2Ojo6Pj4+QkJCRkZGSkpKTk5OUlJSVlZWWlpaXl5eYmJiZmZmampqbm5ucnJydnZ2enp6fn5+goKChoaGioqKjo6OkpKSlpaWmpqanp6eoqKipqamqqaiqqqqrqqirq6usqqesq6isrKytq6etra2uq6eurq6vrKevr6+wsLCxsbGysrKzs7O0r6W0tLS1r6S1tbW2tra3t7e4uLi5ubm6urq7u7u8vLy9vb2+vr6/v7/AtaHAwMDBwcHCwsLDw8PExMTFxcXGxsbHuZ/Hx8fIyMjJuZ3JycnKysrLup3Ly8vMzMzNzc3OvJzOzs7PvJvPz8/Q0NDR0dHS0tLTv5rT09PU1 [...]
+<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABYkAAAGpCAMAAAD/dkk7AAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAABtlBMVEX////R0dGfn5+Li4uJiYmhoaHU1NTX19ebm5uFhYWnp6fp6emqqqrt7e2Tk5P+/v7IyMienp6Hh4empqbl5eX5+fmEhITLy8uxsbGAgICCgoKYmJiXl5eBgYGwsLCKiorn5+fr6+v9/f3z8/OOjo7c3NzMzMzh4eGNjY3q6uro6OjW1tbd3d29vb2goKCampr39/fDw8P19fXQ0ND4+PjOzs6GhoapqanPz8+5ubnw8PCtra3e3t7ExMSMjIzx8fG0tLTf39+cnJzy8vL29vaSkpKjo6Pg4OC+vr7Gxsa/v7/BwcHNzc2UlJTJycnu7u6oq [...]
 <pre class="r"><code>options(ucscChromosomeNames = TRUE)</code></pre>
-<p>Above we had to change the option <code>ucscChromosomeNames</code> to <code>FALSE</code> in order to use it with non-UCSC chromosome names. Alternatively, we could however also change the <code>seqnamesStyle</code> of the <code>EnsDb</code> object to <code>UCSC</code>. Note that we have to use now also chromosome names in the <em>UCSC style</em> in the <code>SeqnameFilter</code> (i.e. “chrY” instead of <code>Y</code>).</p>
+<p>Above we had to change the option <code>ucscChromosomeNames</code> to <code>FALSE</code> in order to use <code>Gviz</code> with non-UCSC chromosome names. Alternatively, we could change the <code>seqlevelsStyle</code> of the <code>EnsDb</code> object to <code>UCSC</code>. Note that we then also have to use chromosome names in the <em>UCSC style</em> in the <code>SeqNameFilter</code> (i.e. <code>"chrY"</code> instead of <code>"Y"</code>).</p>
 <pre class="r"><code>seqlevelsStyle(edb) <- "UCSC"
 ## Retrieving the GRanges objects with seqnames corresponding to UCSC chromosome names.
 gr <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
-                start = 20400000, end = 21400000)
-seqnames(gr)</code></pre>
+                start = 20400000, end = 21400000)</code></pre>
+<pre><code>## Warning in .formatSeqnameByStyleForQuery(x, sn, ifNotFound): Seqnames:
+## Y could not be mapped to the seqlevels style of the database (Ensembl)!
+## Returning the orginal seqnames for these.</code></pre>
+<pre class="r"><code>seqnames(gr)</code></pre>
 <pre><code>## factor-Rle of length 218 with 1 run
 ##   Lengths:  218
 ##   Values : chrY
@@ -828,20 +995,32 @@ seqnames(gr)</code></pre>
 <pre class="r"><code>## Define a genome axis track
 gat <- GenomeAxisTrack()
 plotTracks(list(gat, GeneRegionTrack(gr)))</code></pre>
-<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABaAAAAPACAMAAAD0Wi6aAAABv1BMVEWAgICBgYGCgoKDg4OEhISFhYWGhoaHh4eIiIiJiYmKioqLi4uMjIyNjY2Ojo6Pj4+QkJCRkZGSkpKTk5OUlJSVlZWWlpaXl5eYmJiZmZmampqbm5ucnJydnZ2enp6fn5+goKChoaGioqKjo6OkpKSlpaWmpqanp6eoqKipqamqqqmqqqqrqqirq6usqqesq6isrKytq6etq6itra2urKeurq6vrKavrKevr6+wsLCxraaxsbGysrKzs7O0tLS1tbW2tra3t7e4uLi5ubm6urq7sqO7u7u8vLy9s6G9vb2+tKG+vr6/v7/AtaDAwMDBtaHBwcHCwsLDw8PExMTFxcXGxsbHx8fIyMjJycnKysrLy8vMzMzNzc3Ozs7Pz8/Q0NDR0dHS0tLTv5rT09PU1 [...]
+<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABekAAAOwCAMAAACd1w8xAAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAABzlBMVEX////R0dGfn5+Li4uJiYmhoaHU1NTX19ebm5uFhYWnp6fp6emqqqrt7e2Tk5P+/v7IyMienp6Hh4empqbl5eX5+fmEhITLy8uxsbGAgICCgoKYmJiXl5eBgYGwsLCKiorn5+fr6+v9/f3z8/OOjo7c3NzMzMzh4eGNjY3q6uro6OjW1tbd3d29vb2goKCampr39/fDw8P19fXQ0ND4+PjOzs6GhoapqanPz8+5ubnw8PCtra3e3t7ExMSMjIzx8fG0tLTf39+cnJzy8vL29vaSkpKjo6Pg4OC+vr7Gxsa/v7/BwcHNzc2UlJTJycnu7u6oq [...]
 <p>We can also use the filters from the <code>ensembldb</code> package to further refine what transcripts are fetched, like in the example below, in which we create two different gene region tracks, one for protein coding genes and one for lincRNAs.</p>
 <pre class="r"><code>protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
                      start = 20400000, end = 21400000,
-                     filter = GenebiotypeFilter("protein_coding"))
+                     filter = GeneBiotypeFilter("protein_coding"))
 lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
                    start = 20400000, end = 21400000,
-                   filter = GenebiotypeFilter("lincRNA"))
+                   filter = GeneBiotypeFilter("lincRNA"))
 
 plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
         GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")</code></pre>
-<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABaAAAAGwCAMAAABo5zJyAAABsFBMVEWAgICBgYGCgoKDg4OEhISFhYWGhoaHh4eIiIiJiYmKioqLi4uMjIyNjY2Ojo6Pj4+QkJCRkZGSkpKTk5OUlJSVlZWWlpaXl5eYmJiZmZmampqbm5ucnJydnZ2enp6fn5+goKChoaGioqKjo6OkpKSlpaWmpqanp6eoqKipqamqqqqrq6usqqesrKytq6itra2uq6auq6eurq6vrKevr6+wrKawsLCxraaxsbGysrKzrqWzs7O0rqS0r6W0tLS1r6W1tbW2tra3sKS3t7e4uLi5ubm6urq7u7u8vLy9vb2+vr6/v7/AwMDBwcHCwsLDw8PEt5/ExMTFxcXGuJ/GxsbHx8fIyMjJycnKysrLy8vMzMzNzc3Ozs7Pz8/Q0NDR0dHS0tLT09PU1NTV1dXW1 [...]
+<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABYkAAAGgCAMAAADYeRjzAAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAABsFBMVEX////R0dGfn5+Li4uJiYmhoaHU1NTX19ebm5uFhYWnp6fp6emqqqrt7e2Tk5P+/v7IyMienp6Hh4empqbl5eX5+fmEhITLy8uxsbGAgICCgoKYmJiXl5eBgYGwsLCKiorn5+fr6+v9/f3z8/OOjo7c3NzMzMzh4eGNjY3q6uro6OjW1tbd3d29vb2goKCampr39/fDw8P19fXQ0ND4+PjOzs6GhoapqanPz8+5ubnw8PCtra3e3t7ExMSMjIzx8fG0tLTf39+cnJzy8vL29vaSkpKjo6Pg4OC+vr7Gxsa/v7/BwcHNzc2UlJTJycnu7u6oq [...]
 <pre class="r"><code>## At last we change the seqlevels style again to Ensembl
 seqlevelsStyle <- "Ensembl"</code></pre>
+<p>Alternatively, we can use <code>ggbio</code> for plotting. Its <code>autoplot</code> method accepts the <code>EnsDb</code> object directly, along with optional filters (or, as in the example below, a filter expression provided as a <code>formula</code>).</p>
+<pre class="r"><code>library(ggbio)
+
+## Create a plot for all transcripts of the gene SKA2
+autoplot(edb, ~ genename == "SKA2")</code></pre>
+<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABV4AAAK+CAAAAAB/NVpbAAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAAAmJLR0QA/4ePzL8AAAAJb0ZGcwAAADMAAAAPAN9c3wIAAAAJcEhZcwAAHYcAAB2HAY/l8WUAAAAHdElNRQfhCAQXOwny2Vx4AAAACXZwQWcAAAWgAAADAABcsD0ZAACAAElEQVR42u29B1wV17b4P9h7L4mJJU2jRk1iNKZH0wvGFImkmOogKIoKolHsYqLBEolijKARO/YWu7H3ihERIyDtnLm57+W9e/O7//veu/z3mjnAOWutQVCRA3d9Px9ls/aemX327P09c2bvOWj/YXgF/8jN9ZKa/J77X6VdBRe5uf9b2lVw8bfc0q6Biz9yc/8s7Tq4+ [...]
+<p>To plot the whole genomic region, including genes encoded on both strands, we can use a <code>GRangesFilter</code>.</p>
+<pre class="r"><code>## Get the chromosomal region in which the gene is encoded
+ska2 <- genes(edb, filter = ~ genename == "SKA2")
+strand(ska2) <- "*"
+autoplot(edb, GRangesFilter(ska2), names.expr = "gene_name")</code></pre>
+<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABV4AAAK+CAAAAAB/NVpbAAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAAAmJLR0QA/4ePzL8AAAAJb0ZGcwAAADMAAAAPAN9c3wIAAAAJcEhZcwAAHYcAAB2HAY/l8WUAAAAHdElNRQfhCAQXOwyCs6j3AAAACXZwQWcAAAWgAAADAABcsD0ZAACAAElEQVR42u2dB3xVxbb/J5TQe1HpWGiCqDRBEZGiokFEiGDByoSEAEETIiVSJIBIBATpVSKGjg2RXiR0pAQhCS2BJJxz3t/37n33vqs378l/1t7nJOfMrB1Cks2c4Pp9PsrZa2bvtWf2rO8us/YO+0+XPfrbjf9n05bzpz9u3PhN6w78nq3VvevGjf/VuwN//kOr+7/fu [...]
 </div>
 <div id="using-ensdb-objects-in-the-annotationdbi-framework" class="section level1">
 <h1><span class="header-section-number">8</span> Using <code>EnsDb</code> objects in the <code>AnnotationDbi</code> framework</h1>
@@ -852,38 +1031,46 @@ edb <- EnsDb.Hsapiens.v75
 
 ## List all available columns in the database.
 columns(edb)</code></pre>
-<pre><code>##  [1] "ENTREZID"       "EXONID"         "EXONIDX"        "EXONSEQEND"    
-##  [5] "EXONSEQSTART"   "GENEBIOTYPE"    "GENEID"         "GENENAME"      
-##  [9] "GENESEQEND"     "GENESEQSTART"   "ISCIRCULAR"     "SEQCOORDSYSTEM"
-## [13] "SEQLENGTH"      "SEQNAME"        "SEQSTRAND"      "SYMBOL"        
-## [17] "TXBIOTYPE"      "TXCDSSEQEND"    "TXCDSSEQSTART"  "TXID"          
-## [21] "TXNAME"         "TXSEQEND"       "TXSEQSTART"</code></pre>
+<pre><code>##  [1] "ENTREZID"            "EXONID"              "EXONIDX"            
+##  [4] "EXONSEQEND"          "EXONSEQSTART"        "GENEBIOTYPE"        
+##  [7] "GENEID"              "GENENAME"            "GENESEQEND"         
+## [10] "GENESEQSTART"        "INTERPROACCESSION"   "ISCIRCULAR"         
+## [13] "PROTDOMEND"          "PROTDOMSTART"        "PROTEINDOMAINID"    
+## [16] "PROTEINDOMAINSOURCE" "PROTEINID"           "PROTEINSEQUENCE"    
+## [19] "SEQCOORDSYSTEM"      "SEQLENGTH"           "SEQNAME"            
+## [22] "SEQSTRAND"           "SYMBOL"              "TXBIOTYPE"          
+## [25] "TXCDSSEQEND"         "TXCDSSEQSTART"       "TXID"               
+## [28] "TXNAME"              "TXSEQEND"            "TXSEQSTART"         
+## [31] "UNIPROTDB"           "UNIPROTID"           "UNIPROTMAPPINGTYPE"</code></pre>
 <pre class="r"><code>## Note that these do *not* correspond to the actual column names
 ## of the database that can be passed to methods like exons, genes,
 ## transcripts etc. These column names can be listed with the listColumns
 ## method.
 listColumns(edb)</code></pre>
-<pre><code>##  [1] "seq_name"         "seq_length"       "is_circular"     
-##  [4] "exon_id"          "exon_seq_start"   "exon_seq_end"    
-##  [7] "gene_id"          "gene_name"        "entrezid"        
-## [10] "gene_biotype"     "gene_seq_start"   "gene_seq_end"    
-## [13] "seq_name"         "seq_strand"       "seq_coord_system"
-## [16] "symbol"           "name"             "value"           
-## [19] "tx_id"            "tx_biotype"       "tx_seq_start"    
-## [22] "tx_seq_end"       "tx_cds_seq_start" "tx_cds_seq_end"  
-## [25] "gene_id"          "tx_name"          "tx_id"           
-## [28] "exon_id"          "exon_idx"</code></pre>
+<pre><code>##  [1] "seq_name"              "seq_length"            "is_circular"          
+##  [4] "exon_id"               "exon_seq_start"        "exon_seq_end"         
+##  [7] "gene_id"               "gene_name"             "entrezid"             
+## [10] "gene_biotype"          "gene_seq_start"        "gene_seq_end"         
+## [13] "seq_strand"            "seq_coord_system"      "symbol"               
+## [16] "name"                  "value"                 "tx_id"                
+## [19] "protein_id"            "protein_sequence"      "protein_domain_id"    
+## [22] "protein_domain_source" "interpro_accession"    "prot_dom_start"       
+## [25] "prot_dom_end"          "tx_biotype"            "tx_seq_start"         
+## [28] "tx_seq_end"            "tx_cds_seq_start"      "tx_cds_seq_end"       
+## [31] "tx_name"               "exon_idx"              "uniprot_id"           
+## [34] "uniprot_db"            "uniprot_mapping_type"</code></pre>
 <pre class="r"><code>## List all of the supported key types.
 keytypes(edb)</code></pre>
-<pre><code>##  [1] "ENTREZID"    "EXONID"      "GENEBIOTYPE" "GENEID"      "GENENAME"   
-##  [6] "SEQNAME"     "SEQSTRAND"   "SYMBOL"      "TXBIOTYPE"   "TXID"       
-## [11] "TXNAME"</code></pre>
+<pre><code>##  [1] "ENTREZID"        "EXONID"          "GENEBIOTYPE"     "GENEID"         
+##  [5] "GENENAME"        "PROTEINDOMAINID" "PROTEINID"       "SEQNAME"        
+##  [9] "SEQSTRAND"       "SYMBOL"          "TXBIOTYPE"       "TXID"           
+## [13] "TXNAME"          "UNIPROTID"</code></pre>
 <pre class="r"><code>## Get all gene ids from the database.
 gids <- keys(edb, keytype = "GENEID")
 length(gids)</code></pre>
 <pre><code>## [1] 64102</code></pre>
 <pre class="r"><code>## Get all gene names for genes encoded on chromosome Y.
-gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+gnames <- keys(edb, keytype = "GENENAME", filter = SeqNameFilter("Y"))
 head(gnames)</code></pre>
 <pre><code>## [1] "KDM5D"   "DDX3Y"   "ZFY"     "TBL1Y"   "PCDH11Y" "AMELY"</code></pre>
 <p>In the next example we retrieve specific information from the database using the <code>select</code> method. First we fetch all transcripts for the genes <em>BCL2</em> and <em>BCL2L11</em>. In the first call we provide the gene names, while in the second call we employ the filtering system to perform a more fine-grained query to fetch only the protein coding transcripts for these genes.</p>
@@ -914,10 +1101,9 @@ select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GEN
 ## 21 ENSG00000153094  BCL2L11 ENST00000393253          protein_coding
 ## 22 ENSG00000153094  BCL2L11 ENST00000337565          protein_coding</code></pre>
 <pre class="r"><code>## Use the filtering system of ensembldb
-select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-            TxbiotypeFilter("protein_coding")),
+select(edb, keys = ~ genename %in% c("BCL2", "BCL2L11") &
+        tx_biotype == "protein_coding",
        columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))</code></pre>
-<pre><code>## Note: ordering of the results might not match ordering of keys!</code></pre>
 <pre><code>##             GENEID GENENAME            TXID      TXBIOTYPE
 ## 1  ENSG00000171791     BCL2 ENST00000398117 protein_coding
 ## 2  ENSG00000171791     BCL2 ENST00000333681 protein_coding
@@ -945,20 +1131,17 @@ mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID
 ## [5] "ENST00000444484"
 ## 
 ## $BCL2L11
-##  [1] "ENST00000432179" "ENST00000308659" "ENST00000393256"
-##  [4] "ENST00000393252" "ENST00000433098" "ENST00000405953"
-##  [7] "ENST00000415458" "ENST00000436733" "ENST00000437029"
-## [10] "ENST00000452231" "ENST00000361493" "ENST00000431217"
-## [13] "ENST00000439718" "ENST00000438054" "ENST00000357757"
-## [16] "ENST00000393253" "ENST00000337565"</code></pre>
+##  [1] "ENST00000432179" "ENST00000308659" "ENST00000393256" "ENST00000393252"
+##  [5] "ENST00000433098" "ENST00000405953" "ENST00000415458" "ENST00000436733"
+##  [9] "ENST00000437029" "ENST00000452231" "ENST00000361493" "ENST00000431217"
+## [13] "ENST00000439718" "ENST00000438054" "ENST00000357757" "ENST00000393253"
+## [17] "ENST00000337565"</code></pre>
 <pre class="r"><code>## And, just like before, we can use filters to map only to protein coding transcripts.
 mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-            TxbiotypeFilter("protein_coding")), column = "TXID",
+            TxBiotypeFilter("protein_coding")), column = "TXID",
        multiVals = "list")</code></pre>
-<pre><code>## Warning in .mapIds(x = x, keys = keys, column = column, keytype =
-## keytype, : Got 2 filter objects. Will use the keys of the first for the
-## mapping!</code></pre>
-<pre><code>## Note: ordering of the results might not match ordering of keys!</code></pre>
+<pre><code>## Warning in .mapIds(x = x, keys = keys, column = column, keytype = keytype, :
+## Got 2 filter objects. Will use the keys of the first for the mapping!</code></pre>
 <pre><code>## $BCL2
 ## [1] "ENST00000398117" "ENST00000333681" "ENST00000589955" "ENST00000444484"
 ## 
@@ -977,20 +1160,39 @@ mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11"))
 <li><p>The CDS provided by <code>EnsDb</code> objects <strong>always</strong> includes both, the start and the stop codon.</p></li>
 <li><p>Transcripts with multiple CDS are at present not supported by <code>EnsDb</code>.</p></li>
 <li><p>At present, <code>EnsDb</code> support only genes/transcripts for which all of their exons are encoded on the same chromosome and the same strand.</p></li>
+<li><p>Since a single Ensembl gene ID might be mapped to multiple NCBI Entrezgene IDs, methods such as <code>genes</code>, <code>transcripts</code> etc. return a <code>list</code> in the <code>"entrezid"</code> column of the result object.</p></li>
 </ul>
 </div>
-<div id="building-an-transcript-centric-database-package-based-on-ensembl-annotation" class="section level1">
-<h1><span class="header-section-number">10</span> Building an transcript-centric database package based on Ensembl annotation</h1>
-<p>The code in this section is not supposed to be automatically executed when the vignette is built, as this would require a working installation of the Ensembl Perl API, which is not expected to be available on each system. Also, building <code>EnsDb</code> from alternative sources, like GFF or GTF files takes some time and thus also these examples are not directly executed when the vignette is build.</p>
-<div id="requirements" class="section level2">
-<h2><span class="header-section-number">10.1</span> Requirements</h2>
-<p>The <code>fetchTablesFromEnsembl</code> function of the package uses the Ensembl Perl API to retrieve the required annotations from an Ensembl database (e.g. from the main site <em>ensembldb.ensembl.org</em>). Thus, to use the functionality to built databases, the Ensembl Perl API needs to be installed (see <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup> for details).</p>
-<p>Alternatively, the <code>ensDbFromAH</code>, <code>ensDbFromGff</code>, <code>ensDbFromGRanges</code> and <code>ensDbFromGtf</code> functions allow to build EnsDb SQLite files from a <code>GRanges</code> object or GFF/GTF files from Ensembl (either provided as files or <em>via</em> <code>AnnotationHub</code>). These functions do not depend on the Ensembl Perl API, but require a working internet connection to fetch the chromosome lengths from Ensembl as these are not provided within GT [...]
+<div id="getting-or-building-ensdb-databasespackages" class="section level1">
+<h1><span class="header-section-number">10</span> Getting or building <code>EnsDb</code> databases/packages</h1>
+<p>Some of the code in this section is not executed automatically when the vignette is built, as this would require a working installation of the Ensembl Perl API, which is not expected to be available on every system. Also, building <code>EnsDb</code> databases from alternative sources, like GFF or GTF files, takes some time, so these examples are not executed when the vignette is built either.</p>
+<div id="getting-ensdb-databases" class="section level2">
+<h2><span class="header-section-number">10.1</span> Getting <code>EnsDb</code> databases</h2>
+<p>Some <code>EnsDb</code> databases are available as <code>R</code> packages from Bioconductor and can simply be installed with the <code>biocLite</code> function from the <code>BiocInstaller</code> package. The name of such an annotation package starts with <em>EnsDb</em>, followed by the abbreviation of the organism and the Ensembl version on which the annotation is based. <code>EnsDb.Hsapiens.v86</code> thus provides an <code>EnsDb</code> database for <em>Homo sapiens</em> with annotations from Ens [...]
+<p>Since Bioconductor version 3.5 <code>EnsDb</code> databases can also be retrieved directly from <code>AnnotationHub</code>.</p>
+<pre class="r"><code>library(AnnotationHub)
+## Load the annotation resource.
+ah <- AnnotationHub()
+
+## Query for all available EnsDb databases
+query(ah, "EnsDb")</code></pre>
+<p>We can simply fetch one of the databases.</p>
+<pre class="r"><code>ahDb <- query(ah, pattern = c("Xiphophorus Maculatus", "EnsDb", 87))
+## What have we got
+ahDb</code></pre>
+<p>Fetch the <code>EnsDb</code> and use it.</p>
+<pre class="r"><code>ahEdb <- ahDb[[1]]
+
+## retrieve all genes
+gns <- genes(ahEdb)</code></pre>
+<p>We could even make an annotation package from this <code>EnsDb</code> object using the <code>makeEnsembldbPackage</code> function and passing <code>dbfile(dbconn(ahEdb))</code> as the <code>ensdb</code> argument.</p>
 </div>
 <div id="building-annotation-packages" class="section level2">
 <h2><span class="header-section-number">10.2</span> Building annotation packages</h2>
-<p>The functions below use the Ensembl Perl API to fetch the required data directly from the Ensembl core databases. Thus, the path to the Perl API specific for the desired Ensembl version needs to be added to the <code>PERL5LIB</code> environment variable.</p>
-<p>An annotation package containing all human genes for Ensembl version 75 can be created using the code in the block below.</p>
+<div id="directly-from-ensembl-databases" class="section level3">
+<h3><span class="header-section-number">10.2.1</span> Directly from Ensembl databases</h3>
+<p>The <code>fetchTablesFromEnsembl</code> function uses the Ensembl Perl API to retrieve the required annotations from an Ensembl database (e.g. from the main site <em>ensembldb.ensembl.org</em>). Thus, to use this functionality to build databases, the Ensembl Perl API needs to be installed (see <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup> for details).</p>
+<p>Below we create an <code>EnsDb</code> database by fetching the required data directly from the Ensembl core databases. The <code>makeEnsembldbPackage</code> function is then used to create an annotation package from this <code>EnsDb</code> containing all human genes for Ensembl version 75.</p>
 <pre class="r"><code>library(ensembldb)
 
 ## get all human gene/transcript/exon annotations from Ensembl (75)
@@ -1009,6 +1211,10 @@ makeEnsembldbPackage(ensdb = DBFile, version = "0.99.12",
              author = "J Rainer")</code></pre>
 <p>The generated package can then be build using <code>R CMD build EnsDb.Hsapiens.v75</code> and installed with <code>R CMD INSTALL EnsDb.Hsapiens.v75*</code>. Note that we could directly generate an <code>EnsDb</code> instance by loading the database file, i.e. by calling <code>edb <- EnsDb(DBFile)</code> and work with that annotation object.</p>
 <p>To fetch and build annotation packages for plant genomes (e.g. arabidopsis thaliana), the <em>Ensembl genomes</em> should be specified as a host, i.e. setting <code>host</code> to “mysql-eg-publicsql.ebi.ac.uk”, <code>port</code> to <code>4157</code> and <code>species</code> to e.g. “arabidopsis thaliana”.</p>
+</div>
+<div id="from-a-gtf-or-gff-file" class="section level3">
+<h3><span class="header-section-number">10.2.2</span> From a GTF or GFF file</h3>
+<p>Alternatively, the <code>ensDbFromAH</code>, <code>ensDbFromGff</code>, <code>ensDbFromGRanges</code> and <code>ensDbFromGtf</code> functions allow building EnsDb SQLite files from a <code>GRanges</code> object or from GFF/GTF files from Ensembl (either provided as files or <em>via</em> <code>AnnotationHub</code>). These functions do not depend on the Ensembl Perl API, but require a working internet connection to fetch the chromosome lengths from Ensembl as these are not provided within GT [...]
 <p>In the next example we create an <code>EnsDb</code> database using the <code>AnnotationHub</code> package and load also the corresponding genomic DNA sequence matching the Ensembl version. We thus first query the <code>AnnotationHub</code> package for all resources available for <code>Mus musculus</code> and the Ensembl release 77. Next we create the <code>EnsDb</code> object from the appropriate <code>AnnotationHub</code> resource. We then use the <code>getGenomeFaFile</code> method  [...]
 <pre class="r"><code>## Load the AnnotationHub data.
 library(AnnotationHub)
@@ -1031,110 +1237,25 @@ edb <- EnsDb(DbFile)
 Dna <- getGenomeFaFile(edb)
 library(Rsamtools)
 ## We next retrieve the sequence of all exons on chromosome Y.
-exons <- exons(edb, filter = SeqnameFilter("Y"))
+exons <- exons(edb, filter = SeqNameFilter("Y"))
 exonSeq <- getSeq(Dna, exons)
 
 ## Alternatively, look up and retrieve the toplevel DNA sequence manually.
 Dna <- ah[["AH22042"]]</code></pre>
-<p>In the example below we load a <code>GRanges</code> containing gene definitions for genes encoded on chromosome Y and generate a EnsDb SQLite database from that information.</p>
+<p>In the example below we load a <code>GRanges</code> containing gene definitions for genes encoded on chromosome Y and generate an <code>EnsDb</code> SQLite database from that information.</p>
 <pre class="r"><code>## Generate a sqlite database from a GRanges object specifying
 ## genes encoded on chromosome Y
 load(system.file("YGRanges.RData", package = "ensembldb"))
-Y</code></pre>
-<pre><code>## GRanges object with 7155 ranges and 16 metadata columns:
-##          seqnames               ranges strand |               source
-##             <Rle>            <IRanges>  <Rle> |             <factor>
-##      [1]        Y   [2652790, 2652894]      + |                snRNA
-##      [2]        Y   [2652790, 2652894]      + |                snRNA
-##      [3]        Y   [2652790, 2652894]      + |                snRNA
-##      [4]        Y   [2654896, 2655740]      - |       protein_coding
-##      [5]        Y   [2654896, 2655740]      - |       protein_coding
-##      ...      ...                  ...    ... .                  ...
-##   [7151]        Y [28772667, 28773306]      - | processed_pseudogene
-##   [7152]        Y [28772667, 28773306]      - | processed_pseudogene
-##   [7153]        Y [59001391, 59001635]      + |           pseudogene
-##   [7154]        Y [59001391, 59001635]      + | processed_pseudogene
-##   [7155]        Y [59001391, 59001635]      + | processed_pseudogene
-##                type     score     phase         gene_id   gene_name
-##            <factor> <numeric> <integer>     <character> <character>
-##      [1]       gene      <NA>      <NA> ENSG00000251841  RNU6-1334P
-##      [2] transcript      <NA>      <NA> ENSG00000251841  RNU6-1334P
-##      [3]       exon      <NA>      <NA> ENSG00000251841  RNU6-1334P
-##      [4]       gene      <NA>      <NA> ENSG00000184895         SRY
-##      [5] transcript      <NA>      <NA> ENSG00000184895         SRY
-##      ...        ...       ...       ...             ...         ...
-##   [7151] transcript      <NA>      <NA> ENSG00000231514     FAM58CP
-##   [7152]       exon      <NA>      <NA> ENSG00000231514     FAM58CP
-##   [7153]       gene      <NA>      <NA> ENSG00000235857     CTBP2P1
-##   [7154] transcript      <NA>      <NA> ENSG00000235857     CTBP2P1
-##   [7155]       exon      <NA>      <NA> ENSG00000235857     CTBP2P1
-##             gene_source   gene_biotype   transcript_id transcript_name
-##             <character>    <character>     <character>     <character>
-##      [1]        ensembl          snRNA            <NA>            <NA>
-##      [2]        ensembl          snRNA ENST00000516032  RNU6-1334P-201
-##      [3]        ensembl          snRNA ENST00000516032  RNU6-1334P-201
-##      [4] ensembl_havana protein_coding            <NA>            <NA>
-##      [5] ensembl_havana protein_coding ENST00000383070         SRY-001
-##      ...            ...            ...             ...             ...
-##   [7151]         havana     pseudogene ENST00000435741     FAM58CP-001
-##   [7152]         havana     pseudogene ENST00000435741     FAM58CP-001
-##   [7153]         havana     pseudogene            <NA>            <NA>
-##   [7154]         havana     pseudogene ENST00000431853     CTBP2P1-001
-##   [7155]         havana     pseudogene ENST00000431853     CTBP2P1-001
-##          transcript_source exon_number         exon_id         tag
-##                <character>   <numeric>     <character> <character>
-##      [1]              <NA>        <NA>            <NA>        <NA>
-##      [2]           ensembl        <NA>            <NA>        <NA>
-##      [3]           ensembl           1 ENSE00002088309        <NA>
-##      [4]              <NA>        <NA>            <NA>        <NA>
-##      [5]    ensembl_havana        <NA>            <NA>        CCDS
-##      ...               ...         ...             ...         ...
-##   [7151]            havana        <NA>            <NA>        <NA>
-##   [7152]            havana           1 ENSE00001616687        <NA>
-##   [7153]              <NA>        <NA>            <NA>        <NA>
-##   [7154]            havana        <NA>            <NA>        <NA>
-##   [7155]            havana           1 ENSE00001794473        <NA>
-##              ccds_id  protein_id
-##          <character> <character>
-##      [1]        <NA>        <NA>
-##      [2]        <NA>        <NA>
-##      [3]        <NA>        <NA>
-##      [4]        <NA>        <NA>
-##      [5]   CCDS14772        <NA>
-##      ...         ...         ...
-##   [7151]        <NA>        <NA>
-##   [7152]        <NA>        <NA>
-##   [7153]        <NA>        <NA>
-##   [7154]        <NA>        <NA>
-##   [7155]        <NA>        <NA>
-##   -------
-##   seqinfo: 1 sequence from GRCh37 genome</code></pre>
-<pre class="r"><code>DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
-               organism = "Homo_sapiens")</code></pre>
-<pre><code>## Warning in ensDbFromGRanges(Y, path = tempdir(), version = 75, organism
-## = "Homo_sapiens"): I'm missing column(s): 'entrezid'. The corresponding
-## database column(s) will be empty!</code></pre>
-<pre class="r"><code>edb <- EnsDb(DB)
+Y
+
+## Create the EnsDb database file
+DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
+               organism = "Homo_sapiens")
+
+## Load the database
+edb <- EnsDb(DB)
 edb</code></pre>
-<pre><code>## EnsDb for Ensembl:
-## |Backend: SQLite
-## |Db type: EnsDb
-## |Type of Gene ID: Ensembl Gene ID
-## |Supporting package: ensembldb
-## |Db created by: ensembldb package from Bioconductor
-## |script_version: 0.0.1
-## |Creation time: Wed Nov 16 19:52:30 2016
-## |ensembl_version: 75
-## |ensembl_host: unknown
-## |Organism: Homo_sapiens
-## |genome_build: GRCh37
-## |DBSCHEMAVERSION: 1.0
-## |source_file: GRanges object
-## | No. of genes: 495.
-## | No. of transcripts: 731.</code></pre>
-<pre class="r"><code>## As shown in the example below, we could make an EnsDb package on
-## this DB object using the makeEnsembldbPackage function.</code></pre>
-<p>Alternatively we can build the annotation database using the <code>ensDbFromGtf</code> <code>ensDbFromGff</code> functions, that extracts most of the required data from a GTF respectively GFF (version 3) file which can be downloaded from Ensembl (e.g. from <a href="ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens" class="uri">ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens</a> for human gene definitions from Ensembl version 75; for plant genomes etc files can be retrieved f [...]
+<p>Alternatively, we can build the annotation database using the <code>ensDbFromGtf</code> or <code>ensDbFromGff</code> functions, which extract most of the required data from a GTF or GFF (version 3) file, respectively. Such files can be downloaded from Ensembl (e.g. from <a href="ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens" class="uri">ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens</a> for human gene definitions from Ensembl version 75; for plant genomes etc., files can be retrieved f [...]
 <p>Below we create the annotation from a gtf file that we fetch directly from Ensembl.</p>
 <pre class="r"><code>library(ensembldb)
 
@@ -1154,15 +1275,16 @@ makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
              author = "J Rainer")</code></pre>
 </div>
 </div>
+</div>
 <div id="database-layout" class="section level1">
-<h1><span class="header-section-number">11</span> Database layout<a id="orgtarget1"></a></h1>
-<p>The database consists of the following tables and attributes (the layout is also shown in Figure <a href="#orgparagraph1">115</a>):</p>
+<h1><span class="header-section-number">11</span> Database layout<a id="org35014ed"></a></h1>
+<p>The database consists of the following tables and attributes (the layout is also shown in Figure <a href="#org6a42233">159</a>). Note that the protein-specific annotations might not be available in all <code>EnsDb</code> databases (e.g. those created with <code>ensembldb</code> version &lt; 1.7 or from GTF or GFF files).</p>
 <ul>
 <li><strong>gene</strong>: all gene specific annotations.
 <ul>
 <li><code>gene_id</code>: the Ensembl ID of the gene.</li>
-<li><code>gene_name</code>: the name (symbol) of the gene.</li>
-<li><code>entrezid</code>: the NCBI Entrezgene ID(s) of the gene. Note that this can be a <code>;</code> separated list of IDs for genes that are mapped to more than one Entrezgene.</li>
+<li><code>gene_name</code>: the name (symbol) of the gene.</li>
+<li><code>entrezid</code>: the NCBI Entrezgene ID(s) of the gene. Note that this can be a <code>;</code> separated list of IDs for genes that are mapped to more than one Entrezgene.</li>
 <li><code>gene_biotype</code>: the biotype of the gene.</li>
 <li><code>gene_seq_start</code>: the start coordinate of the gene on the sequence (usually a chromosome).</li>
 <li><code>gene_seq_end</code>: the end coordinate of the gene on the sequence.</li>
@@ -1170,6 +1292,11 @@ makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
 <li><code>seq_strand</code>: the strand on which the gene is encoded.</li>
 <li><code>seq_coord_system</code>: the coordinate system of the sequence.</li>
 </ul></li>
+<li><strong>entrezgene</strong>: mapping of Ensembl genes to NCBI Entrezgene identifiers. Note that this mapping can be a one-to-many mapping.
+<ul>
+<li><code>gene_id</code>: the Ensembl gene ID.</li>
+<li><code>entrezid</code>: the NCBI Entrezgene ID.</li>
+</ul></li>
 <li><strong>tx</strong>: all transcript related annotations. Note that while no <code>tx_name</code> column is available in this database column, all methods to retrieve data from the database support also this column. The returned values are however the ID of the transcripts.
 <ul>
 <li><code>tx_id</code>: the Ensembl transcript ID.</li>
@@ -1198,9 +1325,31 @@ makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
 <li><code>seq_length</code>: the length of the sequence.</li>
 <li><code>is_circular</code>: whether the sequence in circular.</li>
 </ul></li>
-<li><strong>information</strong>: some additional, internal, informations (Genome build, Ensembl version etc).
+<li><strong>protein</strong>: provides protein annotation for a (coding) transcript.
+<ul>
+<li><code>protein_id</code>: the Ensembl protein ID.</li>
+<li><code>tx_id</code>: the ID of the transcript whose CDS encodes the protein.</li>
+<li><code>protein_sequence</code>: the peptide sequence of the protein (translated from the transcript’s coding sequence after applying any RNA editing).</li>
+</ul></li>
+<li><strong>uniprot</strong>: provides the mapping from Ensembl protein ID(s) to Uniprot ID(s). Not all Ensembl proteins are annotated to Uniprot IDs; also, each Ensembl protein might be mapped to multiple Uniprot IDs.
+<ul>
+<li><code>protein_id</code>: the Ensembl protein ID.</li>
+<li><code>uniprot_id</code>: the Uniprot ID.</li>
+<li><code>uniprot_db</code>: the Uniprot database in which the ID is defined.</li>
+<li><code>uniprot_mapping_type</code>: the type of the mapping method that was used to assign the Uniprot ID to an Ensembl protein ID.</li>
+</ul></li>
+<li><strong>protein_domain</strong>: provides protein domain annotations and mapping to proteins.
 <ul>
-<li><code>key</code></li>
+<li><code>protein_id</code>: the Ensembl protein ID on which the protein domain is present.</li>
+<li><code>protein_domain_id</code>: the ID of the protein domain (from the protein domain source).</li>
+<li><code>protein_domain_source</code>: the source/analysis method in/by which the protein domain was defined (such as Pfam).</li>
+<li><code>interpro_accession</code>: the Interpro accession ID of the protein domain.</li>
+<li><code>prot_dom_start</code>: the start position of the protein domain within the protein’s sequence.</li>
+<li><code>prot_dom_end</code>: the end position of the protein domain within the protein’s sequence.</li>
+</ul></li>
+<li><strong>metadata</strong>: additional internal information (genome build, Ensembl version etc.).
+<ul>
+<li><code>name</code></li>
 <li><code>value</code></li>
 </ul></li>
 <li><em>virtual</em> columns:
@@ -1209,59 +1358,36 @@ makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
 <li><code>tx_name</code>: similar to the <code>symbol</code> column, this column is <em>symlinked</em> to the <code>tx_id</code> column.</li>
 </ul></li>
 </ul>
+<p>The database layout. As described above, protein-related annotations (shown in green) might not be available in every <code>EnsDb</code> database.</p>
 <div class="figure">
-<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAACe0AAAhRCAYAAAB251FjAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAB3RJTUUH3wMSCDkmQJa4YQAAACZpVFh0Q29tbWVudAAAAAAAQ3JlYXRlZCB3aXRoIEdJTVAgb24gYSBNYWOV5F9bAAAgAElEQVR42uzdd5wcdf3H8ffsbL1+SUghJLkkJKQgUgREARFEioIgiiCo/PDnD6UoWADBgoIgiop0FQQBEcSCSFMiKATpCSU9kN7ucrl+23fn98fc7O3s7l2ubvYur+fjMY+Z+c7c7Oxndvd2dj7z+RrLly2z5sydK/TfiuXLJUnEj/gRP+JH/ED8iB/xI34gfsSP+BE/ED/iR/yIH4gf8SN+xA/Ej/gRP+JH/Igf8SN+xI/4gfj1hYdDDwAAAAAAAAAAAAAAAABA [...]
+<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAA7QAAAIXCAYAAACsOBJaAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAB7CAAAewgFu0HU+AAAAB3RJTUUH4AoDBiYgfqjZMwAAACZpVFh0Q29tbWVudAAAAAAAQ3JlYXRlZCB3aXRoIEdJTVAgb24gYSBNYWOV5F9bAAAgAElEQVR42uydeZgdRdX/P9Xdd519ksxMNkgICQkEAhhCEraQgCzKjrKJYUfxlU1EUFHwRdDfq4KAomzKEgVE2U2QiGEPewAJCSZAFkgy+3bXXur3R3ffZebOZDK5F5mkvs/Tz8zc6VvdXX3q1Pmec+qUkFImgDBFQiwWIx6PM2LECEqBjo4OAKqqqkrSflNTExUVFYTD4aK3LaVk48aN1NfXI4RACFHU9m3bZuPGjYwePbokfZNOp2lubmbUqFElaT8ej9Pd3U1d [...]
 <p class="caption">img</p>
 </div>
-<div id="footnotes">
-<h2 class="footnotes">
-Footnotes:
-</h2>
-<div id="text-footnotes">
-<div class="footdef">
-<sup><a id="fn.1" class="footnum" href="#fnr.1">1</a></sup>
-<div class="footpara">
-<a href="http://www.ensembl.org" class="uri">http://www.ensembl.org</a>
-</div>
-</div>
-<div class="footdef">
-<sup><a id="fn.2" class="footnum" href="#fnr.2">2</a></sup>
-<div class="footpara">
-<a href="http://www.lrg-sequence.org" class="uri">http://www.lrg-sequence.org</a>
-</div>
-</div>
-<div class="footdef">
-<sup><a id="fn.3" class="footnum" href="#fnr.3">3</a></sup>
-<div class="footpara">
-<a href="http://www.ncbi.nlm.nih.gov/pubmed/23950696" class="uri">http://www.ncbi.nlm.nih.gov/pubmed/23950696</a>
-</div>
-</div>
-<div class="footdef">
-<sup><a id="fn.4" class="footnum" href="#fnr.4">4</a></sup>
-<div class="footpara">
-<a href="http://www.ncbi.nlm.nih.gov/pubmed/24227677" class="uri">http://www.ncbi.nlm.nih.gov/pubmed/24227677</a>
-</div>
-</div>
-<div class="footdef">
-<sup><a id="fn.5" class="footnum" href="#fnr.5">5</a></sup>
-<div class="footpara">
-<a href="http://www.ensembl.org/info/docs/api/api_installation.html" class="uri">http://www.ensembl.org/info/docs/api/api_installation.html</a>
-</div>
-</div>
-</div>
 </div>
+<div id="footnotes" class="section level1">
+<h1><span class="header-section-number">12</span> Footnotes</h1>
+<p><sup><a id="fn.1" href="#fnr.1">1</a></sup> <a href="http://www.ensembl.org" class="uri">http://www.ensembl.org</a></p>
+<p><sup><a id="fn.2" href="#fnr.2">2</a></sup> <a href="http://www.lrg-sequence.org" class="uri">http://www.lrg-sequence.org</a></p>
+<p><sup><a id="fn.3" href="#fnr.3">3</a></sup> <a href="http://www.ncbi.nlm.nih.gov/pubmed/23950696" class="uri">http://www.ncbi.nlm.nih.gov/pubmed/23950696</a></p>
+<p><sup><a id="fn.4" href="#fnr.4">4</a></sup> <a href="http://www.ncbi.nlm.nih.gov/pubmed/24227677" class="uri">http://www.ncbi.nlm.nih.gov/pubmed/24227677</a></p>
+<p><sup><a id="fn.5" href="#fnr.5">5</a></sup> <a href="http://www.ensembl.org/info/docs/api/api_installation.html" class="uri">http://www.ensembl.org/info/docs/api/api_installation.html</a></p>
 </div>
 
 
 
+</div>
+</div>
 
 </div>
 
 <script>
 
 // add bootstrap table styles to pandoc tables
-$(document).ready(function () {
+function bootstrapStylePandocTables() {
   $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
+}
+$(document).ready(function () {
+  bootstrapStylePandocTables();
 });
 
 
@@ -1269,12 +1395,6 @@ $(document).ready(function () {
 
 <script type="text/x-mathjax-config">
   MathJax.Hub.Config({
-    TeX: {
-      TagSide: "right",
-      equationNumbers: {
-        autoNumber: "AMS"
-      }
-    },
     "HTML-CSS": {
       styles: {
         ".MathJax_Display": {
@@ -1291,7 +1411,7 @@ $(document).ready(function () {
   (function () {
     var script = document.createElement("script");
     script.type = "text/javascript";
-    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
     document.getElementsByTagName("head")[0].appendChild(script);
   })();
 </script>
diff --git a/inst/doc/proteins.R b/inst/doc/proteins.R
new file mode 100644
index 0000000..ef19851
--- /dev/null
+++ b/inst/doc/proteins.R
@@ -0,0 +1,94 @@
+## ----doeval, echo = FALSE, results = "hide"--------------------------------
+## Globally switch off execution of code chunks
+evalMe <- FALSE
+haveProt <- FALSE
+
+## ----loadlib, message = FALSE, eval = evalMe-------------------------------
+#  library(ensembldb)
+#  library(EnsDb.Hsapiens.v75)
+#  edb <- EnsDb.Hsapiens.v75
+#  ## Evaluate whether we have protein annotation available
+#  hasProteinData(edb)
+
+## ----listCols, message = FALSE, eval = evalMe------------------------------
+#  listTables(edb)
+
+## ----haveprot, echo = FALSE, results = "hide", eval = evalMe---------------
+#  ## Use this to conditionally disable eval on following chunks
+#  haveProt <- hasProteinData(edb) & evalMe
+
+## ----a_transcripts, eval = haveProt----------------------------------------
+#  ## Get also protein information for ZBTB16 transcripts
+#  txs <- transcripts(edb, filter = GenenameFilter("ZBTB16"),
+#  		   columns = c("protein_id", "uniprot_id", "tx_biotype"))
+#  txs
+
+## ----a_transcripts_coding_noncoding, eval = haveProt-----------------------
+#  ## Subset to transcripts with tx_biotype other than protein_coding.
+#  txs[txs$tx_biotype != "protein_coding", c("uniprot_id", "tx_biotype",
+#  					  "protein_id")]
+
+## ----a_transcripts_coding, eval = haveProt---------------------------------
+#  ## List the protein IDs and uniprot IDs for the coding transcripts
+#  mcols(txs[txs$tx_biotype == "protein_coding",
+#  	  c("tx_id", "protein_id", "uniprot_id")])
+
+## ----a_transcripts_coding_up, eval = haveProt------------------------------
+#  ## List all uniprot mapping types in the database.
+#  listUniprotMappingTypes(edb)
+#  
+#  ## Get all protein_coding transcripts of ZBTB16 along with their protein_id
+#  ## and Uniprot IDs, restricting to protein_id to uniprot_id mappings based
+#  ## on "DIRECT" mapping methods.
+#  txs <- transcripts(edb, filter = list(GenenameFilter("ZBTB16"),
+#  				      UniprotMappingTypeFilter("DIRECT")),
+#  		   columns = c("protein_id", "uniprot_id", "uniprot_db"))
+#  mcols(txs)
+
+## ----a_genes_protdomid_filter, eval = haveProt-----------------------------
+#  ## Get all genes that encode a transcript encoding for a protein that contains
+#  ## a certain protein domain.
+#  gns <- genes(edb, filter = ProtDomIdFilter("PS50097"))
+#  length(gns)
+#  
+#  sort(gns$gene_name)
+
+## ----a_2_annotationdbi, message = FALSE, eval = haveProt-------------------
+#  ## Show all columns that are provided by the database
+#  columns(edb)
+#  
+#  ## Show all key types/filters that are supported
+#  keytypes(edb)
+
+## ----a_2_select, message = FALSE, eval = haveProt--------------------------
+#  select(edb, keys = "ZBTB16", keytype = "GENENAME",
+#         columns = "UNIPROTID")
+
+## ----a_2_select_nmd, message = FALSE, eval = haveProt----------------------
+#  ## Call select, this time providing a GenenameFilter.
+#  select(edb, keys = GenenameFilter("ZBTB16"),
+#         columns = c("TXBIOTYPE", "UNIPROTID", "PROTEINID"))
+
+## ----b_proteins, message = FALSE, eval = haveProt--------------------------
+#  ## Get all proteins and return them as an AAStringSet
+#  prts <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+#  		 return.type = "AAStringSet")
+#  prts
+
+## ----b_proteins_mcols, message = FALSE, eval = haveProt--------------------
+#  mcols(prts)
+
+## ----b_proteins_prot_doms, message = FALSE, eval = haveProt----------------
+#  ## Get also protein domain annotations in addition to the protein annotations.
+#  pd <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+#  	       columns = c("tx_id", listColumns(edb, "protein_domain")),
+#  	       return.type = "AAStringSet")
+#  pd
+
+## ----b_proteins_prot_doms_2, message = FALSE, eval = haveProt--------------
+#  ## The number of protein domains per protein:
+#  table(names(pd))
+#  
+#  ## The mcols
+#  mcols(pd)
+
diff --git a/inst/doc/proteins.Rmd b/inst/doc/proteins.Rmd
new file mode 100644
index 0000000..7bf98ab
--- /dev/null
+++ b/inst/doc/proteins.Rmd
@@ -0,0 +1,273 @@
+---
+title: "Querying protein features"
+author: "Johannes Rainer"
+graphics: yes
+package: ensembldb
+output:
+  BiocStyle::html_document2:
+    toc_float: true
+vignette: >
+  %\VignetteIndexEntry{Querying protein features}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+  %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle}
+---
+
+From Bioconductor release 3.5 on, `EnsDb` databases/packages created by the
+`ensembldb` package also contain, for transcripts with a coding region, mappings
+between transcripts and proteins. Thus, in addition to the RNA/DNA-based
+features, the following protein-related information is available:
+
+-   `protein_id`: the Ensembl protein ID. This is the primary ID for the proteins
+    defined in Ensembl and each (protein coding) Ensembl transcript has one
+    protein ID assigned to it.
+-   `protein_sequence`: the amino acid sequence of a protein.
+-   `uniprot_id`: the Uniprot ID for a protein. Note that not every Ensembl
+    `protein_id` has a Uniprot ID, and each `protein_id` might be mapped to several
+    `uniprot_id`s. Also, the same Uniprot ID might be mapped to different `protein_id`s.
+-   `uniprot_db`: the name of the Uniprot database in which the feature is
+    annotated. Can be either *SPTREMBL* or *SWISSPROT*.
+-   `uniprot_mapping_type`: the type of the mapping method that was used to assign
+    the Uniprot ID to the Ensembl protein ID.
+-   `protein_domain_id`: the ID of the protein domain according to the
+    source/analysis in/by which it was defined.
+-   `protein_domain_source`: the source of the protein domain information, one of
+    *pfscan*, *scanprosite*, *superfamily*, *pfam*, *prints*, *smart*, *pirsf* or *tigrfam*.
+-   `interpro_accession`: the Interpro accession ID of the protein domain (if
+    available).
+-   `prot_dom_start`: the start of the protein domain within the sequence of
+    the protein.
+-   `prot_dom_end`: the end position of the protein domain within the
+    sequence of the protein.
+
+Thus, for protein coding transcripts, these annotations can be fetched from the
+database too, given that protein annotations are available. Note that only `EnsDb`
+databases created through the Ensembl Perl API contain protein annotation, while
+databases created using `ensDbFromAH`, `ensDbFromGff`, `ensDbFromGRanges` and
+`ensDbFromGtf` don't.
+
+```{r doeval, echo = FALSE, results = "hide"}
+## Globally switch off execution of code chunks
+evalMe <- FALSE
+haveProt <- FALSE
+```
+
+```{r loadlib, message = FALSE, eval = evalMe}
+library(ensembldb)
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+## Evaluate whether we have protein annotation available
+hasProteinData(edb)
+```
+
+If protein annotation is available, the additional tables and columns are also
+listed by the `listTables` and `listColumns` methods.
+
+```{r listCols, message = FALSE, eval = evalMe}
+listTables(edb)
+```
+
+In the following sections we show examples of how to 1) fetch protein annotations
+as additional columns to gene/transcript annotations, 2) fetch protein
+annotation data and 3) map proteins to the genome.
+
+```{r haveprot, echo = FALSE, results = "hide", eval = evalMe}
+## Use this to conditionally disable eval on following chunks
+haveProt <- hasProteinData(edb) & evalMe
+```
+
+
+# Fetch protein annotation for genes and transcripts
+
+Protein annotations for (protein coding) transcripts can be retrieved by simply
+adding the desired annotation columns to the `columns` parameter of methods such
+as `genes` or `transcripts`.
+
+```{r a_transcripts, eval = haveProt}
+## Get also protein information for ZBTB16 transcripts
+txs <- transcripts(edb, filter = GenenameFilter("ZBTB16"),
+		   columns = c("protein_id", "uniprot_id", "tx_biotype"))
+txs
+```
+
+The gene ZBTB16 has protein coding and non-coding transcripts; thus, we get the
+protein ID for the coding and `NA` for the non-coding transcripts. Note also that
+one transcript, targeted for nonsense-mediated mRNA decay, has a protein ID
+associated with it, but no Uniprot ID.
+
+```{r a_transcripts_coding_noncoding, eval = haveProt}
+## Subset to transcripts with tx_biotype other than protein_coding.
+txs[txs$tx_biotype != "protein_coding", c("uniprot_id", "tx_biotype",
+					  "protein_id")]
+```
+
+While the mapping from a protein coding transcript to an Ensembl protein ID
+(column `protein_id`) is 1:1, the mapping between `protein_id` and `uniprot_id` can be
+n:m, i.e. each Ensembl protein ID can be mapped to 1 or more Uniprot IDs and
+each Uniprot ID can be mapped to more than one `protein_id` (and hence
+`tx_id`). This should be kept in mind when querying transcripts from the database
+together with Uniprot-related columns or even protein ID features, as in
+such cases a redundant list of transcripts is returned.
+
+```{r a_transcripts_coding, eval = haveProt}
+## List the protein IDs and uniprot IDs for the coding transcripts
+mcols(txs[txs$tx_biotype == "protein_coding",
+	  c("tx_id", "protein_id", "uniprot_id")])
+```
+
+Some of the n:m mappings for Uniprot IDs can be resolved by restricting either
+to entries from one Uniprot database (*SPTREMBL* or *SWISSPROT*) or to mappings of a
+certain type of mapping method. The corresponding filters are the
+`UniprotDbFilter` and the `UniprotMappingTypeFilter` (using the `uniprot_db` and
+`uniprot_mapping_type` columns of the `uniprot` database table). In the example
+below we restrict the result to Uniprot IDs with the mapping type *DIRECT*.
+
+```{r a_transcripts_coding_up, eval = haveProt}
+## List all uniprot mapping types in the database.
+listUniprotMappingTypes(edb)
+
+## Get all protein_coding transcripts of ZBTB16 along with their protein_id
+## and Uniprot IDs, restricting to protein_id to uniprot_id mappings based
+## on "DIRECT" mapping methods.
+txs <- transcripts(edb, filter = list(GenenameFilter("ZBTB16"),
+				      UniprotMappingTypeFilter("DIRECT")),
+		   columns = c("protein_id", "uniprot_id", "uniprot_db"))
+mcols(txs)
+```
+
+For this example the use of the `UniprotMappingTypeFilter` resolved the multiple
+mapping of Uniprot IDs to Ensembl protein IDs, but the Uniprot ID *Q05516* is
+still assigned to the two Ensembl protein IDs *ENSP00000338157* and
+*ENSP00000376721*.
+
+All protein annotations can also be added as *metadata columns* to the
+results of the `genes`, `exons`, `exonsBy`, `transcriptsBy`, `cdsBy`, `fiveUTRsByTranscript`
+and `threeUTRsByTranscript` methods by specifying the desired column names with
+the `columns` parameter. For non-coding transcripts `NA` will be reported in the
+protein annotation columns.
+
+In addition to retrieving protein annotations from the database, we can also use
+protein data to filter the results. In the example below we fetch
+all genes from the database that have a certain protein domain in the protein
+encoded by any of their transcripts.
+
+```{r a_genes_protdomid_filter, eval = haveProt}
+## Get all genes that encode a transcript encoding for a protein that contains
+## a certain protein domain.
+gns <- genes(edb, filter = ProtDomIdFilter("PS50097"))
+length(gns)
+
+sort(gns$gene_name)
+```
+
+So, in total we got 152 genes with that protein domain. In addition to the
+`ProtDomIdFilter`, the `ProteinIdFilter` and the `UniprotFilter` can also be used to
+query the database for entries matching conditions on their protein ID or
+Uniprot ID.
+
+
+# Use methods from the `AnnotationDbi` package to query protein annotation
+
+The `select`, `keys` and `mapIds` methods from the `AnnotationDbi` package can also be
+used to query `EnsDb` objects for protein annotations. Supported columns and
+key types are returned by the `columns` and `keytypes` methods.
+
+```{r a_2_annotationdbi, message = FALSE, eval = haveProt}
+## Show all columns that are provided by the database
+columns(edb)
+
+## Show all key types/filters that are supported
+keytypes(edb)
+```
+
+Below we fetch all Uniprot IDs annotated to the gene *ZBTB16*.
+
+```{r a_2_select, message = FALSE, eval = haveProt}
+select(edb, keys = "ZBTB16", keytype = "GENENAME",
+       columns = "UNIPROTID")
+```
+
+This returns all Uniprot IDs of all proteins encoded by the gene's
+transcripts. One of the ZBTB16 transcripts, while having a CDS and being
+annotated to a protein, does not have a Uniprot ID assigned (thus `NA` is
+returned by the above call). As we see below, this transcript is targeted for
+nonsense-mediated mRNA decay.
+
+```{r a_2_select_nmd, message = FALSE, eval = haveProt}
+## Call select, this time providing a GenenameFilter.
+select(edb, keys = GenenameFilter("ZBTB16"),
+       columns = c("TXBIOTYPE", "UNIPROTID", "PROTEINID"))
+```
+
+Note also that this time we passed a `GenenameFilter` with the `keys` parameter.
+
+
+# Retrieve proteins from the database
+
+Proteins can be fetched using the dedicated `proteins` method which, unlike
+DNA/RNA-based methods such as `genes` or `transcripts`, does not return a `GRanges`
+object by default, but a `DataFrame` object. Alternatively, results can be returned as a
+`data.frame` or as an `AAStringSet` object from the `Biostrings` package. Note that this
+might change in future releases if a more appropriate object to represent
+protein annotations becomes available.
+
+In the code chunk below we fetch all protein annotations for the gene *ZBTB16*.
+
+```{r b_proteins, message = FALSE, eval = haveProt}
+## Get all proteins and return them as an AAStringSet
+prts <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+		 return.type = "AAStringSet")
+prts
+```
+
+Besides the amino acid sequences, `prts` also contains additional annotations
+that can be accessed with the `mcols` method (metadata columns). All additional
+columns provided with the parameter `columns` are also added to the `mcols`
+`DataFrame`.
+
+```{r b_proteins_mcols, message = FALSE, eval = haveProt}
+mcols(prts)
+```
+
+Note that the `proteins` method will retrieve only gene/transcript annotations of
+transcripts encoding a protein. Thus, annotations for the non-coding transcripts
+of the gene *ZBTB16*, which were returned by calls to `genes` or `transcripts` in the
+previous section, are not fetched.
+
+At present, additionally querying Uniprot identifiers or protein domain data
+results in a redundant list of proteins, as shown in the code block below.
+
+```{r b_proteins_prot_doms, message = FALSE, eval = haveProt}
+## Get also protein domain annotations in addition to the protein annotations.
+pd <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+	       columns = c("tx_id", listColumns(edb, "protein_domain")),
+	       return.type = "AAStringSet")
+pd
+```
+
+The result contains one row/element for each protein domain in each of the
+proteins. The number of protein domains per protein and the `mcols` are shown
+below.
+
+```{r b_proteins_prot_doms_2, message = FALSE, eval = haveProt}
+## The number of protein domains per protein:
+table(names(pd))
+
+## The mcols
+mcols(pd)
+```
+
+As we can see, each protein can have several protein domains, with the start and
+end coordinates within the amino acid sequence being reported in columns
+`prot_dom_start` and `prot_dom_end`. Also, not all Ensembl protein IDs (such as
+`protein_id` *ENSP00000445047*) are mapped to a Uniprot ID or have protein domains.
+
+
+# Map peptide features within proteins to the genome
+
+Functionality to map peptide features (i.e. ranges within the amino acid
+sequence of the protein) to genomic coordinates is provided by the `Pbase`
+Bioconductor package. This functionality relies in part on the protein annotations provided by
+`EnsDb` databases. See the corresponding vignette *Pbase-with-ensembldb* in that
+package.
+
diff --git a/inst/doc/proteins.html b/inst/doc/proteins.html
new file mode 100644
index 0000000..885cfc2
--- /dev/null
+++ b/inst/doc/proteins.html
@@ -0,0 +1,369 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8" />
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+
+<meta name="author" content="Johannes Rainer" />
+
+
+<title>Querying protein features</title>
+
+<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4zIHwgKGMpIDIwMDUsIDIwMTUgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dG [...]
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="data:text/css;charset=utf-8,html%7Bfont%2Dfamily%3Asans%2Dserif%3B%2Dwebkit%2Dtext%2Dsize%2Dadjust%3A100%25%3B%2Dms%2Dtext%2Dsize%2Dadjust%3A100%25%7Dbody%7Bmargin%3A0%7Darticle%2Caside%2Cdetails%2Cfigcaption%2Cfigure%2Cfooter%2Cheader%2Chgroup%2Cmain%2Cmenu%2Cnav%2Csection%2Csummary%7Bdisplay%3Ablock%7Daudio%2Ccanvas%2Cprogress%2Cvideo%7Bdisplay%3Ainline%2Dblock%3Bvertical%2Dalign%3Abaseline%7Daudio%3Anot%28%5Bcontrols%5D%29%7Bdisplay%3Anone%3Bheight%3A0%7D%5Bhidden%5D%2Ctem [...]
+<script src="data:application/x-javascript;base64,LyohCiAqIEJvb3RzdHJhcCB2My4zLjUgKGh0dHA6Ly9nZXRib290c3RyYXAuY29tKQogKiBDb3B5cmlnaHQgMjAxMS0yMDE1IFR3aXR0ZXIsIEluYy4KICogTGljZW5zZWQgdW5kZXIgdGhlIE1JVCBsaWNlbnNlCiAqLwppZigidW5kZWZpbmVkIj09dHlwZW9mIGpRdWVyeSl0aHJvdyBuZXcgRXJyb3IoIkJvb3RzdHJhcCdzIEphdmFTY3JpcHQgcmVxdWlyZXMgalF1ZXJ5Iik7K2Z1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0Ijt2YXIgYj1hLmZuLmpxdWVyeS5zcGxpdCgiICIpWzBdLnNwbGl0KCIuIik7aWYoYlswXTwyJiZiWzFdPDl8fDE9PWJbMF0mJjk9PWJbMV0mJmJbMl08MSl0aHJvdy [...]
+<script src="data:application/x-javascript;base64,LyoqCiogQHByZXNlcnZlIEhUTUw1IFNoaXYgMy43LjIgfCBAYWZhcmthcyBAamRhbHRvbiBAam9uX25lYWwgQHJlbSB8IE1JVC9HUEwyIExpY2Vuc2VkCiovCi8vIE9ubHkgcnVuIHRoaXMgY29kZSBpbiBJRSA4CmlmICghIXdpbmRvdy5uYXZpZ2F0b3IudXNlckFnZW50Lm1hdGNoKCJNU0lFIDgiKSkgewohZnVuY3Rpb24oYSxiKXtmdW5jdGlvbiBjKGEsYil7dmFyIGM9YS5jcmVhdGVFbGVtZW50KCJwIiksZD1hLmdldEVsZW1lbnRzQnlUYWdOYW1lKCJoZWFkIilbMF18fGEuZG9jdW1lbnRFbGVtZW50O3JldHVybiBjLmlubmVySFRNTD0ieDxzdHlsZT4iK2IrIjwvc3R5bGU+IixkLm [...]
+<script src="data:application/x-javascript;base64,LyohIFJlc3BvbmQuanMgdjEuNC4yOiBtaW4vbWF4LXdpZHRoIG1lZGlhIHF1ZXJ5IHBvbHlmaWxsICogQ29weXJpZ2h0IDIwMTMgU2NvdHQgSmVobAogKiBMaWNlbnNlZCB1bmRlciBodHRwczovL2dpdGh1Yi5jb20vc2NvdHRqZWhsL1Jlc3BvbmQvYmxvYi9tYXN0ZXIvTElDRU5TRS1NSVQKICogICovCgovLyBPbmx5IHJ1biB0aGlzIGNvZGUgaW4gSUUgOAppZiAoISF3aW5kb3cubmF2aWdhdG9yLnVzZXJBZ2VudC5tYXRjaCgiTVNJRSA4IikpIHsKIWZ1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0IjthLm1hdGNoTWVkaWE9YS5tYXRjaE1lZGlhfHxmdW5jdGlvbihhKXt2YXIgYixjPWEuZG [...]
+<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSBVSSAtIHYxLjExLjQgLSAyMDE2LTAxLTA1CiogaHR0cDovL2pxdWVyeXVpLmNvbQoqIEluY2x1ZGVzOiBjb3JlLmpzLCB3aWRnZXQuanMsIG1vdXNlLmpzLCBwb3NpdGlvbi5qcywgZHJhZ2dhYmxlLmpzLCBkcm9wcGFibGUuanMsIHJlc2l6YWJsZS5qcywgc2VsZWN0YWJsZS5qcywgc29ydGFibGUuanMsIGFjY29yZGlvbi5qcywgYXV0b2NvbXBsZXRlLmpzLCBidXR0b24uanMsIGRpYWxvZy5qcywgbWVudS5qcywgcHJvZ3Jlc3NiYXIuanMsIHNlbGVjdG1lbnUuanMsIHNsaWRlci5qcywgc3Bpbm5lci5qcywgdGFicy5qcywgdG9vbHRpcC5qcywgZWZmZWN0LmpzLC [...]
+<link href="data:text/css;charset=utf-8,%0A%0A%2Etocify%20%7B%0Awidth%3A%2020%25%3B%0Amax%2Dheight%3A%2090%25%3B%0Aoverflow%3A%20auto%3B%0Amargin%2Dleft%3A%202%25%3B%0Aposition%3A%20fixed%3B%0Aborder%3A%201px%20solid%20%23ccc%3B%0Awebkit%2Dborder%2Dradius%3A%206px%3B%0Amoz%2Dborder%2Dradius%3A%206px%3B%0Aborder%2Dradius%3A%206px%3B%0A%7D%0A%0A%2Etocify%20ul%2C%20%2Etocify%20li%20%7B%0Alist%2Dstyle%3A%20none%3B%0Amargin%3A%200%3B%0Apadding%3A%200%3B%0Aborder%3A%20none%3B%0Aline%2Dheight%3 [...]
+<script src="data:application/x-javascript;base64,LyoganF1ZXJ5IFRvY2lmeSAtIHYxLjkuMSAtIDIwMTMtMTAtMjIKICogaHR0cDovL3d3dy5ncmVnZnJhbmtvLmNvbS9qcXVlcnkudG9jaWZ5LmpzLwogKiBDb3B5cmlnaHQgKGMpIDIwMTMgR3JlZyBGcmFua287IExpY2Vuc2VkIE1JVCAqLwoKLy8gSW1tZWRpYXRlbHktSW52b2tlZCBGdW5jdGlvbiBFeHByZXNzaW9uIChJSUZFKSBbQmVuIEFsbWFuIEJsb2cgUG9zdF0oaHR0cDovL2JlbmFsbWFuLmNvbS9uZXdzLzIwMTAvMTEvaW1tZWRpYXRlbHktaW52b2tlZC1mdW5jdGlvbi1leHByZXNzaW9uLykgdGhhdCBjYWxscyBhbm90aGVyIElJRkUgdGhhdCBjb250YWlucyBhbGwgb2YgdG [...]
+<script src="data:application/x-javascript;base64,CgovKioKICogalF1ZXJ5IFBsdWdpbjogU3RpY2t5IFRhYnMKICoKICogQGF1dGhvciBBaWRhbiBMaXN0ZXIgPGFpZGFuQHBocC5uZXQ+CiAqIGFkYXB0ZWQgYnkgUnViZW4gQXJzbGFuIHRvIGFjdGl2YXRlIHBhcmVudCB0YWJzIHRvbwogKiBodHRwOi8vd3d3LmFpZGFubGlzdGVyLmNvbS8yMDE0LzAzL3BlcnNpc3RpbmctdGhlLXRhYi1zdGF0ZS1pbi1ib290c3RyYXAvCiAqLwooZnVuY3Rpb24oJCkgewogICJ1c2Ugc3RyaWN0IjsKICAkLmZuLnJtYXJrZG93blN0aWNreVRhYnMgPSBmdW5jdGlvbigpIHsKICAgIHZhciBjb250ZXh0ID0gdGhpczsKICAgIC8vIFNob3cgdGhlIHRhYi [...]
+<link href="data:text/css;charset=utf-8,pre%20%2Eoperator%2C%0Apre%20%2Eparen%20%7B%0Acolor%3A%20rgb%28104%2C%20118%2C%20135%29%0A%7D%0Apre%20%2Eliteral%20%7B%0Acolor%3A%20%23990073%0A%7D%0Apre%20%2Enumber%20%7B%0Acolor%3A%20%23099%3B%0A%7D%0Apre%20%2Ecomment%20%7B%0Acolor%3A%20%23998%3B%0Afont%2Dstyle%3A%20italic%0A%7D%0Apre%20%2Ekeyword%20%7B%0Acolor%3A%20%23900%3B%0Afont%2Dweight%3A%20bold%0A%7D%0Apre%20%2Eidentifier%20%7B%0Acolor%3A%20rgb%280%2C%200%2C%200%29%3B%0A%7D%0Apre%20%2Estri [...]
+<script src="data:application/x-javascript;base64,dmFyIGhsanM9bmV3IGZ1bmN0aW9uKCl7ZnVuY3Rpb24gbShwKXtyZXR1cm4gcC5yZXBsYWNlKC8mL2dtLCImYW1wOyIpLnJlcGxhY2UoLzwvZ20sIiZsdDsiKX1mdW5jdGlvbiBmKHIscSxwKXtyZXR1cm4gUmVnRXhwKHEsIm0iKyhyLmNJPyJpIjoiIikrKHA/ImciOiIiKSl9ZnVuY3Rpb24gYihyKXtmb3IodmFyIHA9MDtwPHIuY2hpbGROb2Rlcy5sZW5ndGg7cCsrKXt2YXIgcT1yLmNoaWxkTm9kZXNbcF07aWYocS5ub2RlTmFtZT09IkNPREUiKXtyZXR1cm4gcX1pZighKHEubm9kZVR5cGU9PTMmJnEubm9kZVZhbHVlLm1hdGNoKC9ccysvKSkpe2JyZWFrfX19ZnVuY3Rpb24gaCh0LH [...]
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+
+</style>
+<script type="text/javascript">
+if (window.hljs && document.readyState && document.readyState === "complete") {
+   window.setTimeout(function() {
+      hljs.initHighlighting();
+   }, 0);
+}
+</script>
+
+
+
+<style type="text/css">
+h1 {
+  font-size: 34px;
+}
+h1.title {
+  font-size: 38px;
+}
+h2 {
+  font-size: 30px;
+}
+h3 {
+  font-size: 24px;
+}
+h4 {
+  font-size: 18px;
+}
+h5 {
+  font-size: 16px;
+}
+h6 {
+  font-size: 12px;
+}
+.table th:not([align]) {
+  text-align: left;
+}
+</style>
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Amargin%3A%200px%20auto%3B%0Amax%2Dwidth%3A%201134px%3B%0A%7D%0Abody%2C%20td%20%7B%0Afont%2Dfamily%3A%20sans%2Dserif%3B%0Afont%2Dsize%3A%2010pt%3B%0A%7D%0A%0Adiv%23TOC%20ul%20%7B%0Apadding%3A%200px%200px%200px%2045px%3B%0Alist%2Dstyle%3A%20none%3B%0Abackground%2Dimage%3A%20none%3B%0Abackground%2Drepeat%3A%20none%3B%0Abackground%2Dposition%3A%200%3B%0Afont%2Dsize%3A%2010pt%3B%0Afont%2Dfamily%3A%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B [...]
+
+</head>
+
+<body>
+
+<style type="text/css">
+.main-container {
+  max-width: 828px;
+  margin-left: auto;
+  margin-right: auto;
+}
+
+img {
+  max-width:100%;
+  height: auto;
+}
+.tabbed-pane {
+  padding-top: 12px;
+}
+button.code-folding-btn:focus {
+  outline: none;
+}
+</style>
+
+
+
+<div class="container-fluid main-container">
+
+<!-- tabsets -->
+<script>
+$(document).ready(function () {
+  window.buildTabsets("TOC");
+});
+</script>
+
+<!-- code folding -->
+
+
+
+
+<script>
+$(document).ready(function ()  {
+
+    // move toc-ignore selectors from section div to header
+    $('div.section.toc-ignore')
+        .removeClass('toc-ignore')
+        .children('h1,h2,h3,h4,h5').addClass('toc-ignore');
+
+    // establish options
+    var options = {
+      selectors: "h1,h2,h3",
+      theme: "bootstrap3",
+      context: '.toc-content',
+      hashGenerator: function (text) {
+        return text.replace(/[.\\/?&!#<>]/g, '').replace(/\s/g, '_').toLowerCase();
+      },
+      ignoreSelector: ".toc-ignore",
+      scrollTo: 0
+    };
+    options.showAndHide = true;
+    options.smoothScroll = true;
+
+    // tocify
+    var toc = $("#TOC").tocify(options).data("toc-tocify");
+});
+</script>
+
+<style type="text/css">
+
+#TOC {
+  margin: 25px 0px 20px 0px;
+}
+@media (max-width: 768px) {
+#TOC {
+  position: relative;
+  width: 100%;
+}
+}
+
+
+
+
+div.main-container {
+  max-width: 1200px;
+}
+
+div.tocify {
+  width: 20%;
+  max-width: 246px;
+  max-height: 85%;
+}
+
+@media (min-width: 768px) and (max-width: 991px) {
+  div.tocify {
+    width: 25%;
+  }
+}
+
+@media (max-width: 767px) {
+  div.tocify {
+    width: 100%;
+    max-width: none;
+  }
+}
+
+.tocify ul, .tocify li {
+  line-height: 20px;
+}
+
+.tocify-subheader .tocify-item {
+  font-size: 0.90em;
+  padding-left: 25px;
+  text-indent: 0;
+}
+
+.tocify .list-group-item {
+  border-radius: 0px;
+}
+
+
+</style>
+
+<!-- setup 3col/9col grid for toc_float and main content  -->
+<div class="row-fluid">
+<div class="col-xs-12 col-sm-4 col-md-3">
+<div id="TOC" class="tocify">
+</div>
+</div>
+
+<div class="toc-content col-xs-12 col-sm-8 col-md-9">
+
+
+
+
+<div class="fluid-row" id="header">
+
+
+
+<h1 class="title toc-ignore">Querying protein features</h1>
+<p class="author-name">Johannes Rainer</p>
+<h4 class="date"><em>4 August 2017</em></h4>
+<h4 class="package">Package</h4>
+<p>ensembldb 2.0.4</p>
+
+</div>
+
+
+<p>From Bioconductor release 3.5 on, <code>EnsDb</code> databases/packages created by the <code>ensembldb</code> package also contain, for transcripts with a coding region, mappings between transcripts and proteins. Thus, in addition to the RNA/DNA-based features, the following protein-related information is available:</p>
+<ul>
+<li><code>protein_id</code>: the Ensembl protein ID. This is the primary ID for the proteins defined in Ensembl and each (protein coding) Ensembl transcript has one protein ID assigned to it.</li>
+<li><code>protein_sequence</code>: the amino acid sequence of a protein.</li>
+<li><code>uniprot_id</code>: the Uniprot ID for a protein. Note that not every Ensembl <code>protein_id</code> has a Uniprot ID, and each <code>protein_id</code> might be mapped to several <code>uniprot_id</code>s. Also, the same Uniprot ID might be mapped to different <code>protein_id</code>s.</li>
+<li><code>uniprot_db</code>: the name of the Uniprot database in which the feature is annotated. Can be either <em>SPTREMBL</em> or <em>SWISSPROT</em>.</li>
+<li><code>uniprot_mapping_type</code>: the type of the mapping method that was used to assign the Uniprot ID to the Ensembl protein ID.</li>
+<li><code>protein_domain_id</code>: the ID of the protein domain according to the source/analysis in/by which it was defined.</li>
+<li><code>protein_domain_source</code>: the source of the protein domain information, one of <em>pfscan</em>, <em>scanprosite</em>, <em>superfamily</em>, <em>pfam</em>, <em>prints</em>, <em>smart</em>, <em>pirsf</em> or <em>tigrfam</em>.</li>
+<li><code>interpro_accession</code>: the Interpro accession ID of the protein domain (if available).</li>
+<li><code>prot_dom_start</code>: the start of the protein domain within the sequence of the protein.</li>
+<li><code>prot_dom_end</code>: the end position of the protein domain within the sequence of the protein.</li>
+</ul>
+<p>Thus, for protein coding transcripts, these annotations can be fetched from the database too, given that protein annotations are available. Note that only <code>EnsDb</code> databases created through the Ensembl Perl API contain protein annotation, while databases created using <code>ensDbFromAH</code>, <code>ensDbFromGff</code>, <code>ensDbFromGRanges</code> and <code>ensDbFromGtf</code> don’t.</p>
+<pre class="r"><code>library(ensembldb)
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+## Evaluate whether we have protein annotation available
+hasProteinData(edb)</code></pre>
+<p>If protein annotation is available, the additional tables and columns are also listed by the <code>listTables</code> and <code>listColumns</code> methods.</p>
+<pre class="r"><code>listTables(edb)</code></pre>
+<p>In the following sections we show examples of how to 1) fetch protein annotations as additional columns to gene/transcript annotations, 2) query protein annotations with the <code>AnnotationDbi</code> methods, 3) fetch protein annotation data and 4) map proteins to the genome.</p>
+<div id="fetch-protein-annotation-for-genes-and-transcripts" class="section level1">
+<h1><span class="header-section-number">1</span> Fetch protein annotation for genes and transcripts</h1>
+<p>Protein annotations for (protein coding) transcripts can be retrieved by simply adding the desired annotation columns to the <code>columns</code> parameter of, e.g., the <code>genes</code> or <code>transcripts</code> methods.</p>
+<pre class="r"><code>## Get also protein information for ZBTB16 transcripts
+txs <- transcripts(edb, filter = GenenameFilter("ZBTB16"),
+           columns = c("protein_id", "uniprot_id", "tx_biotype"))
+txs</code></pre>
+<p>The gene ZBTB16 has protein coding and non-coding transcripts; thus, we get the protein ID for the coding transcripts and <code>NA</code> for the non-coding ones. Note also that one transcript, targeted for nonsense-mediated mRNA decay, has a protein ID associated with it but no Uniprot ID.</p>
+<pre class="r"><code>## Subset to transcripts with tx_biotype other than protein_coding.
+txs[txs$tx_biotype != "protein_coding", c("uniprot_id", "tx_biotype",
+                      "protein_id")]</code></pre>
+<p>While the mapping from a protein coding transcript to an Ensembl protein ID (column <code>protein_id</code>) is 1:1, the mapping between <code>protein_id</code> and <code>uniprot_id</code> can be n:m, i.e. each Ensembl protein ID can be mapped to one or more Uniprot IDs and each Uniprot ID can be mapped to more than one <code>protein_id</code> (and hence <code>tx_id</code>). This should be kept in mind when querying transcripts from the database while fetching Uniprot related additional columns  [...]
+<pre class="r"><code>## List the protein IDs and uniprot IDs for the coding transcripts
+mcols(txs[txs$tx_biotype == "protein_coding",
+      c("tx_id", "protein_id", "uniprot_id")])</code></pre>
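The n:m relation described above can be illustrated with a minimal base-R sketch. The table below is a hypothetical toy example with made-up IDs (`TX1`, `PA`, `U1`, …), not real Ensembl or Uniprot data; it only shows how such a mapping can be counted in both directions and why a transcript can show up once per Uniprot ID it is mapped to.

```r
## Hypothetical toy table with made-up IDs (not real Ensembl/Uniprot data).
map <- data.frame(
    tx_id      = c("TX1", "TX1", "TX2", "TX3"),
    protein_id = c("PA",  "PA",  "PB",  "PC"),
    uniprot_id = c("U1",  "U2",  "U1",  NA),
    stringsAsFactors = FALSE
)
## Distinct Uniprot IDs per protein ID (0 if none is assigned).
uniprot_per_protein <- tapply(map$uniprot_id, map$protein_id,
                              function(z) length(unique(z[!is.na(z)])))
## Distinct protein IDs per Uniprot ID (rows with an NA uniprot_id are dropped).
protein_per_uniprot <- tapply(map$protein_id, map$uniprot_id,
                              function(z) length(unique(z)))
## Rows per transcript: TX1 appears twice because PA maps to two Uniprot IDs.
rows_per_tx <- table(map$tx_id)
uniprot_per_protein
protein_per_uniprot
rows_per_tx
```

Here `PA` maps to two Uniprot IDs while `U1` is shared by two protein IDs, which is exactly the pattern that makes transcript results redundant once Uniprot columns are requested.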
+<p>Some of the n:m mappings for Uniprot IDs can be resolved by restricting either to entries from one Uniprot database (<em>SPTREMBL</em> or <em>SWISSPROT</em>) or to mappings of a certain type of mapping method. The corresponding filters are the <code>UniprotDbFilter</code> and the <code>UniprotMappingTypeFilter</code> (using the <code>uniprot_db</code> and <code>uniprot_mapping_type</code> columns of the <code>uniprot</code> database table). In the example below we restrict the result  [...]
+<pre class="r"><code>## List all uniprot mapping types in the database.
+listUniprotMappingTypes(edb)
+
+## Get all protein_coding transcripts of ZBTB16 along with their protein_id
+## and Uniprot IDs, restricting to protein_id to uniprot_id mappings based
+## on "DIRECT" mapping methods.
+txs <- transcripts(edb, filter = list(GenenameFilter("ZBTB16"),
+                      UniprotMappingTypeFilter("DIRECT")),
+           columns = c("protein_id", "uniprot_id", "uniprot_db"))
+mcols(txs)</code></pre>
+<p>For this example the use of the <code>UniprotMappingTypeFilter</code> resolved the multiple mapping of Uniprot IDs to Ensembl protein IDs, but the Uniprot ID <em>Q05516</em> is still assigned to the two Ensembl protein IDs <em>ENSP00000338157</em> and <em>ENSP00000376721</em>.</p>
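Restricting to one mapping type and then checking which IDs are still duplicated can be sketched in base R. The data are hypothetical (made-up IDs; `"OTHER"` is a placeholder mapping type, not a real Ensembl value), and the subsetting only mimics the effect of `UniprotMappingTypeFilter("DIRECT")` on a result table; it is not how ensembldb applies the filter internally.

```r
## Hypothetical toy data (made-up IDs); "OTHER" is a placeholder mapping type.
up <- data.frame(
    protein_id           = c("PA", "PA", "PB", "PC"),
    uniprot_id           = c("U1", "U2", "U1", "U3"),
    uniprot_mapping_type = c("DIRECT", "OTHER", "DIRECT", "DIRECT"),
    stringsAsFactors = FALSE
)
## Keep only mappings of one type, similar in spirit to a mapping type filter.
direct <- up[up$uniprot_mapping_type == "DIRECT", ]
## Protein IDs still mapped to several Uniprot IDs (none in this toy example).
multi_prot <- names(which(table(direct$protein_id) > 1))
## Uniprot IDs still shared by several protein IDs ("U1" here).
multi_uni <- names(which(table(direct$uniprot_id) > 1))
multi_prot
multi_uni
```

As in the ZBTB16 example, the filter removes the 1:n side of the mapping, but one Uniprot ID can remain assigned to two protein IDs.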
+<p>All protein annotations can also be added as <em>metadata columns</em> to the results of the <code>genes</code>, <code>exons</code>, <code>exonsBy</code>, <code>transcriptsBy</code>, <code>cdsBy</code>, <code>fiveUTRsByTranscript</code> and <code>threeUTRsByTranscript</code> methods by specifying the desired column names with the <code>columns</code> parameter. For non-coding transcripts, <code>NA</code> is reported in the protein annotation columns.</p>
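Because non-coding transcripts get `NA` in the protein columns, a missing `protein_id` can be used to separate coding from non-coding entries. The base-R sketch below uses a hypothetical table with made-up IDs; it also shows that subsetting by biotype alone would drop a nonsense-mediated decay transcript that, as noted for ZBTB16, can still carry a protein ID.

```r
## Hypothetical transcript-level result (made-up IDs).
txs_toy <- data.frame(
    tx_id      = c("TX1", "TX2", "TX3"),
    tx_biotype = c("protein_coding", "processed_transcript",
                   "nonsense_mediated_decay"),
    protein_id = c("PA", NA, "PC"),
    stringsAsFactors = FALSE
)
## Transcripts that encode a protein, whatever their biotype.
coding <- txs_toy[!is.na(txs_toy$protein_id), ]
## Subsetting by biotype instead would also drop the NMD transcript TX3.
by_biotype <- txs_toy[txs_toy$tx_biotype == "protein_coding", ]
coding$tx_id
by_biotype$tx_id
```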
+<p>In addition to retrieving protein annotations from the database, we can also use protein data to filter the results. In the example below we fetch all genes from the database that have a certain protein domain in the protein encoded by any of their transcripts.</p>
+<pre class="r"><code>## Get all genes that encode a transcript encoding for a protein that contains
+## a certain protein domain.
+gns <- genes(edb, filter = ProtDomIdFilter("PS50097"))
+length(gns)
+
+sort(gns$gene_name)</code></pre>
+<p>So, in total we got 152 genes with that protein domain. Besides the <code>ProtDomIdFilter</code>, the <code>ProteinidFilter</code> and the <code>UniprotidFilter</code> can also be used to query the database for entries matching conditions on their protein ID or Uniprot ID.</p>
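The "any of its transcripts" semantics of the domain filter on genes can be illustrated with plain lookups. This is only a conceptual base-R sketch on hypothetical toy tables (made-up gene, protein and domain IDs); ensembldb itself resolves the filter with database joins.

```r
## Hypothetical toy tables (made-up IDs).
tx_toy  <- data.frame(gene_name  = c("GA", "GA", "GB", "GC"),
                      protein_id = c("P1", "P2", "P3", NA),
                      stringsAsFactors = FALSE)
dom_toy <- data.frame(protein_id        = c("P1", "P1", "P3"),
                      protein_domain_id = c("D1", "D2", "D2"),
                      stringsAsFactors = FALSE)
## Proteins carrying the domain of interest.
prots_with_dom <- unique(dom_toy$protein_id[dom_toy$protein_domain_id == "D2"])
## A gene qualifies if ANY of its transcripts encodes such a protein;
## %in% yields FALSE for the NA protein_id of the non-coding transcript.
genes_with_dom <- sort(unique(tx_toy$gene_name[tx_toy$protein_id %in% prots_with_dom]))
genes_with_dom
```

Gene `GA` is returned although only one of its two transcripts encodes a protein with domain `D2`.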
+</div>
+<div id="use-methods-from-the-annotationdbi-package-to-query-protein-annotation" class="section level1">
+<h1><span class="header-section-number">2</span> Use methods from the <code>AnnotationDbi</code> package to query protein annotation</h1>
+<p>The <code>select</code>, <code>keys</code> and <code>mapIds</code> methods from the <code>AnnotationDbi</code> package can also be used to query <code>EnsDb</code> objects for protein annotations. Supported columns and key types are returned by the <code>columns</code> and <code>keytypes</code> methods.</p>
+<pre class="r"><code>## Show all columns that are provided by the database
+columns(edb)
+
+## Show all key types/filters that are supported
+keytypes(edb)</code></pre>
+<p>Below we fetch all Uniprot IDs annotated to the gene <em>ZBTB16</em>.</p>
+<pre class="r"><code>select(edb, keys = "ZBTB16", keytype = "GENENAME",
+       columns = "UNIPROTID")</code></pre>
+<p>This returns all Uniprot IDs of all proteins encoded by the gene’s transcripts. One of the transcripts of ZBTB16, while having a CDS and being annotated to a protein, does not have a Uniprot ID assigned (thus <code>NA</code> is returned by the above call). As we see below, this transcript is targeted for nonsense-mediated mRNA decay.</p>
+<pre class="r"><code>## Call select, this time providing a GenenameFilter.
+select(edb, keys = GenenameFilter("ZBTB16"),
+       columns = c("TXBIOTYPE", "UNIPROTID", "PROTEINID"))</code></pre>
+<p>Note also that this time we passed a <code>GenenameFilter</code> with the <code>keys</code> parameter.</p>
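When a key maps to several values, as ZBTB16 does to several Uniprot IDs, `mapIds` from `AnnotationDbi` has to collapse the result; its `multiVals` argument accepts options such as `"first"` and `"list"`. The base-R sketch below only emulates those two behaviours on a hypothetical `select`-style table with made-up IDs; it is not the `AnnotationDbi` implementation.

```r
## Hypothetical select()-style result (made-up IDs): one row per gene/Uniprot pair.
sel <- data.frame(GENENAME  = c("G1", "G1", "G1", "G2"),
                  UNIPROTID = c("U1", "U2", NA, "U3"),
                  stringsAsFactors = FALSE)
## Keep all values per key (comparable to multiVals = "list").
as_list <- split(sel$UNIPROTID, sel$GENENAME)
## Keep only the first value per key (comparable to multiVals = "first").
as_first <- vapply(as_list, function(z) z[1], character(1))
as_list
as_first
```

Keeping only the first value silently discards `U2` and the `NA` entry for `G1`, so the list form is the safer choice when the n:m nature of the mapping matters.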
+</div>
+<div id="retrieve-proteins-from-the-database" class="section level1">
+<h1><span class="header-section-number">3</span> Retrieve proteins from the database</h1>
+<p>Proteins can be fetched using the dedicated <code>proteins</code> method which, unlike DNA/RNA-based methods such as <code>genes</code> or <code>transcripts</code>, does not return a <code>GRanges</code> object by default, but a <code>DataFrame</code> object. Alternatively, results can be returned as a <code>data.frame</code> or as an <code>AAStringSet</code> object from the <code>Biostrings</code> package. Note that this might change in future releases if a more appropriate object to represent  [...]
+<p>In the code chunk below we fetch all protein annotations for the gene <em>ZBTB16</em>.</p>
+<pre class="r"><code>## Get all proteins and return them as an AAStringSet
+prts <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+         return.type = "AAStringSet")
+prts</code></pre>
+<p>Besides the amino acid sequence, <code>prts</code> also contains additional annotations that can be accessed with the <code>mcols</code> method (metadata columns). All additional columns requested with the parameter <code>columns</code> are also added to the <code>mcols</code> <code>DataFrame</code>.</p>
+<pre class="r"><code>mcols(prts)</code></pre>
+<p>Note that the <code>proteins</code> method retrieves only gene/transcript annotations of transcripts encoding a protein. Thus, annotations for the non-coding transcripts of the gene <em>ZBTB16</em>, which were returned by calls to <code>genes</code> or <code>transcripts</code> in the previous sections, are not fetched.</p>
+<p>Additionally querying Uniprot identifiers or protein domain data currently results in a redundant list of proteins, as shown in the code block below.</p>
+<pre class="r"><code>## Get also protein domain annotations in addition to the protein annotations.
+pd <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+           columns = c("tx_id", listColumns(edb, "protein_domain")),
+           return.type = "AAStringSet")
+pd</code></pre>
+<p>The result contains one row/element for each protein domain in each of the proteins. The number of protein domains per protein and the <code>mcols</code> are shown below.</p>
+<pre class="r"><code>## The number of protein domains per protein:
+table(names(pd))
+
+## The mcols
+mcols(pd)</code></pre>
+<p>As we can see, each protein can have several protein domains, with their start and end coordinates within the amino acid sequence reported in the columns <code>prot_dom_start</code> and <code>prot_dom_end</code>. Also, not all Ensembl protein IDs (like <code>protein_id</code> <em>ENSP00000445047</em>) are mapped to a Uniprot ID or have protein domains.</p>
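Working with such a one-row-per-domain result usually means counting rows per protein, computing domain lengths and collapsing back to one row per protein. The base-R sketch below does this on a hypothetical data frame with made-up IDs and coordinates, and it assumes 1-based inclusive domain coordinates.

```r
## Hypothetical protein-domain rows (made-up IDs and coordinates).
pd_toy <- data.frame(
    protein_id        = c("P1", "P1", "P1", "P2"),
    protein_domain_id = c("D1", "D2", "D3", "D1"),
    prot_dom_start    = c(10L, 120L, 300L, 5L),
    prot_dom_end      = c(80L, 150L, 330L, 70L),
    stringsAsFactors = FALSE
)
## Number of domains (rows) per protein.
dom_per_protein <- table(pd_toy$protein_id)
## Domain length in amino acids, assuming 1-based inclusive coordinates.
pd_toy$dom_width <- pd_toy$prot_dom_end - pd_toy$prot_dom_start + 1L
## Back to one row per protein (keeps the first domain row of each protein).
one_per_protein <- pd_toy[!duplicated(pd_toy$protein_id), ]
dom_per_protein
pd_toy$dom_width
```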
+</div>
+<div id="map-peptide-features-within-proteins-to-the-genome" class="section level1">
+<h1><span class="header-section-number">4</span> Map peptide features within proteins to the genome</h1>
+<p>Functionality to map peptide features (i.e. ranges within the amino acid sequence of the protein) to genomic coordinates is provided by the <code>Pbase</code> Bioconductor package. It relies in part on the protein annotations provided by <code>EnsDb</code> databases. See the corresponding vignette <em>Pbase-with-ensembldb</em> in that package.</p>
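The first step of such a mapping is plain codon arithmetic: amino acid `i` occupies nucleotides `3(i - 1) + 1` to `3i` of the CDS. The helper below is hypothetical (not part of `Pbase` or `ensembldb`) and assumes 1-based inclusive coordinates and a CDS that starts with a complete codon; CDS entries with an incomplete start, such as the one found in the extended tests, would need an offset.

```r
## Hypothetical helper (not Pbase/ensembldb API): convert an inclusive, 1-based
## amino acid range into the inclusive, 1-based nucleotide range it occupies
## within the CDS, assuming the CDS starts with a complete codon.
aa_to_cds_nt <- function(aa_start, aa_end) {
    stopifnot(aa_start >= 1, aa_end >= aa_start)
    c(start = (aa_start - 1) * 3 + 1,  # first base of the first codon
      end   = aa_end * 3)              # last base of the last codon
}
aa_to_cds_nt(1, 1)    # the first codon: nucleotides 1 to 3
aa_to_cds_nt(10, 80)  # nucleotides 28 to 240
```

Translating these CDS-relative positions into genomic coordinates additionally requires walking the CDS exons in transcript order and accounting for the strand, which is the part the `Pbase` functionality takes care of.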
+</div>
+
+
+
+</div>
+</div>
+
+</div>
+
+<script>
+
+// add bootstrap table styles to pandoc tables
+function bootstrapStylePandocTables() {
+  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
+}
+$(document).ready(function () {
+  bootstrapStylePandocTables();
+});
+
+
+</script>
+
+<script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    "HTML-CSS": {
+      styles: {
+        ".MathJax_Display": {
+           "text-align": "center",
+           padding: "0px 150px 0px 65px",
+           margin: "0px 0px 0.5em"
+        },
+      }
+    }
+  });
+</script>
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/extended_tests/extended_tests.R b/inst/extended_tests/extended_tests.R
new file mode 100644
index 0000000..d57960f
--- /dev/null
+++ b/inst/extended_tests/extended_tests.R
@@ -0,0 +1,855 @@
+## This script comprises extended tests.
+##*****************************************************************
+## Gviz stuff
+notrun_test_genetrack_df <- function(){
+    do.plot <- FALSE
+    if(do.plot){
+        ##library(Gviz)
+        options(ucscChromosomeNames=FALSE)
+        data(geneModels)
+        geneModels$chromosome <- 7
+        chr <- 7
+        start <- min(geneModels$start)
+        end <- max(geneModels$end)
+        myGeneModels <- getGeneRegionTrackForGviz(edb, chromosome=chr,
+                                                  start=start,
+                                                  end=end)
+        ## chromosome has to be the same....
+        gtrack <- GenomeAxisTrack()
+        gvizTrack <- GeneRegionTrack(geneModels, name="Gviz")
+        ensdbTrack <- GeneRegionTrack(myGeneModels, name="ensdb")
+        plotTracks(list(gtrack, gvizTrack, ensdbTrack))
+        plotTracks(list(gtrack, gvizTrack, ensdbTrack), from=26700000,
+                   to=26780000)
+        ## Looks very nice...
+    }
+    ## Put the stuff below into the vignette:
+    ## Next we get all lincRNAs on chromosome Y
+    Lncs <- getGeneRegionTrackForGviz(edb,
+                                      filter=list(SeqNameFilter("Y"),
+                                                  GeneBiotypeFilter("lincRNA")))
+    Prots <- getGeneRegionTrackForGviz(edb,
+                                       filter=list(SeqNameFilter("Y"),
+                                                   GeneBiotypeFilter("protein_coding")))
+    if(do.plot){
+        plotTracks(list(gtrack, GeneRegionTrack(Lncs, name="lincRNAs"),
+                        GeneRegionTrack(Prots, name="proteins")))
+        plotTracks(list(gtrack, GeneRegionTrack(Lncs, name="lincRNAs"),
+                        GeneRegionTrack(Prots, name="proteins")),
+                   from=5000000, to=7000000, transcriptAnnotation="symbol")
+    }
+    ## is that the same as:
+    TestL <- getGeneRegionTrackForGviz(edb,
+                                       filter=list(GeneBiotypeFilter("lincRNA")),
+                                       chromosome="Y", start=5000000, end=7000000)
+    TestP <- getGeneRegionTrackForGviz(edb,
+                                       filter=list(GeneBiotypeFilter("protein_coding")),
+                                       chromosome="Y", start=5000000, end=7000000)
+    if(do.plot){
+        plotTracks(list(gtrack, GeneRegionTrack(Lncs, name="lincRNAs"),
+                        GeneRegionTrack(Prots, name="proteins"),
+                        GeneRegionTrack(TestL, name="compareL"),
+                        GeneRegionTrack(TestP, name="compareP")),
+                   from=5000000, to=7000000, transcriptAnnotation="symbol")
+    }
+    expect_true(all(TestL$exon %in% Lncs$exon))
+    expect_true(all(TestP$exon %in% Prots$exon))
+    ## Crazy amazing stuff
+    ## system.time(
+    ##     All <- getGeneRegionTrackForGviz(edb)
+    ## )
+}
+
+
+
+notrun_test_getSeqlengthsFromMysqlFolder <- function() {
+    ## Test this for some more seqlengths.
+    library(EnsDb.Rnorvegicus.v79)
+    db <- EnsDb.Rnorvegicus.v79
+    seq_info <- seqinfo(db)
+    seq_lengths <- ensembldb:::.getSeqlengthsFromMysqlFolder(
+        organism = "Rattus norvegicus", ensembl = 79,
+        seqnames = seqlevels(seq_info))
+    sl <- seqlengths(seq_info)
+    sl_2 <- seq_lengths$length
+    names(sl_2) <- rownames(seq_lengths)
+    checkEquals(sl, sl_2)
+    ## Mus musculus
+}
+
+notrun_test_ensDbFromGtf_Gff_AH <- function() {
+    gtf <- paste0("/Users/jo/Projects/EnsDbs/80/caenorhabditis_elegans/",
+                  "Caenorhabditis_elegans.WBcel235.80.gtf.gz")
+    outf <- tempfile()
+    db <- ensDbFromGtf(gtf = gtf, outfile = outf)
+    ## use Gff
+    gff <- paste0("/Users/jo/Projects/EnsDbs/84/canis_familiaris/gff3/",
+                  "Canis_familiaris.CanFam3.1.84.gff3.gz")
+    outf <- tempfile()
+    db <- ensDbFromGff(gff, outfile = outf)
+
+    ## Checking one from ensemblgenomes:
+    gtf <- paste0("/Users/jo/Projects/EnsDbs/ensemblgenomes/30/",
+                  "solanum_lycopersicum/",
+                  "Solanum_lycopersicum.GCA_000188115.2.30.chr.gtf.gz"
+                  )
+    outf <- tempfile()
+    db <- ensDbFromGtf(gtf = gtf, outfile = outf)
+    gtf <- paste0("/Users/jo/Projects/EnsDbs/ensemblgenomes/30/",
+                  "solanum_lycopersicum/",
+                  "Solanum_lycopersicum.GCA_000188115.2.30.gtf.gz"
+                  )
+    outf <- tempfile()
+    db <- ensDbFromGtf(gtf = gtf, outfile = outf)
+
+    ## AH
+    library(AnnotationHub)
+    ah <- AnnotationHub()
+    query(ah, c("release-83", "gtf"))
+    ah_1 <- ah["AH50418"]
+    db <- ensDbFromAH(ah_1, outfile = outf)
+    ah_2 <- ah["AH50352"]
+    db <- ensDbFromAH(ah_2, outfile = outf)
+}
+
+notrun_test_builds <- function(){
+    input <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gtf.gz"
+    fromGtf <- ensDbFromGtf(input, outfile=tempfile())
+    ## provide wrong ensembl version
+    fromGtf <- ensDbFromGtf(input, outfile=tempfile(), version="75")
+    ## provide wrong genome version
+    fromGtf <- ensDbFromGtf(input, outfile=tempfile(), genomeVersion="75")
+    EnsDb(fromGtf)
+    ## provide wrong organism
+    fromGtf <- ensDbFromGtf(input, outfile=tempfile(), organism="blalba")
+    EnsDb(fromGtf)
+    ## GFF
+    input <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.chr.gff3.gz"
+    fromGff <- ensDbFromGff(input, outfile=tempfile())
+    EnsDb(fromGff)
+    fromGff <- ensDbFromGff(input, outfile=tempfile(), version="75")
+    EnsDb(fromGff)
+    fromGff <- ensDbFromGff(input, outfile=tempfile(), genomeVersion="bla")
+    EnsDb(fromGff)
+    fromGff <- ensDbFromGff(input, outfile=tempfile(), organism="blabla")
+    EnsDb(fromGff)
+
+    ## AH
+    library(AnnotationHub)
+    ah <- AnnotationHub()
+    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile())
+    EnsDb(fromAh)
+    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile(), version="75")
+    EnsDb(fromAh)
+    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile(), genomeVersion="bla")
+    EnsDb(fromAh)
+    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile(), organism="blabla")
+    EnsDb(fromAh)
+}
+
+
+
+notrun_test_ensdbFromGFF <- function(){
+    library(ensembldb)
+    ##library(rtracklayer)
+    ## VERSION 83
+    gtf <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gtf.gz"
+    fromGtf <- ensDbFromGtf(gtf, outfile=tempfile())
+    egtf <- EnsDb(fromGtf)
+
+    gff <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gff3.gz"
+    fromGff <- ensDbFromGff(gff, outfile=tempfile())
+    egff <- EnsDb(fromGff)
+
+    ## Compare EnsDbs
+    ensembldb:::compareEnsDbs(egtf, egff)
+    ## OK, only Entrezgene ID "problems"
+
+    ## Compare with the one built with the Perl API
+    library(EnsDb.Hsapiens.v83)
+    edb <- EnsDb.Hsapiens.v83
+
+    ensembldb:::compareEnsDbs(egtf, edb)
+
+    ensembldb:::compareEnsDbs(egff, edb)
+    ## OK, I get different genes...
+    genes1 <- genes(egtf)
+    genes2 <- genes(edb)
+
+    only2 <- genes2[!(genes2$gene_id %in% genes1$gene_id)]
+
+    ## That below was before the fix to include feature type start_codon and stop_codon
+    ## to the CDS type.
+    ## Identify which are the different transcripts:
+    txGtf <- transcripts(egtf)
+    txGff <- transcripts(egff)
+    commonIds <- intersect(names(txGtf), names(txGff))
+    haveCds <- commonIds[!is.na(txGtf[commonIds]$tx_cds_seq_start) & !is.na(txGff[commonIds]$tx_cds_seq_start)]
+    diffs <- haveCds[txGtf[haveCds]$tx_cds_seq_start != txGff[haveCds]$tx_cds_seq_start]
+    head(diffs)
+
+    ## What could be reasons?
+    ## 1) alternative CDS?
+    ## Checking the GTF:
+    ## tx ENST00000623834: start_codon: 195409 195411.
+    ##                     first CDS: 195259 195411.
+    ##                     last CDS: 185220 185350.
+    ##                     stop_codon: 185217 185219.
+    ## So, why is the stop codon OUTSIDE the CDS? In GTF files the CDS feature
+    ## excludes the stop codon, which is listed as a separate stop_codon feature.
+    ## library(rtracklayer)
+    ## theGtf <- import(gtf, format="gtf")
+    ## ## Apparently, the GTF contains the additional elements start_codon/stop_codon.
+    ## theGff <- import(gff, format="gff3")
+
+
+    ## transcripts(egtf, filter=TxIdFilter(diffs[1]))
+    ## transcripts(egff, filter=TxIdFilter(diffs[1]))
+
+
+    ## VERSION 81
+    ## Try to get the same via AnnotationHub
+    gff <- "/Users/jo/Projects/EnsDbs/81/homo_sapiens/Homo_sapiens.GRCh38.81.gff3.gz"
+    fromGff <- ensDbFromGff(gff, outfile=tempfile())
+    egff <- EnsDb(fromGff)
+
+    gtf <- "/Users/jo/Projects/EnsDbs/81/homo_sapiens/Homo_sapiens.GRCh38.81.gtf.gz"
+    fromGtf <- ensDbFromGtf(gtf, outfile=tempfile())
+    egtf <- EnsDb(fromGtf)
+
+    ## Compare those two:
+    ensembldb:::compareEnsDbs(egff, egtf)
+    ## Why are there some differences in the transcripts???
+    trans1 <- transcripts(egff)
+    trans2 <- transcripts(egtf)
+    onlyInGtf <- trans2[!(trans2$tx_id %in% trans1$tx_id)]
+
+    ##gtfGRanges <- ah["AH47963"]
+
+    library(AnnotationHub)
+    ah <- AnnotationHub()
+    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile())  ## That's human...
+    eah <- EnsDb(fromAh)
+
+    ## Compare it to gtf:
+    ensembldb:::compareEnsDbs(eah, egtf)
+    ## OK. Same cds starts and cds ends.
+
+    ## Compare it to gff:
+    ensembldb:::compareEnsDbs(eah, egff)
+    ## hm.
+
+    ## Compare to EnsDb
+    library(EnsDb.Hsapiens.v81)
+    edb <- EnsDb.Hsapiens.v81
+    ensembldb:::compareEnsDbs(edb, egtf)
+    ## Problem with CDS
+    ensembldb:::compareEnsDbs(edb, egff)
+    ## That's fine.
+
+    ## Summary:
+    ## GTF and AH are the same.
+    ## GFF and Perl API are the same.
+
+    ## OLD STUFF BELOW.
+
+    ##fromAh <- EnsDbFromAH(ah["AH47963"], outfile=tempfile(), organism="Homo sapiens", version=81)
+
+    ## Try with a fancy species:
+    gff <- "/Users/jo/Projects/EnsDbs/83/gadus_morhua/Gadus_morhua.gadMor1.83.gff3.gz"
+    fromGtf <- ensDbFromGff(gff, outfile=tempfile())
+
+    gff <- "/Users/jo/Projects/EnsDbs/83/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.83.gff3.gz"
+    fromGff <- ensDbFromGff(gff, outfile=tempfile())
+    ## That works.
+
+    ## Try with a file from AnnotationHub: Gorilla gorilla.
+    library(AnnotationHub)
+    ah <- AnnotationHub()
+    ah <- ah["AH47962"]
+
+    res <- ensDbFromAH(ah, outfile=tempfile())
+    edb <- EnsDb(res)
+    genes(edb)
+
+
+    ## ensRel <- query(ah, c("GTF", "ensembl"))
+
+    ## gtf <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gtf.gz"
+    ## ## GTF
+    ## dir.create("/tmp/fromGtf")
+    ## fromGtf <- ensDbFromGtf(gtf, path="/tmp/fromGtf", verbose=TRUE)
+    ## ## GFF
+    ## dir.create("/tmp/fromGff")
+    ## fromGff <- ensembldb:::ensDbFromGff(gff, path="/tmp/fromGff", verbose=TRUE)
+
+    ## ## ZBTB16:
+    ## ## exon: ENSE00003606532 is 3rd exon of tx: ENST00000335953
+    ## ## exon: ENSE00003606532 is 3rd exon of tx: ENST00000392996
+    ## ## the Ensembl GFF has 2 entries for this exon.
+
+}
+
+
+
+############################################################
+## Can not perform these tests right away, as they require a
+## working MySQL connection.
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+dontrun_test_useMySQL <- function() {
+    edb_mysql <- useMySQL(edb, user = "anonuser", host = "localhost", pass = "")
+}
+
+dontrun_test_connect_EnsDb <- function() {
+    library(RMySQL)
+    con <- dbConnect(MySQL(), user = "anonuser", host = "localhost", pass = "")
+
+    ensembldb:::listEnsDbs(dbcon = con)
+    ## just with user.
+    ensembldb:::listEnsDbs(user = "anonuser", host = "localhost", pass = "",
+                           port = 3306)
+
+    ## Connecting directly to an EnsDb MySQL database.
+    con <- dbConnect(MySQL(), user = "anonuser", host = "localhost", pass = "",
+                     dbname = "ensdb_hsapiens_v75")
+    edb_mysql <- EnsDb(con)
+}
+
+notrun_compareEnsDbs <- function() {
+    res <- ensembldb:::compareEnsDbs(edb, edb)
+}
+
+############################################################
+## Massive test validating the cds:
+## compare the length of the CDS with the length of the encoded protein.
+## Get the CDS sequence, translate that and compare to protein sequence.
+notrun_massive_cds_test <- function() {
+    ## Get all CDS:
+    tx_cds <- cdsBy(edb, by = "tx", filter = SeqNameFilter(c(1:22, "X", "Y")))
+    prots <- proteins(edb, filter = TxIdFilter(names(tx_cds)),
+                      return.type = "AAStringSet")
+    checkTrue(all(names(tx_cds) %in% mcols(prots)$tx_id))
+    tx_cds <- tx_cds[mcols(prots)$tx_id]
+    ## Check that the length of the protein sequence is length of CDS/3
+    diff_width <- sum(width(tx_cds)) != width(prots) * 3
+    ## Why??? I've got so many differences here???
+    sum(diff_width)
+    ## Check some of them manually in Ensembl
+
+    ## 1st: - strand.
+    tx_1 <- tx_cds[diff_width][1]
+    ## Protein: 245aa
+    prots[diff_width][1]
+    ## OK.
+    ## Tx 2206bp:
+    exns <- exonsBy(edb, filter = TxIdFilter(names(tx_1)))
+    sum(width(exns))
+    ## OK.
+    ## Now to the CDS:
+    cds_ex1 <- "ATGGCGTCCCCGTCTCGGAGACTGCAGACTAAACCAGTCATTACTTGTTTCAAGAGCGTTCTGCTAATCTACACTTTTATTTTCTGG"
+    cds_ex2 <- "ATCACTGGCGTTATCCTTCTTGCAGTTGGCATTTGGGGCAAGGTGAGCCTGGAGAATTACTTTTCTCTTTTAAATGAGAAGGCCACCAATGTCCCCTTCGTGCTCATTGCTACTGGTACCGTCATTATTCTTTTGGGCACCTTTGGTTGTTTTGCTACCTGCCGAGCTTCTGCATGGATGCTAAAACTG"
+    cds_ex3 <- "TATGCAATGTTTCTGACTCTCGTTTTTTTGGTCGAACTGGTCGCTGCCATCGTAGGATTTGTTTTCAGACATGAG"
+    cds_ex4 <- "ATTAAGAACAGCTTTAAGAATAATTATGAGAAGGCTTTGAAGCAGTATAACTCTACAGGAGATTATAGAAGCCATGCAGTAGACAAGATCCAAAATACG"
+    cds_ex5 <- "TTGCATTGTTGTGGTGTCACCGATTATAGAGATTGGACAGATACTAATTATTACTCAGAAAAAGGATTTCCTAAGAGTTGCTGTAAACTTGAAGATTGTACTCCACAGAGAGATGCAGACAAAGTAAACAATGAA"
+    cds_ex6 <- "GGTTGTTTTATAAAGGTGATGACCATTATAGAGTCAGAAATGGGAGTCGTTGCAGGAATTTCCTTTGGAGTTGCTTGCTTCCAA"
+    cds_ex7 <- "CTGATTGGAATCTTTCTCGCCTACTGCCTCTCTCGTGCCATAACAAATAACCAGTATGAGATAGTGTAA"
+    cds_seq <- c(cds_ex1, cds_ex2, cds_ex3, cds_ex4, cds_ex5, cds_ex6, cds_ex7)
+    nchar(cds_seq)
+    width(tx_1)
+    ## The length should be identical:
+    checkEquals(sum(nchar(cds_seq)), sum(width(tx_1)), checkNames = FALSE)
+    ## OK; so WHAT???
+    sum(width(tx_1)) / 3
+    ## So, the start codon (ATG) is translated into a methionine.
+    ## The stop codon is either TAA, TGA or TAG. UGA can be recoded into Sec (U), UAG into Pyl (O).
+    library(Biostrings)
+    dna_s <- DNAString(paste0(cds_ex1, cds_ex2, cds_ex3, cds_ex4, cds_ex5, cds_ex6, cds_ex7))
+    translate(dna_s)
+    ## Look at that!
+    translate(DNAString("TAA")) ## -> translates into *
+    translate(DNAString("TGA")) ## -> translates into *
+    translate(DNAString("TAG")) ## -> translates into *
+    translate(DNAString("ATG")) ## -> translates into M
+
+    ## Assumption:
+    ## If the mRNA ends with a TAA, the protein sequence will be 1aa shorter than
+    ## length(CDS)/3.
+    ## If the mRNA ends with a TAG (UAG), the AA length is length(CDS)/3.
+
+    ## Check one of the mRNA where it fits:
+    tx_2 <- tx_cds[!diff_width][1]
+    prots[!diff_width][1]
+    ## AA is 137 long, ends with I.
+    sum(width(tx_2)) / 3  ## OK
+    ## Check Ensembl:
+    tx_2_1 <- "ATGCTAAAACTG"
+    tx_2_2 <- "TATGCAATGTTTCTGACTCTCGTTTTTTTGGTCGAACTGGTCGCTGCCATCGTAGGATTTGTTTTCAGACATGAG"
+    tx_2_3 <- "ATTAAGAACAGCTTTAAGAATAATTATGAGAAGGCTTTGAAGCAGTATAACTCTACAGGAGATTATAGAAGCCATGCAGTAGACAAGATCCAAAATACG"
+    tx_2_4 <- "TTGCATTGTTGTGGTGTCACCGATTATAGAGATTGGACAGATACTAATTATTACTCAGAAAAAGGATTTCCTAAGAGTTGCTGTAAACTTGAAGATTGTACTCCACAGAGAGATGCAGACAAAGTAAACAATGAA"
+    tx_2_5 <- "GGTTGTTTTATAAAGGTGATGACCATTATAGAGTCAGAAATGGGAGTCGTTGCAGGAATTTCCTTTGGAGTTGCTTGCTTCCAA"
+    tx_2_6 <- "GACATT"
+    tx_2_cds <- paste0(tx_2_1, tx_2_2, tx_2_3, tx_2_4, tx_2_5, tx_2_6)
+    nchar(tx_2_cds)
+    sum(width(tx_2))
+    ## OK.
+    translate(DNAString(tx_2_cds))
+
+    ## Next assumption:
+    ## If we don't have a 3' UTR the AA sequence corresponds to length(CDS)/3
+    tx_cds <- cdsBy(edb, by = "tx", filter = SeqNameFilter(c(1:22, "X", "Y")),
+                    columns = c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
+                                "tx_cds_seq_end"))
+    prots <- proteins(edb, filter = TxIdFilter(names(tx_cds)),
+                      return.type = "AAStringSet")
+    checkTrue(all(names(tx_cds) %in% mcols(prots)$tx_id))
+    tx_cds <- tx_cds[mcols(prots)$tx_id]
+    ## Calculate the CDS width.
+    tx_cds_width <- sum(width(tx_cds))
+    txs <- transcripts(edb, filter = TxIdFilter(names(tx_cds)))
+    txs <- txs[names(tx_cds)]
+    ## Subtract 3 from the width if we've got an 3'UTR.
+    to_subtract <- rep(3, length(tx_cds_width))
+    to_subtract[((end(txs) == txs$tx_cds_seq_end) &
+                 as.logical(strand(txs) == "+"))
+                | ((start(txs) == txs$tx_cds_seq_start)
+                    & as.logical(strand(txs) == "-"))] <- 0
+    tx_cds_width <- tx_cds_width - to_subtract
+    ## Check that the length of the protein sequence is length of CDS/3
+    diff_width <- tx_cds_width != width(prots) * 3
+    ## Why??? I've got so many differences here???
+    sum(diff_width)
+    length(diff_width)
+    ## AAAA, still have some that don't fit!!!
+    tx_3 <- tx_cds[diff_width][1]
+    prots[diff_width][1]
+    ## AA is 259aa long, ends with T., TX is: ENST00000371584
+    ## This CDS has no START CODON!
+    sum(width(tx_3)) / 3
+
+    ## Now, exclude those without a 5' UTR:
+    no_five <- ((start(txs) == txs$tx_cds_seq_start) &
+                as.logical(strand(txs) == "+")) |
+        ((end(txs) == txs$tx_cds_seq_end) &
+         as.logical(strand(txs) == "-"))
+    still_prot <- (diff_width & !no_five)
+}
+
+notrun_test_getGenomeFaFile <- function(){
+    library(EnsDb.Hsapiens.v82)
+    edb <- EnsDb.Hsapiens.v82
+
+    ## We know that there is no Fasta file for that Ensembl release available.
+    Fa <- getGenomeFaFile(edb)
+    ## Got the one from Ensembl 81.
+    genes <- genes(edb, filter=SeqNameFilter("Y"))
+    geneSeqsFa <- getSeq(Fa, genes)
+    ## Get the transcript sequences...
+    txSeqsFa <- extractTranscriptSeqs(Fa, edb, filter=SeqNameFilter("Y"))
+
+    ## Get the TwoBitFile.
+    twob <- ensembldb:::getGenomeTwoBitFile(edb)
+    ## Get the gene sequences.
+    ## ERROR FIX BELOW WITH UPDATED VERSIONS!!!
+    geneSeqs2b <- getSeq(twob, genes)
+
+    ## Have to fix the seqnames.
+    si <- seqinfo(twob)
+    sn <- unlist(lapply(strsplit(seqnames(si), split=" ", fixed=TRUE), function(z){
+        return(z[1])
+    }))
+    seqnames(si) <- sn
+    seqinfo(twob) <- si
+
+    ## Do the same with the TwoBitFile
+    geneSeqsTB <- getSeq(twob, genes)
+
+    ## Subset to all genes that are encoded on chromosomes for which
+    ## we do have DNA sequence available.
+    genes <- genes[seqnames(genes) %in% seqnames(seqinfo(Fa))]
+
+    ## Get the gene sequences, i.e. the sequence including the sequence of
+    ## all of the gene's exons and introns.
+    geneSeqs <- getSeq(Fa, genes)
+
+    library(AnnotationHub)
+    ah <- AnnotationHub()
+    quer <- query(ah, c("release-", "Homo sapiens"))
+    ## So, I get 2bit files and toplevel stuff.
+    Test <- ah[["AH50068"]]
+
+}
+
+
+
+notrun_test_extractTranscriptSeqs <- function(){
+    ## Note: we can't run that by default as we can not assume everybody has
+    ## AnnotationHub and the required resource installed.
+    ## That's how we want to test the transcript seqs.
+    genome <- getGenomeFaFile(edb)
+    ZBTB <- extractTranscriptSeqs(genome, edb, filter=GenenameFilter("ZBTB16"))
+    ## Load the sequences for one ZBTB16 transcript from FA.
+    faf <- system.file("txt/ENST00000335953.fa.gz", package="ensembldb")
+    Seqs <- readDNAStringSet(faf)
+    tx <- "ENST00000335953"
+    ## cDNA
+    checkEquals(unname(as.character(ZBTB[tx])),
+                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+    ## CDS
+    cBy <- cdsBy(edb, "tx", filter=TxIdFilter(tx))
+    CDS <- extractTranscriptSeqs(genome, cBy)
+    checkEquals(unname(as.character(CDS)),
+                unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
+    ## 5' UTR
+    fBy <- fiveUTRsByTranscript(edb, filter=TxIdFilter(tx))
+    UTR <- extractTranscriptSeqs(genome, fBy)
+    checkEquals(unname(as.character(UTR)),
+                unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
+    ## 3' UTR
+    tBy <- threeUTRsByTranscript(edb, filter=TxIdFilter(tx))
+    UTR <- extractTranscriptSeqs(genome, tBy)
+    checkEquals(unname(as.character(UTR)),
+                unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
+
+
+    ## Another gene on the reverse strand:
+    faf <- system.file("txt/ENST00000200135.fa.gz", package="ensembldb")
+    Seqs <- readDNAStringSet(faf)
+    tx <- "ENST00000200135"
+    ## cDNA
+    cDNA <- extractTranscriptSeqs(genome, edb, filter=TxIdFilter(tx))
+    checkEquals(unname(as.character(cDNA)),
+                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+    ## do the same, but from other strand
+    exns <- exonsBy(edb, "tx", filter=TxIdFilter(tx))
+    cDNA <- extractTranscriptSeqs(genome, exns)
+    checkEquals(unname(as.character(cDNA)),
+                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+    strand(exns) <- "+"
+    cDNA <- extractTranscriptSeqs(genome, exns)
+    checkTrue(unname(as.character(cDNA)) !=
+              unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+    ## CDS
+    cBy <- cdsBy(edb, "tx", filter=TxIdFilter(tx))
+    CDS <- extractTranscriptSeqs(genome, cBy)
+    checkEquals(unname(as.character(CDS)),
+                unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
+    ## 5' UTR
+    fBy <- fiveUTRsByTranscript(edb, filter=TxIdFilter(tx))
+    UTR <- extractTranscriptSeqs(genome, fBy)
+    checkEquals(unname(as.character(UTR)),
+                unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
+    ## 3' UTR
+    tBy <- threeUTRsByTranscript(edb, filter=TxIdFilter(tx))
+    UTR <- extractTranscriptSeqs(genome, tBy)
+    checkEquals(unname(as.character(UTR)),
+                unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
+}
+
+notrun_test_getCdsSequence <- function(){
+    ## Check that the 5' UTR, CDS and 3' UTR sequences add up to the full transcript sequence.
+    genome <- getGenomeFaFile(edb)
+    tx <- extractTranscriptSeqs(genome, edb, filter=SeqNameFilter("Y"))
+    cdsSeq <- extractTranscriptSeqs(genome, cdsBy(edb, filter=SeqNameFilter("Y")))
+    ## that's basically to get the CDS sequence.
+    ## UTR sequence:
+    tutr <- extractTranscriptSeqs(genome, threeUTRsByTranscript(edb, filter=SeqNameFilter("Y")))
+    futr <- extractTranscriptSeqs(genome, fiveUTRsByTranscript(edb, filter=SeqNameFilter("Y")))
+    theTx <- "ENST00000602770"
+    fullSeq <- as.character(tx[theTx])
+    ## build the one from 5', cds and 3'
+    compSeq <- ""
+    if(any(names(futr) == theTx))
+        compSeq <- paste0(compSeq, as.character(futr[theTx]))
+    if(any(names(cdsSeq) == theTx))
+        compSeq <- paste0(compSeq, as.character(cdsSeq[theTx]))
+    if(any(names(tutr) == theTx))
+        compSeq <- paste0(compSeq, as.character(tutr[theTx]))
+    checkEquals(unname(fullSeq), compSeq)
+}
+
+notrun_test_cds <- function(){
+    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
+    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
+    cds <- cds(txdb)
+    cby <- cdsBy(txdb, by="tx")
+
+    gr <- cby[[7]][1]
+    seqlevels(gr) <- sub(seqlevels(gr), pattern="chr", replacement="")
+    tx <- transcripts(edb, filter=GRangesFilter(gr, condition="overlapping"))
+    cby[[7]]
+
+    ## Note: so that fits! And we have to include the stop_codon feature for GTF import!
+    ## Make an TxDb from GTF:
+    gtf <- "/Users/jo/Projects/EnsDbs/75/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz"
+    library(GenomicFeatures)
+    Test <- makeTxDbFromGFF(gtf, format="gtf", organism="Homo sapiens")
+    scds <- cdsBy(Test, by="tx")
+    gr <- scds[[7]][1]
+    tx <- transcripts(edb, filter=GRangesFilter(gr, condition="overlapping"))
+    scds[[7]]
+    ## Compare:
+    ## TxDb from GTF has: 865692 879533
+    ## EnsDb: 865692 879533
+
+    ## Next test:
+    gr <- scds[[2]][1]
+    tx <- transcripts(edb, filter=GRangesFilter(gr, condition="overlapping"))
+    tx
+    scds[[2]]
+    ## start_codon: 367659 367661, stop_codon: 368595 368597 CDS: 367659 368594.
+    ## TxDb from GTF includes the stop_codon!
+}
+
+
+dontrun_benchmark_ordering_genes <- function() {
+    .withR <- function(x, ...) {
+        ensembldb:::orderResultsInR(x) <- TRUE
+        genes(x, ...)
+    }
+    .withSQL <- function(x, ...) {
+        ensembldb:::orderResultsInR(x) <- FALSE
+        genes(x, ...)
+    }
+    library(microbenchmark)
+    microbenchmark(.withR(edb), .withSQL(edb), times = 10)  ## same
+    microbenchmark(.withR(edb, columns = c("gene_id", "tx_id")),
+                   .withSQL(edb, columns = c("gene_id", "tx_id")),
+                   times = 10)  ## R slightly faster.
+    microbenchmark(.withR(edb, columns = c("gene_id", "tx_id"),
+                          SeqNameFilter("Y")),
+                   .withSQL(edb, columns = c("gene_id", "tx_id"),
+                            SeqNameFilter("Y")),
+                   times = 10)  ## same.
+}
+
+## We aim to fix issue #11 by performing the ordering in R instead
+## of SQL. Thus, we don't want to run this as a "regular" test
+## case.
+dontrun_test_ordering_cdsBy <- function() {
+    doBench <- FALSE
+    if (doBench)
+        library(microbenchmark)
+    .withR <- function(x, ...) {
+        ensembldb:::orderResultsInR(x) <- TRUE
+        cdsBy(x, ...)
+    }
+    .withSQL <- function(x, ...) {
+        ensembldb:::orderResultsInR(x) <- FALSE
+        cdsBy(x, ...)
+    }
+    res_sql <- .withSQL(edb)
+    res_r <- .withR(edb)
+    checkEquals(res_sql, res_r)
+    if (doBench)
+        microbenchmark(.withSQL(edb), .withR(edb),
+                       times = 3)  ## R slightly faster.
+    res_sql <- .withSQL(edb, filter = SeqNameFilter("Y"))
+    res_r <- .withR(edb, filter = SeqNameFilter("Y"))
+    checkEquals(res_sql, res_r)
+    if (doBench)
+        microbenchmark(.withSQL(edb, filter = SeqNameFilter("Y")),
+                       .withR(edb, filter = SeqNameFilter("Y")),
+                       times = 10)  ## R 6x faster.
+}
+
+dontrun_test_ordering_exonsBy <- function() {
+    doBench <- FALSE
+    if (doBench)
+        library(microbenchmark)
+    .withR <- function(x, ...) {
+        ensembldb:::orderResultsInR(x) <- TRUE
+        exonsBy(x, ...)
+    }
+    .withSQL <- function(x, ...) {
+        ensembldb:::orderResultsInR(x) <- FALSE
+        exonsBy(x, ...)
+    }
+    res_sql <- .withSQL(edb)
+    res_r <- .withR(edb)
+    checkEquals(res_sql, res_r)
+    if (doBench)
+        microbenchmark(.withSQL(edb), .withR(edb),
+                       times = 3)  ## about the same; R slightly faster.
+    ## with using a SeqNameFilter in addition.
+    res_sql <- .withSQL(edb, filter = SeqNameFilter("Y"))
+    res_r <- .withR(edb, filter = SeqNameFilter("Y")) ## query takes longer.
+    checkEquals(res_sql, res_r)
+    if (doBench)
+        microbenchmark(.withSQL(edb, filter = SeqNameFilter("Y")),
+                       .withR(edb, filter = SeqNameFilter("Y")),
+                       times = 3)  ## SQL twice as fast.
+    ## Now getting stuff by gene
+    res_sql <- .withSQL(edb, by = "gene")
+    res_r <- .withR(edb, by = "gene")
+    ## checkEquals(res_sql, res_r) ## Differences due to ties
+    if (doBench)
+        microbenchmark(.withSQL(edb, by = "gene"),
+                       .withR(edb, by = "gene"),
+                       times = 3)  ## SQL faster; ???
+    ## Along with a SeqNameFilter
+    res_sql <- .withSQL(edb, by = "gene", filter = SeqNameFilter("Y"))
+    res_r <- .withR(edb, by = "gene", filter = SeqNameFilter("Y"))
+    ## Why does the query take longer for R???
+    ## checkEquals(res_sql, res_r) ## Differences due to ties
+    if (doBench)
+        microbenchmark(.withSQL(edb, by = "gene", filter = SeqNameFilter("Y")),
+                       .withR(edb, by = "gene", filter = SeqNameFilter("Y")),
+                       times = 3)  ## SQL faster.
+    ## Along with a GeneBiotypeFilter
+    if (doBench)
+        microbenchmark(.withSQL(edb, by = "gene", filter = GeneBiotypeFilter("protein_coding"))
+                     , .withR(edb, by = "gene", filter = GeneBiotypeFilter("protein_coding"))
+                     , times = 3)
+}
+
+dontrun_test_ordering_transcriptsBy <- function() {
+    .withR <- function(x, ...) {
+        ensembldb:::orderResultsInR(x) <- TRUE
+        transcriptsBy(x, ...)
+    }
+    .withSQL <- function(x, ...) {
+        ensembldb:::orderResultsInR(x) <- FALSE
+        transcriptsBy(x, ...)
+    }
+    res_sql <- .withSQL(edb)
+    res_r <- .withR(edb)
+    checkEquals(res_sql, res_r)
+    microbenchmark::microbenchmark(.withSQL(edb), .withR(edb), times = 3) ## same speed
+
+    res_sql <- .withSQL(edb, filter = SeqNameFilter("Y"))
+    res_r <- .withR(edb, filter = SeqNameFilter("Y"))
+    checkEquals(res_sql, res_r)
+    microbenchmark::microbenchmark(.withSQL(edb, filter = SeqNameFilter("Y")),
+                                   .withR(edb, filter = SeqNameFilter("Y")),
+                                   times = 3) ## SQL slightly faster.
+}
+
+dontrun_query_tune <- function() {
+    ## Query tuning:
+    library(RSQLite)
+    con <- dbconn(edb)
+
+    Q <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from gene join tx on (gene.gene_id=tx.gene_id) join tx2exon on (tx.tx_id=tx2exon.tx_id) join exon on (tx2exon.exon_id=exon.exon_id) where gene.seq_name = 'Y'"
+    system.time(dbGetQuery(con, Q))
+
+    Q2 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from exon join tx2exon on (tx2exon.exon_id = exon.exon_id) join tx on (tx2exon.tx_id = tx.tx_id) join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y'"
+    system.time(dbGetQuery(con, Q2))
+
+    Q3 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from tx2exon join exon on (tx2exon.exon_id = exon.exon_id) join tx on (tx2exon.tx_id = tx.tx_id) join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y'"
+    system.time(dbGetQuery(con, Q3))
+
+    Q4 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from tx2exon join exon on (tx2exon.exon_id = exon.exon_id) join tx on (tx2exon.tx_id = tx.tx_id) join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y' order by tx.tx_id"
+    system.time(dbGetQuery(con, Q4))
+
+    Q5 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from tx2exon inner join exon on (tx2exon.exon_id = exon.exon_id) inner join tx on (tx2exon.tx_id = tx.tx_id) inner join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y' order by tx.tx_id"
+    system.time(dbGetQuery(con, Q5))
+
+    Q6 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from gene inner join tx on (gene.gene_id=tx.gene_id) inner join tx2exon on (tx.tx_id=tx2exon.tx_id) inner join exon on (tx2exon.exon_id=exon.exon_id) where gene.seq_name = 'Y' order by tx.tx_id asc"
+    system.time(dbGetQuery(con, Q6))
+}
+
+## implement:
+## .checkOrderBy: checks order.by argument removing columns that are
+## not present in the database
+## orderBy columns are added to the columns.
+## .orderDataFrameBy: orders the dataframe by the specified columns.
+
+notrun_test_protein_domains <- function() {
+    res <- ensembldb:::getWhat(edb, columns = c("protein_id", "tx_id", "gene_id",
+                                                "gene_name"),
+                               filter = list(ProtDomIdFilter("PF00096")))
+}
+
+notrun_compare_full <- function(){
+    ## That's on the full thing.
+    ## Test if the result has the same ordering as the transcripts call.
+    allTx <- transcripts(edb)
+    txLen <- transcriptLengths(edb, with.cds_len=TRUE, with.utr5_len=TRUE,
+                               with.utr3_len=TRUE)
+    checkEquals(names(allTx), rownames(txLen))
+    system.time(
+        futr <- fiveUTRsByTranscript(edb)
+    )
+    ## 23 secs.
+    futrLen <- sum(width(futr))  ## do I need reduce???
+    checkEquals(unname(futrLen), txLen[names(futrLen), "utr5_len"])
+    ## 3'
+    system.time(
+        tutr <- threeUTRsByTranscript(edb)
+    )
+    system.time(
+        tutrLen <- sum(width(tutr))
+    )
+    checkEquals(unname(tutrLen), txLen[names(tutrLen), "utr3_len"])
+}
+
+notrun_compare_to_genfeat <- function(){
+    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
+    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
+
+    system.time(
+        Len <- transcriptLengths(edb)
+    )
+    ## Woa, 52 sec
+    system.time(
+        txLen <- lengthOf(edb, "tx")
+    )
+    ## Faster, 31 sec
+    checkEquals(Len$tx_len, unname(txLen[rownames(Len)]))
+    system.time(
+        Len2 <- transcriptLengths(txdb)
+    )
+    ## :) 2.5 sec.
+    ## Next.
+    system.time(
+        Len <- transcriptLengths(edb, with.cds_len = TRUE)
+    )
+    ## 56 sec
+    system.time(
+        Len2 <- transcriptLengths(txdb, with.cds_len=TRUE)
+    )
+    ## 4 sec.
+
+    ## Calling the transcriptLengths of GenomicFeatures on the EnsDb.
+    system.time(
+        Def <- GenomicFeatures::transcriptLengths(edb)
+    ) ## 26.5 sec
+
+    system.time(
+        WithCds <- GenomicFeatures::transcriptLengths(edb, with.cds_len=TRUE)
+    ) ## 55 sec
+
+    system.time(
+        WithAll <- GenomicFeatures::transcriptLengths(edb, with.cds_len=TRUE,
+                                                      with.utr5_len=TRUE,
+                                                      with.utr3_len=TRUE)
+    ) ## 99 secs
+
+    ## Get my versions...
+    system.time(
+        MyDef <- ensembldb:::.transcriptLengths(edb)
+    ) ## 31 sec
+    system.time(
+        MyWithCds <- ensembldb:::.transcriptLengths(edb, with.cds_len=TRUE)
+    ) ## 44 sec
+    system.time(
+        MyWithAll <- ensembldb:::.transcriptLengths(edb, with.cds_len=TRUE,
+                                                    with.utr5_len=TRUE,
+                                                    with.utr3_len=TRUE)
+    ) ## 63 sec
+
+    ## Should be all the same!!!
+    rownames(MyDef) <- NULL
+    checkEquals(Def, MyDef)
+    ##
+    rownames(MyWithCds) <- NULL
+    MyWithCds[is.na(MyWithCds$cds_len), "cds_len"] <- 0
+    checkEquals(WithCds, MyWithCds)
+    ##
+    rownames(MyWithAll) <- NULL
+    MyWithAll[is.na(MyWithAll$cds_len), "cds_len"] <- 0
+    MyWithAll[is.na(MyWithAll$utr3_len), "utr3_len"] <- 0
+    MyWithAll[is.na(MyWithAll$utr5_len), "utr5_len"] <- 0
+    checkEquals(WithAll, MyWithAll)
+}
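The reverse-strand checks in `notrun_test_extractTranscriptSeqs` rely on the fact that a "-" strand transcript is built by reverse-complementing each exon and concatenating the exons in transcript (5' to 3') order; forcing the strand to "+" therefore yields a different sequence. The following is a minimal base-R sketch of that rule on plain character strings. The helpers `revcomp` and `tx_seq_from_exons` are hypothetical and are not the Biostrings/GenomicFeatures implementation behind `extractTranscriptSeqs`.

```r
## Hypothetical helpers for illustration only; not part of ensembldb,
## Biostrings or GenomicFeatures. Sequences are plain character strings.

## Reverse complement of a DNA string (A/C/G/T only).
revcomp <- function(x) {
    comp <- chartr("ACGT", "TGCA", x)
    paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

## Build a transcript sequence from exon coordinates (1-based, inclusive)
## on a single genome string. Exons must be given in transcript order
## (5' to 3'), which for "-" strand transcripts means decreasing genomic
## position.
tx_seq_from_exons <- function(genome, starts, ends, strand = c("+", "-")) {
    strand <- match.arg(strand)
    pieces <- substring(genome, starts, ends)
    if (strand == "-")
        pieces <- vapply(pieces, revcomp, character(1), USE.NAMES = FALSE)
    paste(pieces, collapse = "")
}

genome <- "AAACCCGGGTTTACGT"
## Minus-strand transcript with exons at 10-13 and 1-4 (transcript order).
tx_seq_from_exons(genome, starts = c(10, 1), ends = c(13, 4), strand = "-")
## "TAAAGTTT"
## The same exons wrongly treated as "+" strand give a different sequence,
## analogous to setting strand(exns) <- "+" in the test.
tx_seq_from_exons(genome, starts = c(10, 1), ends = c(13, 4), strand = "+")
## "TTTAAAAC"
```

The minus-strand result equals the reverse complement of the exons concatenated in genomic order, which is a convenient cross-check when comparing against FASTA files such as the ones used in the test.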
diff --git a/inst/extended_tests/performance_tests.R b/inst/extended_tests/performance_tests.R
new file mode 100644
index 0000000..d035f31
--- /dev/null
+++ b/inst/extended_tests/performance_tests.R
@@ -0,0 +1,173 @@
+############################################################
+## Compare MySQL vs SQLite backends:
+## The MySQL backend is considerably slower than the SQLite one; most
+## likely this is due to RMySQL, not MySQL itself.
+dontrun_test_MySQL_vs_SQLite <- function() {
+    ## Compare the performance of the MySQL backend against
+    ## the SQLite backend.
+    edb_mysql <- useMySQL(edb, user = "anonuser", pass = "")
+
+    library(microbenchmark)
+    ## genes
+    microbenchmark(genes(edb), genes(edb_mysql), times = 5)
+    microbenchmark(genes(edb, filter = GeneBiotypeFilter("lincRNA")),
+                   genes(edb_mysql, filter = GeneBiotypeFilter("lincRNA")),
+                   times = 5)
+    microbenchmark(genes(edb, filter = SeqNameFilter(20:23)),
+                   genes(edb_mysql, filter = SeqNameFilter(20:23)),
+                   times = 5)
+    microbenchmark(genes(edb, columns = "tx_id"),
+                   genes(edb_mysql, columns = "tx_id"),
+                   times = 5)
+    microbenchmark(genes(edb, filter = GenenameFilter("BCL2L11")),
+                   genes(edb_mysql, filter = GenenameFilter("BCL2L11")),
+                   times = 5)
+    ## transcripts
+    microbenchmark(transcripts(edb),
+                   transcripts(edb_mysql),
+                   times = 5)
+    microbenchmark(transcripts(edb, filter = GenenameFilter("BCL2L11")),
+                   transcripts(edb_mysql, filter = GenenameFilter("BCL2L11")),
+                   times = 5)
+    ## exons
+    microbenchmark(exons(edb),
+                   exons(edb_mysql),
+                   times = 5)
+    microbenchmark(exons(edb, filter = GenenameFilter("BCL2L11")),
+                   exons(edb_mysql, filter = GenenameFilter("BCL2L11")),
+                   times = 5)
+    ## exonsBy
+    microbenchmark(exonsBy(edb),
+                   exonsBy(edb_mysql),
+                   times = 5)
+    microbenchmark(exonsBy(edb, filter = SeqNameFilter("Y")),
+                   exonsBy(edb_mysql, filter = SeqNameFilter("Y")),
+                   times = 5)
+    ## cdsBy
+    microbenchmark(cdsBy(edb), cdsBy(edb_mysql), times = 5)
+    microbenchmark(cdsBy(edb, by = "gene"), cdsBy(edb_mysql, by = "gene"),
+                   times = 5)
+    microbenchmark(cdsBy(edb, filter = SeqStrandFilter("-")),
+                   cdsBy(edb_mysql, filter = SeqStrandFilter("-")),
+                   times = 5)
+
+}
+
+## Compare the performance of doing the sorting within R or
+## directly in the SQL query.
+dontrun_test_ordering_performance <- function() {
+
+    library(RUnit)
+    library(RSQLite)
+    ## gene table: order by in SQL query vs R:
+    db_con <- dbconn(edb)
+
+    .callWithOrder <- function(con, query, orderBy = "",
+                               orderSQL = TRUE) {
+        if (all(orderBy == ""))
+            orderBy <- NULL
+        if (orderSQL && !is.null(orderBy)) {
+            orderBy <- paste(orderBy, collapse = ", ")
+            query <- paste0(query, " order by ", orderBy)
+        }
+        res <- dbGetQuery(con, query)
+        if (!orderSQL && !is.null(orderBy)) {
+            if (!all(orderBy %in% colnames(res)))
+                stop("orderBy not in columns!")
+            ## Do the ordering in R
+            res <- res[do.call(order,
+                               c(list(method = "radix"),
+                                 as.list(res[, orderBy, drop = FALSE]))), ]
+        }
+        rownames(res) <- NULL
+        return(res)
+    }
+
+    #######################
+    ## gene table
+    ## Simple condition
+    the_q <- "select * from gene"
+    system.time(res1 <- .callWithOrder(db_con, query = the_q))
+    system.time(res2 <- .callWithOrder(db_con, query = the_q,
+                                       orderSQL = FALSE))
+    checkIdentical(res1, res2)
+    ## order by gene_id
+    orderBy <- "gene_id"
+    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+    system.time(res2 <- .callWithOrder(db_con, query = the_q,
+                                       orderBy = orderBy, orderSQL = FALSE))
+    ## SQL: 0.16, R: 0.164.
+    checkIdentical(res1, res2)
+    ## order by gene_name
+    orderBy <- "gene_name"
+    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+    system.time(res2 <- .callWithOrder(db_con, query = the_q,
+                                       orderBy = orderBy, orderSQL = FALSE))
+    checkIdentical(res1, res2)
+    ## SQL: 0.245, R: 0.185
+    ## sort by gene_name and gene_seq_start
+    orderBy <- c("gene_name", "gene_seq_start")
+    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+    system.time(res2 <- .callWithOrder(db_con, query = the_q,
+                                       orderBy = orderBy, orderSQL = FALSE))
+    ## SQL: 0.26, R: 0.188
+    checkEquals(res1, res2)
+    ## with subsetting:
+    the_q <- "select * from gene where seq_name in ('5', 'Y')"
+    orderBy <- c("gene_name", "gene_seq_start")
+    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+    system.time(res2 <- .callWithOrder(db_con, query = the_q,
+                                       orderBy = orderBy, orderSQL = FALSE))
+    ## SQL: 0.031, R: 0.024
+    checkEquals(res1, res2)
+
+    ########################
+    ## joining tables.
+    the_q <- paste0("select * from gene join tx on (gene.gene_id = tx.gene_id)",
+                    " join tx2exon on (tx.tx_id = tx2exon.tx_id)")
+    orderBy <- c("tx_id", "exon_id")
+    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+    system.time(res2 <- .callWithOrder(db_con, query = the_q,
+                                       orderBy = orderBy, orderSQL = FALSE))
+    ## SQL: 9.6, R: 9.032
+    checkEquals(res1, res2)
+    ## subsetting.
+    the_q <- paste0("select * from gene join tx on (gene.gene_id = tx.gene_id)",
+                    " join tx2exon on (tx.tx_id = tx2exon.tx_id) where",
+                    " seq_name = 'Y'")
+    orderBy <- c("tx_id", "exon_id")
+    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+    system.time(res2 <- .callWithOrder(db_con, query = the_q,
+                                       orderBy = orderBy, orderSQL = FALSE))
+    ## SQL: 0.9, R: 1.6
+    checkEquals(res1, res2)
+}
+
+## Compare the performance of inner join with left outer join.
+dontrun_test_outer_join_performance <- function() {
+    Q_1 <- ensembldb:::joinQueryOnTables2(edb, tab = c("gene", "exon"))
+    Q_2 <- ensembldb:::joinQueryOnTables2(edb, tab = c("gene", "exon"),
+                                          startWith = "exon")
+    Q_3 <- ensembldb:::joinQueryOnTables2(edb, tab = c("gene", "exon"),
+                                          startWith = "exon",
+                                          join = "left outer join")
+    library(microbenchmark)
+    library(RSQLite)
+    microbenchmark(dbGetQuery(dbconn(edb), paste0("select * from ", Q_1)),
+                   dbGetQuery(dbconn(edb), paste0("select * from ", Q_2)),
+                   dbGetQuery(dbconn(edb), paste0("select * from ", Q_3)),
+                   times = 10)
+    ## Result: Q_1 is about one second faster (13 instead of 14 seconds).
+    ## Check performance joining tx and genes.
+    Q_1 <- ensembldb:::joinQueryOnTables2(edb, tab = c("tx", "gene"))
+    Q_2 <- ensembldb:::joinQueryOnTables2(edb, tab = c("tx", "gene"),
+                                          startWith = "tx")
+    Q_3 <- ensembldb:::joinQueryOnTables2(edb, tab = c("tx", "gene"),
+                                          startWith = "tx",
+                                          join = "left outer join")
+    microbenchmark(dbGetQuery(dbconn(edb), paste0("select * from ", Q_1)),
+                   dbGetQuery(dbconn(edb), paste0("select * from ", Q_2)),
+                   dbGetQuery(dbconn(edb), paste0("select * from ", Q_3)),
+                   times = 10)
+    ## No difference.
+}
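The R branch of `.callWithOrder` sorts with `order(method = "radix")`. Radix ordering always sorts character vectors in the C locale (byte order), which for ASCII data agrees with SQLite's default BINARY collation; that is presumably why the R-side ordering can reproduce `order by` results. Ties are a different matter: radix ordering is stable, while SQL leaves the order of ties unspecified, which may explain the "Differences due to ties" comments in the ordering tests. A minimal sketch of the R-side ordering, with a hypothetical helper name `order_df_by` and toy data:

```r
## Hypothetical helper (not an ensembldb function): order a data.frame by
## one or more columns, as the R branch of .callWithOrder does.
order_df_by <- function(df, by) {
    if (!all(by %in% colnames(df)))
        stop("'by' contains columns not present in 'df'")
    ## unname() so column names cannot collide with order()'s own
    ## arguments (e.g. a column called "decreasing").
    keys <- unname(as.list(df[, by, drop = FALSE]))
    ord <- do.call(order, c(keys, list(method = "radix")))
    res <- df[ord, , drop = FALSE]
    rownames(res) <- NULL
    res
}

genes <- data.frame(gene_name = c("b", "a", "B", "a"),
                    gene_seq_start = c(1L, 5L, 2L, 3L),
                    stringsAsFactors = FALSE)
res <- order_df_by(genes, c("gene_name", "gene_seq_start"))
res$gene_name       ## "B" "a" "a" "b"  (C locale: upper case first)
res$gene_seq_start  ## 2 3 5 1
```

Using `unname()` is a small deviation from the test code, which passes the columns as named arguments to `order()`; both work as long as no column name matches an argument of `order()`.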
diff --git a/inst/gff/Devosia_geojensis.ASM96941v1.32.gff3.gz b/inst/gff/Devosia_geojensis.ASM96941v1.32.gff3.gz
new file mode 100644
index 0000000..537e7ce
Binary files /dev/null and b/inst/gff/Devosia_geojensis.ASM96941v1.32.gff3.gz differ
diff --git a/inst/gtf/Devosia_geojensis.ASM96941v1.32.gtf.gz b/inst/gtf/Devosia_geojensis.ASM96941v1.32.gtf.gz
new file mode 100644
index 0000000..1582a53
Binary files /dev/null and b/inst/gtf/Devosia_geojensis.ASM96941v1.32.gtf.gz differ
diff --git a/inst/perl/get_gene_transcript_exon_tables.pl b/inst/perl/get_gene_transcript_exon_tables.pl
index e54d108..bc07669 100644
--- a/inst/perl/get_gene_transcript_exon_tables.pl
+++ b/inst/perl/get_gene_transcript_exon_tables.pl
@@ -1,5 +1,15 @@
 #!/usr/bin/perl
 #####################################
+## version 0.3.0: * Change database layout by adding a dedicated entrezgene
+##                  table.
+## version 0.2.4: * Extract taxonomy ID and add that to the metadata table.
+## version 0.2.3: * Add additional columns to the uniprot table:
+##                  o uniprot_db: the Uniprot database name.
+##                  o uniprot_mapping_type: method by which the Uniprot ID was
+##                    mapped to the Ensembl protein ID.
+## version 0.2.2: * Always transform gene coordinates to toplevel instead of
+##                  trial-and-error transformation to chromosome.
+## version 0.2.1: * Get protein IDs and (if available) Uniprot IDs.
 ## version 0.0.2: * get also gene_seq_start, gene_seq_end, tx_seq_start and tx_seq_end from the database!
 ##                * did rename chrom_start to seq_start.
 
@@ -14,7 +24,7 @@ use Bio::EnsEMBL::ApiVersion;
 use Bio::EnsEMBL::Registry;
 ## unification function for arrays
 use List::MoreUtils qw/ uniq /;
-my $script_version = "0.1.3";
+my $script_version = "0.3.0";
 
 ## connecting to the ENSEMBL data base
 use Bio::EnsEMBL::Registry;
@@ -48,10 +58,14 @@ if($option{ h }){
   print("-s (optional): the species; defaults to human.\n");
   print("\n\nThe script will generate the following tables:\n");
   print("- ens_gene.txt: contains all genes defined in Ensembl.\n");
+  print("- ens_entrezgene.txt: contains mapping between ensembl gene_id and entrezgene ID.\n");
   print("- ens_transcript.txt: contains all transcripts of all genes.\n");
   print("- ens_exon.txt: contains all (unique) exons, along with their genomic alignment.\n");
   print("- ens_tx2exon.txt: relates transcript ids to exon ids (m:n), along with the index of the exon in the respective transcript (since the same exon can be part of different transcripts and have a different index in each transcript).\n");
   print("- ens_chromosome.txt: the information of all chromosomes (chromosome/sequence/contig names). \n");
+  print("- ens_protein.txt: the mapping between (protein coding) transcripts and protein IDs including also the peptide sequence.\n");
+  print("- ens_protein_domain.txt: provides for each protein all annotated protein domains along with their start and end coordinates on the protein sequence.\n");
+  print("- ens_uniprot.txt: provides the mapping between Ensembl protein IDs and Uniprot IDs (if available). The mapping can be 1:n.\n");
   print("- ens_metadata.txt\n");
   exit 0;
 }
@@ -91,10 +105,14 @@ $registry->load_registry_from_db(-host => $host, -user => $user,
 				 -pass => $pass, -port => $port);
 my $gene_adaptor = $registry->get_adaptor($species, $ensembl_database, "gene");
 my $slice_adaptor = $registry->get_adaptor($species, $ensembl_database, "slice");
-
+my $meta_container = $registry->get_adaptor($species, $ensembl_database,
+					    'MetaContainer' );
 ## determine the species:
 my $species_id = $gene_adaptor->db->species_id;
 my $species_ens = $gene_adaptor->db->species;
+## Determine the taxonomy ID:
+my $taxonomy_id = 0;
+$taxonomy_id = $meta_container->get_taxonomy_id();
 
 my $infostring = "# get_gene_transcript_exon_tables.pl version $script_version:\nRetrieve gene models for Ensembl version $ensembl_version, species $species from Ensembl database at host: $host\n";
 
@@ -102,7 +120,7 @@ print $infostring;
 
 ## preparing output files:
 open(GENE , ">ens_gene.txt");
-print GENE "gene_id\tgene_name\tentrezid\tgene_biotype\tgene_seq_start\tgene_seq_end\tseq_name\tseq_strand\tseq_coord_system\n";
+print GENE "gene_id\tgene_name\tgene_biotype\tgene_seq_start\tgene_seq_end\tseq_name\tseq_strand\tseq_coord_system\n";
 
 open(TRANSCRIPT , ">ens_tx.txt");
 print TRANSCRIPT "tx_id\ttx_biotype\ttx_seq_start\ttx_seq_end\ttx_cds_seq_start\ttx_cds_seq_end\tgene_id\n";
@@ -110,12 +128,24 @@ print TRANSCRIPT "tx_id\ttx_biotype\ttx_seq_start\ttx_seq_end\ttx_cds_seq_start\
 open(EXON , ">ens_exon.txt");
 print EXON "exon_id\texon_seq_start\texon_seq_end\n";
 
+open(ENTREZGENE, ">ens_entrezgene.txt");
+print ENTREZGENE "gene_id\tentrezid\n";
 # open(G2T , ">ens_gene2transcript.txt");
 # print G2T "g2t_gene_id\tg2t_tx_id\n";
 
 open(T2E , ">ens_tx2exon.txt");
 print T2E "tx_id\texon_id\texon_idx\n";
 
+open(PROTEIN, ">ens_protein.txt");
+## print PROTEIN "tx_id\tprotein_id\tuniprot_id\tprotein_sequence\n";
+print PROTEIN "tx_id\tprotein_id\tprotein_sequence\n";
+
+open(UNIPROT, ">ens_uniprot.txt");
+print UNIPROT "protein_id\tuniprot_id\tuniprot_db\tuniprot_mapping_type\n";
+
+open(PROTDOM, ">ens_protein_domain.txt");
+print PROTDOM "protein_id\tprotein_domain_id\tprotein_domain_source\tinterpro_accession\tprot_dom_start\tprot_dom_end\n";
+
 open(CHR , ">ens_chromosome.txt");
 print CHR "seq_name\tseq_length\tis_circular\n";
 
@@ -135,7 +165,11 @@ foreach my $gene_id (@gene_ids){
   $orig_gene = $gene_adaptor->fetch_by_stable_id($gene_id);
   if(defined $orig_gene){
     my $do_transform=1;
-    my $gene  = $orig_gene->transform("chromosome");
+    ## Instead of transforming to chromosome we transform to 'toplevel':
+    ## for genes encoded on a chromosome this should be the chromosome, for
+    ## others the most "top" level sequence.
+    ## my $gene  = $orig_gene->transform("chromosome");
+    my $gene  = $orig_gene->transform("toplevel");
     if(!defined $gene){
       ## gene is not on known defined chromosomes!
       $gene = $orig_gene;
@@ -155,19 +189,13 @@ foreach my $gene_id (@gene_ids){
       my $length = $chr_slice->length;
       my $is_circular = $chr_slice->is_circular;
       print CHR "$name\t$length\t$is_circular\n";
-      my $chr_slice_again = $slice_adaptor->fetch_by_region('chromosome', $chrom);
-      if(defined($chr_slice_again)){
-	$coord_system_version = $chr_slice_again->coord_system()->version();
+      my $tmp_version = $chr_slice->coord_system()->version();
+      if (defined $tmp_version and length $tmp_version) {
+	$coord_system_version = $tmp_version;
       }
-      # if(defined $chr_slice){
-      # 	my $name = $chr_slice->seq_region_name;
-      # 	my $length = $chr_slice->length;
-      # 	my $is_circular = $chr_slice->is_circular;
-      # 	$coord_system_version = $chr_slice->coord_system()->version();
-      # 	print CHR "$name\t$length\t$is_circular\n";
-      # }else{
-      # 	my $length = $gene->slice->seq_region_length();
-      # 	print CHR "$chrom\t0\t0\n";
+      # my $chr_slice_again = $slice_adaptor->fetch_by_region('chromosome', $chrom);
+      # if(defined($chr_slice_again)){
+      # 	$coord_system_version = $chr_slice_again->coord_system()->version();
       # }
     }
 
@@ -181,16 +209,19 @@ foreach my $gene_id (@gene_ids){
     my $gene_seq_end = $gene->end;
     ## get entrezgene ID, if any...
     my $all_entries = $gene->get_all_DBLinks("EntrezGene");
-    my %entrezgene_hash=();
     foreach my $dbe (@{$all_entries}){
-      $entrezgene_hash{ $dbe->primary_id } = 1;
-    }
-    my $hash_size = keys %entrezgene_hash;
-    my $entrezid = "";
-    if($hash_size > 0){
-      $entrezid = join(";", keys %entrezgene_hash);
+      print ENTREZGENE "$gene_id\t".$dbe->primary_id."\n";
     }
-    print GENE "$gene_id\t$gene_external_name\t$entrezid\t$gene_biotype\t$gene_seq_start\t$gene_seq_end\t$chrom\t$strand\t$coord_system\n";
+    # my %entrezgene_hash=();
+    # foreach my $dbe (@{$all_entries}){
+    #   $entrezgene_hash{ $dbe->primary_id } = 1;
+    # }
+    # my $hash_size = keys %entrezgene_hash;
+    # my $entrezid = "";
+    # if($hash_size > 0){
+    #   $entrezid = join(";", keys %entrezgene_hash);
+    # }
+    print GENE "$gene_id\t$gene_external_name\t$gene_biotype\t$gene_seq_start\t$gene_seq_end\t$chrom\t$strand\t$coord_system\n";
 
     ## process transcript(s)
     my @transcripts = @{ $gene->get_all_Transcripts };
@@ -198,7 +229,8 @@ foreach my $gene_id (@gene_ids){
     foreach my $transcript (@transcripts){
       if($do_transform==1){
 	## just to be shure that we have the transcript in chromosomal coordinations.
-	$transcript = $transcript->transform("chromosome");
+	## $transcript = $transcript->transform("chromosome");
+	$transcript = $transcript->transform("toplevel");
       }
       ##my $tx_start = $transcript->start;
       ##my $tx_end = $transcript->end;
@@ -220,13 +252,42 @@ foreach my $gene_id (@gene_ids){
       print TRANSCRIPT "$tx_id\t$tx_biotype\t$tx_seq_start\t$tx_seq_end\t$tx_cds_start\t$tx_cds_end\t$gene_id\n";
 ##      print G2T "$gene_id\t$tx_id\n";
 
+      ## Process proteins/translations (if possible)
+      my $transl = $transcript->translation();
+      if (defined($transl)) {
+	my $transl_id = $transl->stable_id();
+	my $prot_seq = $transl->seq();
+	## Check if we could get UNIPROT ID(s):
+	my @unip = @{ $transl->get_all_DBLinks('Uniprot%') };
+	if (scalar(@unip) > 0) {
+	  foreach my $uniprot (@unip) {
+	    my $unip_id = $uniprot->display_id();
+	    my $dbn = $uniprot->dbname();
+	    $dbn =~ s/Uniprot\///g;
+	    my $maptype = $uniprot->info_type();
+	    print UNIPROT "$transl_id\t$unip_id\t$dbn\t$maptype\n";
+	    ## print PROTEIN "$tx_id\t$transl_id\t$unip_id\t$prot_seq\n";
+	  }
+	}
+	print PROTEIN "$tx_id\t$transl_id\t$prot_seq\n";
+	my $prot_doms = $transl->get_all_DomainFeatures;
+	while ( my $prot_dom = shift @{$prot_doms}) {
+	  my $logic_name = $prot_dom->analysis()->logic_name();
+	  my $prot_dom_id = $prot_dom->display_id();
+	  my $interpro_acc = $prot_dom->interpro_ac();
+	  my $prot_start = $prot_dom->start();
+	  my $prot_end = $prot_dom->end();
+	  print PROTDOM "$transl_id\t$prot_dom_id\t$logic_name\t$interpro_acc\t$prot_start\t$prot_end\n";
+	}
+      }
       ## process exon(s)
       ##my @exons = @{ $transcript->get_all_Exons(-constitutive => 1) };
       my @exons = @{ $transcript->get_all_Exons() };  ## exons always returned 5' 3' of transcript!
       my $current_exon_idx = 1;
       foreach my $exon (@exons){
 	if($do_transform==1){
-	  $exon->transform("chromosome");
+	  ## $exon->transform("chromosome");
+	  $exon->transform("toplevel");
 	}
 	my $exon_start = $exon->start;
 	my $exon_end = $exon->end;
@@ -263,16 +324,19 @@ print INFO "Creation time\t".localtime()."\n";
 print INFO "ensembl_version\t$ensembl_version\n";
 print INFO "ensembl_host\t$host\n";
 print INFO "Organism\t$species_ens\n";
+print INFO "taxonomy_id\t$taxonomy_id\n";
 print INFO "genome_build\t$coord_system_version\n";
-print INFO "DBSCHEMAVERSION\t1.0\n";
+print INFO "DBSCHEMAVERSION\t2.0\n";
 
 close(INFO);
 
 close(GENE);
 close(TRANSCRIPT);
 close(EXON);
+close(ENTREZGENE);
 ##close(G2T);
 close(T2E);
 close(CHR);
-
-
+close(PROTEIN);
+close(PROTDOM);
+close(UNIPROT);
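The Perl change above stops joining EntrezGene IDs with ";" into a column of the gene table and instead writes one gene_id/ID row per database link to a separate ENTREZGENE file. As a minimal sketch, assuming old-style values such as "11;12" (the helper name splitEntrezIds and all IDs are hypothetical placeholders), the same reshaping can be done in base R for data that still uses the joined layout:

```r
## Hypothetical helper: turn ";"-joined EntrezGene IDs (old gene table layout)
## into one row per gene/ID pair, the layout the ENTREZGENE file now uses.
splitEntrezIds <- function(gene_id, entrezid) {
    ids <- strsplit(entrezid, split = ";", fixed = TRUE)
    ## drop empty strings so genes without IDs yield no rows,
    ## like genes without EntrezGene DBLinks in the Perl script
    ids <- lapply(ids, function(x) x[nzchar(x)])
    n <- lengths(ids)
    data.frame(gene_id = rep(gene_id, n),
               entrezid = unlist(ids, use.names = FALSE),
               stringsAsFactors = FALSE)
}

## Placeholder IDs, not real Ensembl or Entrez identifiers.
splitEntrezIds(c("gene_A", "gene_B", "gene_C"), c("11;12", "", "13"))
```

The long format avoids string parsing at query time and lets a gene map to any number of Entrez IDs without a delimiter convention.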
diff --git a/inst/perl/test_script.pl b/inst/perl/test_script.pl
new file mode 100644
index 0000000..1d3d6e6
--- /dev/null
+++ b/inst/perl/test_script.pl
@@ -0,0 +1,78 @@
+## uses environment variable ENS pointing to the
+## ENSEMBL API on the computer
+use lib $ENV{ENS} || $ENV{PERL5LIB};
+use IO::File;
+use Getopt::Std;
+use strict;
+use warnings;
+use Bio::EnsEMBL::ApiVersion;
+use Bio::EnsEMBL::Registry;
+## unification function for arrays
+use List::MoreUtils qw/ uniq /;
+my $script_version = "0.2.2";
+
+## connecting to the ENSEMBL data base
+use Bio::EnsEMBL::Registry;
+use Bio::EnsEMBL::ApiVersion;
+my $user = "anonymous";
+my $host = "ensembldb.ensembl.org";
+my $port = 5306;
+my $pass = "";
+my $registry = 'Bio::EnsEMBL::Registry';
+my $ensembl_version="none";
+my $ensembl_database="core";
+my $species = "human";
+my $slice;
+my $coord_system_version="unknown";
+## get all gene ids defined in the database...
+my @gene_ids = ();
+
+my $gene_id = "ENSG00000109906";
+
+$registry->load_registry_from_db(-host => $host, -user => $user,
+				 -pass => $pass, -port => $port);
+my $gene_adaptor = $registry->get_adaptor($species, $ensembl_database, "gene");
+my $slice_adaptor = $registry->get_adaptor($species, $ensembl_database, "slice");
+
+my $current_gene = $gene_adaptor->fetch_by_stable_id($gene_id);
+print "Current gene: ".$current_gene->display_id()."\n";
+my @transcripts = @{ $current_gene->get_all_Transcripts };
+
+foreach my $transcript (@transcripts){
+  print "Current tx: ".$transcript->display_id()."\n";
+  my $transl = $transcript->translation();
+  if (defined($transl)) {
+    my $transl_id = $transl->stable_id();
+    print "Current translation ".$transl_id."\n";
+    my $attr = $transl->get_all_Attributes();
+    foreach my $a (@{$attr}) {
+      print "\tName: ", $a->name(), "\n";
+      print "\tCode: ", $a->code(), "\n";
+      print "\tDesc: ", $a->description(), "\n";
+      print "\tValu: ", $a->value(), "\n";
+    }
+    my @unip = @{ $transl->get_all_DBLinks('Uniprot%') };
+    if (scalar(@unip) > 0) {
+      foreach my $uniprot (@unip) {
+	my $unip_id = $uniprot->display_id();
+	##print UNIPROT "$transl_id\t$unip_id\n";
+	## OK, add also
+	## o uniprot_db: $uniprot->dbname();
+	## o uniprot_info: $uniprot->info_text();
+	my $dbn = $uniprot->dbname();
+	$dbn =~ s/Uniprot\///g;
+	my $descr = $uniprot->description();
+	my $infot = $uniprot->info_text();
+	my $stat = $uniprot->status();
+	my $infotype = $uniprot->info_type();
+	## mapping type.
+	print "uniprot: ".$unip_id."\n";
+	print " dbname: ".$dbn."\n";
+	print " info_text ".$infot."\n";
+	## Defines the method by which this ID was mapped (Uniprot ID was
+	## matched to the Ensembl protein ID).
+	print " info_type ".$infotype."\n";
+      }
+    }
+  }
+}
diff --git a/inst/scripts/checkEnsDbs.R b/inst/scripts/checkEnsDbs.R
new file mode 100644
index 0000000..4b7fda8
--- /dev/null
+++ b/inst/scripts/checkEnsDbs.R
@@ -0,0 +1,22 @@
+#' @description Check EnsDb sqlite files found in the specified folder.
+#'
+#' @param x \code{character(1)} with the folder in which we're looking for EnsDb
+#'     objects.
+#'
+#' @author Johannes Rainer
+#' 
+#' @noRd
+#'
+#' @examples
+#' checkEnsDbs("/Users/jo/tmp/ensdb_20")
+checkEnsDbs <- function(x) {
+    edbs <- dir(x, pattern = "\\.sqlite$", full.names = TRUE)
+    for (i in seq_along(edbs)) {
+        message("\nChecking EnsDb: ", basename(edbs[i]))
+        edb <- EnsDb(edbs[i])
+        ensembldb:::validateEnsDb(edb)
+        ## Now check also some query calls:
+        gns <- genes(edb)
+        message(" OK")
+    }
+}
diff --git a/inst/scripts/generate-EnsDBs.R b/inst/scripts/generate-EnsDBs.R
new file mode 100644
index 0000000..9d66ec4
--- /dev/null
+++ b/inst/scripts/generate-EnsDBs.R
@@ -0,0 +1,321 @@
+## Functions to create EnsDbs by downloading and installing MySQL
+## databases from Ensembl.
+library(RCurl)
+library(RMySQL)
+library(ensembldb)
+
+#' @description Get core database names from the specified folder.
+#' 
+#' @param ftp_folder The ftp url to the per-species mysql folders.
+#' 
+#' @author Johannes Rainer
+#' 
+#' @noRd
+listCoreDbsInFolder <- function(ftp_folder) {
+    if (missing(ftp_folder))
+        stop("Argument 'ftp_folder' missing!")
+    folders <- unlist(strsplit(getURL(ftp_folder,
+                                      dirlistonly = TRUE), split = "\n"))
+    res <- t(sapply(folders, function(z) {
+        tmp <- unlist(strsplit(z, split = "_"))
+        return(c(folder = z,
+                 organism = paste0(tmp[1:2], collapse = "_"),
+                 type = tmp[3],
+                 version = paste0(tmp[4:length(tmp)], collapse = "_")))
+    }))
+    return(res[which(res[, "type"] == "core"), , drop = FALSE])
+}
+
+#' @description Creates an EnsDb for the specified species by first downloading
+#'     the corresponding MySQL database from Ensembl, installing it and
+#'     subsequently creating the EnsDb database from it.
+#'
+#' @param ftp_folder The ftp url to the per-species mysql folders. If not
+#'     provided it will use the default Ensembl ftp:
+#'     \code{ftp://ftp.ensembl.org/pub/release-<ens_version>/mysql/}.
+#' 
+#' @param ens_version The Ensembl version (version of the Ensembl Perl API).
+#' 
+#' @param species The name of the species (e.g. "homo_sapiens").
+#' 
+#' @param user The user name for the MySQL database (write access).
+#' 
+#' @param host The host on which the MySQL database is running.
+#' 
+#' @param pass The password for the MySQL database.
+#' 
+#' @param port The port of the MySQL database.
+#' 
+#' @param local_tmp Local directory that will be used to temporarily store the
+#'     downloaded MySQL database files.
+#' 
+#' @param dropDb Whether the Ensembl core database should be deleted once the
+#'     EnsDb has been created.
+#'
+#' @author Johannes Rainer
+#' 
+#' @examples
+#'
+#' ## For Ensemblgenomes:
+#' ftp_folder <- "ftp://ftp.ensemblgenomes.org/pub/release-33/fungi/mysql/"
+#' @noRd
+createEnsDbForSpecies <- function(ftp_folder,
+                                  ens_version = 86, species, user, host, pass,
+                                  port = 3306, local_tmp = tempdir(),
+                                  sub_dir = "",
+                                  dropDb = TRUE) {
+    ## if ftp_folder is missing use the default one:
+    base_url = "ftp://ftp.ensembl.org/pub"
+    ## (1) Get all directories from Ensembl
+    if (missing(ftp_folder))
+        ftp_folder <- paste0(base_url, "/release-", ens_version, "/mysql/")
+    res <- listCoreDbsInFolder(ftp_folder)
+
+    folders <- unlist(strsplit(getURL(ftp_folder,
+                                      dirlistonly = TRUE), split = "\n"))
+    res <- t(sapply(folders, function(z) {
+        tmp <- unlist(strsplit(z, split = "_"))
+        return(c(folder = z,
+                 organism = paste0(tmp[1:2], collapse = "_"),
+                 type = tmp[3],
+                 version = paste0(tmp[4:length(tmp)], collapse = "_")))
+    }))
+    res <- res[which(res[, "type"] == "core"), , drop = FALSE]
+    if (nrow(res) == 0)
+        stop("No directories found!")
+    if (missing(species))
+        species <- res[, "organism"]
+    rownames(res) <- res[, "organism"]
+    ##     Check if we've got the species available
+    got_specs <- species %in% rownames(res)
+    if (!all(got_specs))
+        warning("No core database for species ",
+                paste0(species[!got_specs], collapse = ", "), " found.")
+    species <- species[got_specs]
+    res <- res[species, , drop = FALSE]
+    if (length(species) == 0)
+        stop("No database for any provided species found!")
+    ## (2) Process each species
+    message("Going to process ", nrow(res), " species.")
+    for (i in 1:nrow(res)) {
+        message("Processing species: ", res[i, "organism"], " (", i, " of ",
+                nrow(res), ")")
+        processOneSpecies(ftp_folder = paste0(ftp_folder, res[i, "folder"]),
+                          ens_version = ens_version,
+                          species = species[i], user = user, host = host,
+                          pass = pass, port = port, local_tmp = local_tmp,
+                          dropDb = dropDb)
+        message("Done with species: ", res[i, "organism"], ", ",
+                nrow(res) - i, " left.")
+    }
+}
+
+#' @description This function performs the actual tasks of downloading the
+#'     database files, installing them, deleting the download, creating the
+#'     EnsDb and deleting the database.
+#'
+#' @details While the location of the downloaded temporary MySQL database file
+#'     can be specified, the final SQLite file as well as all intermediate files
+#'     will be placed in the current working directory.
+#'
+#' @param ftp_folder The folder on Ensembl's ftp server containing the mysql
+#'     database files. Has to be the full path to these files.
+#' 
+#' @param ens_version The Ensembl version (version of the Ensembl Perl API).
+#' 
+#' @param species The name of the species (e.g. "homo_sapiens").
+#' 
+#' @param user The user name for the MySQL database (write access).
+#' 
+#' @param host The host on which the MySQL database is running.
+#' 
+#' @param pass The password for the MySQL database.
+#' 
+#' @param port The port of the MySQL database.
+#' 
+#' @param local_tmp Local directory that will be used to temporarily store the
+#'     downloaded MySQL database files.
+#' 
+#' @param dropDb Whether the Ensembl core database should be deleted once the
+#'     EnsDb has been created.
+#'
+#' @author Johannes Rainer
+#' 
+#' @noRd
+processOneSpecies <- function(ftp_folder, ens_version = 86, species, user,
+                              host = "localhost",
+                              pass, port = 3306, local_tmp = tempdir(),
+                              dropDb = TRUE) {
+    if (missing(ftp_folder))
+        stop("'ftp_folder' has to be specified!")
+    if (missing(user))
+        stop("'user' has to be specified!")
+    if (missing(species))
+        stop("'species' has to be specified!")
+    ## (1) Download database files.
+    res <- downloadFilesFromFtpFolder(url = ftp_folder, dest = local_tmp)
+    ## (2) Install database.
+    db_name <- basename(ftp_folder)
+    res <- installEnsemblDb(dir = local_tmp, host = host, dbname = db_name,
+                            user = user, pass = pass, port = port)
+    ## (3) Delete the downloads.
+    fls <- dir(local_tmp, full.names = TRUE)
+    res <- sapply(fls, unlink)
+    ## (4) Create the EnsDb (requires the correct Ensembl API)
+    ##     They are created in the local directory.
+    fetchTablesFromEnsembl(version = ens_version, species = species,
+                           user = user, host = host, pass = pass, port = port)
+    DBFile <- makeEnsemblSQLiteFromTables()
+    unlink("*.txt")
+    ## (5) Delete the database.
+    if (dropDb) {
+        con <- dbConnect(MySQL(), host = host, user = user, pass = pass,
+                         port = port, dbname = "mysql")
+        res <- dbGetQuery(con, paste("drop database ", db_name))
+        dbDisconnect(con)
+    }
+}
+
+
+#'
+#' @description Download all files from an ftp directory to a local directory.
+#'
+#' @param url A character string specifying the url of the directory.
+#'
+#' @param dest A character string specifying the local directory.
+#'
+#' @return A character string with the path of the local directory.
+#'
+#' @author Johannes Rainer
+#' 
+#' @noRd
+#'
+#' @examples
+#'
+#' ftp_dir <- "ftp://ftp.ensembl.org/pub/release-88/mysql/homo_sapiens_core_88_38"
+#' local_dir <- downloadFilesFromFtpFolder(ftp_dir)
+downloadFilesFromFtpFolder <- function(url, dest = tempdir()) {
+    fls <- getURL(paste0(url, "/"), dirlistonly = TRUE)
+    fls <- unlist(strsplit(fls, split = "\n"))
+    message("Downloading ", length(fls), " files ... ", appendLF = FALSE)
+    for (i in seq_along(fls)) {
+        download.file(url = paste0(url, "/", fls[i]),
+                      destfile = paste0(dest, "/", fls[i]), quiet = TRUE)
+    }
+    message("OK")
+    return(dest)
+}
+
+#' @description Install an Ensembl MySQL database downloaded from the Ensembl
+#'     ftp server (e.g. using \link{downloadFilesFromFtpFolder}).
+#'
+#' @note The local directory is expected to correspond to the name of the
+#'     database, i.e. \code{basename(dir)} will be used as the database name if
+#'     argument \code{dbname} is missing.
+#'
+#' @param dir The path to the local directory where the database files are.
+#' 
+#' @param host The host running the MySQL database.
+#' 
+#' @param dbname The name of the database. If not provided the name of the
+#'     provided directory will be used instead.
+#' 
+#' @param user The user name for the MySQL database (rw access).
+#' 
+#' @param pass The password for the MySQL database.
+#' 
+#' @param port The port of the MySQL database.
+#' 
+#' @author Johannes Rainer
+#' 
+#' @noRd
+#'
+#' @examples
+#' user <- "user"
+#' pass <- "pass"
+#' dbname <- "homo_sapiens_core_88_38"
+#' ## set to directory returned by the downloadFilesFromFtpFolder
+#' dir <- local_dir
+#' 
+#' installEnsemblDb(dir = dir, dbname = dbname, user = user, pass = pass)
+installEnsemblDb <- function(dir, host = "localhost", dbname, user, pass,
+                              port = 3306) {
+    if (missing(dir))
+        stop("Argument 'dir' missing!")
+    if (missing(dbname))
+        dbname <- basename(dir)
+    if (missing(user))
+        stop("Argument 'user' missing!")
+    ## Unzip any gzipped database files.
+    tmp <- system(paste0("gunzip ", dir, "/*.gz"))
+    ## Create the database
+    con <- dbConnect(MySQL(), host = host, user = user, pass = pass, port = port,
+                     dbname = "mysql")
+    res <- dbGetQuery(con, paste0("create database ", dbname))
+    dbDisconnect(con)
+    ## Create the tables from the schema file <dbname>.sql.
+    tmp <- system(paste0("mysql -h ", host, " -u ", user, " --password=", pass,
+                         " -P ", port, " ", dbname, " < ", dir, "/", dbname,
+                         ".sql"))
+    ## Importing the data.
+    cmd <- paste0("mysqlimport -h ", host, " -u ", user,
+                  " --password=", pass, " -P ", port,
+                  " ", dbname, " -L ", dir, "/*.txt")
+    tmp <- system(cmd)
+}
+
+#' @description Creates EnsDb packages from all sqlite database files found in
+#' the directory specified with parameter \code{dir}.
+#' @param dir The path to the directory where the SQLite files can be found.
+#' @param author The author of the package.
+#' @param maintainer The maintainer of the package.
+#' @param version The version of the package.
+#' @noRd
+createPackagesFromSQLite <- function(dir = ".", author, maintainer, version) {
+    if (missing(author) || missing(maintainer) || missing(version))
+        stop("Parameter 'author', 'maintainer' and 'version' are required!")
+    edbs <- dir(dir, full.names = TRUE, pattern = "\\.sqlite$")
+    if (length(edbs) == 0)
+        stop("Found no SQLite database files in the specified directory!")
+    message("Processing ", length(edbs), " packages.")
+    for (i in seq_along(edbs)) {
+        message("Processing ", basename(edbs[i]), " (", i, " of ",
+                length(edbs), ") ... ", appendLF = FALSE)
+        makeEnsembldbPackage(ensdb = edbs[i], version = version,
+                             author = author, maintainer = maintainer)
+        message("OK")
+    }
+}
+
+
+## ftpf <- paste0("ftp://ftp.ensembl.org/pub/release-86/mysql/",
+##                "anas_platyrhynchos_core_86_1")
+## local_dir <- tempdir()
+## downloadFilesFromFtpFolder(ftpf, dest = local_dir)
+## installEnsemblDb(dir = local_dir, host = "localhost", user = "jo",
+##                  pass = "jo123", dbname = "anas_platyrhynchos_core_86_1")
+## fls <- dir(local_dir, full.names = TRUE)
+## res <- sapply(fls, unlink)
+
+## fetchTablesFromEnsembl(86, species = "anas_platyrhynchos", user = "jo",
+##                        host = "localhost", pass = "jo123", port = 3306)
+## DBFile <- makeEnsemblSQLiteFromTables()
+## unlink("*.txt")
+
+## system.time(fetchTablesFromEnsembl(86, species = "anas_platyrhynchos"))
+
+
+## ftpf <- paste0("ftp://ftp.ensembl.org/pub/release-86/mysql/",
+##                "homo_sapiens_core_86_38")
+## local_dir <- tempdir()
+## processOneSpecies(ftp_folder = ftpf, ens_version = 86,
+##                   species = "homo_sapiens", user = "jo",
+##                   host = "localhost",
+##                   pass = "jo123", port = 3306, local_tmp = local_dir,
+##                   dropDb = FALSE)
+
+
+## Add an issue:
+## + Fix problem of non-defined sequence type "chromosome" in anas platyrhynchos
+##   database. -> update to the perl script.
+## + Compare Hsapiens EnsDb created with new script and the "original" one.
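listCoreDbsInFolder() and createEnsDbForSpecies() parse Ensembl MySQL folder names such as homo_sapiens_core_88_38 by splitting on "_". The base-R sketch below reproduces that parsing without FTP access; the function name parseEnsemblDbFolders is hypothetical, and, like the original code, it assumes a two-part organism name. It subsets with drop = FALSE so that a single matching core database still yields a one-row matrix, which the later nrow() checks rely on.

```r
## Hypothetical helper mirroring the folder-name parsing in
## listCoreDbsInFolder(): "<genus>_<species>_<type>_<version>".
parseEnsemblDbFolders <- function(folders) {
    parts <- strsplit(folders, split = "_", fixed = TRUE)
    res <- t(vapply(parts, function(tmp) {
        ## assumes a two-part organism name, as the original code does
        c(organism = paste0(tmp[1:2], collapse = "_"),
          type = tmp[3],
          version = paste0(tmp[4:length(tmp)], collapse = "_"))
    }, character(3)))
    res <- cbind(folder = folders, res)
    ## keep only core databases; drop = FALSE preserves a one-row matrix
    res[res[, "type"] == "core", , drop = FALSE]
}

parseEnsemblDbFolders(c("homo_sapiens_core_88_38",
                        "homo_sapiens_variation_88_38",
                        "anas_platyrhynchos_core_86_1"))
```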
diff --git a/inst/shinyHappyPeople/server.R b/inst/shinyHappyPeople/server.R
index d169d05..7c3cb4c 100644
--- a/inst/shinyHappyPeople/server.R
+++ b/inst/shinyHappyPeople/server.R
@@ -22,13 +22,13 @@ TheFilter <- function(input){
         return(GenenameFilter(Vals, condition=Cond))
     }
     if(input$type=="Chrom name"){
-        return(SeqnameFilter(Vals, condition=Cond))
+        return(SeqNameFilter(Vals, condition=Cond))
     }
     if(input$type=="Gene biotype"){
-        return(GenebiotypeFilter(Vals, condition=Cond))
+        return(GeneBiotypeFilter(Vals, condition=Cond))
     }
     if(input$type=="Tx biotype"){
-        return(TxbiotypeFilter(Vals, condition=Cond))
+        return(TxBiotypeFilter(Vals, condition=Cond))
     }
 }
 
diff --git a/inst/test/testFunctionality.R b/inst/test/testFunctionality.R
deleted file mode 100644
index 3161093..0000000
--- a/inst/test/testFunctionality.R
+++ /dev/null
@@ -1,293 +0,0 @@
-## check namespace.
-detachem <- function( x ){
-    NS <- loadedNamespaces()
-    if( any( NS==x ) ){
-        pkgn <- paste0( "package:", x )
-        detach( pkgn, unload=TRUE, character.only=TRUE )
-    }
-}
-Pkgs <- c( "EnsDb.Hsapiens.v75", "ensembldb" )
-tmp <- sapply( Pkgs, detachem )
-tmp <- sapply( Pkgs, library, character.only=TRUE )
-
-###
-
-## just get all genes.
-cat( "getting all genes..." )
-Gns <- genes( EnsDb.Hsapiens.v75 )
-Gns
-cat("done\n")
-
-cat( "getting all transcripts..." )
-Gns <- transcripts( EnsDb.Hsapiens.v75 )
-Gns
-cat("done\n")
-
-cat( "getting all exons..." )
-Gns <- exons( EnsDb.Hsapiens.v75 )
-Gns
-cat("done\n")
-
-## get exons, sort by exon_seq_start
-Gns <- exons( EnsDb.Hsapiens.v75, columns=c( "exon_id", "tx_id" ), filter=list( TxidFilter( "a" ) ) )
-ensembldb:::.buildQuery( EnsDb.Hsapiens.v75, columns=c( "exon_id", "tx_id" ), filter=list( TxidFilter( "a" ) ))
-
-cat( "all transcripts by..." )
-tmp <- transcriptsBy( EnsDb.Hsapiens.v75 )
-tmp
-cat("done\n")
-
-cat( "all exons by..." )
-tmp <- exonsBy( EnsDb.Hsapiens.v75 )
-tmp
-cat("done\n")
-
-
-###########
-## getWhat... generic query interface to the database.
-Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name" ), filter=list( SeqnameFilter( "Y" ) ) )
-head(Test)
-dim(Test)
-
-## now let's joind exon...
-Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name", "exon_id", "exon_seq_start", "exon_seq_end" ), order.by="exon_seq_end", order.type="desc", filter=list( SeqnameFilter( "Y" ) ) )
-head(Test)
-dim(Test)
-
-
-## throws a warning since exon_chrom_end is not valid.
-Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name", "exon_id", "exon_seq_start", "exon_seq_end" ), order.by="exon_chrom_end", order.type="desc", filter=list( SeqnameFilter( "Y" ) ) )
-head(Test)
-dim(Test)
-
-
-## add a Txid Filter.
-Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name", "exon_id", "exon_seq_start", "exon_seq_end", "tx_id" ), order.by="exon_seq_end", order.type="desc", filter=list( TxidFilter( "ENST00000028008" ) ) )
-Test
-
-Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name" ), filter=list( TxidFilter( "ENST00000028008" ) ) )
-Test
-
-
-
-######
-## exonsBy
-## get all Exons by gene for genes encoded on chromosomes 1, 2, 4
-Test <- exonsBy( EnsDb.Hsapiens.v75, by="gene", columns=c( "gene_id", "gene_name", "gene_biotype" ), filter=list( SeqnameFilter( c( 1, 2,4 ) ), SeqstrandFilter( "-" ) ) )
-Test
-
-## tx_biotype and tx_id have been removed.
-Test <- exonsBy( EnsDb.Hsapiens.v75, by="gene", columns=c( "gene_id", "gene_name", "gene_biotype", "tx_biotype", "tx_id" ), filter=list( SeqnameFilter( c( 1, 2,4 ) ), SeqstrandFilter( "-" ) ) )
-Test
-
-Test <- exonsBy( EnsDb.Hsapiens.v75, by="tx", columns=c( "gene_id", "tx_id", "tx_biotype" ), filter=list( SeqnameFilter( c( 1, 2,4 ) ) ) )
-Test
-
-## exons for a specific transcript
-Test <- exonsBy( EnsDb.Hsapiens.v75, by="tx", columns=c( "gene_id", "tx_id", "tx_biotype" ), filter=list( TxidFilter( "ENST00000028008" ) ) )
-Test
-
-## that also works, albeit throwing an warning.
-Test <- exonsBy( EnsDb.Hsapiens.v75, by="gene", columns=c( "gene_id", "tx_id", "tx_biotype" ), filter=list( TxidFilter( "ENST00000028008" ) ) )
-Test
-
-
-
-########
-## transcriptsBy
-Test <- transcriptsBy( EnsDb.Hsapiens.v75, by="gene", filter=list( SeqstrandFilter( "+" ), SeqnameFilter( "X" ) ) )
-Test
-
-## that should throw a warning
-Test <- transcriptsBy( EnsDb.Hsapiens.v75, by="gene", filter=list( SeqstrandFilter( "+" ), SeqnameFilter( "X" ) ), columns=c( "exon_id", "exon_seq_start" ) )
-Test
-
-
-Test <- transcriptsBy( EnsDb.Hsapiens.v75, by="exon", filter=list( SeqstrandFilter( "+" ), SeqnameFilter( "X" ) ), columns="tx_biotype" )
-Test
-
-## that should throw a warning
-Test <- transcriptsBy( EnsDb.Hsapiens.v75, by="exon", filter=list( SeqstrandFilter( "+" ), SeqnameFilter( "X" ) ), columns=c( "exon_id", "exon_seq_start", "tx_biotype" ) )
-Test
-
-
-######
-## genes
-Test <- genes( EnsDb.Hsapiens.v75, filter=list( GenebiotypeFilter( "lincRNA" ) ) )
-head( Test )
-length( Test )
-
-## adding tx properties along with gene columns; this will return a data.frame with the
-## additional information; gene columns can however no longer be unique in the data.frame
-Test <- genes( EnsDb.Hsapiens.v75, filter=list( GenebiotypeFilter( "lincRNA" ) ), columns=c( listColumns( EnsDb.Hsapiens.v75, "gene"), "tx_id", "tx_biotype" ) )
-head( Test )
-length( Test )
-
-######
-## transcripts
-## get all transcripts that are target to nonsense mediated decay
-Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ) )
-head( Test )
-length( Test )
-
-## order the transcripts by seq_name; this does not work.
-Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ), order.by="seq_name" )
-head( Test )
-nrow( Test )
-
-## order the transcripts by seq_name; have to explicitely add seq_name to the columns.
-Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ), order.by="seq_name", columns=c( listColumns( EnsDb.Hsapiens.v75, "tx" ), "seq_name" ) )
-head( Test )
-nrow( Test )
-
-## get in addition the gene_name and gene_id
-Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ), columns=c( listColumns( EnsDb.Hsapiens.v75, "tx" ), "gene_id", "gene_name" ) )
-head( Test )
-nrow( Test )
-
-## get in addition the gene_name and gene_id and also exon_id and exon_idx
-Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ), columns=c( listColumns( EnsDb.Hsapiens.v75, "tx" ), "gene_id", "gene_name", "exon_id", "exon_idx" ) )
-head( Test )
-nrow( Test )
-
-
-#####
-## exons
-##
-Test <- exons( EnsDb.Hsapiens.v75, filter=list( TxidFilter( "ENST00000028008" ) ), columns=c( "gene_id","gene_name", "gene_biotype" ) )
-Test
-
-
-
-
-##################
-## examples from EnsDb-class:
-
-## display some information:
-EnsDb.Hsapiens.v75
-
-organism( EnsDb.Hsapiens.v75 )
-
-seqinfo( EnsDb.Hsapiens.v75 )
-
-## show the tables
-listTables( EnsDb.Hsapiens.v75 )
-
-
-######    buildQuery
-##
-## join tables gene and transcript and return gene_id and tx_id
-buildQuery( EnsDb.Hsapiens.v75, columns=c( "gene_id", "tx_id" ) )
-
-
-## get all exon_ids and transcript ids of genes encoded on chromosome Y.
-buildQuery( EnsDb.Hsapiens.v75, columns=c( "exon_id", "tx_id" ), filter=list( SeqnameFilter(  "Y") ) )
-
-
-######   genes
-##
-## get all genes coded on chromosome Y
-AllY <- genes( EnsDb.Hsapiens.v75, filter=list( SeqnameFilter( "Y" ) ) )
-head( AllY )
-
-## return result as GRanges.
-AllY.granges <- genes( EnsDb.Hsapiens.v75, filter=list( SeqnameFilter(
-  "Y" ) ), return.type="GRanges" )
-AllY.granges
-
-## include all transcripts of the gene and their chromosomal
-## coordinates, sort by chrom start of transcripts and return as
-## GRanges.
-AllY.granges.tx <- genes( EnsDb.Hsapiens.v75, filter=list(
-  SeqnameFilter( "Y" ) ), return.type="GRanges", columns=c(
-  "gene_id", "seq_name", "seq_strand", "tx_id", "tx_biotype",
-  "tx_seq_start", "tx_seq_end" ), order.by="tx_seq_start" )
-AllY.granges.tx
-
-
-
-######   transcripts
-##
-## get all transcripts of a gene
-Tx <- transcripts( EnsDb.Hsapiens.v75, filter=list( GeneidFilter(
-  "ENSG00000184895" ) ), order.by="tx_seq_start" )
-Tx
-
-## get all transcripts of two genes along with some information on the
-## gene and transcript
-Tx.granges <- transcripts( EnsDb.Hsapiens.v75, filter=list(
-  GeneidFilter( c( "ENSG00000184895", "ENSG00000092377" ),
-  condition="in" )), return.type="GRanges", order.by="tx_seq_start",
-  columns=c( "gene_id", "gene_seq_start", "gene_seq_end",
-  "gene_biotype", "tx_biotype" ) )
-Tx.granges
-
-
-
-######   exons
-##
-## get all exons of the provided genes
-Exon.granges <- exons( EnsDb.Hsapiens.v75, filter=list( GeneidFilter( c(
-  "ENSG00000184895", "ENSG00000092377" ) )),
-  return.type="GRanges", order.by="exon_seq_start", columns=c(
-  "gene_id", "gene_seq_start", "gene_seq_end", "gene_biotype" ) )
-Exon.granges
-
-
-
-#####    exonsBy
-##
-## get all exons for transcripts encoded on chromosomes 1 to 22, X and Y.
-ETx <- exonsBy( EnsDb.Hsapiens.v75, by="tx", filter=list( SeqnameFilter(
-  c( 1:22, "X", "Y" ) ) ) )
-ETx
-## get all exons for genes encoded on chromosome 1 to 22, X and Y and
-## include additional annotation columns in the result
-EGenes <- exonsBy( EnsDb.Hsapiens.v75, by="gene", filter=list(
-  SeqnameFilter( c( 1:22, "X", "Y" ) ) ), columns=c( "gene_biotype",
-  "gene_name" ) )
-EGenes
-
-## Note that this might also contain "LRG" genes.
-sum( grep( names( EGenes ), pattern="LRG" ) )
-## fetch just Ensembl genes:
-EGenes <- exonsBy( EnsDb.Hsapiens.v75, by="gene", filter=list(
-  SeqnameFilter( c( 1:22, "X", "Y" ) ), GeneidFilter( "ENS%", "like" ) ), columns=c( "gene_biotype",
-  "gene_name" ) )
-
-sum( grep( names( EGenes ), pattern="LRG" ) )
-
-
-
-#####    transcriptsBy
-##
-TGenes <- transcriptsBy( EnsDb.Hsapiens.v75, by="gene", filter=list(
-  SeqnameFilter( c( 1:22, "X", "Y" ) ) ) )
-TGenes
-
-
-
-#####    lengthOf
-##
-## length of a specific gene.
-lengthOf( EnsDb.Hsapiens.v75, filter=list( GeneidFilter(
-  "ENSG00000000003" ) ) )
-
-## length of a transcript
-lengthOf( EnsDb.Hsapiens.v75, of="tx", filter=list( TxidFilter(
-  "ENST00000494424" ) ) )
-
-## average length of all protein coding genes
-mean( lengthOf( EnsDb.Hsapiens.v75, of="gene", filter=list(
-  GenebiotypeFilter( "protein_coding" ),
-  SeqnameFilter( c( 1:22, "X", "Y" ) ) ) ) )
-
-## average length of all snoRNAs
-mean( lengthOf( EnsDb.Hsapiens.v75, of="gene", filter=list(
-  GenebiotypeFilter( "snoRNA" ),
-  SeqnameFilter( c( 1:22, "X", "Y" ) ) ) ) )
-
-listGenebiotypes(EnsDb.Hsapiens.v75)
-
-listTxbiotypes(EnsDb.Hsapiens.v75)
-
diff --git a/inst/test/testInternals.R b/inst/test/testInternals.R
deleted file mode 100644
index 7590d88..0000000
--- a/inst/test/testInternals.R
+++ /dev/null
@@ -1,146 +0,0 @@
-detachem <- function(x){
-    NS <- loadedNamespaces()
-    if(any(NS==x)){
-        pkgn <- paste0("package:", x)
-        detach(pkgn, unload=TRUE, character.only=TRUE)
-    }
-}
-Pkgs <- c("EnsDb.Hsapiens.v75", "ensembldb")
-tmp <- sapply(Pkgs, detachem)
-tmp <- sapply(Pkgs, library, character.only=TRUE)
-DB <- EnsDb.Hsapiens.v75
-
-
-#######################################################
-##
-##  add required tables if needed.
-##
-## check if we get what we want...
-Expect <- c("exon", "tx2exon", "tx")
-Get <- ensembldb:::addRequiredTables(EnsDb.Hsapiens.v75, c("exon", "tx"))
-Get
-if(sum(Get %in% Expect)!=length(Expect))
-    stop("Didn't get what I expected!")
-
-
-Expect <- c("exon", "tx2exon", "tx", "gene")
-Get <- ensembldb:::addRequiredTables(EnsDb.Hsapiens.v75, c("exon", "gene"))
-Get
-if(sum(Get %in% Expect)!=length(Expect))
-    stop("Didn't get what I expected!")
-
-
-
-Expect <- c("exon", "tx2exon", "tx", "gene")
-Get <- ensembldb:::addRequiredTables(EnsDb.Hsapiens.v75, c("exon", "gene", "tx"))
-Get
-if(sum(Get %in% Expect)!=length(Expect))
-    stop("Didn't get what I expected!")
-
-
-#######################################################
-##
-##  join queries
-##
-ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("exon", "t2exon", "tx"))
-
-
-ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("exon"))
-
-
-ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("exon", "t2exon", "tx", "gene"))
-
-
-ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("tx", "gene"))
-
-
-ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("chromosome", "gene"))
-
-
-
-
-#######################################################
-##
-##  join queries on column names
-##
-## for that query we don't need the exon table
-ensembldb:::cleanColumns(EnsDb.Hsapiens.v75, c("gene_id","tx_id", "bla", "value"))
-
-## don't require the exon table here, exon_id is also in tx2exon.
-ensembldb:::joinQueryOnColumns(EnsDb.Hsapiens.v75, c("gene_id", "tx_id", "gene_name", "exon_id"))
-
-##
-ensembldb:::joinQueryOnColumns(EnsDb.Hsapiens.v75, c("gene_id", "tx_id", "gene_name", "exon_idx"))
-
-
-ensembldb:::joinQueryOnColumns(EnsDb.Hsapiens.v75, c("gene_id", "tx_id", "gene_name", "exon_id", "exon_seq_start"))
-
-
-
-#######################################################
-##
-##  clean columns
-##
-ensembldb:::cleanColumns(EnsDb.Hsapiens.v75, c("gene_id" ,"bma", "gene.gene_biotype"))
-
-ensembldb:::cleanColumns(EnsDb.Hsapiens.v75, c("gene_id" ,"gene.gene_name", "gene.gene_biotype"))
-
-
-
-#######################################################
-##
-##  check built queries
-##
-ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "gene_name", "tx_id", "exon_id"), filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")))
-
-
-## throws a warning
-ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "gene_name", "tx_id", "exon_id"), filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")), order.by="exon_seq_end", order.type="desc")
-
-
-## works
-ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "gene_name", "tx_id", "exon_id", "exon_seq_end"), filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")), order.by="exon_seq_end", order.type="desc")
-
-
-ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("tx_id", "exon_id", "exon_seq_end"), filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")), order.by="exon_seq_end", order.type="desc")
-
-
-ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("tx_id", "gene_id"))
-
-
-## check the new filter thingy.
-GF <- GeneidFilter("a")
-where(GF)
-column(GF)
-
-## with db
-column(GF, DB)
-where(GF, DB)
-
-## with db and with.tables
-column(GF, DB, with.tables="tx")
-where(GF, DB, with.tables="tx")
-
-
-column(GF, DB, with.tables=c("gene", "tx"))
-where(GF, DB, with.tables=c("gene", "tx"))
-
-## does throw an error!
-##column(GF, DB, with.tables="exon")
-
-## silently drops unknown table names.
-column(GF, DB, with.tables="blu")
-
-##
-ensembldb:::.buildQuery(DB, columns=c("tx_id", "gene_id"))
-## with filter
-ensembldb:::.buildQuery(DB, columns=c("tx_id", "gene_id"),
-                        filter=list(GeneidFilter("a")))
-ensembldb:::.buildQuery(DB, columns=c("tx_id", "gene_id"),
-                        filter=list(GeneidFilter("a"),
-                                    SeqnameFilter(1)))
-
-ensembldb:::.buildQuery(DB, columns=c("tx_id", "gene_id", "exon_idx"),
-                        filter=list(GeneidFilter("a"),
-                                    SeqnameFilter(1)))
-
diff --git a/inst/unitTests/test_Filters.R b/inst/unitTests/test_Filters.R
deleted file mode 100644
index 4300f4a..0000000
--- a/inst/unitTests/test_Filters.R
+++ /dev/null
@@ -1,241 +0,0 @@
-library("EnsDb.Hsapiens.v75")
-edb <- EnsDb.Hsapiens.v75
-
-## testing GeneidFilter
-test_GeneidFilter <- function(){
-    GF <- GeneidFilter("ENSG0000001")
-    ## check if column matches the present database.
-    checkEquals(column(GF, EnsDb.Hsapiens.v75), "gene.gene_id")
-    ## check error if value is not as expected.
-    checkException(GeneidFilter("ENSG000001", ">"))
-    ## expect the filter to change the condition if the length of values
-    ## is > 1
-    checkMultiValsIn(GeneidFilter(c("a", "b"), "="))
-    checkMultiValsNotIn(GeneidFilter(c("a", "b"), "!="))
-}
-
-test_GenebiotypeFilter <- function(){
-    Filt <- GenebiotypeFilter("protein_coding")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.gene_biotype")
-    checkException(GenebiotypeFilter("protein_coding", ">"))
-    ## expect the filter to change the condition if the length of values
-    ## is > 1
-    checkMultiValsIn(GenebiotypeFilter(c("a", "b"), "="))
-    checkMultiValsNotIn(GenebiotypeFilter(c("a", "b"), "!="))
-
-}
-
-test_GenenameFilter <- function(){
-    Filt <- GenenameFilter("genename")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.gene_name")
-    checkException(GenenameFilter("genename", ">"))
-    ## expect the filter to change the condition if the length of values
-    ## is > 1
-    checkMultiValsIn(GenenameFilter(c("a", "b"), "="))
-    checkMultiValsNotIn(GenenameFilter(c("a", "b"), "!="))
-    ## check if we're escaping correctly!
-    Filt <- GenenameFilter("I'm a gene")
-    checkEquals(where(Filt, EnsDb.Hsapiens.v75), "gene.gene_name = 'I''m a gene'")
-}
-
-test_TxidFilter <- function(){
-    Filt <- TxidFilter("a")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx.tx_id")
-    checkException(TxidFilter("a", ">"))
-    ## expect the filter to change the condition if the length of values
-    ## is > 1
-    checkMultiValsIn(TxidFilter(c("a", "b"), "="))
-    checkMultiValsNotIn(TxidFilter(c("a", "b"), "!="))
-}
-
-test_TxbiotypeFilter <- function(){
-    Filt <- TxbiotypeFilter("a")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx.tx_biotype")
-    checkException(TxbiotypeFilter("a", ">"))
-    ## expect the filter to change the condition if the length of values
-    ## is > 1
-    checkMultiValsIn(TxbiotypeFilter(c("a", "b"), "="))
-    checkMultiValsNotIn(TxbiotypeFilter(c("a", "b"), "!="))
-}
-
-test_ExonidFilter <- function(){
-    Filt <- ExonidFilter("a")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx2exon.exon_id")
-    checkException(ExonidFilter("a", ">"))
-    ## expect the filter to change the condition if the length of values
-    ## is > 1
-    checkMultiValsIn(ExonidFilter(c("a", "b"), "="))
-    checkMultiValsNotIn(ExonidFilter(c("a", "b"), "!="))
-}
-
-## SeqnameFilter
-test_SeqnameFilter <- function(){
-    Filt <- SeqnameFilter("a")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.seq_name")
-    checkException(SeqnameFilter("a", ">"))
-}
-
-## SeqstrandFilter
-test_SeqstrandFilter <- function(){
-    checkException(SeqstrandFilter("a"))
-    Filt <- SeqstrandFilter("-")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.seq_strand")
-}
-
-## SeqstartFilter, feature
-test_SeqstartFilter <- function(){
-    Filt <- SeqstartFilter(123, feature="gene")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.gene_seq_start")
-    Filt <- SeqstartFilter(123, feature="transcript")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx.tx_seq_start")
-}
-
-## SeqendFilter
-test_SeqendFilter <- function(){
-    Filt <- SeqendFilter(123, feature="gene")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.gene_seq_end")
-    Filt <- SeqendFilter(123, feature="transcript")
-    checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx.tx_seq_end")
-}
-
-
-
-## checks if "condition" of the filter is "in"
-checkMultiValsIn <- function(filt){
-    checkEquals(condition(filt), "in")
-}
-## checks if "condition" of the filter is "not in"
-checkMultiValsNotIn <- function(filt){
-    checkEquals(condition(filt), "not in")
-}
-
-test_ExonrankFilter <- function(){
-    Filt <- ExonrankFilter(123)
-    checkException(ExonrankFilter("a"))
-
-    edb <- EnsDb.Hsapiens.v75
-    checkException(value(Filt) <- "b")
-
-    checkEquals(column(Filt), "exon_idx")
-    checkEquals(column(Filt, edb), "tx2exon.exon_idx")
-    where(Filt, edb)
-}
-
-## SymbolFilter
-test_SymbolFilter <- function() {
-    edb <- EnsDb.Hsapiens.v75
-    sf <- SymbolFilter("SKA2")
-
-    ## Check the column method.
-    checkEquals(column(sf), "symbol")
-    ## For EnsDb we want it to link to gene_name
-    checkEquals(column(sf, edb), "gene.gene_name")
-    checkException(column(sf, edb, with.tables = c("tx", "exon")))
-
-    ## Check the where method.
-    checkEquals(where(sf), "symbol = 'SKA2'")
-    condition(sf) <- "!="
-    checkEquals(where(sf, edb), "gene.gene_name != 'SKA2'")
-
-    ## Test if we can use it:
-    condition(sf) <- "="
-    Res <- genes(edb, filter = sf, return.type = "data.frame")
-    checkEquals(Res$gene_id, "ENSG00000182628")
-    ## We need now also a column "symbol"!
-    checkEquals(Res$symbol, Res$gene_name)
-    ## Asking explicitly for symbol
-    Res <- genes(edb, filter = sf, return.type = "data.frame",
-                 columns = c("symbol", "gene_id"))
-    checkEquals(colnames(Res), c("symbol", "gene_id"))
-    ## Some more stuff, also shuffling the order.
-    Res <- genes(edb, filter = sf, return.type = "data.frame",
-                 columns = c("gene_name", "symbol", "gene_id"))
-    checkEquals(colnames(Res), c("gene_name", "symbol", "gene_id"))
-    Res <- genes(edb, filter = sf, return.type = "data.frame",
-                 columns = c("gene_id", "gene_name", "symbol"))
-    checkEquals(colnames(Res), c("gene_id", "gene_name", "symbol"))
-    ## And with GRanges as return type.
-    Res <- genes(edb, filter = sf, return.type = "GRanges",
-                 columns = c("gene_id", "gene_name", "symbol"))
-    checkEquals(colnames(mcols(Res)), c("gene_id", "gene_name", "symbol"))
-
-    ## Combine tx_name and symbol
-    Res <- genes(edb, filter = sf, columns = c("tx_name", "symbol"),
-                 return.type = "data.frame")
-    checkEquals(colnames(Res), c("tx_name", "symbol", "gene_id"))
-    checkTrue(all(Res$symbol == "SKA2"))
-
-    ## Test for transcripts
-    Res <- transcripts(edb, filter=sf, return.type="data.frame")
-    checkTrue(all(Res$symbol == "SKA2"))
-    Res <- transcripts(edb, filter = sf, return.type = "data.frame",
-                       columns = c("symbol", "tx_id", "gene_name"))
-    checkTrue(all(Res$symbol == "SKA2"))
-    checkEquals(Res$symbol, Res$gene_name)
-    checkEquals(colnames(Res), c("symbol", "tx_id", "gene_name"))
-
-    ## Test for exons
-    Res <- exons(edb, filter=sf, return.type="data.frame")
-    checkTrue(all(Res$symbol == "SKA2"))
-    Res <- exons(edb, filter = c(sf, TxbiotypeFilter("nonsense_mediated_decay")),
-                 return.type = "data.frame",
-                 columns = c("symbol", "tx_id", "gene_name"))
-    checkTrue(all(Res$symbol == "SKA2"))
-    checkEquals(Res$symbol, Res$gene_name)
-    checkEquals(colnames(Res), c("symbol", "tx_id", "gene_name", "exon_id", "tx_biotype"))
-
-    ## Test for exonsBy
-    Res <- exonsBy(edb, filter=sf)
-    checkTrue(all(unlist(Res)$symbol == "SKA2"))
-    Res <- exonsBy(edb, filter = c(sf, TxbiotypeFilter("nonsense_mediated_decay")),
-                 columns = c("symbol", "tx_id", "gene_name"))
-    checkTrue(all(unlist(Res)$symbol == "SKA2"))
-
-    checkEquals(unlist(Res)$symbol, unlist(Res)$gene_name)
-
-    ## Test for transcriptsBy too
-}
-
-
-## Here we test that the filter columns are always returned as well.
-test_multiFilterReturnCols <- function() {
-    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
-                                         filter = SymbolFilter("SKA2"))
-    checkEquals(cols, c("exon_id", "symbol"))
-    ## Two filters
-    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
-                                         filter = list(SymbolFilter("SKA2"),
-                                                       GenenameFilter("SKA2")))
-    checkEquals(cols, c("exon_id", "symbol", "gene_name"))
-    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
-                                         filter = list(SymbolFilter("SKA2"),
-                                                       GenenameFilter("SKA2"),
-                                                       GRangesFilter(GRanges("3",
-                                                                             IRanges(3, 5)
-                                                                             ))))
-    checkEquals(cols, c("exon_id", "symbol", "gene_name", "gene_seq_start",
-                        "gene_seq_end", "seq_name", "seq_strand"))
-    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
-                                         filter = list(SymbolFilter("SKA2"),
-                                                       GenenameFilter("SKA2"),
-                                                       GRangesFilter(GRanges("3",
-                                                                             IRanges(3, 5)
-                                                                             ),
-                                                                     feature = "exon")))
-    checkEquals(cols, c("exon_id", "symbol", "gene_name", "exon_seq_start",
-                        "exon_seq_end", "seq_name", "seq_strand"))
-    ## SeqstartFilter and GRangesFilter
-    ssf <- SeqstartFilter(123, feature="tx")
-    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
-                                         filter = list(SymbolFilter("SKA2"),
-                                                       GenenameFilter("SKA2"),
-                                                       GRangesFilter(GRanges("3",
-                                                                             IRanges(3, 5)
-                                                                             ),
-                                                                     feature = "exon"),
-                                                       ssf))
-    checkEquals(cols, c("exon_id", "symbol", "gene_name", "exon_seq_start",
-                        "exon_seq_end", "seq_name", "seq_strand", "tx_seq_start"))
-
-}
-
diff --git a/inst/unitTests/test_Functionality.R b/inst/unitTests/test_Functionality.R
deleted file mode 100644
index bbb79ce..0000000
--- a/inst/unitTests/test_Functionality.R
+++ /dev/null
@@ -1,507 +0,0 @@
-## This is just a plain R script calling the standard methods.
-
-library( "EnsDb.Hsapiens.v75" )
-DB <- EnsDb.Hsapiens.v75
-
-## testing genes method.
-test_genes <- function(){
-    Gns <- genes(DB, filter=SeqnameFilter("Y"))
-    Gns <- genes(DB, filter=SeqnameFilter("Y"), return.type="DataFrame")
-    checkEquals(sort(colnames(Gns)), sort(listColumns(DB, "gene")))
-    Gns <- genes(DB, filter=SeqnameFilter("Y"), return.type="DataFrame",
-                 columns=c("gene_id", "tx_name"))
-    checkEquals(colnames(Gns), c("gene_id", "tx_name", "seq_name"))
-
-    Gns <- genes(DB, filter=SeqnameFilter("Y"), columns=c("gene_id", "gene_name"))
-    ## Here we don't need the seqnames in mcols!
-    checkEquals(colnames(mcols(Gns)), c("gene_id", "gene_name"))
-
-
-    ## checkEquals(class(genes(DB, return.type="DataFrame",
-    ##                         filter=list(SeqnameFilter("Y")))), "DataFrame" )
-}
-
-test_transcripts <- function(){
-    Tns <- transcripts(DB, filter=SeqnameFilter("Y"), return.type="DataFrame")
-    checkEquals(sort(colnames(Tns)), sort(c(listColumns(DB, "tx"), "seq_name")))
-
-    Tns <- transcripts(DB, columns=c("tx_id", "tx_name"), filter=SeqnameFilter("Y"))
-    checkEquals(sort(colnames(mcols(Tns))), sort(c("tx_id", "tx_name")))
-
-    ## Check the default ordering.
-    Tns <- transcripts(DB, filter = TxbiotypeFilter("protein_coding"),
-                       return.type = "data.frame",
-                       columns = c("seq_name", listColumns(DB, "tx")))
-    checkEquals(order(Tns$seq_name, method = "radix"), 1:nrow(Tns))
-}
-
-test_transcriptsBy <- function(){
-    ## Expect results on the forward strand to be ordered by tx_seq_start
-    res <- transcriptsBy(DB, filter = list(SeqnameFilter("Y"),
-                                           SeqstrandFilter("+")),
-                         by = "gene")
-    fw <- res[[3]]
-    checkEquals(order(start(fw)), 1:length(fw))
-    ## Expect results on the reverse strand to be ordered by -tx_seq_end
-    res <- transcriptsBy(DB, filter = list(SeqnameFilter("Y"),
-                                           SeqstrandFilter("-")),
-                         by = "gene")
-    rv <- res[[3]]
-    checkEquals(order(start(rv), decreasing = TRUE), 1:length(rv))
-}
-
-test_exons <- function(){
-    Exns <- exons(DB, filter=SeqnameFilter("Y"), return.type="DataFrame")
-    checkEquals(sort(colnames(Exns)), sort(c(listColumns(DB, "exon"), "seq_name")))
-
-    ## Check correct ordering.
-    Exns <- exons(DB, return.type = "data.frame", filter = SeqnameFilter(20:23))
-    checkEquals(order(Exns$seq_name, method = "radix"), 1:nrow(Exns))
-}
-
-test_exonsBy <- function() {
-    ##ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("X")), by="tx")
-    ExnsBy <- exonsBy(DB, filter = list(SeqnameFilter("Y")), by = "tx",
-                      columns = c("tx_name"))
-    checkEquals(sort(colnames(mcols(ExnsBy[[1]]))),
-                sort(c("exon_id", "exon_rank", "tx_name")))
-
-    ## Check what happens if we specify tx_id.
-    ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("Y")), by="tx",
-                      columns=c("tx_id"))
-    checkEquals(sort(colnames(mcols(ExnsBy[[1]]))),
-                sort(c("exon_id", "exon_rank", "tx_id")))
-
-    ## ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("Y")), by="tx",
-    ##                   columns=c("exon_rank"))
-    ## checkEquals(sort(colnames(mcols(ExnsBy[[1]]))),
-    ##             sort(c("exon_id", "exon_rank")))
-
-    ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")),
-                      by="gene")
-    ## Check that ordering is by start on the forward strand.
-    fw <- ExnsBy[[3]]
-    checkEquals(order(start(fw)), 1:length(fw))
-    ##
-    ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")),
-                      by="gene")
-    ## Check that ordering is by decreasing end on the reverse strand.
-    rv <- ExnsBy[[3]]
-    checkEquals(order(end(rv), decreasing = TRUE), 1:length(rv))
-}
-
-test_dbfunctionality <- function(){
-    GBT <- listGenebiotypes(DB)
-    TBT <- listTxbiotypes(DB)
-}
-
-## test if we get the expected exceptions if we're not submitting
-## correct filter objects
-test_filterExceptions <- function(){
-    checkException(genes(DB, filter="d"))
-    checkException(genes(DB, filter=list(SeqnameFilter("X"),
-                                 "z")))
-    checkException(transcripts(DB, filter="d"))
-    checkException(transcripts(DB, filter=list(SeqnameFilter("X"),
-                                 "z")))
-    checkException(exons(DB, filter="d"))
-    checkException(exons(DB, filter=list(SeqnameFilter("X"),
-                                 "z")))
-    checkException(exonsBy(DB, filter="d"))
-    checkException(exonsBy(DB, filter=list(SeqnameFilter("X"),
-                                 "z")))
-    checkException(transcriptsBy(DB, filter="d"))
-    checkException(transcriptsBy(DB, filter=list(SeqnameFilter("X"),
-                                 "z")))
-}
-
-test_promoters <- function(){
-    promoters(EnsDb.Hsapiens.v75, filter=GeneidFilter(c("ENSG00000184895",
-                                                    "ENSG00000092377")))
-}
-
-test_return_columns_gene <- function(){
-    cols <- c("gene_name", "tx_id")
-    Resu <- genes(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="data.frame")
-    checkEquals(sort(c(cols, "seq_name", "gene_id")), sort(colnames(Resu)))
-
-    Resu <- genes(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="DataFrame")
-    checkEquals(sort(c(cols, "seq_name", "gene_id")), sort(colnames(Resu)))
-
-    Resu <- genes(DB, filter=SeqnameFilter("Y"), columns=cols)
-    checkEquals(sort(c(cols, "gene_id")), sort(colnames(mcols(Resu))))
-}
-
-test_return_columns_tx <- function(){
-    cols <- c("tx_id", "exon_id", "tx_biotype")
-    Resu <- transcripts(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="data.frame")
-    checkEquals(sort(c(cols, "seq_name")), sort(colnames(Resu)))
-
-    Resu <- transcripts(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="DataFrame")
-    checkEquals(sort(c(cols, "seq_name")), sort(colnames(Resu)))
-
-    Resu <- transcripts(DB, filter=SeqnameFilter("Y"), columns=cols)
-    checkEquals(sort(cols), sort(colnames(mcols(Resu))))
-}
-test_return_columns_exon <- function(){
-    cols <- c("tx_id", "exon_id", "tx_biotype")
-    Resu <- exons(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="data.frame")
-    checkEquals(sort(c(cols, "seq_name")), sort(colnames(Resu)))
-
-    Resu <- exons(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="DataFrame")
-    checkEquals(sort(c(cols, "seq_name")), sort(colnames(Resu)))
-
-    Resu <- exons(DB, filter=SeqnameFilter("Y"), columns=cols)
-    checkEquals(sort(cols), sort(colnames(mcols(Resu))))
-}
-
-test_cdsBy <- function(){
-    ## Check that tx_name is also returned
-    cs <- cdsBy(DB, filter=SeqnameFilter("Y"), column="tx_name")
-    checkTrue(any(colnames(mcols(cs[[1]])) == "tx_name"))
-
-    do.plot <- FALSE
-    ## By tx
-    cs <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")))
-    tx <- exonsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")))
-    ## Check whether the first one makes sense:
-    whichTx <- names(cs)[1]
-    whichCs <- cs[[1]]
-    tx <- transcripts(DB, filter=TxidFilter(whichTx),
-                      columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
-                                "tx_cds_seq_end", "exon_seq_start", "exon_seq_end",
-                                "exon_idx", "exon_id", "seq_strand"),
-                      return.type="data.frame")
-    checkSingleTx(tx=tx, cds=whichCs, do.plot=do.plot)
-    ## Next one:
-    whichTx <- names(cs)[2]
-    tx <- transcripts(DB, filter=TxidFilter(whichTx),
-                      columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
-                                "tx_cds_seq_end", "exon_seq_start", "exon_seq_end",
-                                "exon_idx", "exon_id"), return.type="data.frame")
-    checkSingleTx(tx=tx, cds=cs[[2]], do.plot=do.plot)
-
-    ## Now for reverse strand:
-    cs <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")))
-    whichTx <- names(cs)[1]
-    whichCs <- cs[[1]]
-    tx <- transcripts(DB, filter=TxidFilter(whichTx),
-                      columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
-                                "tx_cds_seq_end", "exon_seq_start", "exon_seq_end",
-                                "exon_idx", "exon_id"), return.type="data.frame")
-    ## order the CDS ranges by seq_start
-    whichCs <- whichCs[order(start(whichCs))]
-    checkSingleTx(tx=tx, cds=whichCs, do.plot=do.plot)
-    ## Next one:
-    whichTx <- names(cs)[2]
-    whichCs <- cs[[2]]
-    tx <- transcripts(DB, filter=TxidFilter(whichTx),
-                      columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
-                                "tx_cds_seq_end", "exon_seq_start", "exon_seq_end",
-                                "exon_idx", "exon_id"), return.type="data.frame")
-    ## order the CDS ranges by seq_start
-    whichCs <- whichCs[order(start(whichCs))]
-    checkSingleTx(tx=tx, cds=whichCs, do.plot=do.plot)
-
-    ## Check adding columns
-    Test <- cdsBy(DB, filter=list(SeqnameFilter("Y")),
-                  columns=c("gene_biotype", "gene_name"))
-}
-
-test_cdsByGene <- function(){
-    do.plot <- FALSE
-    ## By gene.
-    cs <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")),
-                by="gene", columns=NULL)
-    checkSingleGene(cs[[1]], gene=names(cs)[[1]], do.plot=do.plot)
-    checkSingleGene(cs[[2]], gene=names(cs)[[2]], do.plot=do.plot)
-    ## - strand
-    cs <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")),
-                by="gene", columns=NULL)
-    checkSingleGene(cs[[1]], gene=names(cs)[[1]], do.plot=do.plot)
-    checkSingleGene(cs[[2]], gene=names(cs)[[2]], do.plot=do.plot)
-
-    ## looks good!
-    cs2 <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")),
-                by="gene", use.names=TRUE)
-}
-
-test_UTRs <- function() {
-    ## check presence of tx_name
-    fUTRs <- fiveUTRsByTranscript(DB,
-                                  filter = TxidFilter("ENST00000155093"),
-                                  column = "tx_name")
-    checkTrue(any(colnames(mcols(fUTRs[[1]])) == "tx_name"))
-
-    do.plot <- FALSE
-    fUTRs <- fiveUTRsByTranscript(DB, filter = list(SeqnameFilter("Y"),
-                                                    SeqstrandFilter("+")))
-    tUTRs <- threeUTRsByTranscript(DB, filter = list(SeqnameFilter("Y"),
-                                                     SeqstrandFilter("+")))
-    cds <- cdsBy(DB, "tx", filter = list(SeqnameFilter("Y"),
-                                         SeqstrandFilter("+")))
-    ## Check a TX:
-    tx <- names(fUTRs)[1]
-    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
-                  do.plot = do.plot)
-    tx <- names(fUTRs)[2]
-    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
-                  do.plot = do.plot)
-    tx <- names(fUTRs)[3]
-    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
-                  do.plot = do.plot)
-
-    ## Reverse strand
-    fUTRs <- fiveUTRsByTranscript(DB, filter = list(SeqnameFilter("Y"),
-                                                    SeqstrandFilter("-")))
-    tUTRs <- threeUTRsByTranscript(DB, filter = list(SeqnameFilter("Y"),
-                                                     SeqstrandFilter("-")))
-    cds <- cdsBy(DB, "tx", filter = list(SeqnameFilter("Y"),
-                                         SeqstrandFilter("-")))
-    ## Check a TX:
-    tx <- names(fUTRs)[1]
-    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
-                  do.plot = do.plot)
-    tx <- names(fUTRs)[2]
-    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
-                  do.plot = do.plot)
-    tx <- names(fUTRs)[3]
-    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
-                  do.plot = do.plot)
-}
-
-## "test_UTRs" performs very poorly with the RSQLite 1.0.9011
-## release candidate. Here we evaluate its performance.
-dontrun_test_UTRs_performance <- function() {
-    system.time(fUTRs <- fiveUTRsByTranscript(DB,
-                                              filter = list(SeqnameFilter("Y"),
-                                                            SeqstrandFilter("+")),
-                                              column = "tx_name")
-                )
-    ## 6.4 secs.
-    system.time(fUTRs <- fiveUTRsByTranscript(DB,
-                                              filter = list(SeqnameFilter("Y"),
-                                                            SeqstrandFilter("+"))))
-    ## 6.4 secs.
-    system.time(tUTRs <- threeUTRsByTranscript(DB,
-                                               filter = list(SeqnameFilter("Y"),
-                                                             SeqstrandFilter("+"))))
-    ## 6.3 secs.
-    system.time(cds <- cdsBy(DB, "tx", filter = list(SeqnameFilter("Y"),
-                                                     SeqstrandFilter("+"))))
-    ## 6.3 secs.
-    system.time(fUTRs <- fiveUTRsByTranscript(DB,
-                                              filter = list(SeqnameFilter("Y"),
-                                                            SeqstrandFilter("-"))))
-    ## 6.4 secs.
-    system.time(tUTRs <- threeUTRsByTranscript(DB,
-                                               filter = list(SeqnameFilter("Y"),
-                                                             SeqstrandFilter("-"))))
-    ## 6.6 secs.
-    system.time(cds <- cdsBy(DB, "tx", filter = list(SeqnameFilter("Y"),
-                                                     SeqstrandFilter("-"))))
-    ## 6.3 secs.
-}
-
-checkGeneUTRs <- function(f, t, c, tx, do.plot=FALSE){
-    if(any(strand(c) == "+")){
-        ## End of five UTR has to be smaller than any start of cds
-        checkTrue(max(end(f)) < min(start(c)))
-        ## 3'
-        checkTrue(min(start(t)) > max(end(c)))
-    }else{
-        ## 5'
-        checkTrue(min(start(f)) > max(end(c)))
-        ## 3'
-        checkTrue(max(end(t)) < min(start(c)))
-    }
-    ## just plot...
-    if(do.plot){
-        tx <- transcripts(DB, filter=TxidFilter(tx), columns=c("exon_seq_start", "exon_seq_end"),
-                          return.type="data.frame")
-        XL <- range(c(start(f), start(c), start(t), end(f), end(c), end(t)))
-        YL <- c(0, 4)
-        plot(4, 4, pch=NA, xlim=XL, ylim=YL, yaxt="n", ylab="", xlab="")
-        ## five UTR
-        rect(xleft=start(f), xright=end(f), ybottom=0.1, ytop=0.9, col="blue")
-        ## cds
-        rect(xleft=start(c), xright=end(c), ybottom=1.1, ytop=1.9)
-        ## three UTR
-        rect(xleft=start(t), xright=end(t), ybottom=2.1, ytop=2.9, col="red")
-        ## all exons
-        rect(xleft=tx$exon_seq_start, xright=tx$exon_seq_end, ybottom=3.1, ytop=3.9)
-    }
-}
-
-checkSingleGene <- function(whichCs, gene, do.plot=FALSE){
-    tx <- transcripts(DB, filter=GeneidFilter(gene),
-                      columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start", "tx_cds_seq_end", "tx_id",
-                                "exon_id", "exon_seq_start", "exon_seq_end"), return.type="data.frame")
-    XL <- range(tx[, c("tx_seq_start", "tx_seq_end")])
-    tx <- split(tx, f=tx$tx_id)
-    if(do.plot){
-        ##XL <- range(c(start(whichCs), end(whichCs)))
-        YL <- c(0, length(tx) + 1)
-        plot(4, 4, pch=NA, xlim=XL, ylim=YL, yaxt="n", ylab="", xlab="")
-        ## plot the transcripts
-        for(i in 1:length(tx)){
-            current <- tx[[i]]
-            rect(xleft=current$exon_seq_start, xright=current$exon_seq_end,
-                 ybottom=rep((i-1+0.1), nrow(current)), ytop=rep((i-0.1), nrow(current)))
-            ## coding:
-            rect(xleft=current$tx_cds_seq_start, xright=current$tx_cds_seq_end,
-                 ybottom=rep((i-1+0.1), nrow(current)), ytop=rep((i-0.1), nrow(current)),
-                 border="blue")
-        }
-        rect(xleft=start(whichCs), xright=end(whichCs), ybottom=rep(length(tx)+0.1, length(whichCs)),
-             ytop=rep(length(tx)+0.9, length(whichCs)), border="red")
-    }
-}
-
-checkSingleTx <- function(tx, cds, do.plot=FALSE){
-    rownames(tx) <- tx$exon_id
-    tx <- tx[cds$exon_id, ]
-    ## cds start and end have to be within the correct range.
-    checkTrue(all(start(cds) >= min(tx$tx_cds_seq_start)))
-    checkTrue(all(end(cds) <= max(tx$tx_cds_seq_end)))
-    ## For all except the first, the cds start has to equal exon_seq_start;
-    ## for all except the last, the cds end has to equal exon_seq_end.
-    checkTrue(all(start(cds)[-1] == tx$exon_seq_start[-1]))
-    checkTrue(all(end(cds)[-nrow(tx)] == tx$exon_seq_end[-nrow(tx)]))
-    ## just plotting the stuff...
-    if(do.plot){
-        XL <- range(tx[, c("exon_seq_start", "exon_seq_end")])
-        YL <- c(0, 4)
-        plot(3, 3, pch=NA, xlim=XL, ylim=YL, xlab="", yaxt="n", ylab="")
-        ## plotting the "real" exons:
-        rect(xleft=tx$exon_seq_start, xright=tx$exon_seq_end, ybottom=rep(0, nrow(tx)),
-             ytop=rep(1, nrow(tx)))
-        ## plotting the cds:
-        rect(xleft=start(cds), xright=end(cds), ybottom=rep(1.2, nrow(tx)),
-             ytop=rep(2.2, nrow(tx)), col="blue")
-    }
-}
-
-
-##*****************************************************************
-## Gviz stuff
-notrun_test_genetrack_df <- function(){
-    do.plot <- FALSE
-    if(do.plot){
-        library(Gviz)
-        options(ucscChromosomeNames=FALSE)
-        data(geneModels)
-        geneModels$chromosome <- 7
-        chr <- 7
-        start <- min(geneModels$start)
-        end <- max(geneModels$end)
-        myGeneModels <- getGeneRegionTrackForGviz(DB, chromosome=chr, start=start,
-                                                  end=end)
-        ## chromosome has to be the same....
-        gtrack <- GenomeAxisTrack()
-        gvizTrack <- GeneRegionTrack(geneModels, name="Gviz")
-        ensdbTrack <- GeneRegionTrack(myGeneModels, name="ensdb")
-        plotTracks(list(gtrack, gvizTrack, ensdbTrack))
-        plotTracks(list(gtrack, gvizTrack, ensdbTrack), from=26700000, to=26780000)
-        ## Looks very nice...
-    }
-    ## Put the stuff below into the vignette:
-    ## Next we get all lincRNAs on chromosome Y
-    Lncs <- getGeneRegionTrackForGviz(DB,
-                                      filter=list(SeqnameFilter("Y"),
-                                                  GenebiotypeFilter("lincRNA")))
-    Prots <- getGeneRegionTrackForGviz(DB,
-                                       filter=list(SeqnameFilter("Y"),
-                                                   GenebiotypeFilter("protein_coding")))
-    if(do.plot){
-        plotTracks(list(gtrack, GeneRegionTrack(Lncs, name="lincRNAs"),
-                        GeneRegionTrack(Prots, name="proteins")))
-        plotTracks(list(gtrack, GeneRegionTrack(Lncs, name="lincRNAs"),
-                        GeneRegionTrack(Prots, name="proteins")),
-                   from=5000000, to=7000000, transcriptAnnotation="symbol")
-    }
-    ## is that the same than:
-    TestL <- getGeneRegionTrackForGviz(DB,
-                                      filter=list(GenebiotypeFilter("lincRNA")),
-                                      chromosome="Y", start=5000000, end=7000000)
-    TestP <- getGeneRegionTrackForGviz(DB,
-                                      filter=list(GenebiotypeFilter("protein_coding")),
-                                      chromosome="Y", start=5000000, end=7000000)
-    if(do.plot){
-        plotTracks(list(gtrack, GeneRegionTrack(Lncs, name="lincRNAs"),
-                        GeneRegionTrack(Prots, name="proteins"),
-                        GeneRegionTrack(TestL, name="compareL"),
-                        GeneRegionTrack(TestP, name="compareP")),
-                   from=5000000, to=7000000, transcriptAnnotation="symbol")
-    }
-    checkTrue(all(TestL$exon %in% Lncs$exon))
-    checkTrue(all(TestP$exon %in% Prots$exon))
-    ## Crazy amazing stuff
-    ## system.time(
-    ##     All <- getGeneRegionTrackForGviz(DB)
-    ## )
-}
-
-####============================================================
-##  length stuff
-##
-####------------------------------------------------------------
-test_lengthOf <- function(){
-    system.time(
-        lenY <- lengthOf(DB, "tx", filter=SeqnameFilter("Y"))
-    )
-    ## Check what would happen if we do it ourselfs...
-    system.time(
-        lenY2 <- sum(width(reduce(exonsBy(DB, "tx", filter=SeqnameFilter("Y")))))
-    )
-    checkEquals(lenY, lenY2)
-    ## Same for genes.
-    system.time(
-        lenY <- lengthOf(DB, "gene", filter=SeqnameFilter("Y"))
-    )
-    ## Check what would happen if we do it ourselfs...
-    system.time(
-        lenY2 <- sum(width(reduce(exonsBy(DB, "gene", filter=SeqnameFilter("Y")))))
-    )
-    checkEquals(lenY, lenY2)
-    ## Just using the transcriptLengths
-
-
-}
-
-####============================================================
-##  ExonrankFilter
-##
-####------------------------------------------------------------
-test_ExonrankFilter <- function(){
-    txs <- transcripts(DB, columns=c("exon_id", "exon_idx"),
-                       filter=SeqnameFilter(c("Y")))
-    txs <- txs[order(names(txs))]
-
-    txs2 <- transcripts(DB, columns=c("exon_id"),
-                        filter=list(SeqnameFilter(c("Y")),
-                                    ExonrankFilter(3)))
-    txs2 <- txs[order(names(txs2))]
-    ## hm, that's weird somehow.
-    exns <- exons(DB, columns=c("tx_id", "exon_idx"),
-                  filter=list(SeqnameFilter("Y"),
-                              ExonrankFilter(3)))
-    checkTrue(all(exns$exon_idx == 3))
-    exns <- exons(DB, columns=c("tx_id", "exon_idx"),
-                  filter=list(SeqnameFilter("Y"),
-                              ExonrankFilter(3, condition="<")))
-    checkTrue(all(exns$exon_idx < 3))
-}
-
-
-notrun_lengthOf <- function(){
-    ## How does TxDb do that?s
-    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
-    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
-    Test <- transcriptLengths(txdb)
-    head(Test)
-}
-
-
-
-
diff --git a/inst/unitTests/test_GFF.R b/inst/unitTests/test_GFF.R
deleted file mode 100644
index d9ef256..0000000
--- a/inst/unitTests/test_GFF.R
+++ /dev/null
@@ -1,179 +0,0 @@
-notrun_test_builds <- function(){
-    input <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gtf.gz"
-    fromGtf <- ensDbFromGtf(input, outfile=tempfile())
-    ## provide wrong ensembl version
-    fromGtf <- ensDbFromGtf(input, outfile=tempfile(), version="75")
-    ## provide wrong genome version
-    fromGtf <- ensDbFromGtf(input, outfile=tempfile(), genomeVersion="75")
-    EnsDb(fromGtf)
-    ## provide wrong organism
-    fromGtf <- ensDbFromGtf(input, outfile=tempfile(), organism="blalba")
-    EnsDb(fromGtf)
-    ## GFF
-    input <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.chr.gff3.gz"
-    fromGff <- ensDbFromGff(input, outfile=tempfile())
-    EnsDb(fromGff)
-    fromGff <- ensDbFromGff(input, outfile=tempfile(), version="75")
-    EnsDb(fromGff)
-    fromGff <- ensDbFromGff(input, outfile=tempfile(), genomeVersion="bla")
-    EnsDb(fromGff)
-    fromGff <- ensDbFromGff(input, outfile=tempfile(), organism="blabla")
-    EnsDb(fromGff)
-
-    ## AH
-    library(AnnotationHub)
-    ah <- AnnotationHub()
-    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile())
-    EnsDb(fromAH)
-    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile(), version="75")
-    EnsDb(fromAH)
-    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile(), genomeVersion="bla")
-    EnsDb(fromAH)
-    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile(), organism="blabla")
-    EnsDb(fromAH)
-}
-
-
-
-notrun_test_ensdbFromGFF <- function(){
-    library(ensembldb)
-    ##library(rtracklayer)
-    ## VERSION 83
-    gtf <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gtf.gz"
-    fromGtf <- ensDbFromGtf(gtf, outfile=tempfile())
-    egtf <- EnsDb(fromGtf)
-
-    gff <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gff3.gz"
-    fromGff <- ensDbFromGff(gff, outfile=tempfile())
-    egff <- EnsDb(fromGff)
-
-    ## Compare EnsDbs
-    ensembldb:::compareEnsDbs(egtf, egff)
-    ## OK, only Entrezgene ID "problems"
-
-    ## Compare with the one built with the Perl API
-    library(EnsDb.Hsapiens.v83)
-    edb <- EnsDb.Hsapiens.v83
-
-    ensembldb:::compareEnsDbs(egtf, edb)
-
-    ensembldb:::compareEnsDbs(egff, edb)
-    ## OK, I get different genes...
-    genes1 <- genes(egtf)
-    genes2 <- genes(edb)
-
-    only2 <- genes2[!(genes2$gene_id %in% genes1$gene_id)]
-
-    ## That below was before the fix to include feature type start_codon and stop_codon
-    ## to the CDS type.
-    ## Identify which are the different transcripts:
-    txGtf <- transcripts(egtf)
-    txGff <- transcripts(egff)
-    commonIds <- intersect(names(txGtf), names(txGff))
-    haveCds <- commonIds[!is.na(txGtf[commonIds]$tx_cds_seq_start) & !is.na(txGff[commonIds]$tx_cds_seq_start)]
-    diffs <- haveCds[txGtf[haveCds]$tx_cds_seq_start != txGff[haveCds]$tx_cds_seq_start]
-    head(diffs)
-
-    ## What could be reasons?
-    ## 1) alternative CDS?
-    ## Checking the GTF:
-    ## tx ENST00000623834: start_codon: 195409 195411.
-    ##                     first CDS: 195259 195411.
-    ##                     last CDS: 185220 185350.
-    ##                     stop_codon: 185217 185219.
-    ## So, why the heck is the stop codon OUTSIDE the CDS???
-    ## library(rtracklayer)
-    ## theGtf <- import(gtf, format="gtf")
-    ## ## Apparently, the GTF contains the additional elements start_codon/stop_codon.
-    ## theGff <- import(gff, format="gff3")
-
-
-    ## transcripts(egtf, filter=TxidFilter(diffs[1]))
-    ## transcripts(egff, filter=TxidFilter(diffs[1]))
-
-
-    ## VERSION 81
-    ## Try to get the same via AnnotationHub
-    gff <- "/Users/jo/Projects/EnsDbs/81/homo_sapiens/Homo_sapiens.GRCh38.81.gff3.gz"
-    fromGff <- ensDbFromGff(gff, outfile=tempfile())
-    egff <- EnsDb(fromGff)
-
-    gtf <- "/Users/jo/Projects/EnsDbs/81/homo_sapiens/Homo_sapiens.GRCh38.81.gtf.gz"
-    fromGtf <- ensDbFromGtf(gtf, outfile=tempfile())
-    egtf <- EnsDb(fromGtf)
-
-    ## Compare those two:
-    ensembldb:::compareEnsDbs(egff, egtf)
-    ## Why are there some differences in the transcripts???
-    trans1 <- transcripts(egff)
-    trans2 <- transcripts(egtf)
-    onlyInGtf <- trans2[!(trans2$tx_id %in% trans1$tx_id)]
-
-    ##gtfGRanges <- ah["AH47963"]
-
-    library(AnnotationHub)
-    ah <- AnnotationHub()
-    fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile())  ## That's human...
-    eah <- EnsDb(fromAh)
-
-    ## Compare it to gtf:
-    ensembldb:::compareEnsDbs(eah, egtf)
-    ## OK. Same cds starts and cds ends.
-
-    ## Compare it to gff:
-    ensembldb:::compareEnsDbs(eah, egff)
-    ## hm.
-
-    ## Compare to EnsDb
-    library(EnsDb.Hsapiens.v81)
-    edb <- EnsDb.Hsapiens.v81
-    ensembldb:::compareEnsDbs(edb, egtf)
-    ## Problem with CDS
-    ensembldb:::compareEnsDbs(edb, egff)
-    ## That's fine.
-
-    ## Summary:
-    ## GTF and AH are the same.
-    ## GFF and Perl API are the same.
-
-    ## OLD STUFF BELOW.
-
-    ##fromAh <- EnsDbFromAH(ah["AH47963"], outfile=tempfile(), organism="Homo sapiens", version=81)
-
-    ## Try with a fancy species:
-    gff <- "/Users/jo/Projects/EnsDbs/83/gadus_morhua/Gadus_morhua.gadMor1.83.gff3.gz"
-    fromGtf <- ensDbFromGff(gff, outfile=tempfile())
-
-    gff <- "/Users/jo/Projects/EnsDbs/83/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.83.gff3.gz"
-    fromGff <- ensDbFromGff(gff, outfile=tempfile())
-    ## That works.
-
-    ## Try with a file from AnnotationHub: Gorilla gorilla.
-    library(AnnotationHub)
-    ah <- AnnotationHub()
-    ah <- ah["AH47962"]
-
-    res <- ensDbFromAH(ah, outfile=tempfile())
-    edb <- EnsDb(res)
-    genes(edb)
-
-
-    ## ensRel <- query(ah, c("GTF", "ensembl"))
-
-    ## gtf <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gtf.gz"
-    ## ## GTF
-    ## dir.create("/tmp/fromGtf")
-    ## fromGtf <- ensDbFromGtf(gtf, path="/tmp/fromGtf", verbose=TRUE)
-    ## ## GFF
-    ## dir.create("/tmp/fromGff")
-    ## fromGff <- ensembldb:::ensDbFromGff(gff, path="/tmp/fromGff", verbose=TRUE)
-
-    ## ## ZBTB16:
-    ## ## exon: ENSE00003606532 is 3rd exon of tx: ENST00000335953
-    ## ## exon: ENSE00003606532 is 3rd exon of tx: ENST00000392996
-    ## ## the Ensembl GFF has 2 entries for this exon.
-
-}
-
-
-
diff --git a/inst/unitTests/test_GRangeFilter.R b/inst/unitTests/test_GRangeFilter.R
deleted file mode 100644
index 684aa91..0000000
--- a/inst/unitTests/test_GRangeFilter.R
+++ /dev/null
@@ -1,102 +0,0 @@
-###============================================================
-##  Testing the GRangesFilter
-###------------------------------------------------------------
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-test_GRangesFilterValidity <- function(){
-    checkException(GRangesFilter(value="bla"))
-    checkException(GRangesFilter(GRanges(seqnames="X", ranges=IRanges(4, 6)),
-                                 condition=">"))
-    ## Testing slots
-    gr <- GRanges("X", ranges=IRanges(123, 234), strand="-")
-    grf <- GRangesFilter(gr, condition="within")
-    ## Now check some stuff
-    checkEquals(start(grf), start(gr))
-    checkEquals(end(grf), end(gr))
-    checkEquals(as.character(strand(gr)), strand(grf))
-    checkEquals(as.character(seqnames(gr)), seqnames(grf))
-
-    ## Test column:
-    ## filter alone.
-    tocomp <- c(start="gene_seq_start", end="gene_seq_end", seqname="seq_name",
-                strand="seq_strand")
-    checkEquals(column(grf), tocomp)
-    grf@feature <- "tx"
-    tocomp <- c(start="tx_seq_start", end="tx_seq_end", seqname="seq_name",
-                strand="seq_strand")
-    checkEquals(column(grf), tocomp)
-    grf@feature <- "exon"
-    tocomp <- c(start="exon_seq_start", end="exon_seq_end", seqname="seq_name",
-                strand="seq_strand")
-    checkEquals(column(grf), tocomp)
-    ## filter and ensdb.
-    tocomp <- c(start="exon.exon_seq_start", end="exon.exon_seq_end", seqname="gene.seq_name",
-                strand="gene.seq_strand")
-    checkEquals(column(grf, edb), tocomp)
-    grf@feature <- "tx"
-    tocomp <- c(start="tx.tx_seq_start", end="tx.tx_seq_end", seqname="gene.seq_name",
-                strand="gene.seq_strand")
-    checkEquals(column(grf, edb), tocomp)
-    grf@feature <- "gene"
-    tocomp <- c(start="gene.gene_seq_start", end="gene.gene_seq_end", seqname="gene.seq_name",
-                strand="gene.seq_strand")
-    checkEquals(column(grf, edb), tocomp)
-
-    ## Test where:
-    ## filter alone.
-    tocomp <- "gene_seq_start >= 123 and gene_seq_end <= 234 and seq_name == 'X' and seq_strand = -1"
-    checkEquals(where(grf), tocomp)
-    ## what if we set strand to *
-    grf2 <- GRangesFilter(GRanges("1", IRanges(123, 234)))
-    tocomp <- "gene.gene_seq_start >= 123 and gene.gene_seq_end <= 234 and gene.seq_name == '1'"
-    checkEquals(where(grf2, edb), tocomp)
-
-    ## Now, using overlapping.
-    grf@location <- "overlapping"
-    grf@feature <- "transcript"
-    tocomp <- "tx.tx_seq_start <= 234 and tx.tx_seq_end >= 123 and gene.seq_name = 'X' and gene.seq_strand = -1"
-    checkEquals(where(grf, edb), tocomp)
-}
-
-## Here we check if we fetch what we expect from the database.
-test_GRangesFilterQuery <- function(){
-    do.plot <- FALSE
-    zbtb <- genes(edb, filter=GenenameFilter("ZBTB16"))
-    txs <- transcripts(edb, filter=GenenameFilter("ZBTB16"))
-
-    ## Now use the GRangesFilter to fetch all tx
-    txs2 <- transcripts(edb, filter=GRangesFilter(zbtb))
-    checkEquals(txs$tx_id, txs2$tx_id)
-
-    ## Exons:
-    exs <- exons(edb, filter=GenenameFilter("ZBTB16"))
-    exs2 <- exons(edb, filter=GRangesFilter(zbtb))
-    checkEquals(exs$exon_id, exs2$exon_id)
-
-    ## Now check the filter with "overlapping".
-    intr <- GRanges("11", ranges=IRanges(114000000, 114000050), strand="+")
-    gns <- genes(edb, filter=GRangesFilter(intr, condition="overlapping"))
-    checkEquals(gns$gene_name, "ZBTB16")
-
-    txs <- transcripts(edb, filter=GRangesFilter(intr, condition="overlapping"))
-    if(do.plot){
-        plot(3, 3, pch=NA, xlim=c(start(zbtb), end(zbtb)), ylim=c(0, length(txs2)))
-        rect(xleft=start(intr), xright=end(intr), ybottom=0, ytop=length(txs2), col="red", border="red")
-        for(i in 1:length(txs2)){
-            current <- txs2[i]
-            rect(xleft=start(current), xright=end(current), ybottom=i-0.975, ytop=i-0.125, border="grey")
-            text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
-        }
-        ## OK, that' OK.
-    }
-
-    ## OK, now for a GRangesFilter with more than one GRanges.
-    ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
-                   end=c(2654900, 2709550, 28111790))
-    grf2 <- GRangesFilter(GRanges(rep("Y", length(ir2)), ir2), condition="overlapping")
-    Test <- transcripts(edb, filter=grf2)
-    checkEquals(names(Test), c("ENST00000383070", "ENST00000250784", "ENST00000598545"))
-
-}
-
diff --git a/inst/unitTests/test_SymbolFilter.R b/inst/unitTests/test_SymbolFilter.R
deleted file mode 100644
index a29b4ec..0000000
--- a/inst/unitTests/test_SymbolFilter.R
+++ /dev/null
@@ -1,58 +0,0 @@
-############################################################
-## Testing the SymbolFilter.
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-test_sf_on_genes <- function(){
-    sf <- SymbolFilter("SKA2")
-    gnf <- GenenameFilter("SKA2")
-
-    returnFilterColumns(edb) <- FALSE
-    gns_sf <- genes(edb, filter=sf)
-    gns_gnf <- genes(edb, filter=gnf)
-    checkEquals(gns_sf, gns_gnf)
-
-    returnFilterColumns(edb) <- TRUE
-    gns_sf <- genes(edb, filter=sf)
-    checkEquals(gns_sf$gene_name, gns_sf$symbol)
-
-    ## Hm, what happens if we use both?
-    gns <- genes(edb, filter=list(sf, gnf))
-    ## All fine.
-}
-
-
-test_sf_on_tx <- function(){
-    sf <- SymbolFilter("SKA2")
-    gnf <- GenenameFilter("SKA2")
-
-    returnFilterColumns(edb) <- FALSE
-    tx_sf <- transcripts(edb, filter=sf)
-    tx_gnf <- transcripts(edb, filter=gnf)
-    checkEquals(tx_sf, tx_gnf)
-
-    returnFilterColumns(edb) <- TRUE
-    tx_sf <- transcripts(edb, filter=sf, columns=c("gene_name"))
-    checkEquals(tx_sf$gene_name, tx_sf$symbol)
-
-}
-
-
-test_sf_on_exons <- function(){
-    sf <- SymbolFilter("SKA2")
-    gnf <- GenenameFilter("SKA2")
-
-    returnFilterColumns(edb) <- FALSE
-    ex_sf <- exons(edb, filter=sf)
-    ex_gnf <- exons(edb, filter=gnf)
-    checkEquals(ex_sf, ex_gnf)
-
-    returnFilterColumns(edb) <- TRUE
-    ex_sf <- exons(edb, filter=sf, columns=c("gene_name"))
-    checkEquals(ex_sf$gene_name, ex_sf$symbol)
-}
-
-
-############################################################
-##   select method
-
diff --git a/inst/unitTests/test_buildEdb.R b/inst/unitTests/test_buildEdb.R
deleted file mode 100644
index c45b09f..0000000
--- a/inst/unitTests/test_buildEdb.R
+++ /dev/null
@@ -1,45 +0,0 @@
-test_ensDbFromGRanges <- function(){
-    load(system.file("YGRanges.RData", package="ensembldb"))
-    DB <- ensDbFromGRanges(Y, path=tempdir(), version=75,
-                           organism="Homo_sapiens")
-    edb <- EnsDb(DB)
-    checkEquals(unname(genome(edb)), "GRCh37")
-}
-
-
-## Test some internal functions...
-test_processEnsemblFileNames <- function(){
-    Test <- "Homo_sapiens.GRCh38.83.gtf.gz"
-    checkTrue(ensembldb:::isEnsemblFileName(Test))
-    checkEquals(ensembldb:::organismFromGtfFileName(Test), "Homo_sapiens")
-    checkEquals(ensembldb:::genomeVersionFromGtfFileName(Test), "GRCh38")
-    checkEquals(ensembldb:::ensemblVersionFromGtfFileName(Test), "83")
-
-    Test <- "Homo_sapiens.GRCh38.83.chr.gff3.gz"
-    checkTrue(ensembldb:::isEnsemblFileName(Test))
-    checkEquals(ensembldb:::organismFromGtfFileName(Test), "Homo_sapiens")
-    checkEquals(ensembldb:::genomeVersionFromGtfFileName(Test), "GRCh38")
-    checkEquals(ensembldb:::ensemblVersionFromGtfFileName(Test), "83")
-
-    Test <- "Gadus_morhua.gadMor1.83.gff3.gz"
-    checkTrue(ensembldb:::isEnsemblFileName(Test))
-    checkEquals(ensembldb:::organismFromGtfFileName(Test), "Gadus_morhua")
-    checkEquals(ensembldb:::genomeVersionFromGtfFileName(Test), "gadMor1")
-    checkEquals(ensembldb:::ensemblVersionFromGtfFileName(Test), "83")
-
-    Test <- "Solanum_lycopersicum.GCA_000188115.2.30.chr.gtf.gz"
-    checkTrue(ensembldb:::isEnsemblFileName(Test))
-    checkEquals(ensembldb:::organismFromGtfFileName(Test), "Solanum_lycopersicum")
-    checkEquals(ensembldb:::genomeVersionFromGtfFileName(Test), "GCA_000188115.2")
-    checkEquals(ensembldb:::ensemblVersionFromGtfFileName(Test), "30")
-
-    Test <- "ref_GRCh38.p2_top_level.gff3.gz"
-    checkEquals(ensembldb:::isEnsemblFileName(Test), FALSE)
-    ensembldb:::organismFromGtfFileName(Test)
-    checkException(ensembldb:::genomeVersionFromGtfFileName(Test))
-    ##checkException(ensembldb:::ensemblVersionFromGtfFileName(Test))
-}
-
-
-
-
diff --git a/inst/unitTests/test_getGenomeFaFile.R b/inst/unitTests/test_getGenomeFaFile.R
deleted file mode 100644
index 2dbd0b6..0000000
--- a/inst/unitTests/test_getGenomeFaFile.R
+++ /dev/null
@@ -1,49 +0,0 @@
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-notrun_test_getGenomeFaFile <- function(){
-    library(EnsDb.Hsapiens.v82)
-    edb <- EnsDb.Hsapiens.v82
-
-    ## We know that there is no Fasta file for that Ensembl release available.
-    Fa <- getGenomeFaFile(edb)
-    ## Got the one from Ensembl 81.
-    genes <- genes(edb, filter=SeqnameFilter("Y"))
-    geneSeqsFa <- getSeq(Fa, genes)
-    ## Get the transcript sequences...
-    txSeqsFa <- extractTranscriptSeqs(Fa, edb, filter=SeqnameFilter("Y"))
-
-    ## Get the TwoBitFile.
-    twob <- ensembldb:::getGenomeTwoBitFile(edb)
-    ## Get thegene sequences.
-    ## ERROR FIX BELOW WITH UPDATED VERSIONS!!!
-    geneSeqs2b <- getSeq(twob, genes)
-
-    ## Have to fix the seqnames.
-    si <- seqinfo(twob)
-    sn <- unlist(lapply(strsplit(seqnames(si), split=" ", fixed=TRUE), function(z){
-        return(z[1])
-    }))
-    seqnames(si) <- sn
-    seqinfo(twob) <- si
-
-    ## Do the same with the TwoBitFile
-    geneSeqsTB <- getSeq(twob, genes)
-
-    ## Subset to all genes that are encoded on chromosomes for which
-    ## we do have DNA sequence available.
-    genes <- genes[seqnames(genes) %in% seqnames(seqinfo(Dna))]
-
-    ## Get the gene sequences, i.e. the sequence including the sequence of
-    ## all of the gene's exons and introns.
-    geneSeqs <- getSeq(Dna, genes)
-
-    library(AnnotationHub)
-    ah <- AnnotationHub()
-    quer <- query(ah, c("release-", "Homo sapiens"))
-    ## So, I get 2bit files and toplevel stuff.
-    Test <- ah[["AH50068"]]
-
-}
-
-
diff --git a/inst/unitTests/test_get_sequence.R b/inst/unitTests/test_get_sequence.R
deleted file mode 100644
index 801bf65..0000000
--- a/inst/unitTests/test_get_sequence.R
+++ /dev/null
@@ -1,189 +0,0 @@
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-## That's now using the BSGenome package...
-test_extractTranscriptSeqs_with_BSGenome <- function(){
-    library(BSgenome.Hsapiens.UCSC.hg19)
-    bsg <- BSgenome.Hsapiens.UCSC.hg19
-
-    ## Changing the seqlevels tyle to UCSC
-    seqlevelsStyle(edb) <- "UCSC"
-    ZBTB <- extractTranscriptSeqs(bsg, edb, filter=GenenameFilter("ZBTB16"))
-    ## Load the sequences for one ZBTB16 transcript from FA.
-    faf <- system.file("txt/ENST00000335953.fa.gz", package="ensembldb")
-    Seqs <- readDNAStringSet(faf)
-    tx <- "ENST00000335953"
-    ## cDNA
-    checkEquals(unname(as.character(ZBTB[tx])),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
-    ## CDS
-    cBy <- cdsBy(edb, "tx", filter=TxidFilter(tx))
-    CDS <- extractTranscriptSeqs(bsg, cBy)
-    checkEquals(unname(as.character(CDS)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
-    ## 5' UTR
-    fBy <- fiveUTRsByTranscript(edb, filter=TxidFilter(tx))
-    UTR <- extractTranscriptSeqs(bsg, fBy)
-    checkEquals(unname(as.character(UTR)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
-    ## 3' UTR
-    tBy <- threeUTRsByTranscript(edb, filter=TxidFilter(tx))
-    UTR <- extractTranscriptSeqs(bsg, tBy)
-    checkEquals(unname(as.character(UTR)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
-
-
-    ## Another gene on the reverse strand:
-    faf <- system.file("txt/ENST00000200135.fa.gz", package="ensembldb")
-    Seqs <- readDNAStringSet(faf)
-    tx <- "ENST00000200135"
-    ## cDNA
-    cDNA <- extractTranscriptSeqs(bsg, edb, filter=TxidFilter(tx))
-    checkEquals(unname(as.character(cDNA)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
-    ## do the same, but from other strand
-    exns <- exonsBy(edb, "tx", filter=TxidFilter(tx))
-    cDNA <- extractTranscriptSeqs(bsg, exns)
-    checkEquals(unname(as.character(cDNA)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
-    strand(exns) <- "+"
-    cDNA <- extractTranscriptSeqs(bsg, exns)
-    checkTrue(unname(as.character(cDNA)) !=
-              unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
-    ## CDS
-    cBy <- cdsBy(edb, "tx", filter=TxidFilter(tx))
-    CDS <- extractTranscriptSeqs(bsg, cBy)
-    checkEquals(unname(as.character(CDS)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
-    ## 5' UTR
-    fBy <- fiveUTRsByTranscript(edb, filter=TxidFilter(tx))
-    UTR <- extractTranscriptSeqs(bsg, fBy)
-    checkEquals(unname(as.character(UTR)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
-    ## 3' UTR
-    tBy <- threeUTRsByTranscript(edb, filter=TxidFilter(tx))
-    UTR <- extractTranscriptSeqs(bsg, tBy)
-    checkEquals(unname(as.character(UTR)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
-}
-
-
-notrun_test_extractTranscriptSeqs <- function(){
-    ## Note: we can't run that by default as we can not assume everybody has
-    ## AnnotationHub and the required ressource installed.
-    ## That's how we want to test the transcript seqs.
-    genome <- getGenomeFaFile(edb)
-    ZBTB <- extractTranscriptSeqs(genome, edb, filter=GenenameFilter("ZBTB16"))
-    ## Load the sequences for one ZBTB16 transcript from FA.
-    faf <- system.file("txt/ENST00000335953.fa.gz", package="ensembldb")
-    Seqs <- readDNAStringSet(faf)
-    tx <- "ENST00000335953"
-    ## cDNA
-    checkEquals(unname(as.character(ZBTB[tx])),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
-    ## CDS
-    cBy <- cdsBy(edb, "tx", filter=TxidFilter(tx))
-    CDS <- extractTranscriptSeqs(genome, cBy)
-    checkEquals(unname(as.character(CDS)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
-    ## 5' UTR
-    fBy <- fiveUTRsByTranscript(edb, filter=TxidFilter(tx))
-    UTR <- extractTranscriptSeqs(genome, fBy)
-    checkEquals(unname(as.character(UTR)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
-    ## 3' UTR
-    tBy <- threeUTRsByTranscript(edb, filter=TxidFilter(tx))
-    UTR <- extractTranscriptSeqs(genome, tBy)
-    checkEquals(unname(as.character(UTR)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
-
-
-    ## Another gene on the reverse strand:
-    faf <- system.file("txt/ENST00000200135.fa.gz", package="ensembldb")
-    Seqs <- readDNAStringSet(faf)
-    tx <- "ENST00000200135"
-    ## cDNA
-    cDNA <- extractTranscriptSeqs(genome, edb, filter=TxidFilter(tx))
-    checkEquals(unname(as.character(cDNA)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
-    ## do the same, but from other strand
-    exns <- exonsBy(edb, "tx", filter=TxidFilter(tx))
-    cDNA <- extractTranscriptSeqs(genome, exns)
-    checkEquals(unname(as.character(cDNA)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
-    strand(exns) <- "+"
-    cDNA <- extractTranscriptSeqs(genome, exns)
-    checkTrue(unname(as.character(cDNA)) !=
-              unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
-    ## CDS
-    cBy <- cdsBy(edb, "tx", filter=TxidFilter(tx))
-    CDS <- extractTranscriptSeqs(genome, cBy)
-    checkEquals(unname(as.character(CDS)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
-    ## 5' UTR
-    fBy <- fiveUTRsByTranscript(edb, filter=TxidFilter(tx))
-    UTR <- extractTranscriptSeqs(genome, fBy)
-    checkEquals(unname(as.character(UTR)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
-    ## 3' UTR
-    tBy <- threeUTRsByTranscript(edb, filter=TxidFilter(tx))
-    UTR <- extractTranscriptSeqs(genome, tBy)
-    checkEquals(unname(as.character(UTR)),
-                unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
-}
-
-notrun_test_getCdsSequence <- function(){
-    ## That's when we like to get the sequence from the coding region.
-    genome <- getGenomeFaFile(edb)
-    tx <- extractTranscriptSeqs(genome, edb, filter=SeqnameFilter("Y"))
-    cdsSeq <- extractTranscriptSeqs(genome, cdsBy(edb, filter=SeqnameFilter("Y")))
-    ## that's basically to get the CDS sequence.
-    ## UTR sequence:
-    tutr <- extractTranscriptSeqs(genome, threeUTRsByTranscript(edb, filter=SeqnameFilter("Y")))
-    futr <- extractTranscriptSeqs(genome, fiveUTRsByTranscript(edb, filter=SeqnameFilter("Y")))
-    theTx <- "ENST00000602770"
-    fullSeq <- as.character(tx[theTx])
-    ## build the one from 5', cds and 3'
-    compSeq <- ""
-    if(any(names(futr) == theTx))
-        compSeq <- paste0(compSeq, as.character(futr[theTx]))
-    if(any(names(cdsSeq) == theTx))
-        compSeq <- paste0(compSeq, as.character(cdsSeq[theTx]))
-    if(any(names(tutr) == theTx))
-        compSeq <- paste(compSeq, as.character(tutr[theTx]))
-    checkEquals(unname(fullSeq), compSeq)
-}
-
-notrun_test_cds <- function(){
-    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
-    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
-    cds <- cds(txdb)
-    cby <- cdsBy(txdb, by="tx")
-
-    gr <- cby[[7]][1]
-    seqlevels(gr) <- sub(seqlevels(gr), pattern="chr", replacement="")
-    tx <- transcripts(edb, filter=GRangesFilter(gr, condition="overlapping"))
-    cby[[7]]
-
-    ## Note: so that fits! And we've to include the stop_codon feature for GTF import!
-    ## Make an TxDb from GTF:
-    gtf <- "/Users/jo/Projects/EnsDbs/75/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz"
-    library(GenomicFeatures)
-    Test <- makeTxDbFromGFF(gtf, format="gtf", organism="Homo sapiens")
-    scds <- cdsBy(Test, by="tx")
-    gr <- scds[[7]][1]
-    tx <- transcripts(edb, filter=GRangesFilter(gr, condition="overlapping"))
-    scds[[7]]
-    ## Compare:
-    ## TxDb form GTF has: 865692 879533
-    ## EnsDb: 865692 879533
-
-    ## Next test:
-    gr <- scds[[2]][1]
-    tx <- transcripts(edb, filter=GRangesFilter(gr, condition="overlapping"))
-    tx
-    scds[[2]]
-    ## start_codon: 367659 367661, stop_codon: 368595 368597 CDS: 367659 368594.
-    ## TxDb from GTF includes the stop_codon!
-}
-
diff --git a/inst/unitTests/test_mysql.R b/inst/unitTests/test_mysql.R
deleted file mode 100644
index e1ba213..0000000
--- a/inst/unitTests/test_mysql.R
+++ /dev/null
@@ -1,24 +0,0 @@
-############################################################
-## Can not perform these tests right away, as they require a
-## working MySQL connection.
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-dontrun_test_useMySQL <- function() {
-    edb_mysql <- useMySQL(edb, user = "anonuser", host = "localhost", pass = "")
-}
-
-dontrun_test_connect_EnsDb <- function() {
-    library(RMySQL)
-    con <- dbConnect(MySQL(), user = "anonuser", host = "localhost", pass = "")
-
-    ensembldb:::listEnsDbs(dbcon = con)
-    ## just with user.
-    ensembldb:::listEnsDbs(user = "anonuser", host = "localhost", pass = "",
-                           port = 3306)
-
-    ## Connecting directly to a EnsDb MySQL database.
-    con <- dbConnect(MySQL(), user = "anonuser", host = "localhost", pass = "",
-                     dbname = "ensdb_hsapiens_v75")
-    edb_mysql <- EnsDb(con)
-}
diff --git a/inst/unitTests/test_ordering.R b/inst/unitTests/test_ordering.R
deleted file mode 100644
index 2e4f0b4..0000000
--- a/inst/unitTests/test_ordering.R
+++ /dev/null
@@ -1,280 +0,0 @@
-############################################################
-## Some tests on the ordering/sorting of the results.
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-## Compare the results for genes call with and without ordering in R
-test_ordering_genes <- function() {
-    orig <- ensembldb:::orderResultsInR(edb)
-    ensembldb:::orderResultsInR(edb) <- FALSE
-    res_sql <- genes(edb, return.type = "data.frame")
-    ensembldb:::orderResultsInR(edb) <- TRUE
-    res_r <- genes(edb, return.type = "data.frame")
-    rownames(res_sql) <- NULL
-    rownames(res_r) <- NULL
-    checkIdentical(res_sql, res_r)
-    ## Join tx table
-    ensembldb:::orderResultsInR(edb) <- FALSE
-    res_sql <- genes(edb, columns = c("gene_id", "tx_id"),
-                     return.type = "data.frame")
-    ensembldb:::orderResultsInR(edb) <- TRUE
-    res_r <- genes(edb, columns = c("gene_id", "tx_id"),
-                   return.type = "data.frame")
-    rownames(res_sql) <- NULL
-    rownames(res_r) <- NULL
-    checkIdentical(res_sql, res_r)
-    ## Join tx table and use an SeqnameFilter
-    ensembldb:::orderResultsInR(edb) <- FALSE
-    res_sql <- genes(edb, columns = c("gene_id", "tx_id"),
-                     filter = SeqnameFilter("Y"))
-    ensembldb:::orderResultsInR(edb) <- TRUE
-    res_r <- genes(edb, columns = c("gene_id", "tx_id"),
-                   filter = SeqnameFilter("Y"))
-    checkIdentical(res_sql, res_r)
-
-    ensembldb:::orderResultsInR(edb) <- orig
-}
-
-dontrun_benchmark_ordering_genes <- function() {
-    .withR <- function(x, ...) {
-        ensembldb:::orderResultsInR(x) <- TRUE
-        genes(x, ...)
-    }
-    .withSQL <- function(x, ...) {
-        ensembldb:::orderResultsInR(x) <- FALSE
-        genes(x, ...)
-    }
-    library(microbenchmark)
-    microbenchmark(.withR(edb), .withSQL(edb), times = 10)  ## same
-    microbenchmark(.withR(edb, columns = c("gene_id", "tx_id")),
-                   .withSQL(edb, columns = c("gene_id", "tx_id")),
-                   times = 10)  ## R slightly faster.
-    microbenchmark(.withR(edb, columns = c("gene_id", "tx_id"),
-                          SeqnameFilter("Y")),
-                   .withSQL(edb, columns = c("gene_id", "tx_id"),
-                            SeqnameFilter("Y")),
-                   times = 10)  ## same.
-}
-
-## We aim to fix issue #11 by performing the ordering in R instead
-## of SQL. Thus, we don't want to run this as a "regular" test
-## case.
-dontrun_test_ordering_cdsBy <- function() {
-    doBench <- FALSE
-    if (doBench)
-        library(microbenchmark)
-    .withR <- function(x, ...) {
-        ensembldb:::orderResultsInR(x) <- TRUE
-        cdsBy(x, ...)
-    }
-    .withSQL <- function(x, ...) {
-        ensembldb:::orderResultsInR(x) <- FALSE
-        cdsBy(x, ...)
-    }
-    res_sql <- .withSQL(edb)
-    res_r <- .withR(edb)
-    checkEquals(res_sql, res_r)
-    if (doBench)
-        microbenchmark(.withSQL(edb), .withR(edb),
-                       times = 3)  ## R slightly faster.
-    res_sql <- .withSQL(edb, filter = SeqnameFilter("Y"))
-    res_r <- .withR(edb, filter = SeqnameFilter("Y"))
-    checkEquals(res_sql, res_r)
-    if (doBench)
-        microbenchmark(.withSQL(edb, filter = SeqnameFilter("Y")),
-                       .withR(edb, filter = SeqnameFilter("Y")),
-                       times = 10)  ## R 6x faster.
-}
-
-dontrun_test_ordering_exonsBy <- function() {
-    doBench <- FALSE
-    if (doBench)
-        library(microbenchmark)
-    .withR <- function(x, ...) {
-        ensembldb:::orderResultsInR(x) <- TRUE
-        exonsBy(x, ...)
-    }
-    .withSQL <- function(x, ...) {
-        ensembldb:::orderResultsInR(x) <- FALSE
-        exonsBy(x, ...)
-    }
-    res_sql <- .withSQL(edb)
-    res_r <- .withR(edb)
-    checkEquals(res_sql, res_r)
-    if (doBench)
-        microbenchmark(.withSQL(edb), .withR(edb),
-                       times = 3)  ## about the same; R slightly faster.
-    ## with using a SeqnameFilter in addition.
-    res_sql <- .withSQL(edb, filter = SeqnameFilter("Y"))
-    res_r <- .withR(edb, filter = SeqnameFilter("Y")) ## query takes longer.
-    checkEquals(res_sql, res_r)
-    if (doBench)
-        microbenchmark(.withSQL(edb, filter = SeqnameFilter("Y")),
-                       .withR(edb, filter = SeqnameFilter("Y")),
-                       times = 3)  ## SQL twice as fast.
-    ## Now getting stuff by gene
-    res_sql <- .withSQL(edb, by = "gene")
-    res_r <- .withR(edb, by = "gene")
-    ## checkEquals(res_sql, res_r) ## Differences due to ties
-    if (doBench)
-        microbenchmark(.withSQL(edb, by = "gene"),
-                       .withR(edb, by = "gene"),
-                       times = 3)  ## SQL faster; ???
-    ## Along with a SeqnameFilter
-    res_sql <- .withSQL(edb, by = "gene", filter = SeqnameFilter("Y"))
-    res_r <- .withR(edb, by = "gene", filter = SeqnameFilter("Y"))
-    ## Why does the query take longer for R???
-    ## checkEquals(res_sql, res_r) ## Differences due to ties
-    if (doBench)
-        microbenchmark(.withSQL(edb, by = "gene", filter = SeqnameFilter("Y")),
-                       .withR(edb, by = "gene", filter = SeqnameFilter("Y")),
-                       times = 3)  ## SQL faster.
-    ## Along with a GenebiotypeFilter
-    if (doBench)
-        microbenchmark(.withSQL(edb, by = "gene", filter = GenebiotypeFilter("protein_coding"))
-                     , .withR(edb, by = "gene", filter = GenebiotypeFilter("protein_coding"))
-                     , times = 3)
-}
-
-dontrun_test_ordering_transcriptsBy <- function() {
-    .withR <- function(x, ...) {
-        ensembldb:::orderResultsInR(x) <- TRUE
-        transcriptsBy(x, ...)
-    }
-    .withSQL <- function(x, ...) {
-        ensembldb:::orderResultsInR(x) <- FALSE
-        transcriptsBy(x, ...)
-    }
-    res_sql <- .withSQL(edb)
-    res_r <- .withR(edb)
-    checkEquals(res_sql, res_r)
-    microbenchmark(.withSQL(edb), .withR(edb), times = 3) ## same speed
-
-    res_sql <- .withSQL(edb, filter = SeqnameFilter("Y"))
-    res_r <- .withR(edb, filter = SeqnameFilter("Y"))
-    checkEquals(res_sql, res_r)
-    microbenchmark(.withSQL(edb, filter = SeqnameFilter("Y")),
-                   .withR(edb, filter = SeqnameFilter("Y")),
-                   times = 3) ## SQL slightly faster.
-}
-
-dontrun_query_tune <- function() {
-    ## Query tuning:
-    library(RSQLite)
-    con <- dbconn(edb)
-
-    Q <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from gene join tx on (gene.gene_id=tx.gene_id) join tx2exon on (tx.tx_id=tx2exon.tx_id) join exon on (tx2exon.exon_id=exon.exon_id) where gene.seq_name = 'Y'"
-    system.time(dbGetQuery(con, Q))
-
-    Q2 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from exon join tx2exon on (tx2exon.exon_id = exon.exon_id) join tx on (tx2exon.tx_id = tx.tx_id) join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y'"
-    system.time(dbGetQuery(con, Q2))
-
-    Q3 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from tx2exon join exon on (tx2exon.exon_id = exon.exon_id) join tx on (tx2exon.tx_id = tx.tx_id) join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y'"
-    system.time(dbGetQuery(con, Q3))
-
-    Q4 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from tx2exon join exon on (tx2exon.exon_id = exon.exon_id) join tx on (tx2exon.tx_id = tx.tx_id) join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y' order by tx.tx_id"
-    system.time(dbGetQuery(con, Q4))
-
-    Q5 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from tx2exon inner join exon on (tx2exon.exon_id = exon.exon_id) inner join tx on (tx2exon.tx_id = tx.tx_id) inner join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y' order by tx.tx_id"
-    system.time(dbGetQuery(con, Q5))
-
-    Q6 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from gene inner join tx on (gene.gene_id=tx.gene_id) inner join tx2exon on (tx.tx_id=tx2exon.tx_id) inner join exon on (tx2exon.exon_id=exon.exon_id) where gene.seq_name = 'Y' order by tx.tx_id asc"
-    system.time(dbGetQuery(con, Q6))
-}
-
-
-## Compare the performance of doing the sorting within R or
-## directly in the SQL query.
-dontrun_test_ordering_performance <- function() {
-
-    library(RUnit)
-    library(RSQLite)
-    ## gene table: order by in SQL query vs R:
-    db_con <- dbconn(edb)
-
-    .callWithOrder <- function(con, query, orderBy = "",
-                               orderSQL = TRUE) {
-        if (all(orderBy == ""))
-            orderBy <- NULL
-        if (orderSQL & !is.null(orderBy)) {
-            orderBy <- paste(orderBy, collapse = ", ")
-            query <- paste0(query, " order by ", orderBy)
-        }
-        res <- dbGetQuery(con, query)
-        if (!orderSQL & !all(is.null(orderBy))) {
-            if (!all(orderBy %in% colnames(res)))
-                stop("orderBy not in columns!")
-            ## Do the ordering in R
-            res <- res[do.call(order,
-                               c(list(method = "radix"),
-                                 as.list(res[, orderBy, drop = FALSE]))), ]
-        }
-        rownames(res) <- NULL
-        return(res)
-    }
-
-    #######################
-    ## gene table
-    ## Simple condition
-    the_q <- "select * from gene"
-    system.time(res1 <- .callWithOrder(db_con, query = the_q))
-    system.time(res2 <- .callWithOrder(db_con, query = the_q,
-                                       orderSQL = FALSE))
-    checkIdentical(res1, res2)
-    ## order by gene_id
-    orderBy <- "gene_id"
-    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
-    system.time(res2 <- .callWithOrder(db_con, query = the_q,
-                                       orderBy = orderBy, orderSQL = FALSE))
-    ## SQL: 0.16, R: 0.164.
-    checkIdentical(res1, res2)
-    ## order by gene_name
-    orderBy <- "gene_name"
-    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
-    system.time(res2 <- .callWithOrder(db_con, query = the_q,
-                                       orderBy = orderBy, orderSQL = FALSE))
-    checkIdentical(res1, res2)
-    ## SQL: 0.245, R: 0.185
-    ## sort by gene_name and gene_seq_start
-    orderBy <- c("gene_name", "gene_seq_start")
-    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
-    system.time(res2 <- .callWithOrder(db_con, query = the_q,
-                                       orderBy = orderBy, orderSQL = FALSE))
-    ## SQL: 0.26, R: 0.188
-    checkEquals(res1, res2)
-    ## with subsetting:
-    the_q <- "select * from gene where seq_name in ('5', 'Y')"
-    orderBy <- c("gene_name", "gene_seq_start")
-    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
-    system.time(res2 <- .callWithOrder(db_con, query = the_q,
-                                       orderBy = orderBy, orderSQL = FALSE))
-    ## SQL: 0.031, R: 0.024
-    checkEquals(res1, res2)
-
-    ########################
-    ## joining tables.
-    the_q <- paste0("select * from gene join tx on (gene.gene_id = tx.gene_id)",
-                    " join tx2exon on (tx.tx_id = tx2exon.tx_id)")
-    orderBy <- c("tx_id", "exon_id")
-    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
-    system.time(res2 <- .callWithOrder(db_con, query = the_q,
-                                       orderBy = orderBy, orderSQL = FALSE))
-    ## SQL: 9.6, R: 9.032
-    checkEquals(res1, res2)
-    ## subsetting.
-    the_q <- paste0("select * from gene join tx on (gene.gene_id = tx.gene_id)",
-                    " join tx2exon on (tx.tx_id = tx2exon.tx_id) where",
-                    " seq_name = 'Y'")
-    orderBy <- c("tx_id", "exon_id")
-    system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
-    system.time(res2 <- .callWithOrder(db_con, query = the_q,
-                                       orderBy = orderBy, orderSQL = FALSE))
-    ## SQL: 0.9, R: 1.6
-    checkEquals(res1, res2)
-}
-
-## implement:
-## .checkOrderBy: checks order.by argument removing columns that are
-## not present in the database
-## orderBy columns are added to the columns.
-## .orderDataFrameBy: orders the dataframe by the specified columns.
diff --git a/inst/unitTests/test_performance.R b/inst/unitTests/test_performance.R
deleted file mode 100644
index 057c3d0..0000000
--- a/inst/unitTests/test_performance.R
+++ /dev/null
@@ -1,62 +0,0 @@
-############################################################
-## These are not test cases to be executed, but performance
-## comparisons.
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-
-############################################################
-## Compare MySQL vs SQLite backends:
-## Amazing how inefficient the MySQL backend seems to be! Most
-## likely it's due to RMySQL, not MySQL.
-dontrun_test_MySQL_vs_SQLite <- function() {
-    ## Compare the performance of the MySQL backend against
-    ## the SQLite backend.
-    edb_mysql <- useMySQL(edb, user = "anonuser", pass = "")
-
-    library(microbenchmark)
-    ## genes
-    microbenchmark(genes(edb), genes(edb_mysql), times = 5)
-    microbenchmark(genes(edb, filter = GenebiotypeFilter("lincRNA")),
-                   genes(edb_mysql, filter = GenebiotypeFilter("lincRNA")),
-                   times = 5)
-    microbenchmark(genes(edb, filter = SeqnameFilter(20:23)),
-                   genes(edb_mysql, filter = SeqnameFilter(20:23)),
-                   times = 5)
-    microbenchmark(genes(edb, columns = "tx_id"),
-                   genes(edb_mysql, columns = "tx_id"),
-                   times = 5)
-    microbenchmark(genes(edb, filter = GenenameFilter("BCL2L11")),
-                   genes(edb_mysql, filter = GenenameFilter("BCL2L11")),
-                   times = 5)
-    ## transcripts
-    microbenchmark(transcripts(edb),
-                   transcripts(edb_mysql),
-                   times = 5)
-    microbenchmark(transcripts(edb, filter = GenenameFilter("BCL2L11")),
-                   transcripts(edb_mysql, filter = GenenameFilter("BCL2L11")),
-                   times = 5)
-    ## exons
-    microbenchmark(exons(edb),
-                   exons(edb_mysql),
-                   times = 5)
-    microbenchmark(exons(edb, filter = GenenameFilter("BCL2L11")),
-                   exons(edb_mysql, filter = GenenameFilter("BCL2L11")),
-                   times = 5)
-    ## exonsBy
-    microbenchmark(exonsBy(edb),
-                   exonsBy(edb_mysql),
-                   times = 5)
-    microbenchmark(exonsBy(edb, filter = SeqnameFilter("Y")),
-                   exonsBy(edb_mysql, filter = SeqnameFilter("Y")),
-                   times = 5)
-    ## cdsBy
-    microbenchmark(cdsBy(edb), cdsBy(edb_mysql), times = 5)
-    microbenchmark(cdsBy(edb, by = "gene"), cdsBy(edb_mysql, by = "gene"),
-                   times = 5)
-    microbenchmark(cdsBy(edb, filter = SeqstrandFilter("-")),
-                   cdsBy(edb_mysql, filter = SeqstrandFilter("-")),
-                   times = 5)
-
-}
-
diff --git a/inst/unitTests/test_select.R b/inst/unitTests/test_select.R
deleted file mode 100644
index 17ff834..0000000
--- a/inst/unitTests/test_select.R
+++ /dev/null
@@ -1,229 +0,0 @@
-####============================================================
-##  test cases for AnnotationDbi methods.
-##
-####------------------------------------------------------------
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-test_columns <- function(){
-    cols <- columns(edb)
-    ## Don't expect to see any _ there...
-    checkEquals(length(grep(cols, pattern="_")), 0)
-}
-
-test_keytypes <- function(){
-    keyt <- keytypes(edb)
-    checkEquals(all(c("GENEID", "EXONID", "TXID") %in% keyt), TRUE)
-}
-
-test_mapper <- function(){
-    Test <- ensembldb:::ensDbColumnForColumn(edb, "GENEID")
-    checkEquals(unname(Test), "gene_id")
-
-    Test <- ensembldb:::ensDbColumnForColumn(edb, c("GENEID", "TXID"))
-    checkEquals(unname(Test), c("gene_id", "tx_id"))
-
-    Test <- ensembldb:::ensDbColumnForColumn(edb, c("GENEID", "TXID", "bla"))
-    checkEquals(unname(Test), c("gene_id", "tx_id"))
-}
-
-test_keys <- function(){
-    ## get all gene ids
-    system.time(
-        ids <- keys(edb, "GENEID")
-    )
-    checkEquals(length(ids), length(unique(ids)))
-    ## get all tx ids
-    system.time(
-        ids <- keys(edb, "TXID")
-    )
-    ## Get the TXNAME...
-    nms <- keys(edb, "TXNAME")
-    checkEquals(nms, ids)
-    checkEquals(length(ids), length(unique(ids)))
-    ## get all gene names
-    system.time(
-        ids <- keys(edb, "GENENAME")
-    )
-    checkEquals(length(ids), length(unique(ids)))
-    ## get all seq names
-    system.time(
-        ids <- keys(edb, "SEQNAME")
-    )
-    checkEquals(length(ids), length(unique(ids)))
-    ## get all seq strands
-    system.time(
-        ids <- keys(edb, "SEQSTRAND")
-    )
-    checkEquals(length(ids), length(unique(ids)))
-    ## get all gene biotypes
-    system.time(
-        ids <- keys(edb, "GENEBIOTYPE")
-    )
-    checkEquals(ids, listGenebiotypes(edb))
-}
-
-test_select <- function(){
-    ## Test:
-    ## Provide GenenameFilter.
-    gf <- GenenameFilter("BCL2")
-    system.time(
-        Test <- select(edb, keys=gf)
-    )
-    ## Provide list of GenenameFilter and TxbiotypeFilter.
-    Test2 <- select(edb, keys=list(gf, TxbiotypeFilter("protein_coding")))
-    checkEquals(Test$EXONID[Test$TXBIOTYPE == "protein_coding"], Test2$EXONID)
-    ## Choose selected columns.
-    Test3 <- select(edb, keys=gf, columns=c("GENEID", "GENENAME", "SEQNAME"))
-    checkEquals(unique(Test[, c("GENEID", "GENENAME", "SEQNAME")]), Test3)
-    ## Provide keys.
-    Test4 <- select(edb, keys="BCL2", keytype="GENENAME")
-    checkEquals(Test[, colnames(Test4)], Test4)
-    txs <- keys(edb, "TXID")
-    ## Just get stuff from the tx table; should be faster.
-    system.time(
-        Test <- select(edb, keys=txs, columns=c("TXID", "TXBIOTYPE", "GENEID"), keytype="TXID")
-    )
-    checkEquals(all(Test$TXID==txs), TRUE)
-    ## Get all lincRNA genes
-    Test <- select(edb, keys="lincRNA", columns=c("GENEID", "GENEBIOTYPE", "GENENAME"),
-                   keytype="GENEBIOTYPE")
-    Test2 <- select(edb, keys=GenebiotypeFilter("lincRNA"),
-                    columns=c("GENEID", "GENEBIOTYPE", "GENENAME"))
-    checkEquals(Test[, colnames(Test2)], Test2)
-    ## All on chromosome 21
-    Test <- select(edb, keys="21", columns=c("GENEID", "GENEBIOTYPE", "GENENAME"),
-                   keytype="SEQNAME")
-    Test2 <- select(edb, keys=SeqnameFilter("21"), columns=c("GENEID", "GENEBIOTYPE", "GENENAME"))
-    checkEquals(Test[, colnames(Test2)], Test2)
-    ## What if we can't find it?
-    Test <- select(edb, keys="bla", columns=c("GENEID", "GENENAME"), keytype="GENENAME")
-    ## Run the full thing.
-    ## system.time(
-    ##     All <- select(edb)
-    ## )
-    ## Test <- select(edb, keys=txs, keytype="TXID")
-    ## checkEquals(Test, All)
-    Test <- select(edb, keys="ENST00000000233", columns=c("GENEID", "GENENAME"), keytype="TXNAME")
-    checkEquals(Test$TXNAME, "ENST00000000233")
-    ## Check what happens if we just add TXNAME and also TXID.
-    Test2 <- select(edb, keys=list(gf, TxbiotypeFilter("protein_coding")), columns=c("TXID", "TXNAME",
-                                                                                     "GENENAME", "GENEID"))
-
-}
-
-test_mapIds <- function(){
-    ## Simple... map gene ids to gene names
-    allgenes <- keys(edb, keytype="GENEID")
-    randordergenes <- allgenes[sample(1:length(allgenes), 100)]
-    system.time(
-        mi <- mapIds(edb, keys=allgenes, keytype="GENEID", column = "GENENAME")
-    )
-    checkEquals(allgenes, names(mi))
-    ## What happens if the ordering is different:
-    mi <- mapIds(edb, keys=randordergenes, keytype="GENEID", column = "GENENAME")
-    checkEquals(randordergenes, names(mi))
-
-    ## Now check the different options:
-    ## Handle multi mappings.
-    ## first
-    first <- mapIds(edb, keys=randordergenes, keytype="GENEID", column="TXID")
-    checkEquals(names(first), randordergenes)
-    ## list
-    lis <- mapIds(edb, keys=randordergenes, keytype="GENEID", column="TXID", multiVals="list")
-    checkEquals(names(lis), randordergenes)
-    Test <- lapply(lis, function(z){return(z[1])})
-    checkEquals(first, unlist(Test))
-    ## filter
-    filt <- mapIds(edb, keys=randordergenes, keytype="GENEID", column="TXID", multiVals="filter")
-    checkEquals(filt, unlist(lis[unlist(lapply(lis, length)) == 1]))
-    ## asNA
-    asNA <- mapIds(edb, keys=randordergenes, keytype="GENEID", column="TXID", multiVals="asNA")
-
-    ## Check what happens if we provide 2 identical keys.
-    Test <- mapIds(edb, keys=c("BCL2", "BCL2L11", "BCL2"), keytype="GENENAME", column="TXID")
-
-    ## Submit Filter:
-    Test <- mapIds(edb, keys=SeqnameFilter("Y"), column="GENEID", multiVals="list")
-    TestS <- select(edb, keys=Test[[1]], columns="SEQNAME", keytype="GENEID")
-    checkEquals(unique(TestS$SEQNAME), "Y")
-    ## Submit 2 filter.
-    Test <- mapIds(edb, keys=list(SeqnameFilter("Y"), SeqstrandFilter("-")), multiVals="list",
-                   column="GENEID")
-    TestS <- select(edb, keys=Test[[1]], keytype="GENEID", columns=c("SEQNAME", "SEQSTRAND"))
-    checkTrue(all(TestS$SEQNAME == "Y"))
-    checkTrue(all(TestS$SEQSTRAND == -1))
-}
-
-## Test if the results are properly sorted if we submit a single filter or just keys.
-test_select_sorted <- function() {
-    ks <- c("ZBTB16", "BCL2", "SKA2", "BCL2L11")
-    ## gene_name
-    res <- select(edb, keys = ks, keytype = "GENENAME")
-    checkEquals(unique(res$GENENAME), ks)
-    res <- select(edb, keys = GenenameFilter(ks))
-    checkEquals(unique(res$GENENAME), ks)
-
-    ## Using two filters;
-    res <- select(edb, keys = list(GenenameFilter(ks),
-                                   TxbiotypeFilter("nonsense_mediated_decay")))
-    ## We don't expect same sorting here!
-    checkTrue(!all(unique(res$GENENAME) == ks[ks %in% unique(res$GENENAME)]))
-
-    ## symbol
-    res <- select(edb, keys = ks, keytype = "SYMBOL",
-                  columns = c("GENENAME", "SYMBOL", "SEQNAME"))
-
-    ## tx_biotype
-    ks <- c("retained_intron", "nonsense_mediated_decay")
-    res <- select(edb, keys = ks, keytype = "TXBIOTYPE",
-                  columns = c("GENENAME", "TXBIOTYPE"))
-    checkEquals(unique(res$TXBIOTYPE), ks)
-    res <- select(edb, keys = TxbiotypeFilter(ks),
-                  keytype = "TXBIOTYPE", columns = c("GENENAME", "TXBIOTYPE"))
-    checkEquals(unique(res$TXBIOTYPE), ks)
-}
-
-test_select_symbol <- function() {
-    ## Can I use SYMBOL as keytype?
-    ks <- c("ZBTB16", "BCL2", "SKA2", "BCL2L11")
-    res <- select(edb, keys = ks, keytype = "GENENAME")
-    res2 <- select(edb, keys = ks, keytype = "SYMBOL")
-    checkEquals(res, res2)
-
-    ## Can I use the SymbolFilter?
-    res <- select(edb, keys = GenenameFilter(ks),
-                  columns = c("TXNAME", "SYMBOL", "GENEID"))
-    checkEquals(colnames(res), c("TXNAME", "SYMBOL", "GENEID", "GENENAME"))
-
-    res <- select(edb, keys = SymbolFilter(ks), columns=c("GENEID"))
-    checkEquals(colnames(res), c("GENEID", "SYMBOL"))
-    checkEquals(res$SYMBOL, ks)
-
-    ## Can I ask for SYMBOL?
-    res <- select(edb, keys = list(SeqnameFilter("Y"),
-                                   GenebiotypeFilter("lincRNA")),
-                  columns = c("GENEID", "SYMBOL"))
-    checkEquals(colnames(res), c("GENEID", "SYMBOL", "SEQNAME", "GENEBIOTYPE"))
-}
-
-test_select_symbol_n_txname <- function() {
-    ks <- c("ZBTB16", "BCL2", "SKA2")
-    ## Symbol allowed in keytype
-    res <- select(edb, keys = ks, keytype = "SYMBOL", columns = "GENENAME")
-    checkEquals(colnames(res), c("SYMBOL", "GENENAME"))
-    checkEquals(res$SYMBOL, ks)
-
-    ## Symbol using SymbolFilter
-    res <- select(edb, keys = SymbolFilter(ks), columns = "GENENAME")
-    checkEquals(colnames(res), c("GENENAME", "SYMBOL"))
-    checkEquals(res$SYMBOL, ks)
-
-    ## Symbol as a column.
-    res <- select(edb, keys = ks, keytype = "GENENAME", columns = "SYMBOL")
-    checkEquals(colnames(res), c("GENENAME", "SYMBOL"))
-
-    ## TXNAME as a column
-    res <- select(edb, keys = ks, keytype = "GENENAME", columns = c("TXNAME"))
-    checkEquals(colnames(res), c("GENENAME", "TXNAME"))
-}
diff --git a/inst/unitTests/test_transcript_lengths.R b/inst/unitTests/test_transcript_lengths.R
deleted file mode 100644
index 3240e4b..0000000
--- a/inst/unitTests/test_transcript_lengths.R
+++ /dev/null
@@ -1,140 +0,0 @@
-####============================================================
-##  Tests related to transcript/feature length calculations.
-##
-##
-####------------------------------------------------------------
-## Loading data and stuff
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-## Just run that after Herve has added the mods to the transcriptLengths function.
-notyetrun_transcriptLengths <- function(){
-
-    ## With filter.
-    daFilt <- SeqnameFilter("Y")
-    allTxY <- transcripts(edb, filter=daFilt)
-    txLenY <- transcriptLengths(edb, filter=daFilt)
-    checkEquals(names(allTxY), rownames(txLenY))
-
-    ## Check if lengths are OK:
-    txLenY2 <- lengthOf(edb, "tx", filter=daFilt)
-    checkEquals(unname(txLenY2[rownames(txLenY)]), txLenY$tx_len)
-
-    ## Include the cds, 3' and 5' UTR
-    txLenY <- transcriptLengths(edb, with.cds_len = TRUE, with.utr5_len = TRUE,
-                                with.utr3_len = TRUE,
-                                filter=daFilt)
-    ## sum of 5' CDS and 3' has to match tx_len:
-    txLen <- rowSums(txLenY[, c("cds_len", "utr5_len", "utr3_len")])
-    checkEquals(txLenY[!is.na(txLen), "tx_len"], unname(txLen[!is.na(txLen)]))
-    ## just to be sure...
-    checkEquals(txLenY[!is.na(txLenY$utr3_len), "tx_len"],
-                unname(txLen[!is.na(txLenY$utr3_len)]))
-    ## Seems to be OK.
-
-    ## Next check the 5' UTR lengths: that also verifies the fiveUTR call.
-    futr <- fiveUTRsByTranscript(edb, filter=daFilt)
-    futrLen <- sum(width(futr))
-    checkEquals(unname(futrLen), txLenY[names(futrLen), "utr5_len"])
-    ## 3'
-    tutr <- threeUTRsByTranscript(edb, filter=daFilt)
-    tutrLen <- sum(width(tutr))
-    checkEquals(unname(tutrLen), txLenY[names(tutrLen), "utr3_len"])
-}
-
-notrun_compare_full <- function(){
-    ## That's on the full thing.
-    ## Test if the result has the same ordering as the transcripts call.
-    allTx <- transcripts(edb)
-    txLen <- transcriptLengths(edb, with.cds_len=TRUE, with.utr5_len=TRUE,
-                               with.utr3_len=TRUE)
-    checkEquals(names(allTx), rownames(txLen))
-    system.time(
-        futr <- fiveUTRsByTranscript(edb)
-    )
-    ## 23 secs.
-    futrLen <- sum(width(futr))  ## do I need reduce???
-    checkEquals(unname(futrLen), txLen[names(futrLen), "utr5_len"])
-    ## 3'
-    system.time(
-        tutr <- threeUTRsByTranscript(edb)
-    )
-    system.time(
-        tutrLen <- sum(width(tutr))
-    )
-    checkEquals(unname(tutrLen), txLen[names(tutrLen), "utr3_len"])
-}
-
-notrun_compare_to_genfeat <- function(){
-    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
-    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
-
-    system.time(
-        Len <- transcriptLengths(edb)
-    )
-    ## Woa, 52 sec
-    system.time(
-        txLen <- lengthOf(edb, "tx")
-    )
-    ## Faster, 31 sec
-    checkEquals(Len$tx_len, unname(txLen[rownames(Len)]))
-    system.time(
-        Len2 <- transcriptLengths(txdb)
-    )
-    ## :) 2.5 sec.
-    ## Next.
-    system.time(
-        Len <- transcriptLengths(edb, with.cds_len = TRUE)
-    )
-    ## 56 sec
-    system.time(
-        Len2 <- transcriptLengths(txdb, with.cds_len=TRUE)
-    )
-    ## 4 sec.
-
-    ## Calling the transcriptLengths of GenomicFeatures on the EnsDb.
-    system.time(
-        Def <- GenomicFeatures::transcriptLengths(edb)
-    ) ## 26.5 sec
-
-    system.time(
-        WithCds <- GenomicFeatures::transcriptLengths(edb, with.cds_len=TRUE)
-    ) ## 55 sec
-
-    system.time(
-        WithAll <- GenomicFeatures::transcriptLengths(edb, with.cds_len=TRUE,
-                                                      with.utr5_len=TRUE,
-                                                      with.utr3_len=TRUE)
-    ) ## 99 secs
-
-    ## Get my versions...
-    system.time(
-        MyDef <- ensembldb:::.transcriptLengths(edb)
-    ) ## 31 sec
-    system.time(
-        MyWithCds <- ensembldb:::.transcriptLengths(edb, with.cds_len=TRUE)
-    ) ## 44 sec
-    system.time(
-        MyWithAll <- ensembldb:::.transcriptLengths(edb, with.cds_len=TRUE,
-                                                    with.utr5_len=TRUE,
-                                                    with.utr3_len=TRUE)
-    ) ## 63 sec
-
-    ## Should be all the same!!!
-    rownames(MyDef) <- NULL
-    checkEquals(Def, MyDef)
-    ##
-    rownames(MyWithCds) <- NULL
-    MyWithCds[is.na(MyWithCds$cds_len), "cds_len"] <- 0
-    checkEquals(WithCds, MyWithCds)
-    ##
-    rownames(MyWithAll) <- NULL
-    MyWithAll[is.na(MyWithAll$cds_len), "cds_len"] <- 0
-    MyWithAll[is.na(MyWithAll$utr3_len), "utr3_len"] <- 0
-    MyWithAll[is.na(MyWithAll$utr5_len), "utr5_len"] <- 0
-    checkEquals(WithAll, MyWithAll)
-}
-
-
-
-
diff --git a/inst/unitTests/test_ucscChromosomeNames.R b/inst/unitTests/test_ucscChromosomeNames.R
deleted file mode 100644
index 096a5d6..0000000
--- a/inst/unitTests/test_ucscChromosomeNames.R
+++ /dev/null
@@ -1,508 +0,0 @@
-###================================================
-##  Here we check functionality to use EnsDbs with
-##  UCSC chromosome names
-###------------------------------------------------
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-## library(EnsDb.Hsapiens.v83)
-## edb <- EnsDb.Hsapiens.v83
-## library(EnsDb.Hsapiens.v81)
-
-test_seqlevels <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    options(ensembldb.seqnameNotFound=NA)
-    edb <- EnsDb.Hsapiens.v75
-    SL <- seqlevels(edb)
-    ucscs <- paste0("chr", c(1:22, "X", "Y", "M"))
-    seqlevelsStyle(edb) <- "UCSC"
-    SL2 <- seqlevels(edb)
-    checkEquals(sort(ucscs), sort(SL2[!is.na(SL2)]))
-    ## Check if we throw an error message
-    options(ensembldb.seqnameNotFound="MISSING")
-    checkException(seqlevels(edb))
-    ## Check if returning original names works.
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    SL3 <- seqlevels(edb)
-    idx <- which(SL3 %in% ucscs)
-    checkEquals(sort(SL[-idx]), sort(SL3[-idx]))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-test_seqinfo <- function(){
-    edb <- EnsDb.Hsapiens.v75
-    orig <- getOption("ensembldb.seqnameNotFound")
-    options(ensembldb.seqnameNotFound="MISSING")
-    seqlevelsStyle(edb) <- "UCSC"
-    checkException(seqinfo(edb))
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    si <- seqinfo(edb)
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-## Testing if getWhat returns what we expect.
-test_getWhat_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    edb <- EnsDb.Hsapiens.v75
-    seqlevelsStyle(edb) <- "Ensembl"
-    ensRes <- ensembldb:::getWhat(edb, columns=c("seq_name", "seq_strand"))
-    seqlevelsStyle(edb) <- "UCSC"
-    ucscRes <- ensembldb:::getWhat(edb, columns=c("seq_name", "seq_strand"))
-    seqlevelsStyle(edb) <- "NCBI"
-    ncbiRes <- ensembldb:::getWhat(edb, columns=c("seq_name", "seq_strand"))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-test_SeqnameFilter_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    options(ensembldb.seqnameNotFound="MISSING")
-    edb <- EnsDb.Hsapiens.v75
-    seqlevelsStyle(edb) <- "Ensembl"
-    snf <- SeqnameFilter("chrX")
-    snfEns <- SeqnameFilter(c("X", "Y"))
-    snfNo <- SeqnameFilter(c("bla", "blu"))
-    snfSomeNo <- SeqnameFilter(c("bla", "X"))
-
-    seqlevelsStyle(edb) <- "Ensembl"
-    checkEquals(value(snf), "chrX")
-    ## That makes no sense for a query though.
-    checkEquals(value(snf, edb), "chrX")
-    checkEquals(value(snfEns, edb), c("X", "Y"))
-    seqlevelsStyle(edb) <- "UCSC"
-    checkEquals(value(snf, edb), "X")
-    checkException(value(snfEns, edb))
-    checkException(value(snfNo, edb))
-    checkException(value(snfSomeNo, edb))
-
-    ## Setting the options to "ORIGINAL"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    checkEquals(value(snf, edb), "X")
-    checkEquals(value(snfEns, edb), c("X", "Y"))
-    checkEquals(value(snfNo, edb), c("bla", "blu"))
-    checkEquals(value(snfSomeNo, edb), c("bla", "X"))
-    ##
-    snf <- SeqnameFilter(c("chrX", "Y"))
-    checkEquals(value(snf, edb), c("X", "Y"))
-
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-test_genes_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    edb <- EnsDb.Hsapiens.v75
-    ## Here we want to test whether the result returned by the function does really
-    ## work when changing the seqnames.
-    seqlevelsStyle(edb) <- "Ensembl"
-    ensAll <- genes(edb)
-    ens21Y <- genes(edb, filter=SeqnameFilter(c("Y", "21")))
-    checkEquals(sort(as.character(unique(seqnames(ens21Y)))), c("21", "Y"))
-    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
-    ensY <- genes(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ensY), "Y")
-    checkEquals(unique(as.character(strand(ensY))), "+")
-
-    ## Check UCSC stuff
-    seqlevelsStyle(edb) <- "UCSC"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    ## Just visually inspect the seqinfo and seqnames for the "all" query.
-    ucscAll <- genes(edb)
-    as.character(unique(seqnames(ucscAll)))
-    ucsc21Y <- genes(edb, filter=SeqnameFilter(c("chrY", "chr21")))
-    checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
-    checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
-    ## GRangesFilter.
-    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
-    ucscY <- genes(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ucscY), "chrY")
-    checkEquals(unique(as.character(strand(ucscY))), "+")
-    checkEquals(sort(names(ensY)), sort(names(ucscY)))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-test_transcripts_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    edb <- EnsDb.Hsapiens.v75
-    seqlevelsStyle(edb) <- "Ensembl"
-    ens21Y <- transcripts(edb, filter=SeqnameFilter(c("Y", "21")))
-    checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
-    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
-    ensY <- transcripts(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ensY), "Y")
-    checkEquals(unique(as.character(strand(ensY))), "+")
-
-    ## Check UCSC stuff
-    seqlevelsStyle(edb) <- "UCSC"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    ucsc21Y <- transcripts(edb, filter=SeqnameFilter(c("chrY", "chr21")))
-    checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
-    checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
-    ## GRangesFilter.
-    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
-    ucscY <- transcripts(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ucscY), "chrY")
-    checkEquals(unique(as.character(strand(ucscY))), "+")
-    checkEquals(sort(names(ensY)), sort(names(ucscY)))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-test_transcriptsBy_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    edb <- EnsDb.Hsapiens.v75
-    seqlevelsStyle(edb) <- "Ensembl"
-    ens21Y <- transcriptsBy(edb, filter=SeqnameFilter(c("Y", "21")))
-    checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
-    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
-    ensY <- transcriptsBy(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ensY), "Y")
-    checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
-
-    ## Check UCSC stuff
-    seqlevelsStyle(edb) <- "UCSC"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    ucsc21Y <- transcriptsBy(edb, filter=SeqnameFilter(c("chrY", "chr21")))
-    checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
-    checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
-    ## GRangesFilter.
-    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
-    ucscY <- transcriptsBy(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ucscY), "chrY")
-    checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
-    checkEquals(sort(names(ensY)), sort(names(ucscY)))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-test_exons_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    edb <- EnsDb.Hsapiens.v75
-    seqlevelsStyle(edb) <- "Ensembl"
-    ens21Y <- exons(edb, filter=SeqnameFilter(c("Y", "21")))
-    checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
-    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
-    ensY <- exons(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ensY), "Y")
-    checkEquals(unique(as.character(strand(ensY))), "+")
-
-    ## Check UCSC stuff
-    seqlevelsStyle(edb) <- "UCSC"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    ucsc21Y <- exons(edb, filter=SeqnameFilter(c("chrY", "chr21")))
-    checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
-    checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
-    ## GRangesFilter.
-    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
-    ucscY <- exons(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ucscY), "chrY")
-    checkEquals(unique(as.character(strand(ucscY))), "+")
-    checkEquals(sort(names(ensY)), sort(names(ucscY)))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-test_exonsBy_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    edb <- EnsDb.Hsapiens.v75
-    seqlevelsStyle(edb) <- "Ensembl"
-    ens21Y <- exonsBy(edb, filter=SeqnameFilter(c("Y", "21")))
-    checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
-    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
-    ensY <- exonsBy(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ensY), "Y")
-    checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
-
-    ## Check UCSC stuff
-    seqlevelsStyle(edb) <- "UCSC"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    ucsc21Y <- exonsBy(edb, filter=SeqnameFilter(c("chrY", "chr21")))
-    checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
-    checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
-    ## GRangesFilter.
-    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
-    ucscY <- exonsBy(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ucscY), "chrY")
-    checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
-    checkEquals(sort(names(ensY)), sort(names(ucscY)))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-
-test_cdsBy_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    edb <- EnsDb.Hsapiens.v75
-    seqlevelsStyle(edb) <- "Ensembl"
-    ens21Y <- cdsBy(edb, filter=SeqnameFilter(c("Y", "21")))
-    checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
-    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
-    ensY <- cdsBy(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ensY), "Y")
-    checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
-
-    ## Check UCSC stuff
-    seqlevelsStyle(edb) <- "UCSC"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    ucsc21Y <- cdsBy(edb, filter=SeqnameFilter(c("chrY", "chr21")))
-    checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
-    checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
-    ## GRangesFilter.
-    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
-    ucscY <- cdsBy(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ucscY), "chrY")
-    checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
-    checkEquals(sort(names(ensY)), sort(names(ucscY)))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-test_threeUTRsByTranscript_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    edb <- EnsDb.Hsapiens.v75
-    seqlevelsStyle(edb) <- "Ensembl"
-    ens21Y <- threeUTRsByTranscript(edb, filter=SeqnameFilter(c("Y", "21")))
-    checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
-    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
-    ensY <- threeUTRsByTranscript(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ensY), "Y")
-    checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
-
-    ## Check UCSC stuff
-    seqlevelsStyle(edb) <- "UCSC"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    ucsc21Y <- threeUTRsByTranscript(edb, filter=SeqnameFilter(c("chrY", "chr21")))
-    checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
-    checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
-    ## GRangesFilter.
-    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
-    ucscY <- threeUTRsByTranscript(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ucscY), "chrY")
-    checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
-    checkEquals(sort(names(ensY)), sort(names(ucscY)))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-test_fiveUTRsByTranscript_seqnames <- function(){
-    orig <- getOption("ensembldb.seqnameNotFound")
-    edb <- EnsDb.Hsapiens.v75
-    seqlevelsStyle(edb) <- "Ensembl"
-    ens21Y <- fiveUTRsByTranscript(edb, filter=SeqnameFilter(c("Y", "21")))
-    checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
-    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
-    ensY <- fiveUTRsByTranscript(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ensY), "Y")
-    checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
-
-    ## Check UCSC stuff
-    seqlevelsStyle(edb) <- "UCSC"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    ucsc21Y <- fiveUTRsByTranscript(edb, filter=SeqnameFilter(c("chrY", "chr21")))
-    checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
-    checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
-    ## GRangesFilter.
-    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
-    ucscY <- fiveUTRsByTranscript(edb, filter=GRangesFilter(gr))
-    checkEquals(seqlevels(ucscY), "chrY")
-    checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
-    checkEquals(sort(names(ensY)), sort(names(ucscY)))
-    options(ensembldb.seqnameNotFound=orig)
-}
-
-
-test_updateEnsDb <- function(){
-    edb2 <- updateEnsDb(edb)
-    checkEquals(edb2 at tables, edb at tables)
-    checkTrue(.hasSlot(edb2, ".properties"))
-}
-
-test_properties <- function(){
-    checkEquals(ensembldb:::getProperty(edb, "foo"), NA)
-
-    checkException(ensembldb:::setProperty(edb, "foo"))
-
-    edb <- ensembldb:::setProperty(edb, foo="bar")
-    checkEquals(ensembldb:::getProperty(edb, "foo"), "bar")
-    checkEquals(length(ensembldb:::properties(edb)), 4)
-}
-
-test_set_get_seqlevelsStyle <- function(){
-    edb <- EnsDb.Hsapiens.v75
-    ## Testing the getter/setter for the seqlevelsStyle.
-    checkEquals(seqlevelsStyle(edb), "Ensembl")
-    checkEquals(NA, ensembldb:::getProperty(edb, "seqlevelsStyle"))
-
-    seqlevelsStyle(edb) <- "Ensembl"
-    checkEquals(seqlevelsStyle(edb), "Ensembl")
-    checkEquals("Ensembl", ensembldb:::getProperty(edb, "seqlevelsStyle"))
-
-    ## Try NCBI.
-    seqlevelsStyle(edb) <- "NCBI"
-    checkEquals(seqlevelsStyle(edb), "NCBI")
-
-    ## Try UCSC.
-    seqlevelsStyle(edb) <- "UCSC"
-    checkEquals(seqlevelsStyle(edb), "UCSC")
-
-    ## Error checking:
-    checkException(seqlevelsStyle(edb) <- "bla")
-}
-
-## Just dry run this without any actual query.
-test_formatSeqnamesForQuery <- function(){
-    ## Testing if the formating/mapping between seqnames works as expected
-    ## We want to map anything TO Ensembl.
-    ## Check also the warning messages!
-    ucscs <- c("chr1", "chr3", "chr1", "chr9", "chrM", "chr1", "chrX")
-    enses <- c("1", "3", "1", "9", "MT", "1", "X")
-    ## reset
-    edb <- EnsDb.Hsapiens.v75
-    ## Shouldn't do anything here.
-    seqlevelsStyle(edb)
-    ensembldb:::dbSeqlevelsStyle(edb)
-    got <- ensembldb:::formatSeqnamesForQuery(edb, enses)
-    checkEquals(got, enses)
-    ## Change the seqlevels to UCSC
-    seqlevelsStyle(edb) <- "UCSC"
-    ## If ifNotFound is not specified we suppose to get an error.
-    options(ensembldb.seqnameNotFound="MISSING")
-    checkException(ensembldb:::formatSeqnamesForQuery(edb, enses))
-    ## With specifying ifNotFound
-    got <- ensembldb:::formatSeqnamesForQuery(edb, enses, ifNotFound=NA)
-    checkEquals(all(is.na(got)), TRUE)
-    ## Same by setting the option
-    options(ensembldb.seqnameNotFound=NA)
-    got <- ensembldb:::formatSeqnamesForQuery(edb, enses)
-    checkEquals(all(is.na(got)), TRUE)
-
-    ## Now the working example:
-    got <- ensembldb:::formatSeqnamesForQuery(edb, ucscs)
-    checkEquals(got, enses)
-    ## What if one is not mappable:
-    got <- ensembldb:::formatSeqnamesForQuery(edb, c(ucscs, "asdfd"), ifNotFound=NA)
-    checkEquals(got, c(enses, NA))
-}
-
-## Just dry run this without any actual query
-test_formatSeqnamesFromQuery <- function(){
-    ucscs <- c("chr1", "chr3", "chr1", "chr9", "chrM", "chr1", "chrX")
-    enses <- c("1", "3", "1", "9", "MT", "1", "X")
-    edb <- EnsDb.Hsapiens.v75
-    ## Shouldn't do anything here.
-    seqlevelsStyle(edb)
-    ensembldb:::dbSeqlevelsStyle(edb)
-    got <- ensembldb:::formatSeqnamesFromQuery(edb, enses)
-    checkEquals(got, enses)
-    ## Change the seqlevels to UCSC
-    seqlevelsStyle(edb) <- "UCSC"
-    ## If ifNotFound is not specified we suppose to get an error.
-    options(ensembldb.seqnameNotFound="MISSING")
-    checkException(ensembldb:::formatSeqnamesFromQuery(edb, ucsc))
-    ## With specifying ifNotFound
-    got <- ensembldb:::formatSeqnamesFromQuery(edb, ucscs, ifNotFound=NA)
-    checkEquals(all(is.na(got)), TRUE)
-    ## Same using options
-    options(ensembldb.seqnameNotFound=NA)
-    got <- ensembldb:::formatSeqnamesFromQuery(edb, ucscs, ifNotFound=NA)
-    checkEquals(all(is.na(got)), TRUE)
-    ## Now the working example:
-    got <- ensembldb:::formatSeqnamesFromQuery(edb, enses)
-    checkEquals(got, ucscs)
-    ## What if one is not mappable:
-    got <- ensembldb:::formatSeqnamesFromQuery(edb, c(enses, "asdfd"), ifNotFound=NA)
-    checkEquals(got, c(ucscs, NA))
-    got <- ensembldb:::formatSeqnamesFromQuery(edb, c(enses, "asdfd"))
-    checkEquals(got, c(ucscs, NA))
-}
-
-notrun_test_set_seqlevels <- function(){
-    ## To test what happens if no mapping is available
-    ##gff <- "/Users/jo/Projects/EnsDbs/83/gadus_morhua/Gadus_morhua.gadMor1.83.gff3.gz"
-    library(AnnotationHub)
-    ah <- AnnotationHub()
-    ah <- ah["AH47962"]
-    fromG <- ensDbFromAH(ah, outfile=tempfile())
-    edb <- EnsDb(fromG)
-    seqlevelsStyle(edb)
-    checkException(seqlevelsStyle(edb) <- "UCSC")
-}
-
-
-
-
-
-deprecated_test_check_SeqnameFilter <- function(){
-    Orig <- getOption("ucscChromosomeNames", FALSE)
-    options(ucscChromosomeNames=TRUE)
-    snf <- SeqnameFilter(c("chrX", "chr3"))
-    checkEquals(value(snf), c("chrX", "chr3"))
-    checkEquals(value(snf, edb), c("X", "3"))
-
-    options(ucscChromosomeNames=FALSE)
-    checkEquals(value(snf, edb), c("X", "3"))
-
-    ## No matter what, where has to return names without chr!
-    checkEquals(where(snf, edb), "gene.seq_name in ('X','3')")
-
-    ## GRangesFilter:
-    grf <- GRangesFilter(GRanges("chrX", IRanges(123, 345)))
-    checkEqualsNumeric(length(grep(where(grf), pattern="seq_name == 'chrX'")), 1)
-    checkEqualsNumeric(length(grep(where(grf, edb), pattern="seq_name == 'X'")), 1)
-
-    ## Check chromosome MT/chrM
-    options(ucscChromosomeNames=FALSE)
-    snf <- SeqnameFilter("MT")
-    checkEquals(where(snf, edb), "gene.seq_name = 'MT'")
-    snf <- SeqnameFilter("chrM")
-    checkEquals(where(snf, edb), "gene.seq_name = 'MT'")
-    options(ucscChromosomeNames=TRUE)
-    snf <- SeqnameFilter("MT")
-    checkEquals(where(snf, edb), "gene.seq_name = 'MT'")
-    snf <- SeqnameFilter("chrM")
-    checkEquals(where(snf, edb), "gene.seq_name = 'MT'")
-
-    options(ucscChromosomeNames=Orig)
-}
-
-deprecated_test_check_retrieve_data <- function(){
-    Orig <- getOption("ucscChromosomeNames", FALSE)
-
-    options(ucscChromosomeNames=FALSE)
-    genes <- genes(edb, filter=SeqnameFilter(c("21", "Y", "X")))
-    checkEquals(all(seqlevels(genes) %in% c("21", "X", "Y")), TRUE)
-    options(ucscChromosomeNames=TRUE)
-    genes <- genes(edb, filter=SeqnameFilter(c("21", "Y", "X")))
-    checkEquals(all(seqlevels(genes) %in% c("chr21", "chrX", "chrY")), TRUE)
-
-    ## Check chromosome MT
-    options(ucscChromosomeNames=FALSE)
-    exons <- exons(edb, filter=SeqnameFilter("MT"))
-    checkEquals(seqlevels(exons), "MT")
-    options(ucscChromosomeNames=TRUE)
-    exons <- exons(edb, filter=SeqnameFilter("MT"))
-    checkEquals(seqlevels(exons), "chrM")
-
-    options(ucscChromosomeNames=Orig)
-}
-
-
-notrun_check_get_sequence_bsgenome <- function(){
-    edb <- EnsDb.Hsapiens.v75
-    ## Using first the Ensembl fasta stuff.
-    ensSeqs <- extractTranscriptSeqs(getGenomeFaFile(edb),
-                                     exonsBy(edb, "tx", filter=SeqnameFilter("Y")))
-    ## Now the same using the BSgenome stuff.
-    seqlevelsStyle(edb) <- "UCSC"
-    options(ensembldb.seqnameNotFound="ORIGINAL")
-    exs <- exonsBy(edb, "tx", filter=SeqnameFilter("chrY"))
-    library(BSgenome.Hsapiens.UCSC.hg19)
-    bsg <- BSgenome.Hsapiens.UCSC.hg19
-    ucscSeqs <- extractTranscriptSeqs(bsg, exs)
-
-    checkEquals(as.character(ensSeqs), as.character(ucscSeqs))
-}
-
-
-## Use the stuff from GenomeInfoDb!
-notrun_test_newstuff <- function(){
-    library(GenomeInfoDb)
-    Map <- mapSeqlevels(seqlevels(edb), style="Ensembl")
-    Map <- mapSeqlevels(seqlevels(edb), style="UCSC")
-    ## just check what's out there
-    genomeStyles()
-}
-
-
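The seqnames tests removed above check how an EnsDb maps between Ensembl-style ("1", "MT") and UCSC-style ("chr1", "chrM") chromosome names. They also check how the ensembldb.seqnameNotFound option ("MISSING", "ORIGINAL" or a value such as NA) handles names that cannot be mapped.

The base-R sketch below mimics that behaviour for the UCSC-to-Ensembl direction only. It makes some assumptions:

- ucsc_to_ensembl is a hypothetical helper. It is not the ensembldb implementation.
- It uses a plain "chr" prefix rule plus the chrM/MT special case.
- For real mappings, the last test points to GenomeInfoDb's mapSeqlevels().

```r
## Hypothetical helper (not ensembldb API): map UCSC-style seqnames to
## Ensembl-style ones. Assumes a plain "chr" prefix rule plus the
## chrM -> MT special case; unplaced contigs etc. are not handled.
ucsc_to_ensembl <- function(x, ifNotFound = "MISSING") {
    mapped <- ifelse(x == "chrM", "MT",
                     ifelse(grepl("^chr", x), sub("^chr", "", x),
                            NA_character_))
    notFound <- is.na(mapped)
    if (any(notFound)) {
        if (identical(ifNotFound, "MISSING")) {
            ## Mirrors ensembldb.seqnameNotFound = "MISSING": fail.
            stop("seqnames not mappable: ",
                 paste(x[notFound], collapse = ", "))
        } else if (identical(ifNotFound, "ORIGINAL")) {
            ## Keep the unmappable names unchanged.
            mapped[notFound] <- x[notFound]
        } else {
            ## Any other value (e.g. NA) replaces the unmappable names.
            mapped[notFound] <- ifNotFound
        }
    }
    mapped
}

ucscs <- c("chr1", "chr3", "chr1", "chr9", "chrM", "chr1", "chrX")
ucsc_to_ensembl(ucscs)
## returns c("1", "3", "1", "9", "MT", "1", "X")
ucsc_to_ensembl(c(ucscs, "asdfd"), ifNotFound = NA)
## returns the same names followed by NA
```

The three branches correspond to the three option values used in the tests. The error branch matches the checkException() calls, and the NA branch matches the ifNotFound = NA checks.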
diff --git a/inst/unitTests/test_validity.R b/inst/unitTests/test_validity.R
deleted file mode 100644
index 9560f12..0000000
--- a/inst/unitTests/test_validity.R
+++ /dev/null
@@ -1,11 +0,0 @@
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-test_validity_functions <- function() {
-    OK <- ensembldb:::dbHasRequiredTables(dbconn(edb))
-    checkTrue(OK)
-    ## Check the tables
-    OK <- ensembldb:::dbHasValidTables(dbconn(edb))
-    checkTrue(OK)
-}
-
diff --git a/inst/unitTests/test_xByOverlap.R b/inst/unitTests/test_xByOverlap.R
deleted file mode 100644
index 3da75f6..0000000
--- a/inst/unitTests/test_xByOverlap.R
+++ /dev/null
@@ -1,102 +0,0 @@
-####============================================================
-##  tests for exonsByOverlaps, transcriptsByOverlaps
-##
-####------------------------------------------------------------
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-test_transcriptsByOverlaps <- function(){
-    ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
-                   end=c(2654900, 2709550, 28111790))
-    gr2 <- GRanges(rep("Y", length(ir2)), ir2)
-    grf2 <- GRangesFilter(gr2, condition="overlapping")
-    Test <- transcripts(edb, filter=grf2)
-
-    Test2 <- transcriptsByOverlaps(edb, gr2)
-    checkEquals(names(Test), names(Test2))
-
-    ## on one strand.
-    gr2 <- GRanges(rep("Y", length(ir2)), ir2, strand=rep("-", length(ir2)))
-    grf2 <- GRangesFilter(gr2, condition="overlapping")
-    Test <- transcripts(edb, filter=grf2)
-    Test2 <- transcriptsByOverlaps(edb, gr2)
-    checkEquals(names(Test), names(Test2))
-
-    ## Combine with filter...
-    gr2 <- GRanges(rep("Y", length(ir2)), ir2)
-    Test3 <- transcriptsByOverlaps(edb, gr2, filter=SeqstrandFilter("-"))
-    checkEquals(names(Test), names(Test3))
-}
-
-test_exonsByOverlaps <- function(){
-    ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
-                   end=c(2654900, 2709550, 28111790))
-    gr2 <- GRanges(rep("Y", length(ir2)), ir2)
-    grf2 <- GRangesFilter(gr2, condition="overlapping")
-    Test <- exons(edb, filter=grf2)
-
-    Test2 <- exonsByOverlaps(edb, gr2)
-    checkEquals(names(Test), names(Test2))
-
-    ## on one strand.
-    gr2 <- GRanges(rep("Y", length(ir2)), ir2, strand=rep("-", length(ir2)))
-    grf2 <- GRangesFilter(gr2, condition="overlapping")
-    Test <- exons(edb, filter=grf2)
-    Test2 <- exonsByOverlaps(edb, gr2)
-    checkEquals(names(Test), names(Test2))
-
-    ## Combine with filter...
-    gr2 <- GRanges(rep("Y", length(ir2)), ir2)
-    Test3 <- exonsByOverlaps(edb, gr2, filter=SeqstrandFilter("-"))
-    checkEquals(names(Test), names(Test3))
-}
-
-
-testing_txByOverlap <- function(){
-    ## Apparently, a combination between transcripts and findoverlaps.
-    grf <- GRangesFilter(GRanges(seqname="Y", IRanges(start=2655145, end=2655500)),
-                         condition="overlapping")
-    grf2 <- GRangesFilter(GRanges(seqname="Y", IRanges(start=28740998, end=28741998)),
-                          condition="overlapping")
-    transcripts(edb, filter=list(SeqnameFilter("Y"), GenebiotypeFilter("protein_coding")))
-    where(grf)
-    con <- dbconn(edb)
-    library(RSQLite)
-    q <- paste0("select * from gene where (", where(grf, edb),
-                ") or (", where(grf2), ")")
-    Test <- dbGetQuery(con, q)
-
-    ## Here we go...
-    ir <- IRanges(start=c(142999, 231380, 27635900),
-                  end=c(143300, 231800, 27636200))
-    gr <- GRanges(seqname=rep("Y", length(ir)), ir)
-    grf <- GRangesFilter(gr, condition="overlapping")
-    where(grf)
-    where(grf, edb)
-    Test <- transcripts(edb, filter=grf)
-    ## ?? Nothing ??
-    ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
-                   end=c(2654900, 2709550, 28111790))
-    grf2 <- GRangesFilter(GRanges(rep("Y", length(ir2)), ir2), condition="overlapping")
-    Test <- transcripts(edb, filter=grf2)
-    checkEquals(names(Test), c("ENST00000383070", "ENST00000250784", "ENST00000598545"))
-    ## ## TxDb...
-    ## library(TxDb.Hsapiens.UCSC.hg19.knownGene)
-    ## txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
-    ## gr <- GRanges(seqname=c("chrY", "chrY", "chrY", "chrY"),
-    ##               IRanges(start=c(2655145, 28740998, 2709990, 28111770),
-    ##                       end=c(2655200, 28741998, 2709999, 28112800)))
-    ## transcriptsByOverlaps(txdb, GRanges(seqname=rep("chrY", length(ir)), ir))
-    ## transcriptsByOverlaps(txdb, GRanges(seqname=rep("chrY", length(ir2)), ir2))
-
-}
-
-notrun_txdb <- function(){
-    txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
-                               package="GenomicFeatures"))
-    gr <- GRanges(seqnames = rep("chr1",2),
-                  ranges = IRanges(start=c(500,10500), end=c(10000,30000)),
-                  strand = strand(rep("-",2)))
-    transcriptsByOverlaps(txdb, gr)
-}
-
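The *ByOverlaps tests above expect transcriptsByOverlaps() and exonsByOverlaps() to return the same features as a GRangesFilter with condition = "overlapping".

The overlap test itself can be sketched in base R for 1-based closed intervals, which is what IRanges uses. A few points to note:

- overlaps_any is hypothetical.
- Seqname and strand are ignored.
- The feature coordinates in the usage lines are made up for illustration. Only the query ranges come from the tests.

```r
## Hypothetical helper (not ensembldb API): for each feature, report whether
## it overlaps at least one query range. Intervals are 1-based and closed,
## so ranges sharing a single base count as overlapping.
overlaps_any <- function(feat_start, feat_end, q_start, q_end) {
    stopifnot(length(feat_start) == length(feat_end),
              length(q_start) == length(q_end))
    vapply(seq_along(feat_start), function(i) {
        any(feat_start[i] <= q_end & q_start <= feat_end[i])
    }, logical(1))
}

## Query ranges on chromosome Y, as used in the tests above.
q_start <- c(2654890, 2709520, 28111770)
q_end   <- c(2654900, 2709550, 28111790)

## Made-up feature coordinates, for illustration only.
overlaps_any(c(2654800, 2709551, 28111790),
             c(2654895, 2710000, 28112000),
             q_start, q_end)
## [1]  TRUE FALSE  TRUE
```

The second feature starts one base after the second query range ends, so it does not overlap. The third feature shares exactly one base with the last query range, so it does.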
diff --git a/man/Deprecated.Rd b/man/Deprecated.Rd
new file mode 100644
index 0000000..73bff07
--- /dev/null
+++ b/man/Deprecated.Rd
@@ -0,0 +1,92 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/Deprecated.R
+\name{Deprecated}
+\alias{Deprecated}
+\alias{ensembldb-deprecated}
+\alias{GeneidFilter}
+\alias{GenebiotypeFilter}
+\alias{EntrezidFilter}
+\alias{TxidFilter}
+\alias{TxbiotypeFilter}
+\alias{ExonidFilter}
+\alias{ExonrankFilter}
+\alias{SeqnameFilter}
+\alias{SeqstrandFilter}
+\alias{SeqstartFilter}
+\alias{SeqendFilter}
+\title{Deprecated functionality}
+\usage{
+GeneidFilter(value, condition = "==")
+
+GenebiotypeFilter(value, condition = "==")
+
+EntrezidFilter(value, condition = "==")
+
+TxidFilter(value, condition = "==")
+
+TxbiotypeFilter(value, condition = "==")
+
+ExonidFilter(value, condition = "==")
+
+ExonrankFilter(value, condition = "==")
+
+SeqnameFilter(value, condition = "==")
+
+SeqstrandFilter(value, condition = "==")
+
+SeqstartFilter(value, condition = ">", feature = "gene")
+
+SeqendFilter(value, condition = "<", feature = "gene")
+}
+\arguments{
+\item{value}{The value for the filter.}
+
+\item{condition}{The condition for the filter.}
+
+\item{feature}{For \code{SeqstartFilter} and \code{SeqendFilter}: on what type
+of feature should the filter be applied? Supported are \code{"gene"},
+\code{"tx"} and \code{"exon"}.}
+}
+\description{
+All functions, methods and classes listed on this page are
+deprecated and might be removed in future releases.
+
+\code{GeneidFilter} creates a \code{GeneIdFilter}. Use
+\code{\link[AnnotationFilter]{GeneIdFilter}} instead.
+
+\code{GenebiotypeFilter} creates a \code{GeneBiotypeFilter}. Use
+\code{\link[AnnotationFilter]{GeneBiotypeFilter}} instead.
+
+\code{EntrezidFilter} creates an \code{EntrezFilter}. Use
+\code{\link[AnnotationFilter]{EntrezFilter}} instead.
+
+\code{TxidFilter} creates a \code{TxIdFilter}. Use
+\code{\link[AnnotationFilter]{TxIdFilter}} instead.
+
+\code{TxbiotypeFilter} creates a \code{TxBiotypeFilter}. Use
+\code{\link[AnnotationFilter]{TxBiotypeFilter}} instead.
+
+\code{ExonidFilter} creates an \code{ExonIdFilter}. Use
+\code{\link[AnnotationFilter]{ExonIdFilter}} instead.
+
+\code{ExonrankFilter} creates an \code{ExonRankFilter}. Use
+\code{\link[AnnotationFilter]{ExonRankFilter}} instead.
+
+\code{SeqnameFilter} creates a \code{SeqNameFilter}. Use
+\code{\link[AnnotationFilter]{SeqNameFilter}} instead.
+
+\code{SeqstrandFilter} creates a \code{SeqStrandFilter}. Use
+\code{\link[AnnotationFilter]{SeqStrandFilter}} instead.
+
+\code{SeqstartFilter} creates a \code{GeneStartFilter},
+\code{TxStartFilter} or \code{ExonStartFilter} depending on the value of the
+parameter \code{feature}. Use \code{\link[AnnotationFilter]{GeneStartFilter}},
+\code{\link[AnnotationFilter]{TxStartFilter}} and
+\code{\link[AnnotationFilter]{ExonStartFilter}} instead.
+
+\code{SeqendFilter} creates a \code{GeneEndFilter},
+\code{TxEndFilter} or \code{ExonEndFilter} depending on the value of the
+parameter \code{feature}. Use \code{\link[AnnotationFilter]{GeneEndFilter}},
+\code{\link[AnnotationFilter]{TxEndFilter}} and
+\code{\link[AnnotationFilter]{ExonEndFilter}} instead.
+}
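When migrating code away from the deprecated constructors, the replacements listed on this help page can be written down as a lookup. For SeqstartFilter and SeqendFilter, the replacement class depends on the feature argument.

The base-R sketch below encodes exactly the mapping stated above. deprecated_filters and replacement_for are hypothetical names used for illustration. They are not part of ensembldb or AnnotationFilter.

```r
## Hypothetical lookup (not ensembldb API): deprecated ensembldb filter
## constructors and the AnnotationFilter classes that replace them.
deprecated_filters <- c(
    GeneidFilter = "GeneIdFilter", GenebiotypeFilter = "GeneBiotypeFilter",
    EntrezidFilter = "EntrezFilter", TxidFilter = "TxIdFilter",
    TxbiotypeFilter = "TxBiotypeFilter", ExonidFilter = "ExonIdFilter",
    ExonrankFilter = "ExonRankFilter", SeqnameFilter = "SeqNameFilter",
    SeqstrandFilter = "SeqStrandFilter")

## SeqstartFilter/SeqendFilter choose the Gene*, Tx* or Exon* class
## based on `feature`, mirroring the `feature` argument documented above.
replacement_for <- function(name, feature = c("gene", "tx", "exon")) {
    feature <- match.arg(feature)
    if (name %in% c("SeqstartFilter", "SeqendFilter")) {
        prefix <- c(gene = "Gene", tx = "Tx", exon = "Exon")[[feature]]
        suffix <- if (name == "SeqstartFilter") "StartFilter" else "EndFilter"
        return(paste0(prefix, suffix))
    }
    if (!name %in% names(deprecated_filters))
        stop("'", name, "' is not a deprecated filter constructor")
    deprecated_filters[[name]]
}

replacement_for("SeqnameFilter")          ## returns "SeqNameFilter"
replacement_for("SeqstartFilter", "tx")   ## returns "TxStartFilter"
```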
diff --git a/man/EnsDb-AnnotationDbi.Rd b/man/EnsDb-AnnotationDbi.Rd
index 497f736..be26f64 100644
--- a/man/EnsDb-AnnotationDbi.Rd
+++ b/man/EnsDb-AnnotationDbi.Rd
@@ -46,30 +46,34 @@
   \item{keys}{
     The keys/ids for which data should be retrieved from the
     database. This can be either a character vector of keys/IDs, a
-    single filter object extending \code{\linkS4class{BasicFilter}} or a
-    list of such objects.
+    single filter object extending
+    \code{\link[AnnotationFilter]{AnnotationFilter}}, a combination of
+    filters \code{\link[AnnotationFilter]{AnnotationFilterList}} or a
+    \code{formula} representing a filter expression (see
+    \code{\link[AnnotationFilter]{AnnotationFilter}} for more details).
   }
 
   \item{keytype}{
     For \code{mapIds} and \code{select}: the type (column) that matches
     the provided keys. This argument does not have to be specified if
     argument \code{keys} is a filter object extending
-    \code{\linkS4class{BasicFilter}} or a \code{list} of such objects.
+    \code{AnnotationFilter} or a \code{list} of such objects.
 
     For \code{keys}: which keys should be returned from the database.
   }
 
   \item{filter}{
     For \code{keys}: either a single object extending
-    \code{\linkS4class{BasicFilter}} or a list of such object to
+    \code{AnnotationFilter} or a list of such objects to
     retrieve only specific keys from the database.
   }
 
   \item{multiVals}{
     What should \code{mapIds} do when there are multiple values that
-    could be returned? Options are: \code{"first"}, \code{"list"},
+    could be returned? Options are: \code{"first"} (default), \code{"list"},
     \code{"filter"}, \code{"asNA"}. See
-    \code{\link[AnnotationDbi]{mapIds}} for a detailed description.
+    \code{\link[AnnotationDbi]{mapIds}} 
+    for a detailed description.
   }
 
   \item{x}{
@@ -114,7 +118,7 @@
       Retrieve the mapped ids for a set of keys that are of a particular
       keytype. Argument \code{keys} can be either a character vector of
       keys/IDs, a single filter object extending
-      \code{\linkS4class{BasicFilter}} or a list of such objects. For
+      \code{AnnotationFilter} or a list of such objects. For
       the latter, the argument \code{keytype} does not have to be
       specified. Importantly however, if the filtering system is used,
       the ordering of the results might not represent the ordering of
@@ -132,7 +136,7 @@
       arguments. Multiple matches of the keys are returned in one row
       for each possible match. Argument \code{keys} can be either a
       character vector of keys/IDs, a single filter object extending
-      \code{\linkS4class{BasicFilter}} or a list of such objects. For
+      \code{AnnotationFilter} or a list of such objects. For
       the latter, the argument \code{keytype} does not have to be
       specified.
 
@@ -144,6 +148,11 @@
       Returns a \code{data.frame} with the column names corresponding to
       the argument \code{columns} and rows with all data matching the
       criteria specified with \code{keys}.
+
+      The use of \code{select} without filters or keys and without
+      restricting to specific columns is strongly discouraged, as the
+      SQL query joining all of the tables is very expensive, especially
+      if protein annotation data is available.
     }
 
   }
@@ -156,7 +165,6 @@
   Johannes Rainer
 }
 \seealso{
-  \code{\linkS4class{BasicFilter}}
   \code{\link{listColumns}}
   \code{\link{transcripts}}
 }
@@ -175,42 +183,44 @@ columns(edb)
 listColumns(edb)
 
 ## Retrieve all keys corresponding to transcript ids.
-txids <- keys(edb, keytype="TXID")
+txids <- keys(edb, keytype = "TXID")
 length(txids)
 head(txids)
 
 ## Retrieve all keys corresponding to gene names of genes encoded on chromosome X
-gids <- keys(edb, keytype="GENENAME", filter=SeqnameFilter("X"))
+gids <- keys(edb, keytype = "GENENAME", filter = SeqNameFilter("X"))
 length(gids)
 head(gids)
 
 ## Get a mapping of the genes BCL2 and BCL2L11 to all of their
 ## transcript ids and return the result as list
-maps <- mapIds(edb, keys=c("BCL2", "BCL2L11"), column="TXID",
-               keytype="GENENAME", multiVals="list")
+maps <- mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID",
+               keytype = "GENENAME", multiVals = "list")
 maps
 
-## Perform the same query using a combination of a GenenameFilter and a TxbiotypeFilter
-## to just retrieve protein coding transcripts for these two genes.
-mapIds(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
-                      TxbiotypeFilter("protein_coding")), column="TXID",
-       multiVals="list")
+## Perform the same query using a combination of a GenenameFilter and a
+## TxBiotypeFilter to just retrieve protein coding transcripts for these
+## two genes.
+mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+                        TxBiotypeFilter("protein_coding")), column = "TXID",
+       multiVals = "list")
 
 ## select:
 ## Retrieve all transcript and gene related information for the above example.
-select(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
-                      TxbiotypeFilter("protein_coding")),
-       columns=c("GENEID", "GENENAME", "TXID", "TXBIOTYPE", "TXSEQSTART", "TXSEQEND",
-                 "SEQNAME", "SEQSTRAND"))
+select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+                        TxBiotypeFilter("protein_coding")),
+       columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE", "TXSEQSTART",
+                   "TXSEQEND", "SEQNAME", "SEQSTRAND"))
 
 ## Get all data for genes encoded on chromosome Y
-Y <- select(edb, keys="Y", keytype="SEQNAME")
+Y <- select(edb, keys = "Y", keytype = "SEQNAME")
 head(Y)
 nrow(Y)
 
-## Get selected columns for all lincRNAs encoded on chromosome Y
-Y <- select(edb, keys=list(SeqnameFilter("Y"), GenebiotypeFilter("lincRNA")),
-            columns=c("GENEID", "GENEBIOTYPE", "TXID", "GENENAME"))
+## Get selected columns for all lincRNAs encoded on chromosome Y. Here we use
+## a filter expression to define what data to retrieve.
+Y <- select(edb, keys = ~ seq_name == "Y" & gene_biotype == "lincRNA",
+            columns = c("GENEID", "GENEBIOTYPE", "TXID", "GENENAME"))
 head(Y)
 nrow(Y)
 
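The updated keys documentation and the last example accept a formula such as ~ seq_name == "Y" & gene_biotype == "lincRNA" as a filter expression. To show what such an expression encodes, here is a toy base-R translator. It walks the formula's call tree and emits an SQL-like WHERE condition.

Its limits:

- formula_to_where is hypothetical. It is not how AnnotationFilter or ensembldb translate formulas.
- It only supports &, |, ==, != and parentheses.
- It quotes every value as a string.

```r
## Toy translator (hypothetical, not the AnnotationFilter implementation):
## turn a one-sided formula into an SQL-like WHERE condition.
formula_to_where <- function(f) {
    translate <- function(e) {
        op <- as.character(e[[1]])
        if (op == "(")
            return(translate(e[[2]]))
        if (op %in% c("&", "|")) {
            sql_op <- if (op == "&") "and" else "or"
            return(paste0("(", translate(e[[2]]), " ", sql_op, " ",
                          translate(e[[3]]), ")"))
        }
        if (op %in% c("==", "!=")) {
            sql_op <- if (op == "==") "=" else "!="
            ## Left-hand side is the column name, right-hand side the value.
            return(paste0(as.character(e[[2]]), " ", sql_op, " '",
                          e[[3]], "'"))
        }
        stop("unsupported operator: ", op)
    }
    ## For a one-sided formula `~ expr`, f[[2]] holds `expr`.
    translate(f[[2]])
}

formula_to_where(~ seq_name == "Y" & gene_biotype == "lincRNA")
## [1] "(seq_name = 'Y' and gene_biotype = 'lincRNA')"
```

The column names in the formula (seq_name, gene_biotype) are the database column names, not the upper-case keytypes used by select().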
diff --git a/man/EnsDb-class.Rd b/man/EnsDb-class.Rd
index 2c0c943..6ccce43 100644
--- a/man/EnsDb-class.Rd
+++ b/man/EnsDb-class.Rd
@@ -2,8 +2,6 @@
 \Rdversion{1.1}
 \docType{class}
 \alias{EnsDb-class}
-\alias{buildQuery}
-\alias{buildQuery,EnsDb-method}
 \alias{dbconn}
 \alias{dbconn,EnsDb-method}
 \alias{ensemblVersion}
@@ -33,12 +31,11 @@
 \alias{returnFilterColumns<-}
 \alias{returnFilterColumns<-,EnsDb-method}
 
-
 \title{Basic usage of an Ensembl based annotation database}
 \description{
-  Get some basic information from an Ensembl based annotation package
-  generated with \code{\link{makeEnsembldbPackage}}.
-
+  The \code{EnsDb} class provides access to an Ensembl-based annotation
+  package. This help page describes functions to get some basic
+  information from such an object.
 }
 \section{Objects from the Class}{
   A connection to the respective annotation database is created upon
@@ -50,10 +47,6 @@
 }
 \usage{
 
-\S4method{buildQuery}{EnsDb}(x, columns=c("gene_id", "gene_biotype",
-                                    "gene_name"), filter=list(), order.by,
-                             order.type="asc", skip.order.check=FALSE)
-
 \S4method{dbconn}{EnsDb}(x)
 
 \S4method{ensemblVersion}{EnsDb}(x)
@@ -91,43 +84,18 @@
     Not used.
   }
 
-  \item{columns}{
-    Columns (attributes) to be retrieved from the database tables. Use the
-    \code{listColumns} or \code{listTables} method for a list of
-    supported columns.
-  }
-
-  \item{filter}{
-    list of \code{\linkS4class{BasicFilter}} instance(s) to
-    select specific entries from the database (see examples below).
-  }
-
   \item{object}{
     For \code{organism}: an \code{EnsDb} instance.
   }
 
-  \item{order.by}{name of one of the columns above on which the
-    results should be sorted.
-  }
-
-  \item{order.type}{if the results should be ordered ascending
-    (\code{asc}, default) or descending (\code{desc}).
-  }
-
   \item{skip.keys}{
     for \code{listColumns}: whether primary and foreign keys (not
     being e.g. \code{"gene_id"} or alike) should be returned or not. By
     default these will not be returned.
   }
 
-  \item{skip.order.check}{
-    if paramter \code{order.by} should be checked for allowed column
-    names. If \code{TRUE} the function checks if the provided order
-    criteria orders on columns present in the database tables.
-  }
-
   \item{table}{
-    For \code{listColumns}: optionally specify the table name for
+    For \code{listColumns}: optionally specify the table name(s) for
     which the columns should be returned.
   }
 
@@ -164,11 +132,6 @@
 \section{Methods and Functions}{
   \describe{
 
-    \item{buildQuery}{
-      Helper function building the SQL query to be used to retrieve the
-      wanted information. Usually there is no need to call this method.
-    }
-
     \item{dbconn}{
       Returns the connection to the internal SQL database.
     }
@@ -232,10 +195,6 @@
 }
 \value{
   \describe{
-    \item{For \code{buildQuery}}{
-      A character string with the SQL query.
-    }
-
     \item{For \code{connection}}{
       The SQL connection to the RSQLite database.
     }
@@ -299,7 +258,7 @@
 }
 \seealso{
   \code{\link{EnsDb}},
-  \code{\link{makeEnsembldbPackage}}, \code{\linkS4class{BasicFilter}},
+  \code{\link{makeEnsembldbPackage}},
       \code{\link{exonsBy}}, \code{\link{genes}},
       \code{\link{transcripts}},
       \code{\link{makeEnsemblSQLiteFromTables}}
@@ -326,16 +285,6 @@ metadata(EnsDb.Hsapiens.v75)
 ## Get all the sequence names.
 seqlevels(EnsDb.Hsapiens.v75)
 
-######    buildQuery
-##
-## Join tables gene and transcript and return gene_id and tx_id
-buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "tx_id"))
-
-
-## Get all exon_ids and transcript ids of genes encoded on chromosome Y.
-buildQuery(EnsDb.Hsapiens.v75, columns=c("exon_id", "tx_id"),
-           filter=list(SeqnameFilter( "Y")))
-
 ## List all available gene biotypes from the database:
 listGenebiotypes(EnsDb.Hsapiens.v75)
 
@@ -350,16 +299,16 @@ returnFilterColumns(EnsDb.Hsapiens.v75)
 
 ## Get protein coding genes on chromosome X, specifying to return
 ## only columns gene_name as additional column.
-genes(EnsDb.Hsapiens.v75, filter=list(SeqnameFilter("X"),
-                                      GenebiotypeFilter("protein_coding")),
+genes(EnsDb.Hsapiens.v75, filter=list(SeqNameFilter("X"),
+                                      GeneBiotypeFilter("protein_coding")),
       columns=c("gene_name"))
 ## By default we get also the gene_biotype column as the data was filtered
 ## on this column.
 
 ## This can be changed using the returnFilterColumns option
 returnFilterColumns(EnsDb.Hsapiens.v75) <- FALSE
-genes(EnsDb.Hsapiens.v75, filter=list(SeqnameFilter("X"),
-                                      GenebiotypeFilter("protein_coding")),
+genes(EnsDb.Hsapiens.v75, filter=list(SeqNameFilter("X"),
+                                      GeneBiotypeFilter("protein_coding")),
       columns=c("gene_name"))
 
 
diff --git a/man/EnsDb-exonsBy.Rd b/man/EnsDb-exonsBy.Rd
index b2c5956..2e4c888 100644
--- a/man/EnsDb-exonsBy.Rd
+++ b/man/EnsDb-exonsBy.Rd
@@ -27,49 +27,51 @@
 \description{
   Retrieve gene/transcript/exons annotations stored in an Ensembl based
   database package generated with the \code{\link{makeEnsembldbPackage}}
-  function.
+  function. The \code{filter} parameter allows defining filters to
+  retrieve only specific data.
 }
 \usage{
 
-\S4method{exons}{EnsDb}(x, columns=listColumns(x,"exon"),
-                        filter, order.by, order.type="asc",
-                        return.type="GRanges")
+\S4method{exons}{EnsDb}(x, columns = listColumns(x,"exon"),
+        filter = AnnotationFilterList(), order.by,
+        order.type = "asc", return.type = "GRanges")
 
-\S4method{exonsBy}{EnsDb}(x, by=c("tx", "gene"),
-                          columns=listColumns(x, "exon"), filter, use.names=FALSE)
+\S4method{exonsBy}{EnsDb}(x, by = c("tx", "gene"),
+        columns = listColumns(x, "exon"), filter =
+        AnnotationFilterList(), use.names = FALSE)
 
-\S4method{exonsByOverlaps}{EnsDb}(x, ranges, maxgap=0L, minoverlap=1L,
-                                  type=c("any", "start", "end"),
-                                  columns=listColumns(x, "exon"),
-                                  filter)
+\S4method{exonsByOverlaps}{EnsDb}(x, ranges, maxgap = 0L, minoverlap = 1L,
+        type = c("any", "start", "end"), columns = listColumns(x, "exon"),
+        filter = AnnotationFilterList())
 
-\S4method{transcripts}{EnsDb}(x, columns=listColumns(x, "tx"),
-                              filter, order.by, order.type="asc",
-                              return.type="GRanges")
+\S4method{transcripts}{EnsDb}(x, columns = listColumns(x, "tx"),
+        filter = AnnotationFilterList(), order.by, order.type = "asc",
+        return.type = "GRanges")
 
-\S4method{transcriptsBy}{EnsDb}(x, by=c("gene", "exon"),
-                                columns=listColumns(x, "tx"), filter)
+\S4method{transcriptsBy}{EnsDb}(x, by = c("gene", "exon"),
+        columns = listColumns(x, "tx"), filter = AnnotationFilterList())
 
-\S4method{transcriptsByOverlaps}{EnsDb}(x, ranges, maxgap=0L, minoverlap=1L,
-                                        type=c("any", "start", "end"),
-                                        columns=listColumns(x, "tx"),
-                                        filter)
+\S4method{transcriptsByOverlaps}{EnsDb}(x, ranges, maxgap = 0L,
+        minoverlap = 1L, type = c("any", "start", "end"),
+        columns = listColumns(x, "tx"), filter = AnnotationFilterList())
 
-\S4method{promoters}{EnsDb}(x, upstream=2000, downstream=200, ...)
+\S4method{promoters}{EnsDb}(x, upstream = 2000, downstream = 200, ...)
 
-\S4method{genes}{EnsDb}(x, columns=listColumns(x, "gene"), filter,
-                        order.by, order.type="asc",
-                        return.type="GRanges")
+\S4method{genes}{EnsDb}(x, columns = c(listColumns(x, "gene"), "entrezid"),
+        filter = AnnotationFilterList(), order.by, order.type = "asc",
+        return.type = "GRanges")
 
-\S4method{disjointExons}{EnsDb}(x, aggregateGenes=FALSE,
-                                includeTranscripts=TRUE, filter, ...)
+\S4method{disjointExons}{EnsDb}(x, aggregateGenes = FALSE,
+        includeTranscripts = TRUE, filter = AnnotationFilterList(), ...)
 
-\S4method{cdsBy}{EnsDb}(x, by=c("tx", "gene"), columns=NULL, filter,
-                        use.names=FALSE)
+\S4method{cdsBy}{EnsDb}(x, by = c("tx", "gene"), columns = NULL,
+        filter = AnnotationFilterList(), use.names = FALSE)
 
-\S4method{fiveUTRsByTranscript}{EnsDb}(x, columns=NULL, filter)
+\S4method{fiveUTRsByTranscript}{EnsDb}(x, columns = NULL,
+        filter = AnnotationFilterList())
 
-\S4method{threeUTRsByTranscript}{EnsDb}(x, columns=NULL, filter)
+\S4method{threeUTRsByTranscript}{EnsDb}(x, columns = NULL,
+        filter = AnnotationFilterList())
 
 \S4method{toSAF}{GRangesList}(x, ...)
 
@@ -127,9 +129,13 @@
   }
 
   \item{filter}{
-    A filter object extending \code{\linkS4class{BasicFilter}} or a list
-    of such object(s) to select specific entries from the database (see
-    examples below).
+    A filter describing which results to retrieve from the database. Can
+    be a single object extending
+    \code{\link[AnnotationFilter]{AnnotationFilter}}, an
+    \code{\link[AnnotationFilter]{AnnotationFilterList}} object
+    combining several such objects or a \code{formula} representing a
+    filter expression (see examples below or
+    \code{\link[AnnotationFilter]{AnnotationFilter}} for more details).
   }
 
   \item{includeTranscripts}{
@@ -154,8 +160,9 @@
   }
 
   \item{order.by}{
-    Name of one of the columns above on which the
-    results should be sorted.
+    Character vector specifying the column(s) by which the result should
+    be ordered. This can be either in the form of
+    \code{"gene_id, seq_name"} or \code{c("gene_id", "seq_name")}.
   }
 
   \item{order.type}{
@@ -170,10 +177,10 @@
 
   \item{return.type}{
     Type of the returned object. Can be either
-    \code{"data.frame"}, \code{"DataFrame"} or \code{"GRanges"}. In the latter case the return
-    object will be a \code{GRanges} object with the GRanges specifying the
-    chromosomal start and end coordinates of the feature (gene,
-    transcript or exon, depending whether \code{genes},
+    \code{"data.frame"}, \code{"DataFrame"} or \code{"GRanges"}. In the
+    latter case the return object will be a \code{GRanges} object with
+    the GRanges specifying the chromosomal start and end coordinates of
+    the feature (gene, transcript or exon, depending whether \code{genes},
     \code{transcripts} or \code{exons} was called). All additional
     columns are added as metadata columns to the GRanges object.
   }
@@ -271,9 +278,12 @@
       and are added to the respective promoter annotation.
     }
     \item{genes}{
-      Retrieve gene information from the database. Additional
-      columns from transcripts or exons associated with the genes can be specified
-      and are added to the respective gene annotation.
+      Retrieve gene information from the database. Additional columns
+      from transcripts or exons associated with the genes can be
+      specified and are added to the respective gene annotation. Note
+      that column \code{"entrezid"} is a \code{list} of Entrezgene
+      identifiers to accommodate the potential 1:n mapping between
+      Ensembl genes and Entrezgene IDs.
     }
 
     \item{disjointExons}{
@@ -324,9 +334,10 @@
   \describe{
     \item{gene_id}{the Ensembl gene ID of the gene.}
     \item{gene_name}{the name of the gene (in most cases its official symbol).}
-    \item{entrezid}{the NCBI Entrezgene ID of the gene; note that this
-      can also be a \code{";"} separated list of IDs for Ensembl genes
-      mapped to more than one Entrezgene.}
+    \item{entrezid}{the NCBI Entrezgene ID of the gene. Note that this
+      column contains a \code{list} of Entrezgene identifiers to
+      accommodate the potential 1:n mapping between Ensembl genes and
+      Entrezgene IDs.}
     \item{gene_biotype}{the biotype of the gene.}
     \item{gene_seq_start}{the start coordinate of the gene on the
       sequence (usually a chromosome).}
@@ -352,14 +363,14 @@
       its position inside these transcript might differ.}
   }
 
-  Also, the vignette provides examples on how to retrieve sequences for
-  genes/transcripts/exons.
+  Many \code{EnsDb} databases also provide protein-related
+  annotations. See \code{\link{listProteinColumns}} for more information.
 }
 \note{
   Ensembl defines genes not only on standard chromosomes, but also on
   patched chromosomes and chromosome variants. Thus it might be
   advisable to restrict the queries to just those chromosomes of
-  interest (e.g. by specifying a \code{SeqnameFilter(c(1:22, "X", "Y"))}).
+  interest (e.g. by specifying a \code{SeqNameFilter(c(1:22, "X", "Y"))}).
   In addition, also so called LRG genes (Locus Reference Genomic) are defined in
   Ensembl. Their gene id starts with LRG instead of ENS for Ensembl
   genes, thus, a filter can be applied to specifically select those
@@ -381,10 +392,10 @@
   For \code{exons}, \code{transcripts} and \code{genes},
   a \code{data.frame}, \code{DataFrame}
   or a \code{GRanges}, depending on the value of the
-  \code{return.type} parameter. The result
-  is ordered as specified by the parameter \code{order.by} or, if not
-  provided, by \code{seq_name} and chromosomal start coordinate, but NOT by any
-  ordering of values in eventually submitted filter objects.
+  \code{return.type} parameter. The result is ordered as specified by
+  the parameter \code{order.by} or, if not provided, by \code{seq_name}
+  and chromosomal start coordinate, but NOT by any ordering of values in
+  any submitted filter objects.
 
   For \code{exonsBy}, \code{transcriptsBy}:
   a \code{GRangesList}, depending on the value of the
@@ -428,8 +439,9 @@
   Johannes Rainer, Tim Triche
 }
 \seealso{
-  \code{\link{makeEnsembldbPackage}}, \code{\linkS4class{BasicFilter}},
-      \code{\link{listColumns}}, \code{\link{lengthOf}}
+  \code{\link{supportedFilters}} to get an overview of supported filters.
+  \code{\link{makeEnsembldbPackage}},
+  \code{\link{listColumns}}, \code{\link{lengthOf}}
 }
 \examples{
 
@@ -438,89 +450,94 @@ edb <- EnsDb.Hsapiens.v75
 
 ######   genes
 ##
-## get all genes endcoded on chromosome Y
-AllY <- genes(edb, filter=SeqnameFilter("Y"))
+## Get all genes encoded on chromosome Y
+AllY <- genes(edb, filter = SeqNameFilter("Y"))
 AllY
 
-## return result as DataFrame.
+## Return the result as a DataFrame; also, we use a filter expression here
+## to define which features to extract from the database.
 AllY.granges <- genes(edb,
-                      filter=SeqnameFilter("Y"),
+                      filter = ~ seq_name == "Y",
                       return.type="DataFrame")
 AllY.granges
 
-## include all transcripts of the gene and their chromosomal
+## Include all transcripts of the gene and their chromosomal
 ## coordinates, sort by chrom start of transcripts and return as
 ## GRanges.
 AllY.granges.tx <- genes(edb,
-                         filter=SeqnameFilter("Y"),
-                         columns=c("gene_id", "seq_name",
-                             "seq_strand", "tx_id", "tx_biotype",
-                             "tx_seq_start", "tx_seq_end"),
-                         order.by="tx_seq_start")
+                         filter = SeqNameFilter("Y"),
+                         columns = c("gene_id", "seq_name",
+                                     "seq_strand", "tx_id", "tx_biotype",
+                                     "tx_seq_start", "tx_seq_end"),
+                         order.by = "tx_seq_start")
 AllY.granges.tx
 
 
 
 ######   transcripts
 ##
-## get all transcripts of a gene
+## Get all transcripts of a gene
 Tx <- transcripts(edb,
-                  filter=GeneidFilter("ENSG00000184895"),
-                  order.by="tx_seq_start")
+                  filter = GeneIdFilter("ENSG00000184895"),
+                  order.by = "tx_seq_start")
 Tx
 
-## get all transcripts of two genes along with some information on the
+## Get all transcripts of two genes along with some information on the
 ## gene and transcript
 Tx <- transcripts(edb,
-                  filter=GeneidFilter(c("ENSG00000184895",
-                      "ENSG00000092377")),
-                      columns=c("gene_id", "gene_seq_start",
-                          "gene_seq_end", "gene_biotype", "tx_biotype"))
+                  filter = GeneIdFilter(c("ENSG00000184895",
+                                          "ENSG00000092377")),
+                  columns = c("gene_id", "gene_seq_start", "gene_seq_end",
+                              "gene_biotype", "tx_biotype"))
 Tx
 
 ######   promoters
 ##
-## get the bona-fide promoters (2k up- to 200nt downstream of TSS)
-promoters(edb, filter=GeneidFilter(c("ENSG00000184895",
-                                     "ENSG00000092377")))
+## Get the bona fide promoters (2000 nt upstream to 200 nt downstream of the TSS)
+promoters(edb, filter = GeneIdFilter(c("ENSG00000184895",
+                                       "ENSG00000092377")))
 
 ######   exons
 ##
-## get all exons of the provided genes
+## Get all exons of protein coding transcripts of the gene ENSG00000184895
 Exon <- exons(edb,
-              filter=GeneidFilter(c("ENSG00000184895",
-                  "ENSG00000092377")),
-              order.by="exon_seq_start",
-              columns=c( "gene_id", "gene_seq_start",
-                  "gene_seq_end", "gene_biotype"))
+              filter = ~ gene_id == "ENSG00000184895" &
+                  tx_biotype == "protein_coding",
+              columns = c("gene_id", "gene_seq_start", "gene_seq_end",
+                          "tx_biotype", "gene_biotype"))
 Exon
 
 
 
 #####    exonsBy
 ##
-## get all exons for transcripts encoded on chromosomes X and Y.
-ETx <- exonsBy(edb, by="tx",
-               filter=SeqnameFilter(c("X", "Y")))
+## Get all exons for transcripts encoded on chromosomes X and Y.
+ETx <- exonsBy(edb, by = "tx",
+               filter = SeqNameFilter(c("X", "Y")))
 ETx
-## get all exons for genes encoded on chromosome 1 to 22, X and Y and
+## Get all exons for genes encoded on chromosomes X and Y and
 ## include additional annotation columns in the result
-EGenes <- exonsBy(edb, by="gene",
-                  filter=SeqnameFilter(c("X", "Y")),
-                  columns=c("gene_biotype", "gene_name"))
+EGenes <- exonsBy(edb, by = "gene",
+                  filter = SeqNameFilter(c("X", "Y")),
+                  columns = c("gene_biotype", "gene_name"))
 EGenes
 
 ## Note that this might also contain "LRG" genes.
 length(grep(names(EGenes), pattern="LRG"))
 
-## to fetch just Ensemblgenes, use an GeneidFilter with value
-## "ENS%" and condition "like"
+## To fetch just Ensembl genes, use a GeneIdFilter with value
+## "ENS" and condition "startsWith".
-
+eg <- exonsBy(edb, by = "gene",
+              filter = AnnotationFilterList(SeqNameFilter(c("X", "Y")),
+                                            GeneIdFilter("ENS", "startsWith")),
+              columns = c("gene_biotype", "gene_name"))
+eg
+length(grep(names(eg), pattern="LRG"))
 
 #####    transcriptsBy
 ##
-TGenes <- transcriptsBy(edb, by="gene",
-                        filter=SeqnameFilter(c("X", "Y")))
+TGenes <- transcriptsBy(edb, by = "gene",
+                        filter = SeqNameFilter(c("X", "Y")))
 TGenes
 
 ## convert this to a SAF formatted data.frame that can be used by the
@@ -530,8 +547,8 @@ head(toSAF(TGenes))
 
 #####   transcriptsByOverlaps
 ##
-ir <- IRanges(start=c(2654890, 2709520, 28111770),
-              end=c(2654900, 2709550, 28111790))
+ir <- IRanges(start = c(2654890, 2709520, 28111770),
+              end = c(2654900, 2709550, 28111790))
 gr <- GRanges(rep("Y", length(ir)), ir)
 
 ## Retrieve all transcripts overlapping any of the regions.
@@ -539,8 +556,8 @@ txs <- transcriptsByOverlaps(edb, gr)
 txs
 
 ## Alternatively, use a GRangesFilter
-grf <- GRangesFilter(gr, condition="overlapping")
-txs <- transcripts(edb, filter=grf)
+grf <- GRangesFilter(gr, type = "any")
+txs <- transcripts(edb, filter = grf)
 txs
 
 
@@ -548,15 +565,15 @@ txs
 ## Get the coding region for all transcripts on chromosome Y.
 ## Specifying also additional annotation columns (in addition to the default
 ## exon_id and exon_rank).
-cds <- cdsBy(edb, by="tx", filter=SeqnameFilter("Y"),
-             columns=c("tx_biotype", "gene_name"))
+cds <- cdsBy(edb, by = "tx", filter = SeqNameFilter("Y"),
+             columns = c("tx_biotype", "gene_name"))
 
 ####    the 5' untranslated regions:
-fUTRs <- fiveUTRsByTranscript(edb, filter=SeqnameFilter("Y"))
+fUTRs <- fiveUTRsByTranscript(edb, filter = SeqNameFilter("Y"))
 
 ####    the 3' untranslated regions with additional column gene_name.
-tUTRs <- threeUTRsByTranscript(edb, filter=SeqnameFilter("Y"),
-                               columns="gene_name")
+tUTRs <- threeUTRsByTranscript(edb, filter = SeqNameFilter("Y"),
+                               columns = "gene_name")
 
 
 }
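The `filter` argument documented in EnsDb-exonsBy.Rd accepts a single AnnotationFilter object, an AnnotationFilterList, or a filter expression written as a formula. The following R sketch compares the three forms on the chromosome Y lincRNA query used in the examples. It assumes the EnsDb.Hsapiens.v75 annotation package is installed; the variable names are illustrative only, and the expectation that forms (2) and (3) select the same genes follows from the documented semantics rather than from a recorded run.

```r
library(ensembldb)
library(AnnotationFilter)
library(EnsDb.Hsapiens.v75)
edb <- EnsDb.Hsapiens.v75

## (1) A single filter object: all genes on chromosome Y.
f_single <- SeqNameFilter("Y")

## (2) Several filter objects combined in an AnnotationFilterList; without an
##     explicit logicOp the filters are combined with a logical AND ("&").
f_list <- AnnotationFilterList(SeqNameFilter("Y"),
                               GeneBiotypeFilter("lincRNA"))

## (3) The same condition written as a filter expression (formula).
f_expr <- ~ seq_name == "Y" & gene_biotype == "lincRNA"

## Show how the formula is translated into filter objects.
print(AnnotationFilter(f_expr))

g_single <- genes(edb, filter = f_single)
g_list <- genes(edb, filter = f_list)
g_expr <- genes(edb, filter = f_expr)

## Forms (2) and (3) describe the same query and should select the same genes,
## all of which are also among the chromosome Y genes returned by form (1).
stopifnot(setequal(g_list$gene_id, g_expr$gene_id))
stopifnot(all(g_list$gene_id %in% g_single$gene_id))
```

Printing `AnnotationFilter(f_expr)` is a convenient way to check how a formula is interpreted before running a potentially large query.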
diff --git a/man/EnsDb-lengths.Rd b/man/EnsDb-lengths.Rd
index c6cc796..1c56bcd 100644
--- a/man/EnsDb-lengths.Rd
+++ b/man/EnsDb-lengths.Rd
@@ -14,7 +14,7 @@
 }
 \usage{
 
-\S4method{lengthOf}{EnsDb}(x, of="gene", filter=list())
+\S4method{lengthOf}{EnsDb}(x, of = "gene", filter = AnnotationFilterList())
 
 }
 \arguments{
@@ -22,8 +22,13 @@
   (In alphabetic order)
 
   \item{filter}{
-    list of \code{\linkS4class{BasicFilter}} instance(s) to
-    select specific entries from the database (see examples below).
+    A filter describing which results to retrieve from the database. Can
+    be a single object extending
+    \code{\link[AnnotationFilter]{AnnotationFilter}}, an
+    \code{\link[AnnotationFilter]{AnnotationFilterList}} object
+    combining several such objects or a \code{formula} representing a
+    filter expression (see examples below or
+    \code{\link[AnnotationFilter]{AnnotationFilter}} for more details).
   }
 
   \item{of}{
@@ -75,31 +80,28 @@ edb <- EnsDb.Hsapiens.v75
 #####    lengthOf
 ##
 ## length of a specific gene.
-lengthOf(edb,
-         filter=list(GeneidFilter("ENSG00000000003")))
+lengthOf(edb, filter = GeneIdFilter("ENSG00000000003"))
 
 ## length of a transcript
-lengthOf(edb, of="tx",
-         filter=list(TxidFilter("ENST00000494424")))
+lengthOf(edb, of = "tx", filter = TxIdFilter("ENST00000494424"))
 
-## average length of all protein coding genes encoded on chromosomes X
-## and Y
-mean(lengthOf(edb, of="gene",
-              filter=list(GenebiotypeFilter("protein_coding"),
-                  SeqnameFilter(c("X", "Y")))))
+## Average length of all protein coding genes encoded on chromosome X
+mean(lengthOf(edb, of = "gene",
+              filter = ~ gene_biotype == "protein_coding" &
+                  seq_name == "X"))
 
-## average length of all snoRNAs
-mean(lengthOf(edb, of="gene",
-              filter=list(GenebiotypeFilter("snoRNA"),
-                  SeqnameFilter(c("X", "Y")))))
+## Average length of all snoRNAs encoded on chromosome X
+mean(lengthOf(edb, of = "gene",
+              filter = ~ gene_biotype == "snoRNA" &
+                  seq_name == "X"))
 
 ##### transcriptLengths
 ##
 ## Calculate the length of transcripts encoded on chromosome Y, including
 ## length of the CDS, 5' and 3' UTR.
-##len <- transcriptLengths(edb, with.cds_len=TRUE, with.utr5_len=TRUE,
-##                         with.utr3_len=TRUE, filter=SeqnameFilter("Y"))
-##head(len)
+len <- transcriptLengths(edb, with.cds_len = TRUE, with.utr5_len = TRUE,
+                         with.utr3_len = TRUE, filter = SeqNameFilter("Y"))
+head(len)
 
 }
 \keyword{classes}
diff --git a/man/EnsDb-seqlevels.Rd b/man/EnsDb-seqlevels.Rd
index 5648f69..6cf4747 100644
--- a/man/EnsDb-seqlevels.Rd
+++ b/man/EnsDb-seqlevels.Rd
@@ -116,12 +116,12 @@ seqlevels(edb)
 
 ## Change the option ensembldb.seqnameNotFound to return NA in case
-## the seqname can not be mapped form Ensembl to UCSC.
+## the seqname cannot be mapped from Ensembl to UCSC.
-options(ensembldb.seqnameNotFound=NA)
+options(ensembldb.seqnameNotFound = NA)
 
 seqlevels(edb)
 
 ## Restoring the original setting.
-options(ensembldb.seqnameNotFound="ORIGINAL")
+options(ensembldb.seqnameNotFound = "ORIGINAL")
 
 
 ## Integrate Ensembl based annotations with a BSgenome package that is based on
@@ -135,9 +135,10 @@ unique(genome(edb))
 ## Although differently named, both represent genome build GRCh37.
 
-## Extract the full transcript sequences of all lincRNAs encoded on chromsome Y.
+## Extract the full transcript sequences of all lincRNAs encoded on chromosome Y.
-yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx",
-                                              filter=list(SeqnameFilter("chrY"),
-                                                          GenebiotypeFilter("lincRNA"))))
+yTxSeqs <- extractTranscriptSeqs(bsg,
+                                 exonsBy(edb, "tx",
+                                         filter = ~ seq_name == "chrY" &
+                                             gene_biotype == "lincRNA"))
 yTxSeqs
 
 }
diff --git a/man/EnsDb-sequences.Rd b/man/EnsDb-sequences.Rd
index e2b9ce5..dbddb64 100644
--- a/man/EnsDb-sequences.Rd
+++ b/man/EnsDb-sequences.Rd
@@ -88,7 +88,6 @@
   Johannes Rainer
 }
 \seealso{
-  \code{\linkS4class{BasicFilter}}
   \code{\link{transcripts}}
   \code{\link{exonsBy}}
 }
@@ -105,7 +104,7 @@ edb <- EnsDb.Hsapiens.v75
     Dna <- getGenomeFaFile(edb)
     ## Extract the transcript sequence for all transcripts encoded on chromosome
     ## Y.
-    ##extractTranscriptSeqs(Dna, edb, filter=SeqnameFilter("Y"))
+    ##extractTranscriptSeqs(Dna, edb, filter=SeqNameFilter("Y"))
 
 }
 
diff --git a/man/EnsDb-utils.Rd b/man/EnsDb-utils.Rd
index b57d042..a1e42c2 100644
--- a/man/EnsDb-utils.Rd
+++ b/man/EnsDb-utils.Rd
@@ -10,10 +10,9 @@
 }
 \usage{
 
-\S4method{getGeneRegionTrackForGviz}{EnsDb}(x, filter=list(),
-                                            chromosome=NULL,
-                                            start=NULL, end=NULL,
-                                            featureIs="gene_biotype")
+\S4method{getGeneRegionTrackForGviz}{EnsDb}(x,
+        filter = AnnotationFilterList(), chromosome = NULL,
+        start = NULL, end = NULL, featureIs = "gene_biotype")
 }
 \arguments{
 
@@ -37,9 +36,13 @@
   }
 
   \item{filter}{
-    A filter object extending \code{\linkS4class{BasicFilter}} or a list
-    of such object(s) to select specific entries from the database (see
-    examples below).
+    A filter describing which results to retrieve from the database. Can
+    be a single object extending
+    \code{\link[AnnotationFilter]{AnnotationFilter}}, an
+    \code{\link[AnnotationFilter]{AnnotationFilterList}} object
+    combining several such objects or a \code{formula} representing a
+    filter expression (see examples below or
+    \code{\link[AnnotationFilter]{AnnotationFilter}} for more details).
   }
 
   \item{start}{
@@ -89,7 +92,6 @@
   Johannes Rainer
 }
 \seealso{
-  \code{\linkS4class{BasicFilter}}
   \code{\link{transcripts}}
 }
 \examples{
@@ -99,15 +101,15 @@ edb <- EnsDb.Hsapiens.v75
 ######   getGeneRegionTrackForGviz
 ##
-## Get all genes encoded on chromosome Y in the specifyed region.
+## Get all genes encoded on chromosome Y in the specified region.
-AllY <- getGeneRegionTrackForGviz(edb, chromosome="Y", start=5000000,
-                                  end=7000000)
+AllY <- getGeneRegionTrackForGviz(edb, chromosome = "Y", start = 5000000,
+                                  end = 7000000)
 ## We could plot this now using plotTracks(GeneRegionTrack(AllY))
 
 ## We can also use filters to further restrict the query to e.g.
 ## all lincRNA genes encoded in that region.
-lincsY <- getGeneRegionTrackForGviz(edb, chromosome="Y", start=5000000,
-                                    end=7000000,
-                                    filter=GenebiotypeFilter("lincRNA"))
+lincsY <- getGeneRegionTrackForGviz(edb, chromosome = "Y", start = 5000000,
+                                    end = 7000000,
+                                    filter = GeneBiotypeFilter("lincRNA"))
 
 }
 \keyword{classes}
diff --git a/man/EnsDb.Rd b/man/EnsDb.Rd
index 6f777ec..3f0b303 100644
--- a/man/EnsDb.Rd
+++ b/man/EnsDb.Rd
@@ -47,4 +47,3 @@ edb <- EnsDb(dbcon)
 \author{
 Johannes Rainer
 }
-
diff --git a/man/Filter-classes.Rd b/man/Filter-classes.Rd
new file mode 100644
index 0000000..51f4a80
--- /dev/null
+++ b/man/Filter-classes.Rd
@@ -0,0 +1,350 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/Classes.R, R/Methods.R, R/Methods-Filter.R
+\docType{class}
+\name{Filter-classes}
+\alias{Filter-classes}
+\alias{OnlyCodingTxFilter-class}
+\alias{OnlyCodingTxFilter}
+\alias{ProtDomIdFilter-class}
+\alias{ProtDomIdFilter}
+\alias{UniprotDbFilter-class}
+\alias{UniprotDbFilter}
+\alias{UniprotMappingTypeFilter-class}
+\alias{UniprotMappingTypeFilter}
+\alias{supportedFilters,EnsDb-method}
+\alias{seqnames,GRangesFilter-method}
+\alias{seqlevels,GRangesFilter-method}
+\title{Filters supported by ensembldb}
+\usage{
+OnlyCodingTxFilter()
+
+ProtDomIdFilter(value, condition = "==")
+
+UniprotDbFilter(value, condition = "==")
+
+UniprotMappingTypeFilter(value, condition = "==")
+
+\S4method{supportedFilters}{EnsDb}(object, ...)
+
+\S4method{seqnames}{GRangesFilter}(x)
+
+\S4method{seqlevels}{GRangesFilter}(x)
+}
+\arguments{
+\item{value}{The value(s) for the filter. For
+\code{\link[AnnotationFilter]{GRangesFilter}} it has to be a
+\code{\link[GenomicRanges]{GRanges}} object.}
+
+\item{condition}{\code{character(1)} specifying the \emph{condition} of the
+filter. For \code{character}-based filters (such as
+\code{\link[AnnotationFilter]{GeneIdFilter}}) \code{"=="}, \code{"!="},
+\code{"startsWith"} and \code{"endsWith"} are supported. Allowed values
+for \code{integer}-based filters (such as
+\code{\link[AnnotationFilter]{GeneStartFilter}}) are \code{"=="},
+\code{"!="}, \code{"<"}, \code{"<="}, \code{">"} and \code{">="}.}
+
+\item{object}{For \code{supportedFilters}: an \code{EnsDb} object.}
+
+\item{...}{For \code{supportedFilters}: currently not used.}
+
+\item{x}{For \code{seqnames}, \code{seqlevels}: a \code{GRangesFilter} object.}
+}
+\value{
+For \code{ProtDomIdFilter}: A \code{ProtDomIdFilter} object.
+
+For \code{UniprotDbFilter}: A \code{UniprotDbFilter} object.
+
+For \code{UniprotMappingTypeFilter}: A
+\code{UniprotMappingTypeFilter} object.
+
+For \code{supportedFilters}: the names of the supported filter
+    classes.
+}
+\description{
+\code{ensembldb} supports most of the filters from the
+    \code{\link{AnnotationFilter}} package to retrieve specific content from
+    \code{\linkS4class{EnsDb}} databases.
+
+\code{supportedFilters} returns the names of all supported
+    filters for the \code{EnsDb} object.
+
+\code{seqnames}: accessor for the sequence names of the
+\code{GRanges} object within a \code{GRangesFilter}.
+
+\code{seqlevels}: accessor for the \code{seqlevels} of the
+\code{GRanges} object within a \code{GRangesFilter}.
+}
+\details{
+\code{ensembldb} supports the following filters from the
+\code{AnnotationFilter} package:
+
+\describe{
+
+\item{GeneIdFilter}{
+    filter based on the Ensembl gene ID.
+}
+
+\item{GenenameFilter}{
+    filter based on the name of the gene as provided by Ensembl. In most cases
+    this will correspond to the official gene symbol.
+}
+
+\item{SymbolFilter}{
+    filter based on the gene names. \code{\linkS4class{EnsDb}} objects don't
+    have a dedicated \emph{symbol} column; the filtering is hence based on the
+    gene names.
+}
+
+\item{GeneBiotypeFilter}{
+    filter based on the biotype of genes (e.g. \code{"protein_coding"}).
+}
+
+\item{GeneStartFilter}{
+    filter based on the genomic start coordinate of genes.
+}
+
+\item{GeneEndFilter}{
+    filter based on the genomic end coordinate of genes.
+}
+
+\item{EntrezidFilter}{
+    filter based on the genes' NCBI Entrezgene ID.
+}
+
+\item{TxIdFilter}{
+    filter based on the Ensembl transcript ID.
+}
+
+\item{TxNameFilter}{
+    filter based on the Ensembl transcript ID; no transcript names are
+    provided in \code{\linkS4class{EnsDb}} databases.
+}
+
+\item{TxBiotypeFilter}{
+    filter based on the transcripts' biotype.
+}
+
+\item{TxStartFilter}{
+    filter based on the genomic start coordinate of the transcripts.
+}
+
+\item{TxEndFilter}{
+    filter based on the genomic end coordinates of the transcripts.
+}
+
+\item{ExonIdFilter}{
+    filter based on Ensembl exon IDs.
+}
+
+\item{ExonRankFilter}{
+    filter based on the index/rank of the exon within the transcripts.
+}
+
+\item{ExonStartFilter}{
+    filter based on the genomic start coordinates of the exons.
+}
+
+\item{ExonEndFilter}{
+    filter based on the genomic end coordinates of the exons.
+}
+
+\item{GRangesFilter}{
+    Allows fetching features located within or overlapping the specified
+    genomic region(s). This filter takes a \code{\link[GenomicRanges]{GRanges}} object
+    as input and, if \code{type = "any"} (the default), will restrict
+    results to features (genes, transcripts or exons) that at least partially
+    overlap the region. Alternatively, by specifying
+    \code{type = "within"} it will return only features located completely within the
+    range. In addition, the \code{\link[AnnotationFilter]{GRangesFilter}}
+    supports \code{type = "start"}, \code{type = "end"} and
+    \code{type = "equal"} to filter for features sharing the start or end
+    coordinate of the \code{GRanges}, or having coordinates equal to it.
+
+    Note that the type of feature on which the filter is applied depends on
+    the method that is called, i.e. \code{\link{genes}} will filter on the
+    genomic coordinates of genes, \code{\link{transcripts}} on those of
+    transcripts and \code{\link{exons}} on exon coordinates.
+
+    Calls to the methods \code{\link{exonsBy}}, \code{\link{cdsBy}} and
+    \code{\link{transcriptsBy}} use the start and end coordinates of the
+    feature type specified with argument \code{by} (i.e. \code{"gene"},
+    \code{"transcript"} or \code{"exon"}) for the filtering.
+
+    If the specified \code{GRanges} object defines multiple regions, all
+    features within (or overlapping) any of these regions are returned.
+
+    Chromosome names/seqnames can be provided in UCSC format (e.g.
+    \code{"chrX"}) or Ensembl format (e.g. \code{"X"}); see
+    \code{\link{seqlevelsStyle}} for more information. 
+}
+
+\item{SeqNameFilter}{
+    filter based on chromosome names.
+}
+
+\item{SeqStrandFilter}{
+    filter based on the chromosome strand. The strand can be specified with
+    \code{value = "+"}, \code{value = "-"}, \code{value = -1} or
+    \code{value = 1}.
+}
+
+\item{ProteinIdFilter}{
+    filter based on Ensembl protein IDs. This filter is only supported if the
+    \code{\linkS4class{EnsDb}} provides protein annotations; use the
+    \code{\link{hasProteinData}} method to check this.
+}
+
+\item{UniprotFilter}{
+    filter based on Uniprot IDs. This filter is only supported if the
+    \code{\linkS4class{EnsDb}} provides protein annotations; use the
+    \code{\link{hasProteinData}} method to check this.
+}
+
+}
+
+In addition, the following filters are defined by \code{ensembldb}:
+\describe{
+
+\item{UniprotDbFilter}{
+    allows filtering results based on the specified Uniprot database name(s).
+}
+
+\item{UniprotMappingTypeFilter}{
+    allows filtering results based on the mapping method/type that was used
+    to assign Uniprot IDs to Ensembl protein IDs.
+}
+
+\item{ProtDomIdFilter}{
+    allows retrieving entries from the database matching the provided filter
+    criteria based on their protein domain ID (\emph{protein_domain_id}).
+}
+
+\item{OnlyCodingTxFilter}{
+    allows retrieving entries only for protein coding transcripts, i.e.
+    transcripts with a CDS. This filter does not take any input arguments.
+}
+
+}
+}
+\note{
+For users of \code{ensembldb} version < 2.0: in the
+    \code{\link[AnnotationFilter]{GRangesFilter}} from the
+    \code{AnnotationFilter} package the \code{condition} parameter was
+    renamed to \code{type} (to be consistent with the \code{IRanges}
+    package). In addition, \code{condition = "overlapping"} is no longer
+    recognized. To retrieve all features overlapping the range,
+    \code{type = "any"} has to be used.
+
+Protein annotation based filters can only be used if the
+    \code{\linkS4class{EnsDb}} database contains protein annotations, i.e.
+    if \code{\link{hasProteinData}} is \code{TRUE}. Also, only protein coding
+    transcripts will have protein annotations available, thus, non-coding
+    transcripts/genes will not be returned by the queries using protein
+    annotation filters.
+}
+\examples{
+
+## Create a filter that could be used to retrieve all information for
+## the respective gene.
+gif <- GeneIdFilter("ENSG00000012817")
+gif
+
+## Create a filter for a chromosomal end position of a gene
+sef <- GeneEndFilter(10000, condition = ">")
+sef
+
+## For additional examples see the help page of "genes".
+
+
+## Example for GRangesFilter:
+## retrieve all genes overlapping the specified region
+grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
+                             strand = "+"), type = "any")
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+genes(edb, filter = grf)
+
+## Get also all transcripts overlapping that region.
+transcripts(edb, filter = grf)
+
+## Retrieve all transcripts for the above gene
+gn <- genes(edb, filter = grf)
+txs <- transcripts(edb, filter = GenenameFilter(gn$gene_name))
+## Next we simply plot their start and end coordinates.
+plot(3, 3, pch=NA, xlim=c(start(gn), end(gn)), ylim=c(0, length(txs)),
+yaxt="n", ylab="")
+## Highlight the GRangesFilter region
+rect(xleft=start(grf), xright=end(grf), ybottom=0, ytop=length(txs),
+col="red", border="red")
+for (i in seq_along(txs)) {
+    current <- txs[i]
+    rect(xleft=start(current), xright=end(current), ybottom=i-0.975, ytop=i-0.125, border="grey")
+    text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
+}
+## Thus, we can see that only 4 transcripts of that gene are indeed
+## overlapping the region.
+
+
+## No exon overlaps that region, thus nothing is returned
+exons(edb, filter = grf)
+
+
+## Example for ExonRankFilter
+## Extract all exons 1 and (if present) 2 for all genes encoded on the
+## Y chromosome
+exons(edb, columns = c("tx_id", "exon_idx"),
+      filter=list(SeqNameFilter("Y"),
+                  ExonRankFilter(3, condition = "<")))
+
+
+## Get all transcripts for the gene SKA2
+transcripts(edb, filter = GenenameFilter("SKA2"))
+
+## Which is the same as using a SymbolFilter
+transcripts(edb, filter = SymbolFilter("SKA2"))
+
+
+## Create a ProteinIdFilter:
+pf <- ProteinIdFilter("ENSP00000362111")
+pf
+## Using this filter would retrieve all database entries that are associated
+## with a protein with the ID "ENSP00000362111"
+if (hasProteinData(edb)) {
+    res <- genes(edb, filter = pf)
+    res
+}
+
+## UniprotFilter:
+uf <- UniprotFilter("O60762")
+## Get the transcripts encoding that protein:
+if (hasProteinData(edb)) {
+    transcripts(edb, filter = uf)
+    ## Note, however, that Ensembl protein IDs can map to several Uniprot IDs (1:n):
+    transcripts(edb, filter = TxIdFilter("ENST00000371588"),
+        columns = c("protein_id", "uniprot_id"))
+}
+
+## ProtDomIdFilter:
+pdf <- ProtDomIdFilter("PF00335")
+## Also here we could get all transcripts related to that protein domain
+if (hasProteinData(edb)) {
+    transcripts(edb, filter = pdf, columns = "protein_id")
+}
+
+}
+\seealso{
+\code{\link{supportedFilters}} to list all filters supported for \code{EnsDb}
+    objects.
+    \code{\link{listUniprotDbs}} and \code{\link{listUniprotMappingTypes}} to
+    list all Uniprot database names and mapping method types, respectively,
+    from the database.
+
+    \code{\link[AnnotationFilter]{GeneIdFilter}} for more details on the
+    filter objects.
+
+    \code{\link{genes}}, \code{\link{transcripts}}, \code{\link{exons}},
+    \code{\link{listGenebiotypes}}, \code{\link{listTxbiotypes}}.
+}
+\author{
+Johannes Rainer
+}
diff --git a/man/GeneidFilter-class.Rd b/man/GeneidFilter-class.Rd
deleted file mode 100644
index 22f8892..0000000
--- a/man/GeneidFilter-class.Rd
+++ /dev/null
@@ -1,451 +0,0 @@
-\name{GeneidFilter-class}
-\Rdversion{1.1}
-\docType{class}
-\alias{BasicFilter-class}
-\alias{EntrezidFilter-class}
-\alias{GeneidFilter-class}
-\alias{GenebiotypeFilter-class}
-\alias{GenenameFilter-class}
-\alias{TxidFilter-class}
-\alias{TxbiotypeFilter-class}
-\alias{ExonidFilter-class}
-\alias{SeqnameFilter-class}
-\alias{SeqstrandFilter-class}
-\alias{SeqstartFilter-class}
-\alias{SeqendFilter-class}
-\alias{GRangesFilter-class}
-\alias{ExonrankFilter-class}
-\alias{column,EntrezidFilter,missing,missing-method}
-\alias{column,GeneidFilter,missing,missing-method}
-\alias{column,GenenameFilter,missing,missing-method}
-\alias{column,GenebiotypeFilter,missing,missing-method}
-\alias{column,TxidFilter,missing,missing-method}
-\alias{column,TxbiotypeFilter,missing,missing-method}
-\alias{column,ExonidFilter,missing,missing-method}
-\alias{column,ExonrankFilter,missing,missing-method}
-\alias{column,SeqnameFilter,missing,missing-method}
-\alias{column,SeqstrandFilter,missing,missing-method}
-\alias{column,SeqstartFilter,missing,missing-method}
-\alias{column,SeqendFilter,missing,missing-method}
-\alias{column,GRangesFilter,missing,missing-method}
-\alias{where,EntrezidFilter,missing,missing-method}
-\alias{where,GeneidFilter,missing,missing-method}
-\alias{where,GenenameFilter,missing,missing-method}
-\alias{where,GenebiotypeFilter,missing,missing-method}
-\alias{where,TxidFilter,missing,missing-method}
-\alias{where,TxbiotypeFilter,missing,missing-method}
-\alias{where,ExonidFilter,missing,missing-method}
-\alias{where,ExonrankFilter,missing,missing-method}
-\alias{where,SeqnameFilter,missing,missing-method}
-\alias{where,SeqstrandFilter,missing,missing-method}
-\alias{where,SeqstartFilter,missing,missing-method}
-\alias{where,SeqendFilter,missing,missing-method}
-\alias{where,GRangesFilter,missing,missing-method}
-% EnsDb, missing
-\alias{column,EntrezidFilter,EnsDb,missing-method}
-\alias{column,GeneidFilter,EnsDb,missing-method}
-\alias{column,GenenameFilter,EnsDb,missing-method}
-\alias{column,GenebiotypeFilter,EnsDb,missing-method}
-\alias{column,TxidFilter,EnsDb,missing-method}
-\alias{column,TxbiotypeFilter,EnsDb,missing-method}
-\alias{column,ExonidFilter,EnsDb,missing-method}
-\alias{column,ExonrankFilter,EnsDb,missing-method}
-\alias{column,SeqnameFilter,EnsDb,missing-method}
-\alias{column,SeqstrandFilter,EnsDb,missing-method}
-\alias{column,SeqstartFilter,EnsDb,missing-method}
-\alias{column,SeqendFilter,EnsDb,missing-method}
-\alias{column,GRangesFilter,EnsDb,missing-method}
-\alias{column,OnlyCodingTx,EnsDb,missing-method}
-\alias{where,EntrezidFilter,EnsDb,missing-method}
-\alias{where,GeneidFilter,EnsDb,missing-method}
-\alias{where,GenenameFilter,EnsDb,missing-method}
-\alias{where,GenebiotypeFilter,EnsDb,missing-method}
-\alias{where,TxidFilter,EnsDb,missing-method}
-\alias{where,TxbiotypeFilter,EnsDb,missing-method}
-\alias{where,ExonidFilter,EnsDb,missing-method}
-\alias{where,ExonrankFilter,EnsDb,missing-method}
-\alias{where,SeqnameFilter,EnsDb,missing-method}
-\alias{where,SeqstrandFilter,EnsDb,missing-method}
-\alias{where,SeqstartFilter,EnsDb,missing-method}
-\alias{where,SeqendFilter,EnsDb,missing-method}
-\alias{where,GRangesFilter,EnsDb,missing-method}
-\alias{where,OnlyCodingTx,EnsDb,missing-method}
-% EnsDb, character
-\alias{column,EntrezidFilter,EnsDb,character-method}
-\alias{column,GeneidFilter,EnsDb,character-method}
-\alias{column,GenenameFilter,EnsDb,character-method}
-\alias{column,GenebiotypeFilter,EnsDb,character-method}
-\alias{column,TxidFilter,EnsDb,character-method}
-\alias{column,TxbiotypeFilter,EnsDb,character-method}
-\alias{column,ExonidFilter,EnsDb,character-method}
-\alias{column,ExonrankFilter,EnsDb,character-method}
-\alias{column,SeqnameFilter,EnsDb,character-method}
-\alias{column,SeqstrandFilter,EnsDb,character-method}
-\alias{column,SeqstartFilter,EnsDb,character-method}
-\alias{column,SeqendFilter,EnsDb,character-method}
-\alias{column,GRangesFilter,EnsDb,character-method}
-\alias{column,OnlyCodingTx,EnsDb,character-method}
-\alias{where,EntrezidFilter,EnsDb,character-method}
-\alias{where,GeneidFilter,EnsDb,character-method}
-\alias{where,GenenameFilter,EnsDb,character-method}
-\alias{where,GenebiotypeFilter,EnsDb,character-method}
-\alias{where,TxidFilter,EnsDb,character-method}
-\alias{where,TxbiotypeFilter,EnsDb,character-method}
-\alias{where,ExonidFilter,EnsDb,character-method}
-\alias{where,ExonrankFilter,EnsDb,character-method}
-\alias{where,SeqnameFilter,EnsDb,character-method}
-\alias{where,SeqstrandFilter,EnsDb,character-method}
-\alias{where,SeqstartFilter,EnsDb,character-method}
-\alias{where,SeqendFilter,EnsDb,character-method}
-\alias{where,GRangesFilter,EnsDb,character-method}
-\alias{where,OnlyCodingTx,EnsDb,character-method}
-%
-\alias{condition,BasicFilter-method}
-\alias{condition<-,BasicFilter-method}
-\alias{condition<-}
-\alias{condition,GRangesFilter-method}
-\alias{condition<-,GRangesFilter-method}
-\alias{show,BasicFilter-method}
-\alias{show,GRangesFilter-method}
-\alias{print,BasicFilter-method}
-\alias{where,BasicFilter,missing,missing-method}
-\alias{where,BasicFilter,EnsDb,missing-method}
-\alias{where,BasicFilter,EnsDb,character-method}
-\alias{where,list,EnsDb,character-method}
-\alias{where,list,EnsDb,missing-method}
-\alias{where,list,missing,missing-method}
-\alias{value,BasicFilter,missing-method}
-\alias{value<-}
-\alias{value<-,BasicFilter-method}
-\alias{value<-,ExonrankFilter-method}
-\alias{value,BasicFilter,EnsDb-method}
-\alias{value,GRangesFilter,missing-method}
-\alias{value,GRangesFilter,EnsDb-method}
-\alias{value,SeqnameFilter,EnsDb-method}
-\alias{condition}
-\alias{value}
-\alias{column}
-\alias{where}
-% Additional GRangesFilter stuff
-\alias{end,GRangesFilter-method}
-\alias{seqlevels,GRangesFilter-method}
-\alias{seqnames,GRangesFilter-method}
-\alias{start,GRangesFilter-method}
-\alias{strand,GRangesFilter-method}
-% SymbolFilter
-\alias{SymbolFilter-class}
-\alias{column,SymbolFilter,missing,missing-method}
-\alias{column,SymbolFilter,EnsDb,missing-method}
-\alias{column,SymbolFilter,EnsDb,character-method}
-\alias{where,SymbolFilter,missing,missing-method}
-\alias{where,SymbolFilter,EnsDb,missing-method}
-\alias{where,SymbolFilter,EnsDb,character-method}
-
-
-\title{Filter results fetched from the Ensembl database}
-\description{
-  These classes allow to specify which entries (i.e. genes, transcripts
-  or exons) should be retrieved from the database.
-}
-\section{Objects from the Class}{
-  While objects can be created by calls e.g. of the form
-  \code{new("GeneidFilter", ...)} users are strongly encouraged to use the
-  specific functions: \code{\link{GeneidFilter}}, \code{\link{EntrezidFilter}},
-  \code{\link{GenenameFilter}}, \code{\link{GenebiotypeFilter}},
-  \code{\link{GRangesFilter}}, \code{\link{SymbolFilter}},
-  \code{\link{TxidFilter}}, \code{\link{TxbiotypeFilter}},
-  \code{\link{ExonidFilter}}, \code{\link{ExonrankFilter}},
-  \code{\link{SeqnameFilter}}, \code{\link{SeqstrandFilter}},
-  \code{\link{SeqstartFilter}} and \code{\link{SeqendFilter}}.
-
-  See examples below for usage.
-}
-\section{Slots}{
-  \describe{
-    \item{\code{condition}:}{
-      Object of class \code{"character"}: can be
-      either \code{"="}, \code{"in"} or \code{"like"} to filter on character values
-      (e.g. gene id, gene biotype, seqname etc), or \code{"="}, \code{">"}
-      or \code{"<"} for numerical values (chromosome/seq
-      coordinates). Note that for \code{"like"} \code{value} should be a
-      SQL pattern (e.g. \code{"ENS\%"}).
-    }
-
-    \item{\code{value}:}{
-      Object of class \code{"character"}: the value
-      to be used for filtering.
-    }
-
-  }
-}
-\section{Extends}{
-  Class \code{\linkS4class{BasicFilter}}, directly.
-}
-\section{Methods for all \code{BasicFilter} objects}{
-  \describe{
-    Note: these methods are applicable to all classes extending the
-    \code{BasicFilter} class.
-
-    \item{column}{\code{signature(object = "GeneidFilter", db = "EnsDb",
-	with.tables = "character")}:
-      returns the column (attribute name) to be used for the
-      filtering. Submitting the \code{db} parameter ensures that
-      returned column is valid in the corresponding database schema. The
-      optional argument \code{with.tables} allows to specify which in
-      which database table the function should look for the
-      attribute/column name. By default the method will check all
-      database tables.
-    }
-
-    \item{column}{\code{signature(object = "GeneidFilter", db = "EnsDb",
-	with.tables = "missing")}:
-      returns the column (attribute name) to be used for the
-      filtering. Submitting the \code{db} parameter ensures that
-      returned column is valid in the corresponding database schema.
-    }
-
-    \item{column}{\code{signature(object = "GeneidFilter", db = "missing",
-	with.tables = "missing")}:
-      returns the column (table column name) to be used for the
-      filtering.
-    }
-
-    \item{condition}{\code{signature(x = "BasicFilter")}: returns
-      the value for the \code{condition} slot.
-    }
-
-    \item{condition<-}{
-      setter method for condition.
-    }
-
-    \item{value}{\code{signature(x = "BasicFilter", db = "EnsDb")}:
-      returns the value of the \code{value} slot of the filter object.
-    }
-
-    \item{value<-}{
-      setter method for value.
-    }
-
-    \item{where}{\code{signature(object = "GeneidFilter", db = "EnsDb",
-	with.tables = "character")}:
-      returns the where condition for the SQL call. Submitting also the
-      \code{db} parameter ensures that
-      the columns are valid in the corresponding database schema. The
-      optional argument \code{with.tables} allows to specify which in
-      which database table the function should look for the
-      attribute/column name. By default the method will check all
-      database tables.
-    }
-
-    \item{where}{\code{signature(object = "GeneidFilter", db = "EnsDb",
-	with.tables = "missing")}:
-      returns the
-      where condition for the SQL call. Submitting also the \code{db}
-      parameter ensures that
-      the columns are valid in the corresponding database schema.
-    }
-
-    \item{where}{\code{signature(object = "GeneidFilter", db = "missing",
-	with.tables = "missing")}:
-      returns the where condition for the SQL call.
-    }
-  }
-}
-\section{Methods for \code{GRangesFilter} objects}{
-  \describe{
-    \item{start, end, strand}{
-      Get the start and end coordinate and the strand from the
-	\code{GRanges} within the filter.
-    }
-
-    \item{seqlevels, seqnames}{
-      Get the names of the sequences from the \code{GRanges} of the filter.
-    }
-  }
-}
-\details{
-  \describe{
-    \item{\code{ExonidFilter}}{
-      Allows to filter based on the (Ensembl) exon identifier.
-    }
-
-    \item{\code{ExonrankFilter}}{
-      Allows to filter based on the rank (index) of the exon within the
-      transcript model. Exons are always numbered 5' to 3' end of the
-      transcript, thus, also on the reverse strand, the exon 1 is the
-      most 5' exon of the transcript.
-    }
-
-    \item{\code{EntrezidFilter}}{
-      Filter results based on the NCBI Entrezgene identifierts of the
-      genes. Use the \code{\link{listGenebiotypes}} method to get a
-      complete list of all available gene biotypes.
-    }
-
-    \item{\code{GenebiotypeFilter}}{
-      Filter results based on the gene biotype as defined in the Ensembl
-      database.
-    }
-
-    \item{\code{GeneidFilter}}{
-      Filter results based on the Ensembl gene identifiers.
-    }
-
-    \item{\code{GenenameFilter}}{
-      Allows to filter on the gene names (symbols) of the genes.
-    }
-
-    \item{\code{SymbolFilter}}{
-      Filter on gene symbols. Note that since no such database column is
-      available in an \code{EnsDb} database the gene names are used to
-      filter. These do however correspond all to the official gene
-      symbols.
-    }
-
-    \item{\code{GRangesFilter}}{
-      Allows to fetch features within or overlapping specified genomic
-      region(s)/range(s). This filter takes a \code{GRanges} object as input
-      and, if \code{condition="within"} (the default) will restrict
-      results to features (genes, transcripts or exons) that are
-      completely within the region. Alternatively, by specifying
-      \code{condition="overlapping"} it will return all features
-      (i.e. genes for a call to \code{\link{genes}}, transcripts for a
-      call to \code{\link{transcripts}} and exons for a call to
-      \code{\link{exons}}) that are partially overlapping with the
-      region, i.e. which start coordinate is smaller than the end
-      coordinate of the region and which end coordinate is larger than
-      the start coordinate of the region. Thus, genes and transcripts
-      that have an intron overlapping the region will also be returned.
-
-      Calls to the methods \code{\link{exonsBy}}, \code{\link{cdsBy}}
-      and \code{\link{transcriptsBy}} use the start and end coordinates of the
-      feature type specified with argument \code{by}
-      (i.e. \code{"gene"}, \code{"transcript"} or \code{"exon"}) for the
-      filtering.
-
-      Note: if the specified \code{GRanges} object defines multiple
-      region, all features within (or overlapping) any of these regions
-      are returned.
-
-      Chromosome names/seqnames can be provided in UCSC format
-      (e.g. \code{"chrX"}) or Ensembl format (e.g. \code{"X"}); see
-      \code{\link{seqlevelsStyle}} for more information.
-    }
-
-    \item{\code{SeqendFilter}}{
-      Filter based on the chromosomal end coordinate of the exons,
-      transcripts or genes.
-    }
-
-    \item{\code{SeqnameFilter}}{
-      Filter on the sequence name on which the features are encoded
-      (mostly the chromosome names). Supports UCSC chromosome names
-      (e.g. \code{"chrX"}) and Ensembl chromosome names
-      (e.g. \code{"X"}).
-    }
-
-    \item{\code{SeqstartFilter}}{
-      Filter based on the chromosomal start coordinates of the exons,
-      transcripts or genes.
-    }
-
-    \item{\code{SeqstrandFilter}}{
-      Filter based on the strand on which the features are encoded.
-    }
-
-    \item{\code{TxbiotypeFilter}}{
-      Filter on the transcript biotype defined in Ensembl. Use the
-      \code{\link{listTxbiotypes}} method to get a complete list of all
-      available transcript biotypes.
-    }
-
-    \item{\code{TxidFilter}}{
-      Filter on the Ensembl transcript identifiers.
-    }
-  }
-}
-\note{
-  The \code{column} and \code{where} methods should be always called
-  along with the \code{EnsDb} object, as this ensures that the
-  returned column names are valid for the database schema. The optional
-  argument \code{with.tables} should on the other hand only be used
-  rarely as it is more intended for internal use.
-
-  Note that the database column \code{"entrezid"} queried for
-  \code{EntrezidFilter} classes can contain multiple, \code{";"}
-  separated, Entrezgene IDs, thus, using this filter at present might
-  not return all entries from the database. Also, the database does not
-  provide a column with the official gene symbols and a
-  \code{SymbolFilter} queries the gene names instead.
-}
-\author{
-  Johannes Rainer
-}
-\seealso{
-  \code{\link{genes}}, \code{\link{transcripts}}, \code{\link{exons}},
-  \code{\link{listGenebiotypes}}, \code{\link{listTxbiotypes}}
-}
-\examples{
-
-## create a filter that could be used to retrieve all informations for
-## the respective gene.
-Gif <- GeneidFilter("ENSG00000012817")
-Gif
-## returns the where condition of the SQL querys
-where(Gif)
-
-## create a filter for a chromosomal end position of a gene
-Sef <- SeqendFilter(10000, condition=">", "gene")
-Sef
-
-## for additional examples see the help page of "genes"
-
-
-## Example for GRangesFilter:
-## retrieve all genes overlapping the specified region
-grf <- GRangesFilter(GRanges("11", ranges=IRanges(114000000, 114000050),
-                             strand="+"), condition="overlapping")
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-genes(edb, filter=grf)
-
-## Get also all transcripts overlapping that region
-transcripts(edb, filter=grf)
-
-## Retrieve all transcripts for the above gene
-gn <- genes(edb, filter=grf)
-txs <- transcripts(edb, filter=GenenameFilter(gn$gene_name))
-## Next we simply plot their start and end coordinates.
-plot(3, 3, pch=NA, xlim=c(start(gn), end(gn)), ylim=c(0, length(txs)), yaxt="n", ylab="")
-## Highlight the GRangesFilter region
-rect(xleft=start(grf), xright=end(grf), ybottom=0, ytop=length(txs), col="red", border="red")
-for(i in 1:length(txs)){
-    current <- txs[i]
-    rect(xleft=start(current), xright=end(current), ybottom=i-0.975, ytop=i-0.125, border="grey")
-    text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
-}
-## Thus, we can see that only 4 transcripts of that gene are indeed overlapping the region.
-
-
-## No exon is overlapping that region, thus we're not getting anything
-exons(edb, filter=grf)
-
-
-## Example for ExonrankFilter
-## Extract all exons 1 and (if present) 2 for all genes encoded on the
-## Y chromosome
-exons(edb, columns=c("tx_id", "exon_idx"),
-      filter=list(SeqnameFilter("Y"),
-                  ExonrankFilter(3, condition="<")))
-
-
-## Get all transcripts for the gene SKA2
-transcripts(edb, filter=GenenameFilter("SKA2"))
-
-## Which is the same as using a SymbolFilter
-transcripts(edb, filter=SymbolFilter("SKA2"))
-
-
-}
-\keyword{classes}
-
diff --git a/man/ProteinFunctionality.Rd b/man/ProteinFunctionality.Rd
new file mode 100644
index 0000000..03a7ed8
--- /dev/null
+++ b/man/ProteinFunctionality.Rd
@@ -0,0 +1,115 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/functions-utils.R, R/Methods.R
+\docType{methods}
+\name{listProteinColumns}
+\alias{listProteinColumns}
+\alias{proteins,EnsDb-method}
+\alias{proteins}
+\alias{listUniprotDbs,EnsDb-method}
+\alias{listUniprotDbs}
+\alias{listUniprotMappingTypes,EnsDb-method}
+\alias{listUniprotMappingTypes}
+\title{Protein related functionality}
+\usage{
+listProteinColumns(object)
+
+\S4method{proteins}{EnsDb}(object, columns = listColumns(object, "protein"),
+  filter = AnnotationFilterList(), order.by = "", order.type = "asc",
+  return.type = "DataFrame")
+
+\S4method{listUniprotDbs}{EnsDb}(object)
+
+\S4method{listUniprotMappingTypes}{EnsDb}(object)
+}
+\arguments{
+\item{object}{The \code{\linkS4class{EnsDb}} object.}
+
+\item{columns}{For \code{proteins}: character vector defining the columns to
+be extracted from the database. Can be any column(s) listed by the
+\code{\link{listColumns}} method.}
+
+\item{filter}{For \code{proteins}: A filter object extending
+\code{AnnotationFilter} or a list of such objects to select
+specific entries from the database. See \code{\link{Filter-classes}} for
+documentation of the available filters and use \code{\link{supportedFilters}} to
+get the full list of supported filters.}
+
+\item{order.by}{For \code{proteins}: a character vector specifying the
+column(s) by which the result should be ordered.}
+
+\item{order.type}{For \code{proteins}: whether the results should be ordered
+ascending (\code{order.type = "asc"}) or descending
+(\code{order.type = "desc"}).}
+
+\item{return.type}{For \code{proteins}: character of length one specifying
+the type of the returned object. Can be either \code{"DataFrame"},
+\code{"data.frame"} or \code{"AAStringSet"}.}
+}
+\value{
+The \code{listProteinColumns} function returns a character vector
+with the column names containing protein annotations or throws an error
+if no such annotations are available.
+
+The \code{proteins} method returns protein related annotations from
+an \code{\linkS4class{EnsDb}} object with its \code{return.type} argument
+allowing to define the type of the returned object. Note that if
+\code{return.type = "AAStringSet"} additional annotation columns are stored
+in a \code{DataFrame} that can be accessed with the \code{mcols} method on
+the returned object.
+}
+\description{
+The \code{listProteinColumns} function conveniently extracts the names of
+all database columns containing protein annotations from
+an \code{\linkS4class{EnsDb}} database.
+
+This help page provides information about most of the
+functionality related to protein annotations in \code{ensembldb}.
+
+The \code{proteins} method retrieves protein related annotations from
+an \code{\linkS4class{EnsDb}} database.
+
+The \code{listUniprotDbs} method lists all Uniprot database
+names in the \code{EnsDb}.
+
+The \code{listUniprotMappingTypes} method lists all methods
+that were used for the mapping of Uniprot IDs to Ensembl protein IDs.
+}
+\details{
+The \code{proteins} method performs the query starting from the
+\code{protein} tables and can hence return all annotations from the database
+that are related to proteins and to the transcripts encoding them.
+Since \code{proteins} thus only queries annotations for
+protein coding transcripts, the \code{\link{genes}} or
+\code{\link{transcripts}} methods have to be used to retrieve annotations
+for non-coding transcripts.
+}
+\examples{
+
+## List all columns containing protein annotations
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+if (hasProteinData(edb))
+    listProteinColumns(edb)
+library(ensembldb)
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+## Get all proteins from the database for the gene ZBTB16, if protein
+## annotations are available
+if (hasProteinData(edb))
+    proteins(edb, filter = GenenameFilter("ZBTB16"))
+
+## List the names of all Uniprot databases from which Uniprot IDs are
+## available in the EnsDb
+if (hasProteinData(edb))
+    listUniprotDbs(edb)
+
+
+## List all methods that were used to map Uniprot IDs to Ensembl
+## protein IDs
+if (hasProteinData(edb))
+    listUniprotMappingTypes(edb)
+
+}
+\author{
+Johannes Rainer
+}
diff --git a/man/SeqendFilter.Rd b/man/SeqendFilter.Rd
deleted file mode 100644
index 3f602d2..0000000
--- a/man/SeqendFilter.Rd
+++ /dev/null
@@ -1,237 +0,0 @@
-\name{SeqendFilter}
-\alias{EntrezidFilter}
-\alias{GeneidFilter}
-\alias{GenenameFilter}
-\alias{GenebiotypeFilter}
-\alias{TxidFilter}
-\alias{TxbiotypeFilter}
-\alias{ExonidFilter}
-\alias{ExonrankFilter}
-\alias{SeqnameFilter}
-\alias{SeqstrandFilter}
-\alias{SeqstartFilter}
-\alias{SeqendFilter}
-\alias{GRangesFilter}
-\alias{SymbolFilter}
-\title{
-  Constructor functions for filter objects
-}
-\description{
-  These functions allow to create filter objects that can be used to
-  retrieve specific elements from the annotation database.
-}
-\usage{
-EntrezidFilter(value, condition = "=")
-
-GeneidFilter(value, condition = "=")
-
-GenenameFilter(value, condition = "=")
-
-GenebiotypeFilter(value, condition = "=")
-
-GRangesFilter(value, condition="within", feature="gene")
-
-TxidFilter(value, condition = "=")
-
-TxbiotypeFilter(value, condition = "=")
-
-ExonidFilter(value, condition = "=")
-
-ExonrankFilter(value, condition = "=")
-
-SeqnameFilter(value, condition = "=")
-
-SeqstrandFilter(value, condition = "=")
-
-SeqstartFilter(value, condition = "=", feature = "gene")
-
-SeqendFilter(value, condition = "=", feature = "gene")
-
-SymbolFilter(value, condition = "=")
-
-}
-%- maybe also 'usage' for other objects documented here.
-\arguments{
-  \item{value}{
-    The filter value, e.g., for \code{GeneidFilter} the id of the gene
-    for which the data should be retrieved. For character values (all
-    filters except \code{SeqstartFilter} and \code{SeqendFilter}) also a
-    character vector of values is allowed. Allowed values for
-    \code{SeqstrandFilter} are: \code{"+"}, \code{"-"}, \code{"1"} or
-    \code{"-1"}.
-
-    For \code{GRangeFilter} this has to be a \code{GRanges} object.
-  }
-  \item{condition}{
-    The condition to be used in the comparison. For character values
-    \code{"="}, \code{"in"} and \code{"like"} are allowed, for numeric values
-    (\code{SeqstartFilter} and \code{SeqendFilter}) \code{"="},
-    \code{">"}, \code{">="}, \code{"<"} and \code{"<="}. Note that for
-    \code{"like"} \code{value} should be a SQL pattern
-    (e.g. \code{"ENS\%"}).
-
-    For \code{GRangesFilter}, \code{"within"} and \code{"overlapping"}
-    are allowed. See below for details.
-  }
-  \item{feature}{
-    For \code{SeqstartFilter} and \code{SeqendFilter}: the chromosomal
-    position of which features should be used in the filter (either
-    \code{"gene"}, \code{"transcript"} or \code{"exon"}).
-
-    For \code{GRangesFilter}: the submitted value is overwritten
-    internally depending on the called method, i.e. calling \code{genes}
-    will set feature to \code{"gene"}, \code{transcripts} to \code{"tx"}
-    and \code{exons} to \code{"exon"}.
-
-  }
-}
-\details{
-  \describe{
-    \item{EntrezidFilter}{
-      Filter results based on the NCBI Entrezgene ID of the genes.
-    }
-    \item{GeneidFilter}{
-      Filter results based on Ensembl gene IDs.
-    }
-    \item{GenenameFilter}{
-      Filter results based on gene names (gene symbols).
-    }
-    \item{GenebiotypeFilter}{
-      Filter results based on the biotype of the genes. For a complete
-      list of available gene biotypes use the
-      \code{\link{listGenebiotypes}} method.
-    }
-    \item{GRangesFilter}{
-      Allows to fetch features within or overlapping the specified genomic
-      region(s)/range(s). This filter takes a \code{GRanges} object as input
-      and, if \code{condition="within"} (the default) will restrict
-      results to features (genes, transcripts or exons) that are
-      completely within the region. Alternatively, by specifying
-      \code{condition="overlapping"} it will return all features that
-      are partially overlapping with the region, i.e. which start
-      coordinate is smaller than the end coordinate of the region and
-      which end coordinate is larger than the start coordinate of the
-      region. Thus, genes and transcripts that have an intron
-      overlapping the region will also be returned.
-
-      Note: if the specified \code{GRanges} object defines multiple
-      region, all features within (or overlapping) any of these regions
-      are returned.
-
-      See \code{\linkS4class{GRangesFilter}} for more details.
-    }
-    \item{TxidFilter}{
-      Filter results based on the Ensembl transcript IDs.
-    }
-    \item{TxbiotypeFilter}{
-      Filter results based on the biotype of the transcripts. For a
-      complete list of available transcript biotypes use the
-      \code{\link{listTxbiotypes}} method.
-    }
-    \item{ExonidFilter}{
-      Filter based on the Ensembl exon ID.
-    }
-    \item{ExonrankFilter}{
-      Filter results based on exon ranks (indices) of exons within
-      transcripts.
-    }
-    \item{SeqnameFilter}{
-      Filter results based on the name of the sequence the features are
-      encoded.
-    }
-    \item{SeqstrandFilter}{
-      Filter results based on the strand on which the features are encoded.
-    }
-    \item{SeqstartFilter}{
-      Filter results based on the (chromosomal) start coordinate of the
-      features (exons, genes or transcripts).
-    }
-    \item{SeqendFilter}{
-      Filter results based on the (chromosomal) end coordinates.
-    }
-    \item{SymbolFilter}{
-      Filter results based on the gene names. The database does not
-      provide an explicit \emph{symbol} column, thus this filter uses the
-      gene name instead (which in many cases corresponds to the official
-      gene name).
-    }
-  }
-}
-\value{
-  Depending on the function called an instance of:
-  \code{\linkS4class{EntrezidFilter}},
-  \code{\linkS4class{GeneidFilter}},
-  \code{\linkS4class{GenenameFilter}},
-  \code{\linkS4class{GenebiotypeFilter}},
-  \code{\linkS4class{GRangesFilter}},
-  \code{\linkS4class{TxidFilter}},
-  \code{\linkS4class{TxbiotypeFilter}},
-  \code{\linkS4class{ExonidFilter}},
-  \code{\linkS4class{ExonrankFilter}},
-  \code{\linkS4class{SeqnameFilter}},
-  \code{\linkS4class{SeqstrandFilter}},
-  \code{\linkS4class{SeqstartFilter}},
-  \code{\linkS4class{SeqendFilter}},
-  \code{\linkS4class{SymbolFilter}}
-}
-\author{
-  Johannes Rainer
-}
-\seealso{
-  \code{\linkS4class{EntrezidFilter}},
-  \code{\linkS4class{GeneidFilter}},
-  \code{\linkS4class{GenenameFilter}},
-  \code{\linkS4class{GenebiotypeFilter}},
-  \code{\linkS4class{GRangesFilter}},
-  \code{\linkS4class{TxidFilter}},
-  \code{\linkS4class{TxbiotypeFilter}},
-  \code{\linkS4class{ExonidFilter}},
-  \code{\linkS4class{ExonrankFilter}},
-  \code{\linkS4class{SeqnameFilter}},
-  \code{\linkS4class{SeqstrandFilter}},
-  \code{\linkS4class{SeqstartFilter}},
-  \code{\linkS4class{SeqendFilter}},
-  \code{\linkS4class{SymbolFilter}}
-}
-\examples{
-
-## create a filter that could be used to retrieve all informations for
-## the respective gene.
-Gif <- GeneidFilter("ENSG00000012817")
-Gif
-## returns the where condition of the SQL querys
-where(Gif)
-
-## create a filter for a chromosomal end position of a gene
-Sef <- SeqendFilter(100000, condition="<", "gene")
-Sef
-
-## To find genes within a certain chromosomal position filters should be
-## combined:
-Ssf <- SeqstartFilter(10000, condition=">", "gene")
-Snf <- SeqnameFilter("2")
-## combine the filters
-Filter <- list(Ssf, Sef, Snf)
-
-Filter
-
-## generate the where SQL call for these filters:
-where(Filter)
-
-
-## Create a GRangesFilter
-GRangesFilter(GRanges("X", IRanges(123, 5454)))
-
-## Create a GRangesFilter with multiple ranges
-grf <- GRangesFilter(GRanges(c("X", "Y"),
-                             IRanges(start=c(123, 900),
-                                     end=c(5454, 910))))
-## Evaluate the 'where' SQL condition that would be applied.
-where(grf)
-## Change the "condition" of the filter and evaluate the
-## 'where' condition again.
-condition(grf) <- "overlapping"
-where(grf)
-
-}
-\keyword{data}
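The `within`/`overlapping` semantics described in the removed `SeqendFilter.Rd` page can be illustrated with a small base-R sketch. It is not ensembldb code: `range_match()` is a hypothetical helper and the feature coordinates are made up. It uses inclusive comparisons because the SQL expected by the new tests is inclusive (for example `tx_seq_start<=234 and tx_seq_end>=123`); those tests build the overlapping variant with `type = "any"`.

```r
## Hypothetical helper (not part of ensembldb): decide, for features on the
## same sequence, whether each lies completely within a region or merely
## overlaps it. Comparisons are inclusive, as in the SQL the tests expect.
range_match <- function(start, end, region_start, region_end,
                        condition = c("within", "overlapping")) {
    condition <- match.arg(condition)
    if (condition == "within") {
        ## the whole feature has to fall inside the region
        start >= region_start & end <= region_end
    } else {
        ## the feature starts before the region ends and ends after the
        ## region starts; a gene whose intron spans the region also matches
        start <= region_end & end >= region_start
    }
}

## Three made-up features checked against the region 123-5454
starts <- c(200, 100, 6000)
ends   <- c(300, 500, 7000)
range_match(starts, ends, 123, 5454, "within")       # TRUE FALSE FALSE
range_match(starts, ends, 123, 5454, "overlapping")  # TRUE TRUE FALSE
```

The second feature (100-500) starts before the region, so it is excluded by `within` but returned by `overlapping`.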
diff --git a/man/hasProteinData-EnsDb-method.Rd b/man/hasProteinData-EnsDb-method.Rd
new file mode 100644
index 0000000..b728dc7
--- /dev/null
+++ b/man/hasProteinData-EnsDb-method.Rd
@@ -0,0 +1,32 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/Methods.R
+\docType{methods}
+\name{hasProteinData,EnsDb-method}
+\alias{hasProteinData,EnsDb-method}
+\alias{hasProteinData}
+\title{Determine whether protein data is available in the database}
+\usage{
+\S4method{hasProteinData}{EnsDb}(x)
+}
+\arguments{
+\item{x}{The \code{\linkS4class{EnsDb}} object.}
+}
+\value{
+A logical of length one, \code{TRUE} if protein annotations are
+available and \code{FALSE} otherwise.
+}
+\description{
+Determines whether the \code{\linkS4class{EnsDb}}
+provides protein annotation data.
+}
+\examples{
+library(EnsDb.Hsapiens.v75)
+## Does this database/package have protein annotations?
+hasProteinData(EnsDb.Hsapiens.v75)
+}
+\seealso{
+\code{\link{listTables}}
+}
+\author{
+Johannes Rainer
+}
diff --git a/man/listEnsDbs.Rd b/man/listEnsDbs.Rd
index b0258ad..f2bfaab 100644
--- a/man/listEnsDbs.Rd
+++ b/man/listEnsDbs.Rd
@@ -44,10 +44,9 @@ dbcon <- dbConnect(MySQL(), host = "localhost", user = my_user, pass = my_pass)
 listEnsDbs(dbcon)
 }
 }
-\author{
-Johannes Rainer
-}
 \seealso{
 \code{\link{useMySQL}}
 }
-
+\author{
+Johannes Rainer
+}
diff --git a/man/makeEnsemblDbPackage.Rd b/man/makeEnsemblDbPackage.Rd
index 523ea58..ba45d37 100644
--- a/man/makeEnsemblDbPackage.Rd
+++ b/man/makeEnsemblDbPackage.Rd
@@ -29,13 +29,13 @@
 ensDbFromAH(ah, outfile, path, organism, genomeVersion, version)
 
 ensDbFromGRanges(x, outfile, path, organism, genomeVersion,
-                 version)
+                 version, ...)
 
 ensDbFromGff(gff, outfile, path, organism, genomeVersion,
-             version)
+             version, ...)
 
 ensDbFromGtf(gtf, outfile, path, organism, genomeVersion,
-             version)
+             version, ...)
 
 fetchTablesFromEnsembl(version, ensemblapi, user="anonymous",
                        host="ensembldb.ensembl.org", pass="",
@@ -153,6 +153,10 @@ makeEnsembldbPackage(ensdb, version, maintainer, author,
     For \code{ensDbFromGRanges}: the \code{GRanges} object.
   }
 
+  \item{...}{
+    Currently not used.
+  }
+
 }
 \section{Functions}{
   \describe{
diff --git a/man/useMySQL-EnsDb-method.Rd b/man/useMySQL-EnsDb-method.Rd
index 774bb1f..977dbe0 100644
--- a/man/useMySQL-EnsDb-method.Rd
+++ b/man/useMySQL-EnsDb-method.Rd
@@ -2,8 +2,8 @@
 % Please edit documentation in R/Methods.R
 \docType{methods}
 \name{useMySQL,EnsDb-method}
-\alias{useMySQL}
 \alias{useMySQL,EnsDb-method}
+\alias{useMySQL}
 \title{Use a MySQL backend}
 \usage{
 \S4method{useMySQL}{EnsDb}(x, host = "localhost", port = 3306, user, pass)
@@ -53,4 +53,3 @@ edb_mysql <- useMySQL(edb, host = "localhost", user = my_user, pass = my_pass)
 \author{
 Johannes Rainer
 }
-
diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..cf4a600
--- /dev/null
+++ b/readme.md
@@ -0,0 +1,16 @@
+
+<p align = "center"><img src="https://github.com/jotsetung/BioC-stickers/blob/master/ensembldb/ensembldb.png" height="100"></p>
+
+[![Years in Bioconductor](http://www.bioconductor.org/shields/years-in-bioc/ensembldb.svg)](http://www.bioconductor.org/packages/release/bioc/html/ensembldb.html)
+[![Bioconductor release build status](http://www.bioconductor.org/shields/build/release/bioc/ensembldb.svg)](http://www.bioconductor.org/packages/release/bioc/html/ensembldb.html)
+[![Bioconductor devel build status](http://www.bioconductor.org/shields/build/devel/bioc/ensembldb.svg)](http://www.bioconductor.org/checkResults/devel/bioc-LATEST/ensembldb)
+[![Travis build result](https://travis-ci.org/jotsetung/ensembldb.svg?branch=master)](https://travis-ci.org/jotsetung/ensembldb)
+[![codecov result](https://codecov.io/github/jotsetung/ensembldb/coverage.svg?branch=master)](https://codecov.io/github/jotsetung/ensembldb?branch=master)
+
+
+# `ensembldb`: build and use Ensembl-based annotation packages
+
+For more information please refer to
+the [vignettes/ensembldb.org](vignettes/ensembldb.org) file.
+
+
diff --git a/tests/runTests.R b/tests/runTests.R
deleted file mode 100644
index 785dbbe..0000000
--- a/tests/runTests.R
+++ /dev/null
@@ -1 +0,0 @@
-BiocGenerics:::testPackage("ensembldb")
diff --git a/tests/testthat.R b/tests/testthat.R
new file mode 100644
index 0000000..9c07619
--- /dev/null
+++ b/tests/testthat.R
@@ -0,0 +1,6 @@
+library(testthat)
+library(ensembldb)
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+test_check("ensembldb")
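The `ensDbQuery` tests added in `tests/testthat/test_Methods-Filter.R` expect single quotes inside character values to be doubled (standard SQL string escaping), several values to become an `in`/`not in` list, and strand values such as `"+"` or `"-1"` to be stored as `1`/`-1`. A minimal base-R sketch of those conventions follows; `sql_quote()`, `char_condition()` and `strand_to_int()` are hypothetical helpers, not ensembldb internals.

```r
## Hypothetical helpers (not ensembldb internals).
sql_quote <- function(x) {
    ## SQL escapes a single quote inside a string literal by doubling it
    paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

char_condition <- function(column, value, condition = c("==", "!=")) {
    condition <- match.arg(condition)
    if (length(value) == 1) {
        ## a single value becomes a plain comparison
        op <- if (condition == "==") "=" else "!="
        paste(column, op, sql_quote(value))
    } else {
        ## several values become an (not) in list
        op <- if (condition == "==") "in" else "not in"
        paste0(column, " ", op, " (",
               paste(sql_quote(value), collapse = ","), ")")
    }
}

strand_to_int <- function(x) {
    ## accepts "+", "+1", "1", "-" and "-1"
    if (x %in% c("+", "+1", "1")) 1L
    else if (x %in% c("-", "-1")) -1L
    else stop("invalid strand: ", x)
}

char_condition("gene_name", "I'm a gene")          # "gene_name = 'I''m a gene'"
char_condition("gene_name", c("a", "x"), "!=")     # "gene_name not in ('a','x')"
paste("seq_strand =", strand_to_int("-1"))         # "seq_strand = -1"
```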
diff --git a/tests/testthat/test_Classes.R b/tests/testthat/test_Classes.R
new file mode 100644
index 0000000..9c76303
--- /dev/null
+++ b/tests/testthat/test_Classes.R
@@ -0,0 +1,85 @@
+test_that("OnlyCodingTxFilter constructor works", {
+    fl <- OnlyCodingTxFilter()
+    expect_true(is(fl, "OnlyCodingTxFilter"))
+})
+
+test_that("ProtDomIdFilter constructor works", {
+    fl <- ProtDomIdFilter("a")
+    expect_true(is(fl, "ProtDomIdFilter"))
+    fl <- AnnotationFilter(~ prot_dom_id %in% 1:4)
+    expect_true(is(fl, "ProtDomIdFilter"))
+    expect_equal(value(fl), c("1", "2", "3", "4"))
+    expect_equal(condition(fl), "==")
+})
+
+test_that("UniprotDbFilter constructor works", {
+    fl <- UniprotDbFilter("a")
+    expect_true(is(fl, "UniprotDbFilter"))
+    fl <- AnnotationFilter(~ uniprot_db != "4")
+    expect_true(is(fl, "UniprotDbFilter"))
+    expect_equal(value(fl), "4")
+    expect_equal(condition(fl), "!=")
+})
+
+test_that("UniprotMappingTypeFilter constructor works", {
+    fl <- UniprotMappingTypeFilter("a")
+    expect_true(is(fl, "UniprotMappingTypeFilter"))
+    fl <- AnnotationFilter(~ uniprot_mapping_type != "4")
+    expect_true(is(fl, "UniprotMappingTypeFilter"))
+    expect_equal(value(fl), "4")
+    expect_equal(condition(fl), "!=")
+})
+
+test_that("GRangesFilter works for EnsDb", {
+    ## Testing slots
+    gr <- GRanges("X", ranges = IRanges(123, 234), strand = "-")
+    grf <- GRangesFilter(gr, type = "within")
+    ## Now check some stuff
+    expect_equal(start(grf), start(gr))
+    expect_equal(end(grf), end(gr))
+    expect_equal(as.character(strand(gr)), strand(grf))
+    expect_equal(as.character(seqnames(gr)), seqnames(grf))
+
+    ## Test column:
+    ## filter alone.
+    exp <- c(start = "gene_seq_start", end = "gene_seq_end",
+             seqname = "seq_name", strand = "seq_strand")
+    expect_equal(ensembldb:::ensDbColumn(grf), exp)
+    grf@feature <- "tx"
+    exp <- c(start = "tx_seq_start", end = "tx_seq_end",
+             seqname = "seq_name", strand = "seq_strand")
+    expect_equal(ensembldb:::ensDbColumn(grf), exp)
+    grf@feature <- "exon"
+    exp <- c(start = "exon_seq_start", end = "exon_seq_end",
+             seqname = "seq_name", strand = "seq_strand")
+    expect_equal(ensembldb:::ensDbColumn(grf), exp)
+    ## filter and ensdb.
+    exp <- c(start = "exon.exon_seq_start", end = "exon.exon_seq_end",
+             seqname = "gene.seq_name", strand = "gene.seq_strand")
+    expect_equal(ensembldb:::ensDbColumn(grf, edb), exp)
+    grf@feature <- "tx"
+    exp <- c(start = "tx.tx_seq_start", end = "tx.tx_seq_end",
+             seqname = "gene.seq_name", strand = "gene.seq_strand")
+    expect_equal(ensembldb:::ensDbColumn(grf, edb), exp)
+    grf@feature <- "gene"
+    exp <- c(start = "gene.gene_seq_start", end = "gene.gene_seq_end",
+             seqname = "gene.seq_name", strand = "gene.seq_strand")
+    expect_equal(ensembldb:::ensDbColumn(grf, edb), exp)
+
+    exp <- paste0("(gene_seq_start>=123 and gene_seq_end<=234 and",
+                  " seq_name='X' and seq_strand = -1)")
+    expect_equal(ensembldb:::ensDbQuery(grf), exp)
+    ## what if we set strand to *
+    grf2 <- GRangesFilter(GRanges("1", IRanges(123, 234)), type = "within")
+    exp <- paste0("(gene.gene_seq_start>=123 and gene.gene_seq_end<=234",
+                  " and gene.seq_name='1')")
+    expect_equal(ensembldb:::ensDbQuery(grf2, edb), exp)
+
+    ## Now, using overlapping.
+    grf2 <- GRangesFilter(GRanges("X", IRanges(123, 234), strand = "-"),
+                          type = "any", feature = "transcript")
+    exp <- paste0("(tx.tx_seq_start<=234 and tx.tx_seq_end>=123 and",
+                  " gene.seq_name='X' and gene.seq_strand = -1)")
+    expect_equal(ensembldb:::ensDbQuery(grf2, edb), exp)
+})
+
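The GRangesFilter test above pins down the exact SQL text that `ensDbQuery()` is expected to return. The base-R sketch below reproduces those strings for a single range; `where_for_range()` is a hypothetical helper, not the package's internal `buildWhereForGRanges()`, and it only supports strand `"+"`, `"-"` or `"*"` (no strand restriction).

```r
## Hypothetical helper (not ensembldb's buildWhereForGRanges): build the
## WHERE condition for one genomic range in the format the tests expect.
where_for_range <- function(start, end, seqname, strand = "*",
                            type = c("within", "any", "start", "end", "equal"),
                            columns = c(start = "gene_seq_start",
                                        end = "gene_seq_end",
                                        seqname = "seq_name",
                                        strand = "seq_strand")) {
    type <- match.arg(type)
    stopifnot(strand %in% c("*", "+", "-"))
    ## avoid scientific notation, e.g. 100000 would otherwise print as 1e+05
    num <- function(x) format(x, scientific = FALSE, trim = TRUE)
    s <- columns[["start"]]
    e <- columns[["end"]]
    pos <- switch(type,
                  within = c(paste0(s, ">=", num(start)), paste0(e, "<=", num(end))),
                  any    = c(paste0(s, "<=", num(end)), paste0(e, ">=", num(start))),
                  start  = paste0(s, "=", num(start)),
                  end    = paste0(e, "=", num(end)),
                  equal  = c(paste0(s, "=", num(start)), paste0(e, "=", num(end))))
    parts <- c(pos, paste0(columns[["seqname"]], "='", seqname, "'"))
    ## "*" adds no strand condition; "+"/"-" are stored as 1/-1
    if (strand != "*")
        parts <- c(parts, paste0(columns[["strand"]], " = ",
                                 if (strand == "+") 1 else -1))
    paste0("(", paste(parts, collapse = " and "), ")")
}

where_for_range(123, 234, "X", "-", "within")
## "(gene_seq_start>=123 and gene_seq_end<=234 and seq_name='X' and seq_strand = -1)"
```

Passing table-prefixed column names (e.g. `tx.tx_seq_start`, `gene.seq_name`) gives the variants the tests expect when an `EnsDb` is supplied.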
diff --git a/tests/testthat/test_Methods-Filter.R b/tests/testthat/test_Methods-Filter.R
new file mode 100644
index 0000000..5ffb12a
--- /dev/null
+++ b/tests/testthat/test_Methods-Filter.R
@@ -0,0 +1,515 @@
+
+test_that("ensDbColumn works", {
+    smb <- SymbolFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(smb)), "gene_name")
+    expect_equal(unname(ensembldb:::ensDbColumn(smb, edb)), "gene.gene_name")
+    expect_error(unname(ensembldb:::ensDbColumn(smb, edb, "tx")))
+    expect_equal(unname(ensembldb:::ensDbColumn(smb, edb, "gene")),
+                 "gene.gene_name")
+    ##
+    fl <- OnlyCodingTxFilter()
+    expect_equal(ensembldb:::ensDbColumn(fl), "tx.tx_cds_seq_start")
+    ## gene filters:
+    fl <- GeneIdFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "gene_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "gene.gene_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb, "tx")), "tx.gene_id")
+    fl <- GenenameFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "gene_name")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "gene.gene_name")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "tx")))
+    fl <- GeneStartFilter(123)
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "gene_seq_start")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "gene.gene_seq_start")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "tx")))
+    fl <- GeneEndFilter(123)
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "gene_seq_end")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "gene.gene_seq_end")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "tx")))
+    fl <- EntrezFilter(123)
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "entrezid")
+    if (as.numeric(ensembldb:::dbSchemaVersion(edb)) > 1) {
+        expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)),
+                     "entrezgene.entrezid")
+    } else {
+        expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "gene.entrezid")
+    }
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "tx")))
+    fl <- SeqNameFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "seq_name")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "gene.seq_name")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "tx")))
+    fl <- SeqStrandFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "seq_strand")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "gene.seq_strand")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "tx")))
+    ## tx filters:
+    fl <- TxIdFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "tx_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "tx.tx_id")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb, "protein")),
+                 "protein.tx_id")
+    fl <- TxNameFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "tx_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "tx.tx_id")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- TxBiotypeFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "tx_biotype")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "tx.tx_biotype")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- TxStartFilter(123)
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "tx_seq_start")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "tx.tx_seq_start")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- TxEndFilter(123)
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "tx_seq_end")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "tx.tx_seq_end")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    ## exon filters:
+    fl <- ExonIdFilter(123)
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "exon_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "tx2exon.exon_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb, "tx2exon")),
+                 "tx2exon.exon_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb, "exon")),
+                 "exon.exon_id")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- ExonEndFilter(123)
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "exon_seq_end")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "exon.exon_seq_end")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- ExonStartFilter(123)
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "exon_seq_start")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "exon.exon_seq_start")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- ExonRankFilter(123)
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "exon_idx")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "tx2exon.exon_idx")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    ## protein filters:
+    fl <- ProteinIdFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "protein_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "protein.protein_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb, "uniprot")),
+                 "uniprot.protein_id")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- UniprotFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "uniprot_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)), "uniprot.uniprot_id")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- ProtDomIdFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "protein_domain_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)),
+                 "protein_domain.protein_domain_id")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- UniprotDbFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "uniprot_db")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)),
+                 "uniprot.uniprot_db")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+    fl <- UniprotMappingTypeFilter("a")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl)), "uniprot_mapping_type")
+    expect_equal(unname(ensembldb:::ensDbColumn(fl, edb)),
+                 "uniprot.uniprot_mapping_type")
+    expect_error(unname(ensembldb:::ensDbColumn(fl, edb, "gene")))
+})
+
+test_that("ensDbQuery works", {
+    smb <- SymbolFilter("I'm a gene")
+    expect_equal(ensembldb:::ensDbQuery(smb), "gene_name = 'I''m a gene'")
+    expect_equal(ensembldb:::ensDbQuery(smb, edb), "gene.gene_name = 'I''m a gene'")
+    expect_equal(ensembldb:::ensDbQuery(smb, edb, c("gene", "tx")),
+                 "gene.gene_name = 'I''m a gene'")
+    expect_error(ensembldb:::ensDbQuery(smb, edb, "tx"))
+    smb <- SymbolFilter(c("a", "x"), condition = "!=")
+    expect_equal(ensembldb:::ensDbQuery(smb), "gene_name not in ('a','x')")
+    smb <- SymbolFilter(c("a", "x"), condition = "!=")
+    expect_equal(ensembldb:::ensDbQuery(smb, edb),
+                 "gene.gene_name not in ('a','x')")
+    ## gene_id with tx table.
+    fl <- GeneIdFilter("a")
+    expect_equal(ensembldb:::ensDbQuery(fl), "gene_id = 'a'")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb), "gene.gene_id = 'a'")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb, "tx"), "tx.gene_id = 'a'")
+    ## numeric filter(s)
+    fl <- ExonRankFilter(21)
+    expect_equal(ensembldb:::ensDbQuery(fl), "exon_idx = 21")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb), "tx2exon.exon_idx = 21")
+    ##
+    fl <- OnlyCodingTxFilter()
+    expect_equal(ensembldb:::ensDbQuery(fl), "tx.tx_cds_seq_start is not null")
+    ## AnnotationFilterList:
+    fL <- AnnotationFilterList(GeneIdFilter("a"),
+                               TxBiotypeFilter("coding", condition = "!="),
+                               GeneStartFilter(123, condition = "<"))
+    res <- ensembldb:::ensDbQuery(fL)
+    expect_equal(res, paste0("(gene_id = 'a' and tx_biotype != 'coding' ",
+                             "and gene_seq_start < 123)"))
+    res <- ensembldb:::ensDbQuery(fL, edb)
+    expect_equal(res, paste0("(gene.gene_id = 'a' and tx.tx_biotype != ",
+                             "'coding' and gene.gene_seq_start < 123)"))
+    fL <- AnnotationFilterList(GeneIdFilter("a"),
+                               TxBiotypeFilter("coding", condition = "!="),
+                               TxStartFilter(123, condition = "<"))
+    res <- ensembldb:::ensDbQuery(fL, edb)
+    expect_equal(res, paste0("(gene.gene_id = 'a' and tx.tx_biotype != ",
+                             "'coding' and tx.tx_seq_start < 123)"))
+    res <- ensembldb:::ensDbQuery(fL, edb, c("tx", "gene", "exon"))
+    expect_equal(res, paste0("(tx.gene_id = 'a' and tx.tx_biotype != ",
+                             "'coding' and tx.tx_seq_start < 123)"))
+    fl <- ProteinIdFilter("a")
+    expect_equal(unname(ensembldb:::ensDbQuery(fl)), "protein_id = 'a'")
+    expect_equal(unname(ensembldb:::ensDbQuery(fl, edb)),
+                 "protein.protein_id = 'a'")
+    expect_equal(unname(ensembldb:::ensDbQuery(fl, edb, "uniprot")),
+                 "uniprot.protein_id = 'a'")
+    fl <- ProtDomIdFilter("a")
+    expect_equal(ensembldb:::ensDbQuery(fl), "protein_domain_id = 'a'")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb),
+                 "protein_domain.protein_domain_id = 'a'")
+    fl <- UniprotDbFilter("a")
+    expect_equal(ensembldb:::ensDbQuery(fl), "uniprot_db = 'a'")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb),
+                 "uniprot.uniprot_db = 'a'")
+    fl <- UniprotMappingTypeFilter("a")
+    expect_equal(ensembldb:::ensDbQuery(fl), "uniprot_mapping_type = 'a'")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb),
+                 "uniprot.uniprot_mapping_type = 'a'")
+    ## Seq name and seq strand.
+    fl <- SeqNameFilter("3")
+    expect_equal(ensembldb:::ensDbQuery(fl), "seq_name = '3'")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb), "gene.seq_name = '3'")
+    fl <- SeqNameFilter("chr3")
+    expect_equal(ensembldb:::ensDbQuery(fl), "seq_name = 'chr3'")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb), "gene.seq_name = 'chr3'")
+    seqlevelsStyle(edb) <- "UCSC"
+    expect_equal(ensembldb:::ensDbQuery(fl), "seq_name = 'chr3'")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb), "gene.seq_name = '3'")
+    seqlevelsStyle(edb) <- "Ensembl"
+    fl <- SeqStrandFilter("+")
+    expect_equal(ensembldb:::ensDbQuery(fl), "seq_strand = 1")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb), "gene.seq_strand = 1")
+    fl <- SeqStrandFilter("+1")
+    expect_equal(ensembldb:::ensDbQuery(fl), "seq_strand = 1")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb), "gene.seq_strand = 1")
+    fl <- SeqStrandFilter("-")
+    expect_equal(ensembldb:::ensDbQuery(fl), "seq_strand = -1")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb), "gene.seq_strand = -1")
+    fl <- SeqStrandFilter("-1")
+    expect_equal(ensembldb:::ensDbQuery(fl), "seq_strand = -1")
+    expect_equal(ensembldb:::ensDbQuery(fl, edb), "gene.seq_strand = -1")
+    ## GRangesFilter: see test "ensDbQuery works for GRangesFilter"
+})
+
+test_that("ensDbQuery works for AnnotationFilterList", {
+    gnf <- GenenameFilter("BCL2", condition = "!=")
+    snf <- SeqNameFilter(4)
+    ssf <- SeqStrandFilter("+")
+    afl <- AnnotationFilterList(gnf, snf, ssf, logOp = c("|", "&"))
+    Q <- ensembldb:::ensDbQuery(afl)
+    expect_equal(Q, "(gene_name != 'BCL2' or seq_name = '4' and seq_strand = 1)")
+
+    ## Nested AnnotationFilterLists.
+    afl1 <- AnnotationFilterList(GenenameFilter("BCL2"),
+                                 GenenameFilter("BCL2L11"), logOp = "|")
+    afl2 <- AnnotationFilterList(afl1, SeqNameFilter(18))
+    Q <- ensembldb:::ensDbQuery(afl2, db = edb)
+    expect_equal(Q, paste0("((gene.gene_name = 'BCL2' or gene.gene_name = ",
+                           "'BCL2L11') and gene.seq_name = '18')"))
+    library(RSQLite)
+    res <- dbGetQuery(dbconn(edb), paste0("select distinct gene_name from gene",
+                                          " where ", Q))
+    expect_equal(res$gene_name, "BCL2")
+    res2 <- genes(edb,
+                  filter = AnnotationFilterList(GenenameFilter(c("BCL2L11",
+                                                                 "BCL2")),
+                                                SeqNameFilter(18)))
+    expect_equal(res$gene_name, res2$gene_name)
+    ## Same with a GRangesFilter.
+    grf <- GRangesFilter(GRanges(18, IRanges(60790600, 60790700)))
+    afl2 <- AnnotationFilterList(afl1, grf)
+    Q <- ensembldb:::ensDbQuery(afl2, db = edb)
+    expect_equal(Q, paste0("((gene.gene_name = 'BCL2' or gene.gene_name = ",
+                           "'BCL2L11') and (gene.gene_seq_start<=60790700",
+                           " and gene.gene_seq_end>=60790600 and gene.seq_name",
+                           "='18'))"))
+    res <- dbGetQuery(dbconn(edb), paste0("select distinct gene_name from gene",
+                                          " where ", Q))
+    expect_equal(res$gene_name, "BCL2")
+})
+
+test_that("ensDbQuery works for SeqNameFilter", {
+    fl <- SeqNameFilter("3")
+    res <- ensembldb:::ensDbQuery(fl)
+    expect_equal(res, "seq_name = '3'")
+    res <- ensembldb:::ensDbQuery(fl, edb)
+    expect_equal(res, "gene.seq_name = '3'")
+    fl <- SeqNameFilter("chr3")
+    res <- ensembldb:::ensDbQuery(fl)
+    expect_equal(res, "seq_name = 'chr3'")
+    res <- ensembldb:::ensDbQuery(fl, edb)
+    expect_equal(res, "gene.seq_name = 'chr3'")
+    seqlevelsStyle(edb) <- "UCSC"
+    res <- ensembldb:::ensDbQuery(fl)
+    expect_equal(res, "seq_name = 'chr3'")
+    res <- ensembldb:::ensDbQuery(fl, edb)
+    expect_equal(res, "gene.seq_name = '3'")
+    seqlevelsStyle(edb) <- "Ensembl"
+})
+
+test_that("ensDbQuery works for GRangesFilter", {
+    gr <- GRanges(seqnames = "a",
+                  ranges = IRanges(start = 1, end = 5))
+    F <- GRangesFilter(value = gr, type = "within")
+    expect_equal(unname(ensembldb:::ensDbColumn(F)), c("gene_seq_start",
+                                                       "gene_seq_end",
+                                                       "seq_name",
+                                                       "seq_strand"))
+    expect_equal(unname(ensembldb:::ensDbColumn(F, edb)),
+                 c("gene.gene_seq_start", "gene.gene_seq_end",
+                   "gene.seq_name", "gene.seq_strand"))
+
+    expect_equal(ensembldb:::ensDbQuery(F),
+                 paste0("(gene_seq_start>=1 and gene_seq_end",
+                        "<=5 and seq_name='a')"))
+    expect_equal(ensembldb:::ensDbQuery(F, edb),
+                 paste0("(gene.gene_seq_start>=1 and gene.gene_seq_end",
+                        "<=5 and gene.seq_name='a')"))
+    F <- GRangesFilter(value = gr, type = "any")
+    expect_equal(ensembldb:::ensDbQuery(F),
+                 paste0("(gene_seq_start<=5 and gene_seq_end",
+                        ">=1 and seq_name='a')"))
+    expect_equal(ensembldb:::ensDbQuery(F, edb),
+                 paste0("(gene.gene_seq_start<=5 and gene.gene_seq_end",
+                        ">=1 and gene.seq_name='a')"))
+
+    ## tx
+    F <- GRangesFilter(value = gr, feature = "tx", type = "within")
+    expect_equal(unname(ensembldb:::ensDbColumn(F)), c("tx_seq_start",
+                                                       "tx_seq_end",
+                                                       "seq_name",
+                                                       "seq_strand"))
+    expect_equal(unname(ensembldb:::ensDbColumn(F, edb)), c("tx.tx_seq_start",
+                                                            "tx.tx_seq_end",
+                                                            "gene.seq_name",
+                                                            "gene.seq_strand"))
+    ## exon
+    F <- GRangesFilter(value = gr, feature = "exon", type = "within")
+    expect_equal(unname(ensembldb:::ensDbColumn(F)), c("exon_seq_start",
+                                                       "exon_seq_end",
+                                                       "seq_name",
+                                                       "seq_strand"))
+    expect_equal(unname(ensembldb:::ensDbColumn(F, edb)), c("exon.exon_seq_start",
+                                                            "exon.exon_seq_end",
+                                                            "gene.seq_name",
+                                                            "gene.seq_strand"))
+    ## Check the buildWhere
+    res <- ensembldb:::buildWhereForGRanges(F, columns = ensembldb:::ensDbColumn(F))
+    expect_equal(res,"(exon_seq_start>=1 and exon_seq_end<=5 and seq_name='a')")
+    res <- ensembldb:::ensDbQuery(F)
+    expect_equal(res,"(exon_seq_start>=1 and exon_seq_end<=5 and seq_name='a')")
+    res <- ensembldb:::ensDbQuery(F, edb)
+    expect_equal(res, paste0("(exon.exon_seq_start>=1 and exon.exon_seq_end",
+                             "<=5 and gene.seq_name='a')"))
+})
+
+test_that("buildWhereForGRanges works", {
+    grng <- GRanges(seqnames = "X", IRanges(start = 10, end = 100))
+    ## start
+    flt <- GRangesFilter(grng, type = "start")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, "(gene_seq_start=10 and seq_name='X')")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt, db = edb))
+    expect_equal(res, "(gene.gene_seq_start=10 and gene.seq_name='X')")
+    ## end
+    flt <- GRangesFilter(grng, type = "end")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, "(gene_seq_end=100 and seq_name='X')")
+    ## equal
+    flt <- GRangesFilter(grng, type = "equal")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, "(gene_seq_start=10 and gene_seq_end=100 and seq_name='X')")
+    ## within
+    flt <- GRangesFilter(grng, type = "within")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, "(gene_seq_start>=10 and gene_seq_end<=100 and seq_name='X')")
+    ## any
+    flt <- GRangesFilter(grng, type = "any")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, "(gene_seq_start<=100 and gene_seq_end>=10 and seq_name='X')")
+
+    ## Same with a strand specified.
+    grng <- GRanges(seqnames = "X", IRanges(start = 10, end = 100), strand = "-")
+    ## start
+    flt <- GRangesFilter(grng, type = "start")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, "(gene_seq_start=10 and seq_name='X' and seq_strand = -1)")
+    ## end
+    flt <- GRangesFilter(grng, type = "end")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, "(gene_seq_end=100 and seq_name='X' and seq_strand = -1)")
+    ## equal
+    flt <- GRangesFilter(grng, type = "equal")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, paste0("(gene_seq_start=10 and gene_seq_end=100 and ",
+                            "seq_name='X' and seq_strand = -1)"))
+    ## within
+    flt <- GRangesFilter(grng, type = "within")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, paste0("(gene_seq_start>=10 and gene_seq_end<=100 and ",
+                            "seq_name='X' and seq_strand = -1)"))
+    ## any
+    flt <- GRangesFilter(grng, type = "any")
+    res <- ensembldb:::buildWhereForGRanges(
+                           flt, columns = ensembldb:::ensDbColumn(flt))
+    expect_equal(res, paste0("(gene_seq_start<=100 and gene_seq_end>=10 and ",
+                            "seq_name='X' and seq_strand = -1)"))
+})
+
+test_that("ensDbColumn works with AnnotationFilterList", {
+    afl <- AnnotationFilterList(GeneIdFilter(123), SeqNameFilter(3))
+    afl2 <- AnnotationFilterList(afl, SeqNameFilter(5))
+    res <- ensembldb:::ensDbColumn(afl2)
+    expect_equal(res, c("gene_id", "seq_name"))
+})
+
+############################################################
+## Using protein data based filters.
+test_that("ProteinIdFilter works", {
+    pf <- ProteinIdFilter("ABC")
+    expect_equal(value(pf), "ABC")
+    expect_equal(field(pf), "protein_id")
+    expect_equal(ensembldb:::ensDbQuery(pf), "protein_id = 'ABC'")
+    if (hasProteinData(edb)) {
+        expect_equal(ensembldb:::ensDbColumn(pf, edb), "protein.protein_id")
+        expect_equal(ensembldb:::ensDbColumn(pf, edb,
+                                             with.tables = "protein_domain"),
+                    "protein_domain.protein_id")
+        expect_equal(ensembldb:::ensDbColumn(pf, edb, with.tables = "uniprot"),
+                     "uniprot.protein_id")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb), "protein.protein_id = 'ABC'")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb, with.tables = "uniprot"),
+                    "uniprot.protein_id = 'ABC'")
+    } else {
+        expect_error(ensembldb:::ensDbColumn(pf, edb))
+        expect_error(ensembldb:::ensDbQuery(pf, edb))
+        expect_error(ensembldb:::ensDbColumn(pf, edb, with.tables = "uniprot"))
+        expect_error(ensembldb:::ensDbQuery(pf, edb, with.tables = "uniprot"))
+    }
+    pf <- ProteinIdFilter(c("A", "B"))
+    expect_equal(ensembldb:::ensDbQuery(pf), "protein_id in ('A','B')")
+    expect_error(ProteinIdFilter("B", condition = ">"))
+})
+
+test_that("UniprotFilter works", {
+    pf <- UniprotFilter("ABC")
+    expect_equal(value(pf), "ABC")
+    expect_equal(field(pf), "uniprot")
+    expect_equal(unname(ensembldb:::ensDbColumn(pf)), "uniprot_id")
+    expect_equal(ensembldb:::ensDbQuery(pf), "uniprot_id = 'ABC'")
+    if (hasProteinData(edb)) {
+        expect_equal(ensembldb:::ensDbColumn(pf, edb), "uniprot.uniprot_id")
+        expect_equal(ensembldb:::ensDbColumn(pf, edb, with.tables = "uniprot"),
+                    "uniprot.uniprot_id")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb), "uniprot.uniprot_id = 'ABC'")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb, with.tables = "uniprot"),
+                    "uniprot.uniprot_id = 'ABC'")
+    } else {
+        expect_error(ensembldb:::ensDbColumn(pf, edb))
+        expect_error(ensembldb:::ensDbQuery(pf, edb))
+        expect_error(ensembldb:::ensDbColumn(pf, edb, with.tables = "uniprot"))
+        expect_error(ensembldb:::ensDbQuery(pf, edb, with.tables = "uniprot"))
+    }
+    pf <- UniprotFilter(c("A", "B"))
+    expect_equal(ensembldb:::ensDbQuery(pf), "uniprot_id in ('A','B')")
+    expect_error(UniprotFilter("B", condition = ">"))
+})
+
+test_that("ProtDomIdFilter works", {
+    pf <- ProtDomIdFilter("ABC")
+    expect_equal(value(pf), "ABC")
+    expect_equal(field(pf), "prot_dom_id")
+    expect_equal(unname(ensembldb:::ensDbColumn(pf)), "protein_domain_id")
+    expect_equal(ensembldb:::ensDbQuery(pf), "protein_domain_id = 'ABC'")
+    if (hasProteinData(edb)) {
+        expect_equal(ensembldb:::ensDbColumn(pf, edb),
+                     "protein_domain.protein_domain_id")
+        expect_equal(ensembldb:::ensDbColumn(pf, edb,
+                                             with.tables = "protein_domain"),
+                    "protein_domain.protein_domain_id")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb),
+                     "protein_domain.protein_domain_id = 'ABC'")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb,
+                                            with.tables = "protein_domain"),
+                    "protein_domain.protein_domain_id = 'ABC'")
+    } else {
+        expect_error(ensembldb:::ensDbColumn(pf, edb))
+        expect_error(ensembldb:::ensDbQuery(pf, edb))
+        expect_error(ensembldb:::ensDbColumn(pf, edb,
+                                             with.tables = "protein_domain"))
+        expect_error(ensembldb:::ensDbQuery(pf, edb,
+                                            with.tables = "protein_domain"))
+    }
+    pf <- ProtDomIdFilter(c("A", "B"))
+    expect_equal(ensembldb:::ensDbQuery(pf), "protein_domain_id in ('A','B')")
+    expect_error(ProtDomIdFilter("B", condition = ">"))
+})
+
+test_that("UniprotDbFilter works", {
+    pf <- UniprotDbFilter("ABC")
+    expect_equal(value(pf), "ABC")
+    expect_equal(field(pf), "uniprot_db")
+    expect_equal(ensembldb:::ensDbQuery(pf), "uniprot_db = 'ABC'")
+    if (hasProteinData(edb)) {
+        expect_equal(ensembldb:::ensDbColumn(pf, edb), "uniprot.uniprot_db")
+        expect_equal(ensembldb:::ensDbColumn(pf, edb, with.tables = "uniprot"),
+                    "uniprot.uniprot_db")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb), "uniprot.uniprot_db = 'ABC'")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb, with.tables = "uniprot"),
+                    "uniprot.uniprot_db = 'ABC'")
+    } else {
+        expect_error(ensembldb:::ensDbColumn(pf, edb))
+        expect_error(ensembldb:::ensDbQuery(pf, edb))
+        expect_error(ensembldb:::ensDbColumn(pf, edb, with.tables = "uniprot"))
+        expect_error(ensembldb:::ensDbQuery(pf, edb, with.tables = "uniprot"))
+    }
+    pf <- UniprotDbFilter(c("A", "B"))
+    expect_equal(ensembldb:::ensDbQuery(pf), "uniprot_db in ('A','B')")
+    expect_error(UniprotDbFilter("B", condition = ">"))
+})
+
+test_that("UniprotMappingTypeFilter works", {
+    pf <- UniprotMappingTypeFilter("ABC")
+    expect_equal(value(pf), "ABC")
+    expect_equal(field(pf), "uniprot_mapping_type")
+    expect_equal(ensembldb:::ensDbQuery(pf), "uniprot_mapping_type = 'ABC'")
+    if (hasProteinData(edb)) {
+        expect_equal(ensembldb:::ensDbColumn(pf, edb),
+                     "uniprot.uniprot_mapping_type")
+        expect_equal(ensembldb:::ensDbColumn(pf, edb, with.tables = "uniprot"),
+                    "uniprot.uniprot_mapping_type")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb),
+                     "uniprot.uniprot_mapping_type = 'ABC'")
+        expect_equal(ensembldb:::ensDbQuery(pf, edb, with.tables = "uniprot"),
+                    "uniprot.uniprot_mapping_type = 'ABC'")
+    } else {
+        expect_error(ensembldb:::ensDbColumn(pf, edb))
+        expect_error(ensembldb:::ensDbQuery(pf, edb))
+        expect_error(ensembldb:::ensDbColumn(pf, edb, with.tables = "uniprot"))
+        expect_error(ensembldb:::ensDbQuery(pf, edb, with.tables = "uniprot"))
+    }
+    pf <- UniprotMappingTypeFilter(c("A", "B"))
+    expect_equal(ensembldb:::ensDbQuery(pf), "uniprot_mapping_type in ('A','B')")
+    expect_error(UniprotMappingTypeFilter("B", condition = ">"))
+})
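The GRangesFilter expectations in the test_Filter.R hunk above pin down how each range type ("start", "end", "equal", "within", "any") becomes an SQL condition on the start, end, sequence-name and strand columns. The following is a standalone base-R sketch that reproduces those strings for a single range. `grangesWhereSketch()` is a hypothetical name, not ensembldb's internal `buildWhereForGRanges()`, and encoding strand "+" as `seq_strand = 1` is an assumption, since the tests only cover "-".

```r
## Hypothetical sketch, not ensembldb's buildWhereForGRanges(): builds the
## WHERE condition for one genomic range the way the tests above expect.
## columns: start, end, seqname and strand columns (optionally table-prefixed).
grangesWhereSketch <- function(start, end, seqname, strand = "*",
                               type = c("any", "start", "end", "equal", "within"),
                               columns = c("gene_seq_start", "gene_seq_end",
                                           "seq_name", "seq_strand")) {
    type <- match.arg(type)
    ## Avoid scientific notation such as "1e+05" for large coordinates.
    num <- function(x) format(x, scientific = FALSE, trim = TRUE)
    cs <- columns[1]
    ce <- columns[2]
    coords <- switch(type,
                     start  = paste0(cs, "=", num(start)),
                     end    = paste0(ce, "=", num(end)),
                     equal  = paste0(cs, "=", num(start), " and ",
                                     ce, "=", num(end)),
                     ## feature lies completely inside the range
                     within = paste0(cs, ">=", num(start), " and ",
                                     ce, "<=", num(end)),
                     ## feature overlaps the range
                     any    = paste0(cs, "<=", num(end), " and ",
                                     ce, ">=", num(start)))
    where <- paste0(coords, " and ", columns[3], "='", seqname, "'")
    ## Only "+" or "-" adds a strand condition; "*" matches any strand.
    ## Encoding "+" as 1 is an assumption; the tests only show "-" as -1.
    if (strand %in% c("+", "-"))
        where <- paste0(where, " and ", columns[4], " = ",
                        if (strand == "+") 1 else -1)
    paste0("(", where, ")")
}

grangesWhereSketch(10, 100, "X", strand = "-", type = "within")
## "(gene_seq_start>=10 and gene_seq_end<=100 and seq_name='X' and seq_strand = -1)"
```

Passing table-prefixed names through `columns` (e.g. "gene.gene_seq_start") mirrors the difference between the `ensDbColumn(flt)` and `ensDbColumn(flt, db = edb)` expectations.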
diff --git a/inst/unitTests/test_returnCols.R b/tests/testthat/test_Methods-with-returnFilterColumns.R
similarity index 56%
rename from inst/unitTests/test_returnCols.R
rename to tests/testthat/test_Methods-with-returnFilterColumns.R
index 6b829f5..c7eadb2 100644
--- a/inst/unitTests/test_returnCols.R
+++ b/tests/testthat/test_Methods-with-returnFilterColumns.R
@@ -1,319 +1,271 @@
-############################################################
-## Here we're checking the returnFilterColumns setting, i.e.
-## whether also filter columns should be returned or not.
-library(EnsDb.Hsapiens.v75)
-edb <- EnsDb.Hsapiens.v75
-
-## Testing the internal function.
-test_set_returnFilterColumns <- function(x) {
+test_that("set returnFilterColumns works", {
     orig <- returnFilterColumns(edb)
     returnFilterColumns(edb) <- TRUE
-    checkEquals(TRUE, returnFilterColumns(edb))
+    expect_equal(TRUE, returnFilterColumns(edb))
     returnFilterColumns(edb) <- FALSE
-    checkEquals(FALSE, returnFilterColumns(edb))
-    checkException(returnFilterColumns(edb) <- "d")
-    checkException(returnFilterColumns(edb) <- c(TRUE, FALSE))
+    expect_equal(FALSE, returnFilterColumns(edb))
+    expect_error(returnFilterColumns(edb) <- "d")
+    expect_error(returnFilterColumns(edb) <- c(TRUE, FALSE))
     ## Restore the "original" setting
     returnFilterColumns(edb) <- orig
-}
+})
 
-test_with_genes <- function(x) {
+test_that("returnFilterColumns works with genes", {
     orig <- returnFilterColumns(edb)
 
     returnFilterColumns(edb) <- FALSE
     ## What happens if we use a GRangesFilter with return filter cols FALSE?
-    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)),
+                         type = "within")
     res <- genes(edb, filter = grf)
-    checkEquals(res$gene_id, c("ENSG00000224738", "ENSG00000182628", "ENSG00000252212",
-                               "ENSG00000211514", "ENSG00000207996"))
+    expect_equal(res$gene_id,
+                c("ENSG00000224738", "ENSG00000182628", "ENSG00000252212",
+                  "ENSG00000211514", "ENSG00000207996"))
     cols <- c("gene_id", "gene_name")
     res <- genes(edb, filter = grf, return.type = "data.frame",
                  columns = cols)
     ## Expect only the columns
-    checkEquals(colnames(res), cols)
+    expect_equal(colnames(res), cols)
     returnFilterColumns(edb) <- TRUE
     res <- genes(edb, filter = grf, return.type = "data.frame",
                  columns = cols)
     ## Now I expect also the gene coords.
-    checkEquals(colnames(res), c(cols, "gene_seq_start", "gene_seq_end", "seq_name",
-                                 "seq_strand"))
+    expect_equal(colnames(res), c(cols, "gene_seq_start", "gene_seq_end",
+                                 "seq_name", "seq_strand"))
 
     ## Use a gene biotype filter
-    gbt <- GenebiotypeFilter("protein_coding")
+    gbt <- GeneBiotypeFilter("protein_coding")
 
     returnFilterColumns(edb) <- TRUE
     res <- genes(edb, filter = list(gbt, grf), return.type = "data.frame",
                  columns = cols)
-    checkEquals(res$gene_name, "SKA2")
-    checkEquals(colnames(res), c(cols, "gene_biotype", "gene_seq_start", "gene_seq_end",
-                                 "seq_name", "seq_strand"))
+    expect_equal(res$gene_name, "SKA2")
+    expect_equal(colnames(res), c(cols, "gene_biotype", "gene_seq_start",
+                                 "gene_seq_end", "seq_name", "seq_strand"))
     returnFilterColumns(edb) <- FALSE
     res <- genes(edb, filter = list(gbt, grf), return.type = "data.frame",
                  columns = cols)
-    checkEquals(colnames(res), cols)
-
+    expect_equal(colnames(res), cols)
     returnFilterColumns(edb) <- orig
-}
-
+})
 
-test_with_tx <- function(x) {
+test_that("returnFilterColumns works with transcripts", {
     orig <- returnFilterColumns(edb)
-
     returnFilterColumns(edb) <- FALSE
     ## What happens if we use a GRangesFilter with return filter cols FALSE?
-    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)),
+                         type = "within")
     res <- transcripts(edb, filter = grf)
     cols <- c("tx_id", "gene_name")
     res <- transcripts(edb, filter = grf, return.type = "data.frame",
                        columns = cols)
     ## Expect only the columns
-    checkEquals(colnames(res), cols)
+    expect_equal(colnames(res), cols)
     returnFilterColumns(edb) <- TRUE
     res <- transcripts(edb, filter = grf, return.type = "data.frame",
                        columns = cols)
     ## Now I expect also the gene coords.
-    checkEquals(colnames(res), c(cols, "tx_seq_start", "tx_seq_end", "seq_name",
+    expect_equal(colnames(res), c(cols, "tx_seq_start", "tx_seq_end", "seq_name",
                                  "seq_strand"))
-
     ## Use a gene biotype filter
-    gbt <- GenebiotypeFilter("protein_coding")
+    gbt <- GeneBiotypeFilter("protein_coding")
 
     returnFilterColumns(edb) <- TRUE
     res <- transcripts(edb, filter = list(gbt, grf), return.type = "data.frame",
                        columns = cols)
-    checkEquals(unique(res$gene_name), "SKA2")
-    checkEquals(colnames(res), c(cols, "gene_biotype", "tx_seq_start", "tx_seq_end",
-                                 "seq_name", "seq_strand"))
+    expect_equal(unique(res$gene_name), "SKA2")
+    expect_equal(colnames(res), c(cols, "gene_biotype", "tx_seq_start",
+                                 "tx_seq_end", "seq_name", "seq_strand"))
     returnFilterColumns(edb) <- FALSE
     res <- transcripts(edb, filter = list(gbt, grf), return.type = "data.frame",
                        columns = cols)
-    checkEquals(colnames(res), cols)
-
+    expect_equal(colnames(res), cols)
     returnFilterColumns(edb) <- orig
-}
+})
 
-
-test_with_exons <- function(x) {
+test_that("returnFilterColumns works with exons", {
     orig <- returnFilterColumns(edb)
-
     returnFilterColumns(edb) <- FALSE
     ## What happens if we use a GRangesFilter with return filter cols FALSE?
-    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)),
+                         type = "within")
     res <- exons(edb, filter = grf)
     cols <- c("exon_id", "gene_name")
     res <- exons(edb, filter = grf, return.type = "data.frame",
                  columns = cols)
     ## Expect only the columns
-    checkEquals(colnames(res), cols)
+    expect_equal(colnames(res), cols)
     returnFilterColumns(edb) <- TRUE
     res <- exons(edb, filter = grf, return.type = "data.frame",
                  columns = cols)
     ## Now I expect also the gene coords.
-    checkEquals(colnames(res), c(cols, "exon_seq_start", "exon_seq_end", "seq_name",
-                                 "seq_strand"))
-
+    expect_equal(colnames(res), c(cols, "exon_seq_start", "exon_seq_end",
+                                 "seq_name", "seq_strand"))
     ## Use a gene biotype filter
-    gbt <- GenebiotypeFilter("protein_coding")
-
+    gbt <- GeneBiotypeFilter("protein_coding")
     returnFilterColumns(edb) <- TRUE
     res <- exons(edb, filter = list(gbt, grf), return.type = "data.frame",
                  columns = cols)
-    checkEquals(unique(res$gene_name), c("TRIM37", "SKA2"))
-    checkEquals(colnames(res), c(cols, "gene_biotype", "exon_seq_start", "exon_seq_end",
-                                 "seq_name", "seq_strand"))
+    expect_equal(unique(res$gene_name), c("TRIM37", "SKA2"))
+    expect_equal(colnames(res), c(cols, "gene_biotype", "exon_seq_start",
+                                 "exon_seq_end", "seq_name", "seq_strand"))
     returnFilterColumns(edb) <- FALSE
     res <- exons(edb, filter = list(gbt, grf), return.type = "data.frame",
                  columns = cols)
-    checkEquals(colnames(res), cols)
-
+    expect_equal(colnames(res), cols)
     returnFilterColumns(edb) <- orig
-}
+})
 
-test_with_exonsBy <- function(x) {
+test_that("returnFilterColumns works with exonsBy", {
     orig <- returnFilterColumns(edb)
-
     returnFilterColumns(edb) <- FALSE
     ## What happens if we use a GRangesFilter with return filter cols FALSE?
-    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)),
+                         type = "within")
     ## By genes
     cols <- c("exon_id", "gene_name")
     res <- exonsBy(edb, by = "gene", filter = grf, columns = cols)
     res <- unlist(res)
     ## Expect only the columns
-    checkEquals(colnames(mcols(res)), cols)
-
+    expect_equal(colnames(mcols(res)), cols)
     returnFilterColumns(edb) <- TRUE
     res <- exonsBy(edb, by = "gene", filter = grf, columns = cols)
     res <- unlist(res)
-    ## Now I expect also the gene coords, but not the seq_name and seq_strand, as these
-    ## are redundant with data which is in the GRanges!
-    checkEquals(colnames(mcols(res)), c(cols, "gene_seq_start", "gene_seq_end"))
-
+    ## Now I expect also the gene coords, but not the seq_name and seq_strand,
+    ## as these are redundant with data which is in the GRanges!
+    expect_equal(colnames(mcols(res)), c(cols, "gene_seq_start", "gene_seq_end"))
     ## Use a gene biotype filter
-    gbt <- GenebiotypeFilter("protein_coding")
-
+    gbt <- GeneBiotypeFilter("protein_coding")
     returnFilterColumns(edb) <- TRUE
     res <- unlist(exonsBy(edb, by = "gene", filter = list(gbt, grf), columns = cols))
-    checkEquals(unique(res$gene_name), c("SKA2"))
-    checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "gene_seq_start", "gene_seq_end"))
+    expect_equal(unique(res$gene_name), c("SKA2"))
+    expect_equal(colnames(mcols(res)), c(cols, "gene_biotype", "gene_seq_start", "gene_seq_end"))
     returnFilterColumns(edb) <- FALSE
     res <- unlist(exonsBy(edb, by = "gene", filter = list(gbt, grf), columns = cols))
-    checkEquals(colnames(mcols(res)), cols)
-
+    expect_equal(colnames(mcols(res)), cols)
     ## By tx
     returnFilterColumns(edb) <- FALSE
     cols <- c("exon_id", "gene_name")
     res <- exonsBy(edb, by = "tx", filter = grf, columns = cols)
     res <- unlist(res)
     ## Expect only the columns
-    checkEquals(colnames(mcols(res)), c(cols, "exon_rank"))
-
+    expect_equal(colnames(mcols(res)), c(cols, "exon_rank"))
     returnFilterColumns(edb) <- TRUE
     res <- exonsBy(edb, by = "tx", filter = grf, columns = cols)
     res <- unlist(res)
     ## Now I expect also the gene coords.
-    checkEquals(colnames(mcols(res)), c(cols, "tx_seq_start", "tx_seq_end",
+    expect_equal(colnames(mcols(res)), c(cols, "tx_seq_start", "tx_seq_end",
                                         "exon_rank"))
-
     ## Use a gene biotype filter
-    gbt <- GenebiotypeFilter("protein_coding")
-
+    gbt <- GeneBiotypeFilter("protein_coding")
     returnFilterColumns(edb) <- TRUE
-    res <- unlist(exonsBy(edb, by = "tx", filter = list(gbt, grf), columns = cols))
-    checkEquals(unique(res$gene_name), c("SKA2"))
-    checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "tx_seq_start", "tx_seq_end",
-                                        "exon_rank"))
+    res <- unlist(exonsBy(edb, by = "tx", filter = list(gbt, grf),
+                          columns = cols))
+    expect_equal(unique(res$gene_name), c("SKA2"))
+    expect_equal(colnames(mcols(res)), c(cols, "gene_biotype", "tx_seq_start",
+                                        "tx_seq_end", "exon_rank"))
     returnFilterColumns(edb) <- FALSE
-    res <- unlist(exonsBy(edb, by = "tx", filter = list(gbt, grf), columns = cols))
-    checkEquals(colnames(mcols(res)), c(cols, "exon_rank"))
-
+    res <- unlist(exonsBy(edb, by = "tx", filter = list(gbt, grf),
+                          columns = cols))
+    expect_equal(colnames(mcols(res)), c(cols, "exon_rank"))
     returnFilterColumns(edb) <- orig
-}
-
+})
 
-test_with_transcriptsBy <- function(x) {
+test_that("returnFilterColumns works with transcriptsBy", {
     orig <- returnFilterColumns(edb)
-
     returnFilterColumns(edb) <- FALSE
     ## What happens if we use a GRangesFilter with return filter cols FALSE?
-    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)),
+                         type = "within")
     ## By genes
     cols <- c("tx_id", "gene_name")
     res <- transcriptsBy(edb, by = "gene", filter = grf, columns = cols)
     res <- unlist(res)
     ## Expect only the columns
-    checkEquals(colnames(mcols(res)), cols)
-
+    expect_equal(colnames(mcols(res)), cols)
     returnFilterColumns(edb) <- TRUE
     res <- transcriptsBy(edb, by = "gene", filter = grf, columns = cols)
     res <- unlist(res)
     ## Now I expect also the gene coords.
-    checkEquals(colnames(mcols(res)), c(cols, "gene_seq_start", "gene_seq_end"))
-
+    expect_equal(colnames(mcols(res)), c(cols, "gene_seq_start", "gene_seq_end"))
     ## Use a gene biotype filter
-    gbt <- GenebiotypeFilter("protein_coding")
-
+    gbt <- GeneBiotypeFilter("protein_coding")
     returnFilterColumns(edb) <- TRUE
-    res <- unlist(transcriptsBy(edb, by = "gene", filter = list(gbt, grf), columns = cols))
-    checkEquals(unique(res$gene_name), c("SKA2"))
-    checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "gene_seq_start", "gene_seq_end"))
+    res <- unlist(transcriptsBy(edb, by = "gene", filter = list(gbt, grf),
+                                columns = cols))
+    expect_equal(unique(res$gene_name), c("SKA2"))
+    expect_equal(colnames(mcols(res)),
+                c(cols, "gene_biotype", "gene_seq_start", "gene_seq_end"))
     returnFilterColumns(edb) <- FALSE
-    res <- unlist(transcriptsBy(edb, by = "gene", filter = list(gbt, grf), columns = cols))
-    checkEquals(colnames(mcols(res)), cols)
-
-    ## ## By exon
-    ## returnFilterColumns(edb) <- FALSE
-    ## cols <- c("tx_id", "gene_name")
-    ## res <- transcriptsBy(edb, by = "exon", filter = grf, columns = cols)
-    ## res <- unlist(res)
-    ## ## Expect only the columns
-    ## checkEquals(colnames(mcols(res)), c(cols))
-
-    ## returnFilterColumns(edb) <- TRUE
-    ## res <- transcriptsBy(edb, by = "exon", filter = grf, columns = cols)
-    ## res <- unlist(res)
-    ## ## Now I expect also the gene coords.
-    ## checkEquals(colnames(mcols(res)), c(cols, "exon_seq_start", "exon_seq_end"))
-
-    ## ## Use a gene biotype filter
-    ## gbt <- GenebiotypeFilter("protein_coding")
-
-    ## returnFilterColumns(edb) <- TRUE
-    ## res <- unlist(transcriptsBy(edb, by = "exon", filter = list(gbt, grf), columns = cols))
-    ## checkEquals(unique(res$gene_name), c("SKA2", "TRIM37"))
-    ## checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "exon_seq_start", "exon_seq_end"))
-    ## returnFilterColumns(edb) <- FALSE
-    ## res <- unlist(transcriptsBy(edb, by = "exon", filter = list(gbt, grf), columns = cols))
-    ## checkEquals(colnames(mcols(res)), c(cols))
-
+    res <- unlist(transcriptsBy(edb, by = "gene", filter = list(gbt, grf),
+                                columns = cols))
+    expect_equal(colnames(mcols(res)), cols)
     returnFilterColumns(edb) <- orig
-}
+})
 
-test_with_cdsBy <- function(x) {
+test_that("returnFilterColumns works with cdsBy", {
     orig <- returnFilterColumns(edb)
-    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
-
+    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)),
+                         type = "within")
     ## By tx
     returnFilterColumns(edb) <- FALSE
     cols <- c("gene_id", "gene_name")
     res <- cdsBy(edb, by = "tx", filter = grf, columns = cols)
     res <- unlist(res)
     ## Expect only the columns
-    checkEquals(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
-
+    expect_equal(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
     returnFilterColumns(edb) <- TRUE
     res <- cdsBy(edb, by = "tx", filter = grf, columns = cols)
     res <- unlist(res)
     ## Now I expect also the gene coords.
-    checkEquals(colnames(mcols(res)), c(cols, "tx_seq_start", "tx_seq_end",
-                                        "seq_name", "seq_strand", "exon_id", "exon_rank"))
-
+    expect_equal(colnames(mcols(res)), c(cols, "tx_seq_start", "tx_seq_end",
+                                        "seq_name", "seq_strand", "exon_id",
+                                        "exon_rank"))
     ## Use a gene biotype filter
-    gbt <- GenebiotypeFilter("protein_coding")
-
+    gbt <- GeneBiotypeFilter("protein_coding")
     returnFilterColumns(edb) <- TRUE
     res <- unlist(cdsBy(edb, by = "tx", filter = list(gbt, grf), columns = cols))
-    checkEquals(unique(res$gene_name), c("SKA2"))
-    checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "tx_seq_start", "tx_seq_end",
-                                        "seq_name", "seq_strand", "exon_id", "exon_rank"))
+    expect_equal(unique(res$gene_name), c("SKA2"))
+    expect_equal(colnames(mcols(res)), c(cols, "gene_biotype", "tx_seq_start",
+                                        "tx_seq_end", "seq_name", "seq_strand",
+                                        "exon_id", "exon_rank"))
     returnFilterColumns(edb) <- FALSE
     res <- unlist(cdsBy(edb, by = "tx", filter = list(gbt, grf), columns = cols))
-    checkEquals(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
-
+    expect_equal(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
     returnFilterColumns(edb) <- orig
-}
+})
 
-test_with_threeUTRsByTranscript <- function(x) {
+test_that("returnFilterColumns works with threeUTRsByTranscript", {
     orig <- returnFilterColumns(edb)
-    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
-
+    grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)),
+                         type = "within")
     ## By tx
     returnFilterColumns(edb) <- FALSE
     cols <- c("gene_id", "gene_name")
     res <- threeUTRsByTranscript(edb, filter = grf, columns = cols)
     res <- unlist(res)
     ## Expect only the columns
-    checkEquals(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
-
+    expect_equal(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
     returnFilterColumns(edb) <- TRUE
     res <- threeUTRsByTranscript(edb, filter = grf, columns = cols)
     res <- unlist(res)
     ## Now I expect also the gene coords.
-    checkEquals(colnames(mcols(res)), c(cols, "tx_seq_start", "tx_seq_end",
-                                        "seq_name", "seq_strand", "exon_id", "exon_rank"))
-
+    expect_equal(colnames(mcols(res)), c(cols, "tx_seq_start", "tx_seq_end",
+                                        "seq_name", "seq_strand", "exon_id",
+                                        "exon_rank"))
     ## Use a gene biotype filter
-    gbt <- GenebiotypeFilter("protein_coding")
-
+    gbt <- GeneBiotypeFilter("protein_coding")
     returnFilterColumns(edb) <- TRUE
-    res <- unlist(threeUTRsByTranscript(edb, filter = list(gbt, grf), columns = cols))
-    checkEquals(unique(res$gene_name), c("SKA2"))
-    checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "tx_seq_start", "tx_seq_end",
-                                        "seq_name", "seq_strand", "exon_id", "exon_rank"))
+    res <- unlist(threeUTRsByTranscript(edb, filter = list(gbt, grf),
+                                        columns = cols))
+    expect_equal(unique(res$gene_name), c("SKA2"))
+    expect_equal(colnames(mcols(res)), c(cols, "gene_biotype", "tx_seq_start",
+                                        "tx_seq_end", "seq_name", "seq_strand",
+                                        "exon_id", "exon_rank"))
     returnFilterColumns(edb) <- FALSE
-    res <- unlist(threeUTRsByTranscript(edb, filter = list(gbt, grf), columns = cols))
-    checkEquals(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
-
+    res <- unlist(threeUTRsByTranscript(edb, filter = list(gbt, grf),
+                                        columns = cols))
+    expect_equal(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
     returnFilterColumns(edb) <- orig
-}
+})
 
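The returnFilterColumns expectations above follow one rule: when the setting is TRUE, the columns used by the filters are appended, in filter order, after the requested columns; for the GRangesList results of exonsBy() and transcriptsBy() the seq_name and seq_strand columns are left out because the GRanges already carries them (the cdsBy() and threeUTRsByTranscript() expectations keep them, so this is a per-method choice). The base-R sketch below models only that rule. `appendFilterColumns()` and its arguments are hypothetical, and method-specific columns such as exon_id or exon_rank, which those methods add afterwards, are not modelled.

```r
## Hypothetical sketch (not ensembldb code) of the column rule the
## returnFilterColumns tests above check.
appendFilterColumns <- function(columns, filterColumns,
                                returnFilterColumns = TRUE,
                                dropSeqCols = FALSE) {
    if (!returnFilterColumns)
        return(columns)
    ## Columns of all filters, in filter order; setdiff() also removes
    ## duplicates and anything that was already requested.
    extra <- setdiff(unlist(filterColumns), columns)
    ## For GRanges results seq_name/seq_strand duplicate seqnames()/strand().
    if (dropSeqCols)
        extra <- setdiff(extra, c("seq_name", "seq_strand"))
    c(columns, extra)
}

cols <- c("gene_id", "gene_name")
fcols <- list("gene_biotype",                        # GeneBiotypeFilter
              c("gene_seq_start", "gene_seq_end",    # GRangesFilter
                "seq_name", "seq_strand"))
appendFilterColumns(cols, fcols)
appendFilterColumns(cols, fcols, dropSeqCols = TRUE)
```

The first call gives the data.frame column order expected for genes(); the second matches the mcols() expected for exonsBy(edb, by = "gene").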
diff --git a/tests/testthat/test_Methods.R b/tests/testthat/test_Methods.R
new file mode 100644
index 0000000..d87f50f
--- /dev/null
+++ b/tests/testthat/test_Methods.R
@@ -0,0 +1,893 @@
+
+## testing genes method.
+test_that("genes method works", {
+    Gns <- genes(edb, filter = ~ genename == "BCL2")
+    expect_identical(Gns$gene_name, "BCL2")
+    Gns <- genes(edb, filter = SeqNameFilter("Y"), return.type = "DataFrame")
+    expect_identical(sort(colnames(Gns)),
+                     sort(unique(c(listColumns(edb, "gene"), "entrezid"))))
+    Gns <- genes(edb, filter = ~ seq_name == "Y" & gene_id == "ENSG00000012817",
+                 return.type = "DataFrame",
+                 columns = c("gene_id", "tx_name"))
+    expect_identical(colnames(Gns), c("gene_id", "tx_name", "seq_name"))
+    expect_true(all(Gns$seq_name == "Y"))
+    expect_true(all(Gns$gene_id == "ENSG00000012817"))
+    Gns <- genes(edb,
+                 filter = AnnotationFilterList(SeqNameFilter("Y"),
+                                               GeneIdFilter("ENSG00000012817")),
+                 columns = c("gene_id", "gene_name"))
+    ## Here we don't need the seqnames in mcols!
+    expect_identical(colnames(mcols(Gns)), c("gene_id", "gene_name"))
+    expect_true(all(Gns$seq_name == "Y"))
+    expect_true(all(Gns$gene_id == "ENSG00000012817"))
+
+    Gns <- genes(edb, filter = ~ seq_name == "Y" | genename == "BCL2",
+                 return.type = "DataFrame")
+    expect_true(all(Gns$seq_name %in% c("18", "Y")))
+    Gns <- genes(edb,
+                 filter = AnnotationFilterList(SeqNameFilter("Y"),
+                                               GenenameFilter("BCL2")),
+                 return.type = "DataFrame")
+    expect_true(nrow(Gns) == 0)
+    Gns <- genes(edb,
+                 filter = AnnotationFilterList(SeqNameFilter("Y"),
+                                               GenenameFilter("BCL2"),
+                                               logOp = "|"),
+                 return.type = "DataFrame")
+    expect_true(all(Gns$seq_name %in% c("18", "Y")))
+
+    afl <- AnnotationFilterList(GenenameFilter(c("BCL2", "BCL2L11")),
+                                SeqNameFilter(18), logOp = "&")
+    afl2 <- AnnotationFilterList(SeqNameFilter("Y"), afl, logOp = "|")
+    Gns <- genes(edb, filter = afl2, columns = "gene_name",
+                 return.type = "DataFrame")
+    expect_identical(colnames(Gns), c("gene_name", "gene_id", "seq_name"))
+    expect_true(!any(Gns$gene_name == "BCL2L11"))
+    expect_true(any(Gns$gene_name == "BCL2"))
+    expect_true(all(Gns$seq_name %in% c("Y", "18")))
+})
+
+test_that("transcripts method works", {
+    Tns <- transcripts(edb, filter = SeqNameFilter("Y"),
+                       return.type = "DataFrame")
+    expect_identical(sort(colnames(Tns)), sort(c(listColumns(edb, "tx"),
+                                                 "seq_name")))
+    Tns <- transcripts(edb, columns = c("tx_id", "tx_name"),
+                       filter = list(SeqNameFilter("Y"),
+                                     TxIdFilter("ENST00000435741")))
+    expect_identical(sort(colnames(mcols(Tns))), sort(c("tx_id", "tx_name")))
+    expect_true(all(Tns$seq_name == "Y"))
+    expect_true(all(Tns$tx_id == "ENST00000435741"))
+    ## Check the default ordering.
+    Tns <- transcripts(edb, filter = list(TxBiotypeFilter("protein_coding"),
+                                          SeqNameFilter("X")),
+                       return.type = "data.frame",
+                       columns = c("seq_name", listColumns(edb, "tx")))
+    expect_identical(order(Tns$seq_name, method = "radix"), 1:nrow(Tns))
+})
+
+test_that("promoters works", {
+    res <- promoters(edb, filter = ~ genename == "ZBTB16")
+    res_2 <- transcripts(edb, filter = GenenameFilter("ZBTB16"))
+    expect_identical(length(res), length(res_2))
+    expect_true(all(width(res) == 2200))
+})
+
+test_that("transcriptsBy works", {
+    ## Expect results on the forward strand to be ordered by tx_seq_start
+    res <- transcriptsBy(edb, filter = ~ seq_name == "Y" & seq_strand == "+",
+                         by = "gene")
+    fw <- res[[3]]
+    expect_identical(order(start(fw)), 1:length(fw))
+    ## Expect results on the reverse strand to be ordered by -tx_seq_end
+    res <- transcriptsBy(edb, filter = list(SeqNameFilter("Y"),
+                                            SeqStrandFilter("-")), by = "gene")
+    rv <- res[[3]]
+    expect_identical(order(start(rv), decreasing = TRUE), 1:length(rv))
+})
+
+test_that("exons works", {
+    Exns <- exons(edb, filter = SeqNameFilter("Y"), return.type = "DataFrame")
+    expect_identical(sort(colnames(Exns)),
+                     sort(c(listColumns(edb, "exon"), "seq_name")))
+    ## Check correct ordering.
+    Exns <- exons(edb, return.type = "data.frame", filter = SeqNameFilter(20:22))
+    expect_identical(order(Exns$seq_name, method = "radix"), 1:nrow(Exns))
+})
+
+test_that("exonsBy works", {
+    ##ExnsBy <- exonsBy(edb, filter=list(SeqNameFilter("X")), by="tx")
+    ExnsBy <- exonsBy(edb, filter = list(SeqNameFilter("Y")), by = "tx",
+                      columns = c("tx_name"))
+    expect_identical(sort(colnames(mcols(ExnsBy[[1]]))),
+                     sort(c("exon_id", "exon_rank", "tx_name")))
+    suppressWarnings(
+        ExnsBy <- exonsBy(edb, filter = list(SeqNameFilter("Y")), by = "tx",
+                          columns = c("tx_name"), use.names = TRUE)
+    )
+    expect_identical(sort(colnames(mcols(ExnsBy[[1]]))),
+                     sort(c("exon_id", "exon_rank", "tx_name")))
+    
+    ## Check what happens if we specify tx_id.
+    ExnsBy <- exonsBy(edb, filter=list(SeqNameFilter("Y")), by="tx",
+                      columns=c("tx_id"))
+    expect_identical(sort(colnames(mcols(ExnsBy[[1]]))),
+                     sort(c("exon_id", "exon_rank", "tx_id")))
+    ExnsBy <- exonsBy(edb, filter=list(SeqNameFilter("Y"), SeqStrandFilter("+")),
+                      by="gene")
+    ## Check that ordering is on start on the forward strand.
+    fw <- ExnsBy[[3]]
+    expect_identical(order(start(fw)), 1:length(fw))
+    ##
+    ExnsBy <- exonsBy(edb, filter=list(SeqNameFilter("Y"), SeqStrandFilter("-")),
+                      by="gene")
+    ## Check that ordering is by decreasing end on the reverse strand.
+    rv <- ExnsBy[[3]]
+    expect_identical(order(end(rv), decreasing = TRUE), 1:length(rv))
+})
+
+test_that("listGenebiotypes works", {
+    GBT <- listGenebiotypes(edb)
+    TBT <- listTxbiotypes(edb)
+})
+
+## test if we get the expected exceptions if we're not submitting
+## correct filter objects
+test_that("Filter errors work in methods", {
+    expect_error(genes(edb, filter="d"))
+    expect_error(genes(edb, filter=list(SeqNameFilter("X"), "z")))
+    expect_error(transcripts(edb, filter="d"))
+    expect_error(transcripts(edb, filter=list(SeqNameFilter("X"), "z")))
+    expect_error(exons(edb, filter="d"))
+    expect_error(exons(edb, filter=list(SeqNameFilter("X"), "z")))
+    expect_error(exonsBy(edb, filter="d"))
+    expect_error(exonsBy(edb, filter=list(SeqNameFilter("X"), "z")))
+    expect_error(transcriptsBy(edb, filter="d"))
+    expect_error(transcriptsBy(edb, filter=list(SeqNameFilter("X"), "z")))
+    expect_error(transcripts(edb, filter = ~ other_filter == "b"))
+})
+
+test_that("genes returns correct columns", {
+    cols <- c("gene_name", "tx_id")
+    Resu <- genes(edb, filter=SeqNameFilter("Y"), columns=cols,
+                  return.type = "data.frame")
+    expect_identical(sort(c(cols, "seq_name", "gene_id")), sort(colnames(Resu)))
+
+    Resu <- genes(edb, filter=SeqNameFilter("Y"), columns=cols,
+                  return.type = "DataFrame")
+    expect_identical(sort(c(cols, "seq_name", "gene_id")), sort(colnames(Resu)))
+
+    Resu <- genes(edb, filter=SeqNameFilter("Y"), columns=cols)
+    expect_identical(sort(c(cols, "gene_id")), sort(colnames(mcols(Resu))))
+})
+
+test_that("transcripts return correct columns", {
+    cols <- c("tx_id", "exon_id", "tx_biotype")
+    Resu <- transcripts(edb, filter=SeqNameFilter("Y"), columns=cols,
+                        return.type = "data.frame")
+    expect_identical(sort(c(cols, "seq_name")), sort(colnames(Resu)))
+    Resu <- transcripts(edb, filter=SeqNameFilter("Y"), columns=cols,
+                        return.type = "DataFrame")
+    expect_identical(sort(c(cols, "seq_name")), sort(colnames(Resu)))
+    Resu <- transcripts(edb, filter=SeqNameFilter("Y"), columns=cols)
+    expect_identical(sort(cols), sort(colnames(mcols(Resu))))
+})
+
+test_that("exons returns correct columns", {
+    cols <- c("tx_id", "exon_id", "tx_biotype")
+    Resu <- exons(edb, filter=SeqNameFilter("Y"), columns=cols,
+                  return.type = "data.frame")
+    expect_identical(sort(c(cols, "seq_name")), sort(colnames(Resu)))
+    Resu <- exons(edb, filter=SeqNameFilter("Y"), columns=cols,
+                  return.type = "DataFrame")
+    expect_identical(sort(c(cols, "seq_name")), sort(colnames(Resu)))
+    Resu <- exons(edb, filter=SeqNameFilter("Y"), columns=cols)
+    expect_identical(sort(cols), sort(colnames(mcols(Resu))))
+})
+
+test_that("cdsBy works", {
+    checkSingleTx <- function(tx, cds, do.plot=FALSE){
+        rownames(tx) <- tx$exon_id
+        tx <- tx[cds$exon_id, ]
+        ## cds start and end have to be within the correct range.
+        expect_true(all(start(cds) >= min(tx$tx_cds_seq_start)))
+        expect_true(all(end(cds) <= max(tx$tx_cds_seq_end)))
+        ## For all exons except the first, exon_seq_start has to be equal to
+        ## the start of the cds; for all except the last, exon_seq_end has to
+        ## be equal to the end of the cds.
+        expect_true(all(start(cds)[-1] == tx$exon_seq_start[-1]))
+        expect_true(all(end(cds)[-nrow(tx)] == tx$exon_seq_end[-nrow(tx)]))
+        ## just plotting the stuff...
+        if(do.plot){
+            XL <- range(tx[, c("exon_seq_start", "exon_seq_end")])
+            YL <- c(0, 4)
+            plot(3, 3, pch=NA, xlim=XL, ylim=YL, xlab="", yaxt="n", ylab="")
+            ## plotting the "real" exons:
+            rect(xleft=tx$exon_seq_start, xright=tx$exon_seq_end,
+                 ybottom=rep(0, nrow(tx)),
+                 ytop=rep(1, nrow(tx)))
+            ## plotting the cds:
+            rect(xleft=start(cds), xright=end(cds), ybottom=rep(1.2, nrow(tx)),
+                 ytop=rep(2.2, nrow(tx)), col="blue")
+        }
+    }
+
+    ## Just checking if we get also tx_name
+    cs <- cdsBy(edb, filter = SeqNameFilter("Y"), columns = "tx_name")
+    expect_true(any(colnames(mcols(cs[[1]])) == "tx_name"))
+
+    do.plot <- FALSE
+    ## By tx
+    cs <- cdsBy(edb, filter=list(SeqNameFilter("Y"), SeqStrandFilter("+")))
+    tx <- exonsBy(edb, filter=list(SeqNameFilter("Y"), SeqStrandFilter("+")))
+    ## Check for the first if it makes sense:
+    whichTx <- names(cs)[1]
+    whichCs <- cs[[1]]
+    tx <- transcripts(edb, filter=TxIdFilter(whichTx),
+                      columns=c("tx_seq_start", "tx_seq_end",
+                                "tx_cds_seq_start", "tx_cds_seq_end",
+                                "exon_seq_start", "exon_seq_end",
+                                "exon_idx", "exon_id", "seq_strand"),
+                      return.type="data.frame")
+    checkSingleTx(tx=tx, cds=whichCs, do.plot=do.plot)
+    ## Next one:
+    whichTx <- names(cs)[2]
+    tx <- transcripts(edb, filter=TxIdFilter(whichTx),
+                      columns=c("tx_seq_start", "tx_seq_end",
+                                "tx_cds_seq_start", "tx_cds_seq_end",
+                                "exon_seq_start", "exon_seq_end",
+                                "exon_idx", "exon_id"),
+                      return.type="data.frame")
+    checkSingleTx(tx=tx, cds=cs[[2]], do.plot=do.plot)
+
+    ## Now for reverse strand:
+    cs <- cdsBy(edb, filter=list(SeqNameFilter("Y"), SeqStrandFilter("-")))
+    whichTx <- names(cs)[1]
+    whichCs <- cs[[1]]
+    tx <- transcripts(edb, filter=TxIdFilter(whichTx),
+                      columns=c("tx_seq_start", "tx_seq_end",
+                                "tx_cds_seq_start", "tx_cds_seq_end",
+                                "exon_seq_start", "exon_seq_end",
+                                "exon_idx", "exon_id"),
+                      return.type="data.frame")
+    ## Order the CDS ranges by start position.
+    whichCs <- whichCs[order(start(whichCs))]
+    checkSingleTx(tx=tx, cds=whichCs, do.plot=do.plot)
+    ## Next one:
+    whichTx <- names(cs)[2]
+    whichCs <- cs[[2]]
+    tx <- transcripts(edb, filter=TxIdFilter(whichTx),
+                      columns=c("tx_seq_start", "tx_seq_end",
+                                "tx_cds_seq_start", "tx_cds_seq_end",
+                                "exon_seq_start", "exon_seq_end",
+                                "exon_idx", "exon_id"),
+                      return.type="data.frame")
+    ## Order the CDS ranges by start position.
+    whichCs <- whichCs[order(start(whichCs))]
+    checkSingleTx(tx=tx, cds=whichCs, do.plot=do.plot)
+    ## Check adding columns
+    Test <- cdsBy(edb, filter=list(SeqNameFilter("Y")),
+                  columns=c("gene_biotype", "gene_name"))
+})
+
+test_that("cdsBy with gene works", {
+    checkSingleGene <- function(whichCs, gene, do.plot=FALSE){
+        tx <- transcripts(edb, filter=GeneIdFilter(gene),
+                          columns=c("tx_seq_start", "tx_seq_end",
+                                    "tx_cds_seq_start", "tx_cds_seq_end",
+                                    "tx_id", "exon_id", "exon_seq_start",
+                                    "exon_seq_end"),
+                          return.type="data.frame")
+        XL <- range(tx[, c("tx_seq_start", "tx_seq_end")])
+        tx <- split(tx, f=tx$tx_id)
+        if(do.plot){
+            ##XL <- range(c(start(whichCs), end(whichCs)))
+            YL <- c(0, length(tx) + 1)
+            plot(4, 4, pch=NA, xlim=XL, ylim=YL, yaxt="n", ylab="", xlab="")
+            ## plot the transcripts
+            for(i in 1:length(tx)){
+                current <- tx[[i]]
+                rect(xleft=current$exon_seq_start, xright=current$exon_seq_end,
+                     ybottom=rep((i-1+0.1), nrow(current)),
+                     ytop=rep((i-0.1), nrow(current)))
+                ## coding:
+                rect(xleft = current$tx_cds_seq_start,
+                     xright = current$tx_cds_seq_end,
+                     ybottom = rep((i-1+0.1), nrow(current)),
+                     ytop = rep((i-0.1), nrow(current)),
+                     border = "blue")
+            }
+            rect(xleft=start(whichCs), xright=end(whichCs),
+                 ybottom=rep(length(tx)+0.1, length(whichCs)),
+                 ytop=rep(length(tx)+0.9, length(whichCs)), border="red")
+        }
+    }
+    do.plot <- FALSE
+    ## By gene.
+    cs <- cdsBy(edb, filter=list(SeqNameFilter("Y"), SeqStrandFilter("+")),
+                by="gene", columns=NULL)
+    checkSingleGene(cs[[1]], gene=names(cs)[[1]], do.plot=do.plot)
+    checkSingleGene(cs[[2]], gene=names(cs)[[2]], do.plot=do.plot)
+    ## - strand
+    cs <- cdsBy(edb, filter=list(SeqNameFilter("Y"), SeqStrandFilter("-")),
+                by="gene", columns=NULL)
+    checkSingleGene(cs[[1]], gene=names(cs)[[1]], do.plot=do.plot)
+    checkSingleGene(cs[[2]], gene=names(cs)[[2]], do.plot=do.plot)
+    ## looks good!
+    cs2 <- cdsBy(edb, filter=list(SeqNameFilter("Y"), SeqStrandFilter("+")),
+                 by="gene", use.names=TRUE)
+})
+
+test_that("UTRs work", {
+    checkGeneUTRs <- function(f, t, c, tx, do.plot=FALSE){
+        if(any(strand(c) == "+")){
+            ## End of five UTR has to be smaller than any start of cds
+            expect_true(max(end(f)) < min(start(c)))
+            ## 3'
+            expect_true(min(start(t)) > max(end(c)))
+        }else{
+            ## 5'
+            expect_true(min(start(f)) > max(end(c)))
+            ## 3'
+            expect_true(max(end(t)) < min(start(c)))
+        }
+        ## just plot...
+        if(do.plot){
+            tx <- transcripts(edb, filter=TxIdFilter(tx),
+                              columns=c("exon_seq_start", "exon_seq_end"),
+                              return.type="data.frame")
+            XL <- range(c(start(f), start(c), start(t), end(f), end(c), end(t)))
+            YL <- c(0, 4)
+            plot(4, 4, pch=NA, xlim=XL, ylim=YL, yaxt="n", ylab="", xlab="")
+            ## five UTR
+            rect(xleft=start(f), xright=end(f), ybottom=0.1, ytop=0.9, col="blue")
+            ## cds
+            rect(xleft=start(c), xright=end(c), ybottom=1.1, ytop=1.9)
+            ## three UTR
+            rect(xleft=start(t), xright=end(t), ybottom=2.1, ytop=2.9, col="red")
+            ## all exons
+            rect(xleft=tx$exon_seq_start, xright=tx$exon_seq_end,
+                 ybottom=3.1, ytop=3.9)
+        }
+    }
+    ## check presence of tx_name
+    fUTRs <- fiveUTRsByTranscript(edb,
+                                  filter = TxIdFilter("ENST00000155093"),
+                                  columns = "tx_name")
+    expect_true(any(colnames(mcols(fUTRs[[1]])) == "tx_name"))
+
+    do.plot <- FALSE
+    fUTRs <- fiveUTRsByTranscript(edb, filter = list(SeqNameFilter("Y"),
+                                                     SeqStrandFilter("+")))
+    tUTRs <- threeUTRsByTranscript(edb, filter = list(SeqNameFilter("Y"),
+                                                      SeqStrandFilter("+")))
+    cds <- cdsBy(edb, "tx", filter = list(SeqNameFilter("Y"),
+                                          SeqStrandFilter("+")))
+    ## Check a TX:
+    tx <- names(fUTRs)[1]
+    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+                  do.plot = do.plot)
+    tx <- names(fUTRs)[2]
+    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+                  do.plot = do.plot)
+    tx <- names(fUTRs)[3]
+    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+                  do.plot = do.plot)
+
+    ## Reverse strand
+    fUTRs <- fiveUTRsByTranscript(edb, filter = list(SeqNameFilter("Y"),
+                                                     SeqStrandFilter("-")))
+    tUTRs <- threeUTRsByTranscript(edb, filter = list(SeqNameFilter("Y"),
+                                                      SeqStrandFilter("-")))
+    cds <- cdsBy(edb, "tx", filter = list(SeqNameFilter("Y"),
+                                          SeqStrandFilter("-")))
+    ## Check a TX:
+    tx <- names(fUTRs)[1]
+    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+                  do.plot = do.plot)
+    tx <- names(fUTRs)[2]
+    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+                  do.plot = do.plot)
+    tx <- names(fUTRs)[3]
+    checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+                  do.plot = do.plot)
+
+    res_1 <- ensembldb:::getUTRsByTranscript(edb, what = "five",
+                                             filter = TxIdFilter("ENST00000335953"))
+    res_2 <- fiveUTRsByTranscript(edb, filter = TxIdFilter("ENST00000335953"))
+    expect_identical(res_1, res_2)
+    res_1 <- ensembldb:::getUTRsByTranscript(edb, what = "three",
+                                             filter = TxIdFilter("ENST00000335953"))
+    res_2 <- threeUTRsByTranscript(edb, filter = TxIdFilter("ENST00000335953"))
+    expect_identical(res_1, res_2)
+})
+
+test_that("lengthOf works", {
+    system.time(
+        lenY <- lengthOf(edb, "tx", filter=SeqNameFilter("Y"))
+    )
+    ## Check what we get if we compute the lengths ourselves.
+    system.time(
+        lenY2 <- sum(width(reduce(exonsBy(edb, "tx",
+                                          filter=SeqNameFilter("Y")))))
+    )
+    expect_identical(lenY, lenY2)
+    ## Same for genes.
+    system.time(
+        lenY <- lengthOf(edb, "gene", filter= ~ seq_name == "Y")
+    )
+    ## Check what we get if we compute the lengths ourselves.
+    system.time(
+        lenY2 <- sum(width(reduce(exonsBy(edb, "gene",
+                                          filter=SeqNameFilter("Y")))))
+    )
+    expect_identical(lenY, lenY2)
+    ## Just using the transcriptLengths
+
+    res <- ensembldb:::.transcriptLengths(edb, filter = GenenameFilter("ZBTB16"))
+    res_2 <- lengthOf(edb, "tx", filter = GenenameFilter("ZBTB16"))
+    expect_identical(sort(res$tx_len), unname(sort(res_2)))
+    ## also cds lengths etc.
+    res <- ensembldb:::.transcriptLengths(edb, filter = GenenameFilter("ZBTB16"),
+                                          with.cds_len = TRUE,
+                                          with.utr5_len = TRUE,
+                                          with.utr3_len = TRUE)
+    expect_identical(colnames(res), c("tx_id", "gene_id", "nexon", "tx_len",
+                                      "cds_len", "utr5_len", "utr3_len"))
+    tx <- transcripts(edb, filter = list(GenenameFilter("ZBTB16"),
+                                         TxBiotypeFilter("protein_coding")))
+    expect_true(all(!is.na(res[names(tx), "cds_len"])))
+    expect_equal(unname(res[names(tx), "tx_len"]),
+                 unname(rowSums(res[names(tx),
+                                    c("utr5_len", "cds_len", "utr3_len")])))
+})
+
+
+####============================================================
+##  ExonRankFilter
+##
+####------------------------------------------------------------
+test_that("ExonRankFilter works with methods", {
+    txs <- transcripts(edb, columns=c("exon_id", "exon_idx"),
+                       filter=SeqNameFilter(c("Y")))
+    txs <- txs[order(names(txs))]
+
+    txs2 <- transcripts(edb, columns=c("exon_id"),
+                        filter=list(SeqNameFilter(c("Y")),
+                                    ExonRankFilter(3)))
+    txs2 <- txs2[order(names(txs2))]
+    ## hm, that's weird somehow.
+    exns <- exons(edb, columns=c("tx_id", "exon_idx"),
+                  filter=list(SeqNameFilter("Y"),
+                              ExonRankFilter(3)))
+    expect_true(all(exns$exon_idx == 3))
+    exns <- exons(edb, columns=c("tx_id", "exon_idx"),
+                  filter=list(SeqNameFilter("Y"),
+                              ExonRankFilter(3, condition="<")))
+    expect_true(all(exns$exon_idx < 3))
+})
+
+test_that("buildQuery and getWhat works", {
+    library(RSQLite)
+    Q <- buildQuery(edb, columns = c("gene_name", "gene_id"))
+    expect_identical(Q, "select distinct gene.gene_name,gene.gene_id from gene")
+
+    gf <- GeneIdFilter("ENSG00000000005")
+    Q <- buildQuery(edb, columns = c("gene_name", "exon_idx"),
+                    filter = AnnotationFilterList(gf))
+    res <- dbGetQuery(dbconn(edb), Q)
+    Q_2 <- paste0("select * from gene join tx on (gene.gene_id=tx.gene_id)",
+                  " join tx2exon on (tx.tx_id=tx2exon.tx_id) where",
+                  " gene.gene_id = 'ENSG00000000005'")
+    res_2 <- dbGetQuery(dbconn(edb), Q_2)
+    expect_identical(res, unique(res_2[, colnames(res)]))
+    res_3 <- ensembldb:::getWhat(edb, columns = c("gene_name", "exon_idx"),
+                                 filter = AnnotationFilterList(gf))
+    expect_identical(res_3, unique(res_2[, colnames(res_3)]))
+})
+
+test_that("toSaf works", {
+    txs <- transcriptsBy(edb, filter = GenenameFilter("ZBTB16"))
+    saf <- ensembldb:::.toSaf(txs)
+    expect_identical(nrow(saf), sum(lengths(txs)))
+    saf2 <- toSAF(txs)
+    expect_identical(saf2, saf)
+})
+
+test_that("disjointExons works", {
+    dje <- disjointExons(edb, filter = GenenameFilter("ZBTB16"))
+    exns <- exons(edb, filter = GenenameFilter("ZBTB16"))
+    ## Expect that dje is shorter than exns, since overlapping exon parts have
+    ## been fused.
+    expect_true(length(dje) < length(exns))
+    dje <- disjointExons(edb, filter = GenenameFilter("ZBTB16"),
+                         aggregateGenes = TRUE)
+    expect_true(length(dje) < length(exns))
+})
+
+test_that("getGeneRegionTrackForGviz works", {
+    res <- getGeneRegionTrackForGviz(edb, filter = GenenameFilter("ZBTB16"))
+    expect_true(all(res$feature %in% c("protein_coding", "utr5", "utr3")))
+    ## Do the same without a filter:
+    ## LLLLL
+    res2 <- getGeneRegionTrackForGviz(edb, chromosome = "11", start = 113930000,
+                                      end = 113935000)
+    expect_true(all(res2$symbol == "ZBTB16"))
+})
+
+test_that("filter columns are correctly added in methods", {
+    filtList <- AnnotationFilterList(GenenameFilter("a"),
+                                     ExonStartFilter(123),
+                                     SymbolFilter("b"), TxIdFilter("c"))
+    res <- ensembldb:::addFilterColumns(cols = c("a"), filter = filtList,
+                                        edb = edb)
+    expect_identical(res, c("a", "gene_name", "exon_seq_start", "symbol", "tx_id"))
+    res <- ensembldb:::addFilterColumns(filter = filtList,
+                                        edb = edb)
+    expect_identical(res, c("gene_name", "exon_seq_start", "symbol", "tx_id"))
+    ## New filts
+    filtList <- AnnotationFilterList(GenenameFilter("a"), ExonStartFilter(123),
+                                     SymbolFilter("b"), TxIdFilter("c"))
+    res <- ensembldb:::addFilterColumns(cols = c("a"), filter = filtList,
+                                        edb = edb)
+    expect_identical(res, c("a", "gene_name", "exon_seq_start", "symbol", "tx_id"))
+    res <- ensembldb:::addFilterColumns(filter = filtList,
+                                        edb = edb)
+    expect_identical(res, c("gene_name", "exon_seq_start", "symbol", "tx_id"))
+})
+
+test_that("supportedFilters works", {
+    res <- ensembldb:::.supportedFilters(edb)
+    if (!hasProteinData(edb))
+        expect_equal(length(res), 19)
+    else 
+        expect_equal(length(res), 24)
+    res <- supportedFilters(edb)
+    if (!hasProteinData(edb))
+        expect_equal(length(res), 19)
+    else 
+        expect_equal(length(res), 24)
+})
+
+## Here we check if we fetch what we expect from the database.
+test_that("GRangesFilter works in queries", {
+    do.plot <- FALSE
+    zbtb <- genes(edb, filter = GenenameFilter("ZBTB16"))
+    txs <- transcripts(edb, filter = GenenameFilter("ZBTB16"))
+    ## Now use the GRangesFilter to fetch all tx
+    txs2 <- transcripts(edb, filter = GRangesFilter(zbtb))
+    expect_equal(txs$tx_id, txs2$tx_id)
+    ## Exons:
+    exs <- exons(edb, filter = GenenameFilter("ZBTB16"))
+    exs2 <- exons(edb, filter = GRangesFilter(zbtb))
+    expect_equal(exs$exon_id, exs2$exon_id)
+    ## Now check the filter with "overlapping".
+    intr <- GRanges("11", ranges = IRanges(114000000, 114000050), strand = "+")
+    gns <- genes(edb, filter = GRangesFilter(intr, type = "any"))
+    expect_equal(gns$gene_name, "ZBTB16")
+    ##
+    txs <- transcripts(edb, filter = GRangesFilter(intr, type = "any"))
+    expect_equal(sort(txs$tx_id), sort(c("ENST00000335953", "ENST00000541602",
+                                         "ENST00000392996", "ENST00000539918")))
+    if(do.plot){
+        plot(3, 3, pch=NA, xlim=c(start(zbtb), end(zbtb)),
+             ylim=c(0, length(txs2)))
+        rect(xleft=start(intr), xright=end(intr), ybottom=0, ytop=length(txs2),
+             col="red", border="red")
+        for(i in 1:length(txs2)){
+            current <- txs2[i]
+            rect(xleft=start(current), xright=end(current), ybottom=i-0.975,
+                 ytop=i-0.125, border="grey")
+            text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
+        }
+        ## OK, that's fine.
+    }
+
+    ## OK, now for a GRangesFilter with more than one GRanges.
+    ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
+                   end=c(2654900, 2709550, 28111790))
+    grf2 <- GRangesFilter(GRanges(rep("Y", length(ir2)), ir2),
+                          type = "any")
+    Test <- transcripts(edb, filter = grf2)
+    expect_equal(names(Test), c("ENST00000383070", "ENST00000250784",
+                                "ENST00000598545"))
+})
+
+test_that("show works", {
+    res <- capture.output(show(edb))
+    expect_equal(res[1], "EnsDb for Ensembl:")
+    expect_equal(res[9], "|ensembl_version: 75")
+})
+
+test_that("organism method works", {
+    res <- organism(edb)
+    expect_equal(res, "Homo sapiens")
+})
+
+test_that("metadata method works", {
+    res <- metadata(edb)
+    expect_equal(res, dbGetQuery(dbconn(edb), "select * from metadata"))
+})
+
+test_that("ensemblVersion works", {
+    expect_equal(ensemblVersion(edb), "75")
+})
+
+test_that("getMetadataValue works", {
+    expect_error(ensembldb:::getMetadataValue(edb))
+})
+
+test_that("seqinfo and seqlevels work", {
+    si <- seqinfo(edb)
+    expect_true(is(si, "Seqinfo"))
+    sl <- seqlevels(edb)
+    library(RSQLite)
+    chrs <- dbGetQuery(dbconn(edb), "select seq_name from chromosome")[, 1]
+    expect_true(all(sl %in% chrs))
+    expect_true(all(seqlevels(si) %in% chrs))
+})
+
+test_that("ensVersionFromSourceUrl works", {
+    res <- ensembldb:::.ensVersionFromSourceUrl(
+                           "ftp://ftp.ensembl.org/release-85/gtf")
+    expect_equal(res, 85)
+})
+
+test_that("listBiotypes works", {
+    res <- listTxbiotypes(edb)
+    library(RSQLite)
+    res_2 <- dbGetQuery(dbconn(edb), "select distinct tx_biotype from tx")[, 1]
+    expect_true(all(res %in% res_2))
+    res <- listGenebiotypes(edb)
+    res_2 <- dbGetQuery(dbconn(edb), "select distinct gene_biotype from gene")[, 1]
+    expect_true(all(res %in% res_2))
+})
+
+test_that("listTables works", {
+    res <- listTables(edb)
+    schema_version <- ensembldb:::dbSchemaVersion(edb)
+    if (!hasProteinData(edb)) {
+        expect_equal(names(res),
+                     names(ensembldb:::.ensdb_tables(schema_version)))
+    } else {
+        expect_equal(
+            sort(names(res)),
+            sort(unique(c(names(ensembldb:::.ensdb_tables(schema_version)),
+                          names(ensembldb:::.ensdb_protein_tables(
+                                                schema_version))))))
+    }
+    ## Repeat after deleting the cached tables
+    edb@tables <- list()
+    res <- listTables(edb)
+    if (!hasProteinData(edb)) {
+        expect_equal(names(res),
+                     names(ensembldb:::.ensdb_tables(schema_version)))
+    } else {
+        expect_equal(
+            sort(names(res)),
+            sort(unique(c(names(ensembldb:::.ensdb_tables(schema_version)),
+                          names(ensembldb:::.ensdb_protein_tables(
+                                                schema_version))))))
+    }
+})
+
+test_that("listColumns works", {
+    res <- listColumns(edb, table = "gene")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$gene, "symbol"))
+    res <- listColumns(edb, table = "tx")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$tx, "tx_name"))
+    res <- listColumns(edb, table = "exon")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$exon))
+    res <- listColumns(edb, table = "chromosome")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$chromosome))
+    res <- listColumns(edb, table = "tx2exon")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$tx2exon))
+    if (hasProteinData(edb)) {
+        res <- listColumns(edb, table = "protein")
+        expect_equal(res, ensembldb:::.ensdb_protein_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$protein)
+        res <- listColumns(edb, table = "uniprot")
+        expect_equal(res, ensembldb:::.ensdb_protein_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$uniprot)
+        res <- listColumns(edb, table = "protein_domain")
+        expect_equal(res, ensembldb:::.ensdb_protein_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$protein_domain)
+    }
+    ## Repeat after deleting the cached tables
+    edb@tables <- list()
+    res <- listColumns(edb, table = "gene")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$gene, "symbol"))
+    res <- listColumns(edb, table = "tx")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$tx, "tx_name"))
+    res <- listColumns(edb, table = "exon")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$exon))
+    res <- listColumns(edb, table = "chromosome")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$chromosome))
+    res <- listColumns(edb, table = "tx2exon")
+    expect_equal(res, c(ensembldb:::.ensdb_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$tx2exon))
+    if (hasProteinData(edb)) {
+        res <- listColumns(edb, table = "protein")
+        expect_equal(res, ensembldb:::.ensdb_protein_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$protein)
+        res <- listColumns(edb, table = "uniprot")
+        expect_equal(res, ensembldb:::.ensdb_protein_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$uniprot)
+        res <- listColumns(edb, table = "protein_domain")
+        expect_equal(res, ensembldb:::.ensdb_protein_tables(ensembldb:::dbSchemaVersion(dbconn(edb)))$protein_domain)
+    }
+})
+
+test_that("cleanColumns works", {
+    cols <- c("gene_id", "tx_id", "tx_name")
+    res <- ensembldb:::cleanColumns(edb, cols)
+    expect_equal(cols, res)
+    cols <- c(cols, "not there")
+    suppressWarnings(
+        res <- ensembldb:::cleanColumns(edb, cols)
+    )
+    expect_equal(cols[1:3], res)
+    cols <- c("gene_id", "protein_id", "tx_id", "protein_sequence")
+    suppressWarnings(
+        res <- ensembldb:::cleanColumns(edb, cols)
+    )
+    if (hasProteinData(edb)) {
+        expect_equal(res, cols)
+    } else {
+        expect_equal(res, cols[c(1, 3)])
+    }
+    ## with full names:
+    cols <- c("gene.gene_id", "protein.protein_id", "tx.tx_id",
+              "protein.protein_sequence")
+    suppressWarnings(
+        res <- ensembldb:::cleanColumns(edb, cols)
+    )
+    if (hasProteinData(edb)) {
+        expect_equal(res, cols)
+    } else {
+        expect_equal(res, cols[c(1, 3)])
+    }
+})
+
+test_that("tablesForColumns works", {
+    expect_error(ensembldb:::tablesForColumns(edb))
+    res <- ensembldb:::tablesForColumns(edb, columns = "tx_id")
+    if (hasProteinData(edb))
+        expect_equal(res, c("tx", "tx2exon", "protein"))
+    else
+        expect_equal(res, c("tx", "tx2exon"))
+    res <- ensembldb:::tablesForColumns(edb, columns = "seq_name")
+    expect_equal(res, c("gene", "chromosome"))
+    if (hasProteinData(edb)) {
+        res <- ensembldb:::tablesForColumns(edb, columns = "protein_id")
+        expect_equal(res, c("protein", "uniprot", "protein_domain"))
+    }
+})
+
+test_that("tablesByDegree works", {
+    res <- ensembldb:::tablesByDegree(edb,
+                                      tab = c("chromosome", "gene", "tx"))
+    expect_equal(res, c("gene", "tx", "chromosome"))
+})
+
+test_that("updateEnsDb works", {
+    edb2 <- updateEnsDb(edb)
+    expect_equal(edb2@tables, edb@tables)
+    expect_true(.hasSlot(edb2, ".properties"))
+})
+
+test_that("properties work", {
+    origProps <- ensembldb:::properties(edb)
+    expect_equal(ensembldb:::getProperty(edb, "foo"), NA)
+    expect_error(ensembldb:::setProperty(edb, "foo"))
+    edb <- ensembldb:::setProperty(edb, foo="bar")
+    expect_equal(ensembldb:::getProperty(edb, "foo"), "bar")
+    expect_equal(length(ensembldb:::properties(edb)),
+                 length(origProps) + 1)
+    expect_true(any(names(ensembldb:::properties(edb)) == "foo"))
+    edb <- ensembldb:::dropProperty(edb, "foo")
+    expect_true(all(names(ensembldb:::properties(edb)) != "foo"))
+})
+
+## Compare the results of genes calls with ordering done in SQL and in R
+test_that("ordering works in genes calls", {
+    orig <- ensembldb:::orderResultsInR(edb)
+    ensembldb:::orderResultsInR(edb) <- FALSE
+    res_sql <- genes(edb, return.type = "data.frame")
+    ensembldb:::orderResultsInR(edb) <- TRUE
+    res_r <- genes(edb, return.type = "data.frame")
+    rownames(res_sql) <- NULL
+    rownames(res_r) <- NULL
+    expect_equal(res_sql, res_r)
+    ## Join tx table
+    ensembldb:::orderResultsInR(edb) <- FALSE
+    res_sql <- genes(edb, columns = c("gene_id", "tx_id"),
+                     return.type = "data.frame")
+    ensembldb:::orderResultsInR(edb) <- TRUE
+    res_r <- genes(edb, columns = c("gene_id", "tx_id"),
+                   return.type = "data.frame")
+    rownames(res_sql) <- NULL
+    rownames(res_r) <- NULL
+    expect_equal(res_sql, res_r)
+    ## Join tx table and use an SeqNameFilter
+    ensembldb:::orderResultsInR(edb) <- FALSE
+    res_sql <- genes(edb, columns = c("gene_id", "tx_id"),
+                     filter = SeqNameFilter("Y"))
+    ensembldb:::orderResultsInR(edb) <- TRUE
+    res_r <- genes(edb, columns = c("gene_id", "tx_id"),
+                   filter = SeqNameFilter("Y"))
+    expect_equal(res_sql, res_r)
+
+    ensembldb:::orderResultsInR(edb) <- orig
+})
+
+test_that("transcriptLengths works",{
+    ## With filter.
+    daFilt <- SeqNameFilter("Y")
+    allTxY <- transcripts(edb, filter = daFilt)
+    txLenY <- transcriptLengths(edb, filter = daFilt)
+    expect_equal(names(allTxY), txLenY$tx_id)
+    rownames(txLenY) <- txLenY$tx_id
+
+    ## Check if lengths are OK:
+    txLenY2 <- lengthOf(edb, "tx", filter = daFilt)
+    expect_equal(unname(txLenY2[txLenY$tx_id]), txLenY$tx_len)
+
+    ## Include the CDS, 5' UTR and 3' UTR lengths
+    txLenY <- transcriptLengths(edb, with.cds_len = TRUE, with.utr5_len = TRUE,
+                                with.utr3_len = TRUE,
+                                filter=daFilt)
+    ## The sum of the 5' UTR, CDS and 3' UTR lengths has to match tx_len:
+    txLen <- rowSums(txLenY[, c("cds_len", "utr5_len", "utr3_len")])
+    expect_equal(txLenY[txLenY$cds_len > 0, "tx_len"],
+                 unname(txLen[txLenY$cds_len > 0]))
+    ## just to be sure...
+    expect_equal(txLenY[txLenY$utr3_len > 0, "tx_len"],
+                 unname(txLen[txLenY$utr3_len > 0]))
+    ## Seems to be OK.
+
+    ## Next, check the 5' UTR lengths; this also verifies fiveUTRsByTranscript.
+    futr <- fiveUTRsByTranscript(edb, filter = daFilt)
+    futrLen <- sum(width(futr))
+    rownames(txLenY) <- txLenY$tx_id
+    expect_equal(unname(futrLen), txLenY[names(futrLen), "utr5_len"])
+    ## Same for the 3' UTR lengths and threeUTRsByTranscript.
+    tutr <- threeUTRsByTranscript(edb, filter=daFilt)
+    tutrLen <- sum(width(tutr))
+    expect_equal(unname(tutrLen), txLenY[names(tutrLen), "utr3_len"])
+})
+
+test_that("transcriptsByOverlaps works", {
+    ir2 <- IRanges(start = c(2654890, 2709520, 28111770),
+                   end = c(2654900, 2709550, 28111790))
+    gr2 <- GRanges(rep("Y", length(ir2)), ir2)
+    grf2 <- GRangesFilter(gr2, type = "any")
+    Test <- transcripts(edb, filter = grf2)
+    Test2 <- transcriptsByOverlaps(edb, gr2)
+    expect_equal(names(Test), names(Test2))
+    ## on one strand.
+    gr2 <- GRanges(rep("Y", length(ir2)), ir2, strand = rep("-", length(ir2)))
+    grf2 <- GRangesFilter(gr2, type = "any")
+    Test <- transcripts(edb, filter = grf2)
+    Test2 <- transcriptsByOverlaps(edb, gr2)
+    expect_equal(names(Test), names(Test2))
+
+    ## Combine with filter...
+    gr2 <- GRanges(rep("Y", length(ir2)), ir2)
+    Test3 <- transcriptsByOverlaps(edb, gr2, filter = SeqStrandFilter("-"))
+    expect_equal(names(Test), names(Test3))
+})
+
+test_that("exonsByOverlaps works", {
+    ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
+                   end=c(2654900, 2709550, 28111790))
+    gr2 <- GRanges(rep("Y", length(ir2)), ir2)
+    grf2 <- GRangesFilter(gr2, type = "any")
+    Test <- exons(edb, filter = grf2)
+    Test2 <- exonsByOverlaps(edb, gr2)
+    expect_equal(names(Test), names(Test2))
+    ## on one strand.
+    gr2 <- GRanges(rep("Y", length(ir2)), ir2, strand=rep("-", length(ir2)))
+    grf2 <- GRangesFilter(gr2, type = "any")
+    Test <- exons(edb, filter = grf2)
+    Test2 <- exonsByOverlaps(edb, gr2)
+    expect_equal(names(Test), names(Test2))
+    ## Combine with filter...
+    gr2 <- GRanges(rep("Y", length(ir2)), ir2)
+    Test3 <- exonsByOverlaps(edb, gr2, filter=SeqStrandFilter("-"))
+    expect_equal(names(Test), names(Test3))
+})
diff --git a/tests/testthat/test_Protein-related-tests.R b/tests/testthat/test_Protein-related-tests.R
new file mode 100644
index 0000000..0c79567
--- /dev/null
+++ b/tests/testthat/test_Protein-related-tests.R
@@ -0,0 +1,253 @@
+
+############################################################
+## Getting protein data in other methods.
+test_that("genes works with proteins", {
+    if (hasProteinData(edb)) {
+        res <- genes(edb, columns = c("gene_name", "gene_id", "protein_id",
+                                      "uniprot_id", "tx_id", "tx_biotype"),
+                     filter = GenenameFilter("ZBTB16"),
+                     return.type = "data.frame")
+        ## have a 1:n mapping of protein_id to uniprot id:
+        expect_true(length(unique(res$protein_id)) <
+                  nrow(res))
+        expect_equal(colnames(res), c("gene_name", "gene_id", "protein_id",
+                                     "uniprot_id", "tx_id", "tx_biotype"))
+        ## All protein_coding transcripts have a uniprot_id
+        expect_true(all(!is.na(res[res$tx_biotype == "protein_coding",
+                                 "uniprot_id"])))
+        ## combine with cdsBy:
+        cds <- cdsBy(edb, columns = c("tx_biotype", "protein_id"),
+                     filter = GenenameFilter("ZBTB16"))
+        codingTx <- unique(res[!is.na(res$protein_id), "tx_id"])
+        expect_equal(sort(names(cds)), sort(codingTx))
+        ## Next one fetching also protein domain data.
+        res <- genes(edb, columns = c("gene_name", "tx_id", "protein_id",
+                                      "protein_domain_id"),
+                     filter = GenenameFilter("ZBTB16"),
+                     return.type = "data.frame")
+        expect_equal(colnames(res), c("gene_name", "tx_id", "protein_id",
+                                     "protein_domain_id", "gene_id"))
+        expect_true(nrow(res) > length(unique(res$protein_id)))
+        expect_true(nrow(res) > length(unique(res$tx_id)))
+    }
+})
+
+test_that("transcripts works with proteins", {
+    if (hasProteinData(edb)) {
+        res <- transcripts(edb, columns = c("tx_biotype", "protein_id",
+                                            "uniprot_id"),
+                           filter = TxIdFilter("ENST00000335953"),
+                           return.type = "data.frame")
+        ## 1:1 mapping for tx_id <-> protein_id
+        expect_true(nrow(unique(res[, c("tx_id", "protein_id")])) == 1)
+        ## Mapping tx_id -> uniprot_id is (0,1):n
+        expect_true(nrow(res) > length(unique(res$tx_id)))
+        ## Add protein domains.
+        res <- transcripts(edb, columns = c("tx_biotype", "protein_id",
+                                            "uniprot_id",
+                                            "protein_domain_id"),
+                           filter = TxIdFilter("ENST00000335953"),
+                           return.type = "data.frame")
+        resL <- split(res, f = res$uniprot_id)
+        ## All have the same protein domains:
+        resM <- do.call(rbind, lapply(resL, function(z) z$protein_domain_id))
+        expect_equal(nrow(unique(resM)), 1)
+    }
+})
+
+## exons
+test_that("exons works with proteins",  {
+    if (hasProteinData(edb)) {
+        ## Check that a call that includes protein_id returns the same exons as
+        ## one without.
+        exns <- exons(edb, filter = GenenameFilter("BCL2L11"),
+                      return.type = "data.frame")
+        exns_2 <- exons(edb, filter = GenenameFilter("BCL2L11"),
+                        return.type = "data.frame",
+                        columns = c("exon_id", "protein_id"))
+        expect_equal(sort(unique(exns$exon_id)),
+                    sort(unique(exns_2$exon_id)))
+        ## ZBTB16
+        exns <- exons(edb, filter = GenenameFilter("ZBTB16"),
+                      return.type = "data.frame")
+        exns_2 <- exons(edb, filter = GenenameFilter("ZBTB16"),
+                        return.type = "data.frame",
+                        columns = c("exon_id", "protein_id"))
+        expect_equal(sort(unique(exns$exon_id)),
+                    sort(unique(exns_2$exon_id)))
+    }
+})
+
+## exonsBy
+test_that("exonsBy works with proteins", {
+    if (hasProteinData(edb)) {
+        exns <- exonsBy(edb, filter = GenenameFilter("ZBTB16"),
+                        columns = "tx_biotype")
+        exns_2 <- exonsBy(edb, filter = GenenameFilter("ZBTB16"),
+                          columns = c("protein_id", "tx_biotype"))
+        expect_equal(names(exns), names(exns_2))
+        exns <- unlist(exns)
+        exns_2 <- unlist(exns_2)
+        expect_true(any(is.na(exns_2$protein_id)))
+        expect_equal(exns$exon_id, exns_2$exon_id)
+    }
+})
+
+## transcriptsBy
+test_that("transcriptsBy works with proteins", {
+    if (hasProteinData(edb)) {
+        txs <- transcriptsBy(edb, filter = GenenameFilter("ZBTB16"),
+                             columns = "gene_biotype")
+        txs_2 <- transcriptsBy(edb, filter = GenenameFilter("ZBTB16"),
+                               columns = c("protein_id", "gene_biotype"))
+        expect_equal(names(txs), names(txs_2))
+        txs <- unlist(txs)
+        txs_2 <- unlist(txs_2)
+        expect_true(any(is.na(txs_2$protein_id)))
+        expect_equal(start(txs), start(txs_2))
+    }
+})
+
+## cdsBy
+test_that("cdsBy works with proteins", {
+    if (hasProteinData(edb)) {
+        cds <- cdsBy(edb, filter = GenenameFilter("ZBTB16"),
+                     columns = "gene_biotype")
+        cds_2 <- cdsBy(edb, filter = GenenameFilter("ZBTB16"),
+                       columns = c("protein_id", "gene_biotype"))
+        expect_equal(names(cds), names(cds_2))
+        cds <- unlist(cds)
+        cds_2 <- unlist(cds_2)
+        expect_true(all(!is.na(cds_2$protein_id)))
+        expect_equal(start(cds), start(cds_2))
+    }
+})
+
+## fiveUTRsByTranscript
+test_that("fiveUTRsByTranscript works with proteins", {
+    if (hasProteinData(edb)) {
+        utrs <- fiveUTRsByTranscript(edb, filter = GenenameFilter("ZBTB16"),
+                                     columns = "tx_biotype")
+        utrs_2 <- fiveUTRsByTranscript(edb, filter = GenenameFilter("ZBTB16"),
+                                       columns = c("protein_id", "gene_biotype"))
+        expect_equal(names(utrs), names(utrs_2))
+        utrs <- unlist(utrs)
+        utrs_2 <- unlist(utrs_2)
+        expect_true(all(!is.na(utrs_2$protein_id)))
+        expect_equal(start(utrs), start(utrs_2))
+    }
+})
+
+test_that("genes works with protein filters", {
+    ## o ProteinIdFilter
+    pif <- ProteinIdFilter("ENSP00000376721")
+    if (hasProteinData(edb)) {
+        gns <- genes(edb, filter = pif, return.type = "data.frame")
+        expect_equal(gns$gene_name, "ZBTB16")
+    }
+    ## o UniprotFilter
+    uif <- UniprotFilter("Q71UL7_HUMAN")
+    if (hasProteinData(edb)) {
+        gns <- genes(edb, filter = uif, return.type = "data.frame",
+                     columns = c("protein_id", "gene_name", "tx_id"))
+        expect_true("ENSP00000376721" %in% gns$protein_id)
+        expect_true(nrow(gns) == 2)
+    }
+    ## o ProtDomIdFilter
+    pdif <- ProtDomIdFilter("PF00096")
+    if (hasProteinData(edb)) {
+        gns <- genes(edb, filter = list(pdif,
+                                        GenenameFilter("ZBTB%", "startsWith")),
+                     return.type = "data.frame",
+                     columns = c("gene_name", "gene_biotype"))
+        expect_true(all(gns$gene_biotype == "protein_coding"))
+    }
+})
+
+test_that("proteins works", {
+    if (hasProteinData(edb)) {
+        ## Check return type.
+        prts_DF <- proteins(edb, filter = GenenameFilter("ZBTB16"))
+        expect_true(is(prts_DF, "DataFrame"))
+        prts_df <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+                            return.type = "data.frame")
+        expect_true(is(prts_df, "data.frame"))
+        prts_aa <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+                            return.type = "AAStringSet")
+        expect_true(is(prts_aa, "AAStringSet"))
+        ## Check content.
+        library(RSQLite)
+        res_q <- dbGetQuery(
+            dbconn(edb),
+            paste0("select tx.tx_id, protein_id, gene_name from ",
+                   "protein left outer join tx on (protein.tx_id=",
+                   "tx.tx_id) join gene on (gene.gene_id=",
+                   "tx.gene_id) where gene_name = 'ZBTB16'"))
+        expect_equal(res_q$tx_id, prts_df$tx_id)
+        expect_equal(res_q$protein_id, prts_df$protein_id)
+        expect_equal(prts_df$protein_id, names(prts_aa))
+        ## Add protein domain information to the proteins.
+        prts_df <- proteins(edb, filter = ProteinIdFilter(c("ENSP00000338157",
+                                                            "ENSP00000443013")),
+                            columns = c("protein_id", "protein_domain_id",
+                                        "uniprot_id"),
+                            return.type = "data.frame")
+        ## Check if we have all data that we expect:
+        uniprots <- dbGetQuery(dbconn(edb),
+                               paste0("select uniprot_id from uniprot where",
+                                      " protein_id in ('ENSP00000338157',",
+                                      "'ENSP00000443013')"))$uniprot_id
+        expect_true(all(uniprots %in% prts_df$uniprot_id))
+        protdoms <- dbGetQuery(dbconn(edb),
+                               paste0("select protein_domain_id from",
+                                      " protein_domain where protein_id",
+                                      " in ('ENSP00000338157',",
+                                      "'ENSP00000443013')"))$protein_domain_id
+        expect_true(all(protdoms %in% prts_df$protein_domain_id))
+    }
+})
+
+test_that("proteins works with uniprot mapping", {
+    ## ZBTB16 and the mapping of 1 protein to two Uniprot IDs, one with DIRECT
+    ## mapping type.
+    if (hasProteinData(edb)) {
+        prts <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+                         columns = c("uniprot_id", "uniprot_db",
+                                     "uniprot_mapping_type"),
+                         return.type = "DataFrame")
+        ## NOTE: this is true for Ensembl 86, but might not be the case for 75!
+        ## Here we have the n:m mapping:
+        ## Q05516 is assigned to ENSP00000338157 and ENSP00000376721,
+        ## Each of the two proteins is however also annotated to a second
+        ## Uniprot ID: A0A024R3C6
+        ## If we use the UniprotMappingTypeFilter with only DIRECT mapping
+        ## we expect to reduce it to the 1:n mapping between Uniprot and Ensembl
+        prts <- proteins(edb, filter = list(GenenameFilter("ZBTB16"),
+                                            UniprotMappingTypeFilter("DIRECT")),
+                         columns = c("uniprot_id", "uniprot_db",
+                                     "uniprot_mapping_type"),
+                         return.type = "DataFrame")
+        expect_true(all(prts$uniprot_mapping_type == "DIRECT"))
+        ## Check the UniprotDbFilter
+        prts <- proteins(edb, filter = list(GenenameFilter("ZBTB16"),
+                                            UniprotDbFilter("SPTREMBL")),
+                         columns = c("uniprot_id", "uniprot_db", "protein_id"),
+                         return.type = "DataFrame")
+        expect_true(all(prts$uniprot_db == "SPTREMBL"))
+    }
+})
+
+test_that("isProteinFilter works", {
+    ## TRUE
+    expect_true(ensembldb:::isProteinFilter(ProteinIdFilter("a")))
+    expect_true(ensembldb:::isProteinFilter(UniprotFilter("a")))
+    expect_true(ensembldb:::isProteinFilter(ProtDomIdFilter("a")))
+    expect_true(ensembldb:::isProteinFilter(UniprotDbFilter("a")))
+    expect_true(ensembldb:::isProteinFilter(UniprotMappingTypeFilter("a")))
+    ## FALSE
+    expect_true(!ensembldb:::isProteinFilter(GeneIdFilter("a")))
+    expect_true(!ensembldb:::isProteinFilter(SymbolFilter("a")))
+    expect_true(!ensembldb:::isProteinFilter(3))
+    expect_true(!ensembldb:::isProteinFilter("dfdf"))
+})
+
diff --git a/tests/testthat/test_SymbolFilter.R b/tests/testthat/test_SymbolFilter.R
new file mode 100644
index 0000000..b83a797
--- /dev/null
+++ b/tests/testthat/test_SymbolFilter.R
@@ -0,0 +1,99 @@
+
+test_that("SymbolFilter works for gene", {
+    sf <- SymbolFilter("SKA2")
+    gnf <- GenenameFilter("SKA2")
+    returnFilterColumns(edb) <- FALSE
+    gns_sf <- genes(edb, filter = sf)
+    gns_gnf <- genes(edb, filter = gnf)
+    expect_equal(gns_sf, gns_gnf)
+    returnFilterColumns(edb) <- TRUE
+    gns_sf <- genes(edb, filter=sf)
+    expect_equal(gns_sf$gene_name, gns_sf$symbol)
+    ## Check that combining both filters also works.
+    gns <- genes(edb, filter=list(sf, gnf))
+    ## All fine.
+})
+
+test_that("SymbolFilter works for tx", {
+    sf <- SymbolFilter("SKA2")
+    gnf <- GenenameFilter("SKA2")
+    returnFilterColumns(edb) <- FALSE
+    tx_sf <- transcripts(edb, filter=sf)
+    tx_gnf <- transcripts(edb, filter=gnf)
+    expect_equal(tx_sf, tx_gnf)
+    returnFilterColumns(edb) <- TRUE
+    tx_sf <- transcripts(edb, filter=sf, columns=c("gene_name"))
+    expect_equal(tx_sf$gene_name, tx_sf$symbol)
+})
+
+test_that("SymbolFilter works for exons", {
+    sf <- SymbolFilter("SKA2")
+    gnf <- GenenameFilter("SKA2")
+    returnFilterColumns(edb) <- FALSE
+    ex_sf <- exons(edb, filter=sf)
+    ex_gnf <- exons(edb, filter=gnf)
+    expect_equal(ex_sf, ex_gnf)
+    returnFilterColumns(edb) <- TRUE
+    ex_sf <- exons(edb, filter=sf, columns=c("gene_name"))
+    expect_equal(ex_sf$gene_name, ex_sf$symbol)
+})
+
+test_that("SymbolFilter works", {
+    sf <- SymbolFilter("SKA2")
+    res <- genes(edb, filter = sf, return.type = "data.frame")
+    expect_equal(res$gene_id, "ENSG00000182628")
+    ## The result should now also contain a "symbol" column.
+    expect_equal(res$symbol, res$gene_name)
+    ## Explicitly ask for the symbol column.
+    res <- genes(edb, filter = sf, return.type = "data.frame",
+                 columns = c("symbol", "gene_id"))
+    expect_equal(colnames(res), c("symbol", "gene_id"))
+    ## Request more columns, also in a different order.
+    res <- genes(edb, filter = sf, return.type = "data.frame",
+                 columns = c("gene_name", "symbol", "gene_id"))
+    expect_equal(colnames(res), c("gene_name", "symbol", "gene_id"))
+    res <- genes(edb, filter = sf, return.type = "data.frame",
+                 columns = c("gene_id", "gene_name", "symbol"))
+    expect_equal(colnames(res), c("gene_id", "gene_name", "symbol"))
+    ## And with GRanges as return type.
+    res <- genes(edb, filter = sf, return.type = "GRanges",
+                 columns = c("gene_id", "gene_name", "symbol"))
+    expect_equal(colnames(mcols(res)), c("gene_id", "gene_name", "symbol"))
+
+    ## Combine tx_name and symbol
+    res <- genes(edb, filter = sf, columns = c("tx_name", "symbol"),
+                 return.type = "data.frame")
+    expect_equal(colnames(res), c("tx_name", "symbol", "gene_id"))
+    expect_true(all(res$symbol == "SKA2"))
+
+    ## Test for transcripts
+    res <- transcripts(edb, filter=sf, return.type="data.frame")
+    expect_true(all(res$symbol == "SKA2"))
+    res <- transcripts(edb, filter = sf, return.type = "data.frame",
+                       columns = c("symbol", "tx_id", "gene_name"))
+    expect_true(all(res$symbol == "SKA2"))
+    expect_equal(res$symbol, res$gene_name)
+    expect_equal(colnames(res), c("symbol", "tx_id", "gene_name"))
+
+    ## Test for exons
+    res <- exons(edb, filter=sf, return.type="data.frame")
+    expect_true(all(res$symbol == "SKA2"))
+    res <- exons(edb, filter = c(sf, TxBiotypeFilter("nonsense_mediated_decay")),
+                 return.type = "data.frame",
+                 columns = c("symbol", "tx_id", "gene_name"))
+    expect_true(all(res$symbol == "SKA2"))
+    expect_equal(res$symbol, res$gene_name)
+    expect_equal(colnames(res), c("symbol", "tx_id", "gene_name", "exon_id",
+                                  "tx_biotype"))
+
+    ## Test for exonsBy
+    res <- exonsBy(edb, filter=sf)
+    expect_true(all(unlist(res)$symbol == "SKA2"))
+    res <- exonsBy(edb, filter = c(sf, TxBiotypeFilter("nonsense_mediated_decay")),
+                   columns = c("symbol", "tx_id", "gene_name"))
+    expect_true(all(unlist(res)$symbol == "SKA2"))
+
+    expect_equal(unlist(res)$symbol, unlist(res)$gene_name)
+})
+
+
diff --git a/tests/testthat/test_dbhelpers.R b/tests/testthat/test_dbhelpers.R
new file mode 100644
index 0000000..fc60731
--- /dev/null
+++ b/tests/testthat/test_dbhelpers.R
@@ -0,0 +1,405 @@
+
+test_that("prefixColumns works", {
+    res <- ensembldb:::prefixColumns(edb, columns = "a")
+    expect_true(is.null(res))
+    expect_error(ensembldb:::prefixColumns(edb, columns = "a", clean = FALSE))
+    res <- ensembldb:::prefixColumns(edb, columns = c("gene_id", "a"),
+                                     clean = FALSE)
+    expect_equal(names(res), "gene")
+    expect_equal(res$gene, "gene.gene_id")
+    ## The "new" prefixColumns function ALWAYS returns the first table in which
+    ## a column was found; tables are ordered as in listTables
+    res <- ensembldb:::prefixColumns(edb, columns = c("tx_id", "gene_id",
+                                                      "tx_biotype"))
+    want <- list(gene = "gene.gene_id",
+                 tx = c("tx.tx_id", "tx.tx_biotype"))
+    expect_equal(res, want)
+    ##
+    res <- ensembldb:::prefixColumns(edb, columns = c("exon_idx", "seq_name",
+                                                      "gene_id"))
+    want <- list(gene = c("gene.gene_id", "gene.seq_name"),
+                 tx2exon = "tx2exon.exon_idx")
+    expect_equal(res, want)
+    ##
+    res <- ensembldb:::prefixColumns(edb, columns = c("exon_idx", "seq_name",
+                                                      "gene_id", "exon_id"))
+    want <- list(gene = c("gene.gene_id", "gene.seq_name"),
+                 tx2exon = c("tx2exon.exon_id", "tx2exon.exon_idx"))
+    expect_equal(res, want)
+
+    if (hasProteinData(edb)) {
+        res <- ensembldb:::prefixColumns(edb,
+                                         columns = c("tx_id", "protein_id"))
+        want <- list(tx = "tx.tx_id", protein = "protein.protein_id")
+        expect_equal(res, want)
+        ##
+        res <- ensembldb:::prefixColumns(edb,
+                                         columns = c("uniprot_id",
+                                                     "protein_domain_id"))
+        want <- list(uniprot = "uniprot.uniprot_id",
+                     protein_domain = "protein_domain.protein_domain_id")
+        expect_equal(res, want)
+        ##
+        res <- ensembldb:::prefixColumns(edb,
+                                         columns = c("uniprot_id",
+                                                     "protein_domain_id",
+                                                     "protein_id", "tx_id"))
+        want = list(tx = "tx.tx_id", protein = "protein.protein_id",
+                    uniprot = "uniprot.uniprot_id",
+                    protein_domain = "protein_domain.protein_domain_id")
+        expect_equal(res, want)
+    }
+})
+
+############################################################
+## Test the new join engine.
+## o use the startWith argument.
+## o change the join argument.
+test_that("joinTwoTables works", {
+    ## Check errors:
+    expect_error(ensembldb:::joinTwoTables(a = "gene", b = "dont exist"))
+    expect_error(ensembldb:::joinTwoTables(a = c("a", "b"), b = "gene"))
+    ## Working example:
+    res <- ensembldb:::joinTwoTables(a = c("a", "gene"), b = "tx")
+    expect_equal(sort(res[1:2]), c("gene", "tx"))
+    expect_equal(res[3], "on (gene.gene_id=tx.gene_id)")
+    ## Error
+    expect_error(ensembldb:::joinTwoTables(a = "tx", b = "exon"))
+    ## Working example:
+    res <- ensembldb:::joinTwoTables(a = c("tx"), b = c("exon", "tx2exon"))
+    expect_equal(sort(res[1:2]), c("tx", "tx2exon"))
+    expect_equal(res[3], "on (tx.tx_id=tx2exon.tx_id)")
+    res <- ensembldb:::joinTwoTables(a = c("chromosome", "gene", "tx"),
+                                     b = c("exon", "protein", "tx2exon"))
+    expect_equal(sort(res[1:2]), c("tx", "tx2exon"))
+    expect_equal(res[3], "on (tx.tx_id=tx2exon.tx_id)")
+})
+
+test_that("joinQueryOnTables2 and joinQueryOnColumns2 work", {
+    ## exceptions
+    expect_error(ensembldb:::joinQueryOnTables2(edb, tab = c("a", "exon")))
+    res <- ensembldb:::joinQueryOnTables2(edb, tab = c("gene", "exon"))
+    want <- paste0("gene join tx on (gene.gene_id=tx.gene_id) join",
+                   " tx2exon on (tx.tx_id=tx2exon.tx_id) join",
+                   " exon on (tx2exon.exon_id=exon.exon_id)")
+    ## The "default" order is gene->tx->tx2exon->exon
+    expect_equal(res, want)
+    res <- ensembldb:::joinQueryOnColumns2(edb, columns = c("exon_seq_start",
+                                                            "gene_name"))
+    expect_equal(res, want)
+    ## Same but in the order: exon->tx2exon->tx->gene
+    res <- ensembldb:::joinQueryOnTables2(edb, tab = c("gene", "exon"),
+                                          startWith = "exon")
+    want <- paste0("exon join tx2exon on (tx2exon.exon_id=exon.exon_id)",
+                   " join tx on (tx.tx_id=tx2exon.tx_id) join",
+                   " gene on (gene.gene_id=tx.gene_id)")
+    expect_equal(res, want)
+    res <- ensembldb:::joinQueryOnColumns2(edb, columns = c("exon_seq_start",
+                                                            "gene_name"),
+                                           startWith = "exon")
+    expect_equal(res, want)
+    ## That would be less expensive, but with "startWith" we force it to start
+    ## from table exon, instead of just using tx2exon and tx.
+    res <- ensembldb:::joinQueryOnColumns2(edb, columns = c("exon_id",
+                                                            "gene_id"),
+                                           startWith = "exon")
+    expect_equal(res, want)
+    ## Check proteins too.
+    if (hasProteinData(edb)) {
+        res <- ensembldb:::joinQueryOnTables2(edb, tab = c("protein", "gene",
+                                                           "exon"))
+        ## That should be: gene->tx->tx2exon->exon->protein
+        want <- paste0("gene join tx on (gene.gene_id=tx.gene_id) join",
+                       " tx2exon on (tx.tx_id=tx2exon.tx_id) join",
+                       " exon on (tx2exon.exon_id=exon.exon_id) left outer join",
+                       " protein on (tx.tx_id=protein.tx_id)")
+        expect_equal(res, want)
+        res <- ensembldb:::joinQueryOnColumns2(edb,
+                                               columns = c("protein_id",
+                                                           "gene_name",
+                                                           "exon_seq_start"))
+        expect_equal(res, want)
+        res <- ensembldb:::joinQueryOnTables2(edb, tab = c("protein", "gene"),
+                                              startWith = "protein")
+        want <- paste0("protein left outer join tx on (tx.tx_id=protein.tx_id)",
+                       " join gene on (gene.gene_id=tx.gene_id)")
+        expect_equal(res, want)
+        res <- ensembldb:::joinQueryOnColumns2(edb, columns = c("protein_id",
+                                                                "gene_name"),
+                                               startWith = "protein")
+        expect_equal(res, want)
+    }
+})
+
+test_that("addRequiredTables works", {
+    have <- c("exon", "gene")
+    need <- c("exon", "gene", "tx2exon", "tx")
+    expect_equal(sort(need), sort(ensembldb:::addRequiredTables(edb, have)))
+
+    have <- c("exon", "chromosome")
+    need <- c("exon", "tx2exon", "tx", "gene", "chromosome")
+    expect_equal(sort(need), sort(ensembldb:::addRequiredTables(edb, have)))
+
+    have <- c("chromosome", "tx")
+    need <- c("chromosome", "tx", "gene")
+    expect_equal(sort(need), sort(ensembldb:::addRequiredTables(edb, have)))
+
+    if (hasProteinData(edb)) {
+        have <- c("uniprot", "exon")
+        need <- c("uniprot", "exon", "protein", "tx", "tx2exon")
+        expect_equal(sort(need),
+                     sort(ensembldb:::addRequiredTables(edb, have)))
+
+        have <- c("uniprot", "chromosome")
+        need <- c("uniprot", "chromosome", "protein", "tx", "gene")
+        expect_equal(sort(need),
+                     sort(ensembldb:::addRequiredTables(edb, have)))
+
+        have <- c("protein_domain", "gene")
+        need <- c("protein_domain", "gene", "protein", "tx")
+        expect_equal(sort(need),
+                     sort(ensembldb:::addRequiredTables(edb, have)))
+
+        have <- c("protein", "exon")
+        need <- c("protein", "exon", "tx", "tx2exon")
+        expect_equal(sort(need),
+                     sort(ensembldb:::addRequiredTables(edb, have)))
+    }
+})
+
+test_that(".buildQuery with filter works", {
+    columns <- c("gene_id", "gene_name", "exon_id")
+    gnf <- GenenameFilter("BCL2")
+    Q <- ensembldb:::.buildQuery(edb, columns = columns,
+                                 filter = AnnotationFilterList(gnf))
+    want <- paste0("select distinct gene.gene_id,gene.gene_name,",
+                   "tx2exon.exon_id from gene join tx on (gene.gene_id",
+                   "=tx.gene_id) join tx2exon on (tx.tx_id=tx2exon.tx_id)",
+                   " where (gene.gene_name = 'BCL2')")
+    expect_equal(Q, want)
+    library(RSQLite)
+    res <- dbGetQuery(dbconn(edb), Q)
+    expect_equal(unique(res$gene_name), "BCL2")
+    ## Two GenenameFilters combined with OR
+    gnf2 <- GenenameFilter("BCL2L11")
+    columns <- c("gene_id", "gene_name", "exon_id")
+    Q <- ensembldb:::.buildQuery(edb, columns = columns,
+                                 filter = AnnotationFilterList(gnf, gnf2,
+                                                               logOp = "|"))
+    want <- paste0("select distinct gene.gene_id,gene.gene_name,",
+                   "tx2exon.exon_id from gene join tx on (gene.gene_id",
+                   "=tx.gene_id) join tx2exon on (tx.tx_id=tx2exon.tx_id)",
+                   " where (gene.gene_name = 'BCL2' or gene.gene_name = ",
+                   "'BCL2L11')")
+    expect_equal(Q, want)
+    res <- dbGetQuery(dbconn(edb), Q)
+    expect_true(all(res$gene_name %in% c("BCL2", "BCL2L11")))
+    ## Combine with a SeqNameFilter.
+    snf <- SeqNameFilter(2)
+    flt <- AnnotationFilterList(gnf, gnf2, snf, logOp = c("|", "&"))
+    Q <- ensembldb:::.buildQuery(edb, columns = columns, filter = flt)
+    want <- paste0("select distinct gene.gene_id,gene.gene_name,",
+                   "tx2exon.exon_id,gene.seq_name from gene join tx on (",
+                   "gene.gene_id=tx.gene_id) join tx2exon on (tx.tx_id=",
+                   "tx2exon.tx_id) where (gene.gene_name = 'BCL2' or ",
+                   "gene.gene_name = 'BCL2L11' and gene.seq_name = '2')")
+    expect_equal(Q, want)
+    res <- dbGetQuery(dbconn(edb), Q)
+    expect_true(all(res$gene_name %in% c("BCL2", "BCL2L11")))
+    ## now with a nested AnnotationFilterList:
+    flt <- AnnotationFilterList(AnnotationFilterList(gnf, gnf2, logOp = "|"),
+                                snf, logOp = "&")
+    Q <- ensembldb:::.buildQuery(edb, columns = columns, filter = flt)
+    want <- paste0("select distinct gene.gene_id,gene.gene_name,",
+                   "tx2exon.exon_id,gene.seq_name from gene join tx on (",
+                   "gene.gene_id=tx.gene_id) join tx2exon on (tx.tx_id=",
+                   "tx2exon.tx_id) where ((gene.gene_name = 'BCL2' or ",
+                   "gene.gene_name = 'BCL2L11') and gene.seq_name = '2')")
+    expect_equal(Q, want)
+    res <- dbGetQuery(dbconn(edb), Q)
+    expect_true(all(res$gene_name %in% c("BCL2L11")))
+    ## If we only want to get BCL2L11 back:
+    flt <- AnnotationFilterList(GenenameFilter(c("BCL2", "BCL2L11")), snf,
+                                logOp = "&")
+    Q <- ensembldb:::.buildQuery(edb, columns = columns, filter = flt)
+    want <- paste0("select distinct gene.gene_id,gene.gene_name,",
+                   "tx2exon.exon_id,gene.seq_name from gene join tx on (",
+                   "gene.gene_id=tx.gene_id) join tx2exon on (tx.tx_id=",
+                   "tx2exon.tx_id) where (gene.gene_name in ('BCL2','BCL2L11'",
+                   ") and gene.seq_name = '2')")
+    expect_equal(Q, want)
+    res <- dbGetQuery(dbconn(edb), Q)
+    expect_true(all(res$gene_name == "BCL2L11"))
+    
+    ## Check with a GRangesFilter.
+    grf <- GRangesFilter(GRanges(seqnames = 18, IRanges(60790600, 60790700)))
+    flt <- AnnotationFilterList(grf)
+    Q <- ensembldb:::.buildQuery(edb, columns = columns, filter = flt)
+    want <- paste0("select distinct gene.gene_id,gene.gene_name,tx2exon.",
+                   "exon_id,gene.gene_seq_start,gene.gene_seq_end,gene.seq_name",
+                   ",gene.seq_strand from gene join tx on (gene.gene_id",
+                   "=tx.gene_id) join tx2exon on (tx.tx_id=tx2exon.tx_id) ",
+                   "where ((gene.gene_seq_start<=60790700 and gene.gene_seq",
+                   "_end>=60790600 and gene.seq_name='18'))")
+    expect_equal(Q, want)
+    res <- dbGetQuery(dbconn(edb), Q)
+    expect_true(all(res$gene_name == "BCL2"))
+})
+
+test_that("buildQuery with startWith works", {
+    columns <- c("gene_id", "gene_name", "exon_id")
+    Q <- ensembldb:::.buildQuery(edb, columns = columns)
+    want <- paste0("select distinct gene.gene_id,gene.gene_name,",
+                   "tx2exon.exon_id from gene join tx on (gene.gene_id",
+                   "=tx.gene_id) join tx2exon on (tx.tx_id=tx2exon.tx_id)")
+    expect_equal(Q, want)
+    ## Different if we use startWith = exon
+    Q <- ensembldb:::.buildQuery(edb, columns = columns, startWith = "exon")
+    want <- paste0("select distinct gene.gene_id,gene.gene_name,",
+                   "tx2exon.exon_id from exon join tx2exon on (tx2exon.exon_id",
+                   "=exon.exon_id) join tx on (tx.tx_id=tx2exon.tx_id)",
+                   " join gene on (gene.gene_id=tx.gene_id)")
+    expect_equal(Q, want)
+    Q <- ensembldb:::.buildQuery(edb, columns = c("gene_id", "tx_biotype"))
+    want <- paste0("select distinct gene.gene_id,tx.tx_biotype from gene ",
+                   "join tx on (gene.gene_id=tx.gene_id)")
+    expect_equal(Q, want)
+    Q <- ensembldb:::.buildQuery(edb, columns = c("gene_id", "tx_biotype"),
+                                 startWith = "exon")
+    want <- paste0("select distinct gene.gene_id,tx.tx_biotype from exon ",
+                   "join tx2exon on (tx2exon.exon_id=exon.exon_id) join ",
+                   "tx on (tx.tx_id=tx2exon.tx_id) join ",
+                   "gene on (gene.gene_id=tx.gene_id)")
+    expect_equal(Q, want)
+    if (hasProteinData(edb)) {
+        ## Protein columns.
+        Q <- ensembldb:::.buildQuery(edb,
+                                     columns = c("protein_id", "uniprot_id",
+                                                 "protein_domain_id"))
+        want <- paste0("select distinct protein.protein_id,uniprot.uniprot_id,",
+                       "protein_domain.protein_domain_id from protein left ",
+                       "outer join protein_domain on (protein.protein_id=",
+                       "protein_domain.protein_id) left outer join ",
+                       "uniprot on (protein.protein_id=uniprot.protein_id)")
+        expect_equal(Q, want)
+        ## start at protein
+        Q <- ensembldb:::.buildQuery(edb,
+                                     columns = c("protein_id", "uniprot_id",
+                                                 "protein_domain_id"),
+                                     startWith = "protein")
+        want <- paste0("select distinct protein.protein_id,uniprot.uniprot_id,",
+                       "protein_domain.protein_domain_id from protein left ",
+                       "outer join protein_domain on (protein.protein_id=",
+                       "protein_domain.protein_id) left outer join ",
+                       "uniprot on (protein.protein_id=uniprot.protein_id)")
+        expect_equal(Q, want)
+        ## start at uniprot.
+        Q <- ensembldb:::.buildQuery(edb,
+                                     columns = c("protein_id", "uniprot_id",
+                                                 "protein_domain_id"),
+                                     startWith = "uniprot")
+        want <- paste0("select distinct protein.protein_id,uniprot.uniprot_id,",
+                       "protein_domain.protein_domain_id from uniprot left ",
+                       "outer join protein on (protein.protein_id=",
+                       "uniprot.protein_id) left outer join",
+                       " protein_domain on (protein.protein_id=",
+                       "protein_domain.protein_id)")
+        expect_equal(Q, want)
+        ## join with tx.
+        Q <- ensembldb:::.buildQuery(edb, columns = c("tx_id", "protein_id",
+                                                      "uniprot_id", "gene_id"))
+        want <- paste0("select distinct tx.tx_id,protein.protein_id,",
+                       "uniprot.uniprot_id,gene.gene_id from gene join ",
+                       "tx on (gene.gene_id=tx.gene_id) left outer join protein",
+                       " on (tx.tx_id=protein.tx_id) left outer join uniprot on",
+                       " (protein.protein_id=uniprot.protein_id)")
+        expect_equal(Q, want)
+        ## if we started from protein:
+        Q <- ensembldb:::.buildQuery(edb, columns = c("tx_id", "protein_id",
+                                                      "uniprot_id", "gene_id"),
+                                     startWith = "protein")
+        want <- paste0("select distinct tx.tx_id,protein.protein_id,",
+                       "uniprot.uniprot_id,gene.gene_id from protein left outer",
+                       " join tx on (tx.tx_id=protein.tx_id) join gene on",
+                       " (gene.gene_id=tx.gene_id) left outer join uniprot on",
+                       " (protein.protein_id=uniprot.protein_id)")
+        expect_equal(Q, want)
+    }
+})
+
+
+
+## This test is important: it checks that no database entries are lost, e.g.
+## that no non-coding transcripts are dropped when genes are joined with
+## proteins, and that no values are dropped when proteins are joined with
+## uniprot or protein_domain.
+test_that("query is valid", {
+    ## Check RNA/DNA tables; shouldn't be a problem there, though.
+    Ygns <- genes(edb, filter = SeqNameFilter("Y"), return.type = "data.frame")
+    Ytxs <- transcripts(edb, filter = SeqNameFilter("Y"),
+                        return.type = "data.frame",
+                        columns = c("gene_id", "tx_id", "tx_biotype"))
+    Yexns <- exons(edb, filter = SeqNameFilter("Y"), return.type = "data.frame",
+                   columns = c("exon_id", "gene_id"))
+    expect_true(all(unique(Ygns$gene_id) %in% unique(Yexns$gene_id)))
+    expect_true(all(unique(Ygns$gene_id) %in% unique(Ytxs$gene_id)))
+    ## Check gene with protein
+    if (hasProteinData(edb)) {
+        library(RSQLite)
+        ## Simulate what a simple join would do:
+        gns_f <- dbGetQuery(dbconn(edb),
+                            paste0("select gene.gene_id, tx.tx_id, tx_biotype, ",
+                                   "protein_id from gene join tx on ",
+                                   "(gene.gene_id=tx.gene_id) join protein on ",
+                                   "(tx.tx_id=protein.tx_id) ",
+                                   "where seq_name = 'Y'"))
+        ## We expect that gns_f is smaller, but that all protein_coding tx are
+        ## there.
+        expect_true(length(unique(gns_f$gene_id)) < length(unique(Ygns$gene_id)))
+        expect_true(all(unique(Ytxs[Ytxs$tx_biotype == "protein_coding", "tx_id"])
+                        %in% unique(gns_f$tx_id)))
+        ## Now test the "real" query:
+        Ygns_2 <- genes(edb, filter = SeqNameFilter("Y"),
+                        return.type = "data.frame",
+                        columns = c("gene_id", "tx_id", "tx_biotype",
+                                    "protein_id"))
+        ## We expect that ALL genes are present and ALL tx:
+        expect_true(all(unique(Ygns$gene_id) %in% unique(Ygns_2$gene_id)))
+        expect_true(all(unique(Ygns$tx_id) %in% unique(Ygns_2$tx_id)))
+
+        ## Get all the tx with protein_id
+        txs <- transcripts(edb, columns = c("tx_id", "protein_id"),
+                           return.type = "data.frame")
+        txids <- dbGetQuery(dbconn(edb), "select tx_id from tx;")[, "tx_id"]
+        protids <- dbGetQuery(dbconn(edb),
+                              "select protein_id from protein;")[, "protein_id"]
+        expect_true(all(txids %in% txs$tx_id))
+        expect_true(all(protids %in% txs$protein_id))
+
+        ## Check protein with uniprot
+        uniprotids <- dbGetQuery(dbconn(edb),
+                                 "select uniprot_id from uniprot")$uniprot_id
+
+        ## Check protein with protein domain
+        ## Check protein_domain with uniprot
+    }
+})
+
+test_that(".getWhat works", {
+    library(RSQLite)
+    Q_2 <- paste0("select * from gene join tx on (gene.gene_id=tx.gene_id)",
+                  " join tx2exon on (tx.tx_id=tx2exon.tx_id) where",
+                  " gene.gene_id = 'ENSG00000000005'")
+    res_2 <- dbGetQuery(dbconn(edb), Q_2)
+    gf <- GeneIdFilter("ENSG00000000005")
+    res_3 <- ensembldb:::.getWhat(edb, columns = c("gene_name", "exon_idx"),
+                                  filter = AnnotationFilterList(gf))
+    expect_identical(res_3, unique(res_2[, colnames(res_3)]))
+})
+
+test_that(".logOp2SQL works", {
+    expect_equal(ensembldb:::.logOp2SQL("|"), "or")
+    expect_equal(ensembldb:::.logOp2SQL("&"), "and")
+    expect_equal(ensembldb:::.logOp2SQL("dfdf"), NULL)
+})
+
diff --git a/tests/testthat/test_extractTranscriptSeqs.R b/tests/testthat/test_extractTranscriptSeqs.R
new file mode 100644
index 0000000..0b1d0d4
--- /dev/null
+++ b/tests/testthat/test_extractTranscriptSeqs.R
@@ -0,0 +1,65 @@
+
+test_that("extractTranscriptSeqs works with BSGenome", {
+    library(BSgenome.Hsapiens.UCSC.hg19)
+    bsg <- BSgenome.Hsapiens.UCSC.hg19
+
+    ## Change the seqlevels style to UCSC
+    seqlevelsStyle(edb) <- "UCSC"
+    ZBTB <- extractTranscriptSeqs(bsg, edb, filter=GenenameFilter("ZBTB16"))
+    ## Load the sequences for one ZBTB16 transcript from a FASTA file.
+    faf <- system.file("txt/ENST00000335953.fa.gz", package="ensembldb")
+    Seqs <- readDNAStringSet(faf)
+    tx <- "ENST00000335953"
+    ## cDNA
+    expect_equal(unname(as.character(ZBTB[tx])),
+                 unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+    ## CDS
+    cBy <- cdsBy(edb, "tx", filter=TxIdFilter(tx))
+    CDS <- extractTranscriptSeqs(bsg, cBy)
+    expect_equal(unname(as.character(CDS)),
+                 unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
+    ## 5' UTR
+    fBy <- fiveUTRsByTranscript(edb, filter=TxIdFilter(tx))
+    UTR <- extractTranscriptSeqs(bsg, fBy)
+    expect_equal(unname(as.character(UTR)),
+                 unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
+    ## 3' UTR
+    tBy <- threeUTRsByTranscript(edb, filter=TxIdFilter(tx))
+    UTR <- extractTranscriptSeqs(bsg, tBy)
+    expect_equal(unname(as.character(UTR)),
+                 unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
+
+    ## Another gene on the reverse strand:
+    faf <- system.file("txt/ENST00000200135.fa.gz", package="ensembldb")
+    Seqs <- readDNAStringSet(faf)
+    tx <- "ENST00000200135"
+    ## cDNA
+    cDNA <- extractTranscriptSeqs(bsg, edb, filter=TxIdFilter(tx))
+    expect_equal(unname(as.character(cDNA)),
+                 unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+    ## Do the same with exonsBy; forcing the + strand must change the result
+    exns <- exonsBy(edb, "tx", filter=TxIdFilter(tx))
+    cDNA <- extractTranscriptSeqs(bsg, exns)
+    expect_equal(unname(as.character(cDNA)),
+                 unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+    strand(exns) <- "+"
+    cDNA <- extractTranscriptSeqs(bsg, exns)
+    expect_true(unname(as.character(cDNA)) !=
+                unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+    ## CDS
+    cBy <- cdsBy(edb, "tx", filter=TxIdFilter(tx))
+    CDS <- extractTranscriptSeqs(bsg, cBy)
+    expect_equal(unname(as.character(CDS)),
+                 unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
+    ## 5' UTR
+    fBy <- fiveUTRsByTranscript(edb, filter=TxIdFilter(tx))
+    UTR <- extractTranscriptSeqs(bsg, fBy)
+    expect_equal(unname(as.character(UTR)),
+                 unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
+    ## 3' UTR
+    tBy <- threeUTRsByTranscript(edb, filter=TxIdFilter(tx))
+    UTR <- extractTranscriptSeqs(bsg, tBy)
+    expect_equal(unname(as.character(UTR)),
+                 unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
+})
+
diff --git a/tests/testthat/test_functions-Filter.R b/tests/testthat/test_functions-Filter.R
new file mode 100644
index 0000000..0e446f3
--- /dev/null
+++ b/tests/testthat/test_functions-Filter.R
@@ -0,0 +1,226 @@
+
+test_that(".fieldInEnsDb works", {
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("symbol")), "gene_name")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("gene_biotype")), "gene_biotype")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("entrez")), "entrezid")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("gene_id")), "gene_id")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("genename")), "gene_name")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("seq_name")), "seq_name")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("tx_id")), "tx_id")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("tx_biotype")), "tx_biotype")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("tx_name")), "tx_id")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("exon_id")), "exon_id")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("exon_rank")), "exon_idx")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("protein_id")), "protein_id")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("uniprot")), "uniprot_id")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("uniprot_db")), "uniprot_db")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("uniprot_mapping_type")),
+                 "uniprot_mapping_type")
+    expect_equal(unname(ensembldb:::.fieldInEnsDb("prot_dom_id")),
+                 "protein_domain_id")
+    expect_error(ensembldb:::.fieldInEnsDb("aaa"))
+})
+
+test_that(".conditionForEnsDb works", {
+    smb <- SymbolFilter("a")
+    expect_equal(condition(smb), "==")
+    expect_equal(ensembldb:::.conditionForEnsDb(smb), "=")
+    smb <- SymbolFilter(c("a", "b", "c"))
+    expect_equal(ensembldb:::.conditionForEnsDb(smb), "in")
+    smb <- SymbolFilter(c("a", "b", "c"), condition = "!=")
+    expect_equal(ensembldb:::.conditionForEnsDb(smb), "not in")
+    smb <- SymbolFilter(c("a"), condition = "!=")
+    expect_equal(ensembldb:::.conditionForEnsDb(smb), "!=")
+    smb <- SymbolFilter(c("a"), condition = "startsWith")
+    expect_equal(ensembldb:::.conditionForEnsDb(smb), "like")
+    smb <- SymbolFilter(c("a"), condition = "endsWith")
+    expect_equal(ensembldb:::.conditionForEnsDb(smb), "like")
+    ## Tests for numeric filters
+    fl <- GeneStartFilter(4)
+    expect_equal(ensembldb:::.conditionForEnsDb(fl), "=")
+    fl <- GeneStartFilter(4, condition = ">")
+    expect_equal(ensembldb:::.conditionForEnsDb(fl), ">")
+    fl <- GeneStartFilter(4, condition = ">=")
+    expect_equal(ensembldb:::.conditionForEnsDb(fl), ">=")
+    fl <- GeneStartFilter(4, condition = "<")
+    expect_equal(ensembldb:::.conditionForEnsDb(fl), "<")
+    fl <- GeneStartFilter(4, condition = "<=")
+    expect_equal(ensembldb:::.conditionForEnsDb(fl), "<=")
+})
+
+test_that(".valueForEnsDb works", {
+    smb <- SymbolFilter("a")
+    expect_equal(ensembldb:::.valueForEnsDb(smb), "'a'")
+    smb <- SymbolFilter(c("a", "b", "b", "c"))
+    expect_equal(ensembldb:::.valueForEnsDb(smb), "('a','b','c')")
+    smb <- SymbolFilter("a", condition = "startsWith")
+    expect_equal(ensembldb:::.valueForEnsDb(smb), "'a%'")
+    smb <- SymbolFilter("a", condition = "endsWith")
+    expect_equal(ensembldb:::.valueForEnsDb(smb), "'%a'")
+    ## Tests for numeric filters
+    fl <- GeneStartFilter(4)
+    expect_equal(ensembldb:::.valueForEnsDb(fl), 4)
+})
+
+test_that(".queryForEnsDb works", {
+    smb <- SymbolFilter("a")
+    expect_equal(ensembldb:::.queryForEnsDb(smb), "gene_name = 'a'")
+    smb <- SymbolFilter(c("a", "x"), condition = "!=")
+    expect_equal(ensembldb:::.queryForEnsDb(smb), "gene_name not in ('a','x')")
+    ## Tests for numeric filters
+    fl <- GeneStartFilter(5, condition = "<=")
+    expect_equal(ensembldb:::.queryForEnsDb(fl), "gene_seq_start <= 5")
+})
+
+test_that(".queryForEnsDbWithTables works", {
+    smb <- SymbolFilter("a")
+    expect_equal(ensembldb:::.queryForEnsDbWithTables(smb), "gene_name = 'a'")
+    smb <- SymbolFilter(c("a", "x"), condition = "!=")
+    expect_equal(ensembldb:::.queryForEnsDbWithTables(smb),
+                 "gene_name not in ('a','x')")
+    ## With edb
+    smb <- SymbolFilter("a")
+    expect_equal(ensembldb:::.queryForEnsDbWithTables(smb, edb),
+                 "gene.gene_name = 'a'")
+    smb <- SymbolFilter(c("a", "x"), condition = "!=")
+    expect_equal(ensembldb:::.queryForEnsDbWithTables(smb, edb),
+                 "gene.gene_name not in ('a','x')")
+    ## With edb, tables
+    smb <- SymbolFilter("a")
+    expect_equal(ensembldb:::.queryForEnsDbWithTables(smb, edb, c("gene", "tx")),
+                 "gene.gene_name = 'a'")
+    fl <- GeneIdFilter("b")
+    expect_equal(ensembldb:::.queryForEnsDbWithTables(fl, edb),
+                 "gene.gene_id = 'b'")
+    expect_equal(ensembldb:::.queryForEnsDbWithTables(fl, edb, c("tx", "gene")),
+                 "tx.gene_id = 'b'")
+    ## Entrez
+    if (as.numeric(ensembldb:::dbSchemaVersion(edb)) > 1) {
+        fl <- EntrezFilter("g")
+        expect_equal(ensembldb:::.queryForEnsDbWithTables(fl, edb),
+                     "entrezgene.entrezid = 'g'")
+        fl <- EntrezFilter("g", condition = "endsWith")
+        expect_equal(ensembldb:::.queryForEnsDbWithTables(fl, edb),
+                     "entrezgene.entrezid like '%g'")
+    } else {
+        fl <- EntrezFilter("g")
+        expect_equal(ensembldb:::.queryForEnsDbWithTables(fl, edb),
+                     "gene.entrezid = 'g'")
+        fl <- EntrezFilter("g", condition = "endsWith")
+        expect_equal(ensembldb:::.queryForEnsDbWithTables(fl, edb),
+                     "gene.entrezid like '%g'")
+    }
+    ## Numeric filters
+    fl <- TxStartFilter(123)
+    expect_equal(ensembldb:::.queryForEnsDbWithTables(fl, edb),
+                 "tx.tx_seq_start = 123")
+    expect_error(ensembldb:::.queryForEnsDbWithTables(fl, edb, "gene"))
+})
+
+test_that(".processFilterParam works", {
+    ## Check that .processFilterParam does what we expect: whatever the input,
+    ## it must ALWAYS return an AnnotationFilterList object.
+    snf <- SeqNameFilter(c("Y", 9))
+    res <- ensembldb:::.processFilterParam(snf, db = edb)
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(res[[1]], snf)
+
+    ## - single filter
+    gif <- GeneIdFilter("BCL2", condition = "!=")
+    res <- ensembldb:::.processFilterParam(gif, db = edb)
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(res[[1]], gif)
+    
+    ## - list of filters
+    snf <- SeqNameFilter("X")
+    res <- ensembldb:::.processFilterParam(list(gif, snf), edb)
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_true(length(res) == 2)
+    expect_equal(res[[1]], gif)
+    expect_equal(res[[2]], snf)
+    expect_equal(res@logOp, "&")
+    
+    ## - AnnotationFilterList
+    afl <- AnnotationFilterList(gif, snf, logOp = "|")
+    res <- ensembldb:::.processFilterParam(afl, edb)
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(afl, res)
+    afl <- AnnotationFilterList(gif, snf, logOp = "&")
+    res <- ensembldb:::.processFilterParam(afl, edb)
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(afl, res)
+    
+    ## - filter expression
+    res <- ensembldb:::.processFilterParam(~ gene_id != "BCL2" |
+                                               seq_name == "X", edb)
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(res, AnnotationFilterList(gif, snf, logOp = "|"))
+    flt <- ~ gene_id != "BCL2" | seq_name == "X"
+    res <- ensembldb:::.processFilterParam(flt, edb)
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(res, AnnotationFilterList(gif, snf, logOp = "|"))
+    res <- ensembldb:::.processFilterParam(~ gene_id != "BCL2", edb)
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(res, AnnotationFilterList(gif))
+
+    ## - Errors
+    expect_error(ensembldb:::.processFilterParam(db = edb))
+    expect_error(ensembldb:::.processFilterParam(4, edb))
+    expect_error(ensembldb:::.processFilterParam(list(afl, "a"), edb))
+    expect_error(ensembldb:::.processFilterParam("a", edb))
+    expect_error(ensembldb:::.processFilterParam(~ gene_bla == "14", edb))
+    ## Errors for filters that are not supported.
+    expect_error(ensembldb:::.processFilterParam(CdsEndFilter(123), edb))
+    
+    ## Same with calls from within a function.
+    testFun <- function(filter = AnnotationFilterList()) {
+        ensembldb:::.processFilterParam(filter, db = edb)
+    }
+
+    res <- testFun()
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_true(length(res) == 0)
+    res <- testFun(filter = ~ gene_id == 4)
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(res[[1]], GeneIdFilter(4))
+    res <- testFun(filter = GenenameFilter("BCL2"))
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(res[[1]], GenenameFilter("BCL2"))
+    res <- testFun(filter = AnnotationFilterList(GenenameFilter("BCL2")))
+    expect_true(is(res, "AnnotationFilterList"))
+    expect_equal(res[[1]], GenenameFilter("BCL2"))
+    
+    gene <- "ZBTB16"
+    otherFun <- function(gn) {
+        testFun(filter = GenenameFilter(gn))
+    }
+    res <- otherFun(gene)
+})
+
+test_that("setFeatureInGRangesFilter works", {
+    afl <- AnnotationFilterList(GeneIdFilter(123), SeqNameFilter(3),
+                                GRangesFilter(GRanges()))
+    afl2 <- AnnotationFilterList(afl, GRangesFilter(GRanges()))
+
+    res <- ensembldb:::setFeatureInGRangesFilter(afl, feature = "tx")
+    expect_equal(res[[3]]@feature, "tx")
+    res <- ensembldb:::setFeatureInGRangesFilter(afl2, feature = "tx")
+    expect_equal(res[[2]]@feature, "tx")
+    expect_equal(res[[1]][[3]]@feature, "tx")    
+})
+
+test_that(".AnnotationFilterClassNames works", {
+    afl1 <- AnnotationFilter(~ genename == 3 & seq_name != 5)
+    expect_equal(.AnnotationFilterClassNames(afl1),
+                 c("GenenameFilter", "SeqNameFilter"))
+    afl2 <- AnnotationFilter(~ gene_start > 13 | seq_strand == "+")
+    expect_equal(.AnnotationFilterClassNames(afl2),
+                 c("GeneStartFilter", "SeqStrandFilter"))
+    afl3 <- AnnotationFilterList(afl1, SymbolFilter(4))
+    expect_equal(.AnnotationFilterClassNames(afl3),
+                 c("GenenameFilter", "SeqNameFilter", "SymbolFilter"))
+    afl4 <- AnnotationFilterList(afl2, afl3)
+    expect_equal(.AnnotationFilterClassNames(afl4),
+                 c("GeneStartFilter", "SeqStrandFilter", "GenenameFilter",
+                   "SeqNameFilter", "SymbolFilter"))
+})
diff --git a/tests/testthat/test_functions-create-EnsDb.R b/tests/testthat/test_functions-create-EnsDb.R
new file mode 100644
index 0000000..ebbc096
--- /dev/null
+++ b/tests/testthat/test_functions-create-EnsDb.R
@@ -0,0 +1,234 @@
+
+test_that(".organismName, .abbrevOrganismName and .makePackageName works", {
+    res <- ensembldb:::.organismName("homo_sapiens")
+    expect_equal(res, "Homo_sapiens")
+    res <- ensembldb:::.abbrevOrganismName("homo_sapiens")
+    expect_equal(res, "hsapiens")
+    res <- ensembldb:::.makePackageName(dbconn(edb))
+    expect_equal(res, "EnsDb.Hsapiens.v75")
+})
+
+test_that("ensDbFromGRanges works", {
+    load(system.file("YGRanges.RData", package="ensembldb"))
+    suppressWarnings(
+        DB <- ensDbFromGRanges(Y, path=tempdir(), version=75,
+                               organism="Homo_sapiens", skip = TRUE)
+    )
+    db <- EnsDb(DB)
+    expect_equal(unname(genome(db)), "GRCh37")
+
+    Test <- makeEnsembldbPackage(DB, destDir = tempdir(),
+                                 version = "0.0.1", author = "J Rainer",
+                                 maintainer = "")
+    expect_true(ensembldb:::checkValidEnsDb(db))
+})
+
+test_that("ensDbFromGtf and Gff works", {
+    gff <- system.file("gff/Devosia_geojensis.ASM96941v1.32.gff3.gz",
+                       package="ensembldb")
+    gtf <- system.file("gtf/Devosia_geojensis.ASM96941v1.32.gtf.gz",
+                       package="ensembldb")
+    suppressWarnings(
+        db_gff <- EnsDb(ensDbFromGff(gff, outfile = tempfile(), skip = TRUE))
+    )
+    suppressWarnings(
+        db_gtf <- EnsDb(ensDbFromGtf(gtf, outfile = tempfile(), skip = TRUE))
+    )
+    expect_equal(ensemblVersion(db_gtf), "32")
+    expect_equal(ensemblVersion(db_gff), "32")
+
+    res <- ensembldb:::compareChromosomes(db_gtf, db_gff)
+    expect_equal(res, "OK")
+    res <- ensembldb:::compareGenes(db_gtf, db_gff)
+    expect_equal(res, "WARN")  ## differences in gene names and Entrezid.
+    res <- ensembldb:::compareTx(db_gtf, db_gff)
+    expect_equal(res, "OK")
+    res <- ensembldb:::compareExons(db_gtf, db_gff)
+    expect_equal(res, "OK")
+    ## Compare them all in one call
+    res <- ensembldb:::compareEnsDbs(db_gtf, db_gff)
+    expect_equal(unname(res["metadata"]), "NOTE")
+    expect_equal(unname(res["chromosome"]), "OK")
+    expect_equal(unname(res["transcript"]), "OK")
+    expect_equal(unname(res["exon"]), "OK")
+})
+
+test_that("isEnsemblFileName", {
+    res <- ensembldb:::isEnsemblFileName("Caenorhabditis_elegans.WS210.60.gtf.gz")
+    expect_true(res)
+    res <- ensembldb:::isEnsemblFileName("Caenorhabditis_elegans_fdf.60.dfd.gtf.gz")
+    expect_true(!res)
+
+    fn <- "Caenorhabditis_elegans.WS210.60.gtf.gz"
+    res <- ensembldb:::ensemblVersionFromGtfFileName(fn)
+    expect_equal(res, "60")
+    res <- ensembldb:::organismFromGtfFileName(fn)
+    expect_equal(res, "Caenorhabditis_elegans")
+    res <- ensembldb:::genomeVersionFromGtfFileName(fn)
+    expect_equal(res, "WS210")
+
+    res <- ensembldb:::elementFromEnsemblFilename(fn, which = 1)
+    expect_equal(res, "Caenorhabditis_elegans")
+    res <- ensembldb:::elementFromEnsemblFilename(fn, which = 2)
+    expect_equal(res, "WS210")
+    res <- ensembldb:::elementFromEnsemblFilename(fn, which = 3)
+    expect_equal(res, "60")
+    res <- ensembldb:::elementFromEnsemblFilename(fn, which = 4)
+    expect_equal(res, "gtf")
+})
+
+test_that("processEnsemblFileNames works", {
+    Test <- "Homo_sapiens.GRCh38.83.gtf.gz"
+    expect_true(ensembldb:::isEnsemblFileName(Test))
+    expect_equal(ensembldb:::organismFromGtfFileName(Test), "Homo_sapiens")
+    expect_equal(ensembldb:::genomeVersionFromGtfFileName(Test), "GRCh38")
+    expect_equal(ensembldb:::ensemblVersionFromGtfFileName(Test), "83")
+
+    Test <- "Homo_sapiens.GRCh38.83.chr.gff3.gz"
+    expect_true(ensembldb:::isEnsemblFileName(Test))
+    expect_equal(ensembldb:::organismFromGtfFileName(Test), "Homo_sapiens")
+    expect_equal(ensembldb:::genomeVersionFromGtfFileName(Test), "GRCh38")
+    expect_equal(ensembldb:::ensemblVersionFromGtfFileName(Test), "83")
+
+    Test <- "Gadus_morhua.gadMor1.83.gff3.gz"
+    expect_true(ensembldb:::isEnsemblFileName(Test))
+    expect_equal(ensembldb:::organismFromGtfFileName(Test), "Gadus_morhua")
+    expect_equal(ensembldb:::genomeVersionFromGtfFileName(Test), "gadMor1")
+    expect_equal(ensembldb:::ensemblVersionFromGtfFileName(Test), "83")
+
+    Test <- "Solanum_lycopersicum.GCA_000188115.2.30.chr.gtf.gz"
+    expect_true(ensembldb:::isEnsemblFileName(Test))
+    expect_equal(ensembldb:::organismFromGtfFileName(Test), "Solanum_lycopersicum")
+    expect_equal(ensembldb:::genomeVersionFromGtfFileName(Test), "GCA_000188115.2")
+    expect_equal(ensembldb:::ensemblVersionFromGtfFileName(Test), "30")
+
+    Test <- "ref_GRCh38.p2_top_level.gff3.gz"
+    expect_equal(ensembldb:::isEnsemblFileName(Test), FALSE)
+    ensembldb:::organismFromGtfFileName(Test)
+    expect_error(ensembldb:::genomeVersionFromGtfFileName(Test))
+    ##checkException(ensembldb:::ensemblVersionFromGtfFileName(Test))
+})
+
+test_that("checkExtractVersions works", {
+    fn <- "Devosia_geojensis.ASM96941v1.32.gff3.gz"
+    res <- ensembldb:::.checkExtractVersions(fn)
+    expect_equal(unname(res["organism"]), "Devosia_geojensis")
+    expect_equal(unname(res["genomeVersion"]), "ASM96941v1")
+    expect_equal(unname(res["version"]), "32")
+    suppressWarnings(
+        res <- ensembldb:::.checkExtractVersions(fn, organism = "Homo_sapiens")
+        )
+    expect_equal(unname(res["organism"]), "Homo_sapiens")
+    expect_error(ensembldb:::.checkExtractVersions("afdfhjd"))
+})
+
+test_that("buildMetadata works", {
+    res <- ensembldb:::buildMetadata(organism = "Mus_musculus",
+                                     ensemblVersion = "88",
+                                     genomeVersion = "38")
+    expect_equal(colnames(res), c("name", "value"))
+    expect_equal(res[res$name == "Organism", "value"], "Mus_musculus")
+})
+
+test_that("guessDatabaseName works", {
+    ## Testing real case examples.
+    genome <- "Rnor_5.0"
+    organism <- "Rattus_norvegicus"
+    ensembl <- "75"
+    res <- ensembldb:::.guessDatabaseName(organism, ensembl)
+    expect <- "rattus_norvegicus_core_75"
+    expect_equal(res, expect)
+
+    genome <- "GRCm38"
+    organism <- "Mus_musculus"
+    expect <- "mus_musculus_core_75_38"
+    res <- ensembldb:::.guessDatabaseName(organism, ensembl,
+                                          genome = genome)
+    expect_equal(expect, res)
+})
+
+test_that("getEnsemblMysqlUrl works", {
+    check_getReadMysqlTable <- function(url) {
+        res <- ensembldb:::.getReadMysqlTable(url, "coord_system.txt.gz",
+                                              colnames = c("coord_system_id",
+                                                           "species_id",
+                                                           "name", "version",
+                                                           "rank", "attrib"))
+        expect_true(nrow(res) > 0)
+    }
+
+    ## Only run this if we have access to Ensembl.
+    tmp <- try(
+        RCurl::getURL(ensembldb:::.ENSEMBL_URL, dirlistonly = TRUE,
+                      .opts = list(timeout = 5, maxredirs = 2))
+    )
+    if (!is(tmp, "try-error")) {
+        res <- ensembldb:::.getEnsemblMysqlUrl(type = "ensembl",
+                                               organism = "macaca mulatta",
+                                               ensembl = 85)
+        expect_equal(res, paste0(ensembldb:::.ENSEMBL_URL, "release-85/",
+                                "mysql/macaca_mulatta_core_85_10"))
+        check_getReadMysqlTable(res)
+        ## Next.
+        res <- ensembldb:::.getEnsemblMysqlUrl(type = "ensembl",
+                                               organism = "Bos taurus",
+                                               ensembl = 61)
+        expect_equal(res, paste0(ensembldb:::.ENSEMBL_URL, "release-61/",
+                                "mysql/bos_taurus_core_61_4j"))
+        check_getReadMysqlTable(res)
+        ## Next
+        res <- ensembldb:::.getEnsemblMysqlUrl(type = "ensembl",
+                                               organism = "Ficedula albicollis",
+                                               ensembl = 77)
+        expect_equal(res, paste0(ensembldb:::.ENSEMBL_URL, "release-77/",
+                                "mysql/ficedula_albicollis_core_77_1"))
+    }
+    ## ensemblgenomes
+    tmp <- try(
+        RCurl::getURL(ensembldb:::.ENSEMBLGENOMES_URL, dirlistonly = TRUE,
+                      .opts = list(timeout = 5, maxredirs = 2))
+    )
+    if (!is(tmp, "try-error")) {
+        ## check fungi
+        res <- ensembldb:::.getEnsemblMysqlUrl(type = "ensemblgenomes",
+                                               organism = "fusarium_oxysporum",
+                                               ensembl = 21)
+        db_name <- "fusarium_oxysporum_core_21_74_2"
+        expect_equal(res, paste0(ensembldb:::.ENSEMBLGENOMES_URL, "release-21/",
+                                "fungi/mysql/", db_name))
+        check_getReadMysqlTable(res)
+        ## Next one
+        db_name <- "solanum_lycopersicum_core_28_81_250"
+        res <- ensembldb:::.getEnsemblMysqlUrl(type = "ensemblgenomes",
+                                               organism = "solanum_lycopersicum",
+                                               ensembl = 28)
+        expect_equal(res, paste0(ensembldb:::.ENSEMBLGENOMES_URL, "release-28/",
+                                "plants/mysql/", db_name))
+        check_getReadMysqlTable(res)
+    }
+})
+
+test_that("getSeqlengthsFromMysqlFolder works", {
+    library(curl)
+    ch <- new_handle(timeout = 5)
+    handle_setopt(ch, timeout = 5)
+    tmp <- try(
+        ## RCurl::getURL(ensembldb:::.ENSEMBL_URL, dirlistonly = TRUE,
+        ##               .opts = list(timeout = 5, maxredirs = 2))
+        readLines(curl(ensembldb:::.ENSEMBL_URL, handle = ch))
+    )
+    if (!is(tmp, "try-error")) {
+        ## Compare seqlengths we've in EnsDb.Hsapiens.v75 with the expected
+        ## ones.
+        seq_info <- seqinfo(edb)
+        seq_lengths <- ensembldb:::.getSeqlengthsFromMysqlFolder(
+            organism = "Homo sapiens", ensembl = 75,
+            seqnames = seqlevels(seq_info))
+        sl <- seqlengths(seq_info)
+        sl_2 <- seq_lengths$length
+        names(sl_2) <- rownames(seq_lengths)
+        expect_true(all(names(sl) %in% names(sl_2)))
+        expect_equal(sl, sl_2[names(sl)])
+    }
+})
+
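The processEnsemblFileNames tests above rely on the Ensembl naming convention <Organism>.<genome version>.<release>[.chr].<gtf|gff3>[.gz], where the genome version may itself contain dots (e.g. GCA_000188115.2). Below is a minimal R sketch of one way to split such names, assuming only that convention. parse_ensembl_filename() is a hypothetical helper for illustration, not the ensembldb implementation (whose internal functions are organismFromGtfFileName(), genomeVersionFromGtfFileName() and ensemblVersionFromGtfFileName()).

```r
## Hypothetical helper (not part of ensembldb): split an Ensembl GTF/GFF3
## file name into organism, genome version and Ensembl release.
parse_ensembl_filename <- function(x) {
    x <- basename(x)
    x <- sub("\\.gz$", "", x)            # drop the compression suffix
    x <- sub("\\.(gtf|gff3?)$", "", x)   # drop the file type extension
    x <- sub("\\.chr$", "", x)           # drop the optional ".chr" tag
    parts <- strsplit(x, ".", fixed = TRUE)[[1]]
    if (length(parts) < 3)
        stop("'", x, "' does not look like an Ensembl file name")
    n <- length(parts)
    ## First field is the organism, last is the release; everything in
    ## between is the (possibly dotted) genome version.
    c(organism = parts[1],
      genomeVersion = paste(parts[2:(n - 1)], collapse = "."),
      version = parts[n])
}

## Example: named vector with organism "Homo_sapiens",
## genomeVersion "GRCh38" and version "83".
parse_ensembl_filename("Homo_sapiens.GRCh38.83.gtf.gz")
```

Stripping the suffixes first and taking the last remaining field as the release is what keeps a dotted assembly name such as GCA_000188115.2 intact, while a RefSeq-style name like ref_GRCh38.p2_top_level.gff3.gz leaves too few fields and is rejected, mirroring the expect_error() case in the tests.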
diff --git a/tests/testthat/test_functions-utils.R b/tests/testthat/test_functions-utils.R
new file mode 100644
index 0000000..848c805
--- /dev/null
+++ b/tests/testthat/test_functions-utils.R
@@ -0,0 +1,106 @@
+
+test_that("orderDataFrameBy works", {
+    res <- exons(edb, filter = GenenameFilter("ZBTB16"),
+                 return.type = "DataFrame")
+    ## Order by end
+    res_2 <- ensembldb:::orderDataFrameBy(res, by = "exon_seq_end")
+    idx <- order(res_2$exon_seq_end)
+    expect_equal(idx, 1:nrow(res_2))
+})
+
+test_that("addFilterColumns works for AnnotationFilterList", {
+    afl <- AnnotationFilterList(GenenameFilter(2), SymbolFilter(23))
+    afl2 <- AnnotationFilterList(SeqNameFilter(4), afl)
+    res <- ensembldb:::addFilterColumns(cols = "gene_biotype", filter = afl, edb)
+    expect_equal(res, c("gene_biotype", "gene_name", "symbol"))
+    res <- ensembldb:::addFilterColumns(cols = "gene_biotype", filter = afl2,
+                                        edb)
+    expect_equal(res, c("gene_biotype", "seq_name", "gene_name", "symbol"))
+})
+
+## Here we test that the filter columns are always returned as well.
+test_that("multiFilterReturnCols works also with symbolic filters", {
+    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+                                         filter = SymbolFilter("SKA2"))
+    expect_equal(cols, c("exon_id", "symbol"))
+    ## Two filter
+    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+                                         filter = list(SymbolFilter("SKA2"),
+                                                       GenenameFilter("SKA2")))
+    expect_equal(cols, c("exon_id", "symbol", "gene_name"))
+    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+                                         filter = list(SymbolFilter("SKA2"),
+                                                       GenenameFilter("SKA2"),
+                                                       GRangesFilter(
+                                                           GRanges("3",
+                                                                   IRanges(3, 5)
+                                                                   ))))
+    expect_equal(cols, c("exon_id", "symbol", "gene_name", "gene_seq_start",
+                         "gene_seq_end", "seq_name", "seq_strand"))
+    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+                                         filter = list(SymbolFilter("SKA2"),
+                                                       GenenameFilter("SKA2"),
+                                                       GRangesFilter(
+                                                           GRanges("3",
+                                                                   IRanges(3, 5)
+                                                                   ),
+                                                           feature = "exon")))
+    expect_equal(cols, c("exon_id", "symbol", "gene_name", "exon_seq_start",
+                         "exon_seq_end", "seq_name", "seq_strand"))
+    ## TxStartFilter and GRangesFilter
+    ssf <- TxStartFilter(123)
+    cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+                                         filter = list(SymbolFilter("SKA2"),
+                                                       GenenameFilter("SKA2"),
+                                                       GRangesFilter(
+                                                           GRanges("3",
+                                                                   IRanges(3, 5)
+                                                                   ),
+                                                           feature = "exon"),
+                                                       ssf))
+    expect_equal(cols, c("exon_id", "symbol", "gene_name", "exon_seq_start",
+                         "exon_seq_end", "seq_name", "seq_strand",
+                         "tx_seq_start"))
+})
+
+test_that("SQLiteName2MySQL works", {
+    have <- "EnsDb.Hsapiens.v75"
+    want <- "ensdb_hsapiens_v75"
+    expect_equal(ensembldb:::SQLiteName2MySQL(have), want)
+})
+
+test_that("anyProteinColumns works", {
+    expect_true(ensembldb:::anyProteinColumns(c("gene_id", "protein_id")))
+    expect_true(!ensembldb:::anyProteinColumns(c("gene_id", "exon_id")))
+})
+
+test_that("listProteinColumns works", {
+    if (hasProteinData(edb)) {
+        res <- listProteinColumns(edb)
+        expect_true(any(res == "protein_id"))
+        expect_true(any(res == "uniprot_id"))
+        expect_true(any(res == "protein_domain_id"))
+        ## These are new columns fetched for UniProt:
+        expect_true(any(res == "uniprot_db"))
+        expect_true(any(res == "uniprot_mapping_type"))
+    } else {
+        expect_error(listProteinColumns(edb))
+    }
+})
+
+test_that("strand2num works", {
+    expect_equal(ensembldb:::strand2num("+"), 1)
+    expect_equal(ensembldb:::strand2num("+1"), 1)
+    expect_equal(ensembldb:::strand2num("-"), -1)
+    expect_equal(ensembldb:::strand2num("-1"), -1)
+    expect_equal(ensembldb:::strand2num(1), 1)
+    expect_equal(ensembldb:::strand2num(5), 1)
+    expect_equal(ensembldb:::strand2num(-1), -1)
+    expect_equal(ensembldb:::strand2num(-5), -1)
+    expect_error(ensembldb:::strand2num("a"))
+})
+
+test_that("num2strand works", {
+    expect_equal(ensembldb:::num2strand(1), "+")
+    expect_equal(ensembldb:::num2strand(-1), "-")
+})
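The strand2num and num2strand tests pin down a simple contract: "+" / "+1" and any positive number map to 1, "-" / "-1" and any negative number map to -1, any other string is an error, and 1 / -1 map back to "+" / "-". A minimal R sketch of that contract follows. strand_to_num() and num_to_strand() are hypothetical names used only for illustration; the internal ensembldb functions may differ in details such as the error message or the handling of 0, and mapping 0 to "*" (the GenomicRanges code for an unknown strand) is an assumption of this sketch.

```r
## Hypothetical re-implementation for illustration only.
strand_to_num <- function(x) {
    if (is.numeric(x))
        return(sign(x))  # positive -> 1, negative -> -1 (0 stays 0)
    x <- as.character(x)
    res <- ifelse(x %in% c("+", "+1"), 1,
                  ifelse(x %in% c("-", "-1"), -1, NA))
    if (anyNA(res))
        stop("Invalid strand value(s): ",
             paste(x[is.na(res)], collapse = ", "))
    res
}

## Assumption: 0 is reported as "*", the unknown-strand code.
num_to_strand <- function(x) ifelse(x > 0, "+", ifelse(x < 0, "-", "*"))

strand_to_num(c("+", "-1"))  # c(1, -1)
num_to_strand(c(1, -1))      # c("+", "-")
```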
diff --git a/tests/testthat/test_select-methods.R b/tests/testthat/test_select-methods.R
new file mode 100644
index 0000000..b49622e
--- /dev/null
+++ b/tests/testthat/test_select-methods.R
@@ -0,0 +1,414 @@
+
+test_that("columns works", {
+    cols <- columns(edb)
+    ## Don't expect to see any _ there...
+    expect_equal(length(grep(cols, pattern="_")), 0)
+})
+
+test_that("keytypes works", {
+    keyt <- keytypes(edb)
+    expect_equal(all(c("GENEID", "EXONID", "TXID") %in% keyt), TRUE)
+})
+
+test_that("ensDbColumnForColumn works", {
+    Test <- ensembldb:::ensDbColumnForColumn(edb, "GENEID")
+    expect_equal(unname(Test), "gene_id")
+    Test <- ensembldb:::ensDbColumnForColumn(edb, c("GENEID", "TXID"))
+    expect_equal(unname(Test), c("gene_id", "tx_id"))
+    suppressWarnings(
+        Test <- ensembldb:::ensDbColumnForColumn(edb, c("GENEID", "TXID", "bla"))
+    )
+    expect_equal(unname(Test), c("gene_id", "tx_id"))
+})
+
+test_that("keys works", {
+    ## get all gene ids
+    ids <- keys(edb, "GENEID")
+    expect_true(length(ids) > 0)
+    expect_equal(length(ids), length(unique(ids)))
+    ## get all tx ids
+    ids <- keys(edb, "TXID")
+    expect_true(length(ids) > 0)
+    ## Get the TXNAME...
+    nms <- keys(edb, "TXNAME")
+    expect_equal(nms, ids)
+    expect_equal(length(ids), length(unique(ids)))
+    ## get all gene names
+    ids <- keys(edb, "GENENAME")
+    expect_true(length(ids) > 0)
+    expect_equal(length(ids), length(unique(ids)))
+    ## get all seq names
+    ids <- keys(edb, "SEQNAME")
+    expect_true(length(ids) > 0)
+    expect_equal(length(ids), length(unique(ids)))
+    ## get all seq strands
+    ids <- keys(edb, "SEQSTRAND")
+    expect_true(length(ids) > 0)
+    expect_equal(length(ids), length(unique(ids)))
+    ## get all gene biotypes
+    ids <- keys(edb, "GENEBIOTYPE")
+    expect_true(length(ids) > 0)
+    expect_equal(ids, listGenebiotypes(edb))
+    ## Now with protein data.
+    if (hasProteinData(edb)) {
+        library(RSQLite)
+        ls <- keys(edb, "PROTEINID")
+        ls_2 <- dbGetQuery(dbconn(edb),
+                           "select distinct protein_id from protein")$protein_id
+        expect_equal(sort(ls), sort(ls_2))
+        ##
+        ks <- keys(edb, "UNIPROTID")
+        ks_2 <- dbGetQuery(dbconn(edb),
+                           "select distinct uniprot_id from uniprot")$uniprot_id
+        expect_equal(sort(ks), sort(ks_2))
+        ##
+        ks <- keys(edb, "PROTEINDOMAINID")
+        ks_2 <- dbGetQuery(dbconn(edb),
+                           paste0("select distinct protein_domain_id from",
+                                  " protein_domain"))$protein_domain_id
+        expect_equal(sort(ks), sort(ks_2))
+    }
+    ## keys with filter:
+    res <- keys(edb, "GENENAME", filter = ~ genename == "BCL2")
+    expect_equal(res, "BCL2")
+})
+
+test_that("select method works", {
+    .comprehensiveCheckForGene <- function(x) {
+        ##   Check if we've got all of the transcripts.
+        txs <- dbGetQuery(
+            dbconn(edb),
+            paste0("select tx_id from tx where gene_id = '",
+                   x$GENEID[1], "';"))
+        expect_equal(sort(txs$tx_id), sort(unique(x$TXID)))
+        ##   Check if we've got all exons.
+        exs <- dbGetQuery(
+            dbconn(edb),
+            paste0("select exon_id from tx2exon where tx_id in (",
+                   paste0("'", txs$tx_id, "'", collapse = ", "),")"))
+        a <- sort(unique(exs$exon_id))
+        b <- sort(unique(x$EXONID))
+        a <- a[!is.na(a)]
+        b <- b[!is.na(b)]
+        expect_equal(a, b)
+        if (hasProteinData(edb)) {
+            ##  Check if we've got all proteins
+            prt <- dbGetQuery(
+                dbconn(edb),
+                paste0("select protein_id from protein where tx_id in (",
+                       paste0("'", txs$tx_id, "'", collapse = ", "), ")"))
+            a <- sort(prt$protein_id)
+            b <- sort(unique(x$PROTEINID))
+            a <- a[!is.na(a)]
+            b <- b[!is.na(b)]
+            expect_equal(a, b)
+            ##  Check if we've got all uniprots.
+            res <- dbGetQuery(
+                dbconn(edb),
+                paste0("select uniprot_id from uniprot where ",
+                       "protein_id in (", paste0("'", prt$protein_id,
+                                                 "'", collapse = ", ") ,")"))
+            a <- sort(unique(res$uniprot_id))
+            b <- sort(unique(x$UNIPROTID))
+            a <- a[!is.na(a)]
+            b <- b[!is.na(b)]
+            expect_equal(a, b)
+            ##  Check if we've got all protein_domains.
+            res <- dbGetQuery(
+                dbconn(edb),
+                paste0("select protein_domain_id from protein_domain ",
+                       "where protein_id in (",
+                       paste0("'", prt$protein_id,
+                              "'", collapse = ", "), ")"))
+            a <- sort(unique(res$protein_domain_id))
+            b <- sort(unique(x$PROTEINDOMAINID))
+            a <- a[!is.na(a)]
+            b <- b[!is.na(b)]
+            expect_equal(a, b)
+        }
+    }
+
+    library(RSQLite)
+    ## 1) Test:
+    ##   Provide GenenameFilter.
+    gf <- GenenameFilter("BCL2")
+    Test <- select(edb, keys = gf)
+    expect_true(all(Test$GENENAME == "BCL2"))
+    .comprehensiveCheckForGene(Test)
+    Test2 <- select(edb, keys = ~ symbol == "BCL2")
+    expect_equal(Test, Test2)
+    ## ZBTB16
+    tmp <- select(edb, keys = GenenameFilter("ZBTB16"))
+    .comprehensiveCheckForGene(tmp)
+    ## BCL2L11
+    tmp <- select(edb, keys = GenenameFilter("BCL2L11"))
+    .comprehensiveCheckForGene(tmp)
+    ## NR3C1
+    tmp <- select(edb, keys = GenenameFilter("NR3C1"))
+    .comprehensiveCheckForGene(tmp)
+    ## Combine GenenameFilter and TxBiotypeFilter.
+    Test2 <- select(edb, keys = ~ symbol == "BCL2" &
+                             tx_biotype == "protein_coding")
+    expect_equal(Test$EXONID[Test$TXBIOTYPE == "protein_coding"], Test2$EXONID)
+    ## Choose selected columns.
+    Test3 <- select(edb, keys = gf, columns = c("GENEID", "GENENAME", "SEQNAME"))
+    expect_equal(unique(Test[, c("GENEID", "GENENAME", "SEQNAME")]), Test3)
+    ## Provide keys.
+    Test4 <- select(edb, keys = "BCL2", keytype = "GENENAME")
+    expect_equal(Test[, colnames(Test4)], Test4)
+    gns <- keys(edb, "GENEID")
+    ## Just get stuff from the tx table; should be faster.
+    Test <- select(edb, keys = gns, columns = c("GENEID", "SEQNAME"),
+                   keytype = "GENEID")
+    expect_equal(all(Test$GENEID == gns), TRUE)
+    ## Get all lincRNA genes
+    Test <- select(edb, keys = "lincRNA", columns = c("GENEID", "GENEBIOTYPE",
+                                                      "GENENAME"),
+                   keytype = "GENEBIOTYPE")
+    Test2 <- select(edb, keys = GeneBiotypeFilter("lincRNA"),
+                    columns = c("GENEID", "GENEBIOTYPE", "GENENAME"))
+    expect_equal(Test[, colnames(Test2)], Test2)
+    ## All on chromosome 21
+    Test <- select(edb, keys = "21", columns = c("GENEID", "GENEBIOTYPE",
+                                                 "GENENAME"),
+                   keytype = "SEQNAME")
+    Test2 <- select(edb, keys = ~ seq_name == "21",
+                    columns = c("GENEID", "GENEBIOTYPE", "GENENAME"))
+    expect_equal(Test[, colnames(Test2)], Test2)
+    ## What if we can't find it?
+    Test <- select(edb, keys = "bla", columns = c("GENEID", "GENENAME"),
+                   keytype = "GENENAME")
+    expect_equal(colnames(Test), c("GENEID", "GENENAME"))
+    expect_true(nrow(Test) == 0)
+    ## TXNAME
+    Test <- select(edb, keys = "ENST00000000233",
+                   columns = c("GENEID", "GENENAME"), keytype = "TXNAME")
+    expect_equal(Test$TXNAME, "ENST00000000233")
+    ## Check what happens if we just add TXNAME and also TXID.
+    Test2 <- select(edb, keys = list(gf, TxBiotypeFilter("protein_coding")),
+                    columns = c("TXID", "TXNAME", "GENENAME", "GENEID"))
+    expect_equal(colnames(Test2), c("TXID", "TXNAME", "GENENAME", "GENEID",
+                                    "TXBIOTYPE"))
+    ## Protein stuff.
+    if (hasProteinData(edb)) {
+        ## Test:
+        ## o if we're fetching with PROTEINID keys we're just getting protein
+        ##   coding tx, i.e. those with a tx_cds_seq_start not NULL AND we get
+        ##   also those with a uniprot ID null.
+        pids <- c("ENSP00000338157", "ENSP00000437716", "ENSP00000443013",
+                  "ENSP00000376721", "ENSP00000445047")
+        res <- select(edb, keys = pids, keytype = "PROTEINID",
+                      columns = c("TXID", "TXCDSSEQSTART", "TXBIOTYPE",
+                                  "PROTEINID", "UNIPROTID", "PROTEINDOMAINID"))
+        expect_equal(sort(pids), sort(unique(res$PROTEINID)))
+        res_2 <- select(edb, keys = ProteinIdFilter(pids),
+                        columns = c("TXID", "TXCDSSEQSTART", "TXBIOTYPE",
+                                    "PROTEINID", "UNIPROTID", "PROTEINDOMAINID"))
+        expect_equal(sort(pids), sort(unique(res_2$PROTEINID)))
+        expect_equal(res, res_2)
+        expect_true(all(!is.na(res$TXCDSSEQSTART)))
+        ## Do we have all of the uniprot ids?
+        tmp <- dbGetQuery(dbconn(edb),
+                          paste0("select uniprot_id from uniprot where ",
+                                 "protein_id in (",
+                                 paste0("'", pids,"'", collapse = ", "),")"))
+        a <- sort(unique(res$UNIPROTID))
+        b <- sort(unique(tmp$uniprot_id))
+        a <- a[!is.na(a)]
+        b <- b[!is.na(b)]
+        expect_equal(a, b)
+        ## Do we have all protein domain ids?
+        tmp <- dbGetQuery(dbconn(edb),
+                          paste0("select protein_domain_id from protein_domain ",
+                                 "where protein_id in (",
+                                 paste0("'", pids,"'", collapse = ", "),")"))
+        a <- sort(unique(res$PROTEINDOMAINID))
+        b <- sort(unique(tmp$protein_domain_id))
+        a <- a[!is.na(a)]
+        b <- b[!is.na(b)]
+        expect_equal(a, b)
+
+        ## o if we're fetching with uniprot and protein id filter we get all
+        ##   even if they don't have a protein domain.
+        upids <- c("ZBT16_HUMAN", "Q71UL7_HUMAN", "Q71UL6_HUMAN", "Q71UL5_HUMAN")
+        res <- select(edb, keys = upids, keytype = "UNIPROTID",
+                      columns = c("PROTEINID", "UNIPROTID", "PROTEINDOMAINID"))
+    }
+})
+
+test_that("mapIds works", {
+    ## Simple... map gene ids to gene names
+    allgenes <- keys(edb, keytype = "GENEID")
+    randordergenes <- allgenes[sample(1:length(allgenes), 100)]
+    mi <- mapIds(edb, keys = allgenes, keytype = "GENEID", column = "GENENAME")
+    expect_equal(allgenes, names(mi))
+    ## Ordering should always match the ordering of the input:
+    mi <- mapIds(edb, keys = randordergenes, keytype = "GENEID",
+                 column = "GENENAME")
+    expect_equal(randordergenes, names(mi))
+    ## Handle multi mappings.
+    ## o first
+    first <- mapIds(edb, keys = randordergenes, keytype = "GENEID",
+                    column = "TXID")
+    expect_equal(names(first), randordergenes)
+    ## o list
+    lis <- mapIds(edb, keys = randordergenes, keytype = "GENEID",
+                  column = "TXID", multiVals = "list")
+    expect_equal(names(lis), randordergenes)
+    Test <- lapply(lis, function(z){return(z[1])})
+    expect_equal(first, unlist(Test))
+    ## o filter
+    filt <- mapIds(edb, keys = randordergenes, keytype = "GENEID",
+                   column = "TXID", multiVals = "filter")
+    expect_equal(filt, unlist(lis[unlist(lapply(lis, length)) == 1]))
+    ## o asNA
+    asNA <- mapIds(edb, keys = randordergenes, keytype = "GENEID",
+                   column = "TXID", multiVals = "asNA")
+    ## Check what happens if we provide 2 identical keys.
+    Test <- mapIds(edb, keys = c("BCL2", "BCL2L11", "BCL2"),
+                   keytype = "GENENAME", column = "TXID")
+    expect_equal(names(Test), c("BCL2", "BCL2L11", "BCL2"))
+    expect_true(length(unique(Test)) == 2)
+    ## Submit Filter:
+    Test <- mapIds(edb, keys = SeqNameFilter("Y"), column = "GENEID",
+                   multiVals = "list")
+    TestS <- select(edb, keys = Test[[1]], columns = "SEQNAME",
+                    keytype = "GENEID")
+    expect_equal(unique(TestS$SEQNAME), "Y")
+    ## Submit 2 filters.
+    Test <- mapIds(edb, keys = ~ seq_name == "Y" & seq_strand == "-",
+                   multiVals = "list", column = "GENEID")
+    TestS <- select(edb, keys = Test[[1]], keytype = "GENEID",
+                    columns = c("SEQNAME", "SEQSTRAND"))
+    expect_true(all(TestS$SEQNAME == "Y"))
+    expect_true(all(TestS$SEQSTRAND == -1))
+
+    ## Now using protein annotations:
+    if (hasProteinData(edb)) {
+        library(RSQLite)
+        txids <- keys(edb, keytype = "TXID", filter = GenenameFilter("ZBTB16"))
+        mapd <- mapIds(edb, keys = txids, keytype = "TXID", column = "GENENAME")
+        expect_equal(names(mapd), txids)
+        expect_true(all(mapd == "ZBTB16"))
+        ## Map to protein ids.
+        mapd <- mapIds(edb, keys = txids, keytype = "TXID", column = "PROTEINID")
+        res <- dbGetQuery(dbconn(edb),
+                          paste0("select protein_id from protein where tx_id in",
+                                 " (", paste0("'", txids,"'", collapse = ", "),
+                                 ")"))
+        pids <- mapd[!is.na(mapd)]
+        expect_true(all(pids %in% res$protein_id))
+        ## multi-mapping:
+        ## proteins and uniprot.
+        mapd <- mapIds(edb, keys = pids, keytype = "PROTEINID",
+                       column = "UNIPROTID", multiVals = "list")
+        mapd <- mapd[!is.na(mapd)]
+        res <- dbGetQuery(dbconn(edb),
+                          paste0("select protein_id, uniprot_id from uniprot ",
+                                 "where protein_id in (",
+                                 paste0("'", pids, "'", collapse = ", "), ")"))
+        res <- split(res$uniprot_id, res$protein_id)
+        expect_equal(mapd, res[names(mapd)])
+        ## Just to ensure:
+        tmp <- proteins(edb, filter = ProteinIdFilter(pids),
+                        columns = c("uniprot_id", "protein_id"))
+        upids <- tmp$uniprot_id[!is.na(tmp$uniprot_id)]
+        expect_true(all(res$uniprot_id %in% upids))
+        ## map protein ids to gene name
+        mapd <- mapIds(edb, keys = pids, keytype = "PROTEINID",
+                       column = "GENENAME")
+        expect_true(all(mapd == "ZBTB16"))
+    }
+})
+
+## Test if the results are properly sorted if we submit a single filter or just keys.
+test_that("select results are properly sorted", {
+    ks <- c("ZBTB16", "BCL2", "SKA2", "BCL2L11")
+    ## gene name
+    res <- select(edb, keys = ks, keytype = "GENENAME")
+    expect_equal(unique(res$GENENAME), ks)
+    res <- select(edb, keys = GenenameFilter(ks))
+    expect_equal(unique(res$GENENAME), ks)
+    ## Using two filters;
+    res <- select(edb, keys = list(GenenameFilter(ks),
+                                   TxBiotypeFilter("nonsense_mediated_decay")))
+    ## We don't expect same sorting here!
+    expect_true(!all(unique(res$GENENAME) == ks[ks %in% unique(res$GENENAME)]))
+    res2 <- select(edb, keys = ~ genename == ks &
+                            tx_biotype == "nonsense_mediated_decay")
+    expect_equal(res, res2)
+    ## symbol
+    res <- select(edb, keys = ks, keytype = "SYMBOL",
+                  columns = c("GENENAME", "SYMBOL", "SEQNAME"))
+    expect_equal(res$SYMBOL, ks)
+    expect_equal(res$GENENAME, ks)
+    ## tx_biotype
+    ks <- c("retained_intron", "nonsense_mediated_decay")
+    res <- select(edb, keys = ks, keytype = "TXBIOTYPE",
+                  columns = c("GENENAME", "TXBIOTYPE"))
+    expect_equal(unique(res$TXBIOTYPE), ks)
+    res <- select(edb, keys = TxBiotypeFilter(ks),
+                  keytype = "TXBIOTYPE", columns = c("GENENAME", "TXBIOTYPE"))
+    expect_equal(unique(res$TXBIOTYPE), ks)
+})
+
+test_that("select works with symbol as keytype", {
+    ks <- c("ZBTB16", "BCL2", "SKA2", "BCL2L11")
+    res <- select(edb, keys = ks, keytype = "GENENAME")
+    res2 <- select(edb, keys = ks, keytype = "SYMBOL")
+    expect_equal(res, res2)
+    res <- select(edb, keys = GenenameFilter(ks),
+                  columns = c("TXNAME", "SYMBOL", "GENEID"))
+    expect_equal(colnames(res), c("TXNAME", "SYMBOL", "GENEID", "GENENAME"))
+    res <- select(edb, keys = ~ symbol == ks, columns=c("GENEID"))
+    expect_equal(colnames(res), c("GENEID", "SYMBOL"))
+    expect_equal(res$SYMBOL, ks)
+    res <- select(edb, keys = list(SeqNameFilter("Y"),
+                                   GeneBiotypeFilter("lincRNA")),
+                  columns = c("GENEID", "SYMBOL"))
+    expect_equal(colnames(res), c("GENEID", "SYMBOL", "SEQNAME", "GENEBIOTYPE"))
+    expect_true(all(res$SEQNAME == "Y"))
+})
+
+test_that("select works with txname", {
+    ## TXNAME as a column
+    ks <- c("ZBTB16", "BCL2", "SKA2")
+    res <- select(edb, keys = ks, keytype = "GENENAME", columns = c("TXNAME"))
+    expect_equal(colnames(res), c("GENENAME", "TXNAME"))
+})
+
+test_that(".keytype2FilterMapping works", {
+    ## Check whether or not we're getting protein columns.
+    res <- ensembldb:::.keytype2FilterMapping()
+    expect_equal(names(res),
+                 c("ENTREZID", "GENEID", "GENEBIOTYPE", "GENENAME", "TXID",
+                   "TXBIOTYPE", "EXONID", "SEQNAME", "SEQSTRAND", "TXNAME",
+                   "SYMBOL"))
+    res <- ensembldb:::.keytype2FilterMapping(TRUE)
+    expect_true(all(c("PROTEINID", "UNIPROTID", "PROTEINDOMAINID") %in%
+                    names(res)))
+})
+
+test_that("keytypes works", {
+    keyt <- c("ENTREZID", "GENEID", "GENEBIOTYPE", "GENENAME", "TXID",
+              "TXBIOTYPE", "EXONID", "SEQNAME", "SEQSTRAND", "TXNAME",
+              "SYMBOL")
+    res <- keytypes(edb)
+    if (hasProteinData(edb)) {
+        expect_equal(res, sort(c(keyt, "PROTEINID", "UNIPROTID",
+                                 "PROTEINDOMAINID")))
+    } else {
+        expect_equal(res, sort(keyt))
+    }
+})
+
+test_that("filterForKeytype works", {
+    res <- ensembldb:::filterForKeytype("SYMBOL")
+    expect_true(is(res, "SymbolFilter"))
+    if (hasProteinData(edb)) {
+        res <- ensembldb:::filterForKeytype("PROTEINDOMAINID", edb)
+        expect_true(is(res, "ProtDomIdFilter"))
+    }
+    res <- ensembldb:::filterForKeytype("TXID")
+    expect_true(is(res, "TxIdFilter"))
+})
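The mapIds tests check how multiple matches per key are collapsed through the multiVals argument, and that results follow the input order (including duplicated keys). The R sketch below spells out those semantics on a plain key/value table: "first" keeps the first match, "list" keeps all matches, "filter" drops keys with more than one match, and "asNA" turns such keys into NA; keys without a match become NA. collapse_mapping() and its two-column data frame are hypothetical and only approximate AnnotationDbi's mapIds() behaviour.

```r
## Hypothetical helper: collapse a key -> value table per input key.
collapse_mapping <- function(keys, map,
                             multiVals = c("first", "list", "filter", "asNA")) {
    multiVals <- match.arg(multiVals)
    ## One entry per input key, in input order; unmatched keys give NA.
    vals <- lapply(keys, function(k) {
        v <- map$value[map$key == k]
        if (length(v)) v else NA_character_
    })
    names(vals) <- keys
    switch(multiVals,
           list   = vals,
           first  = vapply(vals, function(v) v[1], character(1)),
           filter = unlist(vals[lengths(vals) == 1]),
           asNA   = vapply(vals, function(v)
               if (length(v) == 1) v else NA_character_, character(1)))
}

## Example data: gene g1 maps to two transcripts, g2 to one.
map <- data.frame(key = c("g1", "g1", "g2"), value = c("t1", "t2", "t3"),
                  stringsAsFactors = FALSE)
collapse_mapping(c("g2", "g1", "g3"), map, "first")
```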
diff --git a/tests/testthat/test_seqLevelStyle.R b/tests/testthat/test_seqLevelStyle.R
new file mode 100644
index 0000000..277fb8a
--- /dev/null
+++ b/tests/testthat/test_seqLevelStyle.R
@@ -0,0 +1,445 @@
+## Tests related to setting seqLevelsStyle
+
+test_that("seqlevelsStyle works", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    options(ensembldb.seqnameNotFound = NA)
+    edb <- EnsDb.Hsapiens.v75
+    SL <- seqlevels(edb)
+    ucscs <- paste0("chr", c(1:22, "X", "Y", "M"))
+    seqlevelsStyle(edb) <- "UCSC"
+    suppressWarnings(
+        SL2 <- seqlevels(edb)
+    )
+    expect_equal(sort(ucscs), sort(SL2[!is.na(SL2)]))
+    ## Check if we throw an error message
+    options(ensembldb.seqnameNotFound = "MISSING")
+    expect_error(seqlevels(edb))
+    ## Check if returning original names works.
+    options(ensembldb.seqnameNotFound = "ORIGINAL")
+    suppressWarnings(
+        SL3 <- seqlevels(edb)
+    )
+    idx <- which(SL3 %in% ucscs)
+    expect_equal(sort(SL[-idx]), sort(SL3[-idx]))
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("seqinfo works with seqlevelsStyle", {
+    edb <- EnsDb.Hsapiens.v75
+    orig <- getOption("ensembldb.seqnameNotFound")
+    options(ensembldb.seqnameNotFound="MISSING")
+    seqlevelsStyle(edb) <- "UCSC"
+    expect_error(seqinfo(edb))
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    suppressWarnings(
+        si <- seqinfo(edb)
+    )
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("getWhat works with seqlevelsStyle", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    edb <- EnsDb.Hsapiens.v75
+    seqlevelsStyle(edb) <- "Ensembl"
+    ensRes <- ensembldb:::getWhat(edb, columns=c("seq_name", "seq_strand"))
+    seqlevelsStyle(edb) <- "UCSC"
+    suppressWarnings(
+        ucscRes <- ensembldb:::getWhat(edb, columns=c("seq_name", "seq_strand"))
+    )
+    seqlevelsStyle(edb) <- "NCBI"
+    suppressWarnings(
+        ncbiRes <- ensembldb:::getWhat(edb, columns=c("seq_name", "seq_strand"))
+    )
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("SeqNameFilter works with seqlevelsStyle", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    options(ensembldb.seqnameNotFound="MISSING")
+    edb <- EnsDb.Hsapiens.v75
+    seqlevelsStyle(edb) <- "Ensembl"
+    snf <- SeqNameFilter("chrX")
+    snfEns <- SeqNameFilter(c("X", "Y"))
+    snfNo <- SeqNameFilter(c("bla", "blu"))
+    snfSomeNo <- SeqNameFilter(c("bla", "X"))
+
+    seqlevelsStyle(edb) <- "Ensembl"
+    expect_equal(value(snf), "chrX")
+    ## That makes no sense for a query though.
+    expect_equal(value(snf), "chrX")
+    expect_equal(value(snfEns), c("X", "Y"))
+    seqlevelsStyle(edb) <- "UCSC"
+    expect_equal(ensembldb:::ensDbQuery(snf, edb), "gene.seq_name = 'X'")
+    expect_error(ensembldb:::ensDbQuery(snfEns, edb))
+    expect_error(ensembldb:::ensDbQuery(snfNo, edb))
+    expect_error(ensembldb:::ensDbQuery(snfSomeNo, edb))
+
+    ## Setting the options to "ORIGINAL"
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    expect_equal(ensembldb:::ensDbQuery(snf, edb), "gene.seq_name = 'X'")
+    suppressWarnings(
+        expect_equal(ensembldb:::ensDbQuery(snfEns, edb),
+                    "gene.seq_name in ('X','Y')")
+    )
+    suppressWarnings(
+        expect_equal(ensembldb:::ensDbQuery(snfNo, edb),
+                    "gene.seq_name in ('bla','blu')")
+    )
+    suppressWarnings(
+        expect_equal(ensembldb:::ensDbQuery(snfSomeNo, edb),
+                    "gene.seq_name in ('bla','X')")
+    )
+    ##
+    snf <- SeqNameFilter(c("chrX", "Y"))
+    suppressWarnings(
+        expect_equal(ensembldb:::ensDbQuery(snf, edb),
+                    "gene.seq_name in ('X','Y')")
+    )
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("genes works with seqlevelsStyles", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    edb <- EnsDb.Hsapiens.v75
+    ## Here we test whether the result returned by the function really
+    ## works when changing the seqnames.
+    seqlevelsStyle(edb) <- "Ensembl"
+    ensAll <- genes(edb)
+    ens21Y <- genes(edb, filter=SeqNameFilter(c("Y", "21")))
+    expect_equal(sort(as.character(unique(seqnames(ens21Y)))), c("21", "Y"))
+    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+    ensY <- genes(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ensY), "Y")
+    expect_equal(unique(as.character(strand(ensY))), "+")
+
+    ## Check UCSC stuff
+    seqlevelsStyle(edb) <- "UCSC"
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    ## Just visually inspect the seqinfo and seqnames for the "all" query.
+    ucscAll <- genes(edb)
+    as.character(unique(seqnames(ucscAll)))
+    ucsc21Y <- genes(edb, filter=SeqNameFilter(c("chrY", "chr21")))
+    expect_equal(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+    expect_equal(sort(names(ens21Y)), sort(names(ucsc21Y)))
+    ## GRangesFilter.
+    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+    ucscY <- genes(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ucscY), "chrY")
+    expect_equal(unique(as.character(strand(ucscY))), "+")
+    expect_equal(sort(names(ensY)), sort(names(ucscY)))
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("transcripts works with seqlevelsStyle", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    edb <- EnsDb.Hsapiens.v75
+    seqlevelsStyle(edb) <- "Ensembl"
+    ens21Y <- transcripts(edb, filter = SeqNameFilter(c("Y", "21")))
+    expect_equal(sort(seqlevels(ens21Y)), c("21", "Y"))
+    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+    ensY <- transcripts(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ensY), "Y")
+    expect_equal(unique(as.character(strand(ensY))), "+")
+
+    ## Check UCSC stuff
+    seqlevelsStyle(edb) <- "UCSC"
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    ucsc21Y <- transcripts(edb, filter=SeqNameFilter(c("chrY", "chr21")))
+    expect_equal(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+    expect_equal(sort(names(ens21Y)), sort(names(ucsc21Y)))
+    ## GRangesFilter.
+    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566),
+                  strand="+")
+    ucscY <- transcripts(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ucscY), "chrY")
+    expect_equal(unique(as.character(strand(ucscY))), "+")
+    expect_equal(sort(names(ensY)), sort(names(ucscY)))
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("transcriptsBy works with seqlevelsStyle", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    edb <- EnsDb.Hsapiens.v75
+    seqlevelsStyle(edb) <- "Ensembl"
+    ens21Y <- transcriptsBy(edb, filter=SeqNameFilter(c("Y", "21")))
+    expect_equal(sort(seqlevels(ens21Y)), c("21", "Y"))
+    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+    ensY <- transcriptsBy(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ensY), "Y")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ensY)))), "+")
+    )
+    
+    ## Check UCSC stuff
+    seqlevelsStyle(edb) <- "UCSC"
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    ucsc21Y <- transcriptsBy(edb, filter=SeqNameFilter(c("chrY", "chr21")))
+    expect_equal(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+    expect_equal(sort(names(ens21Y)), sort(names(ucsc21Y)))
+    ## GRangesFilter.
+    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+    ucscY <- transcriptsBy(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ucscY), "chrY")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ucscY)))), "+")
+    )
+    expect_equal(sort(names(ensY)), sort(names(ucscY)))
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("exons works with seqlevelsStyle", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    edb <- EnsDb.Hsapiens.v75
+    seqlevelsStyle(edb) <- "Ensembl"
+    ens21Y <- exons(edb, filter=SeqNameFilter(c("Y", "21")))
+    expect_equal(sort(seqlevels(ens21Y)), c("21", "Y"))
+    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+    ensY <- exons(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ensY), "Y")
+    expect_equal(unique(as.character(strand(ensY))), "+")
+
+    ## Check UCSC stuff
+    seqlevelsStyle(edb) <- "UCSC"
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    ucsc21Y <- exons(edb, filter=SeqNameFilter(c("chrY", "chr21")))
+    expect_equal(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+    expect_equal(sort(names(ens21Y)), sort(names(ucsc21Y)))
+    ## GRangesFilter.
+    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+    ucscY <- exons(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ucscY), "chrY")
+    expect_equal(unique(as.character(strand(ucscY))), "+")
+    expect_equal(sort(names(ensY)), sort(names(ucscY)))
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("exonsBy works with seqlevelsStyle", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    edb <- EnsDb.Hsapiens.v75
+    seqlevelsStyle(edb) <- "Ensembl"
+    ens21Y <- exonsBy(edb, filter=SeqNameFilter(c("Y", "21")))
+    expect_equal(sort(seqlevels(ens21Y)), c("21", "Y"))
+    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+    ensY <- exonsBy(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ensY), "Y")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ensY)))), "+")
+    )
+    ## Check UCSC stuff
+    seqlevelsStyle(edb) <- "UCSC"
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    ucsc21Y <- exonsBy(edb, filter=SeqNameFilter(c("chrY", "chr21")))
+    expect_equal(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+    expect_equal(sort(names(ens21Y)), sort(names(ucsc21Y)))
+    ## GRangesFilter.
+    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+    ucscY <- exonsBy(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ucscY), "chrY")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ucscY)))), "+")
+    )
+    expect_equal(sort(names(ensY)), sort(names(ucscY)))
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("cdsBy works with seqlevelsStyle", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    edb <- EnsDb.Hsapiens.v75
+    seqlevelsStyle(edb) <- "Ensembl"
+    ens21Y <- cdsBy(edb, filter=SeqNameFilter(c("Y", "21")))
+    expect_equal(sort(seqlevels(ens21Y)), c("21", "Y"))
+    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+    ensY <- cdsBy(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ensY), "Y")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ensY)))), "+")
+    )
+    ## Check UCSC stuff
+    seqlevelsStyle(edb) <- "UCSC"
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    ucsc21Y <- cdsBy(edb, filter=SeqNameFilter(c("chrY", "chr21")))
+    expect_equal(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+    expect_equal(sort(names(ens21Y)), sort(names(ucsc21Y)))
+    ## GRangesFilter.
+    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+    ucscY <- cdsBy(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ucscY), "chrY")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ucscY)))), "+")
+    )
+    expect_equal(sort(names(ensY)), sort(names(ucscY)))
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("threeUTRsByTranscript works with seqlevelsStyle", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    edb <- EnsDb.Hsapiens.v75
+    seqlevelsStyle(edb) <- "Ensembl"
+    ens21Y <- threeUTRsByTranscript(edb, filter=SeqNameFilter(c("Y", "21")))
+    expect_equal(sort(seqlevels(ens21Y)), c("21", "Y"))
+    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+    ensY <- threeUTRsByTranscript(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ensY), "Y")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ensY)))), "+")
+    )
+    ## Check UCSC stuff
+    seqlevelsStyle(edb) <- "UCSC"
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    ucsc21Y <- threeUTRsByTranscript(edb, filter=SeqNameFilter(c("chrY", "chr21")))
+    expect_equal(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+    expect_equal(sort(names(ens21Y)), sort(names(ucsc21Y)))
+    ## GRangesFilter.
+    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+    ucscY <- threeUTRsByTranscript(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ucscY), "chrY")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ucscY)))), "+")
+    )
+    expect_equal(sort(names(ensY)), sort(names(ucscY)))
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("fiveUTRsByTranscript works with seqlevelsStyle", {
+    orig <- getOption("ensembldb.seqnameNotFound")
+    edb <- EnsDb.Hsapiens.v75
+    seqlevelsStyle(edb) <- "Ensembl"
+    ens21Y <- fiveUTRsByTranscript(edb, filter=SeqNameFilter(c("Y", "21")))
+    expect_equal(sort(seqlevels(ens21Y)), c("21", "Y"))
+    gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+    ensY <- fiveUTRsByTranscript(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ensY), "Y")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ensY)))), "+")
+    )
+    ## Check UCSC stuff
+    seqlevelsStyle(edb) <- "UCSC"
+    options(ensembldb.seqnameNotFound="ORIGINAL")
+    ucsc21Y <- fiveUTRsByTranscript(edb, filter=SeqNameFilter(c("chrY", "chr21")))
+    expect_equal(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+    expect_equal(sort(names(ens21Y)), sort(names(ucsc21Y)))
+    ## GRangesFilter.
+    gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+    ucscY <- fiveUTRsByTranscript(edb, filter=GRangesFilter(gr))
+    expect_equal(seqlevels(ucscY), "chrY")
+    suppressWarnings(
+        expect_equal(unique(as.character(unlist(strand(ucscY)))), "+")
+    )
+    expect_equal(sort(names(ensY)), sort(names(ucscY)))
+    options(ensembldb.seqnameNotFound=orig)
+})
+
+test_that("setting and getting seqlevelsStyle works", {
+    edb <- EnsDb.Hsapiens.v75
+    ## Testing the getter/setter for the seqlevelsStyle.
+    expect_equal(seqlevelsStyle(edb), "Ensembl")
+    expect_equal(NA, ensembldb:::getProperty(edb, "seqlevelsStyle"))
+
+    seqlevelsStyle(edb) <- "Ensembl"
+    expect_equal(seqlevelsStyle(edb), "Ensembl")
+    expect_equal("Ensembl", ensembldb:::getProperty(edb, "seqlevelsStyle"))
+
+    ## Try NCBI.
+    seqlevelsStyle(edb) <- "NCBI"
+    expect_equal(seqlevelsStyle(edb), "NCBI")
+
+    ## Try UCSC.
+    seqlevelsStyle(edb) <- "UCSC"
+    expect_equal(seqlevelsStyle(edb), "UCSC")
+
+    ## Error checking:
+    expect_error(seqlevelsStyle(edb) <- "bla")
+})
+
+test_that("formatting seqnames for query works with seqlevelsStyle", {
+    ## Testing if the formatting/mapping between seqnames works as expected
+    ## We want to map anything TO Ensembl.
+    ## Check also the warning messages!
+    ucscs <- c("chr1", "chr3", "chr1", "chr9", "chrM", "chr1", "chrX")
+    enses <- c("1", "3", "1", "9", "MT", "1", "X")
+    ## reset
+    edb <- EnsDb.Hsapiens.v75
+    ## Shouldn't do anything here.
+    seqlevelsStyle(edb)
+    ensembldb:::dbSeqlevelsStyle(edb)
+    got <- ensembldb:::formatSeqnamesForQuery(edb, enses)
+    expect_equal(got, enses)
+    ## Change the seqlevels to UCSC
+    seqlevelsStyle(edb) <- "UCSC"
+    ## If ifNotFound is not specified we expect an error.
+    options(ensembldb.seqnameNotFound="MISSING")
+    expect_error(ensembldb:::formatSeqnamesForQuery(edb, enses))
+    ## With specifying ifNotFound
+    suppressWarnings(
+        got <- ensembldb:::formatSeqnamesForQuery(edb, enses, ifNotFound=NA)
+    )
+    expect_equal(all(is.na(got)), TRUE)
+    ## Same by setting the option
+    options(ensembldb.seqnameNotFound=NA)
+    suppressWarnings(
+        got <- ensembldb:::formatSeqnamesForQuery(edb, enses)
+    )
+    expect_equal(all(is.na(got)), TRUE)
+
+    ## Now the working example:
+    got <- ensembldb:::formatSeqnamesForQuery(edb, ucscs)
+    expect_equal(got, enses)
+    ## What if one is not mappable:
+    suppressWarnings(
+        got <- ensembldb:::formatSeqnamesForQuery(edb, c(ucscs, "asdfd"),
+                                                  ifNotFound=NA)
+    )
+    expect_equal(got, c(enses, NA))
+})
+
+test_that("formatting seqnames from query works with seqlevelsStyle", {
+    ucscs <- c("chr1", "chr3", "chr1", "chr9", "chrM", "chr1", "chrX")
+    enses <- c("1", "3", "1", "9", "MT", "1", "X")
+    edb <- EnsDb.Hsapiens.v75
+    ## Shouldn't do anything here.
+    seqlevelsStyle(edb)
+    ensembldb:::dbSeqlevelsStyle(edb)
+    got <- ensembldb:::formatSeqnamesFromQuery(edb, enses)
+    expect_equal(got, enses)
+    ## Change the seqlevels to UCSC
+    seqlevelsStyle(edb) <- "UCSC"
+    ## If ifNotFound is not specified we expect an error.
+    options(ensembldb.seqnameNotFound="MISSING")
+    expect_error(ensembldb:::formatSeqnamesFromQuery(edb, ucscs))
+    ## With specifying ifNotFound
+    suppressWarnings(
+        got <- ensembldb:::formatSeqnamesFromQuery(edb, ucscs, ifNotFound=NA)
+    )
+    expect_equal(all(is.na(got)), TRUE)
+    ## Same using options
+    options(ensembldb.seqnameNotFound=NA)
+    suppressWarnings(
+        got <- ensembldb:::formatSeqnamesFromQuery(edb, ucscs)
+    )
+    expect_equal(all(is.na(got)), TRUE)
+    ## Now the working example:
+    got <- ensembldb:::formatSeqnamesFromQuery(edb, enses)
+    expect_equal(got, ucscs)
+    ## What if one is not mappable:
+    suppressWarnings(
+        got <- ensembldb:::formatSeqnamesFromQuery(edb, c(enses, "asdfd"),
+                                                   ifNotFound=NA)
+    )
+    expect_equal(got, c(ucscs, NA))
+    suppressWarnings(
+        got <- ensembldb:::formatSeqnamesFromQuery(edb, c(enses, "asdfd"))
+    )
+    expect_equal(got, c(ucscs, NA))
+})
+
+test_that("prefixChromName works", {
+    res <- ensembldb:::ucscToEns("chrY")
+    expect_equal(res, "Y")
+    res <- ensembldb:::prefixChromName("Y")
+    expect_equal(res, "Y")
+    useU <- getOption("ucscChromosomeNames", default = FALSE)
+    options(ucscChromosomeNames = TRUE)
+    res <- ensembldb:::prefixChromName("Y")
+    expect_equal(res, "chrY")
+    options(ucscChromosomeNames = useU)
+})
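The seqlevelsStyle tests above check two behaviours. The first is how UCSC-style names (`chrX`, `chrM`) are mapped to Ensembl names (`X`, `MT`), depending on the `ensembldb.seqnameNotFound` option (`NA`, `"ORIGINAL"` or `"MISSING"`). The second is the SQL condition that `ensDbQuery` is expected to build for a `SeqNameFilter` (`"gene.seq_name = 'X'"` for one value, `"gene.seq_name in ('X','Y')"` for several).

The base-R sketch below mimics both behaviours under explicit assumptions:

-   `ucsc_to_ensembl()` and `seqname_where()` are hypothetical helpers and not ensembldb functions.
-   The lookup table covers only the primary human chromosomes.
-   The warning and error messages are invented.

It only illustrates the behaviour the tests expect. It is not how ensembldb implements `formatSeqnamesForQuery` or `ensDbQuery`.

```r
## Hypothetical helpers illustrating the behaviour exercised by
## test_seqLevelStyle.R; they are NOT part of the ensembldb API.

## Map UCSC-style chromosome names to Ensembl-style names. Only the primary
## human chromosomes (chr1-chr22, chrX, chrY, chrM) are covered (assumption).
ucsc_to_ensembl <- function(x, ifNotFound = NA) {
    ens  <- c(as.character(1:22), "X", "Y", "MT")
    ucsc <- paste0("chr", c(1:22, "X", "Y", "M"))
    res <- ens[match(x, ucsc)]
    notFound <- is.na(res)
    if (any(notFound)) {
        bad <- paste(unique(x[notFound]), collapse = ", ")
        ## "MISSING": unmappable names are an error.
        if (identical(ifNotFound, "MISSING"))
            stop("Could not map seqnames: ", bad)
        warning("Could not map seqnames: ", bad)
        if (identical(ifNotFound, "ORIGINAL")) {
            ## "ORIGINAL": keep the unmappable input names as they are.
            res[notFound] <- x[notFound]
        } else {
            ## Anything else (e.g. NA) is used as the replacement value.
            res[notFound] <- ifNotFound
        }
    }
    res
}

## Build the SQL condition on gene.seq_name in the format the tests expect
## from ensDbQuery(): "=" for a single value, "in (...)" for several.
seqname_where <- function(values, ifNotFound = "MISSING") {
    ens <- ucsc_to_ensembl(values, ifNotFound)
    if (anyNA(ens))
        stop("Cannot build a query condition for unmapped seqnames")
    quoted <- paste0("'", ens, "'")
    if (length(quoted) == 1) {
        paste0("gene.seq_name = ", quoted)
    } else {
        paste0("gene.seq_name in (", paste(quoted, collapse = ","), ")")
    }
}

ucsc_to_ensembl(c("chr1", "chrM", "chrX"))
## returns c("1", "MT", "X")
seqname_where("chrX")
## returns "gene.seq_name = 'X'"
suppressWarnings(seqname_where(c("chrX", "Y"), ifNotFound = "ORIGINAL"))
## returns "gene.seq_name in ('X','Y')"
```

With `"ORIGINAL"`, a name that cannot be mapped (`"Y"` above) is passed through unchanged, which matches the expectation at the end of the `SeqNameFilter` test. With `"MISSING"`, the same call stops with an error.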
diff --git a/tests/testthat/test_validity.R b/tests/testthat/test_validity.R
new file mode 100644
index 0000000..b55a7ce
--- /dev/null
+++ b/tests/testthat/test_validity.R
@@ -0,0 +1,20 @@
+
+test_that("validity functions work", {
+    OK <- ensembldb:::dbHasRequiredTables(dbconn(edb))
+    expect_true(OK)
+    ## Check the tables
+    OK <- ensembldb:::dbHasValidTables(dbconn(edb))
+    expect_true(OK)
+})
+
+test_that("validateEnsDb works", {
+    expect_true(ensembldb:::validateEnsDb(edb))
+})
+
+test_that("compareProteins works", {
+    if (hasProteinData(edb)) {
+        res <- ensembldb:::compareProteins(edb, edb)
+        expect_equal(res, "OK")
+    }
+})
+
diff --git a/vignettes/MySQL-backend.Rmd b/vignettes/MySQL-backend.Rmd
index 0acd514..de512cb 100644
--- a/vignettes/MySQL-backend.Rmd
+++ b/vignettes/MySQL-backend.Rmd
@@ -1,21 +1,17 @@
 ---
 title: "Using a MySQL server backend"
-graphics: yes
+author: "Johannes Rainer"
+package: ensembldb
 output:
-  BiocStyle::html_document2
+  BiocStyle::html_document2:
+    toc_float: true
 vignette: >
   %\VignetteIndexEntry{Using a MySQL server backend}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
   %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle}
-  %\VignettePackage{ensembldb}
-  %\VignetteKeywords{annotation,database}
 ---
 
-**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
-**Authors**: `r packageDescription("ensembldb")$Author`<br />
-**Modified**: 20 September, 2016<br />
-**Compiled**: `r date()`
 
 # Introduction
 
@@ -31,12 +27,13 @@ the individual clients.
 **Note** the code in this document is not executed during vignette generation as
 this would require access to a MySQL server.
 
+
 # Using `ensembldb` with a MySQL server
 
 Installation of `EnsDb` databases in a MySQL server is straight forward - given
 that the user has write access to the server:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 library(ensembldb)
 ## Load the EnsDb package that should be installed on the MySQL server
 library(EnsDb.Hsapiens.v75)
@@ -55,7 +52,7 @@ R-package, the connection to the database can be passed to the `EnsDb` construct
 function. With the resulting `EnsDb` object annotations can be retrieved from the
 MySQL database.
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 library(ensembldb)
 library(RMySQL)
 
@@ -72,3 +69,4 @@ dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
 edb <- EnsDb(dbcon)
 edb
 ```
+
diff --git a/vignettes/MySQL-backend.org b/vignettes/MySQL-backend.org
index 50dd77d..9b0d164 100644
--- a/vignettes/MySQL-backend.org
+++ b/vignettes/MySQL-backend.org
@@ -2,33 +2,24 @@
 #+AUTHOR:    Johannes Rainer
 #+EMAIL:     johannes.rainer at eurac.edu
 #+OPTIONS: ^:{} toc:nil
-#+PROPERTY: exports code
-#+PROPERTY: session *R*
+#+PROPERTY: header-args :exports code
+#+PROPERTY: header-args+ :session *R*
 
-#+BEGIN_html
+#+BEGIN_EXPORT html
 ---
 title: "Using a MySQL server backend"
-graphics: yes
+author: "Johannes Rainer"
+package: ensembldb
 output:
-  BiocStyle::html_document2
+  BiocStyle::html_document2:
+    toc_float: true
 vignette: >
   %\VignetteIndexEntry{Using a MySQL server backend}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
   %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle}
-  %\VignettePackage{ensembldb}
-  %\VignetteKeywords{annotation,database}
 ---
-#+END_html
-
-# #+BEGIN_EXPORT html
-
-#+BEGIN_html
-**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
-**Authors**: `r packageDescription("ensembldb")$Author`<br />
-**Modified**: 20 September, 2016<br />
-**Compiled**: `r date()`
-#+END_html
+#+END_EXPORT
 
 ** Introduction
 
@@ -49,7 +40,7 @@ this would require access to a MySQL server.
 Installation of =EnsDb= databases in a MySQL server is straight forward - given
 that the user has write access to the server:
 
-#+BEGIN_SRC R :ravel eval=FALSE
+#+BEGIN_SRC R :ravel eval = FALSE
   library(ensembldb)
   ## Load the EnsDb package that should be installed on the MySQL server
   library(EnsDb.Hsapiens.v75)
@@ -68,7 +59,7 @@ R-package, the connection to the database can be passed to the =EnsDb= construct
 function. With the resulting =EnsDb= object annotations can be retrieved from the
 MySQL database.
 
-#+BEGIN_SRC R :ravel eval=FALSE
+#+BEGIN_SRC R :ravel eval = FALSE
   library(ensembldb)
   library(RMySQL)
 
diff --git a/vignettes/ensembldb.Rmd b/vignettes/ensembldb.Rmd
index 44420d6..7bbf10c 100644
--- a/vignettes/ensembldb.Rmd
+++ b/vignettes/ensembldb.Rmd
@@ -1,37 +1,33 @@
 ---
 title: "Generating an using Ensembl based annotation packages"
+author: "Johannes Rainer"
 graphics: yes
+package: ensembldb
 output:
-  BiocStyle::html_document2
+  BiocStyle::html_document2:
+    toc_float: true
 vignette: >
   %\VignetteIndexEntry{Generating an using Ensembl based annotation packages}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
-  %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,Gviz,BiocStyle}
-  %\VignettePackage{ensembldb}
-  %\VignetteKeywords{annotation,database}
+  %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle,AnnotationHub,ggbio,Gviz}
 ---
 
-**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
-**Authors**: `r packageDescription("ensembldb")$Author`<br />
-**Modified**: 12 September, 2016<br />
-**Compiled**: `r date()`
 
 # Introduction
 
 The `ensembldb` package provides functions to create and use transcript centric
 annotation databases/packages. The annotation for the databases are directly
-fetched from Ensembl <sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> using their Perl API.  The functionality and data is
-similar to that of the `TxDb` packages from the `GenomicFeatures` package, but,
-in addition to retrieve all gene/transcript models and annotations from the
+fetched from Ensembl <sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> using their Perl API. The functionality and data are
+similar to those of the `TxDb` packages from the `GenomicFeatures` package, but, in
+addition to retrieving all gene/transcript models and annotations from the
 database, the `ensembldb` package provides also a filter framework allowing to
 retrieve annotations for specific entries like genes encoded on a chromosome
-region or transcript models of lincRNA genes.  In the databases, along with the
-gene and transcript models and their chromosomal coordinates, additional
-annotations including the gene name (symbol) and NCBI Entrezgene identifiers as
-well as the gene and transcript biotypes are stored too (see Section
-[11](#orgtarget1) for the database layout and an overview of available
-attributes/columns).
+region or transcript models of lincRNA genes. From version 1.7 on, `EnsDb`
+databases created by the `ensembldb` package also contain protein annotation data
+(see Section [11](#org35014ed) for the database layout and an overview of
+available attributes/columns). For more information on the use of the protein
+annotations refer to the *proteins* vignette.
 
 Another main goal of this package is to generate *versioned* annotation
 packages, i.e. annotation packages that are build for a specific Ensembl
@@ -43,10 +39,10 @@ also allows to load multiple annotation packages at the same time in order to
 e.g. compare gene models between Ensembl releases.
 
 In the example below we load an Ensembl based annotation package for Homo
-sapiens, Ensembl version 75. The connection to the database is bound to the
-variable `EnsDb.Hsapiens.v75`.
+sapiens, Ensembl version 75. The `EnsDb` object providing access to the underlying
+SQLite database is bound to the variable name `EnsDb.Hsapiens.v75`.
 
-```{r warning=FALSE, message=FALSE}
+```{r load-libs, warning=FALSE, message=FALSE}
 library(EnsDb.Hsapiens.v75)
 
 ## Making a "short cut"
@@ -54,72 +50,107 @@ edb <- EnsDb.Hsapiens.v75
 ## print some informations for this package
 edb
 
-## for what organism was the database generated?
+## For what organism was the database generated?
 organism(edb)
 ```
 
+```{r no-network, echo = FALSE, results = "hide"}
+## Disable code chunks that require a network connection. This avoids
+## TIMEOUT errors on the Bioconductor Windows build machine (issue #47);
+## note that network access is currently disabled on all platforms.
+use_network <- FALSE
+```
+
+
 # Using `ensembldb` annotation packages to retrieve specific annotations
 
-The `ensembldb` package provides a set of filter objects allowing to specify
-which entries should be fetched from the database. The complete list of filters,
-which can be used individually or can be combined, is shown below (in
-alphabetical order):
+One of the strengths of the `ensembldb` package and the related `EnsDb` databases is
+its implementation of a filter framework that enables efficient extraction of
+data subsets from the databases. The `ensembldb` package supports most of the
+filters defined in the `AnnotationFilter` Bioconductor package and defines some
+additional filters specific to the data stored in `EnsDb` databases. The
+`supportedFilters` method gives an overview of all supported filter
+classes, each of which (except the `GRangesFilter`) works on a single
+column/field in the database.
+
+```{r filters}
+supportedFilters(edb)
+```
+
+These filters can be divided into 3 main filter types:
 
--   `ExonidFilter`: allows to filter the result based on the (Ensembl) exon
-    identifiers.
--   `ExonrankFilter`: filter results on the rank (index) of an exon within the
+-   `IntegerFilter`: filter classes extending this basic object can take a single
+    numeric value as input and support the conditions `==`, `!=`, `>`, `<`, `>=` and
+    `<=`. All filters that work on chromosomal coordinates, such as the
+    `GeneEndFilter`, extend `IntegerFilter`.
+-   `CharacterFilter`: filter classes extending this object can take a single or
+    multiple character values as input and allow the conditions `==`, `!=`,
+    `"startsWith"` and `"endsWith"`. All filters working on IDs extend this class.
+-   `GRangesFilter`: takes a `GRanges` object as input and supports all conditions
+    that `findOverlaps` from the `IRanges` package supports ("any", "start", "end",
+    "within", "equal"). Note that these have to be passed using the parameter `type`
+    to the constructor function.
+
+The supported filters are:
+
+-   `EntrezFilter`: filter based on the NCBI Entrezgene identifiers of the
+    genes.
+-   `ExonEndFilter`: filter using the chromosomal end coordinate of exons.
+-   `ExonIdFilter`: filter based on the (Ensembl) exon identifiers.
+-   `ExonRankFilter`: filter based on the rank (index) of an exon within the
     transcript model. Exons are always numbered from 5' to 3' end of the
     transcript, thus, also on the reverse strand, the exon 1 is the most 5' exon
     of the transcript.
--   `EntrezidFilter`: allows to filter results based on NCBI Entrezgene
-    identifiers of the genes.
--   `GenebiotypeFilter`: allows to filter for the gene biotypes defined in the
-    Ensembl database; use the `listGenebiotypes` method to list all available
-    biotypes.
--   `GeneidFilter`: allows to filter based on the Ensembl gene IDs.
--   `GenenameFilter`: allows to filter based on the names (symbols) of the genes.
--   `SymbolFilter`: allows to filter on gene symbols; note that no database columns
-    *symbol* is available in an `EnsDb` database and hence the gene name is used for
-    filtering.
+-   `ExonStartFilter`: filter using the chromosomal start coordinate of exons.
+-   `GeneBiotypeFilter`: filter using the gene biotypes defined in the Ensembl
+    database; use the `listGenebiotypes` method to list all available biotypes.
+-   `GeneEndFilter`: filter using the chromosomal end coordinate of genes.
+-   `GeneIdFilter`: filter based on the Ensembl gene IDs.
+-   `GenenameFilter`: filter based on the names (symbols) of the genes.
+-   `GeneStartFilter`: filter using the chromosomal start coordinate of genes.
 -   `GRangesFilter`: allows to retrieve all features (genes, transcripts or exons)
-    that are either within (setting `condition` to "within") or partially
-    overlapping (setting `condition` to "overlapping") the defined genomic
-    region/range. Note that, depending on the called method (`genes`, `transcripts`
-    or `exons`) the start and end coordinates of either the genes, transcripts or
-    exons are used for the filter. For methods `exonsBy`, `cdsBy` and `txBy` the
-    coordinates of `by` are used.
--   `SeqendFilter`: filter based on the chromosomal end coordinate of the exons,
-    transcripts or genes (correspondingly set =feature = "exon"=, =feature = "tx"= or
-    =feature = "gene"=).
--   `SeqnameFilter`: filter by the name of the chromosomes the genes are encoded
+    that are either within (setting parameter `type` to "within") or partially
+    overlapping (setting `type` to "any") the defined genomic region/range. Note
+    that, depending on the called method (`genes`, `transcripts` or `exons`) the start
+    and end coordinates of either the genes, transcripts or exons are used for the
+    filter. For methods `exonsBy`, `cdsBy` and `txBy` the coordinates of `by` are used.
+-   `SeqNameFilter`: filter by the name of the chromosomes the genes are encoded
     on.
--   `SeqstartFilter`: filter based on the chromosomal start coordinates of the
-    exons, transcripts or genes (correspondingly set =feature = "exon"=,
-    =feature = "tx"= or =feature = "gene"=).
--   `SeqstrandFilter`: filter for the chromosome strand on which the genes are
+-   `SeqStrandFilter`: filter for the chromosome strand on which the genes are
     encoded.
--   `TxbiotypeFilter`: filter on the transcript biotype defined in Ensembl; use
+-   `SymbolFilter`: filter on gene symbols; note that no database column *symbol* is
+    available in an `EnsDb` database and hence the gene name is used for filtering.
+-   `TxBiotypeFilter`: filter on the transcript biotype defined in Ensembl; use
     the `listTxbiotypes` method to list all available biotypes.
--   `TxidFilter`: filter on the Ensembl transcript identifiers.
-
-Each of the filter classes can take a single value or a vector of values (with
-the exception of the `SeqendFilter` and `SeqstartFilter`) for comparison. In
-addition, it is possible to specify the *condition* for the filter,
-e.g. setting `condition` to = to retrieve all entries matching the filter value,
-to != to negate the filter or setting `condition = "like"= to allow
-partial matching. The =condition` parameter for `SeqendFilter` and
-`SeqendFilter` can take the values = , >, >=, < and <= (since these
-filters base on numeric values).
-
-A simple example would be to get all transcripts for the gene *BCL2L11*. To this
-end we specify a `GenenameFilter` with the value *BCL2L11*. As a result we get
-a `GRanges` object with `start`, `end`, `strand` and `seqname` of the `GRanges`
-object being the start coordinate, end coordinate, chromosome name and strand
-for the respective transcripts. All additional annotations are available as
-metadata columns. Alternatively, by setting `return.type` to "DataFrame", or
-"data.frame" the method would return a `DataFrame` or `data.frame` object.
-
-```{r }
+-   `TxEndFilter`: filter using the chromosomal end coordinate of transcripts.
+-   `TxIdFilter`: filter on the Ensembl transcript identifiers.
+-   `TxNameFilter`: filter on the Ensembl transcript names (currently identical to
+    the transcript IDs).
+-   `TxStartFilter`: filter using the chromosomal start coordinate of transcripts.
+
+In addition to the above listed *DNA-RNA-based* filters, *protein-specific*
+filters are also available: 
+
+-   `ProtDomIdFilter`: filter by the protein domain ID.
+-   `ProteinIdFilter`: filter by the Ensembl protein ID.
+-   `UniprotDbFilter`: filter by the name of the Uniprot database.
+-   `UniprotFilter`: filter by the Uniprot ID.
+-   `UniprotMappingTypeFilter`: filter by the mapping type of Ensembl protein IDs to
+    Uniprot IDs.
+
+These can however only be used on `EnsDb` databases that provide protein
+annotations, i.e. for which a call to `hasProteinData` returns `TRUE`.
+
+A simple use case for the filter framework would be to get all transcripts for
+the gene *BCL2L11*. To this end we specify a `GenenameFilter` with the value
+*BCL2L11*. As a result we get a `GRanges` object with `start`, `end`, `strand` and `seqname`
+being the start coordinate, end coordinate, strand and chromosome name for the
+respective transcripts. All additional annotations are available as metadata
+columns. Alternatively, by setting `return.type` to "DataFrame", or "data.frame"
+the method would return a `DataFrame` or `data.frame` object instead of the default
+`GRanges`.
+
+```{r transcripts}
 Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
 
 Tx
@@ -131,22 +162,34 @@ head(start(Tx))
 head(Tx$tx_biotype)
 ```
 
-The parameter `columns` of the `exons`, `genes` and `transcripts` method allows
-to specify which database attributes (columns) should be retrieved. The `exons`
-method returns by default all exon-related columns, the `transcripts` all columns
-from the transcript database table and the `genes` all from the gene table. Note
-however that in the example above we got also a column `gene_name` although this
-column is not present in the transcript database table. By default the methods
-return also all columns that are used by any of the filters submitted with the
-`filter` argument (thus, because a `GenenameFilter` was used, the column `gene_name`
-is also returned). Setting `returnFilterColumns(edb) <- FALSE` disables this
-option and only the columns specified by the `columns` parameter are retrieved.
+The parameter `columns` of the extractor methods (such as `exons`, `genes` or
+`transcripts`) allows specifying which database attributes (columns) should be
+retrieved. By default, the `exons` method returns all exon-related columns, the
+`transcripts` method all columns from the transcript database table and the `genes`
+method all columns from the gene table. Note however that in the example above we
+also got a column `gene_name`, although this column is not present in the transcript
+database table. By default the methods also return all columns that are used by any of
+the filters submitted with the `filter` argument (thus, because a `GenenameFilter`
+was used, the column `gene_name` is also returned). Setting
+`returnFilterColumns(edb) <- FALSE` disables this option and only the columns
+specified by the `columns` parameter are retrieved.
+
+Instead of passing a filter *object* to the method it is also possible to provide
+a filter *expression* written as a `formula`.
+
+```{r transcripts-filter-expression}
+## Use a filter expression to perform the filtering.
+transcripts(edb, filter = ~ genename == "ZBTB16")
+```
+
+Filter expressions have to be written as a formula (i.e. starting with a `~`) in
+the form *column name* followed by the logical condition.
 
 To get an overview of database tables and available columns the function
 `listTables` can be used. The method `listColumns` on the other hand lists columns
 for the specified database table.
 
-```{r }
+```{r list-columns}
 ## list all database tables along with their columns
 listTables(edb)
 
@@ -161,10 +204,10 @@ the name of the gene for each transcript. Note that we are changing here the
 `return.type` to `DataFrame`, so the method will return a `DataFrame` with the
 results instead of the default `GRanges`.
 
-```{r }
+```{r transcripts-example2}
 Tx <- transcripts(edb,
 		  columns = c(listColumns(edb , "tx"), "gene_name"),
-		  filter = TxbiotypeFilter("nonsense_mediated_decay"),
+		  filter = TxBiotypeFilter("nonsense_mediated_decay"),
 		  return.type = "DataFrame")
 nrow(Tx)
 Tx
@@ -174,8 +217,8 @@ For protein coding transcripts, we can also specifically extract their coding
 region. In the example below we extract the CDS for all transcripts encoded on
 chromosome Y.
 
-```{r }
-yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+```{r cdsBy}
+yCds <- cdsBy(edb, filter = SeqNameFilter("Y"))
 yCds
 ```
 
@@ -185,10 +228,10 @@ below we query all genes that are partially overlapping with a small region on
 chromosome 11. The filter restricts to all genes for which either an exon or an
 intron is partially overlapping with the region.
 
-```{r }
+```{r genes-GRangesFilter}
 ## Define the filter
 grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
-			     strand = "+"), condition = "overlapping")
+			     strand = "+"), type = "any")
 
 ## Query genes:
 gn <- genes(edb, filter = grf)
@@ -217,7 +260,7 @@ region. Below we fetch these 4 transcripts. Note, that a call to `exons` will
 not return any features from the database, as no exon is overlapping with the
 region.
 
-```{r }
+```{r transcripts-GRangesFilter}
 transcripts(edb, filter = grf)
 ```
 
@@ -229,11 +272,11 @@ overlapping genomic regions using the `exonsByOverlaps` or
 implementation of these methods for `EnsDb` objects supports also to use filters
 to further fine-tune the query.
 
-To get an overview of allowed/available gene and transcript biotype the
-functions `listGenebiotypes` and `listTxbiotypes` can be used.
+The functions `listGenebiotypes` and `listTxbiotypes` can be used to get an overview
+of the allowed/available gene and transcript biotypes.
 
-```{r }
-## Get all gene biotypes from the database. The GenebiotypeFilter
+```{r biotypes}
+## Get all gene biotypes from the database. The GeneBiotypeFilter
 ## allows to filter on these values.
 listGenebiotypes(edb)
 
@@ -245,13 +288,13 @@ Data can be fetched in an analogous way using the `exons` and `genes`
 methods. In the example below we retrieve `gene_name`, `entrezid` and the
 `gene_biotype` of all genes in the database which names start with "BCL2".
 
-```{r }
+```{r genes-BCL2}
 ## We're going to fetch all genes which names start with BCL. To this end
 ## we define a GenenameFilter with partial matching, i.e. condition "like"
 ## and a % for any character/string.
 BCLs <- genes(edb,
 	      columns = c("gene_name", "entrezid", "gene_biotype"),
-	      filter = list(GenenameFilter("BCL%", condition = "like")),
+	      filter = GenenameFilter("BCL", condition = "startsWith"),
 	      return.type = "DataFrame")
 nrow(BCLs)
 BCLs
@@ -261,20 +304,21 @@ Sometimes it might be useful to know the length of genes or transcripts
 (i.e. the total sum of nucleotides covered by their exons). Below we calculate
 the mean length of transcripts from protein coding genes on chromosomes X and Y
 as well as the average length of snoRNA, snRNA and rRNA transcripts encoded on
-these chromosomes.
+these chromosomes. For the first query we combine two `AnnotationFilter` objects
+using an `AnnotationFilterList` object; in the second we define the query using a
+filter expression.
 
-```{r }
+```{r example-AnnotationFilterList}
 ## determine the average length of snRNA, snoRNA and rRNA genes encoded on
 ## chromosomes X and Y.
-mean(lengthOf(edb, of = "tx",
-	      filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
-			    SeqnameFilter(c("X", "Y")))))
+mean(lengthOf(edb, of = "tx", filter = AnnotationFilterList(
+				  GeneBiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+				  SeqNameFilter(c("X", "Y")))))
 
 ## determine the average length of protein coding genes encoded on the same
 ## chromosomes.
-mean(lengthOf(edb, of = "tx",
-	      filter = list(GenebiotypeFilter("protein_coding"),
-			    SeqnameFilter(c("X", "Y")))))
+mean(lengthOf(edb, of = "tx", filter = ~ gene_biotype == "protein_coding" &
+				  seq_name %in% c("X", "Y")))
 ```
 
 Not unexpectedly, transcripts of protein coding genes are longer than those of
@@ -283,14 +327,15 @@ snRNA, snoRNA or rRNA genes.
 At last we extract the first two exons of each transcript model from the
 database.
 
-```{r }
+```{r example-first-two-exons}
 ## Extract all exons 1 and (if present) 2 for all genes encoded on the
 ## Y chromosome
 exons(edb, columns = c("tx_id", "exon_idx"),
-      filter = list(SeqnameFilter("Y"),
-		    ExonrankFilter(3, condition = "<")))
+      filter = list(SeqNameFilter("Y"),
+		    ExonRankFilter(3, condition = "<")))
 ```
 
+
 # Extracting gene/transcript/exon models for RNASeq feature counting
 
 For the feature counting step of an RNAseq experiment, the gene or transcript
@@ -307,10 +352,8 @@ CDS.
 A simple use case is to retrieve all genes encoded on chromosomes X and Y from
 the database.
 
-```{r }
-TxByGns <- transcriptsBy(edb, by = "gene",
-			 filter = list(SeqnameFilter(c("X", "Y")))
-			 )
+```{r transcriptsBy-X-Y}
+TxByGns <- transcriptsBy(edb, by = "gene", filter = SeqNameFilter(c("X", "Y")))
 TxByGns
 ```
 
@@ -319,17 +362,17 @@ Since Ensembl contains also definitions of genes that are on chromosome variants
 gene models should be returned.
 
 In a real use case, we might thus want to retrieve all genes encoded on the
-*standard* chromosomes. In addition it is advisable to use a `GeneidFilter` to
+*standard* chromosomes. In addition it is advisable to use a `GeneIdFilter` to
 restrict to Ensembl genes only, as also *LRG* (Locus Reference Genomic)
 genes<sup><a id="fnr.2" class="footref" href="#fn.2">2</a></sup> are defined in the database, which are partially redundant with
 Ensembl genes.
 
-```{r eval=FALSE}
+```{r exonsBy-RNAseq, message = FALSE, eval = FALSE}
 ## will just get exons for all genes on chromosomes 1 to 22, X and Y.
 ## Note: want to get rid of the "LRG" genes!!!
-EnsGenes <- exonsBy(edb, by = "gene",
-		    filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-				  GeneidFilter("ENSG%", "like")))
+EnsGenes <- exonsBy(edb, by = "gene", filter = AnnotationFilterList(
+					  SeqNameFilter(c(1:22, "X", "Y")),
+					  GeneIdFilter("ENSG", "startsWith")))
 ```
 
 The code above returns a `GRangesList` that can be used directly as an input for
@@ -339,7 +382,7 @@ Alternatively, the above `GRangesList` can be transformed to a `data.frame` in
 *SAF* format that can be used as an input to the `featureCounts` function of the
 `Rsubread` package <sup><a id="fnr.4" class="footref" href="#fn.4">4</a></sup>.
 
-```{r eval=FALSE}
+```{r toSAF-RNAseq, message = FALSE, eval=FALSE}
 ## Transforming the GRangesList into a data.frame in SAF format
 EnsGenes.SAF <- toSAF(EnsGenes)
 ```
@@ -353,13 +396,14 @@ In addition, the `disjointExons` function (similar to the one defined in
 `GenomicFeatures`) can be used to generate a `GRanges` of non-overlapping exon
 parts which can be used in the `DEXSeq` package.
 
-```{r eval=FALSE}
+```{r disjointExons, message = FALSE, eval=FALSE}
 ## Create a GRanges of non-overlapping exon parts.
-DJE <- disjointExons(edb,
-		     filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-				   GeneidFilter("ENSG%", "like")))
+DJE <- disjointExons(edb, filter = AnnotationFilterList(
+			      SeqNameFilter(c(1:22, "X", "Y")),
+			      GeneIdFilter("ENSG", "startsWith")))
 ```
 
+
 # Retrieving sequences for gene/transcript/exon models
 
 The methods to retrieve exons, transcripts and genes (i.e. `exons`, `transcripts`
@@ -381,7 +425,7 @@ the package, subset to genes encoded on sequences available in the `FaFile` and
 extract all of their sequences. Note: these sequences represent the sequence
 between the chromosomal start and end coordinates of the gene.
 
-```{r eval=FALSE}
+```{r transcript-sequence-AnnotationHub, message = FALSE, eval = FALSE}
 library(EnsDb.Hsapiens.v75)
 library(Rsamtools)
 edb <- EnsDb.Hsapiens.v75
@@ -405,9 +449,9 @@ To retrieve the (exonic) sequence of transcripts (i.e. without introns) we can
 use directly the `extractTranscriptSeqs` method defined in the `GenomicFeatures` on
 the `EnsDb` object, eventually using a filter to restrict the query.
 
-```{r eval=FALSE}
+```{r transcript-sequence-extractTranscriptSeqs, message = FALSE, eval = FALSE}
 ## get all exons of all transcripts encoded on chromosome Y
-yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+yTx <- exonsBy(edb, filter = SeqNameFilter("Y"))
 
 ## Retrieve the sequences for these transcripts from the FaFile.
 library(GenomicFeatures)
@@ -415,17 +459,18 @@ yTxSeqs <- extractTranscriptSeqs(Dna, yTx)
 yTxSeqs
 
 ## Extract the sequences of all transcripts encoded on chromosome Y.
-yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqNameFilter("Y"))
 
 ## Along these lines, we could use the method also to retrieve the coding sequence
 ## of all transcripts on the Y chromosome.
-cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+cdsY <- cdsBy(edb, filter = SeqNameFilter("Y"))
 extractTranscriptSeqs(Dna, cdsY)
 ```
 
 Note: in the next section we describe how transcript sequences can be retrieved
 from a `BSgenome` package that is based on UCSC, not Ensembl.
 
+
 # Integrating annotations from Ensembl based  `EnsDb` packages with UCSC based annotations
 
 Sometimes it might be useful to combine (Ensembl based) annotations from `EnsDb`
@@ -440,12 +485,12 @@ UCSC, NCBI and Ensembl chromosome names for the *main* chromosomes).
 
 In the example below we change the seqnames style to UCSC.
 
-```{r message=FALSE}
+```{r seqlevelsStyle, message = FALSE}
 ## Change the seqlevels style form Ensembl (default) to UCSC:
 seqlevelsStyle(edb) <- "UCSC"
 
-## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
-genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+## Now we can use UCSC style seqnames in SeqNameFilters or GRangesFilter:
+genesY <- genes(edb, filter = ~ seq_name == "chrY")
 ## The seqlevels of the returned GRanges are also in UCSC style
 seqlevels(genesY)
 ```
@@ -459,7 +504,7 @@ ones from Ensembl) are returned. With `ensembldb.seqnameNotFound` "MISSING" each
 time a seqname can not be found an error is thrown. For all other cases
 (e.g. `ensembldb.seqnameNotFound = NA`) the value of the option is returned.
 
-```{r }
+```{r seqlevelsStyle-2, message = FALSE}
 seqlevelsStyle(edb) <- "UCSC"
 
 ## Getting the default option:
@@ -483,7 +528,7 @@ the `BSGenome` package for the human genome from UCSC. The specified version
 while we changed the style of the seqnames to UCSC we did not change the naming
 of the genome release.
 
-```{r warning=FALSE, message=FALSE}
+```{r extractTranscriptSeqs-BSGenome, warning = FALSE, message = FALSE}
 library(BSgenome.Hsapiens.UCSC.hg19)
 bsg <- BSgenome.Hsapiens.UCSC.hg19
 
@@ -493,22 +538,25 @@ unique(genome(edb))
 ## Although differently named, both represent genome build GRCh37.
 
 ## Extract the full transcript sequences.
-yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx",
+					      filter = SeqNameFilter("chrY")))
 
 yTxSeqs
 
 ## Extract just the CDS
-Test <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
-yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+Test <- cdsBy(edb, "tx", filter = SeqNameFilter("chrY"))
+yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx",
+					   filter = SeqNameFilter("chrY")))
 yTxCds
 ```
 
-At last changing the seqname style to the default value ="Ensembl"=.
+Finally, we change the seqnames style back to the default value `"Ensembl"`.
 
-```{r }
+```{r seqlevelsStyle-restore}
 seqlevelsStyle(edb) <- "Ensembl"
 ```
 
+
 # Interactive annotation lookup using the `shiny` web app
 
 In addition to the `genes`, `transcripts` and `exons` methods it is possibly to
@@ -517,7 +565,8 @@ search interactively for gene/transcript/exon annotations using the internal,
 `runEnsDbApp()` function. The search results from this app can also be returned
 to the R workspace either as a `data.frame` or `GRanges` object.
 
-# Plotting gene/transcript features using `ensembldb` and `Gviz`
+
+# Plotting gene/transcript features using `ensembldb`, `Gviz` and `ggbio`
 
 The `Gviz` package provides functions to plot genes and transcripts along with
 other data on a genomic scale. Gene models can be provided either as a
@@ -535,7 +584,7 @@ not necessary if we just want to retrieve gene models from an `EnsDb` object, as
 the `ensembldb` package internally checks the `ucscChromosomeNames` option and,
 depending on that, maps Ensembl chromosome names to UCSC chromosome names.
 
-```{r gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25}
+```{r gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.3}
 ## Loading the Gviz library
 library(Gviz)
 library(EnsDb.Hsapiens.v75)
@@ -560,7 +609,7 @@ options(ucscChromosomeNames = TRUE)
 Above we had to change the option `ucscChromosomeNames` to `FALSE` in order to
 use it with non-UCSC chromosome names. Alternatively, we could however also
 change the `seqnamesStyle` of the `EnsDb` object to `UCSC`. Note that we have to
-use now also chromosome names in the *UCSC style* in the `SeqnameFilter`
+use now also chromosome names in the *UCSC style* in the `SeqNameFilter`
 (i.e. "chrY" instead of `Y`).
 
 ```{r message=FALSE}
@@ -581,10 +630,10 @@ different gene region tracks, one for protein coding genes and one for lincRNAs.
 ```{r gviz-separate-tracks, message=FALSE, warning=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25}
 protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
 				     start = 20400000, end = 21400000,
-				     filter = GenebiotypeFilter("protein_coding"))
+				     filter = GeneBiotypeFilter("protein_coding"))
 lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
 				   start = 20400000, end = 21400000,
-				   filter = GenebiotypeFilter("lincRNA"))
+				   filter = GeneBiotypeFilter("lincRNA"))
 
 plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
 		GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")
@@ -593,6 +642,28 @@ plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
 seqlevelsStyle <- "Ensembl"
 ```
 
+Alternatively, we can use `ggbio` for plotting. To its `autoplot` method we can directly
+pass the `EnsDb` object along with optional filters (or, as in the example below, a
+filter expression as a `formula`).
+
+```{r pplot-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=4}
+library(ggbio)
+
+## Create a plot for all transcripts of the gene SKA2
+autoplot(edb, ~ genename == "SKA2")
+```
+
+To plot the whole genomic region, including genes encoded on both strands, we can use a
+`GRangesFilter`.
+
+```{r pplot-plot-2, message=FALSE, fig.align='center', fig.width=7.5, fig.height=4}
+## Get the chromosomal region in which the gene is encoded
+ska2 <- genes(edb, filter = ~ genename == "SKA2")
+strand(ska2) <- "*"
+autoplot(edb, GRangesFilter(ska2), names.expr = "gene_name")
+```
+
+
 # Using `EnsDb` objects in the `AnnotationDbi` framework
 
 Most of the methods defined for objects extending the basic annotation package
@@ -605,7 +676,7 @@ In the example below we first evaluate all the available columns and keytypes in
 the database and extract then the gene names for all genes encoded on chromosome
 X.
 
-```{r }
+```{r AnnotationDbi, message = FALSE}
 library(EnsDb.Hsapiens.v75)
 edb <- EnsDb.Hsapiens.v75
 
@@ -626,7 +697,7 @@ gids <- keys(edb, keytype = "GENEID")
 length(gids)
 
 ## Get all gene names for genes encoded on chromosome Y.
-gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+gnames <- keys(edb, keytype = "GENENAME", filter = SeqNameFilter("Y"))
 head(gnames)
 ```
 
@@ -636,14 +707,14 @@ In the next example we retrieve specific information from the database using the
 we employ the filtering system to perform a more fine-grained query to fetch
 only the protein coding transcripts for these genes.
 
-```{r warning=FALSE}
+```{r select, message = FALSE, warning=FALSE}
 ## Use the /standard/ way to fetch data.
 select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GENENAME",
        columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
 
 ## Use the filtering system of ensembldb
-select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-			TxbiotypeFilter("protein_coding")),
+select(edb, keys = ~ genename %in% c("BCL2", "BCL2L11") &
+		tx_biotype == "protein_coding",
        columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
 ```
 
@@ -651,7 +722,7 @@ Finally, we use the `mapIds` method to establish a mapping between ids and
 values. In the example below we fetch transcript ids for the two genes from the
 example above.
 
-```{r }
+```{r mapIds, message = FALSE}
 ## Use the default method, which just returns the first value for multi mappings.
 mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME")
 
@@ -661,13 +732,14 @@ mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME",
 
 ## And, just like before, we can use filters to map only to protein coding transcripts.
 mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-			TxbiotypeFilter("protein_coding")), column = "TXID",
+			TxBiotypeFilter("protein_coding")), column = "TXID",
        multiVals = "list")
 ```
 
 Note that, if the filters are used, the ordering of the result does no longer
 match the ordering of the genes.
 
+
 # Important notes
 
 These notes might explain eventually unexpected results (and, more importantly,
@@ -691,38 +763,79 @@ help avoiding them):
 -   At present, `EnsDb` support only genes/transcripts for which all of their
     exons are encoded on the same chromosome and the same strand.
 
-# Building an transcript-centric database package based on Ensembl annotation
+-   Since a single Ensembl gene ID might be mapped to multiple NCBI Entrezgene IDs,
+    methods such as `genes`, `transcripts` etc. return a `list` in the `"entrezid"` column
+    of the result object.
 
-The code in this section is not supposed to be automatically executed when the
-vignette is built, as this would require a working installation of the Ensembl
-Perl API, which is not expected to be available on each system. Also, building
-`EnsDb` from alternative sources, like GFF or GTF files takes some time and
-thus also these examples are not directly executed when the vignette is build.
 
-## Requirements
+# Getting or building `EnsDb` databases/packages
 
-The `fetchTablesFromEnsembl` function of the package uses the Ensembl Perl API
-to retrieve the required annotations from an Ensembl database (e.g. from the
-main site *ensembldb.ensembl.org*). Thus, to use the functionality to built
-databases, the Ensembl Perl API needs to be installed (see <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup> for details).
+Some of the code in this section is not supposed to be automatically executed
+when the vignette is built, as this would require a working installation of the
+Ensembl Perl API, which is not expected to be available on every system. Also,
+building `EnsDb` databases from alternative sources, like GFF or GTF files, takes some time
+and thus these examples are not directly executed either when the vignette is
+built.
+
+
+## Getting `EnsDb` databases
+
+Some `EnsDb` databases are available as `R` packages from Bioconductor and can be
+simply installed with the `biocLite` function from the `BiocInstaller` package. The
+name of such annotation packages starts with *EnsDb* followed by the abbreviation
+of the organism and the Ensembl version on which the annotation is
+based. `EnsDb.Hsapiens.v86` thus provides an `EnsDb` database for *Homo sapiens* with
+annotations from Ensembl version 86.
+
+Since Bioconductor version 3.5, `EnsDb` databases can also be retrieved directly
+from `AnnotationHub`.
+
+```{r AnnotationHub-query, message = FALSE, eval = use_network}
+library(AnnotationHub)
+## Load the annotation resource.
+ah <- AnnotationHub()
+
+## Query for all available EnsDb databases
+query(ah, "EnsDb")
+```
+
+We can simply fetch one of the databases.
+
+```{r AnnotationHub-query-2, message = FALSE, eval = use_network}
+ahDb <- query(ah, pattern = c("Xiphophorus Maculatus", "EnsDb", 87))
+## What have we got
+ahDb
+```
+
+Fetch the `EnsDb` and use it.
+
+```{r AnnotationHub-fetch, message = FALSE, eval = FALSE}
+ahEdb <- ahDb[[1]]
+
+## retrieve all genes
+gns <- genes(ahEdb)
+```
+
+We could even make an annotation package from this `EnsDb` object using the
+`makeEnsembldbPackage` function and passing `dbfile(dbconn(ahEdb))` as the `ensdb` argument.
 
-Alternatively, the `ensDbFromAH`, `ensDbFromGff`, `ensDbFromGRanges` and `ensDbFromGtf`
-functions allow to build EnsDb SQLite files from a `GRanges` object or GFF/GTF
-files from Ensembl (either provided as files or *via* `AnnotationHub`). These
-functions do not depend on the Ensembl Perl API, but require a working internet
-connection to fetch the chromosome lengths from Ensembl as these are not
-provided within GTF or GFF files.
 
 ## Building annotation packages
 
-The functions below use the Ensembl Perl API to fetch the required data directly
-from the Ensembl core databases. Thus, the path to the Perl API specific for the
-desired Ensembl version needs to be added to the `PERL5LIB` environment variable.
 
-An annotation package containing all human genes for Ensembl version 75 can be
-created using the code in the block below.
+### Directly from Ensembl databases
+
+The `fetchTablesFromEnsembl` function uses the Ensembl Perl API
+to retrieve the required annotations from an Ensembl database (e.g. from the
+main site *ensembldb.ensembl.org*). Thus, to use this functionality to build
+databases, the Ensembl Perl API needs to be installed (see <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup> for details).
+
+Below we create an `EnsDb` database by fetching the required data directly from
+the Ensembl core databases. The `makeEnsembldbPackage` function is then used to
+create an annotation package from this `EnsDb` containing all human genes for
+Ensembl version 75.
 
-```{r eval=FALSE}
+```{r edb-from-ensembl, message = FALSE, eval = FALSE}
 library(ensembldb)
 
 ## get all human gene/transcript/exon annotations from Ensembl (75)
@@ -751,6 +864,20 @@ thaliana), the *Ensembl genomes* should be specified as a host, i.e. setting
 `host` to "mysql-eg-publicsql.ebi.ac.uk", `port` to `4157` and `species` to
 e.g. "arabidopsis thaliana".
 
+
+### From a GTF or GFF file
+
+Alternatively, the `ensDbFromAH`, `ensDbFromGff`, `ensDbFromGRanges` and `ensDbFromGtf`
+functions allow building `EnsDb` SQLite files from a `GRanges` object or GFF/GTF
+files from Ensembl (either provided as files or *via* `AnnotationHub`). These
+functions do not depend on the Ensembl Perl API, but require a working internet
+connection to fetch the chromosome lengths from Ensembl as these are not
+provided within GTF or GFF files. Also note that protein annotations are usually
+not available in GTF or GFF files and thus will not be included
+in the generated `EnsDb` database; protein annotations are only available in
+`EnsDb` databases created with the Ensembl Perl API (such as the ones provided
+through `AnnotationHub` or as Bioconductor packages).
+
 In the next example we create an `EnsDb` database using the `AnnotationHub`
 package and load also the corresponding genomic DNA sequence matching the
 Ensembl version. We thus first query the `AnnotationHub` package for all
@@ -760,7 +887,7 @@ then use the `getGenomeFaFile` method on the `EnsDb` to directly look up and
 retrieve the correct or best matching `FaFile` with the genomic DNA sequence. At
 last we retrieve the sequences of all exons using the `getSeq` method.
 
-```{r eval=FALSE}
+```{r gtf-gff-edb, message = FALSE, eval = FALSE}
 ## Load the AnnotationHub data.
 library(AnnotationHub)
 ah <- AnnotationHub()
@@ -782,7 +909,7 @@ edb <- EnsDb(DbFile)
 Dna <- getGenomeFaFile(edb)
 library(Rsamtools)
 ## We next retrieve the sequence of all exons on chromosome Y.
-exons <- exons(edb, filter = SeqnameFilter("Y"))
+exons <- exons(edb, filter = SeqNameFilter("Y"))
 exonSeq <- getSeq(Dna, exons)
 
 ## Alternatively, look up and retrieve the toplevel DNA sequence manually.
@@ -790,37 +917,37 @@ Dna <- ah[["AH22042"]]
 ```
 
 In the example below we load a `GRanges` containing gene definitions for genes
-encoded on chromosome Y and generate a EnsDb SQLite database from that
+encoded on chromosome Y and generate an `EnsDb` SQLite database from that
 information.
 
-```{r message=FALSE}
+```{r EnsDb-from-Y-GRanges, message = FALSE, eval = use_network}
 ## Generate a sqlite database from a GRanges object specifying
 ## genes encoded on chromosome Y
 load(system.file("YGRanges.RData", package = "ensembldb"))
 Y
 
+## Create the EnsDb database file
 DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
 		       organism = "Homo_sapiens")
 
+## Load the database
 edb <- EnsDb(DB)
 edb
-
-## As shown in the example below, we could make an EnsDb package on
-## this DB object using the makeEnsembldbPackage function.
 ```
 
 Alternatively we can build the annotation database using the `ensDbFromGtf`
-`ensDbFromGff` functions, that extracts most of the required data from a GTF
-respectively GFF (version 3) file which can be downloaded from Ensembl (e.g. from
-<ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens> for human gene definitions
-from Ensembl version 75; for plant genomes etc files can be retrieved from
-<ftp://ftp.ensemblgenomes.org>). All information except the chromosome lengths and
-the NCBI Entrezgene IDs can be extracted from these GTF files. The function also
-tries to retrieve chromosome length information automatically from Ensembl.
+or `ensDbFromGff` functions, which extract most of the required data from a GTF
+or GFF (version 3) file, respectively. Such files can be downloaded from Ensembl
+(e.g. from <ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens> for human gene
+definitions from Ensembl version 75; for plant genomes etc., files can be
+retrieved from <ftp://ftp.ensemblgenomes.org>). All information except the
+chromosome lengths, the NCBI Entrezgene IDs and protein annotations can be
+extracted from these files. The functions also try to retrieve chromosome
+length information automatically from Ensembl.
 
 Below we create the annotation from a gtf file that we fetch directly from Ensembl.
 
-```{r eval=FALSE}
+```{r EnsDb-from-GTF, message = FALSE, eval = FALSE}
 library(ensembldb)
 
 ## the GTF file can be downloaded from
@@ -839,17 +966,23 @@ makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
 		     author = "J Rainer")
 ```
 
-# Database layout<a id="orgtarget1"></a>
+
+# Database layout<a id="org35014ed"></a>
 
 The database consists of the following tables and attributes (the layout is also
-shown in Figure [115](#orgparagraph1)):
+shown in Figure [159](#org6a42233)). Note that the protein-specific annotations
+might not be available in all `EnsDb` databases (e.g. those created with
+`ensembldb` version < 1.7 or created from GTF or GFF files).
 
 -   **gene**: all gene specific annotations.
     -   `gene_id`: the Ensembl ID of the gene.
     -   `gene_name`: the name (symbol) of the gene.
+<<<<<<< variant A
     -   `entrezid`: the NCBI Entrezgene ID(s) of the gene. Note that this can be a
         `;` separated list of IDs for genes that are mapped to more than one
         Entrezgene.
+>>>>>>> variant B
+======= end
     -   `gene_biotype`: the biotype of the gene.
     -   `gene_seq_start`: the start coordinate of the gene on the sequence (usually
         a chromosome).
@@ -858,6 +991,11 @@ shown in Figure [115](#orgparagraph1)):
     -   `seq_strand`: the strand on which the gene is encoded.
     -   `seq_coord_system`: the coordinate system of the sequence.
 
+-   **entrezgene**: mapping of Ensembl genes to NCBI Entrezgene identifiers. Note that
+    this mapping can be a one-to-many mapping.
+    -   `gene_id`: the Ensembl gene ID.
+    -   `entrezid`: the NCBI Entrezgene ID.
+
 -   **tx**: all transcript related annotations. Note that while no `tx_name` column
     is available in this database column, all methods to retrieve data from the
     database support also this column. The returned values are however the ID of
@@ -887,9 +1025,36 @@ shown in Figure [115](#orgparagraph1)):
     -   `seq_length`: the length of the sequence.
     -   `is_circular`: whether the sequence in circular.
 
--   **information**: some additional, internal, informations (Genome build, Ensembl
+-   **protein**: provides protein annotation for a (coding) transcript.
+    -   `protein_id`: the Ensembl protein ID.
+    -   `tx_id`: the ID of the transcript whose CDS encodes the protein.
+    -   `protein_sequence`: the peptide sequence of the protein (translated from the
+        transcript's coding sequence after applying any RNA editing).
+
+-   **uniprot**: provides the mapping from Ensembl protein ID(s) to Uniprot ID(s). Not
+    all Ensembl proteins are mapped to Uniprot IDs, and a single Ensembl protein
+    might be mapped to multiple Uniprot IDs.
+    -   `protein_id`: the Ensembl protein ID.
+    -   `uniprot_id`: the Uniprot ID.
+    -   `uniprot_db`: the Uniprot database in which the ID is defined.
+    -   `uniprot_mapping_type`: the type of the mapping method that was used to assign
+        the Uniprot ID to an Ensembl protein ID.
+
+-   **protein\_domain**: provides protein domain annotations and mapping to proteins.
+    -   `protein_id`: the Ensembl protein ID on which the protein domain is present.
+    -   `protein_domain_id`: the ID of the protein domain (from the protein domain
+        source).
+    -   `protein_domain_source`: the source/analysis method in/by which the protein
+        domain was defined (such as pfam etc).
+    -   `interpro_accession`: the Interpro accession ID of the protein domain.
+    -   `prot_dom_start`: the start position of the protein domain within the
+        protein's sequence.
+    -   `prot_dom_end`: the end position of the protein domain within the protein's
+        sequence.
+
+-   **metadata**: some additional, internal information (genome build, Ensembl
     version etc).
-    -   `key`
+    -   `name`
     -   `value`
 
 -   *virtual* columns:
@@ -897,24 +1062,22 @@ shown in Figure [115](#orgparagraph1)):
         possible to use it in the `columns` parameter. This column is *symlinked* to the
         `gene_name` column.
     -   `tx_name`: similar to the `symbol` column, this column is *symlinked* to the `tx_id`
-            column.
+        column.
 
-![img](images/dblayout.png "Database layout.")
+The database layout: as already described above, protein-related annotations
+(green) might not be available in every `EnsDb` database.
 
-<div id="footnotes">
-<h2 class="footnotes">Footnotes: </h2>
-<div id="text-footnotes">
+![img](images/dblayout.png "Database layout.")
 
-<div class="footdef"><sup><a id="fn.1" class="footnum" href="#fnr.1">1</a></sup> <div class="footpara"><http://www.ensembl.org></div></div>
 
-<div class="footdef"><sup><a id="fn.2" class="footnum" href="#fnr.2">2</a></sup> <div class="footpara"><http://www.lrg-sequence.org></div></div>
+# Footnotes
 
-<div class="footdef"><sup><a id="fn.3" class="footnum" href="#fnr.3">3</a></sup> <div class="footpara"><http://www.ncbi.nlm.nih.gov/pubmed/23950696></div></div>
+<sup><a id="fn.1" href="#fnr.1">1</a></sup> <http://www.ensembl.org>
 
-<div class="footdef"><sup><a id="fn.4" class="footnum" href="#fnr.4">4</a></sup> <div class="footpara"><http://www.ncbi.nlm.nih.gov/pubmed/24227677></div></div>
+<sup><a id="fn.2" href="#fnr.2">2</a></sup> <http://www.lrg-sequence.org>
 
-<div class="footdef"><sup><a id="fn.5" class="footnum" href="#fnr.5">5</a></sup> <div class="footpara"><http://www.ensembl.org/info/docs/api/api_installation.html></div></div>
+<sup><a id="fn.3" href="#fnr.3">3</a></sup> <http://www.ncbi.nlm.nih.gov/pubmed/23950696>
 
+<sup><a id="fn.4" href="#fnr.4">4</a></sup> <http://www.ncbi.nlm.nih.gov/pubmed/24227677>
 
-</div>
-</div>
+<sup><a id="fn.5" href="#fnr.5">5</a></sup> <http://www.ensembl.org/info/docs/api/api_installation.html>
diff --git a/vignettes/ensembldb.org b/vignettes/ensembldb.org
index 554983e..b44a95e 100644
--- a/vignettes/ensembldb.org
+++ b/vignettes/ensembldb.org
@@ -1,48 +1,32 @@
-#+TITLE: Generating and using Ensembl based annotation packages
+#+TITLE: Generating and using Ensembl-based annotation packages
 #+AUTHOR:    Johannes Rainer
 #+EMAIL:     johannes.rainer at eurac.edu
 #+DESCRIPTION:
 #+KEYWORDS:
 #+LANGUAGE:  en
 #+OPTIONS: ^:{} toc:nil
-#+PROPERTY: exports code
-#+PROPERTY: session *R*
+#+PROPERTY: header-args :exports code
+#+PROPERTY: header-args:R :session *R*
 
 #+EXPORT_SELECT_TAGS: export
 #+EXPORT_EXCLUDE_TAGS: noexport
 
-#+latex: %\VignetteIndexEntry{Generating an using Ensembl based annotation packages}
-#+latex: %\VignetteKeywords{annotation, database}
-#+latex: %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BSgenome.Hsapiens.UCSC.hg19}
-#+latex: %\VignettePackage{ensembldb}
-#+latex: %\VignetteEngine{knitr::rmarkdown}
-
-
-#+BEGIN_html
+#+BEGIN_EXPORT html
 ---
 title: "Generating an using Ensembl based annotation packages"
+author: "Johannes Rainer"
 graphics: yes
+package: ensembldb
 output:
-  BiocStyle::html_document2
+  BiocStyle::html_document2:
+    toc_float: true
 vignette: >
   %\VignetteIndexEntry{Generating an using Ensembl based annotation packages}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
-  %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,Gviz,BiocStyle}
-  %\VignettePackage{ensembldb}
-  %\VignetteKeywords{annotation,database}
+  %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle,AnnotationHub,ggbio,Gviz}
 ---
-#+END_html
-
-# #+BEGIN_EXPORT html
-
-#+BEGIN_html
-**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
-**Authors**: `r packageDescription("ensembldb")$Author`<br />
-**Modified**: 12 September, 2016<br />
-**Compiled**: `r date()`
-#+END_html
-
+#+END_EXPORT
 
 
 * How to export this to a =Rnw= vignette			   :noexport:
@@ -70,17 +54,16 @@ r=). That way we don't need to edit the resulting =Rmd= file.
 
 The =ensembldb= package provides functions to create and use transcript centric
 annotation databases/packages. The annotation for the databases are directly
-fetched from Ensembl [fn:1] using their Perl API.  The functionality and data is
-similar to that of the =TxDb= packages from the =GenomicFeatures= package, but,
-in addition to retrieve all gene/transcript models and annotations from the
+fetched from Ensembl [fn:1] using their Perl API. The functionality and data are
+similar to those of the =TxDb= packages from the =GenomicFeatures= package, but, in
+addition to retrieving all gene/transcript models and annotations from the
 database, the =ensembldb= package provides also a filter framework allowing to
 retrieve annotations for specific entries like genes encoded on a chromosome
-region or transcript models of lincRNA genes.  In the databases, along with the
-gene and transcript models and their chromosomal coordinates, additional
-annotations including the gene name (symbol) and NCBI Entrezgene identifiers as
-well as the gene and transcript biotypes are stored too (see Section
-[[section.database.layout]] for the database layout and an overview of available
-attributes/columns).
+region or transcript models of lincRNA genes. From version 1.7 on, =EnsDb=
+databases created by the =ensembldb= package also contain protein annotation data
+(see Section [[section.database.layout]] for the database layout and an overview of
+available attributes/columns). For more information on the use of the protein
+annotations refer to the /proteins/ vignette.
 
 Another main goal of this package is to generate /versioned/ annotation
 packages, i.e. annotation packages that are build for a specific Ensembl
@@ -92,9 +75,10 @@ also allows to load multiple annotation packages at the same time in order to
 e.g. compare gene models between Ensembl releases.
 
 In the example below we load an Ensembl based annotation package for Homo
-sapiens, Ensembl version 75. The connection to the database is bound to the
-variable =EnsDb.Hsapiens.v75=.
+sapiens, Ensembl version 75. The =EnsDb= object providing access to the underlying
+SQLite database is bound to the variable name =EnsDb.Hsapiens.v75=.
 
+#+NAME: load-libs
 #+BEGIN_SRC R :ravel warning=FALSE, message=FALSE
   library(EnsDb.Hsapiens.v75)
 
@@ -103,80 +87,111 @@ variable =EnsDb.Hsapiens.v75=.
   ## print some informations for this package
   edb
 
-  ## for what organism was the database generated?
+  ## For what organism was the database generated?
   organism(edb)
+
 #+END_SRC
 
 
+#+NAME: no-network
+#+BEGIN_SRC R :results silent :ravel echo = FALSE, results = "hide"
+  ## Disable code chunks that require a network connection. This is done to
+  ## avoid TIMEOUT errors on the Bioconductor Windows build machine
+  ## (issue #47).
+  use_network <- FALSE
+
+#+END_SRC
+
 * Using =ensembldb= annotation packages to retrieve specific annotations
 
-The =ensembldb= package provides a set of filter objects allowing to specify
-which entries should be fetched from the database. The complete list of filters,
-which can be used individually or can be combined, is shown below (in
-alphabetical order):
+One of the strengths of the =ensembldb= package and the related =EnsDb= databases is
+the implementation of a filter framework that enables efficient extraction of
+data subsets from the databases. The =ensembldb= package supports most of the
+filters defined in the =AnnotationFilter= Bioconductor package and defines some
+additional filters specific to the data stored in =EnsDb= databases. The
+=supportedFilters= method can be used to get an overview of all supported filter
+classes, each of which (except the =GRangesFilter=) works on a single
+column/field in the database.
+
+#+NAME: filters
+#+BEGIN_SRC R 
+  supportedFilters(edb)
 
-+ =ExonidFilter=: allows to filter the result based on the (Ensembl) exon
-  identifiers.
-+ =ExonrankFilter=: filter results on the rank (index) of an exon within the
+#+END_SRC
+
+These filters can be divided into 3 main filter types:
++ =IntegerFilter=: filter classes extending this basic object can take a single
+  numeric value as input and support the conditions ==, !=, >, <, >= and <=. All
+  filters that work on chromosomal coordinates, such as =GeneEndFilter=, extend
+  =IntegerFilter=.
+ =CharacterFilter=: filter classes extending this object can take a single or
+  multiple character values as input and support the conditions ==, !=, "startsWith"
+  and "endsWith". All filters working on IDs extend this class.
++ =GRangesFilter=: takes a =GRanges= object as input and supports all conditions
+  that =findOverlaps= from the =IRanges= package supports ("any", "start", "end",
+  "within", "equal"). Note that these have to be passed using the parameter =type=
+  to the constructor function.
+
+
+The supported filters are:
+ =EntrezFilter=: filter based on the NCBI Entrezgene
+  identifiers of the genes.
++ =ExonEndFilter=: filter using the chromosomal end coordinate of exons.
++ =ExonIdFilter=: filter based on the (Ensembl) exon identifiers.
++ =ExonRankFilter=: filter based on the rank (index) of an exon within the
   transcript model. Exons are always numbered from 5' to 3' end of the
   transcript, thus, also on the reverse strand, the exon 1 is the most 5' exon
   of the transcript.
-+ =EntrezidFilter=: allows to filter results based on NCBI Entrezgene
-  identifiers of the genes.
-+ =GenebiotypeFilter=: allows to filter for the gene biotypes defined in the
-  Ensembl database; use the =listGenebiotypes= method to list all available
-  biotypes.
-+ =GeneidFilter=: allows to filter based on the Ensembl gene IDs.
-+ =GenenameFilter=: allows to filter based on the names (symbols) of the genes.
-+ =SymbolFilter=: allows to filter on gene symbols; note that no database columns
-  /symbol/ is available in an =EnsDb= database and hence the gene name is used for
-  filtering.
++ =ExonStartFilter=: filter using the chromosomal start coordinate of exons.
++ =GeneBiotypeFilter=: filter using the gene biotypes defined in the Ensembl
+  database; use the =listGenebiotypes= method to list all available biotypes.
+ =GeneEndFilter=: filter using the chromosomal end coordinate of genes.
+ =GeneIdFilter=: filter based on the Ensembl gene IDs.
+ =GenenameFilter=: filter based on the names (symbols) of the genes.
+ =GeneStartFilter=: filter using the chromosomal start coordinate of genes.
 + =GRangesFilter=: allows to retrieve all features (genes, transcripts or exons)
-  that are either within (setting =condition= to "within") or partially
-  overlapping (setting =condition= to "overlapping") the defined genomic
-  region/range. Note that, depending on the called method (=genes=, =transcripts=
-  or =exons=) the start and end coordinates of either the genes, transcripts or
-  exons are used for the filter. For methods =exonsBy=, =cdsBy= and =txBy= the
-  coordinates of =by= are used.
-+ =SeqendFilter=: filter based on the chromosomal end coordinate of the exons,
-  transcripts or genes (correspondingly set =feature = "exon"=, =feature = "tx"= or
-  =feature = "gene"=).
-+ =SeqnameFilter=: filter by the name of the chromosomes the genes are encoded
+  that are either within (setting parameter =type= to "within") or partially
+  overlapping (setting =type= to "any") the defined genomic region/range. Note
+  that, depending on the called method (=genes=, =transcripts= or =exons=) the start
+  and end coordinates of either the genes, transcripts or exons are used for the
+  filter. For methods =exonsBy=, =cdsBy= and =txBy= the coordinates of =by= are used.
++ =SeqNameFilter=: filter by the name of the chromosomes the genes are encoded
   on.
-+ =SeqstartFilter=: filter based on the chromosomal start coordinates of the
-  exons, transcripts or genes (correspondingly set =feature = "exon"=,
-  =feature = "tx"= or =feature = "gene"=).
-+ =SeqstrandFilter=: filter for the chromosome strand on which the genes are
++ =SeqStrandFilter=: filter for the chromosome strand on which the genes are
   encoded.
-+ =TxbiotypeFilter=: filter on the transcript biotype defined in Ensembl; use
+ =SymbolFilter=: filter on gene symbols; note that no database column /symbol/ is
+  available in an =EnsDb= database and hence the gene name is used for filtering.
++ =TxBiotypeFilter=: filter on the transcript biotype defined in Ensembl; use
   the =listTxbiotypes= method to list all available biotypes.
-+ =TxidFilter=: filter on the Ensembl transcript identifiers.
-
-Each of the filter classes can take a single value or a vector of values (with
-the exception of the =SeqendFilter= and =SeqstartFilter=) for comparison. In
-addition, it is possible to specify the /condition/ for the filter,
-e.g. setting =condition= to = to retrieve all entries matching the filter value,
-to != to negate the filter or setting =condition = "like"= to allow
-partial matching. The =condition= parameter for =SeqendFilter= and
-=SeqendFilter= can take the values = , >, >=, < and <= (since these
-filters base on numeric values).
-
-# The =SeqnameFilter= and =GRangesFilter= support both UCSC and Ensembl chromosome
-# names (e.g. ="chrX"= for UCSC and ="X"= for Ensembl), internally, UCSC
-# chromosome names are mapped to Ensembl names. By default, all functions to
-# retrieve data from the database return Ensembl chromosome names, but by setting
-# the global option =ucscChromosomeNames= to =TRUE=
-# (i.e. =options(ucscChromosomeNames = TRUE)=) chromosome/seqnames are returned in
-# UCSC format.
-
-A simple example would be to get all transcripts for the gene /BCL2L11/. To this
-end we specify a =GenenameFilter= with the value /BCL2L11/. As a result we get
-a =GRanges= object with =start=, =end=, =strand= and =seqname= of the =GRanges=
-object being the start coordinate, end coordinate, chromosome name and strand
-for the respective transcripts. All additional annotations are available as
-metadata columns. Alternatively, by setting =return.type= to "DataFrame", or
-"data.frame" the method would return a =DataFrame= or =data.frame= object.
-
++ =TxEndFilter=: filter using the chromosomal end coordinate of transcripts.
++ =TxIdFilter=: filter on the Ensembl transcript identifiers.
++ =TxNameFilter=: filter on the Ensembl transcript names (currently identical to
+  the transcript IDs).
++ =TxStartFilter=: filter using the chromosomal start coordinate of transcripts.
+
+In addition to the above listed /DNA-RNA-based/ filters, /protein-specific/
+filters are also available: 
+
++ =ProtDomIdFilter=: filter by the protein domain ID.
+ =ProteinIdFilter=: filter by the Ensembl protein ID.
++ =UniprotDbFilter=: filter by the name of the Uniprot database.
++ =UniprotFilter=: filter by the Uniprot ID.
++ =UniprotMappingTypeFilter=: filter by the mapping type of Ensembl protein IDs to
+  Uniprot IDs.
+
+These can however only be used on =EnsDb= databases that provide protein
+annotations, i.e. for which a call to =hasProteinData= returns =TRUE=.
+
+A simple use case for the filter framework would be to get all transcripts for
+the gene /BCL2L11/. To this end we specify a =GenenameFilter= with the value
+/BCL2L11/. As a result we get a =GRanges= object with =start=, =end=, =seqname= and
+=strand= being the start coordinate, end coordinate, chromosome name and strand of
+the respective transcripts. All additional annotations are available as metadata
+columns. Alternatively, by setting =return.type= to "DataFrame" or "data.frame",
+the method returns a =DataFrame= or =data.frame= object instead of the default
+=GRanges=.
+
+#+NAME: transcripts
 #+BEGIN_SRC R
   Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
 
@@ -187,29 +202,46 @@ metadata columns. Alternatively, by setting =return.type= to "DataFrame", or
 
   ## or extract the biotype with
   head(Tx$tx_biotype)
+
+#+END_SRC
+
+The parameter =columns= of the extractor methods (such as =exons=, =genes= or
+=transcripts=) specifies which database attributes (columns) should be
+retrieved. The =exons= method returns by default all exon-related columns, the
+=transcripts= all columns from the transcript database table and the =genes= all
+from the gene table. Note however that in the example above we also got a column
+=gene_name= although this column is not present in the transcript database
+table. By default the methods also return all columns that are used by any of
+the filters submitted with the =filter= argument (thus, because a =GenenameFilter=
+was used, the column =gene_name= is also returned). Setting
+=returnFilterColumns(edb) <- FALSE= disables this option and only the columns
+specified by the =columns= parameter are retrieved.
+
+Instead of passing a filter /object/ to the method it is also possible to provide
+a filter /expression/ written as a =formula=.
+
+#+NAME: transcripts-filter-expression
+#+BEGIN_SRC R
+  ## Use a filter expression to perform the filtering.
+  transcripts(edb, filter = ~ genename == "ZBTB16")
+
 #+END_SRC
 
-The parameter =columns= of the =exons=, =genes= and =transcripts= method allows
-to specify which database attributes (columns) should be retrieved. The =exons=
-method returns by default all exon-related columns, the =transcripts= all columns
-from the transcript database table and the =genes= all from the gene table. Note
-however that in the example above we got also a column =gene_name= although this
-column is not present in the transcript database table. By default the methods
-return also all columns that are used by any of the filters submitted with the
-=filter= argument (thus, because a =GenenameFilter= was used, the column =gene_name=
-is also returned). Setting =returnFilterColumns(edb) <- FALSE= disables this
-option and only the columns specified by the =columns= parameter are retrieved.
+Filter expressions have to be written as a formula (i.e. starting with a =~=) in
+the form /column name/ followed by the logical condition.
 
 To get an overview of database tables and available columns the function
 =listTables= can be used. The method =listColumns= on the other hand lists columns
 for the specified database table.
 
+#+NAME: list-columns
 #+BEGIN_SRC R
   ## list all database tables along with their columns
   listTables(edb)
 
   ## list columns from a specific table
   listColumns(edb, "tx")
+
 #+END_SRC
 
 Thus, we could retrieve all transcripts of the biotype /nonsense_mediated_decay/
@@ -219,22 +251,26 @@ the name of the gene for each transcript. Note that we are changing here the
 =return.type= to =DataFrame=, so the method will return a =DataFrame= with the
 results instead of the default =GRanges=.
 
+#+NAME: transcripts-example2
 #+BEGIN_SRC R
   Tx <- transcripts(edb,
                     columns = c(listColumns(edb , "tx"), "gene_name"),
-                    filter = TxbiotypeFilter("nonsense_mediated_decay"),
+                    filter = TxBiotypeFilter("nonsense_mediated_decay"),
                     return.type = "DataFrame")
   nrow(Tx)
   Tx
+
 #+END_SRC
 
 For protein coding transcripts, we can also specifically extract their coding
 region. In the example below we extract the CDS for all transcripts encoded on
 chromosome Y.
 
+#+NAME: cdsBy
 #+BEGIN_SRC R
-  yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+  yCds <- cdsBy(edb, filter = SeqNameFilter("Y"))
   yCds
+
 #+END_SRC
 
 Using a =GRangesFilter= we can retrieve all features from the database that are
@@ -243,10 +279,11 @@ below we query all genes that are partially overlapping with a small region on
 chromosome 11. The filter restricts to all genes for which either an exon or an
 intron is partially overlapping with the region.
 
+#+NAME: genes-GRangesFilter
 #+BEGIN_SRC R
   ## Define the filter
   grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
-                               strand = "+"), condition = "overlapping")
+                               strand = "+"), type = "any")
 
   ## Query genes:
   gn <- genes(edb, filter = grf)
@@ -254,6 +291,7 @@ intron is partially overlapping with the region.
 
   ## Next we retrieve all transcripts for that gene so that we can plot them.
   txs <- transcripts(edb, filter = GenenameFilter(gn$gene_name))
+
 #+END_SRC
 
 #+BEGIN_SRC R :ravel tx-for-zbtb16, message=FALSE, fig.align='center', fig.width=7.5, fig.height=5
@@ -276,8 +314,10 @@ region. Below we fetch these 4 transcripts. Note, that a call to =exons= will
 not return any features from the database, as no exon is overlapping with the
 region.
 
+#+NAME: transcripts-GRangesFilter
 #+BEGIN_SRC R
   transcripts(edb, filter = grf)
+
 #+END_SRC
 
 The =GRangesFilter= supports also =GRanges= defining multiple regions and a
@@ -288,52 +328,59 @@ overlapping genomic regions using the =exonsByOverlaps= or
 implementation of these methods for =EnsDb= objects supports also to use filters
 to further fine-tune the query.
 
-To get an overview of allowed/available gene and transcript biotype the
-functions =listGenebiotypes= and =listTxbiotypes= can be used.
+The functions =listGenebiotypes= and =listTxbiotypes= can be used to get an overview
+of the allowed/available gene and transcript biotypes.
 
+#+NAME: biotypes
 #+BEGIN_SRC R
-  ## Get all gene biotypes from the database. The GenebiotypeFilter
+  ## Get all gene biotypes from the database. The GeneBiotypeFilter
   ## allows to filter on these values.
   listGenebiotypes(edb)
 
   ## Get all transcript biotypes from the database.
   listTxbiotypes(edb)
+
 #+END_SRC
 
 Data can be fetched in an analogous way using the =exons= and =genes=
 methods. In the example below we retrieve =gene_name=, =entrezid= and the
 =gene_biotype= of all genes in the database which names start with "BCL2".
 
+#+NAME: genes-BCL2
 #+BEGIN_SRC R
   ## We're going to fetch all genes whose names start with BCL. To this end
   ## we define a GenenameFilter with partial matching, i.e. the condition
   ## "startsWith", which matches all names beginning with "BCL".
   BCLs <- genes(edb,
-                columns = c("gene_name", "entrezid", "gene_biotype"),
-                filter = list(GenenameFilter("BCL%", condition = "like")),
-                return.type = "DataFrame")
+		columns = c("gene_name", "entrezid", "gene_biotype"),
+		filter = GenenameFilter("BCL", condition = "startsWith"),
+		return.type = "DataFrame")
   nrow(BCLs)
   BCLs
+
 #+END_SRC
 
 Sometimes it might be useful to know the length of genes or transcripts
 (i.e. the total sum of nucleotides covered by their exons). Below we calculate
 the mean length of transcripts from protein coding genes on chromosomes X and Y
 as well as the average length of snoRNA, snRNA and rRNA transcripts encoded on
-these chromosomes.
+these chromosomes. For the first query we combine two =AnnotationFilter= objects
+using an =AnnotationFilterList= object; in the second we define the query using a
+filter expression.
 
+#+NAME: example-AnnotationFilterList
 #+BEGIN_SRC R
   ## determine the average length of snRNA, snoRNA and rRNA genes encoded on
   ## chromosomes X and Y.
-  mean(lengthOf(edb, of = "tx",
-                filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
-                              SeqnameFilter(c("X", "Y")))))
+  mean(lengthOf(edb, of = "tx", filter = AnnotationFilterList(
+                                    GeneBiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+                                    SeqNameFilter(c("X", "Y")))))
 
   ## determine the average length of protein coding genes encoded on the same
   ## chromosomes.
-  mean(lengthOf(edb, of = "tx",
-                filter = list(GenebiotypeFilter("protein_coding"),
-                              SeqnameFilter(c("X", "Y")))))
+  mean(lengthOf(edb, of = "tx", filter = ~ gene_biotype == "protein_coding" &
+                                    seq_name %in% c("X", "Y")))
+
 #+END_SRC
 
 Not unexpectedly, transcripts of protein coding genes are longer than those of
@@ -342,14 +389,17 @@ snRNA, snoRNA or rRNA genes.
 At last we extract the first two exons of each transcript model from the
 database.
 
+#+NAME: example-first-two-exons
 #+BEGIN_SRC R
   ## Extract all exons 1 and (if present) 2 for all genes encoded on the
   ## Y chromosome
   exons(edb, columns = c("tx_id", "exon_idx"),
-        filter = list(SeqnameFilter("Y"),
-                      ExonrankFilter(3, condition = "<")))
+	filter = list(SeqNameFilter("Y"),
+                      ExonRankFilter(3, condition = "<")))
+
 #+END_SRC
 
+
 * Extracting gene/transcript/exon models for RNASeq feature counting
 
 For the feature counting step of an RNAseq experiment, the gene or transcript
@@ -366,11 +416,11 @@ CDS.
 A simple use case is to retrieve all genes encoded on chromosomes X and Y from
 the database.
 
+#+NAME: transcriptsBy-X-Y
 #+BEGIN_SRC R
-  TxByGns <- transcriptsBy(edb, by = "gene",
-                           filter = list(SeqnameFilter(c("X", "Y")))
-                           )
+  TxByGns <- transcriptsBy(edb, by = "gene", filter = SeqNameFilter(c("X", "Y")))
   TxByGns
+
 #+END_SRC
 
 Since Ensembl contains also definitions of genes that are on chromosome variants
@@ -378,17 +428,19 @@ Since Ensembl contains also definitions of genes that are on chromosome variants
 gene models should be returned.
 
 In a real use case, we might thus want to retrieve all genes encoded on the
-/standard/ chromosomes. In addition it is advisable to use a =GeneidFilter= to
+/standard/ chromosomes. In addition it is advisable to use a =GeneIdFilter= to
 restrict to Ensembl genes only, as also /LRG/ (Locus Reference Genomic)
 genes[fn:3] are defined in the database, which are partially redundant with
 Ensembl genes.
 
-#+BEGIN_SRC R :ravel eval=FALSE
+#+NAME: exonsBy-RNAseq
+#+BEGIN_SRC R :ravel message = FALSE, eval = FALSE
   ## will just get exons for all genes on chromosomes 1 to 22, X and Y.
   ## Note: want to get rid of the "LRG" genes!!!
-  EnsGenes <- exonsBy(edb, by = "gene",
-                      filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-                                    GeneidFilter("ENSG%", "like")))
+  EnsGenes <- exonsBy(edb, by = "gene", filter = AnnotationFilterList(
+                                            SeqNameFilter(c(1:22, "X", "Y")),
+                                            GeneIdFilter("ENSG", "startsWith")))
+
 #+END_SRC
 
 The code above returns a =GRangesList= that can be used directly as an input for
@@ -398,7 +450,8 @@ Alternatively, the above =GRangesList= can be transformed to a =data.frame= in
 /SAF/ format that can be used as an input to the =featureCounts= function of the
 =Rsubread= package [fn:5].
 
-#+BEGIN_SRC R :ravel eval=FALSE
+#+NAME: toSAF-RNAseq
+#+BEGIN_SRC R :ravel message = FALSE, eval=FALSE
   ## Transforming the GRangesList into a data.frame in SAF format
   EnsGenes.SAF <- toSAF(EnsGenes)
 
@@ -413,16 +466,16 @@ In addition, the =disjointExons= function (similar to the one defined in
 =GenomicFeatures=) can be used to generate a =GRanges= of non-overlapping exon
 parts which can be used in the =DEXSeq= package.
 
-#+BEGIN_SRC R :ravel eval=FALSE
+#+NAME: disjointExons
+#+BEGIN_SRC R :ravel message = FALSE, eval=FALSE
   ## Create a GRanges of non-overlapping exon parts.
-  DJE <- disjointExons(edb,
-                       filter = list(SeqnameFilter(c(1:22, "X", "Y")),
-                                     GeneidFilter("ENSG%", "like")))
+  DJE <- disjointExons(edb, filter = AnnotationFilterList(
+                                    SeqNameFilter(c(1:22, "X", "Y")),
+                                    GeneIdFilter("ENSG", "startsWith")))
 
 #+END_SRC
 
 
-
 * Retrieving sequences for gene/transcript/exon models
 
 The methods to retrieve exons, transcripts and genes (i.e. =exons=, =transcripts=
@@ -444,7 +497,8 @@ the package, subset to genes encoded on sequences available in the =FaFile= and
 extract all of their sequences. Note: these sequences represent the sequence
 between the chromosomal start and end coordinates of the gene.
 
-#+BEGIN_SRC R :ravel eval=FALSE
+#+NAME: transcript-sequence-AnnotationHub
+#+BEGIN_SRC R :ravel message = FALSE, eval = FALSE
   library(EnsDb.Hsapiens.v75)
   library(Rsamtools)
   edb <- EnsDb.Hsapiens.v75
@@ -463,16 +517,16 @@ between the chromosomal start and end coordinates of the gene.
   ## all of the gene's exons and introns.
   geneSeqs <- getSeq(Dna, genes)
 
-
 #+END_SRC
 
 To retrieve the (exonic) sequence of transcripts (i.e. without introns) we can
 use directly the =extractTranscriptSeqs= method defined in the =GenomicFeatures= on
 the =EnsDb= object, eventually using a filter to restrict the query.
 
-#+BEGIN_SRC R :ravel eval=FALSE
+#+NAME: transcript-sequence-extractTranscriptSeqs
+#+BEGIN_SRC R :ravel message = FALSE, eval = FALSE
   ## get all exons of all transcripts encoded on chromosome Y
-  yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+  yTx <- exonsBy(edb, filter = SeqNameFilter("Y"))
 
   ## Retrieve the sequences for these transcripts from the FaFile.
   library(GenomicFeatures)
@@ -480,11 +534,11 @@ the =EnsDb= object, eventually using a filter to restrict the query.
   yTxSeqs
 
   ## Extract the sequences of all transcripts encoded on chromosome Y.
-  yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+  yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqNameFilter("Y"))
 
   ## Along these lines, we could use the method also to retrieve the coding sequence
   ## of all transcripts on the Y chromosome.
-  cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+  cdsY <- cdsBy(edb, filter = SeqNameFilter("Y"))
   extractTranscriptSeqs(Dna, cdsY)
 
 #+END_SRC
@@ -492,6 +546,7 @@ the =EnsDb= object, eventually using a filter to restrict the query.
 Note: in the next section we describe how transcript sequences can be retrieved
 from a =BSgenome= package that is based on UCSC, not Ensembl.
 
+
 * Integrating annotations from Ensembl based  =EnsDb= packages with UCSC based annotations
 
 Sometimes it might be useful to combine (Ensembl based) annotations from =EnsDb=
@@ -506,14 +561,16 @@ UCSC, NCBI and Ensembl chromosome names for the /main/ chromosomes).
 
 In the example below we change the seqnames style to UCSC.
 
-#+BEGIN_SRC R :ravel message=FALSE
+#+NAME: seqlevelsStyle
+#+BEGIN_SRC R :ravel message = FALSE
   ## Change the seqlevels style form Ensembl (default) to UCSC:
   seqlevelsStyle(edb) <- "UCSC"
 
-  ## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
-  genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+  ## Now we can use UCSC style seqnames in SeqNameFilters or GRangesFilter:
+  genesY <- genes(edb, filter = ~ seq_name == "chrY")
   ## The seqlevels of the returned GRanges are also in UCSC style
   seqlevels(genesY)
+
 #+END_SRC
 
 Note that in most instances no mapping is available for sequences not
@@ -525,7 +582,8 @@ ones from Ensembl) are returned. With =ensembldb.seqnameNotFound= "MISSING" each
 time a seqname can not be found an error is thrown. For all other cases
 (e.g. =ensembldb.seqnameNotFound = NA=) the value of the option is returned.
 
-#+BEGIN_SRC R
+#+NAME: seqlevelsStyle-2
+#+BEGIN_SRC R :ravel message = FALSE
   seqlevelsStyle(edb) <- "UCSC"
 
   ## Getting the default option:
@@ -550,7 +608,8 @@ the =BSGenome= package for the human genome from UCSC. The specified version
 while we changed the style of the seqnames to UCSC we did not change the naming
 of the genome release.
 
-#+BEGIN_SRC R :ravel warning=FALSE, message=FALSE
+#+NAME: extractTranscriptSeqs-BSGenome
+#+BEGIN_SRC R :ravel warning = FALSE, message = FALSE
   library(BSgenome.Hsapiens.UCSC.hg19)
   bsg <- BSgenome.Hsapiens.UCSC.hg19
 
@@ -560,23 +619,28 @@ of the genome release.
   ## Although differently named, both represent genome build GRCh37.
 
   ## Extract the full transcript sequences.
-  yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+  yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx",
+						filter = SeqNameFilter("chrY")))
 
   yTxSeqs
 
   ## Extract just the CDS
-  Test <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
-  yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+  Test <- cdsBy(edb, "tx", filter = SeqNameFilter("chrY"))
+  yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx",
+                                             filter = SeqNameFilter("chrY")))
   yTxCds
 
 #+END_SRC
 
 At last changing the seqname style to the default value ="Ensembl"=.
 
+#+NAME: seqlevelsStyle-restore
 #+BEGIN_SRC R
   seqlevelsStyle(edb) <- "Ensembl"
+
 #+END_SRC
 
+
 * Interactive annotation lookup using the =shiny= web app
 
 In addition to the =genes=, =transcripts= and =exons= methods it is possibly to
@@ -586,7 +650,7 @@ search interactively for gene/transcript/exon annotations using the internal,
 to the R workspace either as a =data.frame= or =GRanges= object.
 
 
-* Plotting gene/transcript features using =ensembldb= and =Gviz=
+* Plotting gene/transcript features using =ensembldb=, =Gviz= and =ggbio=
 
 The =Gviz= package provides functions to plot genes and transcripts along with
 other data on a genomic scale. Gene models can be provided either as a
@@ -604,7 +668,7 @@ not necessary if we just want to retrieve gene models from an =EnsDb= object, as
 the =ensembldb= package internally checks the =ucscChromosomeNames= option and,
 depending on that, maps Ensembl chromosome names to UCSC chromosome names.
 
-#+BEGIN_SRC R :ravel gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25
+#+BEGIN_SRC R :ravel gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.3
   ## Loading the Gviz library
   library(Gviz)
   library(EnsDb.Hsapiens.v75)
@@ -630,7 +694,7 @@ depending on that, maps Ensembl chromosome names to UCSC chromosome names.
 Above we had to change the option =ucscChromosomeNames= to =FALSE= in order to
 use it with non-UCSC chromosome names. Alternatively, we could however also
 change the =seqnamesStyle= of the =EnsDb= object to =UCSC=. Note that we have to
-use now also chromosome names in the /UCSC style/ in the =SeqnameFilter=
+now also use chromosome names in the /UCSC style/ in the =SeqNameFilter=
 (i.e. "chrY" instead of =Y=).
 
 #+BEGIN_SRC R :ravel message=FALSE
@@ -652,10 +716,10 @@ different gene region tracks, one for protein coding genes and one for lincRNAs.
 #+BEGIN_SRC R :ravel gviz-separate-tracks, message=FALSE, warning=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25
   protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
                                        start = 20400000, end = 21400000,
-                                       filter = GenebiotypeFilter("protein_coding"))
+                                       filter = GeneBiotypeFilter("protein_coding"))
   lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
                                      start = 20400000, end = 21400000,
-                                     filter = GenebiotypeFilter("lincRNA"))
+                                     filter = GeneBiotypeFilter("lincRNA"))
 
   plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
                   GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")
@@ -665,6 +729,30 @@ different gene region tracks, one for protein coding genes and one for lincRNAs.
 
 #+END_SRC
 
+Alternatively, we can also use =ggbio= for plotting. With =autoplot= we can directly
+pass the =EnsDb= object along with optional filters (or, as in the example below, a
+filter expression as a =formula=).
+
+#+BEGIN_SRC R :ravel pplot-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=4
+  library(ggbio)
+
+  ## Create a plot for all transcripts of the gene SKA2
+  autoplot(edb, ~ genename == "SKA2")
+
+#+END_SRC
+
+To plot the whole genomic region, including genes on both strands, we can use a
+=GRangesFilter=.
+
+#+BEGIN_SRC R :ravel pplot-plot-2, message=FALSE, fig.align='center', fig.width=7.5, fig.height=4
+  ## Get the chromosomal region in which the gene is encoded
+  ska2 <- genes(edb, filter = ~ genename == "SKA2")
+  strand(ska2) <- "*"
+  autoplot(edb, GRangesFilter(ska2), names.expr = "gene_name")
+
+#+END_SRC
+
+
 
 * Using =EnsDb= objects in the =AnnotationDbi= framework
 
@@ -678,7 +766,8 @@ In the example below we first evaluate all the available columns and keytypes in
 the database and extract then the gene names for all genes encoded on chromosome
 X.
 
-#+BEGIN_SRC R
+#+NAME: AnnotationDbi
+#+BEGIN_SRC R :ravel message = FALSE
   library(EnsDb.Hsapiens.v75)
   edb <- EnsDb.Hsapiens.v75
 
@@ -699,8 +788,9 @@ X.
   length(gids)
 
   ## Get all gene names for genes encoded on chromosome Y.
-  gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+  gnames <- keys(edb, keytype = "GENENAME", filter = SeqNameFilter("Y"))
   head(gnames)
+
 #+END_SRC
 
 In the next example we retrieve specific information from the database using the
@@ -709,33 +799,37 @@ In the next example we retrieve specific information from the database using the
 we employ the filtering system to perform a more fine-grained query to fetch
 only the protein coding transcripts for these genes.
 
-#+BEGIN_SRC R :ravel warning=FALSE
+#+NAME: select
+#+BEGIN_SRC R :ravel message = FALSE, warning = FALSE
   ## Use the /standard/ way to fetch data.
   select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GENENAME",
-         columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+	 columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
 
   ## Use the filtering system of ensembldb
-  select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-                          TxbiotypeFilter("protein_coding")),
-         columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+  select(edb, keys = ~ genename %in% c("BCL2", "BCL2L11") &
+                  tx_biotype == "protein_coding",
+	 columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+
 #+END_SRC
 
 Finally, we use the =mapIds= method to establish a mapping between ids and
 values. In the example below we fetch transcript ids for the two genes from the
 example above.
 
-#+BEGIN_SRC R
+#+NAME: mapIds
+#+BEGIN_SRC R :ravel message = FALSE
   ## Use the default method, which just returns the first value for multi mappings.
   mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME")
 
   ## Alternatively, specify multiVals="list" to return all mappings.
   mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME",
-         multiVals = "list")
+	 multiVals = "list")
 
   ## And, just like before, we can use filters to map only to protein coding transcripts.
   mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
-                          TxbiotypeFilter("protein_coding")), column = "TXID",
-         multiVals = "list")
+                          TxBiotypeFilter("protein_coding")), column = "TXID",
+	 multiVals = "list")
+
 #+END_SRC
 
 Note that, if the filters are used, the ordering of the result does no longer
@@ -764,41 +858,84 @@ help avoiding them):
 + At present, =EnsDb= support only genes/transcripts for which all of their
   exons are encoded on the same chromosome and the same strand.
 
++ Since a single Ensembl gene ID might be mapped to multiple NCBI Entrezgene IDs,
+  methods such as =genes=, =transcripts= etc. return a =list= in the ="entrezid"= column
+  of the result object.
 
 
-* Building an transcript-centric database package based on Ensembl annotation
+* Getting or building =EnsDb= databases/packages
 
-The code in this section is not supposed to be automatically executed when the
-vignette is built, as this would require a working installation of the Ensembl
-Perl API, which is not expected to be available on each system. Also, building
-=EnsDb= from alternative sources, like GFF or GTF files takes some time and
-thus also these examples are not directly executed when the vignette is build.
+Some of the code in this section is not supposed to be executed automatically
+when the vignette is built, as this would require a working installation of the
+Ensembl Perl API, which is not expected to be available on every system. Also,
+building =EnsDb= databases from alternative sources, such as GFF or GTF files,
+takes some time; thus these examples are also not executed when the vignette is
+built.
 
-** Requirements
+** Getting =EnsDb= databases
 
-The =fetchTablesFromEnsembl= function of the package uses the Ensembl Perl API
-to retrieve the required annotations from an Ensembl database (e.g. from the
-main site /ensembldb.ensembl.org/). Thus, to use the functionality to built
-databases, the Ensembl Perl API needs to be installed (see [fn:2] for details).
+Some =EnsDb= databases are available as =R= packages from Bioconductor and can
+simply be installed with the =biocLite= function from the =BiocInstaller= package. The
+names of such annotation packages start with /EnsDb/ followed by the abbreviation
+of the organism and the Ensembl version on which the annotation is
+based. =EnsDb.Hsapiens.v86= thus provides an =EnsDb= database for /Homo sapiens/ with
+annotations from Ensembl version 86.
 
-Alternatively, the =ensDbFromAH=, =ensDbFromGff=, =ensDbFromGRanges= and =ensDbFromGtf=
-functions allow to build EnsDb SQLite files from a =GRanges= object or GFF/GTF
-files from Ensembl (either provided as files or /via/ =AnnotationHub=). These
-functions do not depend on the Ensembl Perl API, but require a working internet
-connection to fetch the chromosome lengths from Ensembl as these are not
-provided within GTF or GFF files.
+Since Bioconductor version 3.5 =EnsDb= databases can also be retrieved directly
+from =AnnotationHub=.
+
+#+NAME: AnnotationHub-query
+#+BEGIN_SRC R :ravel message = FALSE, eval = use_network
+  library(AnnotationHub)
+  ## Load the annotation resource.
+  ah <- AnnotationHub()
+
+  ## Query for all available EnsDb databases
+  query(ah, "EnsDb")
+
+#+END_SRC
+
+We can then narrow the query down to a single database.
+
+#+NAME: AnnotationHub-query-2
+#+BEGIN_SRC R :ravel message = FALSE, eval = use_network
+  ahDb <- query(ah, pattern = c("Xiphophorus Maculatus", "EnsDb", 87))
+  ## What have we got
+  ahDb
+
+#+END_SRC
+
+Fetch the =EnsDb= and use it.
+
+#+NAME: AnnotationHub-fetch
+#+BEGIN_SRC R :ravel message = FALSE, eval = FALSE
+  ahEdb <- ahDb[[1]]
+
+  ## Retrieve all genes
+  gns <- genes(ahEdb)
+
+#+END_SRC
+
+We could even make an annotation package from this =EnsDb= object using the
+=makeEnsembldbPackage= function, passing =dbfile(dbconn(ahEdb))= as the =ensdb= argument.
 
 
 ** Building annotation packages
 
-The functions below use the Ensembl Perl API to fetch the required data directly
-from the Ensembl core databases. Thus, the path to the Perl API specific for the
-desired Ensembl version needs to be added to the =PERL5LIB= environment variable.
+*** Directly from Ensembl databases
 
-An annotation package containing all human genes for Ensembl version 75 can be
-created using the code in the block below.
+The =fetchTablesFromEnsembl= function uses the Ensembl Perl API
+to retrieve the required annotations from an Ensembl database (e.g. from the
+main site /ensembldb.ensembl.org/). Thus, to use this functionality to build
+databases, the Ensembl Perl API needs to be installed (see [fn:2] for details).
+
+Below we create an =EnsDb= database by fetching the required data directly from
+the Ensembl core databases. The =makeEnsembldbPackage= function is then used to
+create an annotation package from this =EnsDb= containing all human genes for
+Ensembl version 75.
 
-#+BEGIN_SRC R :ravel eval=FALSE
+#+NAME: edb-from-ensembl
+#+BEGIN_SRC R :ravel message = FALSE, eval = FALSE
   library(ensembldb)
 
   ## get all human gene/transcript/exon annotations from Ensembl (75)
@@ -828,6 +965,20 @@ thaliana), the /Ensembl genomes/ should be specified as a host, i.e. setting
 =host= to "mysql-eg-publicsql.ebi.ac.uk", =port= to =4157= and =species= to
 e.g. "arabidopsis thaliana".
 
+
+*** From a GTF or GFF file
+
+Alternatively, the =ensDbFromAH=, =ensDbFromGff=, =ensDbFromGRanges= and =ensDbFromGtf=
+functions allow building =EnsDb= SQLite files from a =GRanges= object or from GFF/GTF
+files from Ensembl (either provided as files or /via/ =AnnotationHub=). These
+functions do not depend on the Ensembl Perl API, but require a working internet
+connection to fetch the chromosome lengths from Ensembl, as these are not
+provided within GTF or GFF files. Also note that protein annotations are usually
+not available in GTF or GFF files; thus, such annotations will not be included
+in the generated =EnsDb= database. Protein annotations are only available in
+=EnsDb= databases created with the Ensembl Perl API (such as the ones provided
+through =AnnotationHub= or as Bioconductor packages).
+
 In the next example we create an =EnsDb= database using the =AnnotationHub=
 package and load also the corresponding genomic DNA sequence matching the
 Ensembl version. We thus first query the =AnnotationHub= package for all
@@ -837,8 +988,8 @@ then use the =getGenomeFaFile= method on the =EnsDb= to directly look up and
 retrieve the correct or best matching =FaFile= with the genomic DNA sequence. At
 last we retrieve the sequences of all exons using the =getSeq= method.
 
-
-#+BEGIN_SRC R :ravel eval=FALSE
+#+NAME: gtf-gff-edb
+#+BEGIN_SRC R :ravel message = FALSE, eval = FALSE
   ## Load the AnnotationHub data.
   library(AnnotationHub)
   ah <- AnnotationHub()
@@ -860,7 +1011,7 @@ last we retrieve the sequences of all exons using the =getSeq= method.
   Dna <- getGenomeFaFile(edb)
   library(Rsamtools)
   ## We next retrieve the sequence of all exons on chromosome Y.
-  exons <- exons(edb, filter = SeqnameFilter("Y"))
+  exons <- exons(edb, filter = SeqNameFilter("Y"))
   exonSeq <- getSeq(Dna, exons)
 
   ## Alternatively, look up and retrieve the toplevel DNA sequence manually.
@@ -869,39 +1020,41 @@ last we retrieve the sequences of all exons using the =getSeq= method.
 #+END_SRC
 
 In the example below we load a =GRanges= containing gene definitions for genes
-encoded on chromosome Y and generate a EnsDb SQLite database from that
+encoded on chromosome Y and generate a =EnsDb= SQLite database from that
 information.
 
-#+BEGIN_SRC R :ravel message=FALSE
+#+NAME: EnsDb-from-Y-GRanges
+#+BEGIN_SRC R :ravel message = FALSE, eval = use_network
   ## Generate a sqlite database from a GRanges object specifying
   ## genes encoded on chromosome Y
   load(system.file("YGRanges.RData", package = "ensembldb"))
   Y
 
+  ## Create the EnsDb database file
   DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
 			 organism = "Homo_sapiens")
 
+  ## Load the database
   edb <- EnsDb(DB)
   edb
 
-  ## As shown in the example below, we could make an EnsDb package on
-  ## this DB object using the makeEnsembldbPackage function.
-
 #+END_SRC
 
 
 Alternatively we can build the annotation database using the =ensDbFromGtf=
-=ensDbFromGff= functions, that extracts most of the required data from a GTF
-respectively GFF (version 3) file which can be downloaded from Ensembl (e.g. from
-ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens for human gene definitions
-from Ensembl version 75; for plant genomes etc files can be retrieved from
-ftp://ftp.ensemblgenomes.org). All information except the chromosome lengths and
-the NCBI Entrezgene IDs can be extracted from these GTF files. The function also
-tries to retrieve chromosome length information automatically from Ensembl.
+or =ensDbFromGff= functions, which extract most of the required data from a GTF
+or GFF (version 3) file. Such files can be downloaded from Ensembl
+(e.g. from ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens for human gene
+definitions from Ensembl version 75; for plant genomes etc., files can be
+retrieved from ftp://ftp.ensemblgenomes.org). All information except the
+chromosome lengths, the NCBI Entrezgene IDs and protein annotations can be
+extracted from these files. The functions also try to retrieve chromosome
+length information automatically from Ensembl.
 
 Below we create the annotation from a gtf file that we fetch directly from Ensembl.
 
-#+BEGIN_SRC R :ravel eval=FALSE
+#+NAME: EnsDb-from-GTF
+#+BEGIN_SRC R :ravel message = FALSE, eval = FALSE
   library(ensembldb)
 
   ## the GTF file can be downloaded from
@@ -925,14 +1078,13 @@ Below we create the annotation from a gtf file that we fetch directly from Ensem
 * Database layout<<section.database.layout>>
 
 The database consists of the following tables and attributes (the layout is also
-shown in Figure [[fig.database.layout]]):
+shown in Figure [[fig.database.layout]]). Note that the protein-specific annotations
+might not be available in all =EnsDb= databases (e.g. those created with
+=ensembldb= versions < 1.7 or from GTF or GFF files).
 
 + *gene*: all gene specific annotations.
   - =gene_id=: the Ensembl ID of the gene.
   - =gene_name=: the name (symbol) of the gene.
-  - =entrezid=: the NCBI Entrezgene ID(s) of the gene. Note that this can be a
-    =;= separated list of IDs for genes that are mapped to more than one
-    Entrezgene.
   - =gene_biotype=: the biotype of the gene.
   - =gene_seq_start=: the start coordinate of the gene on the sequence (usually
     a chromosome).
@@ -941,6 +1093,11 @@ shown in Figure [[fig.database.layout]]):
   - =seq_strand=: the strand on which the gene is encoded.
   - =seq_coord_system=: the coordinate system of the sequence.
 
++ *entrezgene*: mapping of Ensembl genes to NCBI Entrezgene identifiers. Note that
+  this mapping can be a one-to-many mapping.
+  - =gene_id=: the Ensembl gene ID.
+  - =entrezid=: the NCBI Entrezgene ID.
+
 + *tx*: all transcript related annotations. Note that while no =tx_name= column
   is available in this database column, all methods to retrieve data from the
   database support also this column. The returned values are however the ID of
@@ -970,11 +1127,39 @@ shown in Figure [[fig.database.layout]]):
   - =seq_length=: the length of the sequence.
   - =is_circular=: whether the sequence in circular.
 
-+ *information*: some additional, internal, informations (Genome build, Ensembl
++ *protein*: provides protein annotation for a (coding) transcript.
+  - =protein_id=: the Ensembl protein ID.
+  - =tx_id=: the ID of the transcript whose CDS encodes the protein.
+  - =protein_sequence=: the peptide sequence of the protein (translated from the
+    transcript's coding sequence after applying any RNA editing).
+
++ *uniprot*: provides the mapping from Ensembl protein ID(s) to Uniprot ID(s). Not
+  all Ensembl proteins are annotated to Uniprot IDs; also, each Ensembl protein
+  might be mapped to multiple Uniprot IDs.
+  - =protein_id=: the Ensembl protein ID.
+  - =uniprot_id=: the Uniprot ID.
+  - =uniprot_db=: the Uniprot database in which the ID is defined.
+  - =uniprot_mapping_type=: the type of the mapping method that was used to assign
+    the Uniprot ID to an Ensembl protein ID.
+
++ *protein_domain*: provides protein domain annotations and mapping to proteins.
+  - =protein_id=: the Ensembl protein ID on which the protein domain is present.
+  - =protein_domain_id=: the ID of the protein domain (from the protein domain
+    source).
+  - =protein_domain_source=: the source/analysis method in/by which the protein
+    domain was defined (such as pfam etc).
+  - =interpro_accession=: the Interpro accession ID of the protein domain.
+  - =prot_dom_start=: the start position of the protein domain within the
+    protein's sequence.
+  - =prot_dom_end=: the end position of the protein domain within the protein's
+    sequence.
+
++ *metadata*: some additional internal information (genome build, Ensembl
+  version etc.).
-  - =key=
+  - =name=
   - =value=
 
+
 + /virtual/ columns:
   - =symbol=: the database does not have such a database column, but it is still
     possible to use it in the =columns= parameter. This column is /symlinked/ to the
@@ -982,6 +1167,9 @@ shown in Figure [[fig.database.layout]]):
   - =tx_name=: similar to the =symbol= column, this column is /symlinked/ to the =tx_id=
     column.
 
+The database layout: as already described above, protein-related annotations
+(green) might not be available in every =EnsDb= database.
+
 #+ATTR_LATEX: :center :placement [h!] :width 14cm
 #+NAME: fig.database.layout
 #+CAPTION: Database layout.
@@ -1004,7 +1192,7 @@ shown in Figure [[fig.database.layout]]):
 
 * Installing the Ensembl database locally and building new packages :noexport:
 :PROPERTIES:
-:eval: never
+:header-args: :eval never
 :END:
 
 This section covers the local installation of a new Ensembl database on my
@@ -1023,10 +1211,12 @@ Start the server using =mysql.server start=.
 
   ## Download and install the Ensembl core database
   perl installEnsembldb.pl -e 85 -d homo_sapiens_core_85_38
+
 #+END_SRC
 
 
 
+
 * TODOs								   :noexport:
 
 ** DONE Fix the =ensembldb:::EnsDb= call in /zzz.R/ of the package template!
@@ -1127,7 +1317,7 @@ to the chromosome names. That way, =EnsDb= databases could directly work with
 
 + If something is queried from the database, the ="chr"= has to be stripped
   off. Here we have to deal with the filters:
-+ [X] =SeqnameFilter=: this now always returns stripped chr names, if =EnsDb= is
++ [X] =SeqNameFilter=: this now always returns stripped chr names, if =EnsDb= is
   also submitted.
 + [X] =GRangesFilter=
   and eventually using their =value= method:
@@ -1257,7 +1447,7 @@ Specifically, use =mapSeqlevels=
   - [X] =genes= uses =getWhat= and =seqinfo= (restricting to used seqnames).
   - [X] =transcripts= uses =getWhat= and =seqinfo= (restricting to used seqnames).
   - [X] =transcriptsBy= uses =getWhat= and =seqinfo= (restricting to used seqnames).
-  - [X] =SeqnameFilter=: always calling =formatSeqnamesForQuery=, does *not*
+  - [X] =SeqNameFilter=: always calling =formatSeqnamesForQuery=, does *not*
     allow =NA= values, thus doesn't work if the seqname can not be changed to
     Ensembl style.
   - [X] =GRangesFilter=: always calls =formatSeqnamesForQuery=.
@@ -1297,7 +1487,7 @@ Support multiple regions for a =GRangesFilter=.
   ## Convert variant coordinates to genomic coordinates
   tx <- "ENST00000070846"
   ## Get the cds
-  txCds <- cdsBy(edb, by="tx", filter=TxidFilter(tx))
+  txCds <- cdsBy(edb, by="tx", filter=TxIdFilter(tx))
 
   ## ENST00000070846:c.1643delG
   varPos <- 1643
@@ -1350,8 +1540,10 @@ Support multiple regions for a =GRangesFilter=.
    - State "DONE"       from "TODO"       [2016-09-16 Fri 15:27]
 
 Done in issues #4 and #5.
-** TODO What about using pipe and /formula-like/ filters?
+** DONE What about using pipe and /formula-like/ filters?
+   CLOSED: [2017-03-27 Mon 09:35]
 
+   - State "DONE"       from "TODO"       [2017-03-27 Mon 09:35]
 ** DONE Fix the =select= method such that it always returns the values in the same order than the keys were
    CLOSED: [2016-09-16 Fri 15:26]
    - State "DONE"       from "TODO"       [2016-09-16 Fri 15:26]
@@ -1367,3 +1559,560 @@ I have to check that; eventually do that based on an user option, or even better
 on an internal property, which can be set by =returnFilterCols(edb) <- TRUE/FALSE=.
 
 Done in issue #6.
+
+** CANCELED Integration with =Organism.dplyr=
+   CLOSED: [2017-02-10 Fri 15:22]
+
+   - State "CANCELED"   from "TODO"       [2017-02-10 Fri 15:22] \\
+     No need to do this; we now have a dedicated =AnnotationFilter= package for
+     this.
+ To integrate =ensembldb= with =Organism.dplyr= we export database tables in an
+ /un-normalized/ form so that it can be stored into a SQLite database for =dplyr=.
+** DONE Use =filters= as they are used in =Organism.dplyr=
+   CLOSED: [2017-03-22 Wed 06:58]
+
+   - State "DONE"       from "TODO"       [2017-03-22 Wed 06:58]
+i.e. dynamically create filters. Check if we could do that.
+
+#+BEGIN_SRC R
+  library(Organism.dplyr)
+  ## library(ensembldb)
+
+  Tx_idFilter(value = 3, condition = "==")
+  Tx_nameFilter(value = c("dfda", "sdfsd"))
+#+END_SRC
+
+Now, their filters are created /dynamically/, the first part of the name being the
+attribute (field) name followed by /Filter/. How could I use these? The problem
+is that my attributes are not unique, i.e. not present in only one table.
+
+** TODO Implement a different type of filtering
+
+Implement a filtering that does allow calls like
+
+#+BEGIN_EXAMPLE
+  genes(filter(edb, GeneidFilter("a")))
+#+END_EXAMPLE
+
+This should also enable
+
+#+BEGIN_EXAMPLE
+  filter(edb, GeneidFilter("a")) %>% genes()
+#+END_EXAMPLE
+
+The idea would be to add filter(s) as =AnnotationFilterList= object(s) to the
+=EnsDb= object, possibly by binding/adding them to the =.properties= slot. There
+are even the =properties=, =getProperty=, =dropProperty= and =setProperty= methods
+(check /Methods.R/).
+
+
+
+** DONE Interpret R logical conditions
+   CLOSED: [2017-03-22 Wed 06:58]
+
+   - State "DONE"       from "TODO"       [2017-03-22 Wed 06:58]
+That would be the coolest thing ever, if we could use filters like
+
+#+BEGIN_EXAMPLE
+  genes(edb, filter = gene_id == "BCL2")
+#+END_EXAMPLE
+
+For simple things that would work, but it would be quite tricky to use
+combinations, especially if they are enclosed in brackets!
+
+I could basically
++ split by =&= and =|=.
++ split each of the resulting elements by the supported conditions.
+
+Actually, it would be better to first replace all =&= by =@&@=.
+
+#+BEGIN_SRC R
+  res <- quote(gene_id == "abc" & seq_name == "X")
+  class(res)
+
+  eval(res)
+
+  as.character(res)
+  ## Oh, interesting!
+
+  myCall <- quote((gene_id == "a" | gene_id == "b") & seq_name == "Y")
+
+  all.names(myCall)
+
+  res <- as.character(myCall)
+  res[1]
+  res[2]
+  res[3]
+  ## hm, further split the second?
+  as.character(parse(text = res[2]))  ## nope
+
+  as.character(substitute(res[2]))
+  class(substitute(gene_id == "a")) ## hm, similar to quote...
+
+  deparse(res[[2]])
+  res[2]
+  parse(text = res[2]) ## OK, have an expression now.
+
+  library(pryr)
+  as.character(ast(gene_id == "abc"))
+
+  as.symbol(res[2])
+
+  c2 <- quote(gene_id %in% c(2, 3, 5))
+
+  eval(parse(text = c2[3])) ## would have to eval c( and :
+
+  c3 <- quote(gene_id %in% c(2, 3, 5) & (bbla > 5 | g < 5) & ggg == 3)
+  res <- as.character(c3)
+
+  quote(eval(parse(text = res[2])))
+  parse(text = res[2])  ## It's an expression, need a call.
+  (parse(text = res[2]))
+
+  myE <- new.env()
+  library(AnnotationFilter)
+  myE$gene_id <- GeneIdFilter
+
+  eval(3 == 3, envir = myE)
+  myE$`==` <- function(x) {cat(x)}
+
+  ## START HERE
+  myL <- list()
+  myL$`==` <- function(x, y) cat(as.character(quote(x)), " - ", y, "\n")
+
+  myL$`&` <- function(a, b) {
+      cat("----- & ----\n")
+      cat("a: ", class(a), " ", a, "\n")
+      cat("b: ", class(b), " ", b, "\n")
+      cat("----- & DONE ----\n")
+  }
+
+  eval(quote(gene_id == 4), envir = myL)
+  eval(quote(4 & 2), envir = myL)
+
+  eval(quote(gene_id == 4 & 2), envir = myL)
+
+  eval(quote(gene_id == 4 & other_id == 3), envir = myL)
+
+  res <- quote(gene_id == "abc" & seq_name == "X")
+  eval(res, envir = myL)
+
+  secL <- list()
+  secL$`==` <- function(x, y) cat(as.character(quote(x)), "==", eval(y))
+  secL$`&` <- function(a, b) cat(a, "and", b)
+
+  eval(res, envir = secL)
+
+  thiL <- list()
+  thiL$`==` <- function(x, y) paste0(as.character(quote(x)), " == ", eval(y))
+  thiL$`==` <- function(x, y) {
+      ## xName <- substitute(x)
+      ## cat(length(xName))
+      ## cat(class(xName))
+      ## cat(xName)
+      ## if (!is.null(fun <- get0(x, inherits = FALSE)))
+      ##     cat("x", x , "found")
+      ## else
+      ##     cat("x", as.character(x), "not found")
+      ## if (exists(x))
+      do.call(x, list(y, "=="))
+      ## cond <- " == "
+      ## y <- paste0("'", eval(y), "'")
+      ## if (length(y) > 1) {
+      ##     y <- paste0("(", paste0(y, collapse = ","), ")")
+      ##     cond <- " in "
+      ## }
+      ## paste0(as.character(quote(x)), cond, y)
+  }
+  thiL$gene_id <- function(val, cond) {
+      val <- paste0("'", val, "'")
+      if (length(val) > 1) {
+          if (cond == "==")
+              cond <- "in"
+          val <- paste0("(", paste0(val, collapse = ","), ")")
+      }
+      return(paste("gene_id", cond, val))
+  }
+  thiL$seq_name <- function(val, cond) {
+      val <- paste0("'", val, "'")
+      if (length(val) > 1) {
+          if (cond == "==")
+              cond <- "in"
+          val <- paste0("(", paste0(val, collapse = ","), ")")
+      }
+      return(paste("seq_name", cond, val))
+  }
+  thiL$`&` <- function(a, b) paste0(a, " and ", b)
+  thiL$`>` <- function(a, b) {
+      ## That's the only way I can check that this exists and is valid! not that
+      ## we've got a variable defined somewhere.
+      tryCatch(
+          cat(is.function(a))
+	, error = function(e) {
+            stop("Nono, -", deparse(substitute(a)), "-", e)
+	})
+  }
+  ## Have to extract the stuff from the error string!!!
+
+  eval(quote(gene_id == "abc"), envir = thiL)
+
+  eval(quote(gene_id == "abc" & seq_name == 1:3), envir = thiL)
+
+  ## That's the point - how to catch if the key can not be found???
+  eval(quote(bla_id == "adf"), envir = thiL)
+  eval(quote(bla_id > 2), envir = thiL)
+  eval(quote(gene_id > 2), envir = thiL)
+
+  blu <- 3
+  eval(quote(blu > 2), envir = thiL)
+
+  tt <- function(a, b) {
+      cat(as.character(a))
+  }
+
+  tt(quote(gene_id), 4)
+#+END_SRC
+
+This /should/ work: bind a function to each column name (e.g. =gene_id=) that
+returns the SQL condition for that column. Also bind functions to /==/, /&/ and
+all other supported operators: /&/ and /|/ just concatenate their arguments,
+while /==/ calls the function bound to its first argument. Whether a column is
+supported can be checked with =exists("gene_id")=.
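A minimal base-R sketch of this approach follows. It assumes that the set of supported columns is known in advance. The helpers =columnFun=, =buildSqlEnv= and =toSqlWhere= are hypothetical and not part of =ensembldb=. The sketch does not use =exists()=, which would also find unrelated variables in enclosing environments. Instead, it checks the deparsed first argument of each comparison against the allowed column names. Binding =(= keeps explicit parentheses in the generated SQL.

```r
## Hypothetical sketch: translate an R filter expression into an SQL WHERE
## condition by evaluating it in an environment in which column names and
## operators are bound to SQL-building functions.

## Returns a function that formats the condition for one column.
columnFun <- function(column) {
    force(column)
    function(val, cond) {
        val <- paste0("'", val, "'")
        if (length(val) > 1 || cond == "in") {
            ## Several values: switch to (not) in with a parenthesized list.
            cond <- switch(cond, "=" = "in", "!=" = "not in", cond)
            val <- paste0("(", paste0(val, collapse = ","), ")")
        }
        paste(column, cond, val)
    }
}

buildSqlEnv <- function(columns) {
    env <- new.env(parent = baseenv())
    for (col in columns)
        assign(col, columnFun(col), envir = env)
    ## Comparison operators call the function bound to their first argument.
    compOp <- function(sqlOp) {
        force(sqlOp)
        function(x, y) {
            colName <- deparse(substitute(x))
            if (!(colName %in% columns))
                stop("Unsupported column: ", colName)
            get(colName, envir = env)(y, sqlOp)
        }
    }
    assign("==", compOp("="), envir = env)
    assign("!=", compOp("!="), envir = env)
    assign("%in%", compOp("in"), envir = env)
    ## Logical operators just concatenate the already translated conditions.
    assign("&", function(a, b) paste(a, "and", b), envir = env)
    assign("|", function(a, b) paste(a, "or", b), envir = env)
    ## Keep explicit parentheses so that SQL precedence matches R's.
    assign("(", function(x) paste0("(", x, ")"), envir = env)
    env
}

toSqlWhere <- function(expr, columns) {
    eval(expr, envir = buildSqlEnv(columns))
}

toSqlWhere(quote(gene_id == "abc" & seq_name %in% 1:3),
           columns = c("gene_id", "seq_name"))
## [1] "gene_id = 'abc' and seq_name in ('1','2','3')"
```

With this design, evaluating =quote(bla_id == "adf")= stops with the message =Unsupported column: bla_id=. This is the check the scratch code above was trying to achieve with =tryCatch=.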
+
+
+** DONE Ensure all depending packages work with =AnnotationFilter=
+   CLOSED: [2017-05-16 Tue 06:24]
+
+   - State "DONE"       from "TODO"       [2017-05-16 Tue 06:24]
++ [X] =biovizBase=: *has to be fixed*. Uses filter classes from =ensembldb=. Forked
+  the repo from the GitHub mirror and fixed it in version 1.23.3 (i.e. import
+  filters from =AnnotationFilter= instead).
++ [X] =Gviz=: OK if =biovizBase= is fixed.
++ [X] =ChIPpeakAnno=: OK if =biovizBase= is fixed.
++ [X] =Pbase=: *has to be fixed*.
++ [X] =TVTB=: added an issue to https://github.com/kevinrue/TVTB/issues/5. Just
+  needs to import the filters from =AnnotationFilter= instead of =ensembldb=.
++ [X] =VariantFiltering=: OK once =biovizBase= builds. Probably due to that.
++ [X] =chimeraviz=: added an issue to
+  https://github.com/stianlagstad/chimeraviz/issues/3. Just needs to import the
+  =GeneIdFilter= from =AnnotationFilter= instead.
++ [X] =ggbio=: *has to be fixed*.
+
+To fix it:
+1) Install =AnnotationFilter=.
+2) Disable the =Gviz= and =ggbio= vignettes and (momentarily) remove the =Gviz=
+   suggestion (from DESCRIPTION and the vignette dependencies).
+3) Install/fix =biovizBase=.
+4) Install/fix =ggbio=.
+5) Install/fix =Pbase=.
+
+The remaining packages (=Gviz=, =alpine=, =ChIPpeakAnno=).
+
+
+Steps when =AnnotationFilter= is accepted:
++ [X] Contact Michael Lawrence that =biovizBase= and =ggbio= need to be fixed
+  (patches available).
++ [X] Push new =ensembldb= package.
++ [X] Contact developers of =chimeraviz= and =TVTB= and =wiggleplotr=.
+
+** DONE Fix/check packages failing to build for Bioc 3.5
+   CLOSED: [2017-05-16 Tue 06:24]
+
+   - State "DONE"       from "TODO"       [2017-05-16 Tue 06:24]
+A
++ [ ] affycoretools: because of ReportingTools
++ [ ] AgiMicroRna: because of affycoretools
++ [X] AllelicImbalance: because of Gviz
++ [X] ASpli: because of Gviz
+
+B
++ [ ] BgeeDB ? not related to ensembldb
++ [X] biomvRCNS: because of Gviz
++ [X] biovizBase: *depends* on ensembldb!!! Has been fixed. XXXX
++ [X] BubbleTree: because of biovizBase
+
+C
++ [X] CAFE: because of biovizBase
++ [X] ChAMP: because of DMRcate
++ [X] Chicago: because of GenomicInteractions
++ [X] chimeraviz: *depends* on ensembldb!!! XXXX
++ [X] ChIPexoQual: depends on biovizBase
++ [X] ChIPpeakAnno: *depends* on ensembldb XXXX, but BUILDS.
++ [X] CINdex: depends on biovizBase.
++ [X] CNEr: depends on Gviz.
++ [X] coMET: depends on Gviz.
++ [X] compEpiTools: depends on methylPipe.
++ [X] cummeRbund: depends on Gviz.
+
+D
++ [X] DeepBlueR: depends on Gviz.
++ [X] derfinder: depends on biovizBase.
++ [X] derfinderPlot: depends on derfinder, biovizBase
++ [X] DMRcate: depends on Gviz.
++ [X] DMRforPairs: depends on Gviz.
+
+E
++ [ ] EnrichmentBrowser: depends on GSEABase.
+
+F
++ [X] FourCSeq: depends on ggbio.
+
+G
++ [X] GeneGeneInteR: depends on GGtools.
++ [X] GenomicInteractions: depends on Gviz.
++ [X] GGBase: depends on GGtools.
++ [X] ggbio: *depends* on ensembldb!!!! XXXX
++ [X] GGtools: depends on Gviz.
++ [X] GoogleGenomics: depends on ggbio.
++ [X] gQTLBase: depends on GGtools.
++ [ ] GSEABase: depends on ReportingTools.
++ [X] Gviz: depends on biovizBase.
++ [X] gwascat: depends on Gviz, ggbio.
+
+H
+I
++ [X] InPAS: depends on Gviz.
++ [X] intansv: depends on ggbio.
+
+J
+
+K
++ [X] karyoploteR: depends on biovizBase.
+
+L
++ [X] ldblock: depends on gwascat.
+
+M
++ [X] MEAL: depends on DMRcate.
++ [X] meshr: depends on cummeRbund.
++ [X] methyAnalysis: depends on Gviz.
++ [X] methylPipe: depends on Gviz.
++ [X] motifbreakR: depends on Gviz.
+
+N
++ [X] NADfinder: depends on trackViewer.
+
+P
++ [ ] Pbase: *depends* on ensembldb!!! XXXX Fixed/not fixed.
++ [X] pepStat: depends on Pviz.
++ [X] Pi: depends on ggbio.
++ [X] PING: depends on Gviz.
++ [X] pqsfinder: depends on Gviz. -> biomaRt error.
++ [X] Pviz: depends on Gviz.
+
+Q
++ [X] qrqc: depends on biovizBase.
++ [X] QuasR: depends on Gviz.
+
+R
++ [X] R3CPET: depends on ggbio.
++ [X] RareVariantVis: depends on VariantFiltering.
++ [X] Rariant: depends on ggbio.
++ [ ] ReportingTools: depends on ggbio. PFAM.db not available.
++ [X] RiboProfiling: depends on ggbio
++ [X] Rqc: depends on biovizBase.
+
+S
++ [X] SomaticSignatures: depends on ggbio.
++ [X] spliceR: depends on cummeRbund.
++ [X] SplicingGraphs: depends on Gviz.
++ [X] SPLINTER: depends on Gviz.
++ [X] STAN: depends on Gviz.
+
+T
++ [X] trackViewer: depends on Gviz.
+
+V
++ [X] VariantFiltering: depends on Gviz.
++ [X] vtpnet: depends on gwascat.
+
+W
++ [ ] wiggleplotr: *depends* on ensembldb!!!! XXXX
+
+Y
++ [X] YAPSA: depends on SomaticSignatures.
+
+
+Based on =ensembldb=:
++ [X] =biovizBase=:
++ [X] =chimeraviz=:
++ [X] =ChIPpeakAnno=:
++ [X] =ggbio=:
++ [ ] =Pbase=:
++ [ ] =wiggleplotr=:
+** TODO entrezid in separate database table
+
++ [X] Perl script to save =entrezid= into a separate table =entrezgene=.
++ [X] Import script to create the additional table and indices (=gene_id= and
+  =entrezid=).
++ [X] Concatenate on the SQL level (=group_concat(X,Y)=)? NO! Return the result as
+  a list.
++ [X] Test whether queries work for genes that don't have an entry in =entrezid=;
+  otherwise save just the =gene_id= into the table without =entrezid=. Using a
+  =left outer join= seems to fix that.
++ [X] Different SQL queries depending on DBSCHEMA version: extract the
+  DBSCHEMAVERSION using the =dbSchemaVersion= function (passing the =EnsDb=). Seems
+  to work out of the box - no need to make schema dependent calls.
+
++ [X] Put =entrezid= as a =list= into =GRanges=? The entries have to be collapsed,
+  and we have to specify by which column: e.g. by =gene_id= if the call is
+  =genes=, by =exon_id= if the call is =exons= or =exonsBy= etc. WORKS.
++ [X] Validity dependent on DB schema.
++ [ ] Build from GRanges: use database version 2.0 schema?
++ [X] Update documentation: mention that column entrezid is a =list=.
++ [X] Update vignette: mention that column entrezid is a =list= and update the
+  database layout.
++ [X] Fix =select=.
++ [X] Fix =mapIds=.
++ [X] Check the package on the database with DBSCHEMAVERSION 1.0.
++ [X] Check the package on the database with DBSCHEMAVERSION 2.0.
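The collapsing step can be sketched in base R on a toy table with one row per gene/Entrez ID pair, which is roughly what a left outer join of the gene and =entrezgene= tables returns. The helper =collapseEntrezid= is hypothetical. The package's own implementation (=.collapseEntrezidInTable=, called in the test code) may behave differently, e.g. with respect to the returned object class.

```r
## Toy data: one row per gene/Entrez ID pair; G3 has no Entrez ID.
tbl <- data.frame(gene_id = c("G1", "G1", "G2", "G3"),
                  gene_name = c("A", "A", "B", "C"),
                  entrezid = c(101L, 102L, 201L, NA),
                  stringsAsFactors = FALSE)

## Hypothetical helper: collapse the Entrez IDs into a list column so that
## each value of 'by' is reported only once.
collapseEntrezid <- function(x, by = "gene_id") {
    ## Keep the order in which the IDs first appear.
    f <- factor(x[[by]], levels = unique(x[[by]]))
    egs <- lapply(split(x$entrezid, f), unique)
    ## First row per ID for all columns other than entrezid.
    res <- x[match(names(egs), x[[by]]), colnames(x) != "entrezid",
             drop = FALSE]
    res$entrezid <- unname(egs)
    rownames(res) <- NULL
    res
}

res <- collapseEntrezid(tbl)
res$entrezid[[1]]
## [1] 101 102
```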
+
+
+Some test code below.
+#+BEGIN_SRC R :eval never
+  library(ensembldb)
+  library(testthat)
+
+  edb <- EnsDb("/Users/jo/tmp/ensdb_20/EnsDb.Hsapiens.v88.sqlite")
+
+  ensembldb:::dbSchemaVersion(edb)
+
+  system.time(gns1 <- genes(edb, return.type = "data.frame")) ## 0.677 sec
+  system.time(gns2 <- genes(edb, return.type = "data.frame",
+			    columns = c(listColumns(edb, "gene"), "entrezid"))) ## 1.5
+
+  all(unique(gns1$gene_id) == unique(gns2$gene_id))
+  expect_equal(gns1$gene_id, gns2$gene_id)
+
+  ## Seems to work...
+  gns2 <- genes(edb, columns = c(listColumns(edb, "gene"), "entrezid"))
+
+  ## Check for transcripts
+  ## transcripts
+  system.time(tx1 <- transcripts(edb))  ## 3.2 sec
+  system.time(tx2 <- transcripts(
+		  edb, columns = c(listColumns(edb, "tx"), "entrezid")))  ## 5.5
+  expect_equal(length(tx1), length(tx2))
+  expect_equal(mcols(tx1), mcols(tx2)[, -ncol(mcols(tx2))])
+  expect_equal(names(tx1), names(tx2))
+
+  ## transcriptsBy
+  tx1 <- transcriptsBy(edb)
+  tx2 <- transcriptsBy(edb, columns = c(listColumns(edb, "tx"), "entrezid"))
+  expect_equal(length(tx1), length(tx2))
+  expect_equal(mcols(tx1), mcols(tx2)[, -ncol(mcols(tx2))])
+  expect_equal(names(tx1), names(tx2))
+
+
+  ## Check for exons
+  ## exons
+  ex1 <- exons(edb)
+  ex2 <- exons(edb, columns = c(listColumns(edb, "exon"), "entrezid"))
+  expect_equal(length(ex1), length(ex2))
+  expect_equal(names(ex1), names(ex2))
+  ## Are all entrezids unique?
+  lens <- lengths(ex2$entrezid)
+  lens_2 <- lengths(lapply(ex2$entrezid, unique))
+  expect_equal(lens, lens_2)
+
+  ## exonsBy
+  ex1 <- exonsBy(edb)
+  ex2 <- exonsBy(edb, columns = c(listColumns(edb, "exon"), "entrezid"))
+  all.equal(names(ex1), names(ex2))
+  expect_equal(length(ex1), length(ex2))
+  expect_equal(mcols(ex1), mcols(ex2)[, -ncol(mcols(ex2))])
+
+  ## cdsBy
+  cs1 <- cdsBy(edb)
+  cs2 <- cdsBy(edb, columns = c("entrezid"))
+  all.equal(names(cs1), names(cs2))
+  expect_equal(length(cs1), length(cs2))
+  expect_equal(mcols(cs1), mcols(cs2)[, -1])
+
+  ## threeUTRsByTranscript
+  tu1 <- threeUTRsByTranscript(edb)
+  tu2 <- threeUTRsByTranscript(edb, columns = "entrezid")
+  all.equal(names(tu1), names(tu2))
+  expect_equal(length(tu1), length(tu2))
+  expect_equal(mcols(tu1), mcols(tu2)[, -1])
+  ## fiveUTRsByTranscript
+  fu1 <- fiveUTRsByTranscript(edb)
+  fu2 <- fiveUTRsByTranscript(edb, columns = "entrezid")
+  all.equal(names(fu1), names(fu2))
+  expect_equal(length(fu1), length(fu2))
+  expect_equal(mcols(fu1), mcols(fu2)[, -1])
+
+  ## proteins
+  pr1 <- proteins(edb)
+  pr2 <- proteins(edb, columns = c(listColumns(edb, "protein"), "entrezid"))
+  all.equal(pr1$protein_id, pr2$protein_id)
+  expect_equal(pr1, pr2[, -ncol(pr2)])
+
+
+  tmp <- ensembldb:::getWhat(edb, columns = c(listColumns(edb, "gene"), "entrezid"))
+
+  system.time(tmp_u <- unique(tmp[, -ncol(tmp)]))  ## 0.194
+
+  system.time(tmp_1 <- .collapseEntrezidInTable(tmp, by = "gene_id"))
+  system.time(tmp_2 <- ensembldb:::.collapseEntrezidInTable(tmp, by = "gene_id"))
+
+  expect_equal(tmp_1, tmp_2)
+
+
+  ## Check if we could do it faster...
+  system.time(ids <- apply(tmp[, -ncol(tmp)], MARGIN = 1, FUN = paste0, collapse = ""))
+
+  system.time(egs <- split(tmp$entrezid,
+			   f = factor(tmp$gene_id, levels = unique(tmp$gene_id))))  ## 0.019
+  system.time(egs <- lapply(egs, unique))  ## 0.6
+
+  system.time(eg2 <- aggregate(tmp$entrezid,
+			       by = list(factor(tmp$gene_id,
+					   levels = unique(tmp$gene_id))),
+			       FUN = unique))
+
+  system.time(tmp <- unique(gns2[, colnames(gns2) != "entrezid"]))  ## 0.201
+
+  system.time(tmp2 <- gns2[match(names(egs), gns2$gene_id), ])  ## 0.029
+
+  all.equal(tmp, tmp2[, -ncol(tmp2)])
+
+  DF <- DataFrame(tmp2)
+  DF$entrezid <- egs
+
+  system.time(Test <- .collapseEntrezidInTable(gns2))  ## 0.05
+#+END_SRC
+
+Testing =select= and related methods:
+#+BEGIN_SRC R
+  library(ensembldb)
+  library(testthat)
+
+  edb <- EnsDb("/Users/jo/tmp/ensdb_20/EnsDb.Hsapiens.v88.sqlite")
+
+  all <- select(edb) ## THAT SHOULD WORK!
+  all <- select(edb, keys = ~ symbol == "BCL2")
+
+  gns <- genes(edb)
+
+  ## Gene with multiple Entrez Gene IDs
+  all <- select(edb, keys = ~ symbol == "DDX11L1")
+
+  all_u <- unique(all[, -1])
+  n_entrez <- length(unique(all[, 1]))
+  ## Expect that the nrow of 'all' is:
+  expect_equal(nrow(all_u) * n_entrez, nrow(all))
+
+  ## Looks OK.
+  vals <- mapIds(edb, keys = "DDX11L1", column = "ENTREZID", keytype = "SYMBOL",
+		 multiVals = "list")
+  expect_equal(length(vals[[1]]), n_entrez)
+
+  ## Seems to work...
+  vals <- mapIds(edb, keys = ~ symbol %in% c("BCL2", "DDX11L1", "ZBTB16"),
+		 column = "ENTREZID", multiVals = "list")
+  vals
+
+#+END_SRC
+
+Seems to work out of the box...
diff --git a/vignettes/images/dblayout.png b/vignettes/images/dblayout.png
index a88d1a6..382df90 100644
Binary files a/vignettes/images/dblayout.png and b/vignettes/images/dblayout.png differ
diff --git a/vignettes/proteins.Rmd b/vignettes/proteins.Rmd
new file mode 100644
index 0000000..7bf98ab
--- /dev/null
+++ b/vignettes/proteins.Rmd
@@ -0,0 +1,273 @@
+---
+title: "Querying protein features"
+author: "Johannes Rainer"
+graphics: yes
+package: ensembldb
+output:
+  BiocStyle::html_document2:
+    toc_float: true
+vignette: >
+  %\VignetteIndexEntry{Querying protein features}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+  %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle}
+---
+
+Starting with Bioconductor release 3.5, `EnsDb` databases/packages created by
+the `ensembldb` package also contain mappings between transcripts and proteins
+for transcripts with a coding region. Thus, in addition to the RNA/DNA-based
+features, the following protein-related information is available:
+
+-   `protein_id`: the Ensembl protein ID. This is the primary ID for the proteins
+    defined in Ensembl and each (protein coding) Ensembl transcript has one
+    protein ID assigned to it.
+-   `protein_sequence`: the amino acid sequence of a protein.
+-   `uniprot_id`: the Uniprot ID for a protein. Note that not every Ensembl
+    `protein_id` has a Uniprot ID, and each `protein_id` might be mapped to several
+    `uniprot_id`. Also, the same Uniprot ID might be mapped to different `protein_id`.
+-   `uniprot_db`: the name of the Uniprot database in which the feature is
+    annotated. Can be either *SPTREMBL* or *SWISSPROT*.
+-   `uniprot_mapping_type`: the type of the mapping method that was used to assign
+    the Uniprot ID to the Ensembl protein ID.
+-   `protein_domain_id`: the ID of the protein domain according to the
+    source/analysis in/by which it was defined.
+-   `protein_domain_source`: the source of the protein domain information, one of
+    *pfscan*, *scanprosite*, *superfamily*, *pfam*, *prints*, *smart*, *pirsf* or *tigrfam*.
+-   `interpro_accession`: the Interpro accession ID of the protein domain (if
+    available).
+-   `prot_dom_start`: the start of the protein domain within the sequence of
+    the protein.
+-   `prot_dom_end`: the end position of the protein domain within the
+    sequence of the protein.
+
+Thus, for protein coding transcripts, these annotations can be fetched from the
+database too, given that protein annotations are available. Note that only `EnsDb`
+databases created through the Ensembl Perl API contain protein annotation, while
+databases created using `ensDbFromAH`, `ensDbFromGff`, `ensDbFromGRanges` and
+`ensDbFromGtf` don't.
+
+```{r doeval, echo = FALSE, results = "hide"}
+## Globally switch off execution of code chunks
+evalMe <- FALSE
+haveProt <- FALSE
+```
+
+```{r loadlib, message = FALSE, eval = evalMe}
+library(ensembldb)
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+## Evaluate whether we have protein annotation available
+hasProteinData(edb)
+```
+
+If protein annotation is available, the additional tables and columns are also
+listed by the `listTables` and `listColumns` methods.
+
+```{r listCols, message = FALSE, eval = evalMe}
+listTables(edb)
+```
+
+In the following sections we show examples of how to 1) fetch protein annotations
+as additional columns to gene/transcript annotations, 2) fetch protein
+annotation data and 3) map proteins to the genome.
+
+```{r haveprot, echo = FALSE, results = "hide", eval = evalMe}
+## Use this to conditionally disable eval on following chunks
+haveProt <- hasProteinData(edb) & evalMe
+```
+
+
+# Fetch protein annotation for genes and transcripts
+
+Protein annotations for (protein coding) transcripts can be retrieved by simply
+adding the desired annotation columns to the `columns` parameter of, e.g., the
+`genes` or `transcripts` methods.
+
+```{r a_transcripts, eval = haveProt}
+## Get also protein information for ZBTB16 transcripts
+txs <- transcripts(edb, filter = GenenameFilter("ZBTB16"),
+		   columns = c("protein_id", "uniprot_id", "tx_biotype"))
+txs
+```
+
+The gene ZBTB16 has protein coding and non-coding transcripts, thus, we get the
+protein ID for the coding and `NA` for the non-coding transcripts. Note also that
+one transcript targeted for nonsense-mediated mRNA decay has a protein ID
+associated with it, but no Uniprot ID.
+
+```{r a_transcripts_coding_noncoding, eval = haveProt}
+## Subset to transcripts with tx_biotype other than protein_coding.
+txs[txs$tx_biotype != "protein_coding", c("uniprot_id", "tx_biotype",
+					  "protein_id")]
+```
+
+While the mapping from a protein coding transcript to an Ensembl protein ID
+(column `protein_id`) is 1:1, the mapping between `protein_id` and `uniprot_id` can be
+n:m, i.e. each Ensembl protein ID can be mapped to 1 or more Uniprot IDs and
+each Uniprot ID can be mapped to more than one `protein_id` (and hence
+`tx_id`). This should be kept in mind when querying transcripts together with
+Uniprot-related columns or protein ID features, as in such cases a redundant
+list of transcripts is returned.
+
+```{r a_transcripts_coding, eval = haveProt}
+## List the protein IDs and uniprot IDs for the coding transcripts
+mcols(txs[txs$tx_biotype == "protein_coding",
+	  c("tx_id", "protein_id", "uniprot_id")])
+```
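The source of this redundancy can be sketched with toy base-R tables. All IDs are made up. Joining transcripts to their proteins and then to Uniprot IDs yields one row per transcript/Uniprot ID pair. A transcript whose protein has two Uniprot IDs is therefore reported twice, while a transcript whose protein has no Uniprot ID gets `NA`. This only mimics the table layout; it is not how `ensembldb` builds its queries.

```r
## Toy tables with made-up IDs: transcript to protein is 1:1, protein to
## Uniprot is n:m (P1 -> U1, U2; U2 -> P1, P2), P3 has no Uniprot ID.
tx2prot <- data.frame(tx_id = c("T1", "T2", "T3"),
                      protein_id = c("P1", "P2", "P3"),
                      stringsAsFactors = FALSE)
prot2up <- data.frame(protein_id = c("P1", "P1", "P2"),
                      uniprot_id = c("U1", "U2", "U2"),
                      stringsAsFactors = FALSE)

## Left join: keep all transcripts, one row per transcript/Uniprot ID pair.
res <- merge(tx2prot, prot2up, by = "protein_id", all.x = TRUE)
res
## Number of rows per transcript: T1 is reported twice.
table(res$tx_id)
```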
+
+Some of the n:m mappings for Uniprot IDs can be resolved by restricting either
+to entries from one Uniprot database (*SPTREMBL* or *SWISSPROT*) or to mappings of a
+certain mapping type. The corresponding filters are the
+`UniprotDbFilter` and the `UniprotMappingTypeFilter` (using the `uniprot_db` and
+`uniprot_mapping_type` columns of the `uniprot` database table). In the example
+below we restrict the result to Uniprot IDs with the mapping type *DIRECT*.
+
+```{r a_transcripts_coding_up, eval = haveProt}
+## List all uniprot mapping types in the database.
+listUniprotMappingTypes(edb)
+
+## Get all protein_coding transcripts of ZBTB16 along with their protein_id
+## and Uniprot IDs, restricting to protein_id to uniprot_id mappings based
+## on "DIRECT" mapping methods.
+txs <- transcripts(edb, filter = list(GenenameFilter("ZBTB16"),
+				      UniprotMappingTypeFilter("DIRECT")),
+		   columns = c("protein_id", "uniprot_id", "uniprot_db"))
+mcols(txs)
+```
+
+In this example the `UniprotMappingTypeFilter` resolved the multiple mapping of
+Uniprot IDs to Ensembl protein IDs, but the Uniprot ID *Q05516* is still
+assigned to the two Ensembl protein IDs *ENSP00000338157* and
+*ENSP00000376721*.
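The effect of such a restriction can be sketched on a toy mapping table. The IDs and the second mapping type name are made up for illustration. Keeping only *DIRECT* mappings leaves a single Uniprot ID per protein. However, one Uniprot ID can still be assigned to several protein IDs, as for *Q05516*. In the database this subsetting is done by the `UniprotMappingTypeFilter`; the base-R code below only mimics it.

```r
## Made-up protein_id to uniprot_id mappings; "SEQUENCE_MATCH" is a placeholder
## for any mapping type other than "DIRECT".
up <- data.frame(protein_id = c("P1", "P1", "P2", "P2"),
                 uniprot_id = c("U1", "U9", "U1", "U8"),
                 uniprot_mapping_type = c("DIRECT", "SEQUENCE_MATCH",
                                          "DIRECT", "SEQUENCE_MATCH"),
                 stringsAsFactors = FALSE)

## Keep only DIRECT mappings.
direct <- up[up$uniprot_mapping_type == "DIRECT", ]

## Each protein now maps to a single Uniprot ID ...
table(direct$protein_id)
## ... but U1 is still assigned to both P1 and P2.
table(direct$uniprot_id)
```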
+
+All protein annotations can also be added as *metadata columns* to the
+results of the `genes`, `exons`, `exonsBy`, `transcriptsBy`, `cdsBy`, `fiveUTRsByTranscript`
+and `threeUTRsByTranscript` methods by specifying the desired column names with
+the `columns` parameter. For non-coding transcripts `NA` will be reported in the
+protein annotation columns.
+
+In addition to retrieving protein annotations from the database, we can also use
+protein data to filter the results. In the example below we fetch all genes from
+the database that have a certain protein domain in the protein encoded by any of
+their transcripts.
+
+```{r a_genes_protdomid_filter, eval = haveProt}
+## Get all genes that encode a transcript encoding for a protein that contains
+## a certain protein domain.
+gns <- genes(edb, filter = ProtDomIdFilter("PS50097"))
+length(gns)
+
+sort(gns$gene_name)
+```
+
+So, in total we got 152 genes with that protein domain. In addition to the
+`ProtDomIdFilter`, the `ProteinIdFilter` and the `UniprotFilter` can also be used to
+query the database for entries matching conditions on their protein ID or
+Uniprot ID.
+
+
+# Use methods from the `AnnotationDbi` package to query protein annotation
+
+The `select`, `keys` and `mapIds` methods from the `AnnotationDbi` package can also be
+used to query `EnsDb` objects for protein annotations. Supported columns and
+key types are returned by the `columns` and `keytypes` methods.
+
+```{r a_2_annotationdbi, message = FALSE, eval = haveProt}
+## Show all columns that are provided by the database
+columns(edb)
+
+## Show all key types/filters that are supported
+keytypes(edb)
+```
+
+Below we fetch all Uniprot IDs annotated to the gene *ZBTB16*.
+
+```{r a_2_select, message = FALSE, eval = haveProt}
+select(edb, keys = "ZBTB16", keytype = "GENENAME",
+       columns = "UNIPROTID")
+```
+
+This returns all Uniprot IDs of all proteins encoded by the gene's
+transcripts. One of the transcripts from ZBTB16, while having a CDS and being
+annotated to a protein, does not have a Uniprot ID assigned (thus `NA` is
+returned by the above call). As we see below, this transcript is targeted for
+nonsense-mediated mRNA decay.
+
+```{r a_2_select_nmd, message = FALSE, eval = haveProt}
+## Call select, this time providing a GenenameFilter.
+select(edb, keys = GenenameFilter("ZBTB16"),
+       columns = c("TXBIOTYPE", "UNIPROTID", "PROTEINID"))
+```
+
+Note also that this time we passed a `GenenameFilter` with the `keys` parameter.
+
+
+# Retrieve proteins from the database
+
+Proteins can be fetched using the dedicated `proteins` method which, unlike
+DNA/RNA-based methods such as `genes` or `transcripts`, returns by default a
+`DataFrame` object instead of a `GRanges` object. Alternatively, results can be
+returned as a `data.frame` or as an `AAStringSet` object from the `Biostrings`
+package. Note that this might change in future releases if a more appropriate
+object to represent protein annotations becomes available.
+
+In the code chunk below we fetch all protein annotations for the gene *ZBTB16*.
+
+```{r b_proteins, message = FALSE, eval = haveProt}
+## Get all proteins and return them as an AAStringSet
+prts <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+		 return.type = "AAStringSet")
+prts
+```
+
+Besides the amino acid sequences, `prts` also contains additional annotations
+that can be accessed with the `mcols` method (metadata columns). All additional
+columns provided with the parameter `columns` are also added to the `mcols`
+`DataFrame`.
+
+```{r b_proteins_mcols, message = FALSE, eval = haveProt}
+mcols(prts)
+```
+
+Note that the `proteins` method retrieves only gene/transcript annotations of
+transcripts encoding a protein. Thus annotations for the non-coding transcripts
+of the gene *ZBTB16*, which were returned by calls to `genes` or `transcripts` in
+the previous section, are not fetched.
+
+At present, additionally querying Uniprot identifiers or protein domain data
+results in a redundant list of proteins, as shown in the code block below.
+
+```{r b_proteins_prot_doms, message = FALSE, eval = haveProt}
+## Get also protein domain annotations in addition to the protein annotations.
+pd <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+	       columns = c("tx_id", listColumns(edb, "protein_domain")),
+	       return.type = "AAStringSet")
+pd
+```
+
+The result contains one row/element for each protein domain in each of the
+proteins. The number of protein domains per protein and the `mcols` are shown
+below.
+
+```{r b_proteins_prot_doms_2, message = FALSE, eval = haveProt}
+## The number of protein domains per protein:
+table(names(pd))
+
+## The mcols
+mcols(pd)
+```
+
+As we can see, each protein can have several protein domains, with their start
+and end coordinates within the amino acid sequence being reported in the columns
+`prot_dom_start` and `prot_dom_end`. Also, not all Ensembl protein IDs (e.g.
+`protein_id` *ENSP00000445047*) are mapped to a Uniprot ID or have protein domains.
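To make the coordinate columns concrete: `prot_dom_start` and `prot_dom_end` are positions within the amino acid sequence, so the sequence of a domain can be cut out of `protein_sequence`. The sketch below uses made-up data in plain `data.frame`s with the same column names, base R `substr`, and assumes 1-based, inclusive positions.

```r
## Made-up protein and domain annotations; positions are assumed to be
## 1-based and inclusive.
prot <- data.frame(protein_id = "P1",
                   protein_sequence = "MDLTKMGMIQLQNPSHPTGLLCKANQMRLAGTLC",
                   stringsAsFactors = FALSE)
doms <- data.frame(protein_id = c("P1", "P1"),
                   protein_domain_id = c("D1", "D2"),
                   prot_dom_start = c(5L, 20L),
                   prot_dom_end = c(12L, 28L),
                   stringsAsFactors = FALSE)

## Look up the sequence of each domain's protein and extract the domain.
seqs <- prot$protein_sequence[match(doms$protein_id, prot$protein_id)]
doms$domain_seq <- substr(seqs, doms$prot_dom_start, doms$prot_dom_end)
doms$domain_seq
```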
+
+
+# Map peptide features within proteins to the genome
+
+Functionality to map peptide features (i.e. ranges within the amino acid
+sequence of the protein) to genomic coordinates is provided by the `Pbase`
+Bioconductor package. These rely in part on the protein annotations provided by
+`EnsDb` databases. See the corresponding vignette *Pbase-with-ensembldb* in that
+package.
+
diff --git a/vignettes/proteins.org b/vignettes/proteins.org
new file mode 100644
index 0000000..bfb94b2
--- /dev/null
+++ b/vignettes/proteins.org
@@ -0,0 +1,485 @@
+#+TITLE: Querying protein features
+#+AUTHOR: Johannes Rainer
+#+EMAIL:  johannes.rainer at eurac.edu
+#+OPTIONS: ^:{} toc:nil
+#+PROPERTY: header-args :exports code
+#+PROPERTY: header-args :session *R_prot*
+
+#+BEGIN_EXPORT html
+---
+title: "Querying protein features"
+author: "Johannes Rainer"
+graphics: yes
+package: ensembldb
+output:
+  BiocStyle::html_document2:
+    toc_float: true
+vignette: >
+  %\VignetteIndexEntry{Querying protein features}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+  %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle}
+---
+
+#+END_EXPORT
+
+Starting with Bioconductor release 3.5, =EnsDb= databases/packages created by
+the =ensembldb= package also contain mappings between transcripts and proteins
+for transcripts with a coding region. Thus, in addition to the RNA/DNA-based
+features, the following protein-related information is available:
++ =protein_id=: the Ensembl protein ID. This is the primary ID for the proteins
+  defined in Ensembl and each (protein coding) Ensembl transcript has one
+  protein ID assigned to it.
++ =protein_sequence=: the amino acid sequence of a protein.
++ =uniprot_id=: the Uniprot ID for a protein. Note that not every Ensembl
+  =protein_id= has a Uniprot ID, and each =protein_id= might be mapped to several
+  =uniprot_id=. Also, the same Uniprot ID might be mapped to different =protein_id=.
++ =uniprot_db=: the name of the Uniprot database in which the feature is
+  annotated. Can be either /SPTREMBL/ or /SWISSPROT/.
++ =uniprot_mapping_type=: the type of the mapping method that was used to assign
+  the Uniprot ID to the Ensembl protein ID.
++ =protein_domain_id=: the ID of the protein domain according to the
+  source/analysis in/by which it was defined.
++ =protein_domain_source=: the source of the protein domain information, one of
+  /pfscan/, /scanprosite/, /superfamily/, /pfam/, /prints/, /smart/, /pirsf/ or /tigrfam/.
++ =interpro_accession=: the Interpro accession ID of the protein domain (if
+  available).
++ =prot_dom_start=: the start of the protein domain within the sequence of
+  the protein.
++ =prot_dom_end=: the end position of the protein domain within the
+  sequence of the protein.
+
+Thus, for protein coding transcripts, these annotations can be fetched from the
+database too, given that protein annotations are available. Note that only =EnsDb=
+databases created through the Ensembl Perl API contain protein annotation, while
+databases created using =ensDbFromAH=, =ensDbFromGff=, =ensDbFromGRanges= and
+=ensDbFromGtf= don't.
+
+#+NAME: doeval
+#+BEGIN_SRC R :ravel echo = FALSE, results = "hide"
+  ## Globally switch off execution of code chunks
+  evalMe <- FALSE
+  haveProt <- FALSE
+
+#+END_SRC
+
+#+NAME: loadlib
+#+BEGIN_SRC R :ravel message = FALSE, eval = evalMe
+  library(ensembldb)
+  library(EnsDb.Hsapiens.v75)
+  edb <- EnsDb.Hsapiens.v75
+  ## Evaluate whether we have protein annotation available
+  hasProteinData(edb)
+
+#+END_SRC
+
+If protein annotation is available, the additional tables and columns are also
+listed by the =listTables= and =listColumns= methods.
+
+#+NAME: listCols
+#+BEGIN_SRC R :ravel message = FALSE, eval = evalMe
+  listTables(edb)
+
+#+END_SRC
+
+In the following sections we show examples of how to 1) fetch protein annotations
+as additional columns to gene/transcript annotations, 2) fetch protein
+annotation data and 3) map proteins to the genome.
+
+#+NAME: haveprot
+#+BEGIN_SRC R :ravel echo = FALSE, results = "hide", eval = evalMe
+  ## Use this to conditionally disable eval on following chunks
+  haveProt <- hasProteinData(edb) & evalMe
+
+#+END_SRC
+
+* Fetch protein annotation for genes and transcripts
+
+Protein annotations for (protein coding) transcripts can be retrieved by simply
+adding the desired annotation columns to the =columns= parameter of, e.g., the
+=genes= or =transcripts= methods.
+
+#+NAME: a_transcripts
+#+BEGIN_SRC R :ravel eval = haveProt
+  ## Get also protein information for ZBTB16 transcripts
+  txs <- transcripts(edb, filter = GenenameFilter("ZBTB16"),
+                     columns = c("protein_id", "uniprot_id", "tx_biotype"))
+  txs
+
+#+END_SRC
+
+The gene ZBTB16 has protein coding and non-coding transcripts, thus, we get the
+protein ID for the coding and =NA= for the non-coding transcripts. Note also that
+one transcript targeted for nonsense-mediated mRNA decay has a protein ID
+associated with it, but no Uniprot ID.
+
+#+NAME: a_transcripts_coding_noncoding
+#+BEGIN_SRC R :ravel eval = haveProt
+  ## Subset to transcripts with tx_biotype other than protein_coding.
+  txs[txs$tx_biotype != "protein_coding", c("uniprot_id", "tx_biotype",
+                                            "protein_id")]
+
+#+END_SRC
+
+While the mapping from a protein coding transcript to an Ensembl protein ID
+(column =protein_id=) is 1:1, the mapping between =protein_id= and =uniprot_id= can be
+n:m, i.e. each Ensembl protein ID can be mapped to 1 or more Uniprot IDs and
+each Uniprot ID can be mapped to more than one =protein_id= (and hence
+=tx_id=). This should be kept in mind when querying transcripts together with
+Uniprot-related columns or protein ID features, as in such cases a redundant
+list of transcripts is returned.
+
+#+NAME: a_transcripts_coding
+#+BEGIN_SRC R :ravel eval = haveProt
+  ## List the protein IDs and uniprot IDs for the coding transcripts
+  mcols(txs[txs$tx_biotype == "protein_coding",
+            c("tx_id", "protein_id", "uniprot_id")])
+
+#+END_SRC
+
+Some of the n:m mappings for Uniprot IDs can be resolved by restricting either
+to entries from one Uniprot database (/SPTREMBL/ or /SWISSPROT/) or to mappings of a
+certain mapping type. The corresponding filters are the
+=UniprotDbFilter= and the =UniprotMappingTypeFilter= (using the =uniprot_db= and
+=uniprot_mapping_type= columns of the =uniprot= database table). In the example
+below we restrict the result to Uniprot IDs with the mapping type /DIRECT/.
+
+#+NAME: a_transcripts_coding_up
+#+BEGIN_SRC R :ravel eval = haveProt
+  ## List all uniprot mapping types in the database.
+  listUniprotMappingTypes(edb)
+
+  ## Get all protein_coding transcripts of ZBTB16 along with their protein_id
+  ## and Uniprot IDs, restricting to protein_id to uniprot_id mappings based
+  ## on "DIRECT" mapping methods.
+  txs <- transcripts(edb, filter = list(GenenameFilter("ZBTB16"),
+					UniprotMappingTypeFilter("DIRECT")),
+                     columns = c("protein_id", "uniprot_id", "uniprot_db"))
+  mcols(txs)
+
+#+END_SRC
+
+In this example the =UniprotMappingTypeFilter= resolved the multiple mapping of
+Uniprot IDs to Ensembl protein IDs, but the Uniprot ID /Q05516/ is still
+assigned to the two Ensembl protein IDs /ENSP00000338157/ and
+/ENSP00000376721/.
+
+
+All protein annotations can also be added as /metadata columns/ to the
+results of the =genes=, =exons=, =exonsBy=, =transcriptsBy=, =cdsBy=, =fiveUTRsByTranscript=
+and =threeUTRsByTranscript= methods by specifying the desired column names with
+the =columns= parameter. For non-coding transcripts =NA= will be reported in the
+protein annotation columns.
+
+In addition to retrieving protein annotations from the database, we can also use
+protein data to filter the results. In the example below we fetch all genes from
+the database that have a certain protein domain in the protein encoded by any of
+their transcripts.
+
+#+NAME: a_genes_protdomid_filter
+#+BEGIN_SRC R :ravel eval = haveProt
+  ## Get all genes that encode a transcript encoding for a protein that contains
+  ## a certain protein domain.
+  gns <- genes(edb, filter = ProtDomIdFilter("PS50097"))
+  length(gns)
+
+  sort(gns$gene_name)
+
+#+END_SRC
+
+So, in total we got 152 genes with that protein domain. In addition to the
+=ProtDomIdFilter=, the =ProteinIdFilter= and the =UniprotFilter= can also be used to
+query the database for entries matching conditions on their protein ID or
+Uniprot ID.
+
+* Use methods from the =AnnotationDbi= package to query protein annotation
+
+The =select=, =keys= and =mapIds= methods from the =AnnotationDbi= package can also be
+used to query =EnsDb= objects for protein annotations. Supported columns and
+key types are returned by the =columns= and =keytypes= methods.
+
+#+NAME: a_2_annotationdbi
+#+BEGIN_SRC R :ravel message = FALSE, eval = haveProt
+  ## Show all columns that are provided by the database
+  columns(edb)
+
+  ## Show all key types/filters that are supported
+  keytypes(edb)
+
+#+END_SRC
+
+Below we fetch all Uniprot IDs annotated to the gene /ZBTB16/.
+
+#+NAME: a_2_select
+#+BEGIN_SRC R :ravel message = FALSE, eval = haveProt
+  select(edb, keys = "ZBTB16", keytype = "GENENAME",
+         columns = "UNIPROTID")
+
+#+END_SRC
+
+This returns all Uniprot IDs of all proteins encoded by the gene's transcripts.
+One of the transcripts of ZBTB16, while having a CDS and being annotated to a
+protein, does not have a Uniprot ID assigned (thus =NA= is returned by the above
+call). As we see below, this transcript is targeted for nonsense-mediated mRNA
+decay.
+
+#+NAME: a_2_select_nmd
+#+BEGIN_SRC R :ravel message = FALSE, eval = haveProt
+  ## Call select, this time providing a GenenameFilter.
+  select(edb, keys = GenenameFilter("ZBTB16"),
+         columns = c("TXBIOTYPE", "UNIPROTID", "PROTEINID"))
+
+#+END_SRC
+
+Note that this time we passed a =GenenameFilter= to the =keys= parameter.
+
+* Retrieve proteins from the database
+
+Proteins can be fetched using the dedicated =proteins= method. Unlike
+DNA/RNA-based methods such as =genes= or =transcripts=, it does not return a
+=GRanges= object by default but a =DataFrame= object. Alternatively, results can
+be returned as a =data.frame= or as an =AAStringSet= object from the =Biostrings=
+package. Note that this might change in future releases if a more appropriate
+object to represent protein annotations becomes available.
+
+In the code chunk below we fetch all protein annotations for the gene /ZBTB16/.
+
+#+NAME: b_proteins
+#+BEGIN_SRC R :ravel message = FALSE, eval = haveProt
+  ## Get all proteins and return them as an AAStringSet
+  prts <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+                   return.type = "AAStringSet")
+  prts
+
+#+END_SRC
+
+Besides the amino acid sequences, =prts= also contains additional annotations
+that can be accessed with the =mcols= method (metadata columns). All additional
+columns requested with the =columns= parameter are also added to the =mcols=
+=DataFrame=.
+
+#+NAME: b_proteins_mcols
+#+BEGIN_SRC R :ravel message = FALSE, eval = haveProt
+  mcols(prts)
+
+#+END_SRC
+
+Note that the =proteins= method retrieves only gene/transcript annotations of
+transcripts encoding a protein. Thus, annotations for the non-coding transcripts
+of the gene /ZBTB16/, which were returned by calls to =genes= or =transcripts= in
+the previous section, are not fetched.
+
+At present, additionally querying Uniprot identifiers or protein domain data
+results in a redundant list of proteins, as shown in the code block below.
+
+#+NAME: b_proteins_prot_doms
+#+BEGIN_SRC R :ravel message = FALSE, eval = haveProt
+  ## Get also protein domain annotations in addition to the protein annotations.
+  pd <- proteins(edb, filter = GenenameFilter("ZBTB16"),
+                 columns = c("tx_id", listColumns(edb, "protein_domain")),
+                 return.type = "AAStringSet")
+  pd
+
+#+END_SRC
+
+The result contains one row/element for each protein domain in each of the
+proteins. The number of protein domains per protein and the =mcols= are shown
+below.
+
+#+NAME: b_proteins_prot_doms_2
+#+BEGIN_SRC R :ravel message = FALSE, eval = haveProt
+  ## The number of protein domains per protein:
+  table(names(pd))
+
+  ## The mcols
+  mcols(pd)
+
+#+END_SRC
+
+As we can see, each protein can have several protein domains, with their start
+and end coordinates within the amino acid sequence reported in the columns
+=prot_dom_start= and =prot_dom_end=. Also, not all Ensembl protein IDs are mapped
+to a Uniprot ID or have protein domains, for example the =protein_id=
+/ENSP00000445047/.
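+Given a protein sequence and its domain coordinates, the domain subsequences
+can be extracted with base R's =substring=. The sketch below assumes that
+=prot_dom_start= and =prot_dom_end= are 1-based and inclusive, and uses a
+made-up sequence and made-up domain IDs rather than values from =pd=.

```r
## Made-up amino acid sequence and domain coordinates; real values would come
## from the sequences and the prot_dom_start/prot_dom_end columns of `pd`.
aa_seq <- "MAKLVGHEEWRSTPQNDFKAILCMGYSRTE"
doms <- data.frame(
  protein_domain_id = c("DOM_1", "DOM_2"),
  prot_dom_start = c(3L, 12L),
  prot_dom_end = c(8L, 20L),
  stringsAsFactors = FALSE
)

## Assuming 1-based, inclusive coordinates: width is end - start + 1 and
## substring() extracts exactly that stretch of the sequence.
doms$width <- doms$prot_dom_end - doms$prot_dom_start + 1L
doms$subseq <- substring(aa_seq, doms$prot_dom_start, doms$prot_dom_end)
doms
```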
+
+* Map peptide features within proteins to the genome
+
+Functionality to map peptide features (i.e. ranges within the amino acid
+sequence of a protein) to genomic coordinates is provided by the =Pbase=
+Bioconductor package. It relies in part on the protein annotations provided by
+=EnsDb= databases. See the corresponding vignette /Pbase-with-ensembldb/ in that
+package.
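+Conceptually, such a mapping first converts amino acid positions into
+CDS-relative nucleotide positions (codon /i/ spans nucleotides 3i - 2 to 3i)
+and then walks along the CDS segments of the transcript. The base R sketch
+below is a simplified illustration, assuming a plus-strand transcript, 1-based
+inclusive coordinates and made-up CDS segments; =aaToGenome= is a hypothetical
+name and not part of =Pbase= or =ensembldb=.

```r
## Hypothetical helper (not the Pbase or ensembldb API): map the amino acid
## range [aa_start, aa_end] of a protein to genomic ranges, given the CDS
## segments of its transcript on the plus strand, ordered 5' to 3'.
aaToGenome <- function(aa_start, aa_end, cds_start, cds_end) {
  ## Codon i occupies CDS-relative nucleotides (i - 1) * 3 + 1 to i * 3.
  nt_start <- (aa_start - 1) * 3 + 1
  nt_end <- aa_end * 3
  seg_width <- cds_end - cds_start + 1
  ## CDS-relative offset preceding the first nucleotide of each segment.
  seg_offset <- cumsum(c(0, head(seg_width, -1)))
  res <- data.frame(start = integer(0), end = integer(0))
  for (i in seq_along(cds_start)) {
    ## Overlap of [nt_start, nt_end] with this segment, CDS-relative.
    s <- max(nt_start, seg_offset[i] + 1)
    e <- min(nt_end, seg_offset[i] + seg_width[i])
    if (s <= e) {
      ## Translate the overlap back to genomic coordinates.
      res <- rbind(res, data.frame(
        start = cds_start[i] + (s - seg_offset[i] - 1),
        end = cds_start[i] + (e - seg_offset[i] - 1)))
    }
  }
  res
}

## Two made-up CDS segments: 100-129 (30 nt, codons 1-10) and 200-259.
## Amino acids 9-12 span the segment boundary.
rng <- aaToGenome(9, 12, cds_start = c(100, 200), cds_end = c(129, 259))
rng
```

+Here the four amino acids map to 124-129 and 200-205, i.e. 12 nucleotides split
+across the two segments. A minus-strand transcript would need the segments
+ordered from the highest genomic coordinate down and the offsets subtracted
+instead of added.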
+
+
+
+* TODOs								   :noexport:
+
+** TODO Fetch protein feature data from the database [3/4]
+
++ [X] Check: do we have a 1:1 mapping between transcript ID and protein ID? *No*:
+  ENST00000359635 for example maps to 13 different Uniprot IDs, hence we have 13
+  mappings in the database table.
+  - Multiple mappings between /protein_id/ and /uniprot_id/ exist.
+  - For some proteins there is an n:1 mapping between /tx_id/ and /protein_id/.
++ [X] Check: is the genome_start/end of a protein the same as the CDS start and
+  end?
++ [ ] Check: is the aa sequence identical to the sequence we would get if
+  we translated the CDS in R?
++ [X] Would it be better to split the protein table into a protein and
+  protein_uniprot table? Looks like it's better to split them.
+
+** TODO Implement a =proteins= method
+
+See also issue #20 https://github.com/jotsetung/ensembldb/issues/20.
+
+The question here is which =start= and =end= to put into the resulting =GRanges=
+object: /just/ the CDS start and end, or the individual start and end of all of
+its exons (as, e.g., the =cdsBy= method does)?
+
+A) =proteins= returns a =GRanges= with start being 1, width being the length of the
+aa and the seqname being the protein ID.
+B) A =Proteins= object?
+
+** TODO Implement a =proteinDomains= method
+
+This is tricky: the same protein domain might be present on several protein
+sequences.
+
+** TODO How to handle the protein domain features?
+
+For these we only have the start and end positions within the protein
+sequence. We would either have to map them back to genomic coordinates or
+leave them in per-protein coordinates.
+
+** DONE Add a =hasProtein= method for =EnsDb=
+   CLOSED: [2016-10-03 Mon 13:43]
+   - State "DONE"       from "TODO"       [2016-10-03 Mon 13:43]
+Checks whether the /protein/ table is available.
+
+** DONE Add additional filters [3/3]
+   CLOSED: [2016-10-03 Mon 13:44]
+   - State "DONE"       from "TODO"       [2016-10-03 Mon 13:44]
+These filters should check whether the database has the required tables/columns,
+i.e. they should call =hasProtein= within the =column= and =where= methods and
+=stop= if no protein data is available.
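+The check described above amounts to a guard that runs before any SQL is
+built. The base R sketch below is a hypothetical illustration: it takes the
+table names of the database as a character vector (in =ensembldb= one would
+rather call =hasProtein=), and the function name =requireProteinTable= is
+made up.

```r
## Hypothetical guard (not the actual ensembldb code): stop early when a
## protein-related filter is used but the database lacks the protein table.
requireProteinTable <- function(tables, filter_name) {
  if (!"protein" %in% tables)
    stop("Filter '", filter_name, "' requires protein annotations, ",
         "but the database has no 'protein' table.", call. = FALSE)
  invisible(TRUE)
}

## Passes silently when the table exists ...
requireProteinTable(c("gene", "tx", "protein"), "ProteinidFilter")
## ... and signals an informative error otherwise.
res <- tryCatch(requireProteinTable(c("gene", "tx"), "ProteinidFilter"),
                error = conditionMessage)
res
```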
+
++ [X] =ProteinidFilter=
++ [X] =UniprotFilter=
++ [X] =ProtdomFilter=
+
+** DONE Add a validation method for protein data [1/2]
+   CLOSED: [2016-10-04 Tue 18:09]
+   - State "DONE"       from "TODO"       [2016-10-04 Tue 18:09]
++ [X] Check that all transcripts with a CDS have a protein.
++ [ ] Length of the protein sequence is the length of the CDS / 3.
+
+** DONE Add an argument =startWith= to the =.buildQuery= function.
+   CLOSED: [2016-10-04 Tue 15:29]
+   - State "DONE"       from "TODO"       [2016-10-04 Tue 15:29]
+** TODO Add protein data to the =select= method [3/4]
+
+Add the required functionality to also allow querying protein data with =select=
+and related methods.
+
++ [X] =keys=.
++ [X] =keytypes=.
++ [X] =select=.
++ [ ] =mapIds=.
+
+** TODO Add protein data comparison to =compareEnsDb=.
+
+
+** TODO Which object best represents protein annotation (issue #20)
+
+https://github.com/jotsetung/ensembldb/issues/20
+
+
+** TODO Method to select the /best suited/ transcript for a protein
+
+The idea is to select, for proteins encoded by several transcripts, the
+transcript whose CDS best represents the sequence. That way we could get rid of
+transcripts with an incomplete 5' sequence (e.g. lacking the start codon) or
+without a stop codon. We could select the transcript whose CDS length equals
+(length of the AA sequence + 1) * 3; the + 1 accounts for the stop codon, which
+is part of the CDS but does not encode an amino acid.
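+The length criterion can be sketched in a few lines of base R. The function
+=selectMatchingTx=, the transcript names and the lengths below are hypothetical
+illustrations; real CDS lengths would have to be computed from the CDS ranges of
+each candidate transcript.

```r
## Hypothetical helper: among candidate transcripts for one protein, return
## those whose CDS length equals (number of amino acids + 1) * 3, i.e. a
## complete CDS including the stop codon.
selectMatchingTx <- function(aa_seq, cds_lengths) {
  expected <- (nchar(aa_seq) + 1) * 3
  names(cds_lengths)[cds_lengths == expected]
}

## Made-up example: a 10 amino acid protein needs a 33 nt CDS.
cds_len <- c(TX_full = 33, TX_no_stop = 30, TX_5p_incomplete = 27)
selectMatchingTx("MAKLVGHEEW", cds_len)
```

+If no candidate matches, an empty character vector is returned, which leaves
+the decision about a fallback (e.g. the longest CDS) to the caller.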
+
+
+** TODO Add additional Uniprot columns [3/4]
+
++ [X] Adapt perl script.
++ [X] Add methods.
++ [ ] Add Unit tests.
++ [X] Add documentation.
+
+* Experimental perl code and docs				   :noexport:
+
+Do you know which species each of these is from? If so, the easiest
+thing to do is to use Biomart for each species (if there are only a
+few species).
+Alternatively (if there are a lot of species, but you still need to know
+what they are),
+you can use the API.
+
+So if we pretend we have a list of accessions and species in a file:
+
+use strict;
+use warnings;
+use Bio::EnsEMBL::Registry;
+
+my $reg = "Bio::EnsEMBL::Registry";
+
+# Connect to the public Ensembl database server.
+$reg->load_registry_from_db(
+                 -host => 'ensembldb.ensembl.org',
+                 -user => 'anonymous',
+                 );
+
+# Each input line: <Uniprot accession> <species>
+while(<>){
+  my ($acc, $species) = split;
+
+  my $adap = $reg->get_adaptor($species,"core","translation");
+
+  # Fetch all translations linked to the accession via a uniprot* external DB.
+  my @trans = @{$adap->fetch_all_by_external_name($acc,"uniprot%")};
+
+  foreach my $translation (@trans){
+    print $translation->stable_id."\t".$acc."\n";
+  }
+}
+
+
+Please note I have not run, compiled or checked this code;
+this is just a brief outline. But it looks okay to me.
+
+-Ian.
+
+
+Translations and ProteinFeatures
+
+Translation objects and protein sequence can be extracted from a Transcript object. It is important to remember that some Ensembl transcripts are non-coding (pseudo-genes, ncRNAs, etc.) and have no translation. The primary purpose of a Translation object is to define the CDS and UTRs of its associated Transcript object. Peptide sequence is obtained directly from a Transcript object not a Translation object as might be expected. Once you have a Translation you can go back to its Transcrip [...]
+
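+use strict;
+use Bio::EnsEMBL::Registry;
+
+# Connect to the public Ensembl database server.
+my $registry = 'Bio::EnsEMBL::Registry';
+$registry->load_registry_from_db(
+    -host => 'ensembldb.ensembl.org',
+    -user => 'anonymous',
+);
+
+my $stable_id = 'ENST00000528762';
+
+my $transcript_adaptor =
+  $registry->get_adaptor( 'Human', 'Core', 'Transcript' );
+my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);
+
+print $transcript->translation()->stable_id(), "\n";
+print $transcript->translate()->seq(),         "\n";
+
+print $transcript->translation()->transcript()->stable_id(), "\n";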
+
+ProteinFeatures are features which are on an amino acid sequence rather than a nucleotide sequence. The method get_all_ProteinFeatures() can be used to obtain a set of protein features from a Translation object.
+
+my $translation = $transcript->translation();
+
+my $pfeatures = $translation->get_all_ProteinFeatures();
+while ( my $pfeature = shift @{$pfeatures} ) {
+    my $logic_name = $pfeature->analysis()->logic_name();
+
+    printf(
+        "%d-%d %s %s %s\n",
+        $pfeature->start(), $pfeature->end(), $logic_name,
+        $pfeature->interpro_ac(),
+        $pfeature->idesc()
+    );
+}
+
+If only the protein features created by a particular analysis are desired the name of the analysis can be provided as an argument. To obtain the subset of features which are considered to be 'domain' features the convenience method get_all_DomainFeatures() can be used:
+
+my $seg_features    = $translation->get_all_ProteinFeatures('Seg');
+my $domain_features = $translation->get_all_DomainFeatures();

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-bioc-ensembldb.git


