MicrobeDB is distributed using the CERN VM File System (CVMFS). Deployment recipes for Docker and CSI (Container Storage Interface) are available in `./destinations`. The recipes are executed by Terraform.
Docker may fail to unmount CVMFS during shutdown. If you encounter `transport endpoint is not connected` errors, run `sudo fusermount -u ./microbedb/mount`.
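A stale CVMFS mount typically surfaces as `ENOTCONN` ("Transport endpoint is not connected") when the mount point is accessed. The sketch below assumes Python 3 on a Linux host with `fusermount` installed. It checks for that error before running the unmount command above. `is_stale_fuse_mount` and `unmount_if_stale` are hypothetical helpers, not part of MicrobeDB.

```python
import errno
import os
import subprocess


def is_stale_fuse_mount(path: str) -> bool:
    """Return True if stat() on path fails with ENOTCONN.

    ENOTCONN ("Transport endpoint is not connected") is what a FUSE mount
    such as CVMFS reports after it was not cleanly unmounted.
    """
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False


def unmount_if_stale(path: str) -> bool:
    """Run `sudo fusermount -u PATH` only when PATH is a stale mount."""
    if not is_stale_fuse_mount(path):
        return False
    subprocess.run(["sudo", "fusermount", "-u", path], check=True)
    return True


if __name__ == "__main__":
    mount_point = "./microbedb/mount"  # path from the fusermount command above
    if unmount_if_stale(mount_point):
        print(f"Unmounted stale mount at {mount_point}")
    else:
        print(f"No stale mount at {mount_point}")
```

A missing path raises `ENOENT` rather than `ENOTCONN`, so it is reported as "not stale" and `sudo` is never invoked.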
OSX does not support Docker natively. Docker runs inside a Linux virtual machine instead, so support on OSX is limited to the most basic use case, and mounting MicrobeDB via CVMFS fails with an error.
To work around this, CVMFS must be installed and configured manually. First, ensure that FUSE is enabled by running `kextstat | grep -i fuse`. Download the CVMFS package, install the pkg, and reboot. Copy `../destinations/docker/cvmfs.config` to `/etc/cvmfs/default.local`. Copy `./microbedb.brinkmanlab.ca.pub` to `/etc/cvmfs/keys/microbedb.brinkmanlab.ca.pub`. Verify that everything is configured properly by running `sudo cvmfs_config chksetup`. You MUST mount the CVMFS repository under a folder that is shared with Docker (as configured in your Docker settings). Otherwise, Docker cannot access it. `/tmp` should be shared by default, so you can mount the repository at `/tmp/microbedb`. Ensure that `/tmp/microbedb` exists, then run `sudo mount -t cvmfs microbedb.brinkmanlab.ca /tmp/microbedb`.
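The shared-folder requirement is easy to get wrong, so a small pre-flight check can help. This is a minimal sketch. It assumes you pass in the folders shared in your own Docker settings; only `/tmp` comes from the text above. Paths are compared lexically, so symlinks such as `/tmp` pointing to `/private/tmp` on OSX are not resolved. The helper names are hypothetical.

```python
from pathlib import PurePosixPath


def is_under_shared_folder(mount_point: str, shared_folders: list[str]) -> bool:
    """Return True if mount_point equals or is nested inside a shared folder.

    Paths are compared lexically; symlinks are not resolved.
    """
    target = PurePosixPath(mount_point)
    for folder in shared_folders:
        base = PurePosixPath(folder)
        if target == base or base in target.parents:
            return True
    return False


def cvmfs_mount_command(repository: str, mount_point: str) -> list[str]:
    """Build the manual CVMFS mount command as an argument list."""
    return ["sudo", "mount", "-t", "cvmfs", repository, mount_point]


if __name__ == "__main__":
    shared = ["/tmp"]  # assumption: replace with the folders shared in your Docker settings
    mount_point = "/tmp/microbedb"
    if not is_under_shared_folder(mount_point, shared):
        raise SystemExit(f"{mount_point} is not under a Docker shared folder")
    # Print rather than run, so the command can be reviewed first.
    print(" ".join(cvmfs_mount_command("microbedb.brinkmanlab.ca", mount_point)))
```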
Run `sqlite3 microbedb.sqlite '.schema'` to view the documentation for the various tables and columns. The assembly table is largely undocumented because NCBI does not document its data schemas.
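The same table definitions can be read programmatically from SQLite's `sqlite_master` table, which stores the `CREATE` statement of every table. The minimal sketch below uses Python's standard `sqlite3` module. It assumes `microbedb.sqlite` is in the current directory and opens it read-only. It lists tables only, whereas `.schema` also prints indexes, views, and triggers.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def table_schemas(conn: sqlite3.Connection) -> dict[str, str]:
    """Map each table name to the CREATE TABLE statement stored in sqlite_master."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    )
    return {name: sql for name, sql in rows}


if __name__ == "__main__":
    db_path = Path("microbedb.sqlite")  # assumption: run from the repository root
    if db_path.exists():
        # mode=ro opens the file read-only, which suits a CVMFS mount.
        uri = f"file:{db_path}?mode=ro"
        with closing(sqlite3.connect(uri, uri=True)) as conn:
            for name, sql in table_schemas(conn).items():
                print(f"-- {name}\n{sql};\n")
```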
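Use a recursive SQLite query to determine whether a taxon is a subclass of (descends from) an ancestor taxon. The following query returns 1 if `query_tax_id` is a subclass of `ancestor_tax_id` (a taxon counts as a subclass of itself) and returns no rows otherwise: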
```sql
WITH RECURSIVE subClassOf(n) AS (
    VALUES (:query_tax_id)
    UNION
    SELECT parent_tax_id
    FROM taxonomy_nodes,
         subClassOf
    WHERE taxonomy_nodes.tax_id = subClassOf.n
      AND taxonomy_nodes.tax_id != :ancestor_tax_id
)
SELECT 1
FROM subClassOf
WHERE n = :ancestor_tax_id
LIMIT 1;
```
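The query can be tried without the full database. The sketch below runs it through Python's built-in `sqlite3` module against a tiny in-memory `taxonomy_nodes` table, binding the two placeholders as named parameters `:query_tax_id` and `:ancestor_tax_id`. The toy table is hypothetical. It has only the `tax_id` and `parent_tax_id` columns the query uses, with made-up IDs.

```python
import sqlite3

# The query above, with both placeholders bound as named parameters.
SUBCLASS_OF_SQL = """
WITH RECURSIVE subClassOf(n) AS (
    VALUES (:query_tax_id)
    UNION
    SELECT parent_tax_id
    FROM taxonomy_nodes, subClassOf
    WHERE taxonomy_nodes.tax_id = subClassOf.n
      AND taxonomy_nodes.tax_id != :ancestor_tax_id
)
SELECT 1 FROM subClassOf WHERE n = :ancestor_tax_id LIMIT 1
"""


def is_subclass(conn: sqlite3.Connection, query_tax_id: int, ancestor_tax_id: int) -> bool:
    """Return True if query_tax_id equals or descends from ancestor_tax_id."""
    params = {"query_tax_id": query_tax_id, "ancestor_tax_id": ancestor_tax_id}
    return conn.execute(SUBCLASS_OF_SQL, params).fetchone() is not None


def build_toy_taxonomy() -> sqlite3.Connection:
    """Hypothetical in-memory tree with only the columns the query uses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxonomy_nodes (tax_id INTEGER PRIMARY KEY, parent_tax_id INTEGER)")
    conn.executemany(
        "INSERT INTO taxonomy_nodes VALUES (?, ?)",
        [
            (1, 1),  # root lists itself as its parent
            (2, 1),  # child of root
            (3, 2),  # child of 2
            (4, 1),  # sibling of 2
            (5, 3),  # grandchild of 2
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_toy_taxonomy()
    print(is_subclass(conn, 5, 2))  # True: 5 -> 3 -> 2
    print(is_subclass(conn, 4, 2))  # False: 4 -> 1, never reaches 2
```

The `!=` condition stops the climb once the ancestor is reached. Because `UNION` discards rows it has already produced, the recursion also terminates at a root that lists itself as its parent, as in NCBI's taxonomy dump. With `UNION ALL`, an unsuccessful search would loop forever at that root.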
Running `update.sh` requires:
- bash with filefuncs extension
- yq (which also installs the xq executable)
- jq compiled with the Oniguruma regex library
- Entrez CLI
- SQLite3
- GNU awk
- parallel
- gzip
- biopython.convert
- rsync
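The dependencies above can be checked with a short script before running the update. This sketch assumes the executable names shown. In particular, it assumes the Entrez Direct tools `esearch`, `efetch`, and `xtract` for the Entrez CLI, and a `biopython.convert` console script. It does not verify the bash filefuncs extension. The jq check relies on `test()`, jq's regex builtin, which requires Oniguruma.

```python
import shutil
import subprocess

# Assumed executable names; adjust to match your installation.
REQUIRED_EXECUTABLES = [
    "bash", "yq", "xq", "jq", "sqlite3", "gawk", "parallel", "gzip", "rsync",
    "esearch", "efetch", "xtract",  # Entrez Direct tools (assumed)
    "biopython.convert",            # assumed console script name
]


def missing_executables(names: list[str]) -> list[str]:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def jq_has_regex_support(jq: str = "jq") -> bool:
    """jq's test() builtin needs Oniguruma; without it the call fails."""
    if shutil.which(jq) is None:
        return False
    result = subprocess.run(
        [jq, "-n", '"microbedb" | test("bed")'], capture_output=True, text=True
    )
    return result.returncode == 0 and result.stdout.strip() == "true"


if __name__ == "__main__":
    missing = missing_executables(REQUIRED_EXECUTABLES)
    print("Missing:", ", ".join(missing) if missing else "none")
    print("jq regex support:", jq_has_regex_support())
```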
Ensure that the `find` command supports `-empty` by running `find --help | grep -- '-empty'`. The most recent CVMFS commit of the repository must be mounted on all compute nodes, and `cvmfs_config` must be accessible on all compute nodes.
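Grepping `--help` only works with a `find` that implements `--help`. An alternative is to test the behavior directly: run `find DIR -empty` on a freshly created empty directory and confirm that `DIR` is printed. The sketch below is a hypothetical helper, not part of MicrobeDB. It also checks that `cvmfs_config` is on `PATH`. It would need to run on each compute node, for example as a short job through your scheduler. It does not check which CVMFS revision is mounted.

```python
import shutil
import subprocess
import tempfile


def find_supports_empty(find: str = "find") -> bool:
    """Check -empty by behavior: `find DIR -empty` on an empty DIR prints DIR."""
    if shutil.which(find) is None:
        return False
    with tempfile.TemporaryDirectory() as empty_dir:
        result = subprocess.run([find, empty_dir, "-empty"], capture_output=True, text=True)
    return result.returncode == 0 and result.stdout.strip() == empty_dir


def cvmfs_config_available() -> bool:
    """Return True if cvmfs_config can be found on PATH."""
    return shutil.which("cvmfs_config") is not None


if __name__ == "__main__":
    print("find supports -empty:", find_supports_empty())
    print("cvmfs_config available:", cvmfs_config_available())
```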
- `destinations/*` - Terraform modules to deploy a CVMFS client configured with MicrobeDB to various environments
- `update.sh` - Script to sync data with NCBI for a CVMFS server
- `init_env.sh` - Script to install dependencies for `update.sh`
- `fetch.sh` - Executed by `update.sh` per chunk of datasets returned by Entrez
- `finalize.sh` - Executed by `update.sh` once all invocations of `fetch.sh` have completed
- `resume.sh` - Script to allow resuming execution of `fetch.sh` invocations in the event that any fail. This script is copied to the job directory by `update.sh` and is intended to be executed from there.
- `schema.sql` - Database schema
- `temp_tables.sql` - Temporary table schema used by `fetch.sh`
- `subclassOf.sh` - Example utility to query database taxonomy data
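The relationship between `update.sh`, `fetch.sh`, `finalize.sh`, and `resume.sh` can be summarized as a control-flow sketch. This is a hypothetical Python illustration, not the scripts' actual logic. The chunk size, the `fetch` and `finalize` callables, the placeholder accessions, and the rule that finalization only happens when every chunk succeeds are all assumptions. For clarity, it runs chunks one after another.

```python
from typing import Callable, Sequence


def chunked(items: Sequence[str], size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def run_update(
    accessions: Sequence[str],
    chunk_size: int,
    fetch: Callable[[list[str]], None],
    finalize: Callable[[], None],
) -> list[list[str]]:
    """Hypothetical flow: fetch every chunk, then finalize if none failed.

    Failed chunks are returned so they can be retried later, the role that
    resume.sh plays for failed fetch.sh invocations.
    """
    failed = []
    for chunk in chunked(accessions, chunk_size):
        try:
            fetch(chunk)
        except Exception:  # a failed chunk should not stop the others
            failed.append(chunk)
    if not failed:
        finalize()
    return failed


if __name__ == "__main__":
    leftover = run_update(
        ["GCF_A", "GCF_B", "GCF_C"],  # placeholder accessions (hypothetical)
        chunk_size=2,
        fetch=lambda chunk: print("fetch", chunk),
        finalize=lambda: print("finalize"),
    )
    print("to resume:", leftover)
```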