beacon2-cbi-tools

License: GPL v3

New documentation: https://cnag-biomedical-informatics.github.io/beacon2-cbi-tools

🐳 Docker Hub Image: https://hub.docker.com/r/manuelrueda/beacon2-cbi-tools/tags

🚫 Legacy B2RI Documentation: https://b2ri-documentation.readthedocs.io/

⚠️ Genetic Data Interpretation Disclaimer: https://cnag-biomedical-informatics.github.io/beacon2-cbi-tools/about/disclaimer


Actively maintained by CNAG Biomedical Informatics

Note: This repository was formerly known as beacon2-ri-tools (Beacon v2 Reference Implementation). It has been renamed to beacon2-cbi-tools (CNAG Biomedical Informatics) to better reflect its identity under CNAG.

DESCRIPTION

Overview

beacon2-cbi-tools is a suite of tools originally developed as part of the ELIXIR–Beacon v2 Reference Implementation, now continuing under CNAG Biomedical Informatics. It provides essential functionality around the Beacon Friendly Format (BFF), a data exchange format, including:

  • Validating XLSX/JSON files against Beacon v2 schemas
  • Converting VCF and microarray files into BFF (genomicVariations)
  • Loading BFF data (metadata and genomic variations) into MongoDB
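
To give a feel for the microarray conversion step, here is a minimal Python sketch that reads 23andMe-style raw data (tab-separated rsid, chromosome, position, genotype, with `#` comment lines) into simple records. The function name and the output field names are hypothetical and chosen for illustration only; they are not the genomicVariations schema and this is not how bff-tools itself is implemented.

```python
import csv
import io


def parse_microarray_tsv(text):
    """Yield one simplified record per genotyped SNP line (illustrative only)."""
    # Drop comment lines (including the '# rsid ...' header) before parsing.
    data_lines = (line for line in io.StringIO(text) if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        rsid, chromosome, position, genotype = row
        if "-" in genotype:
            continue  # skip no-calls such as '--'
        yield {
            "rsid": rsid,
            "chromosome": chromosome,
            "position": int(position),
            "genotype": genotype,
        }


SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\t--\n"
    "rs3131972\t1\t752721\tAG\n"
)

records = list(parse_microarray_tsv(SAMPLE))
for record in records:
    print(record)
```

The no-call line is dropped, so the sample yields two records. A real conversion also has to map each genotype onto reference and alternate alleles, which this sketch does not attempt.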

This toolkit streamlines data preparation, validation, and ingestion for federated genomic and phenotypic data sharing under Beacon v2. The resulting BFF-formatted data can be used with any implementation of the Beacon v2 API specification that operates on MongoDB.
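
As a rough illustration of what loading BFF documents into MongoDB involves, the following Python sketch inserts a list of JSON-like documents in batches. It is an assumption-laden example, not the code behind `bff-tools load`: the function name `load_documents`, the batch size, and the in-memory stand-in class are all hypothetical. The only external API it relies on is an `insert_many` method, which PyMongo collections provide.

```python
def load_documents(collection, documents, batch_size=1000):
    """Insert documents in batches and return how many were inserted."""
    inserted = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        # Batches are never empty here, which matters because PyMongo's
        # insert_many rejects an empty list.
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted


class InMemoryCollection:
    """Stand-in exposing insert_many so the sketch runs without a MongoDB server."""

    def __init__(self):
        self.documents = []

    def insert_many(self, docs):
        self.documents.extend(docs)


demo = InMemoryCollection()
count = load_documents(
    demo,
    [{"id": "HG00096"}, {"id": "HG00097"}, {"id": "HG00099"}],
    batch_size=2,
)
print(count, "documents inserted")
```

With PyMongo installed and a server running, the stand-in could be swapped for a real collection, for example `MongoClient("mongodb://localhost:27017")["beacon"]["individuals"]`, where the database and collection names are assumptions.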

Tools Included

BFF-Tools script (bin/bff-tools):

A command-line tool for converting VCF or SNP microarray (TSV) data into BFF format, validating metadata, and loading the resulting BFF data into a MongoDB instance.

The tool offers five modes:

  1. vcf: Convert a VCF.gz file into BFF format.

  2. 🆕 tsv: Convert a SNP microarray file (e.g., from 23andMe) into BFF format.

  3. load: Load BFF-formatted data into a MongoDB instance.

  4. full: Perform both TSV/VCF conversion and MongoDB loading.

  5. validate: Validate XLSX or JSON metadata against Beacon v2 schemas and serialize into BFF. An Excel template is provided to help structure your metadata.
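
To illustrate the idea behind the validate mode, here is a small Python sketch that checks a metadata record for required fields. It uses a deliberately reduced, hypothetical rule set (an `individuals` record needing a string `id` and an object `sex`); the real Beacon v2 schemas are much richer, and this is not the validator shipped with bff-tools.

```python
# Illustrative subset only, assumed for this example; not the full Beacon v2 schema.
REQUIRED_FIELDS = {"id": str, "sex": dict}


def validate_individual(record):
    """Return a list of human-readable errors; an empty list means the record passed."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors


good = {"id": "HG00096", "sex": {"id": "NCIT:C20197", "label": "male"}}
bad = {"id": 42}

print(validate_individual(good))  # []
print(validate_individual(bad))   # ['id: expected str', 'missing required field: sex']
```

Collecting every error rather than stopping at the first lets a user fix a whole spreadsheet or JSON file in one pass.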

A collection of support tools to aid in data ingestion. Key among them:

  • BFF-Browser:

    A web application for interactive visualization of BFF data, particularly genomicVariations and individuals.

  • BFF-Portal:

    A simple API and web application to query BFF data via MongoDB.

A synthetic dataset for testing and demonstration purposes.

System Diagram

            * Beacon v2 - CBI Tools *

                ___________
          XLSX  |          |
           or   | Metadata | (incl. Phenotypic data)
          JSON  |__________|
_________            |
|       |            |
|  TSV  |            | bff-tools validate
|______ |            |
    |                |                                    Beacon v2
    | bff-tools tsv  |
____v____        ____v____            __________          ______
|       |        |       |            |          |        |     | <---- Request
|  VCF  | -----> |  BFF  | ---------> | Database | <----> | API |
|_______|        |_ _____|            |__________|        |_____| ----> Response
                     |                  MongoDB
     bff-tools vcf   |   bff-tools load
                     |
                     |
                  Optional (utils)
                     |
                _____v_____
                |         |
                | utils/  |
                |  bff-   |
                | browser | Visualization
                | (beta)  |
                |_________|

-----------------------------------------------|||---------------------------
beacon2-cbi-tools                                     e.g. beacon2-ri-api
                                                           beacon2-pi-api
                                                           java-beacon-v2.api
                                                           ...

Roadmap

Latest Update: May-2025

This repository has been widely adopted in Beacon v2 implementations and is also used internally at CNAG. As a result, we plan to continue its development. Some of our upcoming plans include:

  • Implement Beacon 2.x specification changes

    • For VCF: Adopt VRS nomenclature and transition away from LegacyVariation. Support for structural variants may be added.
    • For other entities: Align with the latest schema used in the BFF Validator and the Excel metadata template.
    • Update the CINECA Synthetic Cohort dataset.

INSTALLATION

You can install beacon2-cbi-tools using one of two methods:

Containerized Installation (Recommended)

Follow the guide here to use Docker for a streamlined setup.

Non-Containerized Installation

See here for manual installation instructions.

CITATION

The author requests that any published work that uses these tools include a citation to the following reference:

Rueda M, Ariosa R. "Beacon v2 Reference Implementation: a toolkit to enable federated sharing of genomic and phenotypic data". Bioinformatics, btac568, https://doi.org/10.1093/bioinformatics/btac568

AUTHOR

Written by Manuel Rueda, PhD. Info about CNAG Biomedical Informatics can be found at https://www.cnag.eu

COPYRIGHT and LICENSE

The software in this repository is copyrighted. See the LICENSE file included in this distribution.
