
Commit 94fd098

saving VAE graph NN
1 parent: 901f5cd


8 files changed (+580 / -7 lines)


README.md

Lines changed: 33 additions & 4 deletions
```diff
@@ -11,17 +11,46 @@ Website: [Nextflow Graph Machine Learning](https://jbris.github.io/nextflow-grap
 - [Nextflow Graph Machine Learning](#nextflow-graph-machine-learning)
 - [Table of contents](#table-of-contents)
 - [Introduction](#introduction)
-- [The pipeline](#the-pipeline)
+- [The Nextflow pipeline](#the-nextflow-pipeline)
+- [Python Environment](#python-environment)
+- [MLOps](#mlops)
+- [ArangoDB](#arangodb)
 
 # Introduction
 
-The purpose of this project is to provide a simple demonstration of how to construct a Nextflow pipeline, with MLOps integration, for performing gene regulatory network (GRN) reconstruction using graph neural networks (GNNs).
+The purpose of this project is to provide a simple demonstration of how to construct a Nextflow pipeline, with MLOps integration, for performing gene regulatory network (GRN) reconstruction using graph neural networks (GNNs). In practice, GRN reconstruction is an unsupervised link prediction problem.
 
-# The pipeline
+[For developing GNNs, we use PyTorch Geometric.](https://pytorch-geometric.readthedocs.io/en/latest/)
+
```
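The updated introduction frames GRN reconstruction as unsupervised link prediction: observed regulatory edges act as positive examples, and node pairs with no recorded edge are sampled as negatives. The framework-free sketch below illustrates that setup under stated assumptions. It is not the project's PyTorch Geometric code; the helpers `negative_sampling`, `dot_score` and `auc` are hypothetical, and the embeddings are toy values standing in for GNN output. (PyG ships its own `torch_geometric.utils.negative_sampling` for the sampling step.)

```python
import random
from itertools import product


def negative_sampling(edges, num_nodes, num_samples, seed=0):
    """Sample node pairs that are not observed edges (edges treated as undirected)."""
    # Ignore self-loops so the count of available non-edges below stays correct.
    existing = {frozenset(e) for e in edges if e[0] != e[1]}
    available = num_nodes * (num_nodes - 1) // 2 - len(existing)
    if num_samples > available:
        raise ValueError("not enough non-edges to sample from")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < num_samples:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = frozenset((u, v))
        if u != v and pair not in existing:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


def dot_score(z, u, v):
    """Dot-product decoder: a higher score means a more likely link."""
    return sum(a * b for a, b in zip(z[u], z[v]))


def auc(pos_scores, neg_scores):
    """Probability that a random positive pair outscores a random negative pair (ties count 0.5)."""
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Toy example: nodes 0-1 and 2-3 are linked, and their embeddings point in similar directions.
edges = [(0, 1), (2, 3)]
z = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.1, 0.9]}
negatives = negative_sampling(edges, num_nodes=4, num_samples=4)
pos_scores = [dot_score(z, u, v) for u, v in edges]
neg_scores = [dot_score(z, u, v) for u, v in negatives]
print(negatives)                    # [(0, 2), (0, 3), (1, 2), (1, 3)]
print(auc(pos_scores, neg_scores))  # 1.0
```

A dot-product decoder is the simplest way to turn node embeddings into link scores, and AUC is a natural metric here because it depends only on how positive pairs rank against sampled negatives.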
```diff
+# The Nextflow pipeline
+
+[Nextflow has been included to orchestrate the GRN reconstruction pipeline.](https://www.nextflow.io/)
 
 The pipeline is composed of the following steps:
 
 1. Exploratory data analysis: View the GRN and calculate some summary statistics.
 2. Processing: Process the graph feature matrix and edge list. Remove the disconnected subgraph.
 3. ArangoDB Importing: Import the graph into ArangoDB.
-4. Train a graph neural network using SAGE convolutional layers.
+4. GNN training: Train a GNN using SAGE convolutional layers.
+5. GNN training: Train a variational autoencoder GNN, and save the neural embeddings.
```
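Steps 1 and 2 of the pipeline can be illustrated with a small plain-Python sketch. It assumes that "remove the disconnected subgraph" means keeping only the largest connected component of the undirected edge list; the README does not say how the pipeline actually does this (NetworkX, for example, provides `connected_components`), and the gene identifiers are made up.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets built from an edge list (isolated nodes are not represented)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_components(adj):
    """Yield the node set of each connected component using breadth-first search."""
    seen = set()
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            for nbr in adj[queue.popleft()]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        yield component


def summarize(edges):
    """Step 1 style summary statistics for an undirected graph given as an edge list."""
    adj = build_adjacency(edges)
    degrees = [len(nbrs) for nbrs in adj.values()]
    return {
        "nodes": len(adj),
        "edges": len({frozenset(e) for e in edges}),
        "mean_degree": sum(degrees) / len(degrees) if degrees else 0.0,
        "components": sum(1 for _ in connected_components(adj)),
    }


def largest_component_edges(edges):
    """Step 2 style filtering: keep only edges inside the largest connected component."""
    keep = max(connected_components(build_adjacency(edges)), key=len)
    # Both endpoints of an edge share a component, so checking one endpoint is enough.
    return [(u, v) for u, v in edges if u in keep]


# Hypothetical gene identifiers: a three-gene cycle plus a separate two-gene pair.
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g1"), ("g4", "g5")]
print(summarize(edges))  # {'nodes': 5, 'edges': 4, 'mean_degree': 1.6, 'components': 2}
print(largest_component_edges(edges))
```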
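Step 4 trains a GNN with SAGE convolutional layers. With mean aggregation (the default in PyG's `SAGEConv`), one layer computes h_i = W_self · x_i + W_neigh · mean(x_j for j in N(i)). The plain-Python sketch below runs a single forward pass with toy weights; the layer count, dimensions, activation and training loop the project uses are not stated in the README, bias terms are omitted, and the helper names are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_layer(features, neighbours, W_self, W_neigh):
    """One GraphSAGE-style layer with mean aggregation (bias terms omitted):
    h_i = W_self @ x_i + W_neigh @ mean(x_j for j in N(i)).
    Nodes without neighbours aggregate a zero vector.
    """
    dim = len(next(iter(features.values())))
    out = {}
    for node, x in features.items():
        nbrs = neighbours.get(node, [])
        if nbrs:
            mean = [sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        else:
            mean = [0.0] * dim
        out[node] = [a + b for a, b in zip(matvec(W_self, x), matvec(W_neigh, mean))]
    return out


def relu(h):
    """Element-wise ReLU, assumed here as the activation between layers."""
    return {node: [max(0.0, v) for v in vec] for node, vec in h.items()}


# Toy 3-node graph; identity weight matrices keep the arithmetic easy to follow.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbours = {0: [1, 2], 1: [0], 2: [0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h = relu(sage_layer(features, neighbours, identity, identity))
print(h)  # {0: [1.5, 1.0], 1: [1.0, 1.0], 2: [2.0, 1.0]}
```

Keeping separate weights for the node itself and for its aggregated neighbourhood is what distinguishes this layer from a plain neighbourhood average: a gene's own features are not diluted by those of its regulators.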
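Step 5, the addition in this commit, trains a variational autoencoder GNN and saves the embeddings. Assuming it follows the standard variational graph autoencoder formulation (which PyG provides as `torch_geometric.nn.VGAE`), the encoder outputs a mean `mu` and a log standard deviation `logstd` per node. The sketch below shows the reparameterization, the KL regularizer (averaged over nodes), the inner-product decoder, and writing embeddings to CSV. The encoder outputs are toy values, and saving the mean vectors (rather than sampled `z`) as CSV is an assumption, not a detail from the README.

```python
import csv
import io
import math
import random


def reparameterize(mu, logstd, rng):
    """Sample z = mu + exp(logstd) * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, logstd)]


def kl_divergence(mu, logstd):
    """KL(N(mu, sigma^2) || N(0, I)) for one node, summed over latent dimensions."""
    return -0.5 * sum(1 + 2 * s - m * m - math.exp(s) ** 2 for m, s in zip(mu, logstd))


def edge_probability(z_u, z_v):
    """Inner-product decoder: sigmoid of the dot product of two latent vectors."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(z_u, z_v))))


def save_embeddings(embeddings, handle):
    """Write one CSV row per node: the node id followed by its embedding values."""
    writer = csv.writer(handle)
    for node, vec in sorted(embeddings.items()):
        writer.writerow([node, *vec])


# Hypothetical encoder output per gene: (mu, logstd).
encoder_out = {"g1": ([0.2, -0.1], [-1.0, -1.0]), "g2": ([0.3, 0.0], [-1.0, -0.5])}
rng = random.Random(0)
z = {node: reparameterize(mu, ls, rng) for node, (mu, ls) in encoder_out.items()}
kl = sum(kl_divergence(mu, ls) for mu, ls in encoder_out.values()) / len(encoder_out)
print(f"mean KL per node: {kl:.4f}, p(g1-g2 link): {edge_probability(z['g1'], z['g2']):.3f}")

buffer = io.StringIO()
save_embeddings({node: mu for node, (mu, _) in encoder_out.items()}, buffer)
print(buffer.getvalue())
```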
```diff
+# Python Environment
+
+[Python dependencies are specified in this requirements.txt file.](services/python/requirements.txt).
+
+These dependencies are installed during the build process for the following Docker image: ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0
+
+Execute the following command to pull the image: *docker pull ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0*
+
+## MLOps
+
+* [A Docker compose file has been provided to launch an MLOps stack.](docker-compose.yml)
+* [See the .env file for Docker environment variables.](.env)
+* [The docker_up.sh script can be executed to launch the Docker services.](scripts/docker_up.sh)
+* [DVC is included for data version control.](https://dvc.org/)
+* [MLFlow is available for experiment tracking.](https://mlflow.org/)
+* [MinIO is available for storing experiment artifacts.](https://min.io/)
+
+# ArangoDB
+
+[This pipeline provides a simple demonstration for saving and retrieving graph data to ArangoDB, combined with NetworkX usage and integration.](https://docs.arangodb.com/3.11/data-science/adapters/arangodb-networkx-adapter/)
```
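Step 3 and the ArangoDB section rely on the linked ArangoDB NetworkX adapter, which performs the conversion itself. To show roughly what the stored data looks like, the sketch below builds ArangoDB-style documents by hand: vertex documents carry a `_key`, and edge documents reference vertices through `_from` / `_to` handles of the form `<collection>/<key>`. The collection name `genes`, the `features` attribute and the helper names are hypothetical, not taken from the project.

```python
import json


def to_arango_documents(edges, features, vertex_collection="genes"):
    """Build ArangoDB-style vertex and edge documents from an edge list and node features.

    Vertex documents are identified by `_key`; edge documents point at vertices
    through `_from` / `_to` handles of the form "<collection>/<key>".
    """
    vertices = [
        {"_key": str(node), "features": list(vec)}
        for node, vec in sorted(features.items())
    ]
    edge_docs = [
        {"_from": f"{vertex_collection}/{u}", "_to": f"{vertex_collection}/{v}"}
        for u, v in edges
    ]
    return vertices, edge_docs


def to_json_lines(documents):
    """Serialize documents as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(doc) for doc in documents)


# Hypothetical gene features and regulatory edges.
features = {"g1": [0.1, 0.4], "g2": [0.3, 0.2]}
edges = [("g1", "g2")]
vertices, edge_docs = to_arango_documents(edges, features)
print(to_json_lines(vertices))
print(to_json_lines(edge_docs))
```

The vertices would go into a document collection and the edges into an edge collection; bulk loaders such as python-arango's `import_bulk` or the `arangoimport` CLI are possible ways to load them, though the pipeline itself goes through the adapter.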