This project is a simple demonstration of how to build a Nextflow pipeline, with MLOps integration, that reconstructs gene regulatory networks (GRNs) using graph neural networks (GNNs). In practice, GRN reconstruction is an unsupervised link prediction problem.
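To make the link prediction framing concrete, the sketch below scores candidate gene pairs from node embeddings with a sigmoid of their dot product, which is one common decoder for unsupervised link prediction. The gene names, embedding values, and the `score_edge` and `rank_candidate_edges` helpers are hypothetical and are not taken from this repository. The sketch also treats edges as undirected, which is a simplification.

```python
import math
from itertools import combinations


def score_edge(z_u, z_v):
    """Return sigmoid(z_u . z_v), an inner-product decoder score in (0, 1)."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_candidate_edges(embeddings):
    """Score every unordered gene pair and sort from most to least likely."""
    scored = [
        (u, v, score_edge(embeddings[u], embeddings[v]))
        for u, v in combinations(sorted(embeddings), 2)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical 2-dimensional embeddings, e.g. as produced by a trained GNN encoder.
embeddings = {
    "geneA": [1.0, 0.5],
    "geneB": [0.9, 0.6],
    "geneC": [-1.0, -0.4],
}

ranked = rank_candidate_edges(embeddings)
for u, v, score in ranked:
    print(f"{u} - {v}: {score:.3f}")
```

Pairs whose embeddings point in similar directions receive scores above 0.5 and appear first in the ranking. A threshold or a top-k cut-off on this ranking would then yield the predicted edges.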
[For developing GNNs, we use PyTorch Geometric.](https://pytorch-geometric.readthedocs.io/en/latest/)
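As a rough illustration of what the SAGE convolutional layers used later in the pipeline compute, the dependency-free sketch below applies one mean-aggregation update: each node's new representation is a linear map of its own features plus a linear map of the mean of its neighbors' features. This is not PyTorch Geometric code. The function names, the toy graph, and the identity weight matrices are assumptions chosen for readability, and a real layer would learn its weights and usually be followed by a nonlinearity.

```python
def matvec(weight, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """Apply one mean-aggregation SAGE-style update to every node.

    features:  dict mapping node -> feature vector
    neighbors: dict mapping node -> list of neighbor nodes
    """
    updated = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Element-wise mean of the neighbors' feature vectors.
            mean = [
                sum(features[n][i] for n in nbrs) / len(nbrs)
                for i in range(len(h))
            ]
        else:
            # Isolated node: there is no neighbor signal to aggregate.
            mean = [0.0] * len(h)
        self_part = matvec(w_self, h)
        neigh_part = matvec(w_neigh, mean)
        updated[node] = [s + n for s, n in zip(self_part, neigh_part)]
    return updated


# Hypothetical toy graph: g1 is connected to g2 and g3.
features = {"g1": [1.0, 0.0], "g2": [0.0, 2.0], "g3": [4.0, 2.0]}
neighbors = {"g1": ["g2", "g3"], "g2": ["g1"], "g3": ["g1"]}
identity = [[1.0, 0.0], [0.0, 1.0]]

out = sage_mean_layer(features, neighbors, identity, identity)
print(out["g1"])
```

With identity weights, the output for `g1` is simply its own features plus the mean of the features of `g2` and `g3`, which makes the aggregation step easy to check by hand.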
# The Nextflow pipeline
[Nextflow has been included to orchestrate the GRN reconstruction pipeline.](https://www.nextflow.io/)
The pipeline is composed of the following steps:
1. Exploratory data analysis: View the GRN and calculate some summary statistics.
2. Processing: Process the graph feature matrix and edge list. Remove the disconnected subgraph.
3. ArangoDB import: Import the graph into ArangoDB.
4. GNN training: Train a GNN using SAGE convolutional layers.
5. GNN training: Train a variational autoencoder GNN, and save the neural embeddings.
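Steps 1 and 2 above can be illustrated with a small, dependency-free sketch that computes a few summary statistics from an edge list and then keeps only the largest connected component. Treating the graph as undirected, reading "remove the disconnected subgraph" as "keep the largest component", and the names `summarize` and `largest_component` are all assumptions made for illustration. The pipeline's own scripts may do this differently.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def summarize(edges):
    """Return node count, edge count, mean degree, and density.

    Assumes the edge list contains no self-loops.
    """
    adjacency = build_adjacency(edges)
    n_nodes = len(adjacency)
    n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "mean_degree": 2 * n_edges / n_nodes if n_nodes else 0.0,
        "density": 2 * n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0,
    }


def largest_component(edges):
    """Return the edges whose endpoints lie in the largest connected component."""
    adjacency = build_adjacency(edges)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search to collect the component containing `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return [(u, v) for u, v in edges if u in best and v in best]


# Hypothetical edge list: a triangle plus one disconnected pair.
edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g4", "g5")]

print(summarize(edges))
kept = largest_component(edges)
print(summarize(kept))
```

Comparing the statistics before and after filtering gives a quick check that only the small disconnected part was dropped.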
# Python Environment
[Python dependencies are specified in this requirements.txt file.](services/python/requirements.txt)
These dependencies are installed during the build process for the following Docker image: ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0
Execute the following command to pull the image: *docker pull ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0*
## MLOps
* [A Docker Compose file has been provided to launch an MLOps stack.](docker-compose.yml)
* [See the .env file for Docker environment variables.](.env)
* [The docker_up.sh script can be executed to launch the Docker services.](scripts/docker_up.sh)
* [DVC is included for data version control.](https://dvc.org/)
* [MLflow is available for experiment tracking.](https://mlflow.org/)
* [MinIO is available for storing experiment artifacts.](https://min.io/)
# ArangoDB
[This pipeline provides a simple demonstration of saving graph data to ArangoDB and retrieving it, with NetworkX integration.](https://docs.arangodb.com/3.11/data-science/adapters/arangodb-networkx-adapter/)