This repository contains scripts and tools for processing and analyzing sequencing data from Illumina and ONT (Oxford Nanopore Technologies) platforms.
1_data_processing: Sample .vcf output of the Illumina and ONT data processing pipelines for 10 sample patients.
 
2_data_analyzer:
- 2024-06-21_step_1_vcf_import_exons_Illumina
- 2024-06-21_step_1_vcf_import_exons_ONT
- 2024-06-21_step_2_Illumina_and_ONT_merged
- 2024-06-21_step_3_dataset_analyzed
- 2024-06-21_step_4_dataset_color_coded

reference_sequence:
- amplicon_reference_sequence.fa

- install_dependencies.sh
- sequ_data_framework_Illumina.sh
- sequ_data_framework_ONT.sh

- GenotypeAnalyzer.exe
- GenotypeAnalyzer.py
- Setting_Genotype-Analyzer.xlsx
- step_1_build_genotype.py
- step_2_merge_files.py
- step_3_analyse_file.py
- step_4_color_count.py
- OS required: Linux
 
- Clone the Repository:
git clone https://github.com/ChrAtt1/Sequencing-Data-Analysis-Framework.git
cd Sequencing-Data-Analysis-Framework
- Install Dependencies:
chmod +x ./install_dependencies.sh
./install_dependencies.sh
The install_dependencies.sh script will install the following:
 - Conda
 - SAMtools
 - BWA Aligner
 - WhatsHap
 - Nanofilt
 - minimap2
 - fastq-filter
 
(Links for installation guides and repositories were last accessed on 23 June 2024.)
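
After the installation finishes, it can help to confirm that the tools are reachable from the shell. The following is a minimal sketch and not part of the repository: the helper name `missing_tools` is hypothetical, and the executable names in `EXPECTED_TOOLS` are assumptions based on the usual command names of the tools listed above, so adjust them if your installation differs.

```python
import shutil

# Assumed executable names for the tools listed above; adjust as needed.
EXPECTED_TOOLS = [
    "conda", "samtools", "bwa", "whatshap",
    "NanoFilt", "minimap2", "fastq-filter",
]


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found.")
```

If a tool is reported as missing although it was installed, the Conda environment that contains it may not be activated in the current shell.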
 
Before the first start, modify the permissions of the scripts to make them executable:
chmod +x ./sequ_data_framework_Illumina.sh
chmod +x ./sequ_data_framework_ONT.sh
Set the following variables directly in the shell scripts (sequ_data_framework_Illumina.sh and sequ_data_framework_ONT.sh) by assigning your specific file paths to them. Example:
#!/bin/bash
# Specify the base path where the data is located
path_base_data="/path/to/your/base/data"
# Specify the path to the input data
input_data="/path/to/your/input/data"
# Provide the path to the reference sequence file (.fa-file)
path_reference_sequence="/path/to/your/reference/sequence.fa"
Replace the placeholder paths (/path/to/your/...) with the actual paths on your system.
To run the scripts, use the following commands in the terminal:
./sequ_data_framework_Illumina.sh
./sequ_data_framework_ONT.sh
Ensure you have the necessary permissions and that the paths specified in the scripts are correct before execution.
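
One easy mistake is to leave a placeholder path in one of the scripts. The sketch below is an illustration only, not part of the repository: it scans the text of a shell script for simple `name="value"` assignments and reports the variables whose value still contains `/path/to/your`. The function name `unresolved_variables` is hypothetical, and the regular expression assumes the assignment style shown in the example above (double-quoted values, one per line).

```python
import re

# Matches simple shell assignments at the start of a line: name="value"
ASSIGNMENT = re.compile(r'^[ \t]*(\w+)="([^"]*)"', re.MULTILINE)
PLACEHOLDER = "/path/to/your"


def unresolved_variables(script_text):
    """Return names of variables whose value still contains the placeholder."""
    return [
        name
        for name, value in ASSIGNMENT.findall(script_text)
        if PLACEHOLDER in value
    ]


# Example with one placeholder left and one path already filled in
# (the second path is made up for illustration).
example = '''#!/bin/bash
path_base_data="/path/to/your/base/data"
input_data="/data/run_01/fastq"
'''
print(unresolved_variables(example))  # ['path_base_data']
```

To check a real script, read its contents with `open(...).read()` and pass the text to the function.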
Genotype Analyzer is a Python-based application designed to analyze genotype data using various settings and methods. The application provides a graphical user interface (GUI) built with Tkinter, enabling users to input necessary files, configure settings, and perform genotype analysis.
- Load and parse settings from an Excel file
 - Browse and select files and folders through the GUI
 - Configure sequencing methods and amplicon settings
 - Validate input paths and settings
 - Perform genotype building, merging, and analysis
 - Display progress with a progress bar
 - Measure and print execution time for each step
 
- Python 3.8+
 - Pandas
 - Tkinter
 - Linux
 - PyCharm
 
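Use pip to install the required third-party libraries:
pip install pandas
pip install scipy
pip install numpy
pip install openpyxl
The modules math, collections, datetime, re, and shutil are part of the Python standard library and do not need to be installed with pip. Tkinter is also part of the standard library; on some Linux distributions it is provided as a separate system package (for example python3-tk on Debian/Ubuntu) rather than through pip.
To start the application from source:
python GenotypeAnalyzer.py
Alternatively, run the GenotypeAnalyzer.exe file.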
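
Before starting the application, a quick check that the required libraries can be imported may save a failed launch. This is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list reflects the third-party libraries and Tkinter named above.

```python
import importlib.util

# Third-party libraries and Tkinter as named in the requirements above.
REQUIRED_MODULES = ["pandas", "scipy", "numpy", "openpyxl", "tkinter"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` only looks the module up and does not import it, so the check stays fast even for large packages.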
- Load the Settings File:
 - The application expects an Excel file named Setting_Genotype-Analyzer.xlsx with two sheets:
   - Sequencing Method Setting
   - Amplicon Setting
- Configure Settings:
 - Use the GUI to browse and select the necessary files and folders for the datasets and the reference sequence, as well as the additional settings.
 - Configure the sequencing methods and amplicon settings as needed.
 - Choose between analyzing VCF files (Option A) or using previous Genotype Analyzer output Excel files (Option B).
 
- Run the Analysis:
 - Click the "Analyse Genotypes" button to start the analysis process.
 - The application will validate the input paths and settings before proceeding.
 - Progress is displayed with a progress bar, and execution times for each step are printed in the console.
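
The three steps above (load settings, validate inputs, run timed steps) can be sketched roughly as follows. This is not the application's actual code: all function names are hypothetical, and only the file name and the two sheet names are taken from this README. `load_settings` assumes pandas and openpyxl are installed; the other helpers use the standard library only.

```python
import time
from pathlib import Path

REQUIRED_SHEETS = ("Sequencing Method Setting", "Amplicon Setting")


def missing_sheets(sheets):
    """Return required sheet names absent from a mapping of sheet name -> table."""
    return [name for name in REQUIRED_SHEETS if name not in sheets]


def invalid_paths(paths):
    """Return the labels of entries in `paths` (label -> path) that do not exist."""
    return [label for label, path in paths.items() if not Path(path).exists()]


def run_timed(steps):
    """Run (name, callable) pairs in order; print and return each duration in seconds."""
    durations = {}
    for name, step in steps:
        start = time.perf_counter()
        step()
        durations[name] = time.perf_counter() - start
        print(f"{name}: {durations[name]:.2f} s")
    return durations


def load_settings(path="Setting_Genotype-Analyzer.xlsx"):
    """Read all sheets of the settings workbook (needs pandas and openpyxl)."""
    import pandas as pd  # imported here so the helpers above work without pandas

    sheets = pd.read_excel(path, sheet_name=None)  # dict: sheet name -> DataFrame
    missing = missing_sheets(sheets)
    if missing:
        raise ValueError(f"Settings file lacks sheet(s): {', '.join(missing)}")
    return sheets
```

A caller would first run `load_settings()`, then stop if `invalid_paths(...)` returns any labels, and only then pass the analysis steps to `run_timed`. Returning the problems as lists, rather than raising on the first one, lets a GUI report every invalid input at once.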