
Understanding how methods work

Aurore-B edited this page Jul 30, 2025 · 2 revisions

Synteny Methods

Synteny methods are based on scripts developed by Fabien Degalez (Cross-species orthology detection of long non-coding RNAs (lncRNA) through 13 species using genomic and functional annotations. Degalez et al., 2024. bioRxiv).

The figure below is extracted from the paper and represents each synteny method:

  • method 1 uses two orthologous protein-coding genes (PCGs) surrounding the lncRNA as anchors
  • method 2 uses the closest orthologous PCG as a single anchor and requires the same lncRNA classification in both species, according to the FEELnc classification (FEELnc_classifier.pl)
[Figure: methods_fabien_agro, synteny methods 1 and 2 (from Degalez et al., 2024)]
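To illustrate the two-anchor idea behind method 1, here is a minimal Python sketch. It is not Degalez et al.'s script: it assumes each species is given as a hypothetical list of (gene_id, biotype) tuples for one chromosome, sorted by position, plus a hypothetical PCG orthology dictionary. A lncRNA pair is reported when the two PCGs flanking a lncRNA in species 1 have orthologs that flank a lncRNA in species 2, in either order.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: genes of one chromosome sorted by position, as (gene_id, biotype)
Gene = Tuple[str, str]


def flanking_pcgs(genes: List[Gene], idx: int) -> Tuple[Optional[str], Optional[str]]:
    """Nearest protein-coding gene on each side of genes[idx] (None if absent)."""
    left = next((g for g, b in reversed(genes[:idx]) if b == "protein_coding"), None)
    right = next((g for g, b in genes[idx + 1:] if b == "protein_coding"), None)
    return left, right


def method1_pairs(sp1: List[Gene], sp2: List[Gene],
                  orthologs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair lncRNAs whose flanking PCGs are orthologous (sketch of the anchor idea)."""
    # Index species-2 lncRNAs by the unordered pair of PCGs that flank them
    sp2_index: Dict[frozenset, List[str]] = {}
    for i, (gid, biotype) in enumerate(sp2):
        if biotype == "lncRNA":
            left, right = flanking_pcgs(sp2, i)
            if left and right:
                sp2_index.setdefault(frozenset((left, right)), []).append(gid)

    pairs = []
    for i, (gid, biotype) in enumerate(sp1):
        if biotype != "lncRNA":
            continue
        left, right = flanking_pcgs(sp1, i)
        # Both anchors must have an ortholog in species 2
        if left in orthologs and right in orthologs:
            key = frozenset((orthologs[left], orthologs[right]))
            for target in sp2_index.get(key, []):
                pairs.append((gid, target))
    return pairs
```

For example, with sp1 = [("A1", "protein_coding"), ("L1", "lncRNA"), ("B1", "protein_coding")], an inverted sp2 = [("B2", "protein_coding"), ("L2", "lncRNA"), ("A2", "protein_coding")] and orthologs = {"A1": "A2", "B1": "B2"}, the call returns [("L1", "L2")]. Using an unordered anchor pair tolerates local inversions; the actual scripts may treat orientation and chromosomes differently.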

The FEELnc classifier initially works at the isoform (transcript) level. The script FEELnc_tpLevel2gnLevelClassification.R, developed by Degalez et al., derives the lncRNA classification at the gene level, as explained in the figure below:

[Figure: SupFigure_2_FEELncClassificationAdapted, FEELnc classification adapted to the gene level]

Legend

This classification is reported in the detailed output file: scans_results/method2/syntenyByPairFeelnc/shortNameSP1-shortNameSP2_lncConfigurationHomologyAggregated.tsv

  • lncg = long genic non-coding gene: lnc/PCG distance below the intergenic/genic threshold
  • linc = long intergenic non-coding gene: lnc/PCG distance above the intergenic/genic threshold
  • SS.up/.dw = same strand, upstream/downstream
  • Conv = lnc and PCG are on opposite strands and convergent
  • Divg = lnc and PCG are on opposite strands and divergent
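The configuration labels listed above can be sketched in Python for a single lnc/PCG pair. This is a hypothetical illustration, not the code of FEELnc or FEELnc_tpLevel2gnLevelClassification.R. It assumes 1-based inclusive coordinates, a distance measured between the closest ends (0 when the features overlap), a threshold value chosen by the caller, and "upstream" defined relative to the PCG's direction of transcription.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Feature:
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def distance(a: Feature, b: Feature) -> int:
    """Gap between the closest ends of a and b; 0 when they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def classify(lnc: Feature, pcg: Feature, threshold: int) -> Tuple[str, str]:
    """Return (lncg|linc, SS.up|SS.dw|Conv|Divg) for one lnc/PCG pair."""
    kind = "lncg" if distance(lnc, pcg) < threshold else "linc"
    lnc_mid = (lnc.start + lnc.end) / 2
    pcg_mid = (pcg.start + pcg.end) / 2
    if lnc.strand == pcg.strand:
        # Assumption: "up" means 5' of the PCG along the PCG's own transcription direction
        upstream = lnc_mid < pcg_mid if pcg.strand == "+" else lnc_mid > pcg_mid
        orientation = "SS.up" if upstream else "SS.dw"
    else:
        # Opposite strands: if the leftmost feature is on "+", the 3' ends face each
        # other (tail to tail, convergent); otherwise the 5' ends do (head to head, divergent)
        left = lnc if lnc_mid <= pcg_mid else pcg
        orientation = "Conv" if left.strand == "+" else "Divg"
    return kind, orientation
```

For instance, with a PCG at 1000-2000 on "+" and a threshold of 1000, a "+" lncRNA at 100-500 is classified ("lncg", "SS.up"), while a "-" lncRNA at 5000-6000 is classified ("linc", "Conv").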

Sequence Alignment Method

Output files

Method 3, based on sequence alignment, produces several output files per analyzed species pair:

  • liftoff output files: liftoff_species1_to_species2_flankX.gtf, with and without filtering
  • alignment_analysis output files: mapped_knownGenes.txt, mapped_unknownGenes.txt and unmapped_genes.txt (see the figure below for an explanation of these files)
[Figure: figure_scans_github, explanation of the alignment_analysis output files]
  • liftoff output figure to visualize sequence alignments in terms of coverage versus sequence identity
[Figure: liftoff_plot_coverage_seqID, liftoff coverage versus sequence identity]
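Since the figure explaining the alignment_analysis files cannot be reproduced here, the Python sketch below shows one plausible reading of them, which is an assumption: a gene lifted to the target genome that intersects an annotated target gene goes to mapped_knownGenes.txt, a lifted gene with no such intersection goes to mapped_unknownGenes.txt, and a gene that liftoff could not place goes to unmapped_genes.txt. The function and its input sets are hypothetical.

```python
from typing import Dict, List, Set


def categorize_genes(
    source_genes: Set[str],
    lifted_genes: Set[str],
    lifted_overlapping_known: Set[str],
) -> Dict[str, List[str]]:
    """Assign each source gene to one of the three output files (assumed logic).

    source_genes: reference-species genes submitted to liftoff
    lifted_genes: genes liftoff placed on the target genome
    lifted_overlapping_known: lifted genes intersecting an annotated target gene
    """
    known = sorted(g for g in lifted_genes if g in lifted_overlapping_known)
    unknown = sorted(g for g in lifted_genes if g not in lifted_overlapping_known)
    unmapped = sorted(source_genes - lifted_genes)
    return {
        "mapped_knownGenes.txt": known,
        "mapped_unknownGenes.txt": unknown,
        "unmapped_genes.txt": unmapped,
    }
```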

Available options

Several options are available:

  • biotype: analyze lncRNAs only, mRNAs only, or both
  • flank: amount of flanking sequence to align, as a fraction of gene length. This can improve gene alignment where gene structure differs between target and reference (liftoff option)
  • coverage: cut-off on alignment coverage
  • identity: cut-off on sequence identity
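The flank option extends each gene by a fraction of its own length before alignment. The Python sketch below computes the region that would be aligned, assuming the extension is added on both sides and clipped to the chromosome bounds; check the Liftoff documentation for its exact behaviour. The function name is hypothetical.

```python
from typing import Tuple


def flanked_interval(start: int, end: int, flank: float, chrom_length: int) -> Tuple[int, int]:
    """Return the 1-based inclusive region aligned for a gene when flank is applied.

    Assumption: flank * gene length is added on each side, then clipped to the chromosome.
    """
    if not 0.0 <= flank <= 1.0:
        raise ValueError("flank must be a fraction between 0.0 and 1.0")
    extra = int(flank * (end - start + 1))
    return max(1, start - extra), min(chrom_length, end + extra)
```

For a 2,000 bp gene at 1001-3000, flank 0.1 gives the region 801-3200, while flank 0 (as in the flank0 file name) leaves the gene unchanged.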

Coverage and identity are liftoff options. The liftoff output file liftoff_species1_to_species2_flank0.gtf contains all data without any cut-off, while the filtered.gtf file contains only the results that pass the coverage and identity cut-offs, with extra copies removed.
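The difference between the unfiltered and filtered GTF files can be sketched in Python. The attribute names coverage, sequence_ID and extra_copy_number are modeled on the attributes Liftoff writes, but treat them as assumptions and check them against your own output. Records lacking a coverage or identity value are skipped, and the cut-offs are supplied by the caller.

```python
import re
from typing import Dict, Iterable, List

# GTF attribute pairs look like: key "value";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse column 9 of a GTF line into a dict."""
    return dict(ATTR_RE.findall(field))


def filter_gtf(lines: Iterable[str], min_coverage: float, min_identity: float) -> List[str]:
    """Keep records passing both cut-offs and drop extra copies (assumed attribute names)."""
    kept = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        try:
            coverage = float(attrs["coverage"])
            identity = float(attrs["sequence_ID"])
        except (KeyError, ValueError):
            continue  # record without usable coverage/identity values
        if attrs.get("extra_copy_number", "0") != "0":
            continue  # extra copy: removed in the filtered output
        if coverage >= min_coverage and identity >= min_identity:
            kept.append(line)
    return kept
```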

Bedtools intersect usage

Bedtools intersect is used with its default minimum overlap fraction, which amounts to ~1 bp. This value cannot be customized for now, but the overlap fraction is reported in the mapped_knownGenes.txt files (see section).
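For reference, bedtools intersect expresses its minimum overlap (-f) as a fraction of the A feature, and the default is small enough that a single shared base counts. The Python sketch below mimics that test on BED-style intervals (0-based start, half-open end) and computes an overlap fraction relative to A. Whether the fraction reported in mapped_knownGenes.txt uses the same denominator is an assumption; the function names are hypothetical.

```python
from typing import Tuple

# (chrom, start, end) in BED convention: 0-based start, half-open end
Interval = Tuple[str, int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by a and b (0 if disjoint or on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def overlap_fraction(a: Interval, b: Interval) -> float:
    """Shared bases as a fraction of a's length, the denominator used by bedtools -f."""
    length = a[2] - a[1]
    return overlap_bp(a, b) / length if length > 0 else 0.0


def intersects(a: Interval, b: Interval, min_fraction: float = 1e-9) -> bool:
    """True if a and b share at least one base and the overlap covers min_fraction of a."""
    shared = overlap_bp(a, b)
    return shared > 0 and shared >= min_fraction * (a[2] - a[1])
```

With the default, a single shared base is enough: ("chr1", 100, 200) intersects ("chr1", 199, 300), but not when min_fraction is raised to 0.5.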
