4247 drug adjustments #55

Merged
DSuveges merged 4 commits into main from 4247_drug_adjustments on Feb 27, 2026

@d0choa (Contributor) commented Jan 20, 2026

Re-opened the PR after renaming the branch.
Related to @opentargets/issues#4247

@d0choa (Contributor, Author) commented Jan 20, 2026

================================================================================
COMPARISON REPORT: DRUG_MECHANISM_OF_ACTION
================================================================================

--- SCHEMA COMPARISON ---
Local columns: 7
Reference columns: 7
Common columns: 7

--- ROW COUNT COMPARISON ---
Local rows: 6,505
Reference rows: 6,505
Difference: +0 (+0.00%)

--- ID OVERLAP ---
Common IDs: 5,092
Local-only IDs: 0
Reference-only IDs: 0
Jaccard similarity: 1.0000

--- COLUMN STATISTICS (Common Columns) ---
Column                         Local Type           Ref Type             Local Nulls  Ref Nulls    Null Diff 
----------------------------------------------------------------------------------------------------------
actionType                     string               string               0            0            +0        
chemblIds                      array<string>        array<string>        0            0            +0        
mechanismOfAction              string               string               0            0            +0        
references                     array<struct<source:string,ids:array<string>,urls:array<string>>> array<struct<source:string,ids:array<string>,urls:array<string>>> 0            0            +0        
targetName                     string               string               0            0            +0        
targetType                     string               string               0            0            +0        
targets                        array<string>        array<string>        0            0            +0        

--- VALUE COMPARISON (Common Rows) ---
Comparing 5,092 common rows

Columns with differences:
  mechanismOfAction: 4,844 differences (4.9% match)
    Example: {'chemblIds': ['CHEMBL101253'], 'local': 'Vascular endothelial growth factor receptor inhibitor', 'reference': 'Stem cell growth factor receptor inhibitor'}
    Example: {'chemblIds': ['CHEMBL101253'], 'local': 'Vascular endothelial growth factor receptor inhibitor', 'reference': 'Platelet-derived growth factor receptor inhibitor'}
  targetName: 4,832 differences (5.1% match)
    Example: {'chemblIds': ['CHEMBL101253'], 'local': 'Vascular endothelial growth factor receptor', 'reference': 'Mast/stem cell growth factor receptor Kit'}
    Example: {'chemblIds': ['CHEMBL101253'], 'local': 'Vascular endothelial growth factor receptor', 'reference': 'Platelet-derived growth factor receptor'}
  targetType: 1,262 differences (75.2% match)
    Example: {'chemblIds': ['CHEMBL101253'], 'local': 'protein family', 'reference': 'single protein'}
    Example: {'chemblIds': ['CHEMBL101253'], 'local': 'protein family', 'reference': 'protein complex'}
  actionType: 738 differences (85.5% match)
    Example: {'chemblIds': ['CHEMBL10878'], 'local': 'ANTAGONIST', 'reference': 'AGONIST'}
    Example: {'chemblIds': ['CHEMBL10878'], 'local': 'AGONIST', 'reference': 'ANTAGONIST'}
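
Note on the ID OVERLAP figures: the Jaccard similarity is presumably the standard set ratio |A ∩ B| / |A ∪ B| computed over distinct ID values, which would explain why 6,505 rows yield only 5,092 common IDs and still score 1.0000. A minimal sketch under that assumption (the `jaccard` helper and the toy IDs are hypothetical, not taken from the comparison script):

```python
def jaccard(local_ids, reference_ids):
    """Jaccard similarity over distinct IDs; repeated IDs collapse into one."""
    local_set, ref_set = set(local_ids), set(reference_ids)
    union = local_set | ref_set
    if not union:
        return 1.0  # assumption: two empty ID sets are treated as identical
    return len(local_set & ref_set) / len(union)


# Hypothetical IDs: three local rows, but only two distinct ID values.
same = jaccard(["ID_A", "ID_A", "ID_B"], ["ID_B", "ID_A"])
partial = jaccard(["ID_A", "ID_B"], ["ID_B", "ID_C"])
print(same)     # 1.0
print(partial)  # one shared ID out of three distinct IDs
```

A score of 1.0 therefore only says that both sides contain the same set of IDs; it says nothing about how many rows each ID has.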

================================================================================
COMPARISON REPORT: DRUG_MOLECULE
================================================================================

--- SCHEMA COMPARISON ---
Local columns: 14
Reference columns: 18
Common columns: 14

Columns only in REFERENCE: ['isApproved', 'linkedDiseases', 'linkedTargets', 'yearOfFirstApproval']

--- ROW COUNT COMPARISON ---
Local rows: 18,475
Reference rows: 18,475
Difference: +0 (+0.00%)

--- ID OVERLAP ---
Common IDs: 18,475
Local-only IDs: 0
Reference-only IDs: 0
Jaccard similarity: 1.0000

--- COLUMN STATISTICS (Common Columns) ---
Column                         Local Type           Ref Type             Local Nulls  Ref Nulls    Null Diff 
----------------------------------------------------------------------------------------------------------
blackBoxWarning                boolean              boolean              0            0            +0        
canonicalSmiles                string               string               3335         3335         +0        
childChemblIds                 array<string>        array<string>        15356        15356        +0        
crossReferences                array<struct<source:string,ids:array<string>>> array<struct<source:string,ids:array<string>>> 8847         8847         +0        
description                    string               string               0            0            +0        
drugType                       string               string               0            0            +0        
hasBeenWithdrawn               boolean              boolean              0            0            +0        
id                             string               string               0            0            +0        
inchiKey                       string               string               3335         3335         +0        
maximumClinicalTrialPhase      double               double               8134         6725         +1,409    
name                           string               string               0            0            +0        
parentId                       string               string               16782        16782        +0        
synonyms                       array<string>        array<string>        0            0            +0        
tradeNames                     array<string>        array<string>        0            0            +0        

--- VALUE COMPARISON (Common Rows) ---
Comparing 18,475 common rows

Columns with differences:
  description: 5,622 differences (69.6% match)
    Example: {'id': 'CHEMBL1000', 'local': 'Small molecule drug with a maximum clinical trial phase of IV (across all indications) and has 9 approved and 18 investigational indications.', 'reference': 'Small molecule drug with a maximum clinical trial phase of IV (across all indications) that was first approved in 1995 and has 9 approved and 18 investigational indications.'}
    Example: {'id': 'CHEMBL100116', 'local': 'Small molecule drug with a maximum clinical trial phase of IV (across all indications) and is indicated for pain and has 1 investigational indication. This drug has a black box warning from the FDA.', 'reference': 'Small molecule drug with a maximum clinical trial phase of IV (across all indications) that was first approved in 1967 and is indicated for pain and has 1 investigational indication. This drug has a black box warning from the FDA.'}
  maximumClinicalTrialPhase: 1,530 differences (91.7% match)
    Example: {'id': 'CHEMBL1004', 'local': 3.0, 'reference': 4.0}
    Example: {'id': 'CHEMBL1005', 'local': 3.0, 'reference': 4.0}
  blackBoxWarning: 18 differences (99.9% match)
    Example: {'id': 'CHEMBL1068', 'local': False, 'reference': True}
    Example: {'id': 'CHEMBL1187417', 'local': True, 'reference': False}
  hasBeenWithdrawn: 2 differences (100.0% match)
    Example: {'id': 'CHEMBL1098319', 'local': True, 'reference': False}
    Example: {'id': 'CHEMBL2109065', 'local': False, 'reference': True}

================================================================================
COMPARISON REPORT: DRUG_WARNING
================================================================================

--- SCHEMA COMPARISON ---
Local columns: 11
Reference columns: 11
Common columns: 11

--- ROW COUNT COMPARISON ---
Local rows: 2,302
Reference rows: 2,302
Difference: +0 (+0.00%)

--- ID OVERLAP ---
Common IDs: 2,302
Local-only IDs: 0
Reference-only IDs: 0
Jaccard similarity: 1.0000

--- COLUMN STATISTICS (Common Columns) ---
Column                         Local Type           Ref Type             Local Nulls  Ref Nulls    Null Diff 
----------------------------------------------------------------------------------------------------------
chemblIds                      array<string>        array<string>        0            0            +0        
country                        string               string               0            0            +0        
description                    string               string               1290         1290         +0        
efo_id                         string               string               1330         1330         +0        
efo_id_for_warning_class       string               string               385          385          +0        
efo_term                       string               string               1330         1330         +0        
id                             bigint               bigint               0            0            +0        
references                     array<struct<ref_id:string,ref_type:string,ref_url:string>> array<struct<ref_id:string,ref_type:string,ref_url:string>> 0            0            +0        
toxicityClass                  string               string               385          385          +0        
warningType                    string               string               0            0            +0        
year                           bigint               bigint               1294         1294         +0        

--- VALUE COMPARISON (Common Rows) ---
Comparing 2,302 common rows

All common columns match perfectly!

================================================================================
COMPARISON REPORT: CHEMICAL_PROBES
================================================================================

--- SCHEMA COMPARISON ---
Local columns: 12
Reference columns: 12
Common columns: 12

--- ROW COUNT COMPARISON ---
Local rows: 5,287
Reference rows: 5,287
Difference: +0 (+0.00%)

--- ID OVERLAP ---
Common IDs: 4,645
Local-only IDs: 0
Reference-only IDs: 0
Jaccard similarity: 1.0000

--- COLUMN STATISTICS (Common Columns) ---
Column                         Local Type           Ref Type             Local Nulls  Ref Nulls    Null Diff 
----------------------------------------------------------------------------------------------------------
control                        string               string               3897         3897         +0        
drugId                         string               string               127          117          +10       
id                             string               string               0            0            +0        
isHighQuality                  boolean              boolean              0            0            +0        
mechanismOfAction              array<string>        array<string>        3823         3823         +0        
origin                         array<string>        array<string>        0            0            +0        
probeMinerScore                double               double               1112         1112         +0        
probesDrugsScore               double               double               0            0            +0        
scoreInCells                   double               double               4135         4135         +0        
scoreInOrganisms               double               double               4135         4135         +0        
targetFromSourceId             string               string               0            0            +0        
urls                           array<struct<niceName:string,url:string>> array<struct<niceName:string,url:string>> 0            0            +0        

--- VALUE COMPARISON (Common Rows) ---
Comparing 4,645 common rows

Columns with differences:
  targetFromSourceId: 2,098 differences (54.8% match)
    Example: {'id': '(+)-JQ1', 'local': 'Q58F21', 'reference': 'Q15059'}
    Example: {'id': '(+)-JQ1', 'local': 'Q58F21', 'reference': 'P25440'}
  probesDrugsScore: 1,416 differences (69.5% match)
    Example: {'id': 'A-1211212', 'local': 30.0, 'reference': 100.0}
    Example: {'id': 'A-1211212', 'local': 100.0, 'reference': 30.0}
  probeMinerScore: 532 differences (88.5% match)
    Example: {'id': '(+)-JQ1', 'local': 44.0, 'reference': 48.0}
    Example: {'id': '(+)-JQ1', 'local': 44.0, 'reference': 48.0}
  isHighQuality: 70 differences (98.5% match)
    Example: {'id': 'BIBP3226', 'local': False, 'reference': True}
    Example: {'id': 'BIBP3226', 'local': True, 'reference': False}
  drugId: 38 differences (99.2% match)
    Example: {'id': 'ABL127', 'local': 'CHEMBL1524542', 'reference': 'CHEMBL1475741'}
    Example: {'id': 'ABL127', 'local': 'CHEMBL1475741', 'reference': 'CHEMBL1524542'}

================================================================================
COMPARISON REPORT: PHARMACOGENOMICS
================================================================================

--- SCHEMA COMPARISON ---
Local columns: 22
Reference columns: 22
Common columns: 22

--- ROW COUNT COMPARISON ---
Local rows: 32,805
Reference rows: 32,838
Difference: -33 (-0.10%)

--- ID OVERLAP ---
Common IDs: 1,139
Local-only IDs: 0
Reference-only IDs: 0
Jaccard similarity: 1.0000

--- COLUMN STATISTICS (Common Columns) ---
Column                         Local Type           Ref Type             Local Nulls  Ref Nulls    Null Diff 
----------------------------------------------------------------------------------------------------------
datasourceId                   string               string               0            0            +0        
datasourceVersion              string               string               0            0            +0        
datatypeId                     string               string               0            0            +0        
directionality                 string               string               26061        26075        -14       
drugs                          array<struct<drugFromSource:string,drugId:string>> array<struct<drugFromSource:string,drugId:string>> 0            0            +0        
evidenceLevel                  string               string               0            0            +0        
genotype                       string               string               0            0            +0        
genotypeAnnotationText         string               string               0            0            +0        
genotypeId                     string               string               9384         9412         -28       
haplotypeFromSourceId          string               string               24787        24802        -15       
haplotypeId                    string               string               23515        23520        -5        
isDirectTarget                 boolean              boolean              0            0            +0        
literature                     array<string>        array<string>        0            0            +0        
pgxCategory                    string               string               0            0            +0        
phenotypeFromSourceId          string               string               32055        32088        -33       
phenotypeText                  string               string               1125         1109         +16       
studyId                        string               string               0            0            +0        
targetFromSourceId             string               string               1622         1622         +0        
variantAnnotation              array<struct<baseAlleleOrGenotype:string,comparisonAlleleOrGenotype:string,directionality:string,effect:string,effectDescription:string,effectType:string,entity:string,id:string,literature:string>> array<struct<baseAlleleOrGenotype:string,comparisonAlleleOrGenotype:string,directionality:string,effect:string,effectDescription:string,effectType:string,entity:string,id:string,literature:string>> 10686        10691        -5        
variantFunctionalConsequenceId string               string               10912        10940        -28       
variantId                      string               string               16914        16952        -38       
variantRsId                    string               string               9290         9318         -28       

--- VALUE COMPARISON (Common Rows) ---
Comparing 1,139 common rows

Columns with differences:
  genotypeAnnotationText: 12,188,930 differences (-1070043.1% match)
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': "Patients with the GC genotype may have a decreased sedative response to dexmedetomidine as compared to patients with the CC genotype. Other genetic and clinical factors may also influence a patient's response to dexmedetomidine.", 'reference': 'Patients (mainly pediatric patients) with the CC genotype and attention deficit hyperactivity disorder (ADHD) may have a poorer response to methylphenidate treatment as compared to patients with the CG or GG genotype. However, contradictory evidence exists for this association. Studies used different scales to analyze improvement, e.g. CGI-I, ARS-IV, and other. Other genetic and clinical factors may also influence response to methylphenidate.'}
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': "Patients with the GC genotype may have a decreased sedative response to dexmedetomidine as compared to patients with the CC genotype. Other genetic and clinical factors may also influence a patient's response to dexmedetomidine.", 'reference': "Patients with the CC genotype and depressive disorder may have an increased response to milnacipran as compared to patients with the GG genotype. Other genetic and clinical factors may also influence a patient's risk of side effects when treated with milnacipran"}
  studyId: 11,961,629 differences (-1050086.9% match)
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': '1183490962', 'reference': '1183703019'}
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': '1183490962', 'reference': '1447953199'}
  genotype: 11,094,243 differences (-973933.6% match)
    Example: {'targetFromSourceId': 'ENSG00000135114', 'local': 'TT', 'reference': 'CC'}
    Example: {'targetFromSourceId': 'ENSG00000135114', 'local': 'TT', 'reference': 'CC'}
  phenotypeText: 10,230,238 differences (-898077.2% match)
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': 'decreased sedative response', 'reference': 'poorer response to methylphenidate treatment'}
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': 'decreased sedative response', 'reference': 'increased response'}
  haplotypeId: 6,633,211 differences (-582271.5% match)
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': 'TPMT*3C', 'reference': 'TPMT*9'}
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': 'TPMT*3B', 'reference': 'TPMT*9'}
  pgxCategory: 6,316,890 differences (-554499.7% match)
    Example: {'targetFromSourceId': 'ENSG00000122643', 'local': 'efficacy', 'reference': 'metabolism/pk'}
    Example: {'targetFromSourceId': 'ENSG00000122643', 'local': 'efficacy', 'reference': 'metabolism/pk'}
  haplotypeFromSourceId: 5,458,586 differences (-479143.7% match)
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': 'PA165819272', 'reference': 'PA165819279'}
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': 'PA165819271', 'reference': 'PA165819279'}
  evidenceLevel: 5,346,423 differences (-469296.2% match)
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': '1A', 'reference': '3'}
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': '1A', 'reference': '3'}
  genotypeId: 4,491,747 differences (-394258.8% match)
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': '10_111076745_G_C,G', 'reference': '10_111076745_G_C,C'}
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': '10_111076745_G_C,G', 'reference': '10_111076745_G_C,C'}
  directionality: 4,045,420 differences (-355073.0% match)
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': 'No function', 'reference': 'Uncertain function'}
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': 'No function', 'reference': 'Uncertain function'}
  variantRsId: 3,896,708 differences (-342016.6% match)
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': 'rs12201199', 'reference': 'rs1142345'}
    Example: {'targetFromSourceId': 'ENSG00000137364', 'local': 'rs1800460', 'reference': 'rs1142345'}
  variantFunctionalConsequenceId: 3,028,230 differences (-265767.4% match)
    Example: {'targetFromSourceId': 'ENSG00000135114', 'local': 'SO_0001819', 'reference': 'SO_0002073'}
    Example: {'targetFromSourceId': 'ENSG00000135114', 'local': 'SO_0001819', 'reference': 'SO_0002073'}
  variantId: 1,877,752 differences (-164759.7% match)
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': '10_111076745_G_C', 'reference': '10_111078040_C_T'}
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': '10_111076745_G_C', 'reference': '10_111077780_G_A'}
  isDirectTarget: 194,352 differences (-16963.4% match)
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': True, 'reference': False}
    Example: {'targetFromSourceId': 'ENSG00000150594', 'local': True, 'reference': False}
  phenotypeFromSourceId: 32,690 differences (-2770.1% match)
    Example: {'targetFromSourceId': 'ENSG00000228080', 'local': 'EFO_0004228', 'reference': 'EFO_0001421'}
    Example: {'targetFromSourceId': 'ENSG00000228080', 'local': 'EFO_0001421', 'reference': 'EFO_0004228'}
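
Note on the negative match percentages: the figures in this section are consistent with the percentage being computed as (common rows - differences) / common rows x 100 while the differences are counted over a join on a non-unique key. With 1,139 common rows and 12,188,930 differences this gives -1070043.1%, as reported. The sketch below is an assumption about how a report could arrive at such numbers, not the comparison script used in this PR; the function names and the toy rows are hypothetical.

```python
from collections import defaultdict


def match_percentage(common_rows: int, differences: int) -> float:
    """Share of common rows without a difference, as a percentage."""
    return round((common_rows - differences) / common_rows * 100, 1)


def count_join_differences(local, reference, key, column):
    """Count mismatching pairs after an inner join on a possibly non-unique key."""
    ref_by_key = defaultdict(list)
    for row in reference:
        ref_by_key[row[key]].append(row)
    return sum(
        1
        for row in local
        for ref in ref_by_key.get(row[key], [])
        if row[column] != ref[column]
    )


# Hypothetical toy data: two rows share one key, and both sides are identical.
rows = [
    {"targetFromSourceId": "T1", "genotype": "TT"},
    {"targetFromSourceId": "T1", "genotype": "CC"},
]
differences = count_join_differences(rows, rows, "targetFromSourceId", "genotype")
common_ids = len({row["targetFromSourceId"] for row in rows})

print(differences)                                # 2 mismatching pairs out of 4 joined pairs
print(match_percentage(common_ids, differences))  # -100.0
print(match_percentage(1139, 12188930))           # -1070043.1, the reported figure
```

If this is what happens, the difference counts for tables with repeated keys overstate real changes, because every row is paired with every other row that shares its key. The same may apply to DRUG_MECHANISM_OF_ACTION and CHEMICAL_PROBES, where the examples repeat one ID with different reference values. Comparing on a key that is unique per row would avoid the fan-out.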

================================================================================
COMPARISON REPORT: TARGET_PRIORITISATION
================================================================================

--- SCHEMA COMPARISON ---
Local columns: 17
Reference columns: 17
Common columns: 17

--- ROW COUNT COMPARISON ---
Local rows: 78,725
Reference rows: 78,725
Difference: +0 (+0.00%)

--- ID OVERLAP ---
Common IDs: 78,725
Local-only IDs: 0
Reference-only IDs: 0
Jaccard similarity: 1.0000

--- COLUMN STATISTICS (Common Columns) ---
Column                         Local Type           Ref Type             Local Nulls  Ref Nulls    Null Diff 
----------------------------------------------------------------------------------------------------------
geneticConstraint              double               double               60863        60863        +0        
hasHighQualityChemicalProbes   int                  int                  77796        77796        +0        
hasLigand                      int                  int                  59845        59845        +0        
hasPocket                      int                  int                  59845        59845        +0        
hasSafetyEvent                 int                  int                  77780        77780        +0        
hasSmallMoleculeBinder         int                  int                  59845        59845        +0        
hasTEP                         int                  int                  78684        78684        +0        
isCancerDriverGene             int                  int                  78391        78391        +0        
isInMembrane                   int                  int                  59704        59704        +0        
isSecreted                     int                  int                  59704        59704        +0        
maxClinicalTrialPhase          double               double               77191        77161        +30       
mouseKOScore                   double               double               65945        65945        +0        
mouseOrthologMaxIdentityPercentage double               double               59472        59472        +0        
paralogMaxIdentityPercentage   double               double               57868        57868        +0        
targetId                       string               string               0            0            +0        
tissueDistribution             double               double               59619        59619        +0        
tissueSpecificity              double               double               59619        59619        +0        

--- VALUE COMPARISON (Common Rows) ---
Comparing 78,725 common rows

Columns with differences:
  maxClinicalTrialPhase: 97 differences (99.9% match)
    Example: {'targetId': 'ENSG00000006432', 'local': 0.5, 'reference': 0.75}
    Example: {'targetId': 'ENSG00000014257', 'local': 1.0, 'reference': 0.75}

================================================================================
SUMMARY
================================================================================
  drug_mechanism_of_action: Rows: 6,505 vs 6,505 | Schema: 0 local-only, 0 ref-only
  drug_molecule: Rows: 18,475 vs 18,475 | Schema: 0 local-only, 4 ref-only
  drug_warning: Rows: 2,302 vs 2,302 | Schema: 0 local-only, 0 ref-only
  chemical_probes: Rows: 5,287 vs 5,287 | Schema: 0 local-only, 0 ref-only
  pharmacogenomics: Rows: 32,805 vs 32,838 | Schema: 0 local-only, 0 ref-only
  target_prioritisation: Rows: 78,725 vs 78,725 | Schema: 0 local-only, 0 ref-only

@javfg force-pushed the 4247_drug_adjustments branch 4 times, most recently from 1e3a22c to 9c134f1 (February 26, 2026 16:03)
@DSuveges force-pushed the 4247_drug_adjustments branch from 5110139 to 3278175 (February 27, 2026 14:22)
javfg and others added 4 commits (February 27, 2026 14:24)
Co-authored-by: Irene Lopez <irene.lopezs@protonmail.com>
Co-authored-by: David Ochoa <ochoa@ebi.ac.uk>
@DSuveges force-pushed the 4247_drug_adjustments branch from 3278175 to 56009e8 (February 27, 2026 14:25)
@DSuveges marked this pull request as ready for review (February 27, 2026 14:26)
@DSuveges merged commit 158c115 into main (Feb 27, 2026)
5 of 6 checks passed
@DSuveges deleted the 4247_drug_adjustments branch (February 27, 2026 14:37)

4 participants