fix: resolving circular dependencies in MoA steps#69

Merged: DSuveges merged 2 commits into main from ds_moa_fix on Mar 2, 2026

@DSuveges DSuveges commented Mar 2, 2026

Context

While assembling the unified pipeline, it became apparent that the rewrite had introduced a circular dependency: mechanism of action requires target, target requires safety, safety requires pharmacogenomics, and pharmacogenomics requires mechanism of action.
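
For illustration only, here is a minimal sketch of how such a loop can be detected with the Python standard library. The step names and the `find_cycle` helper are hypothetical and are not taken from the pipeline code. The `fixed` graph assumes that the intermediate target/ensembl dataset has no upstream dependency on the other steps.

```python
from graphlib import CycleError, TopologicalSorter

# Step -> set of steps it depends on, as described in this PR.
# Step names are illustrative, not the pipeline's real identifiers.
dependencies = {
    "mechanism_of_action": {"target"},
    "target": {"safety"},
    "safety": {"pharmacogenomics"},
    "pharmacogenomics": {"mechanism_of_action"},
}


def find_cycle(graph):
    """Return the steps forming a cycle, or None if the graph is acyclic."""
    try:
        # static_order() raises CycleError when no valid ordering exists.
        list(TopologicalSorter(graph).static_order())
    except CycleError as error:
        # args[1] is the detected cycle; its first and last node are the same.
        return error.args[1]
    return None


# Assumption: after the fix, mechanism of action reads the intermediate
# target/ensembl dataset, which does not depend on the other steps.
fixed = {**dependencies, "mechanism_of_action": {"target_ensembl"}}

print(find_cycle(dependencies))  # a list of steps, starting and ending on the same one
print(find_cycle(fixed))         # None
```

A check like this could run when the pipeline is assembled, so that a circular dependency fails early instead of surfacing at execution time.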

We decided to use the intermediate target/ensembl dataset instead, as it already contains the Swiss-Prot and TrEMBL protein identifiers used to link ChEMBL targets to our target index.
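
The linking idea can be sketched in plain Python rather than Spark. Everything below is an assumption made for illustration: the field names (`id`, `proteinIds`, `accessions`), the placeholder identifiers, and both helper functions are hypothetical and do not reflect the actual schema of the target/ensembl dataset or the ChEMBL input.

```python
from collections import defaultdict

# Hypothetical rows from the intermediate target/ensembl dataset:
# one gene identifier plus its protein accessions (placeholders only).
ensembl_rows = [
    {"id": "GENE_A", "proteinIds": ["ACC_1", "ACC_2"]},
    {"id": "GENE_B", "proteinIds": ["ACC_3"]},
]

# Hypothetical ChEMBL targets, each listing its component accessions.
chembl_targets = [
    {"targetName": "target one", "accessions": ["ACC_1", "ACC_3"]},
    {"targetName": "target two", "accessions": ["ACC_9"]},
]


def build_accession_lookup(rows):
    """Map each protein accession to the gene identifiers that carry it."""
    lookup = defaultdict(set)
    for row in rows:
        for accession in row["proteinIds"]:
            lookup[accession].add(row["id"])
    return lookup


def link_targets(targets, lookup):
    """Attach a sorted list of gene identifiers to every ChEMBL target."""
    linked = []
    for target in targets:
        gene_ids = sorted(
            gene_id
            for accession in target["accessions"]
            for gene_id in lookup.get(accession, ())
        )
        linked.append({**target, "targets": gene_ids})
    return linked


linked = link_targets(chembl_targets, build_accession_lookup(ensembl_rows))
for row in linked:
    print(row["targetName"], row["targets"])
```

Because only the accession-to-gene mapping is needed, a lookup of this kind does not require the safety or pharmacogenomics annotations.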

!! This is a temporary solution. The real solution is to separate the target ETL step from unnecessary annotations.

DSuveges commented Mar 2, 2026

The resulting dataset looks good:

In [3]: spark.read.parquet('/Users/dsuveges/project_data/releases/26.03/output/drug_mechanism_of_action/').printSchema()
root
 |-- actionType: string (nullable = true)
 |-- mechanismOfAction: string (nullable = true)
 |-- chemblIds: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- targetName: string (nullable = true)
 |-- targetType: string (nullable = true)
 |-- targets: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- references: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- source: string (nullable = true)
 |    |    |-- ids: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- urls: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)


In [4]: spark.read.parquet('/Users/dsuveges/project_data/releases/26.03/output/drug_mechanism_of_action_old/').printSchema()
root
 |-- actionType: string (nullable = true)
 |-- mechanismOfAction: string (nullable = true)
 |-- chemblIds: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- targetName: string (nullable = true)
 |-- targetType: string (nullable = true)
 |-- targets: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- references: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- source: string (nullable = true)
 |    |    |-- ids: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- urls: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)


In [5]: spark.read.parquet('/Users/dsuveges/project_data/releases/26.03/output/drug_mechanism_of_action_old/').count()
Out[5]: 6505

In [6]: spark.read.parquet('/Users/dsuveges/project_data/releases/26.03/output/drug_mechanism_of_action/').count()
Out[6]: 6505

@DSuveges DSuveges requested a review from ireneisdoomed March 2, 2026 14:29

@ireneisdoomed ireneisdoomed left a comment

Nice one!

Making input definition more explicit

Co-authored-by: Irene López Santiago <45119610+ireneisdoomed@users.noreply.github.com>
@DSuveges DSuveges merged commit 539bbe8 into main Mar 2, 2026
3 checks passed
@DSuveges DSuveges deleted the ds_moa_fix branch March 2, 2026 14:51