New Workflow: LDA then XGBoost #155
base: master
Conversation
New workflow which first runs LDA and then runs XGBoost, using the LDA results as the main score. This helps prevent overfitting with XGBoost; the results are pretty comparable to XGBoost alone.
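For readers unfamiliar with the idea, here is a minimal, self-contained sketch of the two-stage scheme on synthetic data, using scikit-learn and xgboost directly; it illustrates the concept only and is not the code in this PR.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from xgboost import XGBClassifier

# Synthetic stand-in for a feature table of sub-scores with target/decoy labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Stage 1: LDA collapses the sub-scores into a single discriminant score.
lda = LinearDiscriminantAnalysis()
lda_score = lda.fit(X, y).decision_function(X).reshape(-1, 1)

# Stage 2: XGBoost is trained on the sub-scores plus the LDA score as an
# additional (main) feature; its probability becomes the final score.
X_aug = np.hstack([lda_score, X])
xgb = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
xgb.fit(X_aug, y)
final_score = xgb.predict_proba(X_aug)[:, 1]
```

The intended effect, per the description above, is that the linear discriminant score anchors the second stage, so the boosted model stays close to plain XGBoost while being less prone to overfitting.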
singjc
left a comment
Thanks for the addition! I added some comments/suggestions. I am wondering/concerned about how the multi learner does with overfitting, because it seems to basically perform the learning and scoring twice on the same data for ss num iters and xval num iters. Can you add an example output of running LDA and XGBoost vs. LDA_XGBoost, and show the score distributions and pp plots, if it's not too much work?
I was also thinking we should make PyProphetMultiLearner abstract, so that we can open it up to different kinds of combinations for multi-stage learners.
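As a rough illustration of the abstraction being discussed, a base class could look like the sketch below; the class names `PyProphetMultiLearner` and `LDA_XGBoostMultiLearner` appear in the PR's file summary, but the method names and signatures here are hypothetical.

```python
from abc import ABC, abstractmethod


class PyProphetMultiLearner(ABC):
    """Base class for workflows that chain several semi-supervised learners."""

    @abstractmethod
    def run_stages(self, table):
        """Run each learning stage in order and return the final result."""


class LDA_XGBoostMultiLearner(PyProphetMultiLearner):
    """LDA first, then XGBoost using the LDA score as the main score."""

    def run_stages(self, table):
        lda_result = self._run_lda(table)             # stage 1
        return self._run_xgboost(table, lda_result)   # stage 2

    def _run_lda(self, table):
        ...  # delegate to the existing LDA learner

    def _run_xgboost(self, table, lda_result):
        ...  # augment the table with the LDA score and run XGBoost
```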
pyprophet/scoring/runner.py
Outdated
```python
# remove columns that are not needed for LDA
table_lda = self.table.drop(columns=["var_precursor_charge", "var_product_charge", "var_transition_count"], errors='ignore')

(result_lda, scorer_lda, weights_lda) = PyProphet(config_lda).learn_and_apply(table_lda)
```
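For context, a second pass along the lines this snippet implies could look roughly like the continuation below; it reuses the names from the snippet, but `config_xgb`, the helper `lda_scores_from`, and the column name `main_var_lda_score` are purely illustrative assumptions, not the PR's actual code.

```python
# Hypothetical continuation: attach the LDA discriminant score to the table
# as the main score column and rerun learning with an XGBoost configuration.
table_xgb = self.table.copy()
table_xgb["main_var_lda_score"] = lda_scores_from(result_lda)  # assumed helper

(result_xgb, scorer_xgb, weights_xgb) = PyProphet(config_xgb).learn_and_apply(table_xgb)
```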
Will this run the full learning and scoring for ss num iters and xval num iters, and then do a second pass with XGBoost on the same data for another ss num iters and xval num iters? I am wondering if this results in any overfitting.
I don't think it overfits; however, it might be unnecessary to do that many iterations.
Doing eFDR/FDR identification curves with different workflows, we can see that LDA_XGBoost is quite similar to XGBoost in terms of overfitting, and actually overfits slightly less than XGBoost; all results look reasonable.
Here are PyProphet reports for different classifiers for a diaPASEF single injection with an experimental library.
Co-authored-by: Justin Sing <32938975+singjc@users.noreply.github.com>
Pull Request Overview
This PR introduces a new hybrid workflow that combines Linear Discriminant Analysis (LDA) and XGBoost classifiers to improve scoring performance and prevent pi0 errors commonly encountered with standalone XGBoost.
Key changes include:
- Addition of `LDA_XGBoost` as a new classifier option that runs LDA first, then uses LDA scores as input to XGBoost (see the sketch after this list)
- Implementation of a new multi-classifier learning framework with abstract base classes
- Test coverage for the new workflow including regression test outputs
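For illustration, exposing such a classifier choice in a click-based CLI might look like the sketch below; the option name, the exact set of choices, and the dispatch are assumptions for this example, not the actual contents of pyprophet/cli/score.py.

```python
import click


@click.command()
@click.option(
    "--classifier",
    type=click.Choice(["LDA", "XGBoost", "LDA_XGBoost"]),
    default="LDA",
    show_default=True,
    help="Semi-supervised learner; LDA_XGBoost runs LDA first and feeds "
         "its discriminant score into XGBoost.",
)
def score(classifier):
    # Illustrative dispatch only; the real wiring lives in the runner module.
    click.echo(f"Selected classifier: {classifier}")


if __name__ == "__main__":
    score()
```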
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pyprophet/cli/score.py | Adds LDA_XGBoost to classifier choices and implements execution logic |
| pyprophet/_config.py | Updates type annotations and documentation for new classifier option |
| pyprophet/scoring/runner.py | Implements PyProphetMultiLearner base class and LDA_XGBoostMultiLearner |
| pyprophet/io/_base.py | Extends XGBoost-specific feature handling to include LDA_XGBoost |
| tests/test_pyprophet_score.py | Adds test case for the new LDA_XGBoost workflow |
| tests/_regtest_outputs/test_pyprophet_score.test_osw_11.out | Expected test output for LDA_XGBoost test case |

New workflow which first runs LDA and then runs XGBoost, using the LDA results as the main score. This helps prevent the pi0 errors that we run into with XGBoost.
Overall, the results seem quite comparable to just running XGBoost on my dataset.