Add relevance metrics including pruned tokens to MS Marco ranking track #525

kderusso · 2023-11-28T13:36:04Z

Adds recall metrics for query pruning using the weighted tokens query and precomputed ELSER tokens.

…e the recall against the original weighted terms query
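For reviewers who want a concrete picture of "recall against the original weighted terms query", here is a minimal sketch. It assumes recall means the overlap between the top-k hits of the unpruned query and the top-k hits of the pruned query. The function name `recall_at_k` and the document IDs are hypothetical and are not taken from this PR's runner.

```python
def recall_at_k(baseline_ids, candidate_ids, k=10):
    """Fraction of the baseline top-k that also appears in the candidate top-k."""
    baseline_top = set(baseline_ids[:k])
    if not baseline_top:
        # No baseline hits: define recall as 0.0 rather than dividing by zero.
        return 0.0
    candidate_top = set(candidate_ids[:k])
    return len(baseline_top & candidate_top) / len(baseline_top)


# Hypothetical hit lists: the unpruned query is treated as ground truth.
unpruned_hits = ["d1", "d2", "d3", "d4"]
pruned_hits = ["d1", "d3", "d9", "d2"]
print(recall_at_k(unpruned_hits, pruned_hits, k=4))  # 0.75
```

Treating the unpruned query as the baseline measures how much pruning changes the result set. It says nothing about whether either result set is relevant, which is what the NDCG metric is for.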

…NDCG based on a smaller version of the queries used to test performance.
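As a rough illustration of the NDCG metric mentioned above, the sketch below computes NDCG@k in plain Python with linear gain and a log2 rank discount. The PR itself depends on `pytrec_eval` for evaluation, so this is only an approximation of the idea, not the code the track runs. The names `dcg` and `ndcg_at_k` and the sample judgments are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2(rank + 1) discount."""
    # enumerate() is 0-based, so the 1-based rank is index + 1 and the discount is log2(index + 2).
    return sum(rel / math.log2(index + 2) for index, rel in enumerate(relevances))


def ndcg_at_k(ranked_ids, qrels, k=10):
    """NDCG@k for one query; qrels maps doc ID to a graded relevance judgment."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical judgments: two relevant passages for one query.
qrels = {"d1": 1, "d2": 1}
print(ndcg_at_k(["d1", "d2", "d3"], qrels, k=10))  # 1.0 for a perfect ranking
print(ndcg_at_k(["d1", "d3", "d2"], qrels, k=10))  # below 1.0: d2 is ranked too low
```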

kderusso · 2024-01-04T16:10:52Z

msmarco-passage-ranking/operations/default.json

+   "track_total_hits": false
+},
+{
+   "name": "pruned-text-expansion-search-maxwand-disabled",


Added operations here

kderusso · 2024-01-04T16:11:03Z

msmarco-passage-ranking/operations/default.json

+   "track_total_hits": false
+},
+{
+   "name": "pruned-weighted-terms-recall-10-10",


Added operations here

kderusso · 2024-01-04T16:11:34Z

msmarco-passage-ranking/operations/default.json

This file was auto-formatted to correct the JSON indentation. I have left comments at the two places where I made changes.

demjened

LGTM with a few minor comments

msmarco-passage-ranking/track.py

Mikep86

Looking good! Left a few minor comments.

elastic/Makefile

msmarco-passage-ranking/track.json

msmarco-passage-ranking/track.py

pyproject.toml

msmarco-passage-ranking/README.md

msmarco-passage-ranking/challenges/default.json

msmarco-passage-ranking/track.py

gbanasiak · 2024-01-05T12:29:04Z

msmarco-passage-ranking/track.py

+def generate_pruned_query(field, query_expansion, boost=1.0):
+    return {
+        "query": {
+            "weighted_tokens": {


I tried running the modified track and noticed that the weighted_tokens query is not available in 8.11.x. Is it planned for 8.13.0? We need to apply the right versioning as per https://esrally.readthedocs.io/en/stable/track.html#custom-track-repositories. The most recent non-master branch in the track repo is 8.11, so we need to bump up to 8.13 before merging this change.

kderusso · 2024-01-05

Correct, this is for 8.13.0.
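For context on what a `weighted_tokens` request body looks like, here is a hedged sketch. The diff hunk above is truncated, so this is not the body of `generate_pruned_query`. It is an illustration based on my reading of the Elasticsearch 8.13 query DSL reference and should be verified against it. The helper name `sketch_pruned_query`, the field name, the token weights, and the default thresholds are all assumptions, and the `boost` parameter from the PR's signature is left out because its placement is not visible in the hunk.

```python
def sketch_pruned_query(field, tokens, freq_ratio_threshold=5, weight_threshold=0.4):
    """Build an illustrative weighted_tokens search body (hypothetical helper)."""
    return {
        "query": {
            "weighted_tokens": {
                field: {
                    # Precomputed token -> weight map, e.g. ELSER output for the query text.
                    "tokens": tokens,
                    # Pruning settings; names and defaults are assumed from the 8.13 docs.
                    "pruning_config": {
                        "tokens_freq_ratio_threshold": freq_ratio_threshold,
                        "tokens_weight_threshold": weight_threshold,
                        "only_score_pruned_tokens": False,
                    },
                }
            }
        }
    }


# Hypothetical field name and token weights.
body = sketch_pruned_query("text_expansion_field", {"token_a": 1.5, "token_b": 0.2})
print(body["query"]["weighted_tokens"]["text_expansion_field"]["pruning_config"])
```

Because the query type only exists from 8.13.0 onward, a body like this fails on 8.11.x clusters, which is why the track branch versioning matters here.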

gbanasiak

LGTM. Left a small remark regarding the _tools/requirements.txt file.

gbanasiak · 2024-01-08T09:53:58Z

msmarco-passage-ranking/_tools/requirements.txt

+pytrec_eval
+numpy


Should the presence of this file be mentioned in the README? Also, please align the version pinning with the dependencies section.

Mikep86

LGTM!

pmpailis closed this Nov 30, 2023

pmpailis mentioned this pull request Nov 30, 2023

Adding KnnVectorStore to dense_vector track for supporting knn-recall operations #518

Merged

gbanasiak reopened this Dec 1, 2023

kderusso force-pushed the kderusso/msmarco-passage-ranking-iter-from-jim branch from d33ff08 to 1574135 on January 3, 2024 13:17

jimczi and others added 2 commits January 4, 2024 11:08

Add test for the pruned weighted terms queries and a runner to compute the recall against the original weighted terms query

09a3031

Update tests to use the merged weighted_tokens query, and calculate NDCG based on a smaller version of the queries used to test performance.

ac79e54

kderusso force-pushed the kderusso/msmarco-passage-ranking-iter-from-jim branch from 105b19a to ac79e54 on January 4, 2024 16:10

kderusso commented Jan 4, 2024


kderusso changed the title ~~WIP Weighted term benchmarking~~ Add relevance metrics including pruned tokens to MS Marco ranking track Jan 4, 2024

kderusso marked this pull request as ready for review January 4, 2024 16:14

Fix error

c554c6b

kderusso requested review from Mikep86, afoucret, demjened, ioanatia, jimczi and saikatsarkar056 January 4, 2024 16:20

demjened approved these changes Jan 4, 2024


msmarco-passage-ranking/track.py (outdated, resolved)

msmarco-passage-ranking/track.py (outdated, resolved)

kderusso added 5 commits January 4, 2024 13:19

PR feedback

1ebea6c

Linting

d044ee2

Add dependencies

7e6f151

Update where we set dependencies

ef4d8ed

Add dependencies back to top level project, and to makefile

e3e819a

Mikep86 reviewed Jan 4, 2024


saikatsarkar056 reviewed Jan 5, 2024


msmarco-passage-ranking/challenges/default.json (resolved)

gbanasiak reviewed Jan 5, 2024


msmarco-passage-ranking/track.py (outdated, resolved)

gbanasiak reviewed Jan 5, 2024


Remove dependencies from places they are not needed

99b25c4

kderusso added 5 commits January 5, 2024 09:24

PR feedback - add names to challenges

3ffe4ef

PR feedback

961f79a

Updates so this works in 3.8

5e852b2

Update README

87a073a

Downgrade numpy

7e750de

gbanasiak approved these changes Jan 8, 2024


gbanasiak mentioned this pull request Jan 8, 2024

Update sync workflow to 8.13 #547

Merged

Mikep86 approved these changes Jan 8, 2024


kderusso merged commit 30a9b98 into elastic:master Jan 10, 2024
