Skip to content

Conversation

@kderusso
Copy link
Member

@kderusso kderusso commented Nov 28, 2023

Adds recall metrics for query pruning using the weighted tokens query and precomputed ELSER tokens.

jimczi and others added 2 commits January 4, 2024 11:08
…e the recall against the original weighted terms query
…NDCG based on a smaller version of the queries used to test performance.
@kderusso kderusso force-pushed the kderusso/msmarco-passage-ranking-iter-from-jim branch from 105b19a to ac79e54 Compare January 4, 2024 16:10
"track_total_hits": false
},
{
"name": "pruned-text-expansion-search-maxwand-disabled",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added operations here

"track_total_hits": false
},
{
"name": "pruned-weighted-terms-recall-10-10",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added operations here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was auto-formatted for correct json indentation. I have left comments starting at the two places where I made changes.

@kderusso kderusso changed the title WIP Weighted term benchmarking Add relevance metrics including pruned tokens to MS Marco ranking track Jan 4, 2024
@kderusso kderusso marked this pull request as ready for review January 4, 2024 16:14
Copy link
Contributor

@demjened demjened left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few minor comments

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Left a few minor comments.

def generate_pruned_query(field, query_expansion, boost=1.0):
return {
"query": {
"weighted_tokens": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried running the modified track and noticed weighted_tokens query is not available in 8.11.x. Is this planned for 8.13.0? We need to apply the right versioning as per https://esrally.readthedocs.io/en/stable/track.html#custom-track-repositories. The most recent non-master branch in the track repo is 8.11. We need to bump up to 8.13 before merging this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this is for 8.13.0.

Copy link
Contributor

@gbanasiak gbanasiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left small remark regarding _tools/requirements.txt file.

Comment on lines +1 to +2
pytrec_eval
numpy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the presence of this file be mentioned in README? Also, please align version pinning with dependencies section.

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kderusso kderusso merged commit 30a9b98 into elastic:master Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants