Description: The current search implementation creates a discovery hurdle for users who search using natural language (spaces) for datasets that utilize underscores in their identifiers.
Steps to Reproduce:
1 - Search OpenML for house prices nominal.
2 - Notice that the top results are relevant but the exact match house_prices_nominal is missing or buried.
3 - Search for house_prices_nominal with underscores.
4 - Notice it appears immediately as the top result.
Impact: This is a high-priority UX issue because users naturally search with spaces. Having to know the exact underscore placement makes the search feel "broken" or "incomplete" for a large number of datasets.
Root Cause Analysis: The current Elasticsearch query builder appears to treat underscores as part of a single token rather than a word delimiter. When a user types a space, the analyzer doesn't correlate it with the underscore in the record's name.
Description: The current search implementation creates a discovery hurdle for users who search using natural language (spaces) for datasets that utilize underscores in their identifiers.
Steps to Reproduce:
1 - Search OpenML for house prices nominal.
2 - Notice that the top results are relevant but the exact match house_prices_nominal is missing or buried.
3 - Search for house_prices_nominal with underscores.
4 - Notice it appears immediately as the top result.
Impact: This is a high-priority UX issue because users naturally search with spaces. Having to know the exact underscore placement makes the search feel "broken" or "incomplete" for a large number of datasets.
Root Cause Analysis: The current Elasticsearch query builder appears to treat underscores as part of a single token rather than a word delimiter. When a user types a space, the analyzer doesn't correlate it with the underscore in the record's name.