Closed as not planned
What is the bug?
A wildcard query with case_insensitive: true returns no hits for some languages (for example Turkish and Ukrainian) but works correctly for others (for example English and German). This happens even when the query value matches the stored data exactly, including its casing.
How can one reproduce the bug?
- Create the index:
curl -X PUT "http://localhost:9200/test_idx" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "name": {
        "type": "wildcard",
        "doc_values": false
      }
    }
  }
}
'
- Insert the test documents (English, German, Turkish, Ukrainian):
# English (Control)
curl -X POST "http://localhost:9200/test_idx/_doc/" -H 'Content-Type: application/json' -d'{"name": "Alice Wonderland"}'
# German
curl -X POST "http://localhost:9200/test_idx/_doc/" -H 'Content-Type: application/json' -d'{"name": "Heinz Meißner"}'
# Turkish
curl -X POST "http://localhost:9200/test_idx/_doc/" -H 'Content-Type: application/json' -d'{"name": "Gökçe İrmak"}'
# Ukrainian
curl -X POST "http://localhost:9200/test_idx/_doc/" -H 'Content-Type: application/json' -d'{"name": "Олександр Зінченко"}'
- Run a wildcard search with case_insensitive: true for each name, using this query template (replace <NameString> with the name being tested):
curl -X GET "http://localhost:9200/test_idx/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*<NameString>*",
        "case_insensitive": true
      }
    }
  }
}
'
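To check all four names in one pass, the template can be run in a loop. This is a sketch and not part of the original reproduction: it assumes OpenSearch at localhost:9200 with test_idx created and populated as above, calls the _refresh API so the newly inserted documents are searchable, and uses the filter_path parameter to trim each response to hits.total.value. build_query is a hypothetical helper, and the names are spliced into the JSON unescaped, so they must not contain double quotes or backslashes.

```shell
#!/bin/sh
# Sketch: run the wildcard template for every test name and print each hit count.
# Assumptions: OpenSearch listens on localhost:9200 and test_idx was created and
# populated as in the steps above. Names are inserted into the JSON unescaped,
# so they must not contain double quotes or backslashes.

BASE_URL="http://localhost:9200/test_idx"

# Hypothetical helper: print the template body for name $1 with case_insensitive=$2
build_query() {
  printf '{"query":{"wildcard":{"name":{"value":"*%s*","case_insensitive":%s}}}}' "$1" "$2"
}

# Make the just-indexed documents visible to search
curl -s -X POST "$BASE_URL/_refresh" > /dev/null

for name in "Alice Wonderland" "Heinz Meißner" "Gökçe İrmak" "Олександр Зінченко"; do
  # filter_path keeps only hits.total.value from the search response
  hits=$(curl -s -X GET "$BASE_URL/_search?filter_path=hits.total.value" \
    -H 'Content-Type: application/json' \
    -d "$(build_query "$name" true)")
  printf '%s -> %s\n' "$name" "$hits"
done
```

If the behavior described in this report reproduces, the Turkish and Ukrainian lines should show a value of 0 while the English and German lines should not.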
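Test results: the English and German queries return their documents, while the Turkish and Ukrainian queries return no hits. For example, the failing Turkish query and its response: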
curl -X GET "http://localhost:9200/test_idx/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*Gökçe İrmak*",
        "case_insensitive": true
      }
    }
  }
}
'
{
  "took" : 44,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
What is the expected behavior?
Every document should be returned when the wildcard value matches its stored string. The character set should not affect whether a wildcard field can be searched, especially when the query string already uses the exact stored casing.
What is your host/environment?
- OpenSearch version: 3.5.0 (latest)
- Deployment method: Docker Compose
Do you have any additional context?
- Removing case_insensitive (or setting it to false) returns the documents for these languages as expected:
curl -X GET "http://localhost:9200/test_idx/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*Gökçe İrmak*",
        "case_insensitive": false
      }
    }
  }
}
'
{
  "took" : 24,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_idx",
        "_id" : "_I5m3ZwBBv_NcRY78s9B",
        "_score" : 1.0,
        "_source" : {
          "name" : "Gökçe İrmak"
        }
      }
    ]
  }
}
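A possible next diagnostic step, which this report did not run: the validate query API with explain=true and rewrite=true returns the Lucene query that each variant is rewritten into, which may help show where the case-insensitive and case-sensitive forms diverge. This sketch assumes the same localhost:9200 endpoint and test_idx index as above; turkish_query is a hypothetical helper, and the explanation text depends on the OpenSearch version, so no output is shown here.

```shell
#!/bin/sh
# Sketch: compare how the Turkish query is rewritten with and without
# case_insensitive, using the validate query API.
# Assumption: same localhost:9200 endpoint and test_idx index as the report.

# Hypothetical helper: print the Turkish query body with case_insensitive=$1
turkish_query() {
  printf '{"query":{"wildcard":{"name":{"value":"*Gökçe İrmak*","case_insensitive":%s}}}}' "$1"
}

for ci in true false; do
  echo "== case_insensitive: $ci"
  # explain=true adds an explanation; rewrite=true shows the rewritten Lucene query
  curl -s -X GET "http://localhost:9200/test_idx/_validate/query?explain=true&rewrite=true&pretty" \
    -H 'Content-Type: application/json' \
    -d "$(turkish_query "$ci")"
done
```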