diff --git a/docs-site/content/30.0/api/README.md b/docs-site/content/30.0/api/README.md
index c28cff7d..2768f842 100644
--- a/docs-site/content/30.0/api/README.md
+++ b/docs-site/content/30.0/api/README.md
@@ -18,7 +18,11 @@ To learn how to install and run Typesense, see the [Guide section](https://types
 This release contains important new features, performance improvements and bug fixes.
 ### New Features
-
+- Support faceting on joined reference fields ([Docs](https://typesense.org/docs/30.0/api/search.html#facet-referencing))
+- Show related_docs count for a document in a joined collection with the `include_fields` param ([PR#2461](https://github.com/typesense/typesense/pull/2461))
+- Make facet sampling dynamic by adding the `facet_sample_slope` param ([Docs](https://typesense.org/docs/30.0/api/search.html#faceting-parameters))
+- Support sorting and limiting on joined collection docs with the `include_fields` param ([Docs](https://typesense.org/docs/30.0/api/joins.html#Sorting-and-limiting-on-joined-collection-docs))
+- Support `group_by` for Union search ([Docs](https://typesense.org/docs/30.0/api/federated-multi-search.html#union-search))
 ### Enhancements
@@ -27,6 +31,11 @@ This release contains important new features, performance improvements and bug f
 - Add support for Azure OpenAI models in Natural Language Search ([Docs](https://typesense.org/docs/30.0/api/natural-language-search.html#supported-model-types)).
 - Add configurable token truncation for string fields to improve exact match filtering on long strings ([Docs](https://typesense.org/docs/30.0/api/collections.html#field-parameters)).
 - Add GCP service account authentication for auto-embedding with GCP models ([Docs](https://typesense.org/docs/30.0/api/vector-search.html#service-account-authentication)).
+- Use a pool of Transliterator objects to improve tokenization performance for Cyrillic and Chinese languages ([PR#2412](https://github.com/typesense/typesense/pull/2412))
+- Support dynamic `facet_return_parent` fields ([Docs](https://typesense.org/docs/30.0/api/search.html#faceting-parameters))
+- Support `pinned_hits` with union search ([PR#2422](https://github.com/typesense/typesense/pull/2422))
+- Support altering reference fields ([PR#2445](https://github.com/typesense/typesense/pull/2445))
+- Filter out duplicates in `Union` search via the `remove_duplicates` flag, which defaults to `true`.
 ### Bug Fixes
 - Fix parsing of `_eval()` expressions when backticks are used to wrap strings containing parentheses.
@@ -40,6 +49,8 @@ This release contains important new features, performance improvements and bug f
 - Set user agent when initializing HTTP client for external API calls.
 - Fix hyphen handling in negation searches to only apply special treatment when token starts with `-`.
 - Fix query sub-tokenization to respect field-level `symbols_to_index` and `token_separators` configuration.
+- Fix override matching for wildcard queries, dynamic filters, dynamic sorts, and placeholders.
+- Fix sorting with `_eval()` on `id` fields.
 ### Deprecations / behavior changes
diff --git a/docs-site/content/30.0/api/federated-multi-search.md b/docs-site/content/30.0/api/federated-multi-search.md
index a30248ac..53490f61 100644
--- a/docs-site/content/30.0/api/federated-multi-search.md
+++ b/docs-site/content/30.0/api/federated-multi-search.md
@@ -423,6 +423,35 @@ Since the results of each search are merged into one final result, union differs
 }
 ```
 will return an error since the types (`user_name: string`, `rating: float`) are different.
+

+Union search removes duplicates by default. To turn this off, set `remove_duplicates: false`.
+
+### Grouping with Union
+Union supports grouping: pass the `group_by` params in each search, as shown below.
+
+```curl
+curl 'http://localhost:8108/multi_search?page=1&per_page=2' -X POST \
+    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '
+{
+  "union": true,
+  "searches": [
+    {
+      "collection": "posts",
+      "q": "*",
+      "filter_by": "user_name:stark_industry",
+      "group_by": "content",
+      "group_limit": 2
+    },
+    {
+      "collection": "comments",
+      "q": "*",
+      "filter_by": "user_name:rogers_steve",
+      "group_by": "content"
+    }
+  ]
+}'
+```
+**NOTE**: Union searches with grouping must be uniform in shape, i.e. either all searches contain grouping params or none of them do.
 ## `multi_search` Parameters
diff --git a/docs-site/content/30.0/api/joins.md b/docs-site/content/30.0/api/joins.md
index 4247cbfc..20a0117e 100644
--- a/docs-site/content/30.0/api/joins.md
+++ b/docs-site/content/30.0/api/joins.md
@@ -676,3 +676,31 @@ curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -X GET \
 ```
 This will return the book document along with the author's name from the referenced `authors` collection.
+
+## Sorting and limiting on joined collection docs
+
+You can sort and limit the joined collection docs using the `sort_by` and `limit` params in `include_fields`.
+
+For example, if you're searching the `authors` collection and including fields from the `books` collection, you can sort the joined docs by `id` as shown below.
+
+```json
+{
+  "collection": "authors",
+  "q": "*",
+  "filter_by": "$books(id:*)",
+  "include_fields": "$books(*, sort_by: id: desc)"
+}
+```
+Here, the docs will be sorted by their `id` in `desc` order.
+ +Similarly, if you want to limit the docs in referenced collection then you can do it like follwing, + +```json +{ + "collection": "authors", + "q": "*", + "filter_by": "$books(id:*)", + "include_fields": "$books(*, limit:5)" +} +``` +Which will limit the doc count by 5. \ No newline at end of file diff --git a/docs-site/content/30.0/api/search.md b/docs-site/content/30.0/api/search.md index b8db5453..82f93c96 100644 --- a/docs-site/content/30.0/api/search.md +++ b/docs-site/content/30.0/api/search.md @@ -283,14 +283,17 @@ When a `string[]` field is queried, the `highlights` structure will include the | Parameter | Required | Description | |:-----------------------|:---------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| facet_by | no | A list of fields that will be used for faceting your results on. Separate multiple fields with a comma.

Facet values can be sorted in alphabetical order for display by associating a `sort_by` parameter, e.g. `phone(sort_by: _alpha:asc)`. You can also sort facets on the value of a sibling field like this: `recipe.name(sort_by: recipe.calories:asc)`.

To facet on numerical ranges, you can specify labels for the ranges, e.g. `"facet_by": "rating(Average:[0, 3], Good:[3, 4], Great:[4, 5])"` ([read more](#facet-ranges)) | +| facet_by | no | A list of fields that will be used for faceting your results on. Separate multiple fields with a comma.

Facet values can be sorted in alphabetical order for display by associating a `sort_by` parameter, e.g. `phone(sort_by: _alpha:asc)`. You can also sort facets on the value of a sibling field like this: `recipe.name(sort_by: recipe.calories:asc)`.

To facet on numerical ranges, you can specify labels for the ranges, e.g. `"facet_by": "rating(Average:[0, 3], Good:[3, 4], Great:[4, 5])"` ([read more](#facet-ranges)) +

To facet on a referenced collection, specify the joined collection name followed by the facet field, e.g. `facet_by=$Customers(product_price)` ([read more](#facet-referencing)) |
 | facet_strategy | no | Typesense supports two strategies for efficient faceting, and has some built-in heuristics to pick the right strategy for you. The valid values for this parameter are `exhaustive`, `top_values` and `automatic` (default).

`exhaustive`: in this strategy, once we have the list of matching documents, we’ll simply iterate through each document’s `facet_by fields`, and sum up the number of documents for each unique facet value. This is effective when the number of documents is small (less than few tens of thousands of docs) and/or when the number of facet values requested (as defined by `max_facet_values`) is large.

`top_values`: in this strategy, once we have the list of matching documents, we’ll look up each facet field’s value in a reverse index that stores a mapping of `{facet_field_value => [list of all documents that have this value]}`. We’ll then find the intersection of these two lists of documents (the list of matching documents and the list of all documents that have this facet field value), and the length of the intersected list will give us the facet count. This strategy is efficient if we have a large number of hits, since we only have to do intersections on the top facet values (the values that have the largest number of documents in the reverse index). However, if the number of facet values to fetch (as configured by `max_facet_values`) is sufficiently large and the number of hits is small, then this strategy becomes less efficient, compared to the `exhaustive` strategy. Another downside of this approach is that it will not return an exact count for `total_values` in the facet stats because we only consider a limited number of facets for facet count intersections.

`automatic`: Typesense will pick an ideal strategy based on the heuristics described above and is the default value for this parameter. | | max_facet_values | no | Maximum number of facet values to be returned.

Default: `10` | | facet_query | no | Facet values that are returned can now be filtered via this parameter. The matching facet text is also highlighted. For example, when faceting by `category`, you can set `facet_query=category:shoe` to return only facet values that contain the prefix "shoe".

For facet queries, if a `per_page` parameter is not specified, it will default to `0`, thereby returning only facets and not hits. If you want hits as well, be sure to set `per_page` to a non-zero value.

Use the `facet_query_num_typos` parameter to control the _fuzziness_ of this facet value filter. |
 | facet_query_num_typos | no | Controls the _fuzziness_ of the facet query filter. Default: `2`. |
-| facet_return_parent | no | Pass a comma separated string of nested facet fields whose parent object should be returned in facet response. For e.g. when you set this to `"color.name"`, this will return the parent `color` object as parent property in the facet response. |
+| facet_return_parent | no | Pass a comma-separated string of nested facet fields whose parent object should be returned in the facet response. For example, setting this to `"color.name"` returns the parent `color` object as the parent property in the facet response. Dynamic fields can be passed using wildcards: `"product.color.*"` returns the parent for all matching facets, such as `"product.color.red"` and `"product.color.blue"`. A pure wildcard is also supported: `facet_return_parent=*` returns the parent object of all found facet fields. |
 | facet_sample_percent | no | Percentage of hits that will be used to estimate facet counts.

Facet sampling is helpful to improve facet computation speed for large datasets, where the exact count is not required in the UI.

Default: `100` (sampling is disabled by default). | | facet_sample_threshold | no | Minimum number of hits above which the facet counts are sampled.

Facet sampling is helpful to improve facet computation speed for large datasets, where the exact count is not required in the UI.

Default: `0`. |
+| facet_sample_slope | no | Controls how steeply `facet_sample_percent` falls as the collection grows, which makes the sampling percentage scale with collection size. Once the collection grows beyond `facet_sample_threshold`, the percentage decreases linearly along this slope.

`facet_sample_threshold` should be non-zero for `facet_sample_slope` to be effective. +

Default: `0`. | | validate_field_names | no | Controls whether Typesense should validate if the faceted fields exist in the schema. When set to false, Typesense will not throw an error if a faceted field is missing. This is useful for programmatic faceting where not all facets may exist.

Default: `true`. |
@@ -406,6 +409,26 @@ the max range value is omitted so that the `others` facet label will cover all v
 Faceting by range requires the field to have `sort` property enabled. This is enabled by default for all numerical fields, unless you've explicitly configured otherwise.
+### Facet Referencing
+To facet on a field in a referenced (joined) collection, pass it via the `facet_by` param in the following form:
+
+```curl
+$<collection_name>(<facet_field_1>, <facet_field_2>, ...)
+```
+For example, to search the `products` collection and facet on a field in the `customers` collection, send a search query like the following:
+
+```curl
+curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
+"http://localhost:8108/collections/products/documents/search\
+?q=*&filter_by=\$Customers(customer_id:customer_a)&facet_by=\$Customers(product_price)"
+```
+Note: Facet referencing won't work if all of the following are true:
+ * No filter references are available.
+ * The document has no references.
+ * The joined collection has no references.
+
+
+
 🔗 You'll find detailed documentation for `facet_by` in the [Faceting Parameters](#faceting-parameters) table above.
 ## Sort Results