Skip to content

Conversation

@brianjaustin
Copy link
Member

Pull Request Checklist

Issue

https://otwarchive.atlassian.net/browse/AO3-6304

Purpose

Split the tag fields in Elasticsearch into _restricted and _public fields; the former is queried when a logged in user searches/filters, the latter when a guest does the same. The query also replaces tags_<visibility> with tag, which is then mapped to the correct tags_<visibility> to allow existing searches like "tag:Character" but prevent searches that directly reference the visibility-sensitive fields.

Due to the internal ES changes, bookmarkables will need to be reindexed.

Note: I have not added tests yet. A feature test is probably worthwhile, regardless of implementation details, but other tests depend on if we like this approach. Once folks have had a chance to offer feedback, I will go through and fix/add unit tests.

@brianjaustin brianjaustin added Priority: High Needs Reindex Contains changes that will require Elasticsearch reindexing labels May 8, 2025
@lydia-theda
Copy link
Contributor

Not sure if this is covered already or if you'll have to account for it specifically, but since tags on a work hidden by admin should not be searchable by users or guests, searching/indexing series tags also shouldn't include tags only used on hidden works in the series.

Copy link
Contributor

@Bilka2 Bilka2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, it's me, the reviewer. I have reviewed

Copy link
Contributor

@Bilka2 Bilka2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one implementation question, rest is minor things and tests!

# escape_tags_field("tag:1234") => general_tags:1234
def escape_restrictable_fields(query)
# Special-case for the "tag" convenience field name first, then sanitize visibility level.
query.gsub("tag:", "public_tags:").gsub(/(?:(?:public|general)_)?([^\s]+):/) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First thought:

Suggested change
query.gsub("tag:", "public_tags:").gsub(/(?:(?:public|general)_)?([^\s]+):/) do
query.gsub("tag:", "public_tags:").gsub(/(?:(?:public|general)_)?(\w+):/) do

Second thought:
This could use RESTRICTABLE_FIELDS instead of \w to match less fields. E.g. right now this also matches foo:123 which we don't really need. (And then we don't need to run that regex in restrictable_field_name)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works... ish. I have to do a bit of annoying hackery because there doesn't seem to be an "easy" way to inject one regex into another (probably a good thing usually). I have something that makes this second regex ever so slightly more complicated, but I think it's OK?

Copy link
Contributor

@Bilka2 Bilka2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More regex noodling

# escape_tags_field("tag:1234") => general_tags:1234
def escape_restrictable_fields(query)
# Special-case for the "tag" convenience field name first, then sanitize visibility level.
query.gsub("tag:", "public_tags:").gsub(/(?:(?:public|general)_)?((?:#{RESTRICTABLE_FIELDS})(?:_ids)?):/) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so I think the tests are messed up because this code here is overly permissive. I think in this case it might be easier to give up using Tag::FILTERS and hardcode it. That would also make brakeman happy. Here is an attempt:

Suggested change
query.gsub("tag:", "public_tags:").gsub(/(?:(?:public|general)_)?((?:#{RESTRICTABLE_FIELDS})(?:_ids)?):/) do
field_names = "rating_ids|archive_warning_ids|category_ids|fandom_ids|relationship_ids|character_ids|freeform_ids|filter_ids|tags"
query.gsub("tag:", "public_tags:").gsub(/(?:(?:public|general)_)?(#{field_names}):/) do

I also kind of like this because it gives a very direct overview which fields in the index this affects - something we'd otherwise would have to hunt for in the tests or other files, if the coder even knows where to look

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about that as well. I think it makes sense (and bonus, less being clever with regex... which is usually good to avoid 😆 )

Copy link
Contributor

@Bilka2 Bilka2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay!

@Bilka2 Bilka2 merged commit 2497e9e into otwcode:master Jan 6, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Needs Reindex Contains changes that will require Elasticsearch reindexing Priority: High Reviewed: Ready to Merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants