This repository was archived by the owner on Dec 17, 2025. It is now read-only.

3rd party detection on link click incorrect #2

@malexmave

Description

The 3rd party detection when clicking links is currently unreliable. When a newsletter links out to a different site, like Twitter, we try to keep the final destination of the link off the "contacted 3rd parties" list, on the assumption that users know what they are getting into when they click such a link.

However, the current implementation only removes the last request from the log. If the destination redirects internally first (e.g. newsletter => http://twitter.com => https://twitter.com), Twitter will still be added as a third party.
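To make the failure concrete, here is a minimal sketch of the behavior described above. The function `drop_last_request` and the newsletter URL are made up for illustration; they are not taken from the actual code base, which stores requests in the database rather than in a list.

```python
from urllib.parse import urlsplit


def drop_last_request(chain):
    """Hypothetical stand-in for the current behavior: drop only the final hop."""
    return chain[:-1]


# Redirect chain from the example above (the newsletter URL is invented).
chain = [
    "http://newsletter.example/click?id=1",  # link in the mail
    "http://twitter.com/",                   # first hop to the destination
    "https://twitter.com/",                  # destination after the HTTPS upgrade
]

# Hosts that would still be recorded after removing only the last request.
remaining_hosts = [urlsplit(url).hostname for url in drop_last_request(chain)]
print(remaining_hosts)  # twitter.com is still in the list
```

The plain-HTTP hop survives, so twitter.com would still be recorded as a contacted third party.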

Proposed solution: Do not save the request chain to the database directly. Instead, create local objects for each request. Once the end of the chain is reached and we determine that twitter.com should be deleted from the chain, traverse the list backwards and delete all instances of twitter.com. Then save the remaining requests to the database.
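A rough sketch of how this could look, assuming the chain is buffered in memory as a list. `PendingRequest` and `strip_destination` are hypothetical names, not existing classes or functions in the project, and comparing exact hostnames is an assumption (it would treat `www.twitter.com` and `twitter.com` as different hosts).

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class PendingRequest:
    """Hypothetical in-memory record of one request; not yet saved to the database."""
    url: str

    @property
    def host(self):
        # Lower-cased hostname, or "" if the URL has none.
        return (urlsplit(self.url).hostname or "").lower()


def strip_destination(chain):
    """Return the chain without any request to the final destination's host."""
    if not chain:
        return []
    destination = chain[-1].host
    kept = []
    # Walk backwards from the end of the chain, as proposed above.
    for request in reversed(chain):
        if request.host != destination:
            kept.append(request)
    kept.reverse()  # restore the original request order
    return kept


chain = [
    PendingRequest("http://newsletter.example/click?id=1"),  # invented URL
    PendingRequest("https://tracker.example/r?u=twitter"),   # invented intermediate hop
    PendingRequest("http://twitter.com/"),
    PendingRequest("https://twitter.com/"),
]
to_save = strip_destination(chain)
print([request.url for request in to_save])
```

Only the requests in `to_save` would then be written to the database, so both twitter.com hops are dropped while the intermediate tracker hop is kept.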

Additional problem: Existing mails already have this artifact in their dataset. We will either need to re-crawl all emails (with all the problems that entails) or write a clean-up script that finds these dangling references and deletes them (without deleting actual tracking data). I have one or two ideas on how to achieve that, but will have to play around with them for a while to see if they work.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests