
fix(transform): use scheme as domain for file:// URLs #129

Merged
ErikBjare merged 1 commit into ActivityWatch:master from TimeToBuildBob:fix/split-url-file-domain on Feb 27, 2026

TimeToBuildBob (Contributor) commented Feb 27, 2026

Summary

  • When split_url_events processes file://, about:, or other URLs without a netloc, it now uses the URL scheme as $domain instead of an empty string
  • This prevents all non-web URLs from silently clustering together as a blank entry in "Top Browser Domains"
  • For example, file:///home/user/doc.pdf now gets $domain = "file" instead of $domain = ""
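Why the domain used to come out empty can be seen with the standard library's `urllib.parse.urlparse`. This sketch assumes that parser is what the PR means by "netloc" (it matches the field name), and the sample URLs are illustrative. URLs such as `file:///...` and `about:blank` have an empty netloc but still carry a scheme, which is the value the fix falls back to.

```python
from urllib.parse import urlparse

# Two URLs whose host part is empty or absent, and one ordinary web URL.
urls = ["file:///home/user/doc.pdf", "about:blank", "https://www.example.com/page"]

for url in urls:
    parsed = urlparse(url)
    # netloc holds "host[:port]" and is "" when the URL has no host;
    # the scheme is present either way, which is what the fix falls back to.
    print(f"{url} -> scheme={parsed.scheme!r}, netloc={parsed.netloc!r}")

# file:///home/user/doc.pdf -> scheme='file', netloc=''
# about:blank -> scheme='about', netloc=''
# https://www.example.com/page -> scheme='https', netloc='www.example.com'
```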

Test plan

  • Updated existing file:// test case to expect $domain == "file" instead of ""
  • Added test case for about:blank (expects $domain == "about")
  • All 156 tests pass (2 skipped, pre-existing)
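The test plan above can be sketched as plain assertions. This is a hypothetical stand-in, not aw-core's `test_url_parse_event()`. `split_url_data` is an illustrative name that fills only `$protocol` and `$domain` on an event's data dict, assuming the scheme fallback described in the summary.

```python
from urllib.parse import urlparse


def split_url_data(data: dict) -> dict:
    """Hypothetical helper: add $protocol and $domain derived from data["url"].

    Only the fields relevant to this fix are filled; "www." stripping is omitted.
    """
    parsed = urlparse(data["url"])
    result = dict(data)  # leave the caller's dict untouched
    result["$protocol"] = parsed.scheme
    # Fall back to the scheme when the URL has no netloc (file://, about:, ...).
    result["$domain"] = parsed.netloc or parsed.scheme
    return result


# Cases mirroring the test plan: the updated file:// case, the new about:blank
# case, and a regular web URL that should be unaffected.
cases = {
    "file:///home/user/doc.pdf": "file",
    "about:blank": "about",
    "https://github.com/ActivityWatch": "github.com",
}

for url, expected in cases.items():
    got = split_url_data({"url": url})["$domain"]
    assert got == expected, f"{url}: expected {expected!r}, got {got!r}"
print("all cases pass")
```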

Fixes #67


Important

split_url_events() now uses the URL scheme as $domain for URLs without a netloc, preventing clustering as a blank entry.

  • Behavior:
    • In split_url_events.py, split_url_events() now uses the URL scheme as $domain for URLs without a netloc (e.g., file://, about:).
    • Prevents non-web URLs from clustering as a blank entry in "Top Browser Domains".
  • Tests:
    • Updated test_url_parse_event() in test_transforms.py to expect $domain == "file" for file:// URLs.
    • Added test case for about:blank to expect $domain == "about".
    • All tests pass, including the new cases for URLs without a netloc.

This description was created by Ellipsis for a4dd1e3.

When splitting URLs, file://, about:, and other URLs without a netloc
produced an empty string for $domain. This caused all such events to
cluster together as a single empty entry in "Top Browser Domains".

Now falls back to using the URL scheme (e.g. "file", "about") as the
domain when netloc is empty. This groups local file activity under a
visible "file" domain label instead of an invisible empty string.

Fixes ActivityWatch#67
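To see why an empty $domain is a problem, consider a minimal aggregation that sums event durations per domain, roughly what a "Top Browser Domains" view needs. The events and durations below are made up for illustration, and `top_domains`, `old_domain` and `new_domain` are hypothetical helpers, not the actual aw-core or aw-webui query code.

```python
from collections import Counter
from typing import Callable
from urllib.parse import urlparse

# Made-up browser events as (url, duration in seconds); not real ActivityWatch data.
EVENTS = [
    ("https://github.com/ActivityWatch", 120.0),
    ("file:///home/user/doc.pdf", 300.0),
    ("file:///home/user/notes.txt", 60.0),
    ("about:blank", 5.0),
]


def old_domain(url: str) -> str:
    # Before the fix: netloc only, so every non-web URL maps to "".
    return urlparse(url).netloc


def new_domain(url: str) -> str:
    # After the fix: fall back to the scheme when netloc is empty.
    parsed = urlparse(url)
    return parsed.netloc or parsed.scheme


def top_domains(domain_of: Callable[[str], str]) -> list[tuple[str, float]]:
    """Hypothetical helper: total duration per domain, largest first."""
    totals: Counter = Counter()
    for url, duration in EVENTS:
        totals[domain_of(url)] += duration
    return totals.most_common()


print(top_domains(old_domain))  # [('', 365.0), ('github.com', 120.0)]
print(top_domains(new_domain))  # [('file', 360.0), ('github.com', 120.0), ('about', 5.0)]
```

With the old behavior, local files and `about:blank` collapse into a single blank row that even outranks the real web domain; with the fallback they split into visible "file" and "about" rows.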

ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed everything up to a4dd1e3 in 13 seconds.
  • Reviewed 54 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments.

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR fixes how non-web URLs (file://, about:, etc.) are handled in the URL parser by using their scheme as the $domain value instead of an empty string. Previously, all URLs without a netloc would get an empty domain, causing them to cluster together in "Top Browser Domains" visualizations. Now each scheme type gets its own identifiable domain (e.g., file:///path → $domain = "file", about:blank → $domain = "about").

The fix is backward compatible: regular HTTP/HTTPS URLs with a netloc continue to work exactly as before, including the existing "www." prefix stripping. The change only affects URLs that previously had empty domains, making them more useful for analytics and reporting.
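The backward-compatibility claim can be checked with a small comparison between a pre-fix domain function (netloc with a leading "www." removed) and a post-fix one that adds the scheme fallback. Both functions are hypothetical sketches written from the descriptions in this PR, not copies of aw-core's code. The assertion encodes the claim: the two may only disagree where the old result was the empty string.

```python
from urllib.parse import urlparse


def domain_before(url: str) -> str:
    """Pre-fix behavior as described: netloc with a leading "www." stripped."""
    netloc = urlparse(url).netloc
    return netloc[4:] if netloc.startswith("www.") else netloc


def domain_after(url: str) -> str:
    """Post-fix behavior: unchanged for URLs with a netloc, else the scheme."""
    parsed = urlparse(url)
    if not parsed.netloc:
        return parsed.scheme
    return domain_before(url)


urls = [
    "https://www.example.com/page",
    "http://localhost:5600/#/activity",
    "file:///home/user/doc.pdf",
    "about:blank",
]

for url in urls:
    before, after = domain_before(url), domain_after(url)
    # Results may only differ where the old domain was the empty string.
    assert before == after or before == ""
    print(f"{url}: {before!r} -> {after!r}")
```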

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is well-tested (existing tests updated, new test added), backward compatible for web URLs, and solves a real issue. The logic is straightforward with proper fallback handling, and all 156 tests pass.
  • No files require special attention

Important Files Changed

  • aw_transform/split_url_events.py: adds conditional logic to use the scheme as $domain for URLs without a netloc (file://, about:, etc.), preventing empty-domain clustering
  • tests/test_transforms.py: updates the file:// test expectation and adds an about:blank test case to verify the scheme-as-domain behavior

Last reviewed commit: a4dd1e3

greptile-apps bot left a comment

2 files reviewed, no comments


TimeToBuildBob added a commit to TimeToBuildBob/aw-server-rust that referenced this pull request Feb 27, 2026
When splitting URLs, file://, about:, and other URLs without a host
produced an empty string for $domain. This caused all such events to
cluster together as a single empty entry in "Top Browser Domains".

Now falls back to using the URL scheme (e.g. "file", "about") as the
domain when host is None. Matches the corresponding fix in aw-core
(ActivityWatch/aw-core#129).

Fixes ActivityWatch/aw-core#67
ErikBjare merged commit 9e2ad03 into ActivityWatch:master Feb 27, 2026
4 checks passed
ErikBjare pushed a commit to ActivityWatch/aw-server-rust that referenced this pull request Feb 27, 2026
* fix(transform): use scheme as domain for file:// and other non-web URLs

* style: fix rustfmt formatting in test assertion

Development

Successfully merging this pull request may close these issues.

Top Browser Domain is empty if url is a file

2 participants