⚠️ Disclaimer: This document was auto-generated by AI (GitHub Copilot / Claude) from source code analysis and reviewed by the author. In case of discrepancies, the actual code prevails.
This document is divided into three parts. The first covers the architecture and technical solutions, i.e. how the code is written (How). The middle part covers the project background, i.e. who is building it and why (Why). The last presents the design philosophy: what role humans play when AI can write all the code.
```
┌────────────────────────────────────────────────────────┐
│                     background.js                      │
│             (Service Worker · Central Hub)             │
│                                                        │
│  ┌────────────┐  ┌────────────┐  ┌──────────────────┐  │
│  │ Tab Mgmt   │  │ Shortcuts  │  │ AI Proxy / Msg   │  │
│  │ (Position/ │  │ (Boss Key/ │  │ Forwarding       │  │
│  │ Activation │  │ Mute/F2/F3)│  │ (Pollinations/   │  │
│  │ Policy)    │  │            │  │ Ollama proxy)    │  │
│  └────────────┘  └────────────┘  └──────────────────┘  │
│                                                        │
│  ┌────────────┐  ┌────────────┐  ┌──────────────────┐  │
│  │ Mouse      │  │ Download   │  │ Content Script   │  │
│  │ Gestures   │  │ Mgmt       │  │ Injection Mgmt   │  │
│  │ Sync       │  │ (Q-Save)   │  │                  │  │
│  └────────────┘  └────────────┘  └──────────────────┘  │
└────────────────────────────┬───────────────────────────┘
                             │ chrome.runtime.sendMessage
          ┌──────────────────┼──────────────────┐
          ▼                  ▼                  ▼
  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
  │ content.js   │   │ search-box   │   │related-search│
  │ (Gestures/   │   │ .js          │   │ .js          │
  │ Drag/Zoom)   │   │ (Floating    │   │ (AI Related  │
  │              │   │ Search)      │   │ Search)      │
  └──────────────┘   └──────────────┘   └──────────────┘
          │             Shadow DOM         Shadow DOM
          │              (closed)           (closed)
          ▼
  ┌──────────────┐
  │ ntp.js       │
  │ (New Tab:    │
  │ Wallpaper/   │
  │ Hot Search/  │
  │ Low Poly)    │
  └──────────────┘
```
Core Communication Pattern: all modules communicate with background.js via chrome.runtime.sendMessage, establishing background as the sole authority for state and API proxying. Content Scripts do not invoke external APIs directly.
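The routing pattern can be sketched as follows. The message types (getZoom, aiProxy) and the routeMessage helper are hypothetical and are not ECHO's actual names. API access is injected so the routing logic can also run outside the browser. The one MV3-specific detail worth noting is that the onMessage listener returns true, which keeps sendResponse valid until the asynchronous handler finishes.

```javascript
// Hypothetical message handlers; ECHO's real message names and shapes may differ.
const handlers = {
  // chrome.tabs is unavailable in content scripts, so background answers for them.
  async getZoom(message, sender, api) {
    return { zoom: await api.getZoom(sender.tab.id) };
  },
  // External HTTP calls happen here, never in the content script.
  async aiProxy(message, sender, api) {
    const res = await api.fetch(message.url, { method: 'POST', body: message.body });
    return { ok: res.ok, text: await res.text() };
  },
};

// Dispatches one message and always resolves with a serializable reply.
async function routeMessage(message, sender, api) {
  const type = message && message.type;
  if (!Object.prototype.hasOwnProperty.call(handlers, type)) {
    return { error: `unknown message type: ${type}` };
  }
  try {
    return await handlers[type](message, sender, api);
  } catch (err) {
    return { error: String(err) };
  }
}

// Service-worker wiring (only runs inside the extension).
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  const api = {
    getZoom: (tabId) => chrome.tabs.getZoom(tabId),
    fetch: (url, init) => fetch(url, init),
  };
  chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
    routeMessage(message, sender, api).then(sendResponse);
    return true; // keep sendResponse alive for the async reply
  });
}
```

A content script would then call chrome.runtime.sendMessage({ type: 'getZoom' }), which returns a Promise in MV3.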
Problem: When a user holds the right mouse button and scrolls to switch tabs, focus jumps to the new tab. However, the content script in the new tab is unaware that "the right button is still being held," causing the gesture to fail if the user continues scrolling.
Solution:
```
Tab A (content.js)            background.js             Tab B (content.js)
         │                          │                            │
         ├─ mousedown(right)        │                            │
         ├─ mouseGestureStart ─────►│                            │
         │                          ├─ isRightMouseDown=true     │
         │                          │                            │
         ├─ wheel(scroll) ─────────►│                            │
         │                          ├─ switchTab ───────────────►│
         │                          │  tabs.sendMessage          │
         │                          ├─ syncMouseGestureState ───►│
         │                          │  {isRightMouseDown:true}   │
         │                          │                            ├─ Local state synced ✓
         │                          │                            ├─ Continues responding ✓
```
background.js maintains a global isRightMouseDown flag. Every time a tab switch occurs, it pushes the state to the new tab via a syncMouseGestureState message. The new tab simultaneously sets preventContextMenu = true to ensure the context menu doesn't pop up when the right button is released.
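The receiving side can be sketched as follows. The sketch assumes the message shape { type: 'syncMouseGestureState', isRightMouseDown } and treats preventContextMenu as a one-shot flag. applyGestureSync and shouldSuppressContextMenu are hypothetical names, and the real content.js logic may differ.

```javascript
// Content-script side: per-tab gesture state (field names follow the text above).
const gestureState = { isRightMouseDown: false, preventContextMenu: false };

// Applies a syncMouseGestureState message pushed by background.js.
// Returns true if the message was handled.
function applyGestureSync(state, message) {
  if (!message || message.type !== 'syncMouseGestureState') return false;
  state.isRightMouseDown = Boolean(message.isRightMouseDown);
  // The right button will be released over this tab, so suppress the
  // context menu that would otherwise open at that moment.
  if (state.isRightMouseDown) state.preventContextMenu = true;
  return true;
}

// Consumes the one-shot suppression flag (assumed behavior).
function shouldSuppressContextMenu(state) {
  if (!state.preventContextMenu) return false;
  state.preventContextMenu = false;
  return true;
}

if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message) => {
    applyGestureSync(gestureState, message);
  });
  window.addEventListener('contextmenu', (e) => {
    if (shouldSuppressContextMenu(gestureState)) e.preventDefault();
  }, true);
}

// Background side, after activating the next tab (sketch):
// await chrome.tabs.update(nextTabId, { active: true });
// await chrome.tabs.sendMessage(nextTabId,
//   { type: 'syncMouseGestureState', isRightMouseDown: true });
```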
📍 Code location: content.js L56-L130, background.js L80-L88
Problem: When a user rapidly clicks multiple links (e.g., Ctrl+Clicking 5 results on a search page), chrome.tabs.onCreated events fire almost simultaneously. Without intervention, multiple tabs.move() calls trample each other, resulting in disordered tabs.
Solution: Serialization using tabCreationQueue (per-window Promise chain):
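```javascript
// Independent Promise queue for each window
const tabCreationQueue = new Map(); // Map<windowId, Promise>

// Assumed helper (not shown in the excerpt): the window's current queue tail
function getWindowQueue(windowId) {
  return tabCreationQueue.get(windowId) ?? Promise.resolve();
}

chrome.tabs.onCreated.addListener((tab) => {
  const windowId = tab.windowId;
  // Critical: capture baseTabId synchronously in onCreated.
  // At this moment onActivated hasn't fired yet, so state.baseTabId is still the parent tab.
  const snapshotBaseTabId = state.baseTabId;
  // Append the task to the window's Promise chain so tasks run strictly in order
  const currentQueue = getWindowQueue(windowId);
  const newQueue = currentQueue
    .then(() => handleNewTabCreated(tab, snapshotBaseTabId))
    // Sketch: without this, one rejected task would skip every later task in the window
    .catch((err) => console.error('handleNewTabCreated failed', err));
  tabCreationQueue.set(windowId, newQueue);
});
```

Two key points: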
- Synchronous snapshot: grab baseTabId immediately when onCreated fires, so that subsequent onActivated events cannot overwrite the true "parent tab."
- Promise-chain serialization: all tabs.move() operations within the same window run sequentially through a .then() chain, which eliminates the race between concurrent move calls.
onMoved events are handled as well. When the user drags tabs, baseTabIndex is resynchronized, with separate index updates for dragging left and dragging right.
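The index bookkeeping can be sketched as a pure function. The sketch assumes background.js tracks the parent tab as state.baseTabId and state.baseTabIndex. onMoved's moveInfo does supply fromIndex and toIndex. updateBaseIndexOnMove is a hypothetical name.

```javascript
// Returns the base tab's new index after a tab moved from fromIndex to
// toIndex in the same window. updateBaseIndexOnMove is a hypothetical name.
function updateBaseIndexOnMove(baseIndex, movedTabIsBase, fromIndex, toIndex) {
  if (movedTabIsBase) return toIndex; // the base tab itself was dragged
  if (fromIndex < baseIndex && toIndex >= baseIndex) {
    return baseIndex - 1; // a tab from the left was dragged right, past the base tab
  }
  if (fromIndex > baseIndex && toIndex <= baseIndex) {
    return baseIndex + 1; // a tab from the right was dragged left, past the base tab
  }
  return baseIndex; // the move stayed on one side of the base tab
}

// Wiring (extension only). `state` is the background state object from the
// excerpt above; checking that the move happened in the base tab's window
// is left out for brevity.
if (typeof chrome !== 'undefined' && chrome.tabs && chrome.tabs.onMoved) {
  chrome.tabs.onMoved.addListener((tabId, { fromIndex, toIndex }) => {
    state.baseTabIndex = updateBaseIndexOnMove(
      state.baseTabIndex, tabId === state.baseTabId, fromIndex, toIndex);
  });
}
```

For tabs [A, B, C, D, E] with C as the base tab (index 2), dragging A to index 3 yields [B, C, D, A, E], so C moves to index 1. Dragging E to index 2 yields [A, B, E, C, D], so C moves to index 3.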
📍 Code location: background.js L248-L510
Problem: The New Tab Page (NTP) shows a high-definition wallpaper every time it opens. If every load depended on a network request, the blank white screen before the wallpaper appears would be noticeable.
Solution: Memory → IndexedDB → Network fallback strategy:
```
   Open New Tab
         │
         ▼
┌─────────────────┐
│ 1. Memory Cache │ ← Blob URL loaded in the current session
│    (var ref)    │   Hit → render immediately, zero latency
└────────┬────────┘
         │ Miss
         ▼
┌─────────────────┐
│ 2. IndexedDB    │ ← echo_wallpaper_cache DB
│ (Blob offline)  │   keyPath: url, 7-day TTL auto-cleanup
│                 │   Hit → createObjectURL → render
└────────┬────────┘
         │ Miss
         ▼
┌─────────────────┐
│ 3. Network Req  │ ← Bing Wallpaper API
│ (fetch + Blob)  │   Success → write to memory + IndexedDB
└─────────────────┘
```
IndexedDB stores the raw Blobs rather than URL strings, because an object URL from createObjectURL is only valid for the document that created it. cleanOldWallpaperCache() runs at startup and deletes entries older than 7 days. As a result, the wallpaper loads instantly even when fully offline, as long as the NTP was opened within the last week.
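The fallback order can be sketched with the three tiers injected as plain objects. This is a minimal sketch under several assumptions. The memory tier is modeled as a Map rather than a single variable. The IndexedDB entry shape { url, blob, savedAt } and the names loadWallpaper and isExpired are hypothetical. In the browser, idb would wrap the echo_wallpaper_cache object store and toObjectUrl would be URL.createObjectURL.

```javascript
const WALLPAPER_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// True if an IndexedDB entry ({ url, blob, savedAt }) is past its TTL.
function isExpired(entry, now = Date.now()) {
  return now - entry.savedAt > WALLPAPER_TTL_MS;
}

// Memory → IndexedDB → network lookup. `stores` abstracts the three tiers:
//   memory:      Map<url, objectUrl>
//   idb:         { get(url) → Promise<entry|undefined>, put(entry) → Promise }
//   fetchBlob:   (url) → Promise<Blob>
//   toObjectUrl: (blob) → string   (URL.createObjectURL in the browser)
async function loadWallpaper(url, stores, now = Date.now()) {
  const cached = stores.memory.get(url);
  if (cached) return { src: cached, tier: 'memory' };

  const entry = await stores.idb.get(url);
  if (entry && !isExpired(entry, now)) {
    const src = stores.toObjectUrl(entry.blob);
    stores.memory.set(url, src);
    return { src, tier: 'indexeddb' };
  }

  const blob = await stores.fetchBlob(url);
  // Persist the raw Blob: object URLs do not outlive their document.
  await stores.idb.put({ url, blob, savedAt: now });
  const src = stores.toObjectUrl(blob);
  stores.memory.set(url, src);
  return { src, tier: 'network' };
}
```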
📍 Code location: ntp/ntp.js L25-L120
Problem: The floating search box and AI related search need to be injected into arbitrary web pages without being affected by the host page's CSS (and vice-versa). More trickily, when users zoom the page with Ctrl+Scroll, the injected UI scales up/down along with it.
Solution:
Isolation: using a closed Shadow DOM:

```javascript
const host = document.createElement('div');
const shadow = host.attachShadow({ mode: 'closed' });
// In closed mode host.shadowRoot returns null, so page scripts
// cannot reach the internal structure through it
```

Zoom compensation: poll the page zoom level periodically and apply the inverse scale:

```javascript
function applyZoomCompensation(zoomLevel) {
  const inverseScale = 1 / zoomLevel;
  // Inverse scaling: page at 200% → UI scaled to 50%, so its visual size stays constant
  host.style.transform = `translateX(-50%) scale(${inverseScale})`;
  // Position compensation: the 'bottom' distance also needs the inverse calculation.
  // Physical pixel distance = bottom (CSS px) × zoomLevel,
  // so to keep the physical distance constant: bottom = target distance / zoomLevel
  host.style.setProperty('--echo-bottom', `${BOTTOM_OFFSET_PX * inverseScale}px`);
}
```

Note that the bottom position needs compensation as well as the scale. Otherwise the search box drifts out of position when the page is zoomed in. That is why the CSS variable --echo-bottom is adjusted dynamically here.
The polling interval is 500 ms. Precise values come from chrome.tabs.getZoom(), proxied through background.js, because content scripts cannot call the chrome.tabs API directly.
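The polling loop can be sketched as follows. The sketch assumes a hypothetical getZoom message that background.js answers with chrome.tabs.getZoom(sender.tab.id). createZoomWatcher is an illustrative name. The watcher reapplies applyZoomCompensation only when the zoom factor actually changes.

```javascript
const ZOOM_POLL_INTERVAL_MS = 500;

// Returns an async tick() that reads the zoom factor and calls apply(zoom)
// only when it changed. Resolves to true when compensation was reapplied.
function createZoomWatcher(getZoom, apply) {
  let lastZoom = null;
  return async function tick() {
    let zoom;
    try {
      zoom = await getZoom();
    } catch {
      return false; // e.g. extension context invalidated; try again next tick
    }
    if (typeof zoom !== 'number' || !(zoom > 0) || zoom === lastZoom) return false;
    lastZoom = zoom;
    apply(zoom);
    return true;
  };
}

// Content-script wiring; 'getZoom' is a hypothetical message type.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.sendMessage &&
    typeof applyZoomCompensation === 'function') {
  const tick = createZoomWatcher(
    async () => {
      const reply = await chrome.runtime.sendMessage({ type: 'getZoom' });
      return reply && reply.zoom;
    },
    applyZoomCompensation,
  );
  tick();
  setInterval(tick, ZOOM_POLL_INTERVAL_MS);
}
```

An event-driven alternative would be chrome.tabs.onZoomChange in background.js pushing the new factor to the tab. Polling costs a few redundant messages per second but keeps the content script self-contained.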
📍 Code location: search-box/search-box.js L550-L645
Problem: Free AI APIs (Pollinations.ai / Ollama) return highly inconsistent output formats: standard JSON arrays, nested objects, JSON wrapped in Markdown code blocks, numbered plain-text lists, or even mixtures of these. The prompt asks for a plain-text list, but responses often ignore that instruction.
Solution: A three-tier waterfall parsing strategy, degrading gracefully:
```
    AI Raw Response
           │
           ▼
┌──────────────────────────────────────┐
│ Strategy A: Structured JSON Parsing  │
│ - Strip ```json ``` wrappers         │
│ - Regex-extract outermost {} or []   │
│ - Array → flatMap, extract strings   │
│ - Object → find queries/keywords     │
│   candidate keys, or recurse values  │
│ - Strip reasoning-chain fields       │
│ - Handle {"content":"Line1\nLine2"}  │
│   multi-line value cases             │
│ - Use keys as keywords if values     │
│   are invalid                        │
└──────────┬───────────────────────────┘
           │ Failed
           ▼
┌──────────────────────────────────────┐
│ Strategy B: Regex Quote Extraction   │
│ - Match all "..." quoted content     │
│ - Filter out JSON keys (contain ":") │
│ - Filter role markers such as        │
│   "assistant"/"user"                 │
└──────────┬───────────────────────────┘
           │ Failed
           ▼
┌──────────────────────────────────────┐
│ Strategy C: Plain Text Line Split    │
│ - Strip code block wrappers          │
│ - Split by newline                   │
│ - Regex-strip leading numbers        │
│   (1. 2) 3-) and bullets             │
│ - Strip surrounding quotes           │
└──────────┬───────────────────────────┘
           │
           ▼
┌──────────────────────────────────────┐
│ Unified Post-Processing              │
│ - Deduplicate (Set)                  │
│ - CJK detection: CN 4-35 chars;      │
│   EN needs a space, 10-100 chars     │
│ - Filter pure punctuation/digits     │
│ - Blacklist filter (null/keywords)   │
│ - Anti-duplication (compare with     │
│   document.title)                    │
│ → Finally keep ≥3 valid keywords     │
└──────────────────────────────────────┘
```
Strategy A includes one special fallback. When every value in a JSON object is an empty string (e.g., {"keyword1":"", "keyword2":""}), it inverts the logic and uses the keys as keywords, because some models write the keywords into the keys rather than the values.
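The waterfall can be condensed into the sketch below. It keeps the three strategies, the keys-as-keywords fallback, and a simplified version of the post-processing. It omits several cases from the diagram: reasoning-chain fields, multi-line values, and recursion into nested objects. It also assumes that a strategy "fails" when it yields fewer than three valid keywords. All function names are hypothetical.

```javascript
const CANDIDATE_KEYS = ['queries', 'keywords'];
const ROLE_MARKERS = new Set(['assistant', 'user']);
const CJK = /[\u4e00-\u9fff]/;

// Removes ```json / ``` fences the model may wrap around its answer.
function stripFences(raw) {
  return raw.replace(/```(?:json)?/gi, '').trim();
}

// Strategy A: parse the outermost {...} or [...] as JSON.
function tryJson(raw) {
  const match = stripFences(raw).match(/[\[{][\s\S]*[\]}]/);
  if (!match) return [];
  let data;
  try {
    data = JSON.parse(match[0]);
  } catch {
    return [];
  }
  if (Array.isArray(data)) return data.flat(Infinity).filter((v) => typeof v === 'string');
  if (data && typeof data === 'object') {
    for (const key of CANDIDATE_KEYS) {
      if (Array.isArray(data[key])) return data[key].filter((v) => typeof v === 'string');
    }
    const values = Object.values(data).filter((v) => typeof v === 'string' && v.trim());
    // All values empty: some models put the keywords in the keys instead.
    return values.length > 0 ? values : Object.keys(data);
  }
  return [];
}

// Strategy B: quoted strings, skipping JSON keys (followed by ':') and role markers.
function tryQuotes(raw) {
  return [...raw.matchAll(/"([^"\n]+)"(\s*:)?/g)]
    .filter((m) => !m[2] && !ROLE_MARKERS.has(m[1].toLowerCase()))
    .map((m) => m[1]);
}

// Strategy C: one keyword per line, minus numbering, bullets, and quotes.
function tryLines(raw) {
  return stripFences(raw)
    .split('\n')
    .map((line) => line.trim()
      .replace(/^(?:\d+[.)\-]|[-*•])\s*/, '')
      .replace(/^["'“”]+|["'“”]+$/g, '')
      .trim())
    .filter(Boolean);
}

// Simplified post-processing rules from the diagram.
function isValidKeyword(s, title) {
  if (!s || /^[\p{P}\p{N}\s]+$/u.test(s)) return false; // pure punctuation/digits
  if (s.toLowerCase() === 'null') return false;           // blacklist (abridged)
  if (title && s === title.trim()) return false;         // don't echo the page title
  return CJK.test(s)
    ? s.length >= 4 && s.length <= 35
    : s.includes(' ') && s.length >= 10 && s.length <= 100;
}

// Runs the strategies in order and returns the first set of >= 3 valid keywords.
function parseKeywords(raw, title = '') {
  for (const strategy of [tryJson, tryQuotes, tryLines]) {
    const unique = [...new Set(strategy(raw).map((s) => String(s).trim()))];
    const valid = unique.filter((s) => isValidKeyword(s, title));
    if (valid.length >= 3) return valid;
  }
  return [];
}
```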
📍 Code location: related-search/related-search.js L168-L405
Design Observation: Browser native zoom steps are coarse (110% → 125% → 150% → 175% → 200%). This hurts most in the commonly used range near 100%, where a single step jumps 10-25%.
Solution: Two step sizes, with 175% as the dividing line:
| Zoom Range | Step | Rationale |
|---|---|---|
| ≤ 175% | 5% | Common range, needs fine adjustment (90→95→100→105) |
| > 175% | 25% | High magnification, fine adjustment is meaningless, large steps are more efficient |
175% wasn't chosen at random. In the browser's native step list, it is the natural boundary between the need for precision and the need for speed.
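The step rule can be sketched as a pure function over integer percentages. Two details are assumptions: how the boundary itself is crossed (175% → 200% when zooming in, 200% → 175% when zooming out), and the clamp to Chromium's preset range of 25% to 500%. nextZoomPercent is a hypothetical name.

```javascript
const ZOOM_THRESHOLD = 175; // percent; dividing line from the table
const FINE_STEP = 5;        // at or below 175%
const COARSE_STEP = 25;     // above 175%
const MIN_ZOOM = 25;        // assumed clamp
const MAX_ZOOM = 500;

// direction: +1 to zoom in, -1 to zoom out. Integer percentages avoid
// floating-point drift from repeatedly adding 0.05 to a factor.
function nextZoomPercent(current, direction) {
  const fine = direction > 0 ? current < ZOOM_THRESHOLD : current <= ZOOM_THRESHOLD;
  const next = current + direction * (fine ? FINE_STEP : COARSE_STEP);
  return Math.min(MAX_ZOOM, Math.max(MIN_ZOOM, next));
}
```

In background.js, the result would be converted back to a factor, e.g. chrome.tabs.setZoom(tabId, nextZoomPercent(Math.round(factor * 100), +1) / 100).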
📍 Code location: content.js Zoom handling section
Problem: AI related search should trigger only on content-heavy article pages, without wasting API calls on home pages, search results, or navigation pages. But how do you tell whether a page is an article page?
Solution: Besides URL blacklists (.gov/.mil/.edu domains, search engines, etc.), there is a content signal-to-noise check:
```javascript
// Returns true when extracted page text looks like a title aggregator;
// the caller then skips analysis. (The original inlines this check; the
// function wrapper is for illustration.)
function looksLikeAggregator(content) {
  const lines = content.split('\n');
  const validLines = lines.filter(l => l.trim().length > 0);
  const avgLineLen = validLines.reduce((acc, l) => acc + l.length, 0) / (validLines.length || 1);
  const hasLongParagraph = lines.some(l => l.length > 80);
  // Aggregator pages: plenty of text, but only short titles and no long paragraphs
  return !hasLongParagraph && avgLineLen < 30;
}
```

Logic: An article page almost always contains long paragraphs (lines over 80 characters). If the extracted text consists only of short lines (average under 30 characters) and has no long paragraph, the page is most likely a title aggregator, such as the YouTube home page or a news list page, so analysis is skipped.
This detection runs after URL filtering and before API calls, consuming no API quota.
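The URL filter that runs first can be sketched as follows. The domain labels follow the text. The search-engine host list and the name isUrlEligible are illustrative assumptions, and the real blacklist in related-search.js is longer.

```javascript
// Illustrative lists; the real blacklist in related-search.js is longer.
const BLOCKED_DOMAIN_LABELS = ['gov', 'mil', 'edu'];
const SEARCH_ENGINE_HOSTS = new Set(['www.google.com', 'www.bing.com', 'cn.bing.com', 'www.baidu.com']);

// Returns true if related search may run on this URL at all.
function isUrlEligible(href) {
  let url;
  try {
    url = new URL(href);
  } catch {
    return false;
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  const host = url.hostname.toLowerCase();
  const labels = host.split('.');
  // Checks the last two labels, so both "nasa.gov" and "moe.gov.cn" match.
  if (labels.slice(-2).some((l) => BLOCKED_DOMAIN_LABELS.includes(l))) return false;
  if (SEARCH_ENGINE_HOSTS.has(host)) return false;
  return true;
}
```

The URL check is the cheapest test, so it runs first. The text heuristic runs only on pages that pass it, and the API is called only after both checks succeed.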
📍 Code location: related-search/related-search.js L190-L202
ECHO is a personal fan-made project by a non-technical PM, built with AI-assisted programming (Vibe Coding). It is NOT an official Microsoft product and NOT from the Edge team. The author has never written a line of production-grade code but has over a decade of experience in the Chinese internet industry, specifically in browser products, so none of the trade-offs in ECHO were imagined out of thin air.
The technical solutions above didn't emerge fully formed; they evolved through repeated use and debugging—some to fix actual bugs like "opening 5 tabs quickly messes up order," some to ensure injected UI doesn't deform at any zoom level, and some purely out of an obsession with "how the experience ought to be."
If you're interested in contributing code or suggesting improvements, we hope this document helps you quickly understand two things: how the code is written, and why it was written that way.
Speaking of Why—why does a PM build a browser extension by themselves?
Edge is the browser with the highest market share in China, reflecting a rational choice.
In this unique digital soil, users often find themselves balancing between two extremes: on one side, a global benchmark with pure experience but frequently obstructed service links; on the other, localized products with grounded features but sometimes "overly enthusiastic" fights for user attention. Edge stands precisely at that rare intersection—possessing the advanced Chromium core and ecosystem compatibility, maintaining a rare dignity and restraint, while being bolstered by Microsoft's continuous investment in AI capabilities.
But this global dignity comes with a drawback: it struggles to fully bend down to accommodate every region's muscle memory. The Chinese internet has a unique history of browser evolution. From Maxthon pioneering multi-tab browsing... mouse gestures, super drag, quick save, granular tab management—these features were repeatedly polished in the competition among domestic browsers, eventually cementing into the muscle memory of hundreds of millions of users. This isn't a niche preference for geeks, but an instinctive reaction trained over a decade for a generation—drawing an 'L' with the right button means close tab, dragging selected text means search, new tabs should open next to the current one, not at the far right.
We cannot ask a global product to adapt to every regional historical habit one by one, nor should we ask users to abandon their intuition to "relearn." The best way is to meet halfway—make the browser understand the user better, and make it easier for the user to embrace a better browser.
ECHO attempts to solve exactly this problem.
There is no shortage of single-feature extensions—mouse gestures, new tab beautifiers, quick save tools, shortcut supports, even a few that tweak tab opening logic. But to assemble about ten core interaction discrepancies into a complete package, proposing the clear proposition of "Making Edge Understand Chinese Users Better," and providing independent switches for fine-grained control for every feature—ECHO is likely the first to do so.
And this is definitely not just for geeks and hardcore users. Imagine an average user—they don't know what MV3 or Service Worker is, but they know "browsers used to close tabs with a right-drag, now this one doesn't." If ECHO can smooth out these tiny friction points, allowing them to seamlessly retain their habits while enjoying Edge's security, performance, and AI capabilities—this isn't the value of a geek tool, but helping a mainstream browser truly win the hearts of Chinese users.
Perhaps this is a small sense of responsibility: not complaining "why don't they build it for us," but doing it ourselves—building bridges for those who understand, and lighting lamps for those who explore.
"In the age of AI, taste is the last moat."
This is a product built upon being "out of time".
To describe it with a self-deprecating meme, it embodies a "Wakandan Vibranium Spear" aesthetic: possessing the most advanced AI productivity (Vibranium), yet using it to forge a primitive cold weapon (mouse gestures).
In the functional dimension, it appears traditional and restrained. While the industry chases the general intelligence of large models, ECHO chooses to look back and solve those simplest legacy problems—because we believe that browser interactions (like gestures) should not be forgotten with the arrival of AI. Reducing the friction of basic operations paves the way for users to accept new technologies.
In the production dimension, it is a radical experiment. This is a codebase written almost entirely by AI. The author is no longer a craftsman coding line by line, but an orchestrator of logic. This represents a new production relationship: AI is responsible for exhausting implementation possibilities, while humans are responsible for converging the boundaries of choice.
Thus, forming a dual reality of "Carbon-Silicon Fusion":
The first is external: Using cutting-edge AI productivity to repair the most basic user experiences—so that even users who do not use large models can indirectly enjoy the efficiency dividends brought by technological change.
The second is internal: When AI takes over the tedium of code implementation, human core value returns to judgment and taste.
AI can write code that runs correctly, but it does not actively care about those delicate touches "outside the requirements document". In the development of ECHO, AI is responsible for "how to do it", while humans decide "what to do". Here are some examples—they may not be functionally necessary, but it is these details that define the difference between a "product" and a "tool".
NTP has two independent color sampling pipelines, each responsible for different UI areas:
Pipeline 1: Info Card Area (extractAndApplyWallpaperTheme). A canvas samples the top-left corner of the wallpaper, where the card sits (approx. 300×150 px), quantizes the colors into buckets of 32 levels each, finds the dominant hue, and then:
- The dominant color is boosted (boost = 200 / maxChannel, which scales it until its strongest channel reaches 200) to produce a vivid version, used as the accent color of the card's background gradient.
- Handles, borders, and glows switch color schemes based on the brightness of the sampled area (bright wallpaper → dark theme color + white border; dark wallpaper → light theme color + dark border).
- The card background itself is backdrop-filter: blur(12px) frosted glass plus a theme-color gradient overlay on the left side.
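The quantize-then-boost step can be sketched as follows. The sketch averages the pixels of the most populated bucket and skips mostly transparent pixels; both are assumptions, since the text only specifies 32-level buckets and the boost formula. dominantColor and boostColor are hypothetical names. The input is the RGBA array returned by CanvasRenderingContext2D.getImageData().

```javascript
// Quantizes RGBA pixel data into buckets 32 levels wide (8 per channel)
// and returns the average color of the most populated bucket.
function dominantColor(data) {
  const buckets = new Map();
  for (let i = 0; i < data.length; i += 4) {
    if (data[i + 3] < 128) continue; // skip mostly transparent pixels
    const r = data[i], g = data[i + 1], b = data[i + 2];
    const key = `${r >> 5},${g >> 5},${b >> 5}`; // 256 / 32 = 8 levels per channel
    const bucket = buckets.get(key) || { count: 0, r: 0, g: 0, b: 0 };
    bucket.count++;
    bucket.r += r;
    bucket.g += g;
    bucket.b += b;
    buckets.set(key, bucket);
  }
  let best = null;
  for (const bucket of buckets.values()) {
    if (!best || bucket.count > best.count) best = bucket;
  }
  if (!best) return null;
  return [best.r, best.g, best.b].map((v) => Math.round(v / best.count));
}

// Scales the color so its strongest channel becomes 200 (boost = 200 / maxChannel).
function boostColor([r, g, b]) {
  const maxChannel = Math.max(r, g, b, 1); // avoid dividing by zero for black
  const boost = 200 / maxChannel;
  return [r, g, b].map((v) => Math.round(v * boost));
}

// Browser usage (sketch):
// const { data } = ctx.getImageData(0, 0, 300, 150); // top-left card area
// const accent = boostColor(dominantColor(data));
```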
Pipeline 2: Hot Search Area (calculateAndSetTextColor). This pipeline independently samples the wallpaper color at the hot search area's position, located via getBoundingClientRect. It does not reuse the top-left brightness result, because brightness can differ wildly between the top-left and the bottom-right. Based on three brightness tiers (> 170 / < 85 / mid), it then sets the global text-dark, text-light, or text-gray class, which drives:
| Wallpaper Brightness | Frosted Glass Base | Text Color | Rank Color (Top 1-3) | Hover Base |
|---|---|---|---|---|
| Bright (> 170) | White 70% + blur(60px) | Dark #111 | Amber #b45309 | Blue tint rgba(0,120,212,0.12) |
| Dark (< 85) | Black 50% + blur(60px) | Light #f0f0f0 | Gold #fbbf24 | White tint rgba(255,255,255,0.15) |
| Mid | White 60% + blur(60px) | Dark Gray #222 | Orange tint #c2410c | Blue tint rgba(0,120,212,0.12) |
Hover inversion for Hot Search also flips with brightness: on bright wallpapers, hover text turns blue #004080; on dark wallpapers, it turns gold #fbbf24—not simply "brighter" or "darker," but selecting a semantically appropriate accent color based on context.
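The tier selection and the position-specific sampling can be sketched as follows. Only the 170/85 thresholds and the three class names come from the text. Several parts are assumptions: the Rec. 601 luma weights, the mapping of tiers to class names (bright → text-dark, dark → text-light, mid → text-gray), and the assumption that the wallpaper is drawn centered with background-size: cover. The function names are hypothetical.

```javascript
// Perceived brightness on a 0-255 scale (Rec. 601 luma weights; an assumption).
function brightness([r, g, b]) {
  return 0.299 * r + 0.587 * g + 0.114 * b;
}

// Maps a sampled color to one of the three global text classes.
function textClassFor(rgb) {
  const value = brightness(rgb);
  if (value > 170) return 'text-dark'; // bright wallpaper → dark text
  if (value < 85) return 'text-light'; // dark wallpaper → light text
  return 'text-gray';                  // mid tones
}

// Converts an element's viewport rect into wallpaper pixel coordinates,
// assuming the wallpaper is drawn centered with background-size: cover.
function rectToImageRegion(rect, viewport, image) {
  const scale = Math.max(viewport.width / image.width, viewport.height / image.height);
  const offsetX = (viewport.width - image.width * scale) / 2;
  const offsetY = (viewport.height - image.height * scale) / 2;
  return {
    x: Math.max(0, Math.round((rect.left - offsetX) / scale)),
    y: Math.max(0, Math.round((rect.top - offsetY) / scale)),
    width: Math.round(rect.width / scale),
    height: Math.round(rect.height / scale),
  };
}

// Browser usage (sketch):
// const region = rectToImageRegion(hotSearchEl.getBoundingClientRect(),
//   { width: innerWidth, height: innerHeight },
//   { width: img.naturalWidth, height: img.naturalHeight });
// Sample that region with getImageData, average it, then apply textClassFor(avg).
```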
The final effect of this system is: no matter what wallpaper the user picks—pure white, pure black, high-saturation landscape, low-contrast gray tone—info cards and hot search lists automatically blend into the background, maintaining readability and visual harmony.
📍 ntp.js L2302-L2430 (Pipeline 1), ntp.js L2432-L2510 (Pipeline 2), ntp.css L2279-L2370 (Three-tier frosted glass + text color + hover)
Most browser extension First Run Experiences (FRE) are like this: show a screenshot, add some text saying "You can press Ctrl+B to call out the search box." ECHO's FRE Step 3 makes a different choice—let the user experience it directly.
Directly in the center of the page is a complete simulated browser window built with pure HTML/CSS: tab bar (with tabs titled "How to learn programming - Zhihu"), address bar with lock icon and full URL, back/forward/refresh buttons (forward button disabled), toolbar extension icons, skeleton screen content—it looks exactly like browsing a real Zhihu page.
Above it float two 3D keyboard keys, Ctrl + B, styled as mechanical keycaps (box-shadow: 0 4px 0 #e0e0e0 simulates their thickness). Every 3 seconds they automatically "press down" (translateY(3px) plus a collapsing shadow), next to the prompt text: "Give it a try!"
Then—the key point—the page loads the real search-box.js. When the user presses Ctrl+B, what pops up isn't a screenshot, isn't an animation demo, but the exact same floating search box used on any webpage after installation, including hot search, rainbow borders, zoom compensation—all real production code.
It really works, right there on the spot.
Not just the search box: the page also loads mouse-gesture.js, super-drag.js, keyboard-enhance.js—all interaction enhancements appearing in the FRE are live. While "trying out the search box," users might accidentally discover right-click gesture tab switching, drag-to-search, and other capabilities.
fre.js also performs platform detection: Mac users see ⌘ + B, and Windows users see Ctrl + B. Seven kinds of DOM targets are updated: .mini-key, .feature-title, .hint-text, title attributes, data-tooltip, .alt-hint, and .alt-key.
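The detection and relabeling can be sketched as follows. The order of platform sources (navigator.userAgentData.platform, then navigator.platform) and the plain text replacement of "Ctrl" with "⌘" are assumptions. isMacPlatform, localizeShortcut, and applyPlatformShortcuts are hypothetical names. The selector and attribute lists mirror the seven targets named above.

```javascript
// Detects macOS from a platform string such as "MacIntel" or "macOS".
function isMacPlatform(platform) {
  return /mac/i.test(platform || '');
}

// Rewrites a Windows-style shortcut label into the Mac form.
function localizeShortcut(text, isMac) {
  return isMac ? text.replace(/\bCtrl\b/g, '⌘') : text;
}

// The seven targets named in the text: five class selectors, two attributes.
const TEXT_SELECTORS = ['.mini-key', '.feature-title', '.hint-text', '.alt-hint', '.alt-key'];
const ATTRIBUTE_TARGETS = ['title', 'data-tooltip'];

function applyPlatformShortcuts(root, isMac) {
  if (!isMac) return; // markup is authored with Windows labels
  for (const el of root.querySelectorAll(TEXT_SELECTORS.join(','))) {
    el.textContent = localizeShortcut(el.textContent, true);
  }
  for (const attr of ATTRIBUTE_TARGETS) {
    for (const el of root.querySelectorAll(`[${attr}]`)) {
      el.setAttribute(attr, localizeShortcut(el.getAttribute(attr), true));
    }
  }
}

if (typeof navigator !== 'undefined' && typeof document !== 'undefined') {
  const platform = (navigator.userAgentData && navigator.userAgentData.platform) || navigator.platform;
  applyPlatformShortcuts(document, isMacPlatform(platform));
}
```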
This is a design that breaks the fourth wall—the user faces a fictional browser, a fictional Zhihu page, fictional skeleton content, and then presses Ctrl+B above this fictional scene to pop up a real search box. The boundary between fiction and reality vanishes at that moment, just like a character on stage suddenly turning to speak to the audience.
📍 fre/fre-step3.html (Full page), fre/fre.js L144-L152 (Click trigger real search box), fre/fre.js L22-L95 (Platform shortcut adaptation)
When the user calls up the floating search box with Ctrl+B, a 0.4-second elliptical ring-pulse animation plays: three rings (blue #38bdf8, purple #c084fc, pink #f472b6) expand outward one after another and then fade away.
Several details are worth noting in this animation:
- The ring shape matches the search box dimensions and border radius exactly, and dynamically adjusts width based on whether the hot search panel is shown (710px vs 420px).
- The rings must live outside the Shadow DOM (attached directly to document.body), because they span the full screen and must not be clipped by the shadow host's overflow.
- When the page is zoomed, the rings also need inverse-scale compensation; otherwise the ring and search box sizes and positions misalign.
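The ring geometry and playback can be sketched with the Web Animations API. This is an illustrative sketch only. The 100 ms stagger, the keyframes, and the function names ringSpecs and playRings are hypothetical. The search box's height and border radius are passed in because the text does not state them.

```javascript
const RING_COLORS = ['#38bdf8', '#c084fc', '#f472b6']; // blue, purple, pink
const RING_DURATION_MS = 400;
const RING_STAGGER_MS = 100; // hypothetical; keeps the whole effect under 1 s

// Geometry and timing for the three rings. Width follows the search box
// (710 px with the hot search panel, 420 px without); all sizes are divided
// by the page zoom so the rings match the inversely scaled box.
function ringSpecs({ hotSearchVisible, height, borderRadius, zoomLevel }) {
  const inverse = 1 / zoomLevel;
  return RING_COLORS.map((color, i) => ({
    color,
    width: (hotSearchVisible ? 710 : 420) * inverse,
    height: height * inverse,
    borderRadius: borderRadius * inverse,
    delay: i * RING_STAGGER_MS,
    duration: RING_DURATION_MS,
  }));
}

// Browser-only: rings go on document.body, outside the shadow root.
// centerX/centerY are the search box center in viewport coordinates.
function playRings(specs, centerX, centerY) {
  for (const s of specs) {
    const ring = document.createElement('div');
    Object.assign(ring.style, {
      position: 'fixed',
      left: `${centerX - s.width / 2}px`,
      top: `${centerY - s.height / 2}px`,
      width: `${s.width}px`,
      height: `${s.height}px`,
      borderRadius: `${s.borderRadius}px`,
      border: `2px solid ${s.color}`,
      opacity: '0', // invisible until its delayed animation starts
      pointerEvents: 'none',
      zIndex: '2147483647',
    });
    document.body.appendChild(ring);
    const anim = ring.animate(
      [{ transform: 'scale(1)', opacity: 0.9 }, { transform: 'scale(1.25)', opacity: 0 }],
      { duration: s.duration, delay: s.delay, easing: 'ease-out' },
    );
    anim.onfinish = () => ring.remove();
  }
}
```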
This animation lasts less than 1 second in total; perhaps 90% of users won't notice its existence. But it gives the moment of "search box popping up" a sense of "response," changing from "a box appearing" to "a box saying hello to you."
📍 search-box/search-box.js L1035-L1130
None of these details were in any spec, nor proactively proposed by AI. AI can implement anything you can describe, but it won't take these steps on its own—not even with infinite compute. It's not a capability gap; it simply has no reason to care. The globally optimal solution is to let users adapt to standard interaction paradigms. Only someone who genuinely cares about a specific group of people would choose to go the extra mile for them.
ECHO once had a complete self-drawn bookmark bar module—injecting a bookmark bar at the top of every page using Closed Shadow DOM, 100% replicating Edge's native bookmark bar visual style. You could turn off the native one with Ctrl + Shift + B and use this instead. More features, better features—and unashamedly, much better where it counts: multi-column horizontal cascading expansion, search positioning, in-place bookmarking to any folder level. Almost no other bookmark-related extension takes this path because it means your CSS has to fight a frontal war against the styles of every webpage in the world.
Shadow DOM blocked most conflicts, but host page zoom levels, special layouts, and unpredictable DOM changes constituted a war that could never be fully won. For a project maintained by one person, a single layout bug is enough to destroy a user's trust in the entire extension—the math didn't add up.
So it was completely written, then completely shelved.
This isn't a failure. The code verified that the interaction model is valid—the problem isn't the design, it's the container. Maybe one day a better container will appear, maybe not. But that saying always holds true: The height of a PM's realm is not deciding what to do, but deciding what NOT to do.
This document was analyzed and generated by AI, reviewed by the author. In case of doubt, refer to the source code.