@sergical sergical commented Oct 25, 2025

DESCRIBE YOUR PR

  • Caches release registry
  • looking into md exports

IS YOUR CHANGE URGENT?

Help us prioritize incoming PRs by letting us know when the change needs to go live.

  • Urgent deadline (GA date, etc.):
  • Other deadline:
  • None: Not urgent, can wait up to 1 week+

SLA

  • Teamwork makes the dream work, so please add a reviewer to your PRs.
  • Please give the docs team up to 1 week to review your PR unless you've added an urgent due date to it.
    Thanks in advance for your help!

PRE-MERGE CHECKLIST

Make sure you've checked the following before merging your changes:

  • Checked Vercel preview for correctness, including links
  • PR was reviewed and approved by any necessary SMEs (subject matter experts)
  • PR was reviewed and approved by a member of the Sentry docs team

LEGAL BOILERPLATE

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

EXTRA RESOURCES

vercel bot commented Oct 25, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| develop-docs | Ready | Preview | Comment | Oct 29, 2025 4:02pm |
| sentry-docs | Ready | Preview | Comment | Oct 29, 2025 4:02pm |

codecov bot commented Oct 25, 2025

Bundle Report

Changes will decrease total bundle size by 460.9kB (-1.96%) ⬇️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| sentry-docs-client-array-push | 10.16MB | -6 bytes (-0.0%) ⬇️ |
| sentry-docs-server-cjs | 12.52MB | -460.9kB (-3.55%) ⬇️ |

Affected Assets, Files, and Routes:

view changes for bundle: sentry-docs-client-array-push

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| static/chunks/pages/_app-*.js | -3 bytes | 882.71kB | -0.0% |
| static/chunks/8321-*.js | -3 bytes | 425.87kB | -0.0% |
| server/middleware-*.js | 6.46kB | 7.46kB | 645.5% ⚠️ |
| server/middleware-*.js | -6.46kB | 1.0kB | -86.59% |
| static/fqMK9BHK1nHyXvt1MK9Ok/_buildManifest.js (New) | 684 bytes | 684 bytes | 100.0% 🚀 |
| static/fqMK9BHK1nHyXvt1MK9Ok/_ssgManifest.js (New) | 77 bytes | 77 bytes | 100.0% 🚀 |
| static/c-*.js (Deleted) | -77 bytes | 0 bytes | -100.0% 🗑️ |
| static/c-*.js (Deleted) | -684 bytes | 0 bytes | -100.0% 🗑️ |

view changes for bundle: sentry-docs-server-cjs

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| 1729.js | -33.51kB | 1.74MB | -1.89% |
| ../instrumentation.js | -33.8kB | 1.07MB | -3.07% |
| 9523.js | -33.51kB | 1.04MB | -3.11% |
| ../app/[[...path]]/page.js.nft.json | -119.93kB | 739.55kB | -13.95% |
| ../app/platform-redirect/page.js.nft.json | -119.93kB | 739.46kB | -13.96% |
| ../app/sitemap.xml/route.js.nft.json | -119.93kB | 736.69kB | -14.0% |
| 7153.js (New) | 30.3kB | 30.3kB | 100.0% 🚀 |
| 9567.js | 924 bytes | 23.11kB | 4.17% |
| ../app/api/ip-ranges/route.js | -300 bytes | 5.79kB | -4.92% |
| ../app/robots.txt/route.js | -300 bytes | 5.02kB | -5.64% |
| 2311.js (Deleted) | -30.9kB | 0 bytes | -100.0% 🗑️ |

Files in 9567.js:

  • ./src/mdx.ts → Total Size: 27.86kB

App Routes Affected:

| App Route | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| / | -600 bytes | 2.81MB | -0.02% |

- Use VERCEL_GIT_COMMIT_REF (branch name) in cache keys for cross-commit persistence
- Include registry data hash in cache key to detect registry updates
- Enable caching for 200+ platform-include files (previously skipped)
- Add build timing instrumentation
- Expected: 18 min → 2-3 min on first build, ~2 min on subsequent commits
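
A minimal sketch of how a cache key along these lines could be composed. Only `VERCEL_GIT_COMMIT_REF` and the idea of mixing in a registry hash come from the commit message above; the function name `buildCacheKey`, the `local` fallback, and the key layout are assumptions for illustration.

```typescript
// Hypothetical helper: the name, parameter shape, and key layout are
// illustrative assumptions, not the code from this PR.
type Env = Record<string, string | undefined>;

function buildCacheKey(env: Env, sourceHash: string, registryHash?: string): string {
  // A branch name stays stable across commits on the same branch, unlike a commit SHA.
  // Characters that are awkward in file names are replaced with "_".
  const branch = (env.VERCEL_GIT_COMMIT_REF ?? 'local').replace(/[^a-zA-Z0-9._-]/g, '_');
  // Only files that depend on the registry get the registry hash mixed in.
  const parts = registryHash ? [branch, sourceHash, registryHash] : [branch, sourceHash];
  return parts.join('-');
}

const key = buildCacheKey({VERCEL_GIT_COMMIT_REF: 'feat/build-cache'}, 'abc123', 'reg456');
console.log(key); // feat_build-cache-abc123-reg456
```

With a key of this shape, a registry update changes the key for registry-dependent files only, while other files keep hitting the cache across commits on the same branch.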
@sergical sergical changed the title from "feat(Vercel) dont generate md exports on preview builds" to "feat(Vercel) Build cache improvements" Oct 27, 2025

@BYK BYK left a comment

Great time savings. That said, we should address the issues I raised either before merging or in a quick follow-up.

Also, the extra comments mostly state the obvious (vibe code artifacts?) and are better removed.

src/mdx.ts Outdated
let lastSummaryLog = Date.now();
function logCacheSummary(force = false) {
const now = Date.now();
// Log every 30 seconds or when forced

This logic seems unnecessary? Why not just emit at the end?
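
One way to read this suggestion: count hits and misses as they happen and print a single summary when the work is done. The sketch below uses hypothetical names (`CacheStats`, `recordHit`, `recordMiss`, `formatSummary`); it is not the PR's `logCacheSummary`.

```typescript
// Hypothetical counters; names are illustrative, not taken from src/mdx.ts.
class CacheStats {
  private hits = 0;
  private misses = 0;

  recordHit(): void {
    this.hits += 1;
  }

  recordMiss(): void {
    this.misses += 1;
  }

  // Build the summary text once, with no timers or "last logged" bookkeeping.
  formatSummary(): string {
    const total = this.hits + this.misses;
    const rate = total === 0 ? 0 : Math.round((this.hits / total) * 100);
    return `cache: ${this.hits} hits, ${this.misses} misses (${rate}% hit rate)`;
  }
}

const stats = new CacheStats();
stats.recordHit();
stats.recordHit();
stats.recordHit();
stats.recordMiss();

// Call once after the last file has been processed (for example from an exit hook).
console.log(stats.formatSummary()); // cache: 3 hits, 1 misses (75% hit rate)
```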

const skipCache =
// Check if file depends on Release Registry
const dependsOnRegistry =
source.includes('@inject') ||

Not sure if this @inject thing was related to the registry

src/mdx.ts Outdated
if (cachedRegistryHash) {
return cachedRegistryHash;
}
const [apps, packages] = await Promise.all([getAppRegistry(), getPackageRegistry()]);

There's a race condition here: if you call this function 3 times back to back, it would make 3 separate calls.

What you need for proper caching is to change the type of cachedRegistryHash to Promise<string>, and do:

cachedRegistryHash = Promise.all(...).then(([apps, packages]) => md5(...));
return cachedRegistryHash;
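
A self-contained sketch of the promise-caching pattern described in this comment. The two fetchers are stand-ins that only count calls, and `hashString` is a small FNV-1a hash used in place of md5 to keep the example dependency-free; all of these are assumptions, not the PR's code.

```typescript
// Stand-ins for the real registry fetchers; both are assumptions for this sketch.
let fetchCalls = 0;
async function getAppRegistry(): Promise<Record<string, string>> {
  fetchCalls += 1;
  return {'sentry-cli': '2.0.0'};
}
async function getPackageRegistry(): Promise<Record<string, string>> {
  fetchCalls += 1;
  return {'npm:@sentry/node': '8.0.0'};
}

// Tiny FNV-1a string hash, used only so the sketch needs no imports
// (the review comment mentions md5).
function hashString(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Cache the promise, not the resolved value: concurrent callers share one fetch.
let cachedRegistryHash: Promise<string> | null = null;

function getRegistryHash(): Promise<string> {
  if (!cachedRegistryHash) {
    cachedRegistryHash = Promise.all([getAppRegistry(), getPackageRegistry()]).then(
      ([apps, packages]) => hashString(JSON.stringify({apps, packages}))
    );
  }
  return cachedRegistryHash;
}

// Three back-to-back calls return the same promise and trigger one fetch per registry.
const first = getRegistryHash();
const second = getRegistryHash();
const third = getRegistryHash();
console.log(first === second && second === third, fetchCalls); // true 2
```

Note that a rejected promise also stays cached in this sketch, so every later caller would see the same failure unless `cachedRegistryHash` is reset on error.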

src/mdx.ts Outdated
// Get registry hash (cached per worker to avoid redundant fetches)
const registryHash = await getRegistryHash();
cacheKey = `${sourceHash}-${registryHash}`;
} catch (err) {

This logic can and probably should be improved: the only way this can throw an exception is a network-related issue. In that case, pages depending on the registry will also have a problem, so the try-catch is redundant. It's also wasteful: if it raises an exception once, it will raise one for every single page.

I'd rather add a retry mechanism to the cache key function and not handle the exception if the retries fail, halting the build, since we need the registry connection for the build.
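
A rough sketch of the retry-then-fail approach suggested here. The helper name `withRetry`, the attempt count, and the linear backoff are assumptions; `flakyFetch` is a fake that simulates two network failures before succeeding.

```typescript
// Hypothetical retry helper; the name, attempt count, and delay are assumptions.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // Linear backoff between attempts: delayMs, 2 * delayMs, ...
        await new Promise<void>(resolve => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // No fallback: rethrow so the caller (and the build) fails loudly.
  throw lastError;
}

// Usage: a fake fetch that fails twice with a network-style error, then succeeds.
let calls = 0;
async function flakyFetch(): Promise<string> {
  calls += 1;
  if (calls < 3) {
    throw new Error('ECONNRESET');
  }
  return 'registry-data';
}

const retried = withRetry(flakyFetch, 3, 10);
retried.then(value => console.log(value, calls)); // registry-data 3
```

Because the final error is rethrown rather than swallowed, a caller that does not wrap the call in try-catch stops at the first page that needs the registry instead of failing once per page.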

@sergical sergical marked this pull request as ready for review October 28, 2025 20:02
cursor[bot]

This comment was marked as outdated.


Comment on lines +232 to +234
const leanHTML = rawHTML
// Remove all script tags (build IDs, chunk hashes, Vercel injections)
.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')

Check failure

Code scanning / CodeQL

Incomplete multi-character sanitization High

This string may still contain <script, which may cause an HTML element injection vulnerability.

// Remove elements that change between builds but don't affect markdown output
const leanHTML = rawHTML
// Remove all script tags (build IDs, chunk hashes, Vercel injections)
.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')

Check failure

Code scanning / CodeQL

Bad HTML filtering regexp High

This regular expression does not match script end tags like </script >.
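
Both alerts can be reproduced with two small inputs. The sketch below reuses the regular expression from the flagged line; the wrapper name `stripScripts` and the sample strings are made up for illustration.

```typescript
// Same pattern as in the flagged line; `stripScripts` is a hypothetical wrapper.
function stripScripts(html: string): string {
  return html.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '');
}

// 1. An end tag with whitespace before ">" is accepted by browsers but is not
//    matched by the pattern, so the script element is left in place.
const spacedEndTag = '<p>a</p><script>x()</script >';
console.log(stripScripts(spacedEndTag) === spacedEndTag); // true

// 2. A single pass can splice a new "<script" together from the surrounding text.
const nested = '<scr<script></script>ipt>alert(1)</script>';
console.log(stripScripts(nested)); // <script>alert(1)</script>
```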

Copilot Autofix

AI 1 day ago

The best way to fix this problem is to use a proper HTML parser to remove unwanted tags (such as <script>, <link>, and <meta>), rather than relying on regular expressions. This provides more robust handling of HTML's intricacies, such as extra whitespace, unusual attribute formatting, and invalid but tolerated browser syntax. Since the script already imports rehype-parse (for parsing HTML to a syntax tree) and other tools from the unified/rehype ecosystem, the fix can use these existing libraries.

Specifically, instead of using .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') (and similar regex for <link> and <meta>), we should parse the HTML into an AST, programmatically remove the unwanted nodes, and then serialize the AST back to HTML for further processing. This fix should be applied within the genMDFromHTML function, replacing the leanHTML construction (lines 233–242) with parser-based routines.

No new dependencies are needed since rehype-parse, unist-util-remove, and related packages are already imported. We'll need to use unified().use(rehypeParse, {fragment: true}) to parse the HTML, use remove(tree, test) from unist-util-remove to strip undesired nodes, and a rehype serializer (e.g., rehype-stringify) to convert the AST back to HTML. If not already available, we should add a rehype-stringify import.


Suggested changeset 1
scripts/generate-md-exports.mjs

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/generate-md-exports.mjs b/scripts/generate-md-exports.mjs
--- a/scripts/generate-md-exports.mjs
+++ b/scripts/generate-md-exports.mjs
@@ -26,6 +26,7 @@
 import remarkStringify from 'remark-stringify';
 import {unified} from 'unified';
 import {remove} from 'unist-util-remove';
+import rehypeStringify from 'rehype-stringify';
 
 const DOCS_ORIGIN = 'https://docs.sentry.io';
 const CACHE_VERSION = 3;
@@ -230,17 +231,44 @@
 
   // Normalize HTML to make cache keys deterministic across builds
   // Remove elements that change between builds but don't affect markdown output
-  const leanHTML = rawHTML
-    // Remove all script tags (build IDs, chunk hashes, Vercel injections)
-    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
-    // Remove link tags for stylesheets and preloads (chunk hashes change)
-    .replace(/<link[^>]*>/gi, '')
-    // Remove meta tags that might have build-specific content
-    .replace(/<meta name="next-size-adjust"[^>]*>/gi, '')
-    // Remove data attributes that Next.js/Vercel add (build IDs, etc.)
-    .replace(/\s+data-next-[a-z-]+="[^"]*"/gi, '')
-    .replace(/\s+data-nextjs-[a-z-]+="[^"]*"/gi, '');
+  // Remove all <script>, <link>, and next-size-adjust <meta> tags, as well as data-* attributes, using an HTML parser.
+  const parsedHtmlTree = unified()
+    .use(rehypeParse, {fragment: true})
+    .parse(rawHTML);
 
+  // Remove unwanted elements using unist-util-remove
+  // Remove <script> tags
+  remove(parsedHtmlTree, (node) => node.type === 'element' && node.tagName === 'script');
+  // Remove <link> tags
+  remove(parsedHtmlTree, (node) => node.type === 'element' && node.tagName === 'link');
+  // Remove <meta name="next-size-adjust" ...>
+  remove(parsedHtmlTree, (node) =>
+    node.type === 'element' &&
+    node.tagName === 'meta' &&
+    node.properties &&
+    node.properties.name === 'next-size-adjust'
+  );
+  // Remove data-next-* and data-nextjs-* attributes from all elements
+  function cleanseDataAttrs(node) {
+    if (node && node.type === 'element' && node.properties) {
+      Object.keys(node.properties).forEach((key) => {
+        if (/^data-next(-|js-)/.test(key)) {
+          delete node.properties[key];
+        }
+      });
+    }
+    if (node.children) {
+      node.children.forEach(cleanseDataAttrs);
+    }
+  }
+  cleanseDataAttrs(parsedHtmlTree);
+
+  // Convert AST back to HTML
+  const leanHTML = unified()
+    .use(() => (tree) => tree) // identity plugin since tree already processed
+    .use(rehypeStringify)
+    .stringify(parsedHtmlTree);
+
   if (shouldDebug) {
     console.log(
       `✂️  Lean HTML length: ${leanHTML.length} chars (removed ${rawHTML.length - leanHTML.length} chars)`
EOF
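
One detail in this patch may be worth checking before reusing it: hast trees, which `rehype-parse` produces, generally store attributes under camelCased property names (`data-next-foo` becomes `dataNextFoo`), so a test against the hyphenated attribute name may never match. The sketch below uses hand-written node objects rather than a real parse, and the attribute names and the replacement pattern are assumptions for illustration.

```typescript
// Minimal hand-written stand-in for a hast element; not produced by rehype-parse.
interface ElementLike {
  type: 'element';
  tagName: string;
  properties: Record<string, unknown>;
  children: ElementLike[];
}

const tree: ElementLike = {
  type: 'element',
  tagName: 'div',
  // Assumed camelCased forms of data-next-id and data-nextjs-scroll.
  properties: {dataNextId: 'abc', id: 'main'},
  children: [
    {type: 'element', tagName: 'p', properties: {dataNextjsScroll: 'true'}, children: []},
  ],
};

// Pattern from the patch (hyphenated) versus an assumed camelCase equivalent.
const hyphenated = /^data-next(-|js-)/;
const camelCased = /^dataNext(js)?[A-Z]/;

// Delete matching properties on the node and its descendants; return how many were removed.
function stripProps(node: ElementLike, pattern: RegExp): number {
  let removed = 0;
  for (const key of Object.keys(node.properties)) {
    if (pattern.test(key)) {
      delete node.properties[key];
      removed += 1;
    }
  }
  for (const child of node.children) {
    removed += stripProps(child, pattern);
  }
  return removed;
}

const removedByHyphenated = stripProps(tree, hyphenated);
const removedByCamelCased = stripProps(tree, camelCased);
console.log(removedByHyphenated, removedByCamelCased); // 0 2
```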
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated