Review of PR #1
I have reviewed the PR CommonMark ipynb export + image attachment embedding in full.
Overall Assessment
This is an excellent, well-architected PR. The implementation of the commonmark AST pre-transform is very clean, and the two-phase approach for image attachments (collecting data in the CLI, rewriting in the serializer) is a smart way to keep the core myst-to-ipynb package pure and decoupled from the filesystem. The test coverage is also fantastic.
Areas for Improvement & Edge Cases
1. The Regex Escaping Bug (Fixed)
In packages/myst-to-ipynb/src/attachments.ts, the regex used to find and replace images in the generated markdown would fail on escaped characters. If an image has brackets in its alt text or parentheses in its url, the markdown serializer (mdast-util-to-markdown) escapes them (e.g., ![alt \[text\]](url\(1\))). The original regex /[^\]]*/ stopped at the very first ], even if it was escaped.
Fix applied: Updated the regex to handle escaped brackets and parentheses, and unescaped the URL before looking it up in the imageData dictionary.
2. Fragile String Replacement in embedImagesAsAttachments (Fixed)
The original replacement logic used a loop with .replace(string, string), which only replaces the first occurrence. If a user included the exact same image twice in the same cell, it worked but was slightly fragile.
Fix applied: Refactored the replacement logic to use a single pass with md.replace(imgRegex, callback).
3. Synchronous File Reading in collectImageData
In packages/myst-cli/src/build/ipynb/index.ts, fs.readFileSync(filePath) is used. While this is fine for small documents, if a user is exporting a massive notebook with hundreds of high-res images, synchronous reads will block the Node.js event loop.
Recommendation: Since runIpynbExport is already an async function, consider making collectImageData async and using await fs.promises.readFile(filePath). You could even use Promise.all to read all images concurrently, which would significantly speed up the export for image-heavy notebooks.
4. AST Mutation in transformToCommonMark
In packages/myst-to-ipynb/src/commonmark.ts, the function modifies the AST in place (tree.children = newChildren;). Because a deep clone (JSON.parse(JSON.stringify(blockTree))) is passed in index.ts, this doesn't cause side effects in the current implementation. However, mutating ASTs in place can be a footgun if this function is ever reused elsewhere.
Recommendation: It's perfectly fine to leave as-is given the deep clone, but standard unified/remark ecosystem practice is to use a utility like unist-util-map or unist-util-visit to return a new tree without mutating the original.
Summary
The PR is in great shape and the logic for lifting code cells out of gated directives (liftCodeCellsFromGatedNodes) is particularly elegant. With the regex edge-case fixed, this is ready to merge! 🚀
Review of PR #1
I have reviewed the PR CommonMark ipynb export + image attachment embedding in full.
Overall Assessment
This is an excellent, well-architected PR. The implementation of the
commonmarkAST pre-transform is very clean, and the two-phase approach for image attachments (collecting data in the CLI, rewriting in the serializer) is a smart way to keep the coremyst-to-ipynbpackage pure and decoupled from the filesystem. The test coverage is also fantastic.Areas for Improvement & Edge Cases
1. The Regex Escaping Bug (Fixed)
In
packages/myst-to-ipynb/src/attachments.ts, the regex used to find and replace images in the generated markdown would fail on escaped characters. If an image has brackets in itsalttext or parentheses in itsurl, the markdown serializer (mdast-util-to-markdown) escapes them (e.g.,![alt \[text\]](url\(1\))). The original regex/[^\]]*/stopped at the very first], even if it was escaped.Fix applied: Updated the regex to handle escaped brackets and parentheses, and unescaped the URL before looking it up in the
imageDatadictionary.2. Fragile String Replacement in
embedImagesAsAttachments(Fixed)The original replacement logic used a loop with
.replace(string, string), which only replaces the first occurrence. If a user included the exact same image twice in the same cell, it worked but was slightly fragile.Fix applied: Refactored the replacement logic to use a single pass with
md.replace(imgRegex, callback).3. Synchronous File Reading in
collectImageDataIn
packages/myst-cli/src/build/ipynb/index.ts,fs.readFileSync(filePath)is used. While this is fine for small documents, if a user is exporting a massive notebook with hundreds of high-res images, synchronous reads will block the Node.js event loop.Recommendation: Since
runIpynbExportis already anasyncfunction, consider makingcollectImageDataasync and usingawait fs.promises.readFile(filePath). You could even usePromise.allto read all images concurrently, which would significantly speed up the export for image-heavy notebooks.4. AST Mutation in
transformToCommonMarkIn
packages/myst-to-ipynb/src/commonmark.ts, the function modifies the AST in place (tree.children = newChildren;). Because a deep clone (JSON.parse(JSON.stringify(blockTree))) is passed inindex.ts, this doesn't cause side effects in the current implementation. However, mutating ASTs in place can be a footgun if this function is ever reused elsewhere.Recommendation: It's perfectly fine to leave as-is given the deep clone, but standard unified/remark ecosystem practice is to use a utility like
unist-util-maporunist-util-visitto return a new tree without mutating the original.Summary
The PR is in great shape and the logic for lifting code cells out of gated directives (
liftCodeCellsFromGatedNodes) is particularly elegant. With the regex edge-case fixed, this is ready to merge! 🚀