
perf: use decodeURIComponent for UTF-8 extended parameter decoding#115

Merged
blakeembrey merged 4 commits into jshttp:master from Phillip9587:decodefield-utf8 on Feb 23, 2026

Conversation

Phillip9587 (Contributor) commented Feb 10, 2026

Improves the performance of UTF-8 extended parameter decoding by 9-21x using the native decodeURIComponent(), with a graceful fallback for edge cases.

Changes

  • UTF-8 path: Use decodeURIComponent() for faster native UTF-8 handling. If decodeURIComponent() throws, fall back to manual decoding for backward compatibility with malformed percent sequences
  • ISO-8859-1 path: Preserve existing implementation
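
Sketched out, the new UTF-8 path looks roughly like this (decodeUtf8Field and the inline fallback are illustrative, not the library's exact code):

```javascript
// Illustrative sketch of the fast path + fallback (hypothetical names).
function decodeUtf8Field (encoded) {
  try {
    // Fast path: native percent-decoding for well-formed UTF-8 input.
    return decodeURIComponent(encoded)
  } catch (err) {
    // decodeURIComponent throws on invalid UTF-8 byte sequences
    // (e.g. a lone %E4), so percent-decode to raw bytes and let the
    // UTF-8 decoder substitute U+FFFD for invalid sequences.
    const bytes = []
    for (let idx = 0; idx < encoded.length; idx++) {
      if (encoded[idx] === '%') {
        bytes.push(parseInt(encoded.slice(idx + 1, idx + 3), 16))
        idx += 2
      } else {
        bytes.push(encoded.charCodeAt(idx))
      }
    }
    return Buffer.from(bytes).toString('utf8')
  }
}
```

The fallback only runs when the fast path throws, so the common case pays a single native call.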

Benchmarks

=== Environment ===

┌──────────────┬──────────────────────────────────────────────┐
│ node         │ 'v25.6.1'                                    │
│ platform     │ 'linux'                                      │
│ arch         │ 'x64'                                        │
│ os           │ 'Linux 6.17.0-14-generic'                    │
│ memory.total │ 96483323904                                  │
│ memory.free  │ 82700402688                                  │
│ cpu.model    │ 'AMD Ryzen 9 8945HS w/ Radeon 780M Graphics' │
│ cpu.cores    │ 16                                           │
└──────────────┴──────────────────────────────────────────────┘

=== Results ===

┌─────────┬───────────────────────────────────────┬───────────────┬─────────┬──────────────┬───────────────┐
│ (index) │ Test                                  │ Avg Time (µs) │ Ops/sec │ Min (µs)     │ Max (µs)      │
├─────────┼───────────────────────────────────────┼───────────────┼─────────┼──────────────┼───────────────┤
│ 0       │ 'decodeWithRegex - multiple short'    │ '4764.80'     │ 213161  │ '4318.00'    │ '1938768.00'  │
│ 1       │ 'decodeHexEscapes - multiple short'   │ '1583.74'     │ 649343  │ '1362.00'    │ '723676.00'   │
│ 2       │ 'decodeURIComponent - multiple short' │ '480.58'      │ 2119663 │ '430.00'     │ '1485443.00'  │
│ 3       │ 'decodeWithRegex - long'              │ '4486.74'     │ 237722  │ '3867.00'    │ '25382214.00' │
│ 4       │ 'decodeHexEscapes - long'             │ '1270.93'     │ 805380  │ '1173.00'    │ '6660762.00'  │
│ 5       │ 'decodeURIComponent - long'           │ '298.78'      │ 3456286 │ '260.00'     │ '9897425.00'  │
│ 6       │ 'decodeWithRegex - very long'         │ '5058387.97'  │ 209     │ '4337660.00' │ '39812382.00' │
│ 7       │ 'decodeHexEscapes - very long'        │ '1395493.43'  │ 732     │ '1256357.00' │ '3560765.00'  │
│ 8       │ 'decodeURIComponent - very long'      │ '239006.61'   │ 4496    │ '185345.00'  │ '7202521.00'  │
└─────────┴───────────────────────────────────────┴───────────────┴─────────┴──────────────┴───────────────┘

=== Comparison ===

multiple short (vs decodeWithRegex):
  decodeHexEscapes is 3.01x faster
  decodeURIComponent is 9.91x faster

long (vs decodeWithRegex):
  decodeHexEscapes is 3.53x faster
  decodeURIComponent is 15.02x faster

very long (vs decodeWithRegex):
  decodeHexEscapes is 3.62x faster
  decodeURIComponent is 21.16x faster

This can be seen as an alternative or an addition to #112.


closes #112 closes #114

Copilot AI left a comment

Pull request overview

This PR optimizes RFC 5987 extended parameter decoding by switching the UTF-8 decoding path to native decodeURIComponent() (with a manual fallback), while keeping the ISO-8859-1 behavior equivalent.

Changes:

  • Use decodeURIComponent() for UTF-8 extended parameter decoding, with fallback to manual %xx decoding + Buffer UTF-8 decoding on failures.
  • Replace regex-based %xx detection/decoding with helper functions (hasHexEscape, decodeHexEscapes, isHexDigit).
  • Minor doc comment formatting adjustments.
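
The helper functions the review mentions can be pictured roughly like this (assumed shapes inferred from their names; the merged code may differ):

```javascript
// Illustrative sketches of the %xx helpers (assumed, not verbatim).
function isHexDigit (char) {
  return (char >= '0' && char <= '9') ||
    (char >= 'a' && char <= 'f') ||
    (char >= 'A' && char <= 'F')
}

function hasHexEscape (str) {
  // True if the string contains at least one complete %xx escape.
  for (let idx = 0; idx + 2 < str.length; idx++) {
    if (str[idx] === '%' && isHexDigit(str[idx + 1]) && isHexDigit(str[idx + 2])) {
      return true
    }
  }
  return false
}
```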


Phillip9587 (Contributor, Author) commented:

@blakeembrey The Copilot review was right:

  The catch-block comment says this fallback is for "malformed percent sequences", but EXT_VALUE_REGEXP already rejects malformed % escapes; in practice decodeURIComponent() will mainly throw on invalid UTF-8 byte sequences (e.g. %E4). Suggest updating the comment (and consider removing the inline TODO, or tracking it via an issue/changelog) to avoid misleading future maintainers about when this path runs.

EXT_VALUE_REGEXP already checks for valid hex escapes. For the fallback when decodeURIComponent() fails, I used TextDecoder as in #112, and it passes all tests, including

assert.deepEqual(contentDisposition.parse('attachment; filename*=UTF-8\'\'%E4%20rates.pdf'), {
  type: 'attachment',
  parameters: { filename: '\ufffd rates.pdf' }
})

which covers the invalid byte sequence %E4 and replaces it with the Unicode replacement character.

So I would definitely prefer this PR over #112. The last thing we need to decide is: #115 (comment)

Phillip9587 (Contributor, Author) commented:

This PR is ready now 🚀

@blakeembrey blakeembrey merged commit dffa489 into jshttp:master Feb 23, 2026
14 checks passed
@Phillip9587 Phillip9587 deleted the decodefield-utf8 branch February 23, 2026 19:08
ChALkeR (Contributor) commented Feb 23, 2026

Hi there from nodejs/node#61041 which you linked to in #112.

  1. You are using TextDecoder a bit wrong; at the least you probably want consistent BOM handling, so either set ignoreBOM or strip it on all platforms
  2. @exodus/bytes has a fast fallback impl (upd: I checked the code, you likely don't need a fallback impl)
  3. You are double-converting hex -> bytes -> string -> bytes -> u8arr

blakeembrey (Member) commented:

You are using TextDecoder a bit wrong; at the least you probably want consistent BOM handling, so either set ignoreBOM or strip it on all platforms

It's only used as a fallback when decodeURIComponent fails, but this is a good point that we should have a test for the BOM.
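
The inconsistency is easy to demonstrate: decodeURIComponent keeps an encoded BOM, while a default TextDecoder strips a leading BOM from the byte stream (small sketch):

```javascript
// A leading UTF-8 BOM (EF BB BF) survives decodeURIComponent but is
// stripped by TextDecoder unless ignoreBOM is set.
const bomBytes = Uint8Array.of(0xef, 0xbb, 0xbf, 0x41) // BOM + 'A'

const viaUri = decodeURIComponent('%EF%BB%BFA')
const viaDecoder = new TextDecoder('utf-8').decode(bomBytes)
const viaIgnoreBOM = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bomBytes)

// viaUri === '\ufeffA', viaDecoder === 'A', viaIgnoreBOM === '\ufeffA'
```

So a test exercising a percent-encoded BOM would pin down which behavior the package guarantees.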

You are double-converting hex -> bytes -> string -> bytes -> u8arr

In the fallback? That's a good point, it could be simpler to keep it as code points. @Phillip9587 Do you want to run some benchmarks for this in a new PR?

@exodus/bytes has a fast fallback impl

For the most part this is fallback code when decodeURIComponent fails, so I think it's not a huge trade-off. We also could drop usage of TextDecoder entirely in the next major. We need to keep in mind package size for browser environments but it's a good point that we could allow decoders to be injected into the package so someone can swap this into the library instead of the default TextDecoder.

Phillip9587 (Contributor, Author) commented:

In the fallback? That's a good point, it could be simpler to keep it as code points. @Phillip9587 Do you want to run some benchmarks for this in a new PR?

I explored the direct hex -> bytes -> u8arr approach to avoid the double conversion.

function decodeHexEscapesToBytes (str) {
  // Over-allocate: the output can never be longer than the input string
  const bytes = new Uint8Array(str.length)
  let offset = 0

  for (let idx = 0; idx < str.length; idx++) {
    if (
      str[idx] === '%' &&
      idx + 2 < str.length &&
      isHexDigit(str[idx + 1]) &&
      isHexDigit(str[idx + 2])
    ) {
      // Complete %xx escape: decode the two hex digits into one byte
      bytes[offset++] = Number.parseInt(str[idx + 1] + str[idx + 2], 16)
      idx += 2
    } else {
      bytes[offset++] = str.charCodeAt(idx)
    }
  }

  // Trim to the bytes actually written
  return bytes.slice(0, offset)
}

As the string may contain percent escapes, the Uint8Array would be over-allocated. However, the overhead is negligible given HTTP header size limits, with the worst case being ~2x allocation.

It would also require us to keep the existing decodeHexEscapes() in order to keep the ISO-8859-1 path in decodeField() fast. I explored using the bytes conversion and a manual loop for latin1:

const bytes = decodeHexEscapesToBytes(encoded)
let string = ''
for (let idx = 0; idx < bytes.length; idx++) {
  // Filter to printable Latin-1: 0x20-0x7E (printable ASCII) or 0xA0-0xFF (high Latin-1)
  if (
    (bytes[idx] >= 0x20 && bytes[idx] <= 0x7e) ||
    (bytes[idx] >= 0xa0 && bytes[idx] <= 0xff)
  ) {
    string += String.fromCharCode(bytes[idx])
  } else {
    string += '?'
  }
}
return string
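
For comparison, the existing string-based path can be sketched like this (regex definitions approximated here, not copied from the repo):

```javascript
// Approximate sketch of the string + regex Latin-1 path.
const HEX_ESCAPE_REGEXP = /%([0-9a-fA-F]{2})/g
const NON_LATIN1_REGEXP = /[^\x20-\x7e\xa0-\xff]/g

function decodeLatin1 (encoded) {
  // Replace each %xx escape with its Latin-1 code point...
  const decoded = encoded.replace(HEX_ESCAPE_REGEXP, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  )
  // ...then substitute '?' for anything outside printable Latin-1.
  return decoded.replace(NON_LATIN1_REGEXP, '?')
}
```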

I benchmarked both approaches (bytes -> manual loop, and string -> regex; details below):

  • ASCII-only input: current string + NON_LATIN1_REGEXP approach is ~3-4x faster
  • Latin-1 / light encoding: performance is slightly better or roughly equal
  • Heavy percent-encoding: the byte-loop approach is somewhat slower
Benchmark Details:
"ASCII only (20 chars)":              "my_document_file.pdf",
"ASCII with spaces (30 chars)":       "my%20document%20with%20spaces.txt",
"Latin-1 with accents (25 chars)":    "caf%E9_r%E9sum%E9_naive.pdf",
"Mixed content (50 chars)":           "my_file_%E9_test_%20with_%C3%A4.txt",
"Long percent-encoded (100 chars)":   "doc_%E9_%E0_%E8_%2B_%2D_%2F_%E9_%E9_%E9_%E9_%E9_%E9_%E9_%E9_%E9_%E9_%E9_%E9_%E9_%E9.txt",
"Heavy percent-encoding (100 chars)": "%20%21%22%23%24%25%26%27%28%29%2A%2B%2C%2D%2E%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJ",
┌─────────┬────────────────────────────────────────────┬──────────────────┬──────────────────┬────────────────────────┬────────────────────────┬─────────┐
│ (index) │ Task name                                  │ Latency avg (ns) │ Latency med (ns) │ Throughput avg (ops/s) │ Throughput med (ops/s) │ Samples │
├─────────┼────────────────────────────────────────────┼──────────────────┼──────────────────┼────────────────────────┼────────────────────────┼─────────┤
│ 0       │ 'old - ASCII only (20 chars)'              │ '55.29 ± 0.24%'  │ '50.00 ± 0.00'   │ '18960208 ± 0.02%'     │ '20000000 ± 0'         │ 1808563 │
│ 1       │ 'new - ASCII only (20 chars)'              │ '201.12 ± 1.00%' │ '181.00 ± 1.00'  │ '5355849 ± 0.03%'      │ '5524862 ± 30694'      │ 497205  │
│ 2       │ 'old - ASCII with spaces (30 chars)'       │ '349.37 ± 2.05%' │ '330.00 ± 9.00'  │ '3038513 ± 0.03%'      │ '3030303 ± 84962'      │ 286226  │
│ 3       │ 'new - ASCII with spaces (30 chars)'       │ '319.42 ± 2.49%' │ '291.00 ± 9.00'  │ '3355795 ± 0.04%'      │ '3436426 ± 103093'     │ 313067  │
│ 4       │ 'old - Latin-1 with accents (25 chars)'    │ '302.69 ± 1.82%' │ '281.00 ± 10.00' │ '3498993 ± 0.03%'      │ '3558719 ± 122293'     │ 330373  │
│ 5       │ 'new - Latin-1 with accents (25 chars)'    │ '283.39 ± 2.55%' │ '261.00 ± 9.00'  │ '3743698 ± 0.03%'      │ '3831418 ± 127714'     │ 352876  │
│ 6       │ 'old - Mixed content (50 chars)'           │ '341.98 ± 1.80%' │ '330.00 ± 10.00' │ '3025454 ± 0.03%'      │ '3030303 ± 89127'      │ 292414  │
│ 7       │ 'new - Mixed content (50 chars)'           │ '359.48 ± 9.12%' │ '311.00 ± 9.00'  │ '3095145 ± 0.04%'      │ '3215434 ± 90434'      │ 278182  │
│ 8       │ 'old - Long percent-encoded (100 chars)'   │ '922.50 ± 1.65%' │ '862.00 ± 10.00' │ '1139369 ± 0.05%'      │ '1160093 ± 13616'      │ 108402  │
│ 9       │ 'new - Long percent-encoded (100 chars)'   │ '1075.8 ± 0.74%' │ '982.00 ± 30.00' │ '981104 ± 0.08%'       │ '1018330 ± 32090'      │ 92957   │
│ 10      │ 'old - Heavy percent-encoding (100 chars)' │ '962.59 ± 1.29%' │ '902.00 ± 10.00' │ '1088949 ± 0.05%'      │ '1108647 ± 12429'      │ 103887  │
│ 11      │ 'new - Heavy percent-encoding (100 chars)' │ '1137.8 ± 1.34%' │ '1032.0 ± 40.00' │ '932541 ± 0.08%'       │ '968992 ± 36156'       │ 87887   │
└─────────┴────────────────────────────────────────────┴──────────────────┴──────────────────┴────────────────────────┴────────────────────────┴─────────┘

For the most part this is fallback code when decodeURIComponent fails, so I think it's not a huge trade-off. We also could drop usage of TextDecoder entirely in the next major. We need to keep in mind package size for browser environments but it's a good point that we could allow decoders to be injected into the package so someone can swap this into the library instead of the default TextDecoder.

I agree, I don’t think it’s worth optimizing a fallback path we’re planning to remove in the next major. Since it would require duplicating logic, the added complexity doesn’t seem justified. What do you think, @blakeembrey?


Development

Successfully merging this pull request may close these issues.

Compatibility with non-Node.js environments

4 participants