From 3195c39b72df10d569eb265c9099641305a41a0d Mon Sep 17 00:00:00 2001
From: Tom Limoncelli
Date: Sun, 17 Mar 2024 20:41:40 -0400
Subject: [PATCH 1/4] IDNA proposal

---
 documentation/design-idna.md | 187 +++++++++++++++++++++++++++++++++++
 1 file changed, 187 insertions(+)
 create mode 100644 documentation/design-idna.md

diff --git a/documentation/design-idna.md b/documentation/design-idna.md
new file mode 100644
index 0000000000..484dbc8e14
--- /dev/null
+++ b/documentation/design-idna.md
@@ -0,0 +1,187 @@
+# DNSControl and Internationalized Domain Names
+
+This is my proposal for how to make IDNs work better in DNSControl.
+Basically, the UI will accept any format. Early in the process
+DNSControl will store labels/domains data 4 ways: As received from the
+user (downcased), ASCII, Unicode, and in a "display" format that shows
+both. The converstions already done ahead of time, providers can
+access whatever format they need. Output from the main program will
+use the "display" format when possible.
+
+# Problem Statement
+
+DNSControl doesn't handle internationalized domain names (IDNs) very
+well. Coverage is unevent: They work better in some providers than
+others. There are bugs and inconsistencies. Writing a provider that
+handles IDNs properly requires doing most of the work in the provider
+itself, which means every provider maintainer must be an expert in
+IDNs, which is unreasonable.
+
+# Background:
+
+RFC 3490 recommends how applications should handle IDNs. My summary:
+(1) the UI should accept a mix of Unicode and ASCII domains/labels.
+(2) internally translate everything to ASCII (punycode) and do all
+processing in that format, (3) when displaying output, display it as
+the user input it, or Unicode, or ASCII, or give users a choice.
+
+* IDNA: Internationalizing Domain Names in Applications
+* IDNs: Internationalized domain names
+* ACE Prefix: The `xn--` that means "Punycode follows"
+* ASCII: A label or domain is output as ASCII with ACE prefix if needed.
+* Unicode: A label or domain is output as Unicode.
+
+Proposed Outcome:
+
+1. Users should be able to input domains and labels in either ASCII (with ACE prefix if needed), Unicode, or a mix. This holds for input via `dnsconfig.js` (domain names, labels, and targets) as well as flags such as `--domains`.
+2a. Output should be Unicode, or both with the ASCII in parentheses. Example: `рф.com (xn--p1ai.com)`
+2b. Or maybe the reverse? Example: `xn--p1ai.com (рф.com)`
+3. DNSControl's main code should create a "paved path" for providers to make it easy for them to do the right thing. It should be easier to do the right thing than the wrong thing. The default (i.e. "lazy") path should result in the behavior we desire.
+
+Here are some example outputs:
+
+NOTE: Feedback needed! Do you prefer "a" or "b"? Is there an even better format I should consider? Should we use `{}` instead of `()`?
+
+Example 1a: CREATE unicode (ascii)
+
+```
+#1: + CREATE foo.рф.com (foo.xn--p1ai.com) MX 10 рф.com. (xn--p1ai.com) (ttl=14400)
+```
+
+Example 1b: CREATE ascii (unicode)
+
+```
+#2: + CREATE foo.xn--p1ai.com (foo.рф.com) MX 10 xn--p1ai.com. (рф.com.) (ttl=14400)
+```
+
+Example 3a: MODIFY ascii (unicode) -> ascii (unicode)
+
+```
+#3: ± MODIFY foo.xn--p1ai.com (foo.рф.com) (10 xn--p1ai.com. (рф.com.) ttl=14400) -> (10 foo.xn--p1ai.com. (foo.рф.com.) ttl=14400)
+```
+
+Example 3b: MODIFY unicode (ascii) -> unicode (ascii)
+
+```
+#4: ± MODIFY foo.рф.com (foo.xn--p1ai.com) (10 рф.com. (xn--p1ai.com.) ttl=14400) -> (10 foo.рф.com. (foo.xn--p1ai.com.) ttl=14400)
+```
+
+Example 3c: MODIFY unicode
+
+```
+#5: ± MODIFY foo.рф.com (10 рф.com. ttl=14400) -> (10 foo.рф.com. ttl=14400)
+```
+
+Example 3d: MODIFY ascii
+
+```
+#6: ± MODIFY foo.xn--p1ai.com (10 xn--p1ai.com. ttl=14400) -> (10 foo.xn--p1ai.com. ttl=14400)
+```
+
+NOTE: When the ASCII and Unicode versions are the same (i.e.
+everything is plain ASCII) the display would appear as before:
+
+```
+#7: + CREATE foo.example.com MX 10 xn--p1ai.com (рф.com) (ttl=14400)
+#8: ± MODIFY foo.example.com (10 example.com. ttl=14400) -> (10 foo.example.com. ttl=14400)
+```
+
+# Design
+
+In general, DNSControl will store domains and labels in multiple
+formats: (1) the original
+format the user specified (downcased), (2) ASCII, (3) Unicode, and
+(4) a format useful for displaying to users. This way
+providers do not have to do conversions.
+
+When the user used Unicode:
+
+* Original: рф.com
+* ASCII: xn--p1ai.com
+* Unicode: рф.com
+* Display: xn--p1ai.com (рф.com)
+
+When the user used ASCII:
+
+* Original: xn--p1ai.com
+* ASCII: xn--p1ai.com
+* Unicode: рф.com
+* Display: xn--p1ai.com (рф.com)
+
+NOTE: User input is downcased. If the user input is `D('xn--P1AI.COM')` the Original field would be `xn--p1ai.com` and so on.
+
+Memory usage will be minimized by using Go's string slicing. In the above
+example, the Display string would be generated first, the others would
+be slices of that string.
+
+
+```
+models.DomainConfig:
+
+    .Name: the name from D() after being downcased via strings.ToLower()
+    .NameASCII: The name stored after calling ToASCII() (with ACE prefix if any Unicode chars are present)
+    .NameUnicode: The name stored after calling ToUnicode()
+    .NameDisplay: if .NameASCII != .NameUnicode, store as "ascii (unicode)"
+                  Otherwise, the value is the same as .NameASCII
+
+models.Nameserver:
+    .Name will also be stored 4 ways, similar to models.DomainConfig
+
+models.RecordConfig:
+
+    .Name: the name downcased via strings.ToLower()
+    .NameASCII: The name stored after calling ToASCII() (with ACE prefix if any Unicode chars are present)
+    .NameUnicode: The name stored after calling ToUnicode()
+    .NameDisplay: if .NameASCII != .NameUnicode, store as "ascii (unicode)"
+                  Otherwise, the value is the same as .NameASCII
+
+    .NameFQDN: the name downcased via strings.ToLower()
+    .NameFQDNASCII: The name stored after calling ToASCII() (with ACE prefix if any Unicode chars are present)
+    .NameFQDNUnicode: The name stored after calling ToUnicode()
+    .NameFQDNDisplay: if .NameFQDNASCII != .NameFQDNUnicode, store as "ascii (unicode)"
+                      Otherwise, the value is the same as .NameFQDNASCII
+
+    .SubDomain: will be passed through strings.ToLower() then ToASCII()
+
+models.target:
+    GetTargetField() returns .target
+    GetTargetFieldASCII() returns .targetASCII
+    GetTargetFieldUnicode() returns .targetUnicode
+    GetTargetFieldDisplay() returns .targetDisplay
+models.RecordConfig:
+    .R53Alias: will be passed through strings.ToLower() then ToASCII()
+    .AzureAlias: will be passed through strings.ToLower() then ToASCII()
+```
+
+# Code changes
+
+Since the labels/domains have been pre-converted, providers no longer
+need to do the conversion themselves.
+
+1. After compiling `dnsconfig.js`, but before calling
+   ValidateAndNormalizeConfig(), the function NormalizeIDN() will be
+   called. NormalizeIDN() will do all the conversions listed above
+   (.Name, .NameASCII, .NameUnicode, .NameDisplay and so on).
+
+2. All calls to `dc.Punycode()` will be removed. They are no longer
+   needed.
+
+3. Providers should no longer need to do their own conversions.
+Calls to the idna module currently exist in
+domainnameshop, cloudflare, vultr, and hostingde. These will require
+special attention.
+
+4. Testing will be required for all providers. A PR with a checklist
+   will be used to let provider maintainers check in on their tests.
+   However, provider maintainers that do not check in within 3 (?)
+   weeks will not block the PR merge. We do this because (1) not
+   everyone has an IDN to test with, (2) old code should work as well
+   as before (bugs and all!).
+
+5. Documentation updates: Not sure what updates are neede but
+suggestions welcome!
+
+# Call for volunteers!
+
+I am not an expert in IDN. If someone would like to help out with
+testing, coding, and so on, I would greatly appreciate it!

From 169ec20d6fb72e5c45266897aadf91e296a072c5 Mon Sep 17 00:00:00 2001
From: Tom Limoncelli
Date: Sun, 17 Mar 2024 20:49:24 -0400
Subject: [PATCH 2/4] fixup!

---
 documentation/design-idna.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/documentation/design-idna.md b/documentation/design-idna.md
index 484dbc8e14..0303b98af9 100644
--- a/documentation/design-idna.md
+++ b/documentation/design-idna.md
@@ -4,14 +4,14 @@ This is my proposal for how to make IDNs work better in DNSControl.
 Basically, the UI will accept any format. Early in the process
 DNSControl will store labels/domains data 4 ways: As received from the
 user (downcased), ASCII, Unicode, and in a "display" format that shows
-both. The converstions already done ahead of time, providers can
+both. With the conversions done ahead of time, providers can
 access whatever format they need. Output from the main program will
 use the "display" format when possible.
 
 # Problem Statement
 
 DNSControl doesn't handle internationalized domain names (IDNs) very
-well. Coverage is unevent: They work better in some providers than
+well. Coverage is uneven: They work better in some providers than
 others. There are bugs and inconsistencies. Writing a provider that
 handles IDNs properly requires doing most of the work in the provider
 itself, which means every provider maintainer must be an expert in
@@ -178,7 +178,7 @@ special attention.
    everyone has an IDN to test with, (2) old code should work as well
    as before (bugs and all!).
 
-5. Documentation updates: Not sure what updates are neede but
+5. Documentation updates: Not sure what updates are needed but
 suggestions welcome!
 
 # Call for volunteers!

From 91d8ac6edb52a7742debd263082718a778dba972 Mon Sep 17 00:00:00 2001
From: Tom Limoncelli
Date: Thu, 21 Mar 2024 10:49:57 -0400
Subject: [PATCH 3/4] Address Yannik's note

---
 documentation/design-idna.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/documentation/design-idna.md b/documentation/design-idna.md
index 0303b98af9..57a004e1b1 100644
--- a/documentation/design-idna.md
+++ b/documentation/design-idna.md
@@ -81,6 +81,8 @@ Example 3d: MODIFY ascii
 NOTE: When the ASCII and Unicode versions are the same (i.e.
 everything is plain ASCII) the display would appear as before:
 
+(Example #7 has a target that is unicode, #8 is all ASCII)
+
 ```
 #7: + CREATE foo.example.com MX 10 xn--p1ai.com (рф.com) (ttl=14400)
 #8: ± MODIFY foo.example.com (10 example.com. ttl=14400) -> (10 foo.example.com. ttl=14400)

From bb4d88ddae6fa288b7d1fd51c3d2f30d7e0580e9 Mon Sep 17 00:00:00 2001
From: Tom Limoncelli
Date: Thu, 21 Mar 2024 11:18:04 -0400
Subject: [PATCH 4/4] more examples

---
 documentation/design-idna.md | 63 +++++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/documentation/design-idna.md b/documentation/design-idna.md
index 57a004e1b1..1745e5e09e 100644
--- a/documentation/design-idna.md
+++ b/documentation/design-idna.md
@@ -45,7 +45,7 @@ NOTE: Feedback needed! Do you prefer "a" or "b"? Is there an even better forma
 Example 1a: CREATE unicode (ascii)
 
 ```
-#1: + CREATE foo.рф.com (foo.xn--p1ai.com) MX 10 рф.com. (xn--p1ai.com) (ttl=14400)
+#1: + CREATE foo.рф.com (foo.xn--p1ai.com) MX 10 рф.com. (xn--p1ai.com.) (ttl=14400)
 ```
 
 Example 1b: CREATE ascii (unicode)
@@ -81,13 +81,68 @@ Example 3d: MODIFY ascii
 NOTE: When the ASCII and Unicode versions are the same (i.e.
 everything is plain ASCII) the display would appear as before:
 
-(Example #7 has a target that is unicode, #8 is all ASCII)
+```
+#7: + CREATE foo1.example.com MX 10 mxfoo.example.com. (ttl=14400)
+#8: ± MODIFY foo2.example.com (10 example.com. ttl=14400) -> (10 foo.example.com. ttl=14400)
+```
+
+These examples are similar, but the targets are unicode:
+
+```
+#9: + CREATE foo3.example.com MX 10 xn--p1ai.com. (рф.com.) (ttl=14400)
+#10: ± MODIFY foo4.example.com (10 xn--p1ai.com. (рф.com.) ttl=14400) -> (10 foo.example.com. ttl=14400)
+#11: ± MODIFY foo5.example.com (10 example.com. ttl=14400) -> (10 xn--p1ai.com. (рф.com.) ttl=14400)
+```
+
+Now here are the same examples with `()` changed to `{}`:
+
+```
+#1: + CREATE foo.рф.com {foo.xn--p1ai.com} MX 10 рф.com. {xn--p1ai.com.} (ttl=14400)
+#2: + CREATE foo.xn--p1ai.com {foo.рф.com} MX 10 xn--p1ai.com. {рф.com.} (ttl=14400)
+#3: ± MODIFY foo.xn--p1ai.com {foo.рф.com} (10 xn--p1ai.com. {рф.com.} ttl=14400) -> (10 foo.xn--p1ai.com. {foo.рф.com.} ttl=14400)
+#4: ± MODIFY foo.рф.com {foo.xn--p1ai.com} (10 рф.com. {xn--p1ai.com.} ttl=14400) -> (10 foo.рф.com. {foo.xn--p1ai.com.} ttl=14400)
+#5: ± MODIFY foo.рф.com (10 рф.com. ttl=14400) -> (10 foo.рф.com. ttl=14400)
+#6: ± MODIFY foo.xn--p1ai.com (10 xn--p1ai.com. ttl=14400) -> (10 foo.xn--p1ai.com. ttl=14400)
+#7: + CREATE foo1.example.com MX 10 mxfoo.example.com. (ttl=14400)
+#8: ± MODIFY foo2.example.com (10 example.com. ttl=14400) -> (10 foo.example.com. ttl=14400)
+#9: + CREATE foo3.example.com MX 10 xn--p1ai.com. {рф.com.} (ttl=14400)
+#10: ± MODIFY foo4.example.com (10 xn--p1ai.com. {рф.com.} ttl=14400) -> (10 foo.example.com. ttl=14400)
+#11: ± MODIFY foo5.example.com (10 example.com. ttl=14400) -> (10 xn--p1ai.com. {рф.com.} ttl=14400)
+```
+
+Now here are the same examples with `()` changed to `⟬⟭`:
 
 ```
-#7: + CREATE foo.example.com MX 10 xn--p1ai.com (рф.com) (ttl=14400)
-#8: ± MODIFY foo.example.com (10 example.com. ttl=14400) -> (10 foo.example.com. ttl=14400)
+#1: + CREATE foo.рф.com ⟬foo.xn--p1ai.com⟭ MX 10 рф.com. ⟬xn--p1ai.com.⟭ (ttl=14400)
+#2: + CREATE foo.xn--p1ai.com ⟬foo.рф.com⟭ MX 10 xn--p1ai.com. ⟬рф.com.⟭ (ttl=14400)
+#3: ± MODIFY foo.xn--p1ai.com ⟬foo.рф.com⟭ (10 xn--p1ai.com. ⟬рф.com.⟭ ttl=14400) -> (10 foo.xn--p1ai.com. ⟬foo.рф.com.⟭ ttl=14400)
+#4: ± MODIFY foo.рф.com ⟬foo.xn--p1ai.com⟭ (10 рф.com. ⟬xn--p1ai.com.⟭ ttl=14400) -> (10 foo.рф.com. ⟬foo.xn--p1ai.com.⟭ ttl=14400)
+#5: ± MODIFY foo.рф.com (10 рф.com. ttl=14400) -> (10 foo.рф.com. ttl=14400)
+#6: ± MODIFY foo.xn--p1ai.com (10 xn--p1ai.com. ttl=14400) -> (10 foo.xn--p1ai.com. ttl=14400)
+#7: + CREATE foo1.example.com MX 10 mxfoo.example.com. (ttl=14400)
+#8: ± MODIFY foo2.example.com (10 example.com. ttl=14400) -> (10 foo.example.com. ttl=14400)
+#9: + CREATE foo3.example.com MX 10 xn--p1ai.com. ⟬рф.com.⟭ (ttl=14400)
+#10: ± MODIFY foo4.example.com (10 xn--p1ai.com. ⟬рф.com.⟭ ttl=14400) -> (10 foo.example.com. ttl=14400)
+#11: ± MODIFY foo5.example.com (10 example.com. ttl=14400) -> (10 xn--p1ai.com. ⟬рф.com.⟭ ttl=14400)
 ```
+
+Now here are the same examples with `()` changed to `❮❯`:
+
+```
+#1: + CREATE foo.рф.com ❮foo.xn--p1ai.com❯ MX 10 рф.com. ❮xn--p1ai.com.❯ (ttl=14400)
+#2: + CREATE foo.xn--p1ai.com ❮foo.рф.com❯ MX 10 xn--p1ai.com. ❮рф.com.❯ (ttl=14400)
+#3: ± MODIFY foo.xn--p1ai.com ❮foo.рф.com❯ (10 xn--p1ai.com. ❮рф.com.❯ ttl=14400) -> (10 foo.xn--p1ai.com. ❮foo.рф.com.❯ ttl=14400)
+#4: ± MODIFY foo.рф.com ❮foo.xn--p1ai.com❯ (10 рф.com. ❮xn--p1ai.com.❯ ttl=14400) -> (10 foo.рф.com. ❮foo.xn--p1ai.com.❯ ttl=14400)
+#5: ± MODIFY foo.рф.com (10 рф.com. ttl=14400) -> (10 foo.рф.com. ttl=14400)
+#6: ± MODIFY foo.xn--p1ai.com (10 xn--p1ai.com. ttl=14400) -> (10 foo.xn--p1ai.com. ttl=14400)
+#7: + CREATE foo1.example.com MX 10 mxfoo.example.com. (ttl=14400)
+#8: ± MODIFY foo2.example.com (10 example.com. ttl=14400) -> (10 foo.example.com. ttl=14400)
+#9: + CREATE foo3.example.com MX 10 xn--p1ai.com. ❮рф.com.❯ (ttl=14400)
+#10: ± MODIFY foo4.example.com (10 xn--p1ai.com. ❮рф.com.❯ ttl=14400) -> (10 foo.example.com. ttl=14400)
+#11: ± MODIFY foo5.example.com (10 example.com. ttl=14400) -> (10 xn--p1ai.com. ❮рф.com.❯ ttl=14400)
+```
+
 
 # Design
 
 In general, DNSControl will store domains and labels in multiple