Commit 96b84ce

Update to Unicode 15.1
1 parent 8e13513 commit 96b84ce

24 files changed: 7236 additions, 6318 deletions

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -1,5 +1,15 @@
 # Changelog
 
+## Unicode v1.17.0
+
+This is the changelog for Unicode v1.17.0 released on September 17th, 2023. For older changelogs please consult the release tag on [GitHub](https://github.com/elixir-unicode/unicode/tags)
+
+### Enhancements
+
+* Updates to [Unicode 15.1](https://unicode.org/versions/Unicode15.1.0/) data.
+
+* Improve the security of the `mix unicode.download` task.
+
 ## Unicode v1.16.2
 
 This is the changelog for Unicode v1.16.2 released on August 16th, 2023. For older changelogs please consult the release tag on [GitHub](https://github.com/elixir-unicode/unicode/tags)

data/blocks.txt

Lines changed: 4 additions & 3 deletions
@@ -1,6 +1,6 @@
-# Blocks-15.0.0.txt
-# Date: 2022-01-28, 20:58:00 GMT [KW]
-# © 2022 Unicode®, Inc.
+# Blocks-15.1.0.txt
+# Date: 2023-07-28, 15:47:20 GMT
+# © 2023 Unicode®, Inc.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
@@ -352,6 +352,7 @@ FFF0..FFFF; Specials
 2B740..2B81F; CJK Unified Ideographs Extension D
 2B820..2CEAF; CJK Unified Ideographs Extension E
 2CEB0..2EBEF; CJK Unified Ideographs Extension F
+2EBF0..2EE5F; CJK Unified Ideographs Extension I
 2F800..2FA1F; CJK Compatibility Ideographs Supplement
 30000..3134F; CJK Unified Ideographs Extension G
 31350..323AF; CJK Unified Ideographs Extension H

data/case_folding.txt

Lines changed: 6 additions & 3 deletions
@@ -1,6 +1,6 @@
-# CaseFolding-15.0.0.txt
-# Date: 2022-02-02, 23:35:35 GMT
-# © 2022 Unicode®, Inc.
+# CaseFolding-15.1.0.txt
+# Date: 2023-05-12, 21:53:10 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
@@ -929,6 +929,7 @@
 1FCC; S; 1FC3; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
 1FD2; F; 03B9 0308 0300; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND VARIA
 1FD3; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
+1FD3; S; 0390; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
 1FD6; F; 03B9 0342; # GREEK SMALL LETTER IOTA WITH PERISPOMENI
 1FD7; F; 03B9 0308 0342; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI
 1FD8; C; 1FD0; # GREEK CAPITAL LETTER IOTA WITH VRACHY
@@ -937,6 +938,7 @@
 1FDB; C; 1F77; # GREEK CAPITAL LETTER IOTA WITH OXIA
 1FE2; F; 03C5 0308 0300; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND VARIA
 1FE3; F; 03C5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
+1FE3; S; 03B0; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
 1FE4; F; 03C1 0313; # GREEK SMALL LETTER RHO WITH PSILI
 1FE6; F; 03C5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI
 1FE7; F; 03C5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI
@@ -1328,6 +1330,7 @@ FB02; F; 0066 006C; # LATIN SMALL LIGATURE FL
 FB03; F; 0066 0066 0069; # LATIN SMALL LIGATURE FFI
 FB04; F; 0066 0066 006C; # LATIN SMALL LIGATURE FFL
 FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T
+FB05; S; FB06; # LATIN SMALL LIGATURE LONG S T
 FB06; F; 0073 0074; # LATIN SMALL LIGATURE ST
 FB13; F; 0574 0576; # ARMENIAN SMALL LIGATURE MEN NOW
 FB14; F; 0574 0565; # ARMENIAN SMALL LIGATURE MEN ECH
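
The three added `S` entries in this file give 1FD3, 1FE3 and FB05 a simple (single code point) folding alongside their existing `F` (full) folding. In the CaseFolding.txt format, simple case folding uses the `C` and `S` statuses and full case folding uses `C` and `F`. The following is a minimal Python sketch of how a consumer might split the two tables. The function name `parse_case_folding` and the sample constant are illustrative assumptions, not part of this repository, which may parse the file differently.

```python
# Sample lines copied from the case_folding.txt hunks in this commit.
SAMPLE = """\
1FD3; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
1FD3; S; 0390; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
1FD8; C; 1FD0; # GREEK CAPITAL LETTER IOTA WITH VRACHY
FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T
FB05; S; FB06; # LATIN SMALL LIGATURE LONG S T
FB06; F; 0073 0074; # LATIN SMALL LIGATURE ST
"""


def parse_case_folding(text):
    """Return (simple, full) dicts mapping a code point to a list of code points.

    Hypothetical helper: simple folding takes statuses C and S,
    full folding takes statuses C and F. Status T (Turkic) is ignored.
    """
    simple, full = {}, {}
    for line in text.splitlines():
        # Drop the trailing comment and skip blank or comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        code, status, mapping = [field.strip() for field in line.split(";")[:3]]
        source = int(code, 16)
        target = [int(cp, 16) for cp in mapping.split()]
        if status in ("C", "S"):
            simple[source] = target
        if status in ("C", "F"):
            full[source] = target
    return simple, full


simple, full = parse_case_folding(SAMPLE)
```

With the sample above, `simple[0x1FD3]` holds the single code point 0390 while `full[0x1FD3]` keeps the three code point sequence, so a consumer that only supports one-to-one folding gains a mapping it did not have under 15.0.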

data/categories.txt

Lines changed: 12 additions & 10 deletions
@@ -1,6 +1,6 @@
-# DerivedGeneralCategory-15.0.0.txt
-# Date: 2022-04-26, 23:14:35 GMT
-# © 2022 Unicode®, Inc.
+# DerivedGeneralCategory-15.1.0.txt
+# Date: 2023-07-28, 23:34:02 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
@@ -284,13 +284,12 @@
 2E9A ; Cn # <reserved-2E9A>
 2EF4..2EFF ; Cn # [12] <reserved-2EF4>..<reserved-2EFF>
 2FD6..2FEF ; Cn # [26] <reserved-2FD6>..<reserved-2FEF>
-2FFC..2FFF ; Cn # [4] <reserved-2FFC>..<reserved-2FFF>
 3040 ; Cn # <reserved-3040>
 3097..3098 ; Cn # [2] <reserved-3097>..<reserved-3098>
 3100..3104 ; Cn # [5] <reserved-3100>..<reserved-3104>
 3130 ; Cn # <reserved-3130>
 318F ; Cn # <reserved-318F>
-31E4..31EF ; Cn # [12] <reserved-31E4>..<reserved-31EF>
+31E4..31EE ; Cn # [11] <reserved-31E4>..<reserved-31EE>
 321F ; Cn # <reserved-321F>
 A48D..A48F ; Cn # [3] <reserved-A48D>..<reserved-A48F>
 A4C7..A4CF ; Cn # [9] <reserved-A4C7>..<reserved-A4CF>
@@ -713,7 +712,8 @@ FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
 2B73A..2B73F ; Cn # [6] <reserved-2B73A>..<reserved-2B73F>
 2B81E..2B81F ; Cn # [2] <reserved-2B81E>..<reserved-2B81F>
 2CEA2..2CEAF ; Cn # [14] <reserved-2CEA2>..<reserved-2CEAF>
-2EBE1..2F7FF ; Cn # [3103] <reserved-2EBE1>..<reserved-2F7FF>
+2EBE1..2EBEF ; Cn # [15] <reserved-2EBE1>..<reserved-2EBEF>
+2EE5E..2F7FF ; Cn # [2466] <reserved-2EE5E>..<reserved-2F7FF>
 2FA1E..2FFFF ; Cn # [1506] <reserved-2FA1E>..<noncharacter-2FFFF>
 3134B..3134F ; Cn # [5] <reserved-3134B>..<reserved-3134F>
 323B0..E0000 ; Cn # [711761] <reserved-323B0>..<reserved-E0000>
@@ -723,7 +723,7 @@ E01F0..EFFFF ; Cn # [65040] <reserved-E01F0>..<noncharacter-EFFFF>
 FFFFE..FFFFF ; Cn # [2] <noncharacter-FFFFE>..<noncharacter-FFFFF>
 10FFFE..10FFFF; Cn # [2] <noncharacter-10FFFE>..<noncharacter-10FFFF>
 
-# Total code points: 825345
+# Total code points: 824718
 
 # ================================================
 
@@ -2649,11 +2649,12 @@ FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I
 2B740..2B81D ; Lo # [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1 ; Lo # [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0 ; Lo # [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; Lo # [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; Lo # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; Lo # [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF ; Lo # [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 131612
+# Total code points: 132234
 
 # ================================================
 
@@ -4092,7 +4093,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
 2E80..2E99 ; So # [26] CJK RADICAL REPEAT..CJK RADICAL RAP
 2E9B..2EF3 ; So # [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE
 2F00..2FD5 ; So # [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE
-2FF0..2FFB ; So # [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
+2FF0..2FFF ; So # [16] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION
 3004 ; So # JAPANESE INDUSTRIAL STANDARD SYMBOL
 3012..3013 ; So # [2] POSTAL MARK..GETA MARK
 3020 ; So # POSTAL MARK FACE
@@ -4101,6 +4102,7 @@ FFE3 ; Sk # FULLWIDTH MACRON
 3190..3191 ; So # [2] IDEOGRAPHIC ANNOTATION LINKING MARK..IDEOGRAPHIC ANNOTATION REVERSE MARK
 3196..319F ; So # [10] IDEOGRAPHIC ANNOTATION TOP MARK..IDEOGRAPHIC ANNOTATION MAN MARK
 31C0..31E3 ; So # [36] CJK STROKE T..CJK STROKE Q
+31EF ; So # IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION
 3200..321E ; So # [31] PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED KOREAN CHARACTER O HU
 322A..3247 ; So # [30] PARENTHESIZED IDEOGRAPH MOON..CIRCLED IDEOGRAPH KOTO
 3250 ; So # PARTNERSHIP SIGN
@@ -4191,7 +4193,7 @@ FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
 1FB00..1FB92 ; So # [147] BLOCK SEXTANT-1..UPPER HALF INVERSE MEDIUM SHADE AND LOWER HALF BLOCK
 1FB94..1FBCA ; So # [55] LEFT HALF INVERSE MEDIUM SHADE AND RIGHT HALF BLOCK..WHITE UP-POINTING CHEVRON
 
-# Total code points: 6634
+# Total code points: 6639
 
 # ================================================
 
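
The bracketed counts and the per-category totals changed in this file can be cross-checked with simple arithmetic: each `[n]` should equal `last - first + 1`, and the drop in the `Cn` total should equal the combined growth of `Lo` and `So`. The sketch below is a hedged illustration of that check in Python. The helper name `check_range_counts` and the regular expression are assumptions made for this example and do not come from the repository.

```python
import re

# Range lines copied from the categories.txt hunks in this commit.
SAMPLE = """\
2EBE1..2EBEF ; Cn # [15] <reserved-2EBE1>..<reserved-2EBEF>
2EE5E..2F7FF ; Cn # [2466] <reserved-2EE5E>..<reserved-2F7FF>
2EBF0..2EE5D ; Lo # [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
2FF0..2FFF ; So # [16] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION
31E4..31EE ; Cn # [11] <reserved-31E4>..<reserved-31EE>
"""

# first..last ; category # [count]
RANGE_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+)\s*;\s*(\w+)\s*#\s*\[(\d+)\]")


def check_range_counts(text):
    """Return (first, last, category, stated, actual) for every ranged line."""
    rows = []
    for line in text.splitlines():
        match = RANGE_LINE.match(line)
        if match is None:
            continue  # single code point entries carry no bracketed count
        first, last = int(match.group(1), 16), int(match.group(2), 16)
        stated = int(match.group(4))
        rows.append((first, last, match.group(3), stated, last - first + 1))
    return rows


rows = check_range_counts(SAMPLE)
mismatches = [row for row in rows if row[3] != row[4]]

# Totals taken from the "Total code points" lines changed in this file.
cn_delta = 825345 - 824718  # code points leaving Cn (unassigned)
lo_delta = 132234 - 131612  # code points added to Lo
so_delta = 6639 - 6634      # code points added to So
totals_balance = cn_delta == lo_delta + so_delta
```

For the sampled lines, `mismatches` is empty and `totals_balance` is true, which is consistent with the code points that left `Cn` being assigned to `Lo` (CJK Unified Ideographs Extension I) and `So` (the new ideographic description characters).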

data/combining_class.txt

Lines changed: 7 additions & 5 deletions
@@ -1,6 +1,6 @@
-# DerivedCombiningClass-15.0.0.txt
-# Date: 2022-04-26, 23:14:29 GMT
-# © 2022 Unicode®, Inc.
+# DerivedCombiningClass-15.1.0.txt
+# Date: 2023-07-28, 23:33:58 GMT
+# © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
@@ -988,7 +988,7 @@
 2E80..2E99 ; 0 # So [26] CJK RADICAL REPEAT..CJK RADICAL RAP
 2E9B..2EF3 ; 0 # So [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE
 2F00..2FD5 ; 0 # So [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE
-2FF0..2FFB ; 0 # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
+2FF0..2FFF ; 0 # So [16] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION
 3000 ; 0 # Zs IDEOGRAPHIC SPACE
 3001..3003 ; 0 # Po [3] IDEOGRAPHIC COMMA..DITTO MARK
 3004 ; 0 # So JAPANESE INDUSTRIAL STANDARD SYMBOL
@@ -1043,6 +1043,7 @@
 3196..319F ; 0 # So [10] IDEOGRAPHIC ANNOTATION TOP MARK..IDEOGRAPHIC ANNOTATION MAN MARK
 31A0..31BF ; 0 # Lo [32] BOPOMOFO LETTER BU..BOPOMOFO LETTER AH
 31C0..31E3 ; 0 # So [36] CJK STROKE T..CJK STROKE Q
+31EF ; 0 # So IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION
 31F0..31FF ; 0 # Lo [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
 3200..321E ; 0 # So [31] PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED KOREAN CHARACTER O HU
 3220..3229 ; 0 # No [10] PARENTHESIZED IDEOGRAPH ONE..PARENTHESIZED IDEOGRAPH TEN
@@ -1994,6 +1995,7 @@ FFFC..FFFD ; 0 # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER
 2B740..2B81D ; 0 # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
 2B820..2CEA1 ; 0 # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
 2CEB0..2EBE0 ; 0 # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; 0 # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; 0 # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; 0 # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF ; 0 # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
@@ -2003,7 +2005,7 @@ E0100..E01EF ; 0 # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 F0000..FFFFD ; 0 # Co [65534] <private-use-F0000>..<private-use-FFFFD>
 100000..10FFFD; 0 # Co [65534] <private-use-100000>..<private-use-10FFFD>
 
-# The above property value applies to 827393 code points not listed here.
+# The above property value applies to 826766 code points not listed here.
 # Total code points: 1113190
 
 # ================================================
