Unicode 18.0.0 Beta

Unicode® 18.0.0 (DRAFT)

2026 September NN (Announcement)

STATUS: This is a preliminary draft page for an upcoming release. Some details may be missing or incorrect, and some links may be wrong or broken. During the beta review period, feedback about errors on this page will be helpful and appreciated.

This page summarizes the important changes for the Unicode Standard, Version 18.0.0. This version supersedes all previous versions of the Unicode Standard.

A. Summary

B. Technical Overview

Core Specification

Code Charts

Han Radical-Stroke Indices

Unicode Standard Annexes

Unicode Character Database

Version References

Errata

C. Stability Policy Update

D. Textual Changes and Character Additions

E. Conformance Changes

F. Changes in the Unicode Character Database

G. Changes in the Unicode Standard Annexes

H. Changes in Synchronized Unicode Technical Standards

I. List of Components

M. Implications for Migration

A. Summary

Unicode 18.0 adds 13,047 characters, for a total of 172,848 characters. The new additions include 4 new scripts:

Chisoi
Proto-Cuneiform (numerals)
Jurchen
Seal

New Data Files for Unicode 18.0

JurchenSources.txt
SealSources.txt

Synchronization

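Several other important Unicode specifications have been updated for Version 18.0. The following five Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 18.0:

UTS #10, Unicode Collation Algorithm
UTS #39, Unicode Security Mechanisms
UTS #46, Unicode IDNA Compatibility Processing
UTS #51, Unicode Emoji
UTS #58, Unicode Link Detection and Formatting: URLs and Email Addresses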

Some of the changes in Version 18.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of those specifications.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

See the following resource links for general information about Unicode versions and other information about the Unicode Standard and other publications of the Unicode Consortium.

B. Technical Overview

Version 18.0 of the Unicode Standard consists of:

The core specification
The code charts (delta and archival) for this version
The Unicode Standard Annexes
The Unicode Character Database (UCD)

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Core Specification

The core specification for Version 18.0 is available for browsing online as per-chapter web pages. Because the full table of contents for the core specification is provided, with interactive links, no separate bookmarks page is provided for this release, nor are separate chapter links provided directly in this summary page for the Unicode Standard. Anchors for chapters, sections, tables, and figures in the core specification are shown with the convention of a "#" in the left margin of the heading or caption. Those anchors can be clicked on to provide custom bookmarks to any particular portion of the text, down to the level of subsections.

The HTML version of the core specification is authoritative. However, for convenience of reference, an archival version of the core specification is also available as a single PDF (13 MB).

Code Charts

Several sets of code charts are available. They serve different purposes:

Chart Type	Description
Code Charts	Block-by-block code charts for Version 18.0.0. The charts are organized by scripts and blocks for easy reference. An interactive index by character name is also provided.
Delta Code Charts	These charts show the new blocks and any blocks in which characters were added specifically for Unicode 18.0.0. The new characters and any major updates to the representative glyphs are visually highlighted in these charts.
Consolidated Code Charts	These charts are distributed as a single PDF file (167 MB) containing the entire set of characters, names, and representative glyphs at the time of publication of Unicode 18.0.0.
Auxiliary Code Charts	The auxiliary charts display information about collation and casing for the repertoire of the scripts for this release.

The block-by-block, delta, and consolidated code charts are a stable part of this release of the Unicode Standard. They will never be updated. The auxiliary code charts are provided for information, and have no stability guarantees.

Han Radical-Stroke Indices

There are a number of radical-stroke indices available to assist in the lookup of Han ideographs in the code charts.

Index Type	Description
Interactive	An interactive CJK character lookup page that supports lookup either by code point or by radical and stroke values.
IICore (4.6 MB)	A static radical-stroke index PDF file limited to only the IICore repertoire. (This RS index is seldom updated.)
Unihan Core 2020 (8.6 MB)	A static radical-stroke index PDF file limited to only the Unihan Core 2020 repertoire. (This RS index is seldom updated.)
Complete (35 MB)	A static radical-stroke index PDF file that covers the entire CJK ideograph repertoire for Unicode 18.0.
Complete	A static data file that corresponds to the complete radical-stroke index for Unicode 18.0.

The complete radical-stroke index is a stable part of this release of the Unicode Standard. It will never be updated.

Unicode Standard Annexes

STATUS: During the alpha review and beta review periods, links to individual UAXes (or UTSes) point to the proposed update for that document, if any. If no proposed update has been posted for the document, links point to the last published version of the document, for reference.

Links to the individual Unicode Standard Annexes for this version are available in Section I, List of Components below. The summary list of significant changes in the content of each Unicode Standard Annex for Version 18.0 can be found in Section G, Changes in the Unicode Standard Annexes below.

Unicode Character Database

STATUS: During the beta review period, the draft of UCD data includes data for the complete, planned character repertoire of Unicode 18.0, including all data changes approved by UTC for version 18.0.

Data files for Version 18.0 of the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap to the functions of the various subdirectories. Detailed documentation about the data files can be found in UAX #44, Unicode Character Database.
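
As an illustration of how such data files are typically consumed, here is a minimal sketch of a reader for the common UCD line layout: semicolon-separated fields, an optional `..` range in the first field, and `#` comments. The function name `parse_ucd_ranges` and the sample text are assumptions made for this example; the sample lines are built from the New Blocks table on this page plus one made-up single-code-point line, and are not copied from an actual data file. UAX #44 remains the reference for the exact format of each file.

```python
def parse_ucd_ranges(text):
    """Parse UCD-style lines into (start, end, [other fields]) tuples.

    Assumes the common layout: 'XXXX..YYYY ; field ; ... # comment',
    where the first field is a code point or a code point range in hex.
    """
    entries = []
    for raw in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        entries.append((start, end, fields[1:]))
    return entries


# Illustrative sample only (hypothetical content, not a real data file).
SAMPLE = """\
# Sample in the style of Blocks.txt
18E00..1919F; Jurchen
191A0..191DF; Jurchen Radicals

0041 ; Example_Value # made-up single code point entry
"""

for start, end, rest in parse_ucd_ranges(SAMPLE):
    print(f"{start:04X}..{end:04X} {rest}")
```

Files with a more specialized layout (for example UnicodeData.txt, which encodes ranges through paired First/Last name entries) need additional handling that this sketch does not attempt.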

Version References

Version 18.0.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 18.0.0, (South San Francisco: The Unicode Consortium, 2026. ISBN 978-1-936213-NN-N)
https://www.unicode.org/versions/Unicode18.0.0/

The terms “Version 18.0” or “Unicode 18.0” are abbreviations for the full version reference, Version 18.0.0.

The citation and permalink for the latest published version of the Unicode Standard is:

The Unicode Consortium. The Unicode Standard.
https://www.unicode.org/versions/latest/

A complete specification of the contributory files for Unicode 18.0 is found below in Section I, List of Components. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.

Errata

Errata incorporated into Unicode 18.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 18.0, see the list of current Updates and Errata.

C. Stability Policy Update

A new property stability policy has been added for the ID_Compat_Math_Start and ID_Compat_Math_Continue properties. See the Character Encoding Stability Policies.

D. Textual Changes and Character Additions

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

13,047 characters have been added. Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see the delta code charts.

New Blocks

The following blocks are newly defined in Version 18.0:

Range	Block Name
11DF0..11DFF	Bengali Supplement
12550..1268F	Archaic Cuneiform Numerals
16D80..16DAF	Chisoi
18E00..1919F	Jurchen
191A0..191DF	Jurchen Radicals
1D250..1D28F	Musical Symbols Supplement
1DB00..1DBFF	Miscellaneous Symbols and Arrows Extended
3D000..3FC3F	Seal
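
For implementations that need to recognize the new blocks before their libraries are updated, the table above can be turned into a small lookup. The following is a sketch only: the names `NEW_BLOCKS` and `new_block_of` are hypothetical, and the ranges are transcribed from this draft page, so they should be rechecked against Blocks.txt in the final release.

```python
from bisect import bisect_right

# Ranges transcribed from the New Blocks table of this draft, sorted by start.
NEW_BLOCKS = [
    (0x11DF0, 0x11DFF, "Bengali Supplement"),
    (0x12550, 0x1268F, "Archaic Cuneiform Numerals"),
    (0x16D80, 0x16DAF, "Chisoi"),
    (0x18E00, 0x1919F, "Jurchen"),
    (0x191A0, 0x191DF, "Jurchen Radicals"),
    (0x1D250, 0x1D28F, "Musical Symbols Supplement"),
    (0x1DB00, 0x1DBFF, "Miscellaneous Symbols and Arrows Extended"),
    (0x3D000, 0x3FC3F, "Seal"),
]
_STARTS = [start for start, _, _ in NEW_BLOCKS]


def new_block_of(cp):
    """Return the name of the new 18.0 block containing cp, or None."""
    i = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0:
        start, end, name = NEW_BLOCKS[i]
        if start <= cp <= end:
            return name
    return None


for start, end, name in NEW_BLOCKS:
    # Block size in code points; not every code point in a block is assigned.
    print(f"{start:05X}..{end:05X} {name}: {end - start + 1} code points")
```

Note that a block range reserves code points; it does not mean that every code point in the range has an assigned character.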

E. Conformance Changes

The conformance section of the standard has been updated with new definitions and requirements regarding the use of variation selectors and variation sequences.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 18.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.

G. Changes in the Unicode Standard Annexes

In Version 18.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex	Changes
UAX #9 Unicode Bidirectional Algorithm	No significant changes in this version.
UAX #11 East Asian Width	The unassigned code point ranges in Section 6.1 were adjusted.
UAX #14 Unicode Line Breaking Algorithm	Rule LB12a was changed to disallow a break between BA and GL. The Line_Break assignments of FIGURE DASH and EN DASH were changed from HH to BA, and that of SOFT HYPHEN from BA to HH, for better line-breaking behavior for those characters.
UAX #15 Unicode Normalization Forms	No significant changes in this version.
UAX #24 Unicode Script Property	No significant changes in this version.
UAX #29 Unicode Text Segmentation	Rule GB9c (“Do not break within certain combinations with Indic_Conjunct_Break (InCB)=Linker.”) has been revised.
UAX #31 Unicode Identifiers and Syntax	The new scripts in Unicode 18.0 were added to Table 4, Excluded Scripts. A note was added indicating that the Property and Algorithms Group (PAG) is the primary point of contact for script reclassification, and pointing to the guidelines approved by the UTC.
UAX #34 Unicode Named Character Sequences	No significant changes in this version.
UAX #38 Unicode Han Database (Unihan)	The provisional properties kIRGDaeJaweon and kIRGKangXi were removed. The descriptions of kIRG_KSource and kIRG_UKSource have been updated. The syntax and description of kIRG_GSource have been updated.
UAX #41 Common References for Unicode Standard Annexes	All references were updated for Unicode 18.0.
UAX #42 Unicode Character Database in XML	New code point attributes, values, and patterns were added for Unicode 18.0.
UAX #44 Unicode Character Database	A new section was added to document UAX #60 and the data files for the large East Asian scripts it covers (Seal, Jurchen, Nushu, Tangut). Additions were made to Table 5 for the new data files JurchenSources.txt and SealSources.txt. The section regarding the directory structure for UCD and non-UCD files distributed under /Public/draft/ and versioned directories was rewritten.
UAX #45 U-Source Ideographs	No significant changes in this version.
UAX #50 Unicode Vertical Text Layout	No significant changes in this version.
UAX #53 Unicode Arabic Mark Rendering	U+10EF4 and U+10EF6 were added to MCM.
UAX #57 Unicode Egyptian Hieroglyph Database (Unikemet)	Many small corrections were made to details about the data and regex values.
UAX #60 Data for Large East Asian Scripts	New in this release.

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

Unicode Technical Standard	Changes
UTS #10 Unicode Collation Algorithm	Jurchen and Seal were added to the table for computing implicit weights. Missing Tibetan contractions were added to DUCET. A discussion of the mapping and behavior for U+FFFE and U+FFFF was added. The Shift-Trimmed option was removed.
UTS #39 Unicode Security Mechanisms	The term “nonspacing mark” has been clarified, and some outdated text has been removed.
UTS #46 Unicode IDNA Compatibility Processing	No significant changes in this version.
UTS #51 Unicode Emoji	The example of display for emoji modifiers on a base character with an unusual default skin tone was updated.
UTS #58 Unicode Link Detection and Formatting: URLs and Email Addresses	No significant changes in this version.
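
To make the UTS #10 note about implicit weights more concrete, the following sketch shows the general shape of that computation as described in previously published versions of UTS #10: a code point without an explicit DUCET entry gets two primary weights, AAAA and BBBB. The function name `implicit_weights` is hypothetical. The only constants used are ones from earlier published versions (FBC0 for unassigned code points; FB00 with block start 17000 for Tangut). The base values assigned to Jurchen and Seal are not stated on this page, so they are deliberately left out rather than guessed.

```python
def implicit_weights(cp, base, block_start=None):
    """Return the (AAAA, BBBB) implicit primary weights for a code point.

    Sketch of the scheme in earlier published versions of UTS #10:
    - general case: AAAA = base + (cp >> 15), BBBB = (cp & 0x7FFF) | 0x8000
    - large scripts with their own base (e.g. Tangut):
      AAAA = base, BBBB = (cp - block_start) | 0x8000
    """
    if block_start is None:
        aaaa = base + (cp >> 15)
        bbbb = (cp & 0x7FFF) | 0x8000
    else:
        aaaa = base
        bbbb = (cp - block_start) | 0x8000
    return aaaa, bbbb


# Tangut, using the base and block start from earlier UTS #10 versions.
print([f"{w:04X}" for w in implicit_weights(0x17000, 0xFB00, 0x17000)])

# A code point treated as unassigned, using the general formula.
print([f"{w:04X}" for w in implicit_weights(0x50000, 0xFBC0)])
```

An implementation updating to Version 18.0 would take the Jurchen and Seal entries from the updated table in UTS #10 itself.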

I. List of Components

This section lists the components of Version 18.0.0 of the Unicode Standard. The version numbering and the role of each component are explained in Versions of The Unicode Standard.

Core Specification
Authoritative HTML
Archival PDF: UnicodeStandard-18.0.pdf (size: 13 MB)
Code Charts and Radical-Stroke Index
Code Charts (size: 134 MB)
Radical-Stroke Index (size: 48 MB)
Radical-Stroke Index data
Unicode Standard Annexes
UAX #9: Unicode Bidirectional Algorithm
UAX #11: East Asian Width
UAX #14: Unicode Line Breaking Algorithm
UAX #15: Unicode Normalization Forms
UAX #24: Unicode Script Property
UAX #29: Unicode Text Segmentation
UAX #31: Unicode Identifiers and Syntax
UAX #34: Unicode Named Character Sequences
UAX #38: Unicode Han Database (Unihan)
UAX #41: Common References for Unicode Standard Annexes
UAX #42: Unicode Character Database in XML
UAX #44: Unicode Character Database
UAX #45: U-Source Ideographs
UAX #50: Unicode Vertical Text Layout
UAX #53: Unicode Arabic Mark Rendering
UAX #57: Unicode Egyptian Hieroglyph Database (Unikemet)
UAX #60: Data for Large East Asian Scripts
Unicode Character Database
https://www.unicode.org/Public/18.0.0/
Documentation
Index.txt
NamesList.html
ReadMe.txt
Core Data
ArabicShaping.txt
BidiBrackets.txt
BidiMirroring.txt
Blocks.txt
CJKRadicals.txt
CompositionExclusions.txt
DoNotEmit.txt
EastAsianWidth.txt
EmojiSources.txt
EquivalentUnifiedIdeograph.txt
HangulSyllableType.txt
IndicPositionalCategory.txt
IndicSyllabicCategory.txt
Jamo.txt
LineBreak.txt
NameAliases.txt
NamedSequences.txt
NamedSequencesProv.txt
NamesList.txt
NormalizationCorrections.txt
PropertyAliases.txt
PropertyValueAliases.txt
PropList.txt
Scripts.txt
ScriptExtensions.txt
SpecialCasing.txt
StandardizedVariants.txt
UnicodeData.txt
VerticalOrientation.txt
Data for UAX #38: Unihan Database (Unihan.zip)
Unihan_DictionaryIndices.txt
Unihan_DictionaryLikeData.txt
Unihan_IRGSources.txt
Unihan_NumericValues.txt
Unihan_OtherMappings.txt
Unihan_RadicalStrokeCounts.txt
Unihan_Readings.txt
Unihan_Variants.txt
Data for UAX #45
USourceData.txt
USourceGlyphs.pdf
USourceRSChart.pdf
Data for UAX #57
Unikemet.txt
Data for UAX #60
JurchenSources.txt
NushuSources.txt
SealSources.txt
TangutSources.txt
Derived Data
CaseFolding.txt
DerivedAge.txt
DerivedCoreProperties.txt
DerivedNormalizationProps.txt
Extracted Data
DerivedBidiClass.txt
DerivedBinaryProperties.txt
DerivedCombiningClass.txt
DerivedDecompositionType.txt
DerivedEastAsianWidth.txt
DerivedGeneralCategory.txt
DerivedJoiningGroup.txt
DerivedJoiningType.txt
DerivedLineBreak.txt
DerivedName.txt
DerivedNumericType.txt
DerivedNumericValues.txt
Conformance Test Data
BidiCharacterTest.txt
BidiTest.txt
NormalizationTest.txt
Auxiliary Data for UAX #14 and UAX #29
GraphemeBreakProperty.txt
GraphemeBreakTest.txt
LineBreakTest.txt
SentenceBreakProperty.txt
SentenceBreakTest.txt
WordBreakProperty.txt
WordBreakTest.txt
Documentation for Auxiliary Data
GraphemeBreakTest.html
LineBreakTest.html
SentenceBreakTest.html
WordBreakTest.html
Emoji Data
emoji-data.txt
emoji-variation-sequences.txt

M. Implications for Migration

STATUS: During the beta review period, the following section is incomplete. For issues which may impact migration, see the detailed notes presented under Notable Issues for Beta Reviewers on the 18.0 beta review page.

There are a significant number of changes in Unicode 18.0 which may impact implementations upgrading to Version 18.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.

Core Specification Changes

New content has been added for Unicode 18.0, and many other improvements have been made to the text.

Script-related Changes

There are four new scripts encoded in Unicode 18.0. Two of these scripts, Jurchen and Seal, are large ideographic scripts. For Proto-Cuneiform, only the archaic digits are added in Unicode 18.0; further signs are anticipated for a future version.

General Character Property Issues

Security and Identifier-related Issues (See UAX #31 and UTS #39.)

TBD

Segmentation (See UAX #14.)

TBD

Numeric Property Issues

There is one new set of decimal digits added in Unicode 18.0, for the newly encoded Chisoi script. Implementations of numeric values and numeric formatting should take this new set into account.
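
As a rough illustration of what taking a new digit set into account can involve, the sketch below converts a string of decimal digits from any script to an integer using Python's standard `unicodedata` module. The helper name `parse_decimal` is hypothetical. The key assumption is that `unicodedata` reflects whatever Unicode version the Python runtime bundles (reported by `unicodedata.unidata_version`), so digits added in Unicode 18.0 would only be recognized once the runtime ships 18.0 data; until then this function rejects them.

```python
import unicodedata


def parse_decimal(text):
    """Convert a string of decimal digits (any script) to an int.

    Raises ValueError for an empty string or for any character that the
    runtime's Unicode data does not classify as a decimal digit.
    """
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        digit = unicodedata.decimal(ch, None)  # None if not a decimal digit
        if digit is None:
            raise ValueError(
                f"U+{ord(ch):04X} is not a decimal digit in Unicode "
                f"{unicodedata.unidata_version}"
            )
        value = value * 10 + digit
    return value


print("Unicode data version in this runtime:", unicodedata.unidata_version)
print(parse_decimal("42"))
print(parse_decimal("\u0967\u0968\u0969"))  # Devanagari digits one, two, three
```

This sketch accepts digits from mixed scripts in one string; an implementation concerned with spoofing may want to require that all digits come from the same set, in line with the guidance in UTS #39.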

Archaic Cuneiform Numerals: TBD

CJK/Unihan Changes

See UAX #38, Unicode Han Database (Unihan), for further details on these changes.

Standardized Variation Sequences

Changes to Code Charts

There are a number of other Han glyph updates.
Other glyph updates are listed explicitly in the delta charts index page.
The two code charts for Egyptian hieroglyphs contain extensive functional and phonetic information derived from the data file Unikemet.txt, and have notable further updates for Version 18.0.

Collation-related Changes

TBD

Emoji Changes

For details about emoji changes, see the Unicode 18.0 emoji charts and Emoji Recently Added, v18.0.