Table of Contents for The Unicode Standard, Version 4.0
Chapter/Section Title | Page # | Page Count
Acknowledgments | iii |
Unicode Consortium Members and Directors | ix |
Full Members | ix |
Current Associate Members | ix |
Current Liaison Members | ix |
Current Specialist Members | x |
Current Individual Members | x |
Current Members of the Board of Directors | x |
Former Members of the Board of Directors | x |
Figures | xxiii |
Tables | xxv |
Preface | xxxi |
About the Unicode Standard | xxxi |
Concepts, Architecture, Conformance, and Guidelines | xxxi |
Character Block Descriptions | xxxii |
Charts and Han Radical-Stroke Index | xxxiii |
Appendices | xxxiii |
The Unicode Character Database and Technical Reports | xxxiii |
On the CD-ROM | xxxiv |
Notational Conventions | xxxiv |
Code Points | xxxiv |
Character Names | xxxiv |
Sequences | xxxv |
Miscellaneous | xxxv |
Extended BNF | xxxv |
Operators | xxxvii |
Resources | xxxvii |
Unicode Web Site | xxxvii |
Unicode Anonymous FTP Site | xxxvii |
Unicode E-mail Discussion List | xxxvii |
How to Contact the Unicode Consortium | xxxviii |
Introduction | 1 | 10
Coverage | 2 | 1
Standards Coverage | 3 | 1
New Characters | 3 | 1
Design Goals | 3 | 1
Text Handling | 4 | 1
Interpreting Characters | 5 | 1
Text Elements | 5 | 1
The Unicode Standard and ISO/IEC 10646 | 5 | 1
The Unicode Consortium | 6 | 1
The Unicode Technical Committee | 6 | 1
Submitting New Characters | 7 | 4
General Structure | 11 | 44
Architectural Context | 11 | 3
Basic Text Processes | 12 | 1
Text Elements, Characters, and Text Processes | 12 | 1
Text Processes and Encoding | 13 | 1
Unicode Design Principles | 14 | 9
Universality | 14 | 1
Efficiency | 15 | 1
Characters, Not Glyphs | 15 | 2
Semantics | 17 | 1
Plain Text | 18 | 1
Logical Order | 18 | 1
Unification | 19 | 1
Dynamic Composition | 20 | 1
Equivalent Sequences | 20 | 2
Convertibility | 22 | 1
Compatibility Characters | 23 | 1
Compatibility Characters | 23 | 1
Compatibility Decomposable Characters | 23 | 1
Mapping Compatibility Characters | 23 | 1
Code Points and Characters | 24 | 2
Types of Code Points | 25 | 1
Encoding Forms | 26 | 6
UTF-32 | 29 | 1
UTF-16 | 29 | 1
UTF-8 | 30 | 1
Comparison of the Advantages of UTF-32, UTF-16, and UTF-8 | 31 | 1
Encoding Schemes | 32 | 2
Unicode Strings | 34 | 1
Unicode Allocation | 35 | 7
Planes | 35 | 1
Allocation Areas and Character Blocks | 36 | 1
Details of Allocation | 36 | 6
Assignment of Code Points | 42 | 1
Writing Direction | 42 | 1
Combining Characters | 43 | 4
Sequence of Base Characters and Diacritics | 44 | 1
Multiple Combining Characters | 44 | 2
Ligated Multiple Base Characters | 46 | 1
Spacing Clones of European Diacritical Marks | 46 | 1
"Characters" and Grapheme Clusters | 46 | 1
Special Characters and Noncharacters | 47 | 1
Byte Order Mark (BOM) | 47 | 1
Special Noncharacter Code Points | 48 | 1
Layout and Format Control Characters | 48 | 1
The Replacement Character | 48 | 1
Control Codes | 48 | 1
Conforming to the Unicode Standard | 48 | 2
Supported Subsets | 50 | 1
Related Publications | 50 | 5
Conformance | 55 | 40
Versions of the Unicode Standard | 55 | 3
Stability | 56 | 1
Version Numbering | 56 | 1
Errata, Corrigenda, and Future Updates | 56 | 1
References to the Unicode Standard | 57 | 1
References to Unicode Character Properties | 57 | 1
References to Unicode Algorithms | 57 | 1
Conformance Requirements | 58 | 5
Byte Ordering | 58 | 1
Unassigned Code Points | 58 | 1
Interpretation | 59 | 1
Modification | 60 | 1
Character Encoding Forms | 60 | 1
Character Encoding Schemes | 61 | 1
Bidirectional Text | 61 | 1
Normalization Forms | 62 | 1
Normative References | 62 | 1
Unicode Algorithms | 62 | 1
Default Casing Operations | 62 | 1
Unicode Standard Annexes | 62 | 1
Semantics | 63 | 1
Definitions | 63 | 1
Character Identity and Semantics | 63 | 1
Characters and Encoding | 64 | 2
Properties | 66 | 4
Normative and Informative Properties | 66 | 2
Simple and Derived Properties | 68 | 1
Property Aliases | 68 | 1
Default Property Values | 69 | 1
Private Use | 69 | 1
Combination | 70 | 1
Decomposition | 71 | 1
Compatibility Decomposition | 71 | 1
Canonical Decomposition | 72 | 1
Surrogates | 72 | 1
Unicode Encoding Forms | 73 | 5
UTF-32 | 76 | 1
UTF-16 | 76 | 1
UTF-8 | 77 | 1
Encoding Form Conversion | 78 | 1
Unicode Encoding Schemes | 78 | 4
Canonical Ordering Behavior | 82 | 3
Application of Combining Marks | 83 | 1
Combining Classes | 83 | 1
Canonical Ordering | 84 | 1
Canonical Ordering and Collation | 85 | 1
Conjoining Jamo Behavior | 85 | 4
Hangul Syllable Boundaries | 86 | 1
Standard Korean Syllables | 86 | 1
Hangul Syllable Composition | 87 | 1
Hangul Syllable Decomposition | 88 | 1
Hangul Syllable Names | 88 | 1
Default Case Operations | 89 | 6
Definitions | 89 | 1
Case Conversion of Strings | 90 | 1
Case Detection for Strings | 90 | 1
Caseless Matching | 91 | 4
Character Properties | 95 | 12
Unicode Character Database | 96 | 1
Case: Normative | 96 | 1
Case Mapping | 97 | 1
Combining Classes: Normative | 97 | 1
Directionality: Normative | 98 | 1
General Category: Normative | 98 | 2
Numeric Value: Normative | 100 | 1
Ideographic Numeric Values | 100 | 1
Bidi Mirrored: Normative | 101 | 1
Unicode 1.0 Names | 101 | 1
Letters, Alphabetic, and Ideographic | 102 | 1
Boundary Control | 102 | 1
Characters with Unusual Properties | 103 | 4
Implementation Guidelines | 107 | 40
Transcoding to Other Standards | 107 | 2
Issues | 107 | 1
Multistage Tables | 108 | 1
ANSI/ISO C wchar_t | 109 | 1
Unknown and Missing Characters | 110 | 1
Reserved and Private-Use Character Codes | 110 | 1
Interpretable but Unrenderable Characters | 110 | 1
Default Property Values | 110 | 1
Default Ignorable Code Points | 111 | 1
Interacting with Downlevel Systems | 111 | 1
Handling Surrogate Pairs in UTF-16 | 111 | 3
Handling Numbers | 114 | 1
Normalization | 114 | 1
Compression | 115 | 1
Newline Guidelines | 116 | 3
Definitions | 116 | 1
Background | 117 | 1
Recommendations | 118 | 1
Regular Expressions | 119 | 1
Language Information in Plain Text | 119 | 2
Requirements for Language Tagging | 119 | 1
Language Tags and Han Unification | 120 | 1
Editing and Selection | 121 | 1
Consistent Text Elements | 121 | 1
Strategies for Handling Nonspacing Marks | 122 | 3
Keyboard Input | 123 | 1
Truncation | 124 | 1
Rendering Nonspacing Marks | 125 | 5
Canonical Equivalence | 127 | 1
Positioning Methods | 128 | 2
Locating Text Element Boundaries | 130 | 1
Identifiers | 130 | 2
Property-Based Identifier Syntax | 130 | 1
Syntactic Rule | 131 | 1
Alternative Recommendation | 132 | 1
Sorting and Searching | 132 | 3
Culturally Expected Sorting and Searching | 133 | 1
Language-Insensitive Sorting | 133 | 1
Searching | 133 | 1
Sublinear Searching | 134 | 1
Binary Order | 135 | 1
UTF-8 in UTF-16 Order | 135 | 1
UTF-16 in UTF-8 Order | 136 | 1
Case Mappings | 136 | 4
Complications for Case Mapping | 137 | 1
Reversibility | 138 | 1
Caseless Matching | 138 | 1
Normalization | 139 | 1
Unicode Security | 140 | 2
Default Ignorable Code Points | 142 | 5
Writing Systems and Punctuation | 147 | 18
Writing Systems | 148 | 4
General Punctuation | 152 | 13
Punctuation: U+0020–U+00BF | 152 | 2
General Punctuation: U+2000–U+206F | 154 | 6
CJK Symbols and Punctuation: U+3000–U+303F | 160 | 1
CJK Compatibility Forms: U+FE30–U+FE4F | 161 | 1
Small Form Variants: U+FE50–U+FE6F | 162 | 3
European Alphabetic Scripts | 165 | 26
Latin | 166 | 8
Letters of Basic Latin: U+0041–U+007A | 166 | 1
Letters of the Latin-1 Supplement: U+00C0–U+00FF | 167 | 1
Latin Extended-A: U+0100–U+017F | 167 | 2
Latin Extended-B: U+0180–U+024F | 169 | 1
IPA Extensions: U+0250–U+02AF | 170 | 1
Phonetic Extensions: U+1D00–U+1D6A | 171 | 1
Latin Extended Additional: U+1E00–U+1EFF | 172 | 1
Latin Ligatures: U+FB00–U+FB06 | 173 | 1
Greek | 174 | 5
Greek: U+0370–U+03FF | 174 | 3
Greek Extended: U+1F00–U+1FFF | 177 | 2
Cyrillic | 179 | 1
Cyrillic: U+0400–U+04FF | 179 | 1
Cyrillic Supplement: U+0500–U+052F | 179 | 1
Armenian | 180 | 2
Armenian: U+0530–U+058F | 180 | 2
Georgian | 182 | 2
Georgian: U+10A0–U+10FF | 182 | 2
Modifier Letters | 184 | 2
Spacing Modifier Letters: U+02B0–U+02FF | 184 | 2
Combining Marks | 186 | 5
Combining Diacritical Marks: U+0300–U+036F | 186 | 2
Combining Marks for Symbols: U+20D0–U+20FF | 188 | 1
Combining Half Marks: U+FE20–U+FE2F | 188 | 3
Middle Eastern Scripts | 191 | 26
Hebrew | 192 | 3
Hebrew: U+0590–U+05FF | 192 | 2
Alphabetic Presentation Forms: U+FB1D–U+FB4F | 194 | 1
Arabic | 195 | 11
Arabic: U+0600–U+06FF | 195 | 4
Cursive Joining | 199 | 2
Ligatures | 201 | 3
Arabic Presentation Forms-A: U+FB50–U+FDFF | 204 | 1
Arabic Presentation Forms-B: U+FE70–U+FEFF | 205 | 1
Syriac | 206 | 7
Syriac: U+0700–U+074F | 206 | 4
Syriac Shaping | 210 | 1
Syriac Cursive Joining | 210 | 2
Ligatures | 212 | 1
Thaana | 213 | 4
Thaana: U+0780–U+07BF | 213 | 4
South Asian Scripts | 217 | 48
Devanagari | 219 | 13
Devanagari: U+0900–U+097F | 219 | 13
Bengali | 232 | 2
Bengali: U+0980–U+09FF | 232 | 2
Gurmukhi | 234 | 2
Gurmukhi: U+0A00–U+0A7F | 234 | 2
Gujarati | 236 | 1
Gujarati: U+0A80–U+0AFF | 236 | 1
Oriya | 237 | 2
Oriya: U+0B00–U+0B7F | 237 | 2
Tamil | 239 | 5
Tamil: U+0B80–U+0BFF | 239 | 5
Telugu | 244 | 1
Telugu: U+0C00–U+0C7F | 244 | 1
Kannada | 245 | 3
Kannada: U+0C80–U+0CFF | 245 | 3
Malayalam | 248 | 2
Malayalam: U+0D00–U+0D7F | 248 | 2
Sinhala | 250 | 1
Sinhala: U+0D80–U+0DFF | 250 | 1
Tibetan | 251 | 9
Tibetan: U+0F00–U+0FFF | 251 | 9
Limbu | 260 | 5
Limbu: U+1900–U+194F | 260 | 5
Southeast Asian Scripts | 265 | 26
Thai | 266 | 3
Thai: U+0E00–U+0E7F | 266 | 3
Lao | 269 | 2
Lao: U+0E80–U+0EFF | 269 | 2
Myanmar | 271 | 3
Myanmar: U+1000–U+109F | 271 | 3
Khmer | 274 | 10
Khmer: U+1780–U+17FF | 274 | 9
Khmer Symbols: U+19E0–U+19FF | 283 | 1
Tai Le | 284 | 2
Tai Le: U+1950–U+197F | 284 | 2
Philippine Scripts | 286 | 5
Tagalog: U+1700–U+171F | 286 | 1
Hanunoo: U+1720–U+173F | 286 | 1
Buhid: U+1740–U+175F | 286 | 1
Tagbanwa: U+1760–U+177F | 286 | 5
East Asian Scripts | 291 | 30
Han | 293 | 17
CJK Unified Ideographs | 293 | 11
CJK Unified Ideographs Ext. B: U+20000–U+2A6D6 | 304 | 1
CJK Compatibility Ideographs: U+F900–U+FAFF | 305 | 1
CJK Compatibility Supplement: U+2F800–U+2FA1D | 305 | 1
Kanbun: U+3190–U+319F | 305 | 1
CJK and KangXi Radicals: U+2E80–U+2FD5 | 306 | 1
Ideographic Description: U+2FF0–U+2FFB | 307 | 3
Bopomofo | 310 | 2
Bopomofo: U+3100–U+312F | 310 | 2
Hiragana and Katakana | 312 | 2
Hiragana: U+3040–U+309F | 312 | 1
Katakana: U+30A0–U+30FF | 312 | 1
Katakana Phonetic Extensions: U+31F0–U+31FF | 313 | 1
Halfwidth and Fullwidth Forms: U+FF00–U+FFEF | 313 | 1
Hangul | 314 | 3
Hangul Jamo: U+1100–U+11FF | 314 | 1
Hangul Compatibility Jamo: U+3130–U+318F | 314 | 1
Hangul Syllables: U+AC00–U+D7A3 | 315 | 2
Yi | 317 | 4
Yi: U+A000–U+A4CF | 317 | 4
Additional Modern Scripts | 321 | 16
Ethiopic | 322 | 3
Ethiopic: U+1200–U+137F | 322 | 3
Mongolian | 325 | 4
Mongolian: U+1800–U+18AF | 325 | 4
Osmanya | 329 | 1
Osmanya: U+10480–U+104AF | 329 | 1
Cherokee | 330 | 1
Cherokee: U+13A0–U+13FF | 330 | 1
Canadian Aboriginal Syllabics | 331 | 1
Canadian Aboriginal Syllabics: U+1400–U+167F | 331 | 1
Deseret | 332 | 2
Deseret: U+10400–U+1044F | 332 | 2
Shavian | 334 | 3
Shavian: U+10450–U+1047F | 334 | 3
Archaic Scripts | 337 | 12
Ogham | 338 | 1
Ogham: U+1680–U+169F | 338 | 1
Old Italic | 339 | 2
Old Italic: U+10300–U+1032F | 339 | 2
Runic | 341 | 2
Runic: U+16A0–U+16F0 | 341 | 2
Gothic | 343 | 1
Gothic: U+10330–U+1034F | 343 | 1
Ugaritic | 344 | 1
Ugaritic: U+10380–U+1039F | 344 | 1
Linear B | 345 | 1
Linear B Syllabary: U+10000–U+1007F | 345 | 1
Linear B Ideograms: U+10080–U+100FF | 345 | 1
Aegean Numbers: U+10100–U+1013F | 345 | 1
Cypriot Syllabary | 346 | 3
Cypriot Syllabary: U+10800–U+1083F | 346 | 3
Symbols | 349 | 34
Currency Symbols | 351 | 2
Currency Symbols: U+20A0–U+20CF | 351 | 2
Letterlike Symbols | 353 | 5
Letterlike Symbols: U+2100–U+214F | 353 | 1
Math Alphanumeric Symbols: U+1D400–U+1D7FF | 354 | 1
Mathematical Alphabets | 354 | 2
Fonts Used for Mathematical Alphabets | 356 | 2
Number Forms | 358 | 2
Number Forms: U+2150–U+218F | 358 | 1
Superscripts and Subscripts: U+2070–U+209F | 359 | 1
Mathematical Symbols | 360 | 5
Mathematical Operators: U+2200–U+22FF | 360 | 2
Supplements to Mathematical Symbols and Arrows | 362 | 1
Supplemental Math Operators: U+2A00–U+2AFF | 362 | 1
Miscellaneous Math Symbols-A: U+27C0–U+27EF | 362 | 1
Miscellaneous Math Symbols-B: U+2980–U+29FF | 362 | 1
Arrows: U+2190–U+21FF | 363 | 1
Supplemental Arrows | 363 | 1
Standardized Variants of Mathematical Symbols | 363 | 2
Technical Symbols | 365 | 3
Control Pictures: U+2400–U+243F | 365 | 1
Miscellaneous Technical: U+2300–U+23FF | 365 | 2
Optical Character Recognition: U+2440–U+245F | 367 | 1
Geometrical Symbols | 368 | 2
Box Drawing: U+2500–U+257F | 368 | 1
Block Elements: U+2580–U+259F | 368 | 1
Geometric Shapes: U+25A0–U+25FF | 368 | 2
Miscellaneous Symbols and Dingbats | 370 | 3
Miscellaneous Symbols: U+2600–U+26FF | 370 | 1
Dingbats: U+2700–U+27BF | 371 | 1
Yijing Hexagram Symbols: U+4DC0–U+4DFF | 372 | 1
Tai Xuan Jing Symbols: U+1D300–U+1D356 | 372 | 1
Enclosed and Square | 373 | 1
Enclosed Alphanumerics: U+2460–U+24FF | 373 | 1
Enclosed CJK Letters and Months: U+3200–U+32FF | 373 | 1
CJK Compatibility: U+3300–U+33FF | 373 | 1
Braille | 374 | 2
Braille Patterns: U+2800–U+28FF | 374 | 2
Byzantine Musical Symbols | 376 | 1
Byzantine Musical Symbols: U+1D000–U+1D0FF | 376 | 1
Western Musical Symbols | 377 | 6
Musical Symbols: U+1D100–U+1D1FF | 377 | 6
Special Areas and Format Characters | 383 | 30
Control Codes | 385 | 2
Layout Controls | 387 | 6
Invisible Operators | 393 | 1
Deprecated Format Characters | 394 | 2
Deprecated Format Characters: U+206A–U+206F | 394 | 2
Surrogates Area | 396 | 1
Surrogates Area: U+D800–U+DFFF | 396 | 1
Variation Selectors | 397 | 1
Private-Use Characters | 398 | 2
Private Use Area: U+E000–U+F8FF | 398 | 1
Supplementary Private Use Areas | 399 | 1
Noncharacters | 400 | 1
Noncharacters: U+FFFE, U+FFFF, and Others | 400 | 1
Specials | 401 | 4
Specials: U+FEFF, U+FFF0–U+FFFD | 401 | 4
Tag Characters | 405 | 8
Tag Characters: U+E0000–U+E007F | 405 | 8
Code Charts | 413 | 776
Character Names List | 413 | 4
Images in the Code Charts and Character Lists | 414 | 1
Character Names | 414 | 1
Aliases | 415 | 1
Cross References | 415 | 1
Information About Languages | 415 | 1
Case Mappings | 415 | 1
Decompositions | 416 | 1
Reserved Characters | 417 | 1
Noncharacters | 417 | 1
Subheads | 417 | 1
CJK Unified Ideographs | 417 | 1
Hangul Syllables | 418 | 771
Han Radical-Stroke Index | 1189 | 152
Han Unification History | 1341 | 2
Abstracts of Unicode Technical Reports | 1343 | 4
Unicode Standard Annexes | 1343 | 1
UAX #9: The Bidirectional Algorithm | 1343 | 1
UAX #11: East Asian Width | 1343 | 1
UAX #14: Line Breaking Properties | 1343 | 1
UAX #15: Unicode Normalization Forms | 1344 | 1
UAX #24: Script Names | 1344 | 1
UAX #29: Text Boundaries | 1344 | 1
Unicode Technical Standards | 1344 | 1
UTS #6: A Standard Compression Scheme for Unicode | 1344 | 1
UTS #10: Unicode Collation Algorithm | 1344 | 1
Unicode Technical Reports | 1344 | 1
UTR #16: UTF-EBCDIC | 1344 | 1
UTR #17: Character Encoding Model | 1344 | 1
UTR #18: Unicode Regular Expression Guidelines | 1344 | 1
UTR #20: Unicode in XML and Other Markup Languages | 1345 | 1
UTR #22: Character Mapping Markup Language (CharMapML) | 1345 | 1
UTR #26: Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8) | 1345 | 1
Other Unicode References | 1345 | 2
Unicode Technical Notes | 1345 | 1
FAQ (Frequently Asked Questions) | 1346 | 1
Charts | 1346 | 1
Conferences | 1346 | 1
Policies | 1346 | 1
Updates and Errata | 1346 | 1
Versions | 1346 | 1
Where Is My Character? | 1346 | 1
Relationship to ISO/IEC 10646 | 1347 | 8
History | 1347 | 3
Unicode 1.0 | 1348 | 1
Unicode 2.0 | 1348 | 1
Unicode 3.0 | 1349 | 1
Unicode 4.0 | 1350 | 1
Encoding Forms in ISO/IEC 10646 | 1350 | 1
Zero Extending | 1350 | 1
UCS Transformation Formats | 1351 | 1
UTF-8 | 1351 | 1
UTF-16 | 1351 | 1
Synchronization of the Standards | 1352 | 1
Identification of Features for the Unicode Standard | 1352 | 1
Character Names | 1353 | 1
Character Functional Specifications | 1353 | 2
Changes from Unicode Version 3.0 | 1355 | 8
Versions of the Unicode Standard | 1355 | 2
Changes from Unicode Version 3.0 to Version 3.1 | 1357 | 1
New Characters Added | 1357 | 1
Unicode Character Database Changes | 1357 | 1
Changes Affecting Conformance | 1357 | 1
Unicode Standard Annexes | 1358 | 1
Changes from Unicode Version 3.1 to Version 3.2 | 1358 | 1
New Characters Added | 1358 | 1
Unicode Character Database Changes | 1358 | 1
Changes Affecting Conformance | 1358 | 1
Unicode Standard Annexes | 1359 | 1
Changes from Unicode Version 3.2 to Version 4.0 | 1359 | 4
New Characters Added | 1359 | 1
Unicode Character Database Changes | 1359 | 1
Changes Affecting Conformance | 1360 | 1
Unicode Standard Annexes | 1360 | 1
Errata | 1360 | 3
Glossary | 1363 | 22
References | 1385 | 22
Source Standards and Specifications | 1385 | 6
Source Dictionaries for Han Unification | 1391 | 1
Other Sources for the Unicode Standard | 1391 | 6
Selected Resources: Technical | 1397 | 3
Selected Resources: Scripts and Languages | 1400 | 7
Indices | 1407 | 1
Unicode Names Index | 1407 | 42
General Index | 1449 |