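繁星计划 ("Fanxing Project"): Improving digit grouping for East Asia
The 繁星计划 aims to improve digit grouping for East Asian languages that count in units of ten thousand, such as Chinese, Japanese, and Korean.
Today the usual grouping for Arabic numerals, a comma every three digits, does not match East Asian counting conventions, so large numbers are harder to read.
The project proposes an underscore (_) as a separator every four digits, matching the units 万, 亿, and 兆 (10^4, 10^8, and 10^12).
The underscore is easy to type and clearly distinct from a comma, so it may make numbers quicker and less error-prone to read in East Asia.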
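The four-digit grouping can be sketched in a few lines of Python. This is an illustration only, not part of the proposal: the function names `group_by_myriads` and `group_combined` are hypothetical, and the rule used where a comma and an underscore would fall on the same boundary (every twelve digits) is an assumption, since the original article does not cover that case. Python's built-in `_` format option groups decimal digits in threes, so it does not produce this notation on its own.

```python
def group_by_myriads(n: int, sep: str = "_") -> str:
    """Format an integer with a separator every four digits, counted from the right."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    while digits:
        groups.append(digits[-4:])   # take the lowest four digits
        digits = digits[:-4]         # and drop them from the remainder
    return sign + sep.join(reversed(groups))


def group_combined(n: int) -> str:
    """Mark ten-thousand boundaries with '_' and thousand boundaries with ','.

    Assumption: where both boundaries coincide, only the underscore is written.
    """
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    out = []
    for i, ch in enumerate(digits):
        remaining = len(digits) - i  # digits from this one to the end, inclusive
        if i > 0:
            if remaining % 4 == 0:
                out.append("_")
            elif remaining % 3 == 0:
                out.append(",")
        out.append(ch)
    return sign + "".join(out)


if __name__ == "__main__":
    n = 1234567890
    print(f"{n:,}")             # conventional three-digit commas
    print(group_by_myriads(n))  # four-digit underscores
    print(group_combined(n))    # both scales at once
```

For 1234567890 the two helpers reproduce the forms quoted from the original article, 12_3456_7890 and 1,2_34,56_7,890.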
Opening of the original article (English, first 3 paragraphs only)
Easier digit grouping for East Asia

East Asian languages count in ten-thousands. Our digit grouping should too.

Try reading this number:
1,234,567,890
12_3456_7890

The mismatch

When writing large numbers in Arabic numerals, we put a comma every three digits: 1,000,000,000. In English that number is “one billion,” and the commas land neatly on thousands, millions, and billions. Read the same number in Chinese, Japanese, or Korean, and it’s “ten hundred-millions”: the natural grouping is by ten-thousands, not thousands.

East Asian languages count in multiples of 10,000 (萬). The named units are 萬, 億, 兆, each ten thousand times the last. Three-digit comma grouping cuts across these units at arbitrary points. Readers have to mentally regroup the digits every time they encounter a large number.

Read aloud: 1.23 billion (≈ 12 hundred-millions)

The proposal

Use an underscore (_) as a ten-thousand-place separator.

The underscore passes the boring tests: it is plain ASCII, easy to type, and already tolerated by programmers as a digit separator in languages like Python and Rust. It also does not look like a comma, which is the whole point.

Current: 1,234,567,890
Proposed: 12_3456_7890

Coexisting with thousands

The underscore can mark ten-thousand groups while the comma still marks thousands inside them. You can read both scales at once.

Combined: 1,2_34,56_7,890

Why not four-digit commas?

The obvious alternative is commas every four digits. The comma is already taken, though. Too many people, spreadsheets, price lists, and standards read it as a three-digit separator. Reassigning it would make numbers easier for one group of readers and more error-prone for another.

A new separator is less elegant than reusing the comma, but it is clearer about what has changed.

Other candidates

We considered several other characters before settling on the underscore.
Middle dot (·, U+00B7): Clean on the page and familiar in East Asian typography. Visually preferable, but most keyboards do not make it easy to type.
Thin space (U+2009): Has standards support through ISO 80000-1, but is too fragile in ordinary text. Many readers cannot tell it from a normal space, and many tools do not preserve it reliably.
Apostrophe (', U+0027): Easy to type and used in Switzerland for thousand groups. Microsoft Word, Google Docs, and many mobile keyboards silently replace a straight apostrophe with a curly quotation mark, which is a bad foundation for casual notation.
Regular space (U+0020): The easiest to type, but it splits a number into separate tokens. Search, copy-paste, and data parsing all break, and lines can wrap in the middle of a number.

How to start using it

In writing, you can start now. Just type an underscore.

For software and data pipelines, no standard yet handles this notation automatically; an explicit normalization step is needed: strip underscores before parsing.
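The normalization step could look something like the Python sketch below. The article only says to strip underscores before parsing; the function name `parse_myriad_grouped`, the regular expression, and the decision to reject inputs whose underscores do not fall on four-digit boundaries are all assumptions added for illustration.

```python
import re

# Hypothetical pattern: an optional sign, then either plain ASCII digits or
# a leading group of 1-4 digits followed by underscore-separated groups of
# exactly four digits.
_MYRIAD_GROUPED = re.compile(r"[+-]?(?:[0-9]+|[0-9]{1,4}(?:_[0-9]{4})+)")


def parse_myriad_grouped(text: str) -> int:
    """Strip four-digit underscore separators and parse the result as an int."""
    candidate = text.strip()
    if not _MYRIAD_GROUPED.fullmatch(candidate):
        raise ValueError(f"not a four-digit underscore-grouped integer: {text!r}")
    return int(candidate.replace("_", ""))


if __name__ == "__main__":
    print(parse_myriad_grouped("12_3456_7890"))
    print(parse_myriad_grouped("1234567890"))
```

Python's own `int()` already accepts single underscores between digits, but it does not check where they fall, and most other parsers (CSV readers, JSON, databases) reject them outright. Validating the grouping before stripping is one way to catch a three-digit grouping written with underscores by mistake.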
Note: For copyright reasons, only the first 3 paragraphs are quoted. Please read the original for the full content.