You may come across Chinese names in various contexts, such as Cixin Liu, the author of the science fiction novel The Three-Body Problem, and Ai Weiwei, a contemporary Chinese artist.
However, these names are written in Hanyu Pinyin, a romanization of their Mandarin pronunciation, rather than in their original characters. This can lead to name ambiguity: even native Chinese speakers may find it difficult to recover the original characters behind the Pinyin.
Chinese is written in characters, and personal names are no exception. Surnames are inherited, while parents choose one or two (rarely three) characters for a child’s given name based on factors like meaning, sound, and family tradition.
[Figure: structure of a Chinese name. Surname: single (e.g., Liu) or compound (rare). Given name: one character, two characters (e.g., Ci Xin), or three characters (rare).]
For example, in Cixin Liu, "Liu" (刘) is the surname, and "Cixin" is the chosen given name, made up of the characters "Ci" (慈) and "Xin" (欣). When written in Pinyin, these characters are represented as syllables that together form the name "Cixin."
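The split into surname and given name can be sketched in a few lines of Python. This is only an illustration: the `PINYIN` table and the `romanize` helper are hypothetical, cover just the three characters from the example, and assume the caller already knows how many characters the surname has.

```python
# Toy character-to-Pinyin table (tone marks omitted); illustrative only.
PINYIN = {"刘": "liu", "慈": "ci", "欣": "xin"}

def romanize(full_name: str, surname_len: int = 1) -> str:
    """Return the 'Given Surname' romanization of a Chinese name.

    surname_len is 1 for a single-character surname and 2 for a
    compound one; the remaining characters form the given name.
    """
    surname = full_name[:surname_len]
    given = full_name[surname_len:]
    # Each character contributes one syllable; syllables are joined
    # without separators and only the first letter is capitalized.
    surname_py = "".join(PINYIN[ch] for ch in surname).capitalize()
    given_py = "".join(PINYIN[ch] for ch in given).capitalize()
    return f"{given_py} {surname_py}"

print(romanize("刘慈欣"))  # Cixin Liu
```

Note that the Chinese order is surname first (刘慈欣), while the romanized form used in this story puts the given name first.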
Each Chinese character is pronounced as a syllable, written in Pinyin with an associated tone. The four standard tones (e.g., ā, á, ǎ, à for "a") are shown in color below, with multiple colors for characters that have multiple pronunciations. Of the 8,000+ General Standard Chinese Characters, about 16.1% have multiple pronunciations, and some are pronounced differently in names.
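As a rough sketch of how characters with multiple pronunciations might be modeled, the snippet below stores a default reading and, where one exists, a separate surname reading. The `READINGS` table and both helper functions are hypothetical and hold only a handful of hand-picked characters; they are not the project's data.

```python
# Hand-picked readings for illustration; a real table would cover
# thousands of characters.
READINGS = {
    "单": {"default": "dān", "surname": "shàn"},
    "解": {"default": "jiě", "surname": "xiè"},
    "仇": {"default": "chóu", "surname": "qiú"},
    "刘": {"default": "liú"},
}

def reading(char: str, as_surname: bool = False) -> str:
    """Return the Pinyin reading, preferring the surname reading if asked."""
    entry = READINGS[char]
    if as_surname:
        return entry.get("surname", entry["default"])
    return entry["default"]

def is_polyphonic(char: str) -> bool:
    """True if the character has more than one distinct reading."""
    return len(set(READINGS[char].values())) > 1

print(reading("单"), reading("单", as_surname=True))
print(is_polyphonic("刘"))
```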
Chinese characters vary in usage frequency, and a character common in daily life may not be as frequent in names. In this project, we focus on 2,936 characters that are used as surnames or appear in given names of at least one person per million people.
However…
Many characters share the same Pinyin. For example, "yì" corresponds to 39 distinct characters.
Because Chinese is a tonal language, the same syllable pronounced with different tones corresponds to different sets of characters.
When only the syllables (without tone markers) are used, identifying the original characters becomes even more challenging—especially in situations like names, where context is absent.
Together, transliteration and tone loss compress 2,936 distinct characters into Pinyin syllables.
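The effect of dropping tones can be sketched with Python's standard `unicodedata` module. The `TONED` table below is a small hand-picked sample (twelve characters read "yi" in one of the four tones), not the 2,936-character set, and `strip_tone` and `group_by` are hypothetical helper names.

```python
import unicodedata
from collections import defaultdict

# Combining marks for the four tones: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

def strip_tone(syllable: str) -> str:
    """Remove tone marks but keep other diacritics such as the dots on ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

# Small illustrative sample of characters and their toned Pinyin.
TONED = {
    "一": "yī", "伊": "yī", "依": "yī",
    "宜": "yí", "仪": "yí", "怡": "yí",
    "以": "yǐ", "已": "yǐ",
    "义": "yì", "艺": "yì", "忆": "yì", "毅": "yì",
}

def group_by(mapping, key=lambda s: s):
    """Group characters by a key derived from their Pinyin syllable."""
    groups = defaultdict(list)
    for char, syllable in mapping.items():
        groups[key(syllable)].append(char)
    return dict(groups)

with_tone = group_by(TONED)
without_tone = group_by(TONED, key=strip_tone)
print(len(with_tone), len(without_tone))  # 4 1
```

With tones, the sample falls into four groups; without them, all twelve characters collapse into the single syllable "yi".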
This Pinyin romanization system can lead to name ambiguity, making it challenging to identify individuals based solely on Pinyin.
Given the varying popularity of characters in given names, some Pinyin syllables are more common than others. For example, "yu" is the most frequent, with over 47 per 1,000 people having it in their given names, while fewer than 1 per million have "za."
The distribution of surnames is highly uneven, with the top five accounting for 32% of the population, while many characters are not used as surnames at all.
Surname characters are highlighted in color; those used by fewer than 50 people per million are shown in a lighter color.
This concentration makes it easier to identify surname characters from Pinyin, but it offers little help in distinguishing individuals, so given names are essential for identification. Unfortunately, because of tone omission and transliteration, even people with different given names may end up sharing the same Pinyin name.
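To get a feel for how quickly the ambiguity grows, the sketch below enumerates the given names that could sit behind a toneless Pinyin name. The `CANDIDATES` table is a hypothetical toy sample with a few characters per syllable; real candidate pools are much larger, and the syllables are assumed to be already segmented.

```python
from itertools import product

# Toy candidate pools per toneless syllable; illustrative only.
CANDIDATES = {
    "ci": ["慈", "词", "辞"],
    "xin": ["欣", "新", "心", "鑫"],
}

def candidate_given_names(syllables):
    """List every character combination that matches the syllables."""
    pools = [CANDIDATES[s] for s in syllables]
    return ["".join(combo) for combo in product(*pools)]

names = candidate_given_names(["ci", "xin"])
print(len(names))  # 12
```

Even with three and four candidates per syllable, "Cixin" already maps to twelve possible given names, because the number of combinations is the product of the pool sizes.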