You may come across Chinese names in various contexts. Often, however, these names appear not in their original characters but transliterated into Hanyu Pinyin, which represents Mandarin pronunciation. Because many different characters share the same Pinyin spelling, this often results in ambiguity, misreadings, and erasure. Even native speakers may struggle to recognize the names behind the transliterations.
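To make the ambiguity concrete, here is a minimal Python sketch showing how distinct names in characters can collapse into one Pinyin spelling once tones and characters are dropped. The lookup table, the example names, and the helper names `romanize` and `group_by_pinyin` are illustrative assumptions, not the data or method of the study. The sketch also assumes single-character surnames.

```python
from collections import defaultdict

# Hand-written, illustrative table of toneless Hanyu Pinyin readings.
# Assumption for this sketch only; a real pipeline would need a full dictionary.
PINYIN = {
    "张": "zhang", "章": "zhang",
    "伟": "wei", "威": "wei", "薇": "wei",
}

def romanize(name: str) -> str:
    """Romanize a name, assuming the first character is the surname."""
    surname, given = name[0], name[1:]
    given_pinyin = "".join(PINYIN[ch] for ch in given)
    return f"{PINYIN[surname].capitalize()} {given_pinyin.capitalize()}"

def group_by_pinyin(names):
    """Group names written in characters by their toneless Pinyin form."""
    groups = defaultdict(list)
    for name in names:
        groups[romanize(name)].append(name)
    return dict(groups)

names = ["张伟", "章威", "张薇"]
print(group_by_pinyin(names))
# prints {'Zhang Wei': ['张伟', '章威', '张薇']}
```

Three different names, with different surnames and given-name characters, become indistinguishable as "Zhang Wei".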
In studies of author relations or gender inequality in academia, challenges related to foreign names, such as Chinese names, are often a barrier to inclusion in research. In this study, we extracted 65,244 physics publications from the China National Knowledge Infrastructure (CNKI), the primary platform for Chinese academic publications, and identified 81,177 unique scholar names in them. Using these data, we conducted research on name disambiguation and gender inference for Chinese scholar names.
Our name disambiguation workflow and outcomes: starting with 35,873 Chinese names that each appear on at least two papers, we found that 22,293 of them (about 62%) each refer to a single individual.
Pre-print. Mingrong She, Liuhuaying Yang, Ana Maria Jaramillo, Lisette Espin-Noboa. Bridging the Language Gap in Scholarly Data: Enhancing Name disambiguation and Gender Inference Algorithms for Chinese Names. ICSSI, 2025