From 'Word Unit' to 'Symbol Unit': The Debate Over the Chinese Translation of 'Token' and Its Underlying AI Cognitive Implications
Discussion has recently emerged over the official Chinese translation of the AI term "Token," which the National Committee for Terminology in Science and Technology has recommended rendering as "词元" (Cíyuán, "word unit"). While that translation is defended as consistent with historical usage in natural language processing (NLP) and as concise and communicable, this article presents a critical counterview advocating "符元" (Fúyuán, "symbol unit") as a more structurally accurate and future-proof alternative.
The author argues that defining Token based on its origin in NLP—as a linguistic semantic unit—overlooks its evolution into a general-purpose, discrete symbolic unit used across multimodal systems (text, image, audio, etc.). Using “词元” ties the concept too narrowly to language, causing cognitive misalignment and semantic drift when applied in non-linguistic contexts. By contrast, “符元” reflects Token’s fundamental role as a symbol in information theory and computation, independent of modality.
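To make the modality-independence point concrete, here is a minimal illustrative sketch (not from the article, and deliberately simplified compared to real BPE or patch tokenizers): a toy byte-level tokenizer that maps any byte stream, whether UTF-8 text or a hypothetical image patch, to the same kind of output, a sequence of integer IDs with nothing inherently "word"-like about it.

```python
def tokenize_bytes(data: bytes, vocab_size: int = 256) -> list[int]:
    """Map any byte stream (text, image patch, audio frame) to token IDs.

    A toy stand-in for a real tokenizer: each byte becomes one ID in a
    fixed vocabulary. The point is that the output is just discrete
    symbols, independent of what the bytes originally represented.
    """
    return [b % vocab_size for b in data]

# Text input: the UTF-8 bytes of the Chinese term itself.
text_tokens = tokenize_bytes("符元".encode("utf-8"))

# Non-text input: a hypothetical 2x2 grayscale image patch.
patch_tokens = tokenize_bytes(bytes([12, 200, 37, 91]))

# Both are plain integer sequences of the same kind.
print(text_tokens)   # six IDs, one per UTF-8 byte
print(patch_tokens)  # [12, 200, 37, 91]
```

Under this framing, "token" names a position in a discrete symbol vocabulary, which is the structural role "符元" is argued to capture better than "词元."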
The article further critiques the reliance on metaphorical extensions (e.g., describing image patches as "words") as insufficient for rigorous terminology. It highlights several risks: collision with existing linguistic terms such as Lemma (also conventionally translated as "词元"), poor cross-lingual reversibility (a reader encountering "词元" cannot reliably back-translate it to "Token" rather than "Lemma"), and systematic misunderstanding among non-expert audiences.
In conclusion, the author emphasizes that terminology should align with computational essence—not historical usage or explanatory convenience—to ensure conceptual clarity and scalability in AI’s multidisciplinary future. “符元” is proposed as a more neutral, stable, and structurally coherent translation for Token.
marsbit · 04/10 10:43