From 'Word Unit' to 'Symbol Unit': The Debate Over the Chinese Translation of 'Token' and Its Underlying AI Cognitive Implications

marsbit · Published 2026-04-10 · Updated 2026-04-10

Abstract

Recent discussions have emerged regarding the official Chinese translation of the AI term "Token," which has been recommended as “词元” (Cíyuán, meaning "word unit") by the National Committee for Terminology in Science and Technology. While this translation is argued to align with historical usage in natural language processing (NLP) and is considered concise and communicable, this article presents a critical counterview advocating for “符元” (Fúyuán, meaning "symbol unit") as a more structurally accurate and future-proof alternative. The author argues that defining Token based on its origin in NLP—as a linguistic semantic unit—overlooks its evolution into a general-purpose, discrete symbolic unit used across multimodal systems (text, image, audio, etc.). Using “词元” ties the concept too narrowly to language, causing cognitive misalignment and semantic drift when applied in non-linguistic contexts. By contrast, “符元” reflects Token’s fundamental role as a symbol in information theory and computation, independent of modality. The article further critiques the reliance on metaphorical extensions (e.g., comparing image patches to “words”) as insufficient for rigorous terminology. It highlights risks including confusion with existing linguistic terms like Lemma (also translated as “词元”), poor cross-lingual reversibility (e.g., difficult back-translation to English), and systemic misunderstanding among non-expert audiences. In conclusion, the author emphasizes that terminology should align with a concept's structural ontology rather than its historical origin, and that “符元” therefore offers a more stable cognitive foundation as AI systems become increasingly multimodal.

Recently, the National Committee for Terminology in Science and Technology issued an announcement recommending the translation of "Token" in the field of artificial intelligence as "词元" (word unit), and it is now being trialed publicly. Subsequently, the "People's Daily" published an article titled "Experts Explain Why the Chinese Name for Token Was Determined as '词元' (Word Unit)", providing a systematic interpretation of this naming from a professional perspective.

The article mentioned that the word "token" originates from the Old English "tācen", meaning "symbol" or "sign". In language models, a token is the smallest discrete unit obtained after text segmentation or byte-level encoding, which can manifest in various forms such as words, subwords, affixes, or characters. It is through modeling sequences of tokens that models exhibit certain intelligent capabilities.
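The granularities mentioned above (words, subwords, characters, bytes) can be sketched in a few lines of Python. This is a minimal illustration of the idea only, not any real model's tokenizer:

```python
# Minimal sketch: the same text segmented into tokens at three common
# granularities. Real tokenizers (e.g. byte-level BPE) learn merges on top
# of the byte level; this just shows the raw units.

def word_tokens(text: str) -> list[str]:
    """Whitespace word tokens: the coarsest common granularity."""
    return text.split()

def char_tokens(text: str) -> list[str]:
    """Character-level tokens: one token per character."""
    return list(text)

def byte_tokens(text: str) -> list[int]:
    """Byte-level tokens: UTF-8 bytes, the basis of byte-level vocabularies."""
    return list(text.encode("utf-8"))

text = "tokens matter"
print(word_tokens(text))  # ['tokens', 'matter']
print(char_tokens(text))  # ['t', 'o', 'k', ...]
print(byte_tokens(text))  # [116, 111, 107, ...]
```

Note that the byte-level view already hints at the article's later point: at this level the units are plain integers, indifferent to whether they encode language.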

This translation is considered within the expert evaluation system to conform to the principles of univocality, scientificity, conciseness, and coordination, and it also has a certain basis for use in the current Chinese context. However, after reading the related interpretations, I have formed a different understanding regarding this naming approach.

From a standardization perspective, this naming scheme has advantages in terms of comprehensibility and dissemination in the short term. But if examined from dimensions such as computational ontology, information structure, multimodal evolution, and back-translation consistency, its long-term adaptability still requires further testing. In this context, an alternative worth considering—"符元" (symbol unit)—gradually reveals stronger structural consistency and cross-context stability.

I. Misalignment of Definition: Cannot Use "Origin" to Replace "Essence"

Article's Viewpoint (Chen Xilin, Researcher at the Institute of Computing Technology, Chinese Academy of Sciences): Token's initial role in artificial intelligence is the "basic semantic unit of language", therefore "词元" (word unit) can better fit its essence.

This judgment is reasonable in a historical context, but in the current era of major technological paradigm shifts, this thinking is essentially a form of "academic rigidity".

At the logical level of terminology definition, a strict distinction must be made between "initial application scenario" and "structural essential attributes".

Token did indeed originate in Natural Language Processing (NLP), but in the evolutionary path of AGI, it has long broken through the boundaries of language models and evolved into a fundamental unit for uniformly processing text, images, speech, and even physical signals. In modern computational systems, the true structural ontology of Token is a "discrete symbolic unit", not a linguistic unit of a single modality.
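This cross-modal unification can be made concrete with a toy sketch. The two-word vocabulary and the small image codebook below are invented for illustration; real systems use vocabularies of tens of thousands of entries, but the principle of a single shared ID space is the same:

```python
# Hypothetical sketch: regardless of modality, a model ultimately consumes
# integer IDs drawn from one shared vocabulary. Names and sizes are assumed.

TEXT_VOCAB = {"hello": 0, "world": 1}   # toy text vocabulary
N_TEXT = len(TEXT_VOCAB)
N_IMAGE_CODES = 4                        # toy image codebook size

def text_to_ids(words: list[str]) -> list[int]:
    """Map words into the shared symbol space."""
    return [TEXT_VOCAB[w] for w in words]

def image_patch_to_id(code: int) -> int:
    """Offset image codes past the text vocabulary so every ID,
    whatever its modality, lives in one symbol space."""
    return N_TEXT + code

# A mixed text-and-image sequence is just one stream of symbol IDs.
sequence = text_to_ids(["hello", "world"]) + [image_patch_to_id(2)]
print(sequence)  # [0, 1, 4]
```

Once in this form, the model has no structural notion of "word" at all; it sees only discrete symbols.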

If named according to its "initial role", computers should still be called "electronic calculators" (after their original function of replacing human computers), and the Internet should be called the "Cold War military network". The fatal flaw in this naming logic is that it sees only the "temporary role" of a technology at a specific historical moment, while ignoring the "physical ontology" that transcends eras.

Historical path cannot be equated with essential attributes. Similarly, we cannot permanently lock Token into the narrow context of "word" just because it was initially used to process text.

Using the "initial application scenario" to define a fundamental concept essentially substitutes historical path dependency for the ontological truth of the structure. This type of naming might provide convenience for understanding in the early stages of technology, but in the phase of paradigm expansion with multimodal explosion, it quickly becomes obsolete and turns into a shackle hindering cognition. In contrast, "符元" (symbol unit) directly aligns with the symbolic ontology of cross-modal computation; it defines not Token's "past", but Token's "truth".

II. The Boundary of Analogy: When Explanation Becomes Definition, It Begins to Deviate

Article's Viewpoint (Dong Yuxiao, Associate Professor, Department of Computer Science, Tsinghua University): Through analogies like "word cloud" and "bag of words", the discrete units in multimodal contexts can be understood as "generalized words".

Professor Dong Yuxiao's analogy is helpful for understanding, but should not replace the definition. This line of thinking has some explanatory value, but if further elevated to the basis for naming, it may cause categorical misplacement at the conceptual level.

Methodologically, the role of analogy is to lower the barrier to understanding, while the duty of definition is to delineate semantic boundaries. When "word" is extended to cover image patches, speech segments, vector representations (embeddings), and even broader perceptual signals, its original linguistic attributes are continuously diluted, and semantic boundaries become blurred. This expansion path driven by "analogy" can maintain explanatory consistency in the short term, but is prone to semantic drift in long-term evolution.

Regarding cross-modal expansion capability, we need to be vigilant about the slippage from "analogy" to "definition". In the context of terminology standardization, the boundary between "explanatory metaphor" and "ontological definition" must be distinguished to avoid the former substituting for the latter.

A more intuitive comparison: in popular-science contexts, we may liken a light bulb to an "artificial sun" to aid intuition; but within the scientific naming system, no one would rename the ampere, the unit of electric current, a "light unit" on that basis. The former is descriptive expression; the latter involves a strict system of measurement and standardized definition, and the two cannot be mixed.

Similarly, terms like "word cloud" and "bag of words" are essentially descriptive or statistical metaphors whose function is to aid the understanding of data structures or distribution patterns; whereas Token, as a fundamental unit of measurement in large models, is deeply embedded in systems for compute billing, model training, and academic measurement. When its usage scale reaches hundreds of billions to trillions of daily calls, its name carries not only an explanatory function but also foundational engineering and standards significance. At this level, terminology needs to align with its ontological attributes rather than rely on analogical extension.

If this analogical logic is further pushed to the naming level, it actually implies a dangerous premise: since people are already accustomed to understanding Token with "word", why not continue using this analogy. But this is actually a continuation of path dependency—substituting the convenience of existing cognition for the correction of conceptual ontology. In this sense, this naming is closer to a "linguistic romanticism" rather than a strict alignment with computational ontology.

We do not speak of "electronic horses" inside electric motors just because "horsepower" contains the word "horse". Analogy can inspire understanding, but it cannot define standards.

In contrast, "符" (symbol) as a more neutral concept naturally possesses cross-modal adaptability, capable of covering various information forms like text, images, and speech without requiring additional explanation. Therefore, the naming path centered on "symbolic unit" is closer to the structural essence of Token at the definition level. Under this logic, "符元" (symbol unit) as the corresponding translation possesses higher conceptual consistency and long-term adaptability.

III. The Cost of Cognition: When Semantic Anchors Create Systemic Misunderstanding

Article's Viewpoint (Synthesized expert opinions): "词元" (word unit) is concise, conforms to Chinese habits, and is easy to disseminate.

This judgment has a certain rationality at the dissemination level, but its implicit premise is: the public can accept the cross-modal analogy of "word". However, analogy is essentially an expert thinking tool, not a natural way of cognition for the general public. For ordinary users, "word" has a strong semantic anchoring effect—once they hear "word", their intuitive direction is inevitably the language system, not other modalities like images, sounds, or actions. This cognitive path is not a technical issue, but a stable structure at the level of cognitive psychology.

On this basis, when "word" is extended to so-called "generalized words", it actually creates bias in user cognition. Users first form an intuitive understanding of "word = language unit", not the abstract concept of "cross-modal symbolic unit". Once this misunderstanding is established, all subsequent explanations become corrections to existing cognition, rather than extensions of natural understanding.

For example, when media reports that "the model was trained using 10 trillion word units", the public can easily understand it as "read a large amount of text", ignoring the vast amounts of image, speech, and other modal data included. This misunderstanding is not an isolated case, but is systematically induced by the semantic anchoring of the term itself.

In practical engineering contexts, this naming may also cause friction in cross-disciplinary communication. When discrete units in visual models or speech models are called "words", it not only easily causes semantic misunderstandings but also creates unnecessary linguistic conflicts between different fields. Multimodal systems require unification at the "symbol level", not the expansion of linguistic categories.

In comparison, "符" (symbol) as a more abstract concept, although slightly higher in initial understanding threshold, has a more neutral semantic direction and does not pre-lock cognition at the language level. In long-term use, it is more conducive to establishing a stable, unified cognitive framework, thereby reducing overall解释 costs and providing a more stable cognitive foundation for multimodal unification.

The cost of naming does not occur at the time of definition, but at the time of correction; once early naming forms a semantic anchor, the cost of subsequent cognitive repair will increase exponentially.

Experts can expand the boundary of "word" through analogy, but the public will not understand concepts by analogy. Naming is not for serving experts, but for being responsible for the cognitive system of the entire era.

IV. The Illusion of Univocality: When One Word Attempts to Bear Two Systems

Article's Viewpoint (Principles of terminology standardization): "词元" (word unit) conforms to the principle of univocality and helps solve the problem of chaotic translations.

Regarding the univocality of terminology, special attention must be paid to the systemic risks that may arise from "one word, two meanings". In scientific terminology standardization, "univocality" is one of the fundamental principles. If a term requires context or additional explanation to distinguish its meaning, its value as a standard component is already lost.

However, judging from the existing academic system, this judgment still has room for further discussion. The term "词元" (word unit) has long been "taken" in the fields of linguistics and Natural Language Processing (NLP). In classical linguistics, its long-standing corresponding English concept is Lemma, i.e., the canonical base form of a word (for example, the lemma of is/am/are is be). This usage has formed a stable consensus in basic linguistics and NLP textbooks and academic papers.

Against this background, if Token is also translated as "词元" (word unit), it is easy to cause semantic conflicts in specific expression, leading to disastrous scenarios.

For example, when describing "lemmatize a token" in NLP, the Chinese expression would appear as "对'词元'进行'词元化'" (perform 'lemmatization' on a 'word unit'). This expression not only increases comprehension costs but also introduces ambiguity in academic writing and information retrieval, making it difficult for readers to distinguish whether "词元" refers to the segmented discrete unit or the canonical base form of a word.

From a conceptual function perspective, the two also have clear distinctions: Lemma emphasizes "reduction" at the language level, corresponding to the canonical expression after word form variation; while Token emphasizes "segmentation" in the computational process, corresponding to the smallest discrete unit when the model processes information. This difference between "reduction" and "segmentation" corresponds to different dimensions of the semantic layer and the symbolic layer.
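The "reduction" versus "segmentation" contrast can be made concrete with a toy example. The lemma dictionary and the subword split below are invented for illustration; they stand in for what a real lemmatizer and a real subword tokenizer would do:

```python
# Toy illustration of the two distinct operations that a single Chinese
# term "词元" would be forced to cover.

# Lemmatization: semantic layer, REDUCTION of inflected forms to a base form.
LEMMAS = {"is": "be", "am": "be", "are": "be", "running": "run"}

def lemmatize(word: str) -> str:
    """Map an inflected word back to its canonical (dictionary) form."""
    return LEMMAS.get(word, word)

# Tokenization: symbolic layer, SEGMENTATION of input into discrete units.
def tokenize(word: str) -> list[str]:
    """Segment into subword units (a fixed toy split for illustration)."""
    if word == "running":
        return ["runn", "ing"]
    return [word]

print(lemmatize("are"))     # 'be'              <- Lemma: reduction
print(tokenize("running"))  # ['runn', 'ing']   <- Token: segmentation
```

The two functions operate on different dimensions entirely: one normalizes meaning, the other partitions form. Giving them the same Chinese name collapses that distinction.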

Therefore, when a term needs to be "generalized" to cover multiple existing concepts simultaneously, its univocality has actually transformed into "unification at the explanatory level", rather than "stability at the semantic level".

When a term requires explanation to maintain unity, its stability as a standard term has often already begun to erode.

In contrast, "符元" (symbol unit) does not have semantic conflicts in the existing terminology system. On one hand, it preserves Token's ontological attribute as a discrete symbol; on the other hand, it also avoids overlap with the existing translation of Lemma, thus exhibiting higher stability in terms of semantic clarity and system consistency.

V. The Return to Ontology: Token is Essentially a "Symbol", Not a "Word"

Article's Viewpoint (General explanation): Token is the smallest unit used for processing text in language models.

This statement is valid at the functional level, but still remains at the level of "how to use", without touching its ontological attributes in computational theory. From the perspectives of information theory and computational theory, the basic objects processed by computational systems are not "words", but "symbols".

This can be further understood from two aspects:

On one hand, from an information theory perspective, the essence of information lies in eliminating uncertainty, its unit of measurement is the bit, and its carrier entity is discrete symbols. Symbols do not care about semantic content, but are only related to probability distributions and encoding structures;

On the other hand, at the implementation level of computation, the underlying machinery of large models does not "read characters"; its processing objects are discrete index representations (IDs). Whether an ID corresponds to a Chinese character, an image patch, or an audio sample point, it participates in computation as a uniform symbol.
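Both aspects can be illustrated directly: Shannon entropy is computed from the probability distribution over symbol IDs alone, with no reference to what the symbols "mean" or which modality they came from. A minimal sketch:

```python
# Sketch of the information-theoretic point: entropy depends only on the
# probability distribution over discrete symbols, never on their semantics.
import math
from collections import Counter

def entropy_bits(symbols: list[int]) -> float:
    """Shannon entropy, in bits, of an empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# The same ID sequence could have come from characters, image patches,
# or audio frames; the computation is identical either way.
ids = [7, 7, 3, 3, 3, 3, 9, 9]   # empirical distribution {3: 1/2, 7: 1/4, 9: 1/4}
print(entropy_bits(ids))         # 1.5 bits
```

Nothing in `entropy_bits` could even ask whether ID 3 is a word; at this level there are only symbols and probabilities.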

Within this framework, Token behaves the way it does precisely because its essence lies at the "symbol level", not the "semantic level": symbols themselves carry no semantics, but exist as the basic carriers of encoding and computation.

Naming Token as "词元" (word unit) introduces, to some extent, an implicit orientation toward the linguistic semantic layer, pulling a concept that properly belongs to the symbol level back into an understanding path centered on language. This naming may provide intuitiveness at the explanatory level, but at the theoretical level it easily blurs the boundary between "symbolic computation" and "semantic understanding".

In comparison, "符元" (symbol unit) remains within the symbol level conceptually. On one hand, it accurately reflects Token's computational attribute as a discrete symbol; on the other hand, it avoids introducing semantic features into the ontological definition, thus better conforming to the basic framework of information theory and computational theory.

From a broader perspective, as artificial intelligence systems continue to evolve towards multimodality and general intelligence, if the naming of basic concepts can directly align with their mathematical and computational ontology, it will be more conducive to building a stable, scalable cognitive system. In this sense, the naming path centered on "symbolic unit" is not only a language choice issue but also a consistent expression of the computational essence, and "符元" is the natural correspondence within this framework.

Defining concepts from the symbol level is an alignment with the computational essence; naming concepts from the semantic level is closer to explanation rather than definition.

VI. Linguistic Fracture: Mapping Failure in the Back-Translation Mechanism

Article's Viewpoint (Synthesized interpretation): "词元" (word unit) has gradually formed a basis for use in the Chinese academic community and possesses certain dissemination advantages.

In cross-linguistic contexts, we need to be vigilant about the systemic impact of terminological "back-translation fracture". Whether a scientific term has long-term vitality depends not only on its ability to convey meaning in the Chinese context but also on whether it can be mapped stably within the international academic system. An ideal term should possess "reversibility", meaning it can make consistent semantic round-trips between languages.

The above judgment reflects the acceptability of "词元" (word unit) in the local context, but from a cross-linguistic perspective there is still room for further discussion. If a term is established only within a single language system and cannot form a stable corresponding relationship in the international context, it may introduce additional comprehension costs in academic exchange.

Specifically, "词元" (word unit) lacks a clear, unique corresponding path during back-translation. When it is translated back into English, it often causes divergence among several approximate concepts: for example, "word unit" lacks a strict academic definition, "morpheme" corresponds to morpheme in linguistics, and "lexeme" points to lexeme. None of these concepts can accurately cover the meaning of Token in the computational context; instead, they introduce categorical shifts.

In contrast, "符元" (symbol unit) can more naturally correspond to "symbolic unit". This concept has a clear theoretical basis and stable usage in fields such as information theory, discrete mathematics, and multimodal representation, and can maintain consistent semantic指向 across different contexts. Therefore, it is easier to form a one-to-one mapping relationship between Chinese and English.

From a practical perspective, once terminology enters academic papers, technical documentation, and international exchange scenarios, its back-translation ability will directly affect expression efficiency and understanding accuracy. If a term requires additional explanation to complete cross-language conversion, its long-term usage cost will continue to accumulate.

Therefore, in the cross-linguistic system, the main problem faced by "词元" (word unit) lies in the instability of the mapping path, while "符元" (symbol unit) exhibits higher certainty in terms of semantic correspondence and conceptual consistency. In the context of increasingly globalized artificial intelligence, choosing terms with good back-translation characteristics will be more conducive to building an open, interoperable academic and technical system.

The international reversibility of terminology is essentially a key measure of its long-term academic vitality.

VII. The Misconception of Unification: Formal Consistency Does Not Equal Structural Consistency

Article's Viewpoint (Synthesized expert opinions): The expression style of "词元" (word unit) is consistent with terms like "embedding" and "attention", being concise, abstract, and conforming to the Chinese technical context.

Conclusion first: The unification of a terminology system should be built on "conceptual isomorphism", not "linguistic homomorphism".

In the supporting arguments for "词元" (word unit), a common reason is: its expression style is consistent with terms like "embedding" and "attention", being concise, abstract, and conforming to the Chinese technical context. This reason captures the real need for terminology systems to have uniformity, but the problem is—if unification only stays at the linguistic level and not the structural level, it will slide from "order" to "illusion".

"Embedding" and "attention" have become stable terms because they correspond to clear computational structures: the former is vector mapping, the latter is a weighting mechanism; their naming directly points to the computational essence. Whereas "词元" (word unit) belongs to explanatory naming; its rationality relies on the analogical framework of "generalized word". Once separated from explanation, this naming itself does not possess a self-consistent structural指向.

This difference exposes a key problem: formal consistency can conceal semantic shift.

The former reduces expression cost, the latter ensures cognitive stability. If "linguistic homomorphism" is prioritized, complexity does not disappear but is transferred into a long-term cognitive burden; only naming based on "conceptual isomorphism" can remain stable across contexts and multimodal evolution.

When "embedding", "attention", and "词元" (word unit) appear side by side, it is easy to create the illusion of "conceptual same-level". But in fact, the first two are mechanisms, the latter is an object; the first two have strict definitions, the latter relies on contextual explanation. This structural misalignment will bury hidden fractures in the cognitive system.

More importantly, when the naming of a basic concept relies on analogy rather than structural definition, its impact will not remain within a single term but will spread to the entire terminology system. When subsequent concepts attempt to unfold around this naming, they will have to constantly maintain consistency through explanation, thus forming an implicit structural misalignment.

In this sense, "符元" (symbol unit) provides a path of expression closer to the underlying structure. It directly points to the basic object in computational systems—the symbol—and can remain consistent across different contexts without relying on analogical explanation.

Terminology is not just labels, but entrances to cognition. Good terminology makes explanations gradually disappear; poor terminology makes annotations continuously increase. When basic concepts deviate from structure, the terminology system can only be maintained by explanation, not by self-consistent definition.

Conclusion

Essentially, the choice of terminology is not just a language issue, but an early shaping of the cognitive structure of a field. Once naming deviates from its structural ontology in the initial stage, the subsequent system can only maintain operation through constant explanation, making it difficult to form a self-consistent conceptual network.

In the process of artificial intelligence moving towards generalization and multimodal integration, a terminology that can align with computational ontology and possess cross-context stability is more likely to become a long-term effective cognitive cornerstone. In this sense, the naming path centered on "symbolic unit" presents more balanced adaptability in balancing technical essence and cognitive clarity.

Related Questions

Q: What is the main argument against translating 'Token' as '词元' (word unit) in the context of AI terminology?

A: The main argument is that '词元' anchors the concept too narrowly to language processing, whereas Token's structural essence is as a 'discrete symbolic unit' that transcends modalities like text, image, and audio. This naming, based on its initial application in NLP, fails to capture its true computational ontology and may cause long-term cognitive and cross-contextual instability.

Q: Why does the article argue that the analogy of 'word' (词) is problematic for defining Token in a multimodal AI context?

A: The article argues that while analogies like 'word cloud' or 'bag of words' can aid understanding, they should not define the term. Extending 'word' to cover non-linguistic modalities (e.g., image patches) dilutes its semantic boundaries, creates cognitive bias by anchoring users to language, and risks semantic drift. Analogies are explanatory tools, not replacements for ontological definitions.

Q: What potential conflict arises from using '词元' as the translation for Token, according to the article?

A: The term '词元' already has an established meaning in linguistics and NLP, where it corresponds to 'Lemma' (the canonical form of a word, e.g., 'be' for 'is/am/are'). Using it for Token creates ambiguity, as expressions like 'lemmatize a token' would become '对词元进行词元化', leading to confusion between discrete units in computation and normalized word forms in linguistics.

Q: How does the article justify '符元' (symbol unit) as a better alternative to '词元' for Token?

A: '符元' is justified as it directly aligns with Token's computational ontology as a 'discrete symbolic unit', neutral across modalities (text, image, audio). It avoids semantic anchoring to language, reduces long-term cognitive correction costs, ensures better cross-lingual reversibility (e.g., to 'symbolic unit'), and maintains structural consistency with concepts like embedding and attention.

Q: What does the article suggest about the importance of 'cross-linguistic reversibility' in term selection?

A: The article emphasizes that cross-linguistic reversibility is crucial for a term's long-term viability. A good term should allow consistent, unambiguous mapping between languages. '词元' lacks this, as it back-translates poorly to English (e.g., to 'word unit', 'morpheme', or 'lexeme'), while '符元' naturally maps to 'symbolic unit', ensuring stable international academic and technical communication.
