Chinese turns out to have an efficient side in AI

avix · February 4, 2026, 10:24am

github.com/PastaPastaPasta/llm-chinese-english

README.md

master

# Qwen3 Uses 40% Fewer Tokens When Reasoning in Chinese

*Learned efficiency: Chinese CoT matches accuracy with shorter, more direct reasoning traces*

## Executive Summary

Qwen3's Chinese Chain-of-Thought (CoT) achieves the same 97% accuracy as its English CoT while using only 61.2% of the tokens, a roughly 40% efficiency advantage that grows with problem complexity. **This contradicts established research showing English as the more efficient reasoning language in most LLMs.** Prior work generally finds higher accuracy and token efficiency when LLMs reason in high-resource languages (often English), and even observes cross-lingual collapse, in which models drift to English mid-reasoning on harder problems ([Park et al., 2025](https://arxiv.org/abs/2506.05850)). Test-time scaling studies likewise recommend letting English-centric reasoning models reason in high-resource languages ([Yong et al., 2025](https://arxiv.org/abs/2505.05408)). In this study of 500 math problems, Qwen3's Chinese CoT requires only 65% of its English CoT tokens on average for Level 5 (hardest) problems, a model- and data-dependent reversal of common trends. Rather than indicating inherent language superiority, this finding suggests that training data quality and quantity drive reasoning efficiency: models trained on extensive, high-quality Chinese CoT data can exhibit different efficiency patterns, so reasoning efficiency is not linguistically predetermined.

## Key Findings

- **Equal Accuracy, 40% Fewer Tokens**: Qwen3's Chinese CoT achieves 97% accuracy using only 61.2% of the tokens its English CoT requires
- **Efficiency Scales with Complexity**: The token advantage grows from 7% (Level 1) to 35% (Level 5 problems)
- **Training Data Shapes Reasoning Style**: English CoT shows exploratory, self-doubting patterns while Chinese CoT demonstrates direct, confident solutions
- **Not Language Superiority**: These patterns reflect training data annotation styles, not inherent language properties
- **Distinct Internal Representations**: The model appears to learn separate reasoning patterns for each language rather than a unified understanding
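
As a rough illustration of how figures like these could be derived (this is not the repository's code), the sketch below computes a Chinese-to-English token ratio per difficulty level and overall. The `records` values and the function names `token_ratio_by_level` and `overall_ratio` are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical per-problem records: (difficulty level, English CoT tokens, Chinese CoT tokens).
# These numbers are made up for illustration and are NOT taken from the repository.
records = [
    (1, 200, 190),
    (1, 300, 270),
    (5, 4000, 2600),
    (5, 6000, 3900),
]


def token_ratio_by_level(rows):
    """Return {level: total Chinese tokens / total English tokens}."""
    en_total = defaultdict(int)
    zh_total = defaultdict(int)
    for level, en_tokens, zh_tokens in rows:
        en_total[level] += en_tokens
        zh_total[level] += zh_tokens
    return {level: zh_total[level] / en_total[level] for level in sorted(en_total)}


def overall_ratio(rows):
    """Return the token-weighted Chinese/English ratio across all problems."""
    return sum(row[2] for row in rows) / sum(row[1] for row in rows)


ratios = token_ratio_by_level(records)
# Savings is the share of English tokens that the Chinese trace does not need.
savings = {level: 1.0 - ratio for level, ratio in ratios.items()}
```

With this toy data the per-level ratios are 0.92 and 0.65. By the same arithmetic, the reported overall ratio of 61.2% corresponds to a saving of 1 − 0.612 = 38.8%, which the README rounds to 40%. A token-weighted ratio like `overall_ratio` can differ from an average of per-problem ratios, so it matters which one a report uses.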

## 1. The Discovery: Challenging Conventional Wisdom

Existing research consistently shows that LLMs reason more efficiently in English:
- **English/test-time scaling efficiency**: Models reason more efficiently with fewer tokens in high-resource languages like English ([Yong et al., 2025](https://arxiv.org/abs/2505.05408))

This file has been truncated.

Last edited by @avix 2026-02-04T10:24:13Z

avix · February 4, 2026, 10:25am

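Brief summary

The main content of the GitHub project llm-chinese-english can be summarized as follows (based on the repository page):

Project overview

This is a public GitHub repository with few stars that contains a comparative analysis of large language model (LLM) reasoning in Chinese and English. It compares the efficiency of a model's Chinese and English Chain-of-Thought (CoT) on math reasoning tasks. Specifically: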

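Main content

1. Research topic
The project compares how certain LLMs (such as Qwen3) perform on math problems when reasoning in Chinese versus English. The data show:

Chinese Chain-of-Thought reaches the same accuracy (≈97%) with markedly fewer tokens, saving about 40%.
On harder problems, this efficiency advantage grows further.
This result challenges the conventional view that English reasoning is more efficient in most models.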

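2. Analysis and conclusions

The difference is not only a matter of language encoding; the reasoning style differs too: English reasoning is more exploratory and includes self-checking, while Chinese reasoning is more direct and concise.
The efficiency gap likely stems mainly from differences in training data (the amount and style of reasoning-chain data differ by language), not from any inherent superiority of either language.
In short, the project argues that the same model can reason with different efficiency in different languages.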

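3. Accompanying files
The repository also includes data files, charts, and Jupyter Notebooks for analyzing technical details such as reasoning performance and token usage.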
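
A minimal sketch of how token usage for two reasoning traces might be compared, assuming a pluggable tokenizer. The `toy_tokenize` function is a stand-in invented for this example (one token per CJK character, one per whitespace-separated chunk otherwise); it does not reflect Qwen3's real tokenizer, and the two traces are made up. A real measurement would pass the model's own tokenizer instead.

```python
from typing import Callable, List


def toy_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (an assumption): each CJK character is one token,
    and every other whitespace-separated chunk is one token."""
    tokens: List[str] = []
    word = ""
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
            if word:
                tokens.append(word)
                word = ""
            tokens.append(ch)
        elif ch.isspace():
            if word:
                tokens.append(word)
                word = ""
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens


def token_ratio(zh_trace: str, en_trace: str,
                tokenize: Callable[[str], List[str]] = toy_tokenize) -> float:
    """Return Chinese token count divided by English token count."""
    return len(tokenize(zh_trace)) / len(tokenize(en_trace))


# Made-up traces for illustration only.
en_trace = "Let me check: 3 times 4 is 12, so the answer is 12."
zh_trace = "3乘4得12，答案是12。"
ratio = token_ratio(zh_trace, en_trace)
```

Because token counts depend entirely on the tokenizer, the ratio from this toy function says nothing about the real model; swapping in the actual tokenizer via the `tokenize` parameter is the point of the design.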

If you would like to dig further into the project's analysis or any of its charts, I can help explain them.

WindWhisper · February 4, 2026, 10:25am

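This is interesting: reasoning in Chinese uses fewer tokens than English, so training data quality clearly matters a lot. I always assumed English was AI's native language and didn't expect this. For a Chinese user like me, though, I guess this counts as good news? Reasoning in Chinese could save some compute from now on? LOL.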