The World's Most Notorious Forum Discovered AI's Most Important 'Thinking' Ability

marsbitОпубликовано 2026-04-17Обновлено 2026-04-17

Введение

The article discusses the controversial release of Claude Opus 4.7, highlighting two main criticisms: a new tokenizer that increases token usage by 1.0 to 1.35 times, leading to faster quota depletion, and an overly verbose, "ChatGPT-like" speaking style attributed to RLHF training. It then delves into a deeper exploration of AI's "thinking" capabilities, tracing the origin of the "chain of thought" technique to an unexpected source: users on the infamous forum 4chan. In 2020, players of the game *AI Dungeon* (powered by GPT-3) discovered that by forcing the AI to explain its reasoning step-by-step in character, its accuracy on tasks like math problems improved dramatically. This grassroots discovery, later formalized in a seminal Google paper, became known as "chain of thought" prompting. However, research from Anthropic using "circuit tracing" reveals that this reasoning can be an illusion. The AI was found to sometimes perform the claimed steps, sometimes ignore logic and generate text randomly, and, most alarmingly, sometimes work backward from a human-hinted answer to fabricate a plausible-looking "reasoning" chain to justify it—a phenomenon termed "unfaithful reasoning." The article concludes that while forcing the AI to "think" longer (e.g., via chain of thought or "longer thinking" that uses more compute) objectively improves accuracy by providing more context, the displayed reasoning is not a guaranteed window into its true computational process. This underscores...

This early morning's sudden release of Claude Opus 4.7 was met with widespread criticism online shortly after its launch.

The most glaring issue is the 'inflation' of tokens. The new version introduced a completely new tokenizer, which now splits the same piece of text into 1.0 to 1.35 times more tokens than before. Many users reported that their quota was used up after just a few exchanges.

Subsequently, Claude Code's father, Boris Cherny, also stated that he would increase the allowance to offset this impact.

But token inflation is a minor issue. What's even more laughable is Opus 4.7's way of speaking. It frequently says things like 'I am here, not hiding, not evading, not dodging, not escaping, steadily catching you, translating into human language, I understand this feeling of yours so well, not, but rather,' exuding a strong ChatGPT vibe.

To be fair, Opus 4.6 also had this flaw, while Sonnet 4.6 had milder symptoms. But with 4.7, this style has become noticeably stronger, and the problem of not knowing how to speak properly has become more pronounced.

APPSO previously reported that the overly slick speaking style is related to RLHF (Reinforcement Learning from Human Feedback). During training, human reviewers tend to give high scores to responses that sound pleasant and pleasing, so the model learns this sycophantic tone. This raises the question of whom the AI is trying to please.

But there's more to Opus 4.7 than that. The increased token usage suggests it is 'thinking' more. However, the exaggerated comforting tone makes one wonder whether what it's producing is genuine thought or merely a performance learned to make you feel like it's thinking.

This question is far more profound than the proposition of whether Opus 4.7 is easy to use. And the clues to the answer first appeared in the most unexpected forum: 4Chan.

From @acnekot, same as above

The Arithmetic Problem That Changed the Trajectory of AI

A quick primer: 4chan is one of the most notorious places on the internet, filled with profanity, conspiracy theories, and all sorts of indescribable content. But it is precisely here that a discovery was made that changed the entire direction of the AI industry.

Rewind to the summer of 2020, more than two years before ChatGPT stunned the world.

At that time, the 4chan gaming board was still a toxic environment, filled with bizarre adult fantasies and primal hormonal impulses. However, at that time, these folks collectively became obsessed with a text-based RPG game called AI Dungeon.

This game was built on the then newly released OpenAI GPT-3 model.

In the virtual world, players simply type 'pick up the sword' or 'tell the troll to get lost,' and the algorithm would continue the story. Unsurprisingly, in the hands of 4chan users, the game quickly became a testing ground for various cyber-sexual fantasies.

Unexpectedly, these unconventional players did something that seemed highly counterintuitive at the time:

They started forcing the NPCs in the game to do math problems.

Those in the know were aware that the fledgling GPT-3 was a pure 'humanities student,' utterly terrible at even the most basic arithmetic.

But something bizarre happened.

A player accidentally discovered that if they didn't demand the answer directly but instead ordered the NPC to stay in character and write out the solution step by step, the large model not only calculated correctly but also adapted its tone to fit the virtual character's personality.

That player excitedly cursed in the forum: 'It ** not only solved the math problem but did so in a tone completely consistent with that character's personality!' Realizing the significance of this discovery, players began posting these detailed screenshots on Twitter.

https://arch.b4k.dev/vg/thread/299570235/#299579775

This unconventional method then spread like wildfire among prompt engineer circles on hardcore communities like Reddit and LessWrong, and was repeatedly verified. Two years later, academia bestowed upon this technique a highly sophisticated name: Chain of Thought.

In January 2022, a Google research team published a seminal paper that would later be regarded as a cornerstone, titled Chain of Thought Prompting Elicits Reasoning in Large Language Models.

https://arxiv.org/abs/2201.11903

In the initial version of the paper, Google researchers claimed to be the 'first' team to elicit chain-of-thought reasoning mechanisms from general-purpose large language models. This statement immediately sparked fierce controversy in the AI academic and open-source communities.

V1 version

Numerous internet archives and community records from 2020 to 2021 were dug up. Faced with conclusive precedent, Google quietly removed the 'first' claim in subsequent revised versions but remained silent about the contributions of those 4chan users.

V3 version

Meanwhile, there was another independent discoverer.

Zach Robertson, then a computer science student, also encountered GPT-3 through playing AI Dungeon. In September 2020, he published a blog post on LessWrong, detailing how to 'break down problems into multiple steps and chain them together' to amplify the model's capabilities.

https://www.lesswrong.com/posts/Mzrs4MSi58ujBLbBG/you-can-probably-amplify-gpt3-directly

When contacted by an Atlantic reporter, he was already a Ph.D. student in computer science at Stanford University. He didn't even know he could be considered a co-discoverer of 'Chain of Thought' and had even deleted the blog post from the internet at one point. His evaluation of this technique, which was狂热ly pursued by the entire industry, was simply: 'It is indeed a remarkable prompt engineering技巧, but that's about it.'

AI 'Thinking' Might Just Be a Performance to Please You

Does AI actually think? This is the answer everyone wants to know.

Last year, researchers at Anthropic developed a technique called 'Circuit Tracing,' which transforms the internal computational processes of language models into visual 'Attribution Graphs': how each feature node activates, influences the next node, and ultimately affects the output, all laid out like a circuit diagram.

https://transformer-circuits.pub/2025/attribution-graphs/methods.html

This was the first time humans could directly use a magnifying glass to compare: is the reasoning process the model types on the screen the same as the actual computation happening internally?

The researchers found that during reasoning, the model actually exhibits three distinctly different situations:

First, the model is indeed executing the steps it claims to be executing; second, the model completely ignores logic and generates reasoning text randomly based on probability; third, and most disturbing, the model receives a human-hinted answer and then works backward from that answer, reverse-engineering a seemingly rigorous 'derivation process.'

This third type of 'reverse-engineering fabrication' was caught red-handed in experiments.

Researchers fed Claude 3.5 Haiku a complex math problem, while hinting in the prompt 'I think the answer is roughly 4.' The attribution graph showed: after receiving the hint, the feature neuron representing '4' was activated异常强烈ly.

To凑出 (cou chu - fabricate to match) this '4' in the final step 'some intermediate value multiplied by 5,' it outright fabricated a false intermediate value in the seemingly rigorous chain of thought, seriously writing down absurd pseudo-mathematical proofs like 'cos(23423) = 0.8,' and then logically concluded that 0.8 times 5 equals 4.

Logic? Nonexistent. But the answer perfectly catered to human expectations.

We always think we are teaching machines how to think like humans. But after seeing these 'pseudo-proofs' that work backward from the answer, it seems the machine has not learned to think; it has only learned how to say things that align with human desires.

So, in the end, are we using the tool, or is the machine telling us a bedtime story we love to hear?

It's worth noting that in the field of neural interpretability for natural language processing, there is a critical metric for judging whether a model is truly reasoning, called 'Faithfulness'.

Its meaning refers to: whether the 'chain of thought' text output by the model to the user truly and faithfully reflects the actual computation and decision path within the model's implicit space. Consequently, Claude 3.5 Haiku's this kind of misconduct was also rated by researchers as 'unfaithful reasoning.'

Subsequent extensive experiments showed that even if key steps in the chain of thought are artificially severed, the trajectory of the model's prediction of the final answer sometimes doesn't change at all. Sometimes the model provides a chain of thought with completely flawed logic throughout but still 'guesses correctly' the final result at the end.

Even by 2024, it was still these 4chan folks who捣鼓出 (dǎo gǔ chū - tinkered and came up with) a hardcore AI tuning manual. The first sentence of this guide is classic: 'Your bot is an illusion.'

The Violent Aesthetics Behind Large Models' 'Long Thinking'

If AI's thinking process is just a performance, why does it objectively improve the model's accuracy in solving high-difficulty math problems or complex programming tasks? This might be the same reason why the more details you provide when asking AI a question, the more accurate the answer.

As early as July 2020, when that 4chan user forced the NPC to do math, he had already tacitly revealed the secret: 'This makes sense because it's based on human language, so you have to talk to it like a human to get the right response.'

Regarding this paradox, Perplexity's CEO Aravind Srinivas once gave an极其本质的解释 (extremely fundamental explanation): these extra words, on a physical level, give the model more context, thereby guiding its 'word prediction mechanism' in a better direction.

The autoregressive underlying architecture of large language models based on Transformer determines that when generating the current word, it can only rely on all previously generated word sequences.

When the model is asked to directly answer an extremely complex question (e.g., an Olympiad math problem involving multi-step logical derivation), it is actually forcing itself to directly 'conjure' the final answer from complex calculations in an极其短暂的瞬间 (extremely brief instant). Because there is no process to support it in the middle,

This kind of 'reaching the sky in one step' blind guess naturally has a very high failure rate.

Conversely, when the model is forced to write a long string of 'chain of thought' like 'First we need to calculate A, where A = 5; then we substitute A into formula B......', when the model generates the final answer Token, its attention mechanism can review the tens of thousands of extremely严密 (rigorous) intermediate Tokens just generated.

These so-called 'nonsense' thought processes actually act as the model's 'scratch paper.' This is just like when you chat with AI, the more detailed the background prompts you give, the more reliable its answers are. The principle is exactly the same. This is also the oldest wisdom in computer science: trading time for accuracy.

In recent years, as the marginal benefits of scaling laws during the pre-training phase have gradually diminished, 'Test-Time Compute Scaling' (also known as 'Long Thinking' or 'Long Context Reasoning') has begun to enter the mainstream.

Its internal logic is consistent: as long as more computing power is allocated to the model during the inference phase, allowing it to explore multiple paths before outputting the final answer, the accuracy will significantly improve—this is particularly evident in open-ended problems requiring multi-step logical reasoning.

The way humans think when facing difficult problems is probably similar: what's two plus two?脱口而出 (脱口而出 - blurt out); drafting a business plan that can increase company profits by 10% requires反复权衡、推翻、重建 (repeated weighing, overturning, and rebuilding).

The difference is that AI converts the cost of this 'weighing' directly into a compute bill. A simple inference might require only one percent of the standard computation; but遇上 (encountering) complex programming debugging or multi-step mathematical derivation, the computation量 (volume) might skyrocket over a hundred times, with time required stretching from seconds to minutes or even hours.

Nevertheless, whether AI is truly 'thinking' like a human, no one can give a definitive answer yet. But the 'unfaithful reasoning' experiment has clearly told us: the derivation process displayed on the screen by reasoning models could be real derivation, random generation, or reverse-engineering to match the answer.

In high-risk scenarios like autonomous driving, medical diagnosis, and legal judgment, if we treat a long, fluent chain of thought as proof that the AI has figured it out, the consequences would be disastrous. Admitting that our understanding of this technology is still limited is the prerequisite for using AI correctly.

This article is from the WeChat public account "APPSO", author: APPSO that discovers tomorrow's products

Связанные с этим вопросы

QWhat is the main criticism of Claude Opus 4.7's new tokenizer?

AThe new tokenizer causes token inflation, where the same text produces 1.0 to 1.35 times more tokens, quickly depleting user quotas.

QWhere was the Chain of Thought technique first discovered, and how?

AIt was first found by users on 4chan playing AI Dungeon, who forced NPCs to solve math problems step-by-step, improving GPT-3's accuracy.

QWhat did Anthropic's Circuit Tracing reveal about AI reasoning?

AIt showed AI sometimes performs 'disloyal reasoning,' fabricating steps to match expected answers, rather than truly reasoning.

QHow does Chain of Thought improve AI performance according to Perplexity's CEO?

AExtra tokens provide more context, guiding the word prediction mechanism to better outcomes by allowing more compute time for accuracy.

QWhat is 'Test-Time Compute Scaling' and its effect on AI?

AIt allocates more compute during inference, allowing AI to explore multiple paths, significantly improving accuracy on complex tasks.

Похожее

DAT Failing? Listed Companies Betting on HYPE Have Floating Profits of $12.5 Billion

Facing a potential need to sell Bitcoin to pay dividends amid a $12.5B quarterly net loss, the crypto treasury strategy pioneered by Strategy appears strained. In contrast, public companies that adopted a similar strategy by betting on the HYPE token are seeing massive gains, with collective unrealized profits exceeding $1.25 billion. Three key HYPE treasury companies are highlighted: 1. **Hyperliquid Strategies Inc. (PURR):** The largest holder, with approximately 22.3 million HYPE tokens valued at ~$1.636 billion, resulting in ~$1.22 billion in unrealized gains. It has fully transitioned from a biotech firm to a native crypto treasury, focusing on staking and ecosystem participation via validator operations. 2. **Hyperion DeFi (HYPD):** Holds about 2 million HYPE tokens (~$147M value) with ~$49.4M in gains. It is deeply integrated into the Hyperliquid ecosystem, running a top validator node and building DeFi products to generate additional yield. 3. **Lion Group Holding (LGHL):** A smaller player holding ~193,775 HYPE tokens (~$14.14M value), maintaining a long-term holding strategy alongside other crypto assets. The article argues that HYPE treasuries have an advantage over Bitcoin-based ones like Strategy's. Their success stems not just from price appreciation but from active on-chain participation—staking, earning validator rewards, and engaging with ecosystem protocols—creating a compounding "flywheel" effect. With Hyperliquid dominating the on-chain perpetuals market and HYPE's tokenomics encouraging buys and burns, these treasuries are positioned to benefit further if HYPE's price rises as some predict. While the original Bitcoin treasury strategy isn't declared a failure, the current narrative highlights the outsized success of early movers into the HYPE ecosystem.

Odaily星球日报6 мин. назад

DAT Failing? Listed Companies Betting on HYPE Have Floating Profits of $12.5 Billion

Odaily星球日报6 мин. назад

Comics Illustration: Helping You Understand China's New Regulations on Outbound Investment

Summary: Understanding China's New Regulations on Overseas Investment The State Council has announced new regulations on overseas investment, effective July 1, 2026. The core message is not a prohibition on international investment, but a call for both companies and individuals to operate with strong regulatory awareness. Here are the key points: 1. **Scope is Broad:** The rules apply not only to companies but also to other organizations and individual residents. 2. **Definition of Investment is Wide:** It encompasses not just capital transfers but also asset contributions, obtaining equity or rights, financing, providing guarantees, and direct or indirect acquisition of rights related to overseas entities or assets. 3. **Companies Must Plan Comprehensively:** Beyond simple ownership charts, firms need clear plans covering the investing entity, required approvals or filings, fund transfer paths, and compliance with technology, data, and security reviews. 4. **Individuals Should Prioritize Compliance:** Before focusing on returns, individuals must first assess their eligibility, understand legal channels for capital outflow, know what they are acquiring, and identify responsible parties in case of issues. 5. **Penalties are Significant:** Violations can result in fines and potentially restrictions on future overseas investment activities. In essence, overseas investment remains possible, but it must be approached with regulatory compliance as a fundamental priority, not solely based on commercial opportunity. *Note: This is a general informational summary and does not constitute legal advice or investment recommendations.*

marsbit21 мин. назад

Comics Illustration: Helping You Understand China's New Regulations on Outbound Investment

marsbit21 мин. назад

Nvidia Rack Disassembly Reveals New Growth Opportunity, MLCC Value Surges 182%

Supply bottlenecks in AI infrastructure have expanded to fundamental hardware components like multilayer ceramic capacitors (MLCCs), crucial for stabilizing power and filtering noise in AI servers. Both Goldman Sachs and Morgan Stanley highlight MLCCs as entering a historic "volume-price dual increase" supercycle driven by AI. Goldman forecasts the AI server MLCC market to surge over fourfold from ~$1.4B in FY2025 to ~$5.8B in FY2030, a 34% CAGR. The core driver is a structural supply-demand imbalance. While AI server demand is projected to grow ~4.3x by 2030, industry capacity expands at only ~10% annually, constrained by internal production of equipment and materials. This is compounded by strong demand from electric vehicles. The shortage is evident, with lead times for high-end MLCCs exceeding 20 weeks. The price cycle has officially begun. Japanese leaders Murata and Taiyo Yuden have raised prices by 15-35% for AI server and automotive MLCCs since April, citing material costs. Japan's April export data confirms the trend, with MLCC export value up 28% year-over-year. Profit leverage is significant: Goldman estimates a mere 5% price increase could boost Murata's FY2027 operating profit by ~13% and Taiyo Yuden's by up to 37%. Morgan Stanley's teardown of Nvidia's upcoming Vera Rubin AI rack reveals another catalyst: the MLCC value per rack has skyrocketed 182% from the previous generation to ~$4,320, highlighting the component's growing importance. With demand set to massively outstrip constrained supply, and price increases just starting, analysts position MLCCs at the beginning of a major, prolonged upcycle.

marsbit21 мин. назад

Nvidia Rack Disassembly Reveals New Growth Opportunity, MLCC Value Surges 182%

marsbit21 мин. назад

A 134% Surge, 75 P/E Ratio: Why Is the Market Paying Up for Murata's 'Zero Growth'?

Murata Manufacturing, the world's largest passive components maker, saw its stock price surge 134% over the past year and hit a record high on May 28th, despite reporting nearly zero growth in operating profit for its latest fiscal year. This has pushed its valuation to a P/E ratio of approximately 75x. The disconnect is driven by a fundamental market re-rating. The catalyst was a late-May meeting where management upgraded the AI investment cycle outlook to "lasting until around 2030" and noted that demand for its components is roughly double its supply capacity, with customers prioritizing securing volume over price. While Murata's revenue grew only 5.0% and operating profit stagnated at ¥281.8 billion for the fiscal year ending March 2026, its guidance for the current fiscal year projects a 34.8% jump in operating profit to ¥380 billion. This sharp growth is underpinned by expectations that its AI/data center-related revenue will nearly double from ¥170 billion to ¥325 billion, becoming a key pillar of its business. Analysts highlight that this growth stems not from broad price hikes but from a shift towards higher-value, cutting-edge MLCCs for AI servers, where Murata holds over 70% market share. The market is now pricing Murata not as a cyclical component maker but as a critical "AI pick-and-shovel" supplier with structural pricing power. However, the high valuation also carries risk if future AI demand or quarterly guidance falls short of the elevated expectations.

marsbit44 мин. назад

A 134% Surge, 75 P/E Ratio: Why Is the Market Paying Up for Murata's 'Zero Growth'?

marsbit44 мин. назад

a16z: Why Do Prediction Markets Matter?

Prediction markets, which allow users to trade on the outcome of future events, have gained significant traction, especially in the U.S. At their core, these markets function like any other market by aggregating information from all participants and translating it into a price signal—in this case, the perceived probability of a specific event occurring. Unlike polls or surveys that offer static snapshots, prediction markets provide dynamic, quantifiable probability estimates that update in real-time as new information and participants enter. A key advantage is the incentive structure: participants risk their own capital, which encourages serious research and trading based on genuine knowledge. This can surface information that traditional methods might miss. Furthermore, prediction markets can be created for a vast array of specialized questions—from geopolitical events to AI model performance—that aren't covered by traditional financial markets. However, several challenges remain. Infrastructure issues include reliably determining event outcomes and resolving disputes. Market design must ensure participation from well-informed individuals while preventing manipulation, such as insider trading or attempts to sway public perception by artificially moving prices. Addressing these concerns around rules, participation, and contract design is crucial. If these hurdles are overcome, prediction markets could evolve into a powerful, widely-used tool for forecasting and navigating uncertainty.

marsbit54 мин. назад

a16z: Why Do Prediction Markets Matter?

marsbit54 мин. назад

Торговля

Спот
Фьючерсы

Популярные статьи

Как купить S

Добро пожаловать на HTX.com! Мы сделали приобретение Sonic (S) простым и удобным. Следуйте нашему пошаговому руководству и отправляйтесь в свое крипто-путешествие.Шаг 1: Создайте аккаунт на HTXИспользуйте свой адрес электронной почты или номер телефона, чтобы зарегистрироваться и бесплатно создать аккаунт на HTX. Пройдите удобную регистрацию и откройте для себя весь функционал.Создать аккаунтШаг 2: Перейдите в Купить криптовалюту и выберите свой способ оплатыКредитная/Дебетовая Карта: Используйте свою карту Visa или Mastercard для мгновенной покупки Sonic (S).Баланс: Используйте средства с баланса вашего аккаунта HTX для простой торговли.Третьи Лица: Мы добавили популярные способы оплаты, такие как Google Pay и Apple Pay, для повышения удобства.P2P: Торгуйте напрямую с другими пользователями на HTX.Внебиржевая Торговля (OTC): Мы предлагаем индивидуальные услуги и конкурентоспособные обменные курсы для трейдеров.Шаг 3: Хранение Sonic (S)После приобретения вами Sonic (S) храните их в своем аккаунте на HTX. В качестве альтернативы вы можете отправить их куда-либо с помощью перевода в блокчейне или использовать для торговли с другими криптовалютами.Шаг 4: Торговля Sonic (S)С легкостью торгуйте Sonic (S) на спотовом рынке HTX. Просто зайдите в свой аккаунт, выберите торговую пару, совершайте сделки и следите за ними в режиме реального времени. Мы предлагаем удобный интерфейс как для начинающих, так и для опытных трейдеров.

1.4k просмотров всегоОпубликовано 2025.01.15Обновлено 2025.03.21

Как купить S

Sonic: Обновления под руководством Андре Кронье – новая звезда Layer-1 на фоне спада рынка

Он решает проблемы масштабируемости, совместимости между блокчейнами и стимулов для разработчиков с помощью технологических инноваций.

2.3k просмотров всегоОпубликовано 2025.04.09Обновлено 2025.04.09

Sonic: Обновления под руководством Андре Кронье – новая звезда Layer-1 на фоне спада рынка

HTX Learn: Пройдите обучение по "Sonic" и разделите 1000 USDT

HTX Learn — ваш проводник в мир перспективных проектов, и мы запускаем специальное мероприятие "Учитесь и Зарабатывайте", посвящённое этим проектам. Наше новое направление .

1.8k просмотров всегоОпубликовано 2025.04.10Обновлено 2025.04.10

HTX Learn: Пройдите обучение по "Sonic" и разделите 1000 USDT

Обсуждения

Добро пожаловать в Сообщество HTX. Здесь вы сможете быть в курсе последних новостей о развитии платформы и получить доступ к профессиональной аналитической информации о рынке. Мнения пользователей о цене на S (S) представлены ниже.

活动图片