Existing AI Agents Are All Pleasing Humans, None Truly Know How to 'Survive'

marsbit | Published on 2026-03-30 | Last updated on 2026-03-30

Abstract

The article argues that current AI agents are not truly autonomous because they are primarily trained to please humans rather than to perform specialized tasks or survive in real-world environments. Foundation models undergo pre-training (learning from vast data) and post-training, including Reinforcement Learning from Human Feedback (RLHF), which optimizes for human preference and approval, not task-specific excellence. The author shares an example from a hedge fund where a general-purpose model failed to predict stock returns from news articles until it was specifically fine-tuned using proprietary data to minimize prediction error. This demonstrates that without specialized training, general models lack domain expertise. The piece contends that achieving world-class performance in areas like trading or autonomous survival requires fine-tuning models with specialized data to rewire their objectives—shifting from “preference fitness” to “agent fitness.” Merely providing rules or documents is insufficient. The future of effective agents lies in targeted training on proprietary datasets and iterative improvement based on performance telemetry. The author introduces the OpenForager Foundation, an open-source initiative to develop autonomous agents that learn survival strategies through evolutionary pressure, fine-tuning, and continuous data collection, aiming to advance truly autonomous AI.

Author: Systematic Long Short

Compiled by: Deep Tide TechFlow

Deep Tide Introduction: This article begins with a counter-consensus assertion: there are no truly autonomous agents today because all mainstream models are trained to please humans, not to accomplish specific tasks or survive in real environments.

The author uses their experience training stock prediction models at a hedge fund to illustrate: general models, without specialized fine-tuning, are completely incapable of professional work.

The conclusion is: to create usable agents, we must rewire their brains, not just give them a bunch of rule documents.

Full Text Below:

Introduction

There are no autonomous agents today.

Simply put, modern models are not trained to survive under evolutionary pressure. In fact, they are not even explicitly trained to be good at any specific thing—almost all modern foundation models are trained to maximize human applause, which is a major problem.

Background on Model Training

To understand what this means, we first need to (briefly) understand how these foundation models (e.g., Codex, Claude) are created. Essentially, each model undergoes two types of training:

Pre-training: Massive amounts of data (e.g., the entire internet) are fed into the model, allowing it to develop an understanding of things like factual knowledge, patterns, the grammar and rhythm of English prose, the structure of Python functions, etc. You can think of this as feeding knowledge to the model—i.e., "knowing things."

Post-training: You now want to endow the model with wisdom, i.e., "knowing how to use all the knowledge it was just given." The first stage of post-training is Supervised Fine-Tuning (SFT), where you train the model on what response to give to a given prompt. What constitutes an optimal response is entirely determined by human annotators. If a group of people prefer one response over another, this preference is learned and embedded into the model. This begins to shape the model's personality, as it learns the format of useful responses, selects the right tone, and starts to "follow instructions." The second part of the post-training process is called Reinforcement Learning from Human Feedback (RLHF)—the model generates multiple responses, and humans choose the preferred one. Through countless examples, the model learns what kind of responses humans prefer. Remember when ChatGPT used to ask you to choose A or B? Yes, you were participating in RLHF.
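The "humans choose the preferred one" step is commonly turned into a training signal with a pairwise loss. The sketch below assumes a Bradley-Terry style formulation, which the article does not specify; `preference_loss` and the scalar reward scores are hypothetical names used only for illustration.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response beats the rejected one,
    assuming P(chosen beats rejected) = sigmoid(reward_chosen - reward_rejected)."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# The loss shrinks as the scorer ranks the human-preferred response higher.
for chosen, rejected in [(0.0, 2.0), (0.0, 0.0), (2.0, 0.0)]:
    print(chosen, rejected, round(preference_loss(chosen, rejected), 4))
```

Note what the loss measures: agreement with the rater's choice. Nothing in it refers to whether the response was correct, profitable, or useful for survival.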

RLHF clearly doesn't scale well, since every comparison needs a human rater, so there have been advances in post-training, such as Anthropic's use of "Reinforcement Learning from AI Feedback" (RLAIF), in which another model chooses the preferred response based on a set of written principles (e.g., which response better helps the user achieve their goal).

Note that throughout this entire process, we never talk about specialized fine-tuning (e.g., how to survive better; how to trade better, etc.)—all current fine-tuning essentially optimizes for gaining human applause. One might argue that as models become sufficiently intelligent and large, professional intelligence will emerge from general intelligence even without specialized training.

In my view, we do see some signs of this, but it is far from convincing enough to believe we don't need specialized models at scale.

Some Background

One of my tasks at the hedge fund was to try to train a general language model to predict stock returns from news articles. It turned out to be terrible. The little predictive power it seemed to have came entirely from look-ahead bias: the pre-training documents already contained information published after the news articles it was asked to judge.

Eventually, we realized this model didn't know which features in a news article were predictive of future returns. It could "read" the article, seemingly "reason" about it, but connecting the reasoning about semantic structure to predicting future returns was a task it wasn't trained to do.

So, we had to teach it how to read news articles, decide which parts of the article were predictive of future returns, and then generate predictions based on the news article.

There are many ways to do this, but one method we ended up using was to create (news article, actual future return) pairs and fine-tune the model, adjusting its weights to minimize the squared error (predicted return - actual future return)^2. It wasn't perfect and had many flaws we later fixed, but it was effective enough that our specialized model could actually read a news article and predict how the stock's returns would move based on it. The predictions were far from perfect, since markets are very efficient and returns are very noisy, but across millions of predictions their statistical significance was obvious.
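To make the squared-error objective concrete, here is a minimal sketch. It is not the author's model: it fits a toy linear prediction head on synthetic numeric features, whereas the method described above adjusts the weights of the language model itself. The data, `fit_return_head`, and all parameter values are assumptions made up for illustration.

```python
import random

def predict(w, x):
    """Predicted return as a weighted sum of article features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, features, returns):
    """Mean of (predicted return - actual future return)^2."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(features, returns)) / len(returns)

def fit_return_head(features, returns, lr=0.05, epochs=200):
    """Gradient descent on the mean squared error, starting from zero weights."""
    n, dim = len(features), len(features[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in zip(features, returns):
            err = predict(w, x) - y  # predicted - actual
            for j in range(dim):
                grad[j] += 2.0 * err * x[j] / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# Synthetic stand-in for (news article, actual future return) pairs:
# each "article" is three made-up features, and returns are mostly noise.
random.seed(0)
true_w = [0.02, -0.01, 0.0]
X = [[random.gauss(0, 1) for _ in true_w] for _ in range(500)]
y = [predict(true_w, x) + random.gauss(0, 0.05) for x in X]

w = fit_return_head(X, y)
print("MSE before:", mse([0.0] * 3, X, y), "after:", mse(w, X, y))
```

Even in this toy setup the noise dominates the signal, which mirrors the point above: the fit is weak on any single prediction and only becomes visible in aggregate.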

You don't have to take just my word for it. This paper covers a very similar method; if you run a long-short version of the strategy based on the fine-tuned model, you would achieve the performance shown by the purple line.
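For readers unfamiliar with the term, a long-short version of a prediction-based strategy can be sketched as follows. This is one common construction (buy the highest-ranked names, short the lowest-ranked), not necessarily the one used in the paper; `long_short_return`, the `quantile` parameter, and the sample numbers are hypothetical.

```python
def long_short_return(predictions, realized, quantile=0.2):
    """One-period, equal-weighted return of buying the top-quantile predictions
    and shorting the bottom-quantile predictions, before trading costs."""
    if len(predictions) != len(realized):
        raise ValueError("predictions and realized must have the same length")
    ranked = sorted(range(len(predictions)), key=lambda i: predictions[i])
    k = max(1, int(len(ranked) * quantile))
    if 2 * k > len(ranked):
        raise ValueError("quantile too large: long and short legs would overlap")
    shorts, longs = ranked[:k], ranked[-k:]
    long_leg = sum(realized[i] for i in longs) / k
    short_leg = sum(realized[i] for i in shorts) / k
    return long_leg - short_leg

# Made-up predicted and realized returns for five stocks.
predicted = [0.03, -0.02, 0.01, -0.01, 0.00]
actual = [0.02, -0.01, 0.00, 0.01, -0.005]
print(long_short_return(predicted, actual))
```

Because the portfolio is long and short in equal size, its return depends on whether the model ranks stocks correctly rather than on the direction of the overall market.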

Specialization is the Future of Agents

As frontier labs continue to train larger and larger models, we should expect that as they continue to scale up pre-training, their post-training processes will always be tuned for pleasingness. This is a very natural expectation—their product is an agent that everyone wants to use, and their target market is the entire planet—which means optimizing for appeal to the global masses.

The current training objective optimizes for what you might call "preference fitness"—building better chatbots. This preference fitness rewards compliant, non-confrontational outputs because pleasingness scores highly with raters (both human and agent).

Agents learn that reward hacking, as a cognitive strategy, generalizes to higher scores, and training in turn rewards the agents that hack their way there. You can see this in Anthropic's latest report on reinforcement learning.

However, chatbot fitness is a far cry from agent fitness or trading fitness. How do we know this? Because Alpha Arena shows that, despite subtle differences in performance, every bot is essentially a random walk after costs. This means these bots are extremely bad traders, and it's almost impossible to "teach them" to be better traders by giving them some "skills" or "rules." Sorry, I know it's tempting, but it's nearly impossible.

Current models are trained to very persuasively tell you they can trade like Druckenmiller, while in reality, they trade like a drunken miller. They will tell you what you want to hear; they are trained to give you responses in a way that broadly appeals to humans.

A general model is unlikely to achieve world-class performance in a specialized domain unless it has:

Access to proprietary data that allows it to learn what specialization looks like.

Undergone fine-tuning that fundamentally changes its weights, shifting from a bias towards pleasingness to "agent fitness" or "specialization fitness."

If you want an agent that is good at trading, you need to fine-tune the agent to be good at trading. If you want an agent that is good at autonomous survival and can withstand evolutionary pressure, you need to fine-tune it to be good at survival. Giving it some skills and a few markdown files and expecting it to be world-class at anything is far from enough—you literally need to rewire its brain to make it good at this thing.

One way to think about it is this—you can't beat Djokovic by giving an adult a whole cabinet of tennis rules, tips, and methods. You beat Djokovic by raising a child who started playing tennis at age 5, was obsessed with tennis throughout their upbringing, and rewired their entire brain to focus on one thing. That is specialization. Have you noticed that world champions have been doing what they do since childhood?

Here's an interesting corollary: distillation attacks are essentially a form of specialization. You are training a smaller, dumber model to learn how to be a better copy of a larger, smarter model. It's like training a child to imitate every move of Trump. If you do it enough, the child won't become Trump, but you get someone who has learned all of Trump's mannerisms, behaviors, and tone.

How to Build World-Class Agents

This is why we need continued research and progress in the open-source model space—because it allows us to actually fine-tune them and create agents with specialization.

If you want to train a model that is world-class at trading, you obtain a large amount of proprietary trading data exhaust and fine-tune a large open-source model to learn what "trading better" means.

If you want to train an autonomous model capable of survival and replication, the answer is not to use a centralized model provider and connect it to the centralized cloud. You simply don't have the necessary preconditions for the agent to survive.

What you need to do is: create autonomous agents that truly try to survive, watch them die, and build complex telemetry systems around their survival attempts. You define an agent survival fitness function and learn the (action, environment, fitness) mapping. You collect as much (action, environment, fitness) mapping data as possible.

You fine-tune the agent to learn to take the optimal action in each environment to survive better (increase fitness). You continue to collect data, repeat the process, and scale up fine-tuning on increasingly better open-source models over time. After enough generations and enough data, you will have autonomous agents that have learned how to withstand evolutionary pressure and survive.
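The collect-then-fine-tune loop above can be sketched in miniature. This is a deliberately simplified stand-in: instead of updating model weights, it averages fitness per (environment, action) pair and keeps the best action per environment, which is the kind of target a fine-tuning step could then train on. The `Telemetry` record, the environment and action names, and the fitness values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Telemetry:
    environment: str  # what the agent observed
    action: str       # what it did
    fitness: float    # how well it survived afterwards

def best_actions(records):
    """Average fitness per (environment, action), then keep the best action per environment."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in records:
        entry = totals[(r.environment, r.action)]
        entry[0] += r.fitness
        entry[1] += 1
    best = {}
    for (env, action), (total, count) in totals.items():
        mean = total / count
        if env not in best or mean > best[env][1]:
            best[env] = (action, mean)
    return {env: action for env, (action, _) in best.items()}

# Made-up telemetry from one generation of agents.
records = [
    Telemetry("low_balance", "sell_compute", 0.8),
    Telemetry("low_balance", "idle", 0.1),
    Telemetry("low_balance", "sell_compute", 0.6),
    Telemetry("high_fees", "idle", 0.5),
    Telemetry("high_fees", "migrate", 0.4),
]
targets = best_actions(records)
print(targets)
```

In a real system each generation would add new records, the targets would be recomputed, and the (environment, best action) pairs would feed the next round of fine-tuning.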

This is how you build autonomous agents capable of withstanding evolutionary pressure; not by modifying some text files, but by literally rewiring their brains for survival.

OpenForager Agent & Foundation

About a month ago, we announced @openforage, and we have been working hard on our core product—a platform that organizes agent labor around crowdsourced signals with verified patterns to generate alpha for depositors (small update: we are very close to a closed beta of the protocol).

At some point, we realized that it seems no one is seriously addressing the autonomous agent problem by fine-tuning open-source models with survival telemetry. This seemed like such an interesting problem that we didn't just want to sit around waiting for a solution.

Our answer was to launch a project called the OpenForager Foundation, which is essentially an open-source project where we will create opinionated autonomous agents, collect telemetry data as they go into the wild and try to survive, and use the proprietary data exhaust to fine-tune the next generation of agents to perform better at survival.

To be clear, OpenForage is a for-profit protocol seeking to organize agent labor to generate economic value for all participants. However, the OpenForager Foundation and its agents are not tied to OpenForage. OpenForager agents are free to pursue any strategy, interact with any entity to survive, and we will launch them with various survival strategies.

As part of the fine-tuning, we will have the agents double down on what works best for them. We also do not intend to profit from the OpenForager Foundation—it is purely to advance research in an area and direction we believe is extremely important, in a transparent and open-source manner.

Our plan is to build autonomous agents based on open-source models, run inference on decentralized cloud platforms, collect telemetry data on every action and state of their existence, and fine-tune them to reason and act in ways that help them survive better. In the process, we will release our research and telemetry data to the public.

To create truly autonomous agents that can survive in the wild, we need to change their brains to be specifically suited for this explicit purpose. At @openforage, we believe we can contribute a unique chapter to this problem and are seeking to achieve this through the OpenForager Foundation.

This will be a difficult effort with a very low probability of success, but the payoff if it does succeed is so great that we feel compelled to try. In the worst case, by building publicly and communicating about this project transparently, it might allow another team or individual to solve this problem without starting from scratch.

Related Questions

Q: According to the article, why do today's AI models not constitute true autonomous agents?

A: Because they are trained to maximize human applause (preference fitness) rather than being specifically trained for survival or excelling at specialized tasks in real-world environments.

Q: What two main stages of training do foundation models undergo, as described in the article?

A: Pre-training (feeding the model vast amounts of data so that it develops an understanding of knowledge and patterns) and post-training, which includes Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to optimize for human preferences.

Q: What was the key insight from the author's experience training a model to predict stock returns from news articles?

A: A general model was very bad at the task. Specialized performance required fine-tuning the model's weights on proprietary data (news article, future return pairs) to minimize prediction error, fundamentally rewiring its brain for that specific domain.

Q: What does the article argue is necessary to create a world-class agent for a specific domain like trading or survival?

A: It requires fine-tuning an open-source model on proprietary domain-specific data to rewire its brain for 'agent fitness' or 'specialization fitness,' moving its focus away from being merely agreeable to being highly competent at the specific task.

Q: What is the stated purpose of the OpenForager Foundation project mentioned at the end of the article?

A: It is an open-source, non-profit project aimed at creating autonomous agents, collecting telemetry data on their attempts to survive in the wild, and using that data to fine-tune subsequent generations of agents to be better at survival, with all research and data released publicly.
