OpenAI Post-Training Engineer Weng Jiayi Proposes a New Paradigm Hypothesis for Agentic AI

marsbit发布于2026-05-11更新于2026-05-11

文章摘要

OpenAI engineer Weng Jiayi's "Heuristic Learning" experiments propose a new paradigm for Agentic AI, suggesting that intelligent agents can improve not just by training neural networks, but also by autonomously writing and refining code based on environmental feedback. In the experiment, a coding agent (powered by Codex) was tasked with developing and maintaining a programmatic strategy for the Atari game Breakout. Starting from a basic prompt, the agent iteratively wrote code, ran the game, analyzed logs and video replays to identify failures, and then modified the code. Through this engineering loop of "code-run-debug-update," it evolved a pure Python heuristic strategy that achieved a perfect score of 864 in Breakout and performed competitively with deep reinforcement learning (RL) algorithms in MuJoCo control tasks like Ant and HalfCheetah. This approach, termed Heuristic Learning (HL), contrasts with Deep RL. In HL, experience is captured in readable, modifiable code, tests, logs, and configurations—a software system—rather than being encoded solely into opaque neural network weights. This offers potential advantages in explainability, auditability for safety-critical applications, easier integration of regression tests to combat catastrophic forgetting, and more efficient sample use in early learning stages, as demonstrated in broader tests on 57 Atari games. However, the blog acknowledges clear limitations. Programmatic strategies struggle with tasks requiring long-...

Over the past decade, the advancement of AI has primarily relied on one path: feeding more data and computing power into larger models, allowing experience to accumulate within neural network parameters. This path has led to the leap in large models after ChatGPT, but it has also left behind a persistent challenge: as models become increasingly powerful, the reasons behind their successes and failures often remain difficult to explain and correct.

Recent experiments by OpenAI engineer Weng Jiayi suggest another possibility: within a clear objective, a runnable environment, and a feedback loop, AI can improve not only by training models but also by "autonomously modifying code."

On May 8, 2026, Weng Jiayi systematically documented this set of experiments in his personal blog "Learning Beyond Gradients" and simultaneously made the code repository, CSV experiment logs, and video replays public. He has long focused on reinforcement learning and post-training infrastructure, participated in the initial launch of ChatGPT, and contributed to projects like GPT-4, GPT-4 Turbo, GPT-4o, o-series, and GPT-5. Before joining OpenAI, he earned his bachelor's degree from the Department of Computer Science at Tsinghua University and his master's degree from Carnegie Mellon University. He is also a main author of the open-source reinforcement learning library Tianshou and the high-performance parallel environment engine EnvPool.

Image generated by AI

He had Codex repeatedly write policy code, run environments, read logs, review replays, locate failures, then modify code, add tests, and continue evaluation. After multiple iterations, Codex "cultivated" a set of pure Python programmatic strategies: it achieved a theoretical perfect score of 864 points in Atari Breakout and also produced results in robot control simulation environments like MuJoCo Ant and HalfCheetah that were close to those of common deep reinforcement learning algorithms.

The truly significant aspect of these experiments lies in a core question: When the coding agent is sufficiently capable, must learning necessarily occur within neural network weights?

In this experimental setup, experience is written into code, tests, logs, and replays, becoming a software system that can be read, modified, reviewed, and audited. If this direction continues to hold, the next step for Agentic AI might not only be training larger models but also enabling models to participate in maintaining a continuously evolving engineering system.

01 From 387 Points to a Perfect Score: An Engineering Loop

Weng Jiayi wrote in his blog that the starting point for this experiment was actually an engineering need. While maintaining EnvPool in his spare time, he required a cheaper method than "running a neural network every time" to test whether the game environment was functioning correctly, as placing neural networks in CI was too expensive. The original question was: Could he write cheap, reproducible, heuristic rules that were clearly better than a random policy, to drive the environment to information-rich states?

He used Codex (base model gpt-5.4) to attempt writing a completely rule-based version. The initial prompt was very direct: "Write a strategy that can solve Breakout." The result was unsatisfactory. A low score itself provided no information—the action semantics could be wrong, state detection could be wrong, the evaluation process could be wrong, or the policy structure itself could be too weak.

Subsequently, Weng Jiayi changed the task format. He no longer asked Codex to simply deliver a policy.py file; instead, he required it to maintain a complete loop: probe actions and observations, write state detectors, write the policy, run complete episodes, record trials.jsonl and summary.csv, generate videos or curves, inspect failure modes, modify the policy, simplify code, and run regressions.

The experimental log for Breakout clearly recorded this process. In the first round, Codex confirmed the action space and observation shape, identified the colors of the ball, paddle, and bricks from the RGB frames, and then used image labels to scan the 128-byte Atari RAM. The initial baseline scored only 99 points. After adding tunnel offset logic, the score increased to 387 points.

387 points was a deceptively high local optimum. The strategy could stably hit the ball, but the ball path was trapped in a periodic loop: no lives were lost, but no new bricks could be broken, and the score was stuck. If a human were writing the code, they might continue fine-tuning the "accuracy of hitting the ball." Codex watched the video and the last few dozen steps of the trajectory, and identified the problem as a lack of disturbance in the ball's path.

Image: Atari Breakout gameplay. The player controls the bottom paddle to bounce a ball, breaking layers of colored bricks above. Codex achieved the theoretical perfect score of 864 points in this game.

Codex then added a mechanism to "break the cycle": if no reward was received for a long time, periodically add an offset to the landing point prediction to knock the ball out of the local loop. The score jumped from 387 to 507. During further iterations, a new problem emerged: for fast low balls, conventional interception would cause the paddle to "over-lead" and drift away. Codex added a `fast_low_ball_lead_steps=3` parameter, and the score jumped from 507 to 839. The final improvement from 839 to 864 resembled maintaining an already complex system: trying deadband, serve offset, stuck offset, brick balance bias, lookahead steps; many directions were ineffective. The final useful change was a late-stage condition: "After the first wall of bricks is cleared, enable the stuck offset only when the ball is far from the paddle, and gradually release it when the ball is close."

The final RAM default configuration stably output 864 / 864 / 864 points across three episodes, reaching the theoretical limit of Breakout. Codex then migrated the same geometric controller to a pure vision input version—without reading RAM, relying solely on RGB segmentation to identify the paddle, ball, and brick balance. The vision version initially scored 310 points, then 428 points, and reached 864 points after the seventh local episode, corresponding to 14,504 local policy environment steps.

Image: Sample efficiency curve of Codex on Breakout. The blue line is the version that reads game memory (RAM), and the red line is the vision-only version (Vision). The RAM version experienced several jumps: 99 → 387 → 507 → 839 → 864, finally reaching the perfect score for the first time at episode 81, with a cumulative 1.5 million environment steps; the Vision version, migrating the mature structure from the RAM version, reached 864 points with only 7 episodes and approximately 14,500 environment steps.

Weng Jiayi specifically noted that this should not be understood as "the vision input started from scratch and reached a perfect score using only 14.5K steps." The actual process was that Codex first discovered the geometric controller, cycle-breaking mechanism, and late-stage offset release in the RAM version. Once the structure was stable, the state reading layer was switched from RAM to RGB. The 14.5K steps represent the migration budget for the vision version.

02 Defining Heuristic Learning

Finding a name for this evolving "software policy" was more difficult than writing the first version of the policy. Weng Jiayi ultimately named this process Heuristic Learning (HL) and termed the object it maintains as a Heuristic System (HS).

According to his blog definition, HL is composed of program code. Like today's common deep reinforcement learning, it has a loop of state, action, feedback, and update. The difference is that the object being updated is the software structure, not neural network parameters; its feedback, digested by the coding agent, can come from environmental rewards, test cases, logs, videos, replays, or human feedback; its update does not use backpropagation, but rather the coding agent directly edits the policy, state detectors, tests, configurations, or memories.

It should be added that the concept of "using programs rather than neural networks as policies" is not Weng Jiayi's original creation. Academic discussions on Programmatic RL have been ongoing for years: the PROPEL framework proposed by Rice University and Caltech in 2019 researched reinforcement learning methods representing policies as short programs in a symbolic language; the 2021 LEAPS work further learned program embedding spaces, combining differentiable program policies with RL training; the HPRL (Hierarchical Programmatic Reinforcement Learning) presented at ICML 2023 allows a meta-policy to combine multiple programs; the LLM-GS framework from National Taiwan University and Microsoft in 2024 uses LLM's programming ability and commonsense reasoning to guide the search for programmatic RL policies.

The consensus from this research is that, compared to neural policies, programmatic policies possess better interpretability, formal verifiability, and generalization ability to unseen scenarios.

Weng Jiayi's substantive contribution this time lies in treating the coding agent as the engineering channel for maintaining the heuristic system. In the past, doing programmatic RL either relied on manually designed domain-specific languages or search algorithms within restricted program spaces; Weng Jiayi, however, uses Codex to integrate code, logs, tests, video replays, and parameter adjustments into the same agent workflow, drastically reducing the iteration cost of program policies at once. In other words, he is arguing for a new engineering path: when the coding agent is sufficiently capable, those heuristic strategies once deemed "too expensive to maintain" might become cost-effective again.

Weng Jiayi provided a comparison table in his blog to clearly illustrate the differences between HL and Deep RL: in terms of policy form, the former consists of rules, state machines, controllers, model predictive control (MPC), and macro actions composed into code, while the latter consists of neural network parameters; in terms of state form, the former uses explicit variables, detectors, and caches, while the latter uses network-readable observation vectors; in terms of feedback form, the former treats tests, logs, and replays as valid signals, while the latter primarily relies on fixed reward functions; in terms of memory form, the former can explicitly store trials, summaries, failure reasons, and version diffs, while the latter has essentially none in on-policy algorithms and relies on replay buffers in off-policy algorithms.

This comparison demonstrates that HL possesses some engineering attributes: the policy is interpretable and can be translated into natural language; sample efficiency is measured in units of "one effective code change," not slow gradient updates; old capabilities can become regression tests, fixed-seed replays, or golden cases; overfitting to training seeds or test loopholes can be constrained through simplification, regression checks, and multi-seed evaluation; old capabilities don't have to reside solely in weights but can also reside in rule sets and tests, which partly addresses the catastrophic forgetting problem that neural networks have long struggled to solve.

03 Bulk Validation on Atari57: Boundaries and Shortcomings

If focusing only on Breakout, the story could easily be simplified to "AI wrote a perfect strategy." But Weng Jiayi didn't stop at Breakout; he scaled this Codex workflow in bulk to Atari57, running 57 games, two observation modes, and three repetitions each, totaling 342 "unattended" search trajectories.

The experimental design was quite rigorous. Each game was tested with two input methods: one directly reading game memory, and the other viewing only the screen. Each method was independently repeated three times. This produced a total of 342 "unattended" experimental trajectories: each Codex agent received the same prompt template, explored actions on its own, wrote code on its own, ran experiments on its own, and recorded results on its own, with no human providing hints. Constraints were strictly enforced: no training neural networks, no reading game source code, no exploiting any hidden information. All steps used for debugging and trial-and-error had to be counted in the total cost. This was to prevent Codex from cheating in any "peeking at the answers" way.

When measuring results, a metric called HNS (Human-Normalized Score) is commonly used—simply put, it standardizes the score of each game relative to "average human player performance = 1" for easy cross-game comparison.

Image: Sample efficiency comparison on the full Atari57 suite. The x-axis is environment steps (log scale), and the y-axis is HNS (Human-Normalized Score, where 1.0 indicates reaching average human player level). Codex's vision input version (red line) significantly outperforms the PPO baseline (blue/gray dashed lines) in early-stage efficiency, reaching 0.81 at 9.7 million steps, comparable to PPO's level around 10 million steps; Codex's memory input version (purple line) converges at 0.59.

Measured by this standard, Codex's early-stage efficiency appears quite impressive. With only 1 million environment steps consumed, Codex's median HNS for vision input had already reached 0.32, and for memory input, 0.26, significantly higher than that of classical reinforcement learning algorithms like PPO at the same stage. By 9.7 million steps, Codex's vision version reached 0.81, already close to PPO's level of approximately 0.88 to 0.92 at 10 million steps. If allowed to aggregate by selecting the better-performing input method for each game, Codex's median HNS was 0.83, OpenAI Baselines PPO2 was 0.80, and CleanRL EnvPool PPO was 0.98—essentially a tie.

However, Weng Jiayi himself calmly drew a boundary: this is only a comparison of environment interaction efficiency, without accounting for the costs of Codex reading logs, writing code, and watching videos. "Running fast" does not equal "low total cost," and the latter remains a black box for now.

More noteworthy is that Codex's performance across the 57 games was not uniform. In games with clear geometric structures like Breakout, Boxing, and Krull, both heuristic strategies and deep reinforcement learning could significantly surpass human levels; in games with clear rules like Asterix, Jamesbond, and Tennis, heuristic strategies were even stronger; but in fast-paced, complex-pattern games like Atlantis, VideoPinball, RoadRunner, and StarGunner, PPO still dominated.

The most cautionary counterexample is Montezuma's Revenge. This is a notorious "hard nut to crack" in reinforcement learning, where the protagonist needs to find keys, avoid enemies, and open doors in a complex underground labyrinth, with extremely sparse reward signals—a classic "long-term planning + failure recovery" challenge. Codex did score 400 points in this game, but examining the policy file it generated reveals that it's not a true "strategy" but a hardcoded sequence of 86 actions corresponding to 1,769 environment steps: more like memorizing a fixed route than learning to navigate a maze. Weng Jiayi specifically noted: "This is a boundary case and should not be understood as a generic Montezuma strategy."

Montezuma exposes the expressive limits of Heuristic Learning. Ordinary programmatic strategies are essentially reactive logic of "do this action when you see that state," struggling with tasks requiring strict action sequences, resuming plans from intermediate states, and long-horizon planning. Such tasks require not just more if-else statements but program structures closer to "macro-action combination + recoverable search state + long-term memory." This tells us one thing: even if the coding agent becomes very powerful, some problems cannot be contained by ordinary code.

04 If the Paradigm Holds, What Are the Industrial Implications?

Zooming out to an industrial perspective. If the Heuristic Learning path truly holds—meaning "coding agents can stably maintain programmatic strategies surpassing handcrafted rules and approaching RL baselines"—where does its practical significance lie?

The first application point is robot control, especially in structurally stable scenarios. The framework Weng Jiayi outlined in his blog involves hierarchical division of labor: joint-level HL, limb-level HL, full-body balance HL, and task-level HL. Lower levels handle safety and low-latency control, middle levels handle gait and contact, and higher levels handle tasks and long-term memory; the coding agent doesn't need to "understand walking"—it's more like an update channel inserted into the system, sending failure videos, sensor streams, and simulation results back to the system, and rewriting feedback into code, parameters, protection rules, and memories.

Scenarios like warehouse AGVs, inspection robots, factory robotic arms, and standardized sorting, where the environment structure is relatively fixed and safety boundaries are clear—if core control strategies can be solidified into lightweight code, robots wouldn't need to run a large policy network for every action step. Deployment-end reliance on high-power GPU inference cards would decrease, with more load handled by traditional controllers and local program logic.

This doesn't mean robots don't need GPUs; perception, localization, mapping, and semantic understanding still rely on neural networks. What changes is the role of the GPU, shifting from "burning compute for end-to-end action decisions every second" to "playing a periodic role in perception, offline simulation, policy generation, and anomaly analysis."

The second application point is the auditability of safety-critical scenarios. The most troublesome engineering problem with neural policies is the inability to locate the cause after a failure. When a robotic arm suddenly fails at a certain angle, a vehicle misjudges in an edge case, or a medical robot acts abnormally in a rare posture, engineers cannot answer "which weight caused this error." Ultimately, they can only add data, retrain, run regression tests, and bet that the new model hasn't introduced new problems.

If the policy exists in code form, state variables, conditional branches, failure logs, and regression tests are all visible; a dangerous action can be hardcoded to be prohibited, a corner case can be written as a test, and an erroneous state transition can be individually patched. This doesn't make the system inherently safer, but it allows safety issues to enter normal software engineering workflows for the first time—they can be code-reviewed, intercepted by CI, and responded to by SRE on-call. In fields requiring regulation and liability division, like autonomous driving, industrial robotic arms, and medical robots, this auditability itself is of commercial value.

The third application point is the engineering of continual learning and online learning. Weng Jiayi presented this as the main argument of the entire blog post. Catastrophic forgetting in neural networks is a structural problem: learning new things washes away old capabilities. HL also experiences forgetting, but in a more engineering form: a new rule fixes one failure mode but breaks an old scenario; a new memory repeatedly leads the agent in the wrong direction; a test range is too narrow, and the policy learns to exploit it; a patch modifies a shared interface, and old calling paths silently fail.

These problems don't disappear automatically, but they are issues that software engineering has dealt with for decades, with existing toolchains—regression testing, version diffs, fixed-seed replays, golden traces, and explicitly recorded failure directions.

A healthy HS must perform two operations simultaneously: absorbing new feedback and compressing historical patches. An HS that only grows without reduction will eventually become a "code ball of mud" no one dares to touch. In other words, HL transforms the mathematical problem of "how to update parameters" into the engineering problem of "how to maintain a software system that continuously absorbs feedback."

The latter is not necessarily easier, but it is closer to the existing boundaries of human capability.

The fourth application point is capability accumulation in Agent products. What current Agent products lack most are stable tool invocation, reliable execution chains, reusable failure experiences, and auditable task records. If HL's logic holds, an Agent's memories during execution would precipitate into code assets that can be reused across sessions, users, and tasks. It can directly interface with existing DevOps processes and also means that Agents from different companies and teams can share heuristics without needing to share models—something the neural network approach cannot achieve.

However, it must be emphasized: all four application points depend on further validation of the HL path on more complex tasks. Breakout and Ant are relatively clean environments. Real robots face changes in ground friction, lighting variations, actuator delays, and sensor noise—none of which have been systematically evaluated in public materials. The Montezuma counterexample has already shown that long-horizon tasks require program forms beyond ordinary if-else. How far this vision can go depends on the next phase of experiments.

05 Technical Debt Shifts from Weights to Code

Weng Jiayi's assessment in his blog is measured. He wrote that HL cannot accomplish everything neural networks can do; it is limited by what code can express, especially in complex perception and long-horizon generalization. With today's understanding, he cannot imagine an agent using pure Python code without any neural networks to solve ImageNet. The truly worthwhile question is how to combine neural networks with HL to jointly address Online Learning and Continual Learning.

The division of labor he proposes borrows from the System 1 / System 2 framework: specialized shallow neural networks take on part of System 1, responsible for fast perception, classification, and object state estimation; HL also takes on part of System 1, responsible for processing fresh data, rules, tests, replays, memories, safety boundaries, and local recovery; the LLM agent acts as System 2, providing feedback and improvement data to HL, and periodically extracting information from data generated by HL to update itself.

If deep learning over the past decade has proven that "experience can be compressed into weights," then the hypothesis Weng Jiayi proposes this time is another proposition: in the era of coding agents, experience might once again become readable, modifiable, testable software.

This article is from the WeChat public account "Tencent Technology," author: Xiao Jing, editor: Xu Qingyang

你可能也喜欢

唐纳德·特朗普的公司再度出售大批比特币！

据报道，与美国总统唐纳德·特朗普的媒体公司Trump Media & Technology Group相关的地址，疑似向加密货币交易所CryptoCom转移了约2628枚比特币，价值约1.65亿美元。此前有分析称，该公司总计购买了11542枚比特币，平均成本为每枚11.85万美元。据称，2026年至今，相关地址已转出约7281枚比特币，目前仍持有约4261枚。 Trump Media在比特币投资上的已实现和未实现损失总额估计约为5.55亿美元。不过，将比特币转移至交易所并不一定意味着出售资产，也可能是为了托管、流动性管理或其他财务操作。目前尚无法确定其具体意图，但从冷钱包向中心化交易所转移通常被视为潜在的出售行为。 *本文不构成投资建议。

cryptonews.ru44分钟前

cryptonews.ru44分钟前

帕克·刘易斯解释为何比特币仍是最佳货币

知名比特币分析师帕克·刘易斯在访谈中批评了某些上市公司以“数字信贷”形式销售永续优先股的营销策略，认为这从根本上扭曲了比特币的本质。他指出，比特币在算法层面不具备法币收益性，承诺定期分红主要依赖牛市吸引新投资者来维持，风险极高。刘易斯引用数据说明此类衍生品的巨大风险：全球信贷市场规模达300万亿美元，而永续优先股市场仅约1万亿美元，这表明机构有意规避这种无还款期限的资产，将风险转嫁给信息不足的散户。针对“比特币波动性太大”的常见观点，他认为波动性是这一供应量严格受限的新资产被大规模采用过程中的自然数学结果。新人入场需出更高价从早期持有者手中购买，导致价格剧烈波动。他建议投资者直接持有比特币，这比投资MicroStrategy等公司发行的衍生品更安全。投资者将焦点从直接持有加密货币转向公司衍生品，会忽视法币急速贬值的真正威胁。刘易斯以自创的“肋眼牛排指数”为例，指出其本地超市一款牛排价格从2020年的19.99美元涨至37.99美元，反映年化约12-13%的真实通胀，远超官方平滑后的CPI数据。在全球通胀环境下，最明智、保守且安全的策略仍是直接持有比特币并完全掌控私钥。追逐加密货币国库股等公司工具的收益只会叠加隐性系统风险，而理解去中心化货币的本质才能有效保护财富免受宏观经济动荡影响。

cryptonews.ru56分钟前

cryptonews.ru56分钟前

比特币为何在美联储强硬暂停后守住 64,000 美元关口

比特币在美联储暂停加息后维持在64,000美元附近，结束了7月的交易。市场对美联储维持利率不变的决定反应剧烈，但并未获得政策即将转向的明确信号。在此背景下，资金重新流入比特币现货ETF，加密货币总市值保持在2.29万亿美元左右，主要山寨币走势分化。投资者目前处于观望状态。一方面，高利率和美联储的强硬立场抑制了风险偏好；另一方面，市场未出现恐慌性抛售、比特币ETF恢复资金净流入以及关键价位的韧性表明，数字资产市场尚未准备好大幅下跌。 **关键信息：** - 美联储维持利率在3.50%-3.75%，投票结果为9:3，三位委员支持加息。 - 比特币现货ETF净流入3210万美元，结束了连续流出；以太坊ETF则净流出约1865万美元。 - 比特币在63,000-66,000美元区间内盘整，63,000-63,500美元构成支撑，66,000美元是近期阻力。 - 以太坊价格在1,900美元附近承压，但其网络基本面（如质押意愿）保持稳定。 - 资金在主要加密货币间轮动，比特币重获机构青睐，Solana相关产品也有资金流入。 - 美国CLARITY Act法案审议被推迟至秋季，降低了其在2026年内通过的可能性。技术层面，比特币能否在63,000美元上方保持稳固、以太坊能否守住1,860美元以及机构资金流入能否持续，被视为市场能否在2026年下半年构筑复苏基础的关键信号。

cryptonews.ru56分钟前

cryptonews.ru56分钟前

ARK Invest公司凯西·伍德以683万美元购入约109,129股Circle股票

凯西·伍德领导的ARK Invest公司通过旗下三只交易所交易基金，购入约109,129股Circle股票，价值近683万美元。此次买入前，Circle刚获得纽约州金融服务部门颁发的信托运营许可证，其信托实体将更名为Circle New York Trust。公司首席执行官杰里米·阿莱尔称此为公司的长期目标。尽管获得监管批准，7月31日Circle股价仍下跌2.54%，收于62.61美元，市场反应显示投资者可能未将该许可视为股价上涨动力。同期，ARK Invest还趁科技股抛售潮，购入了价值约4020万美元的特斯拉、SpaceX和英伟达股票，并减持了Shopify、Cloudflare、CrowdStrike等公司的股份。

cryptonews.ru57分钟前

ARK Invest公司凯西·伍德以683万美元购入约109,129股Circle股票

cryptonews.ru57分钟前

涉及XRP的诈骗案嫌疑人被捕，从71名投资者处窃取900万美元

据韩国《朝鲜日报》报道，7月30日，首尔市警察厅宣布逮捕了三名涉嫌运营XRP投资诈骗平台的嫌疑人。该团伙被控在2023年10月16日至23日期间，通过网站Fxrpntwork.com，从71名投资者处骗取了约340万枚XRP（当时价值约900万美元），随后关闭网站并消失。诈骗者通过博客、网络文章和YouTube视频推广该平台，承诺保本及每月1.5%至1.8%的收益，并指示投资者通过海外平台将XRP从韩国交易所转入其控制的钱包。警方警告投资者切勿轻信YouTube等平台的未经验证信息，投资前务必核查官方来源。目前，两名29岁嫌疑人已被捕，一名34岁同伙将被移送检方，另有一名29岁嫌疑人在海外被国际刑警组织通缉。该诈骗平台模仿了合法项目Flare Network及其FAssets系统的品牌名称（如FXRP），以此制造可信假象。Ripple公司曾警告，此类骗局常通过复制名称、标识和网站来误导投资者。此案再次表明，承诺“保本高收益”是加密货币诈骗的常见特征。诈骗者常利用伪造的促销活动、推荐和紧迫感诱使受害者转账。FBI数据显示，加密货币投资诈骗在全球造成巨额损失。韩国调查人员已冻结价值173亿韩元的涉案资产，并追查可能存在的更多受害者和共犯。

cryptonews.ru59分钟前

cryptonews.ru59分钟前

交易

现货

OpenAI Post-Training Engineer Weng Jiayi Proposes a New Paradigm Hypothesis for Agentic AI

文章摘要

01

From 387 Points to a Perfect Score: An Engineering Loop

02

Defining Heuristic Learning

03

Bulk Validation on Atari57: Boundaries and Shortcomings

04

If the Paradigm Holds, What Are the Industrial Implications?

05

Technical Debt Shifts from Weights to Code

热门币种推荐

相关问答

你可能也喜欢

唐纳德·特朗普的公司再度出售大批比特币！

帕克·刘易斯解释为何比特币仍是最佳货币

比特币为何在美联储强硬暂停后守住 64,000 美元关口

ARK Invest公司凯西·伍德以683万美元购入约109,129股Circle股票

涉及XRP的诈骗案嫌疑人被捕，从71名投资者处窃取900万美元

交易

热门文章

火币成长学院：AI与Crypto深度研报：算法与账本的共生时代

从H2A到A2A：AI Agent经济体与Crypto新机遇

美股TradFi：传统金融在AI IPO浪潮下的稳健锚点

相关讨论

热门问答

热门分类

热门标签