# Related Articles on Benchmark

The HTX News Center offers the latest articles and in-depth analysis on "Benchmark", covering market trends, project news, technology developments, and regulatory policy in the crypto industry.

Auto Research Era: 47 Tasks Without Standard Answers Become the Must-Test Leaderboard for Agent Capabilities

The article introduces Frontier-Eng Bench, a new benchmark for AI agents developed by the Navers lab at Einsia AI. Unlike traditional tests with clear answers, it presents 47 complex, real-world engineering tasks, such as optimizing the stability of underwater robots, battery fast-charging protocols, or noise control in quantum circuits. These tasks have no single correct solution, only continuous optimization toward a limit. This shifts AI evaluation from static knowledge retrieval to a dynamic "engineering closed loop": the agent must propose solutions, run simulations, interpret errors, adjust parameters, and re-run experiments to improve performance iteratively. The process tests an agent's ability to learn and evolve from long-term feedback, much as a human engineer weighs trade-offs between power, safety, and performance. The benchmark's key findings reveal two patterns: (1) improvements follow a power-law decay, becoming smaller and harder to achieve as optimization progresses; and (2) while exploring multiple solution paths (breadth) helps, sustained depth along a single path is crucial for breakthroughs. The research suggests this marks a step toward "Auto Research," in which AI systems autonomously conduct continuous, tireless optimization in scientific and engineering domains: humans set high-level goals, while AI agents handle the iterative experimentation and refinement. This could fundamentally change research and development workflows.

marsbit 05/13 07:06
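
The engineering closed loop described in the Frontier-Eng Bench summary (propose, simulate, interpret, adjust, re-run) and the reported power-law decay of improvements can be illustrated with a small toy sketch. This is not Frontier-Eng Bench's harness or scoring code, which the article does not describe. The function names `closed_loop_optimize` and `fit_power_law`, the quadratic stand-in for a simulator, and the hill-climbing search strategy are all hypothetical choices made only for illustration.

```python
import math
import random
from typing import Callable, List, Tuple


def closed_loop_optimize(
    simulate: Callable[[float], float],
    x0: float,
    steps: int = 50,
    step_size: float = 0.5,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Toy propose -> simulate -> interpret -> adjust loop (simple hill climbing).

    `simulate` is a hypothetical stand-in for an engineering simulator that
    returns a score (higher is better). Returns the best parameter found and
    the best score recorded after each iteration.
    """
    rng = random.Random(seed)
    best_x, best_score = x0, simulate(x0)
    history = [best_score]
    for _ in range(steps):
        candidate = best_x + rng.gauss(0.0, step_size)  # propose a new design
        score = simulate(candidate)                     # run the "experiment"
        if score > best_score:                          # interpret: did it help?
            best_x, best_score = candidate, score       # keep the improvement
        else:
            step_size *= 0.9                            # adjust: search more locally
        history.append(best_score)
    return best_x, history


def fit_power_law(gains: List[float]) -> Tuple[float, float]:
    """Fit gain_t ~ a * t**(-b) by least squares on log(gain) versus log(t).

    t is the 1-based iteration index; iterations with no gain (<= 0) are skipped.
    Returns (a, b); a positive b means gains shrink as t grows.
    """
    pts = [(math.log(t), math.log(g)) for t, g in enumerate(gains, start=1) if g > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive gains to fit")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


# Hypothetical objective whose ceiling (0.0) sits at x = 3.
best_x, history = closed_loop_optimize(lambda x: -(x - 3.0) ** 2, x0=0.0)
gains = [after - before for before, after in zip(history, history[1:])]
```

Passing `gains` from such a run to `fit_power_law` gives one rough way to check whether improvements shrink as a power of the iteration count, the pattern the article reports; a positive fitted `b` indicates diminishing returns. The toy loop follows a single path in depth, so probing the breadth-versus-depth finding would require running several such loops from different starting points.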

Institutional Adoption of Prediction Markets Stuck at the Third Stage

Prediction markets are moving from niche platforms focused on elections and sports toward mainstream financial tools, a trend highlighted at Kalshi Research's inaugural conference. Sports still dominate trading volume (around 80%), but non-sports categories such as macroeconomics, politics, and entertainment are growing faster, signaling a shift from entertainment-driven trading toward tools for information and risk management. Institutions, including Wall Street firms, increasingly use prediction markets as a data reference (Stage 1 adoption), and some have progressed to system integration (Stage 2). Full-scale trading (Stage 3), however, remains limited because margin trading is not available: positions must be fully collateralized, which is a barrier for entities that depend on leverage. Kalshi is working with regulators to introduce margin mechanisms. Participants such as Goldman Sachs and CNBC emphasized the value of real-time pricing for events (e.g., Fed decisions, tariffs), which provides benchmarks that were previously unavailable. The path to maturity mirrors that of earlier financial instruments such as options, and prediction markets are expected to become institutional staples within five years. Political leaders, including Trump and Schumer, now cite Kalshi odds, underscoring the platform's growing influence. Kalshi rewards domain expertise over traditional finance backgrounds, attracting participants from fields as diverse as music and poker. Ultimately, prediction markets are evolving into critical infrastructure for pricing uncertainty.

marsbit 04/17 02:27
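
The full-collateral constraint mentioned in the prediction-market summary is easiest to see with a worked example. The sketch below assumes the usual binary event-contract design, in which a contract pays $1 if the event occurs and nothing otherwise. The function name `collateral_required`, the `margin_rate` parameter, and the 20% margin figure are hypothetical and do not describe any mechanism that Kalshi or its regulators have announced.

```python
from typing import Literal


def collateral_required(
    contracts: int,
    price: float,
    side: Literal["yes", "no"],
    margin_rate: float = 1.0,
) -> float:
    """Dollars a trader must post for a binary event-contract position.

    Assumes each contract settles at $1.00 if the event happens and $0.00
    otherwise, so the worst-case loss per contract is the price paid for YES
    and (1 - price) for NO. margin_rate = 1.0 models full collateralization;
    any value below 1.0 is a hypothetical margin regime.
    """
    if side not in ("yes", "no"):
        raise ValueError("side must be 'yes' or 'no'")
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1 dollars")
    if not 0.0 < margin_rate <= 1.0:
        raise ValueError("margin_rate must be in (0, 1]")
    max_loss_per_contract = price if side == "yes" else 1.0 - price
    return contracts * max_loss_per_contract * margin_rate


# 10,000 YES contracts bought at $0.40 each.
full = collateral_required(10_000, 0.40, "yes")                       # full collateral
margined = collateral_required(10_000, 0.40, "yes", margin_rate=0.2)  # hypothetical 20% margin
print(f"full collateral: ${full:,.2f}; hypothetical 20% margin: ${margined:,.2f}")
```

Under full collateral, the trader posts $4,000 to hold a position whose maximum payout is $10,000; a margin regime would let the same capital support proportionally larger positions, which is why the absence of margin is described as a barrier for leverage-dependent entities.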
