Articles Related to Code Generation

The HTX News Center offers the latest articles and in-depth analysis on "Code Generation," covering market trends, project updates, technological developments, and regulatory policy in the crypto industry.

First Long-Horizon Doc2Repo Training Dataset: Code Agents Move Beyond Bug Fixing and Begin Creating Repositories

As LLM-based code agents advance, research attention is shifting toward long-horizon, real-world tasks, moving beyond simple bug fixes to generating entire repositories. To address this, researchers from Renmin University of China introduced the DeNovoSWE dataset. It targets long-horizon software engineering, specifically the "document-to-repository" (Doc2Repo) challenge: generating a complete, executable code repository from a task description. DeNovoSWE is built with a Divide & Conquer approach: target repositories are broken down into core capabilities, and a multi-agent Draft-Critic-Repair workflow automatically generates high-quality, evaluation-aligned task documents. Difficulty-aware filtering then balances quality against diversity. The result is a high-quality, leakage-resistant dataset of 4,818 instances. In experiments, models trained on DeNovoSWE improved substantially at long-horizon repository generation. For instance, Qwen3-30B-A3B-Instruct rose from 5.8% to 47.2% on the BeyondSWE-Doc2Repo benchmark and from 4.3% to 23.0% on NL2RepoBench. Stronger backbones showed similar gains, suggesting that dedicated long-horizon training data is crucial for moving code agents from maintainers to architects that can plan and build complete software projects from scratch.

marsbit · Yesterday 08:51
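
The construction pipeline described above (split a repository into capabilities, run Draft-Critic-Repair per capability, then filter by difficulty) can be sketched as plain control flow. This is a minimal sketch under stated assumptions, not the DeNovoSWE implementation: the agents are replaced by hypothetical toy functions (`toy_draft`, `toy_critic`, `toy_repair`), and the difficulty filter assumes each instance has a measured pass rate and keeps a made-up band between 0.05 and 0.9 as one plausible form of "difficulty-aware" filtering.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Critique:
    ok: bool                                         # critic accepts the section as-is
    issues: list[str] = field(default_factory=list)  # problems the repairer should fix

Drafter = Callable[[str], str]
CriticFn = Callable[[str], Critique]
Repairer = Callable[[str, list[str]], str]

def draft_critic_repair(capability: str, draft: Drafter, critic: CriticFn,
                        repair: Repairer, max_rounds: int = 3) -> Optional[str]:
    """Draft one spec section, then alternate critique and repair until accepted."""
    section = draft(capability)
    for _ in range(max_rounds):
        review = critic(section)
        if review.ok:
            return section
        section = repair(section, review.issues)
    return section if critic(section).ok else None  # still rejected: discard

def build_task_document(capabilities: list[str], draft: Drafter,
                        critic: CriticFn, repair: Repairer) -> Optional[str]:
    """Divide & conquer: one section per core capability, then concatenate."""
    sections = []
    for cap in capabilities:
        section = draft_critic_repair(cap, draft, critic, repair)
        if section is None:
            return None  # drop the whole instance if any section fails
        sections.append(section)
    return "\n\n".join(sections)

def difficulty_filter(pass_rates: dict[str, float],
                      low: float = 0.05, high: float = 0.9) -> list[str]:
    """Hypothetical band: keep instances neither never solved nor almost always solved."""
    return [name for name, rate in pass_rates.items() if low <= rate <= high]

# Toy stand-ins for the LLM agents (hypothetical, for illustration only).
def toy_draft(cap: str) -> str:
    return f"## {cap}\nTODO: describe the public API."

def toy_critic(section: str) -> Critique:
    if "TODO" in section:
        return Critique(ok=False, issues=["placeholder text"])
    return Critique(ok=True)

def toy_repair(section: str, issues: list[str]) -> str:
    return section.replace("TODO: describe the public API.",
                           "Specify each public function and its expected behavior.")

doc = build_task_document(["config parsing", "command-line interface"],
                          toy_draft, toy_critic, toy_repair)
print(doc)
print(difficulty_filter({"repo_a": 0.0, "repo_b": 0.4, "repo_c": 1.0}))  # ['repo_b']
```

Keeping the critic and repairer as separate callables mirrors the multi-agent split: each role can be swapped for a different model or prompt without touching the loop.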

Who Makes the Best Use of Claude Code? The Answer Might Not Be Programmers

Claude Code Usage Report Summary (based on ~400k sessions)
Core finding: In agentic programming with Claude Code, a clear division of labor has emerged: humans mainly decide *what* to build (planning decisions), while Claude decides *how* to build it (execution decisions).
Key insights:
1. **Effectiveness is not limited to programmers.** In code-generation tasks, success rates for users in non-technical fields (law, finance, management, research) approach those of software engineers. What matters most is the user's domain expertise and understanding of the problem being solved.
2. **Domain expertise drives success and efficiency.** Sessions in which users showed "expert" proficiency in the task's domain had verified success rates double those of "novice" sessions. Experts also delegated more work per instruction, with Claude executing more actions and producing more output.
3. **AI amplifies, rather than replaces, domain knowledge.** Claude Code lowers the *implementation* barrier, not the *judgment* barrier. Knowing the "what" and "why" is gaining value relative to knowing only "how" to code.
4. **Usage is evolving.** Over seven months (October 2025 to April 2026), the share of sessions devoted to debugging halved, while use for software operations, data analysis, and non-code writing roughly doubled. The estimated economic value of typical tasks rose by about 25%.
Conclusion: The data suggests coding agents are making a programming background less critical for completing technical tasks, while rewarding and amplifying deep domain understanding. The ability to direct an AI agent successfully stems more from mastery of a specific field than from coding skill itself. Most of the gain comes from basic domain competence; deeper specialization adds only a marginal further advantage. This may signal a shift in which software creation becomes part of many professions.

marsbit · 06/20 02:03

AGI is Just One Step Away

The article discusses Anthropic's release of Fable 5, a heavily restricted version of its powerful Mythos model. Mythos, first unveiled in April, reportedly identified more than 10,000 high-risk vulnerabilities for 50 enterprise clients, causing significant concern. Because of its dangerous capabilities in areas such as autonomous cyberattacks and guidance on biochemical weapons design (classified as CB-1 level), the full Mythos 5 remains limited to about 200 vetted entities, such as government agencies. Fable 5, released with a safety classifier, leads benchmarks in coding (SWE-Bench Pro), software engineering, and research. It exhibits genuine "long-horizon agency": rather than simply answering questions, it autonomously plans and executes complex multi-step tasks, such as migrating 50 million lines of code in a day. The article places Fable 5 at OpenAI's Level 3 ("Agent") and progressing toward Level 4 ("Innovator"), suggesting that AGI (artificial general intelligence) may be within reach, potentially 18 to 24 months away. To mitigate risks, Anthropic implemented a two-layer safety "cage": a silent routing system that redirects dangerous queries to a weaker model, and a mandatory 30-day retention policy for all Mythos traffic to detect patterns of malicious use. Despite its high price ($10 per million input tokens and $50 per million output tokens), the model targets the enterprise market, where the article argues its productivity and its defensive value against AI-powered cyber threats justify the premium. This signals a maturing market in which top-tier AI becomes a strategic, high-value business tool, potentially widening the gap with consumer-focused models, accelerating the rise of "one-person companies," and disrupting labor markets.

marsbit · 06/11 05:10
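
The quoted pricing can be turned into a per-request estimate. This is a minimal sketch that assumes simple linear per-token billing at the article's figures ($10 per million input tokens, $50 per million output tokens) and ignores any discounts, caching, or minimum charges; the helper name and the token counts in the example are hypothetical.

```python
# Quoted list prices from the article (USD per million tokens).
# Assumption: billing is purely linear in token counts.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000

# Example: a long agentic turn with 200k tokens of context and 20k tokens of output.
cost = request_cost_usd(200_000, 20_000)
print(f"${cost:.2f}")  # prints "$3.00"
```

At these rates an output token costs five times as much as an input token, so output-heavy agentic workloads dominate the bill.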

Sam Altman in Conversation with Stripe CEO: The Era Where Ideas Are More Valuable Than Code Has Arrived!

At Stripe's 2026 annual conference, OpenAI CEO Sam Altman joined Stripe CEO Patrick Collison for a fireside chat. Altman shared key insights on the AI revolution, emphasizing that we are in a period of rapid takeoff, with AI capabilities advancing week by week. He described OpenAI's evolution from a research lab to a product company and now a large-scale "token factory": a low-margin, utility-like provider of intelligence. Altman stressed that the most successful AI adopters have CEOs who personally automate workflows and drive organizational change. A significant shift is the rise of the "idea person": Altman now actively invests in founders with deep product insight but no coding skills, because AI tools let them build. He advocates a "suspension of disbelief" in investing: planning long-term (e.g., 20-year infrastructure deals) while focusing on a clear two-year product roadmap. Beyond products, Altman is most excited about AI accelerating scientific discovery, shortening decade-long research cycles for complex diseases and driving breakthroughs in materials science and energy. He predicts the first profitable fusion reactor could emerge within five years, spurred by AI's compute demands. Finally, Altman defended OpenAI's philosophy of iterative public deployment over elite control, arguing that democratizing AI access is crucial to avoiding centralized power and unlocking global innovation.

marsbit · 05/15 13:52

OpenAI Post-Training Engineer Weng Jiayi Proposes a New Paradigm Hypothesis for Agentic AI

OpenAI engineer Weng Jiayi's "Heuristic Learning" experiments propose a new paradigm for agentic AI: agents can improve not only by training neural networks but also by autonomously writing and refining code based on environmental feedback. In the experiment, a coding agent (powered by Codex) was tasked with developing and maintaining a programmatic strategy for the Atari game Breakout. Starting from a basic prompt, the agent iteratively wrote code, ran the game, analyzed logs and video replays to identify failures, and then modified the code. Through this "code-run-debug-update" engineering loop, it evolved a pure-Python heuristic strategy that achieved a perfect score of 864 in Breakout and performed competitively with deep reinforcement learning (RL) algorithms on MuJoCo control tasks such as Ant and HalfCheetah. This approach, termed Heuristic Learning (HL), contrasts with deep RL: in HL, experience is captured in readable, modifiable code, tests, logs, and configurations (a software system) rather than encoded solely in opaque neural network weights. This offers potential advantages in explainability, auditability for safety-critical applications, easier use of regression tests to combat catastrophic forgetting, and better sample efficiency early in learning, as demonstrated in broader tests on 57 Atari games. However, the blog acknowledges clear limitations: programmatic strategies struggle with tasks that require long-horizon planning or complex perception (e.g., Montezuma's Revenge), areas where neural networks excel. The envisioned future is a hybrid architecture: specialized neural networks for fast perception (System 1), HL systems for rules, safety, and local recovery (also System 1), and LLM agents that provide high-level feedback and learn from the HL system's data (System 2).
The core proposition is that in the era of capable coding agents, a significant portion of an AI's learned experience could be maintained as an auditable, evolving software system.

marsbit · 05/11 00:17
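
To make "experience captured as code and tests" concrete, here is a minimal sketch assuming a simplified Breakout-like geometry: a ball at (x, y) with velocity (vx, vy), side walls at 0 and width, and the paddle row at y = 0. It is not the strategy from the blog; the function names, the `dead_zone` parameter, and the regression cases are all hypothetical. The policy is ordinary, readable Python, and each regression case records a situation the policy must keep handling, which is one way an HL-style system could guard against forgetting when an agent later edits the code.

```python
from typing import Optional

def predict_landing_x(x: float, y: float, vx: float, vy: float,
                      width: float) -> Optional[float]:
    """Predict where the ball reaches the paddle row y=0, assuming elastic
    reflections off side walls at x=0 and x=width (simplified geometry)."""
    if vy >= 0:
        return None  # ball is moving away from the paddle
    t = y / -vy               # time until the ball reaches y=0
    unfolded = x + vx * t     # position if there were no side walls
    period = 2 * width
    m = unfolded % period     # fold the straight-line path back into [0, width]
    return m if m <= width else period - m

def policy(paddle_x: float, ball: tuple[float, float, float, float],
           width: float, dead_zone: float = 0.5) -> int:
    """Hypothetical heuristic: move toward the predicted landing point.
    Returns -1 (left), 0 (stay), or +1 (right)."""
    target = predict_landing_x(*ball, width)
    if target is None:
        target = width / 2  # rule: re-center while the ball is rising
    if target > paddle_x + dead_zone:
        return 1
    if target < paddle_x - dead_zone:
        return -1
    return 0

# Hypothetical regression cases: (paddle_x, (x, y, vx, vy), width, expected_action)
REGRESSIONS = [
    (5.0, (5.0, 10.0, 1.0, -1.0), 10.0, 0),   # bounces off right wall, lands at x=5
    (8.0, (2.0, 4.0, -1.0, -1.0), 10.0, -1),  # bounces off left wall, lands at x=2
    (1.0, (9.0, 3.0, 0.0, 2.0), 10.0, 1),     # ball rising: re-center toward x=5
]

def run_regressions() -> list[int]:
    """Return indices of failing cases (an empty list means all pass)."""
    return [i for i, (px, ball, w, want) in enumerate(REGRESSIONS)
            if policy(px, ball, w) != want]

print(run_regressions())  # prints []
```

When a new failure shows up in logs or replays, the fix would be a code change plus a new entry in `REGRESSIONS`, so the lesson stays auditable rather than disappearing into weights.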
