谷歌新Gemini AI模型在基准测试中击败GPT-4o

币界网Published on 2024-08-02Last updated on 2024-08-02

币界网报道:

作者:Tristan Greene,CoinTelegraph;编译:陶朱,

生成式人工智能基准测试领域又出现了一位新霸主,它的名字是 Gemini 1.5 Pro。

之前的冠军 OpenAI 的 ChatGPT-4o 终于在 8 月 1 日被超越,当时谷歌悄然发布了其最新模型的实验版本。

Gemini 的最新更新没有大张旗鼓地发布,目前被标记为实验性的。但它很快引起了社交媒体上人工智能社区的关注,因为有报道称它在基准测试分数上超越了竞争对手。

人工智能基准

自 GPT-3 发布以来,OpenAI 的 ChatGPT 一直是生成式 AI 的标杆。过去一年左右,其最新模型 GPT-4o 和最接近的竞争对手 Anthropic 的 Claude-3 在大多数常见基准测试中都遥遥领先于大多数其他模型,几乎没有遇到任何竞争对手。

joLFxVORsiw7ebQNskYsq6svnXEnsKu4FYrunFjb.jpeg

来源:大型模型系统组织。

最受欢迎的基准测试之一是 LMSYS Chatbot Arena。它测试各种任务的模型并分配总体能力分数。GPT-4o 的得分为 1,286,而 Claude-3 获得了可观的 1,271 分。

Gemini 1.5 Pro 的先前版本得分为 1,261。但 8 月 1 日发布的实验版本 (Gemini 1.5 Pro 0801) 得分高达 1,300。

这表明它总体上比竞争对手更强大,但基准测试并不一定能准确反映 AI 模型能做什么和不能做什么。

社区兴奋

在没有更深入的比较的情况下,我们正进入一个 AI 聊天机器人市场已经足够成熟,可以提供多种选择的时代。最终由用户来决定哪种 AI 模型最适合他们。

据传,Gemini 的最新版本引起了一波兴奋,社交媒体上的用户称它“非常好”。一位 Redditor 甚至写道,它“完全胜过 4o”。

目前尚不清楚 Gemini 1.5 Pro 的实验版本是否会成为未来的默认版本。虽然截至本文发表时,它仍然普遍可用,但它处于早期发布或测试阶段这一事实表明,出于安全或协调原因,该模型可能会被撤销或更改。

Trending Cryptos

Related Reads

If It's Not a Clear Yes, It's a No: A Nine-Year Retrospective by a VC Who Survived Four Cycles

**"Invest Only When Certain": A Nine-Year Retrospective from a VC Across Four Cycles** IOSG founder Jocy shares hard-earned lessons from nine years and over a hundred investments in Web3. The core challenge isn't identifying successful founders, but understanding why talented founders with solid ideas still fail. Through building a "failed founder database," IOSG identified six recurring failure patterns. **Founder Trait Red Flags:** 1. **Emotionally Unstable:** Founders who react defensively to criticism or publicly lash out under pressure (e.g., 80% drawdowns) often fail. Resilience is key. 2. **Lacking Hunger / Having a Fallback:** Founders with significant safety nets (family wealth, cushy fallback jobs) may lack the "do-or-die" commitment needed to survive crypto's brutal cycles. 3. **Unchecked Ego:** Includes "polished execution machines" who excel in known frameworks but struggle when paradigms shift, and "professor-types" who are technically brilliant but resistant to commercial feedback or coaching. **Project Structure Red Flags:** 4. **Token-First, Not Product-First:** Treating the token solely as a fundraising tool with no real utility or connection to product value is a major warning sign. The project should have value even if the token goes to zero. 5. **No Day-1 Exit Thesis:** Founders must have a clear, staged capital strategy from the start, understanding what each funding round needs to prove to unlock the next. "Exit before entry" is crucial. 6. **No Full-Cycle Experience:** Founders who haven't lived through a complete crypto bull/bear cycle (e.g., 2018, 2022) often underestimate their vulnerability. IOSG limits initial checks for such teams to $250k, sizing for risk. **The Positive Flipside: Desirable Founder Traits** The ideal candidate exhibits: obsessive problem-depth, being a second-time founder with a non-consensus vision, strong communication skills with *controlled* ego, relentless perseverance, and a global perspective with agency and taste (increasingly vital in the AI era). **Three Survival Tips for Founders:** 1. **Cash Flow Over Narrative:** Real revenue is what sustains projects, not vanity metrics. 2. **Tokens Are a Liability:** Avoid issuing a token unless absolutely necessary. The hidden costs (market making, liquidity, compliance) are immense, often a multi-million-dollar burden. 3. **Respect Liquidity:** Sell during peaks to build treasury, buy back to support the protocol during troughs. Be realistic about valuations and your ability to deliver for the next round. The final principle is simple yet paramount: **"If it's a borderline 'yes' or 'no,' don't invest."** In an industry that reinvents itself every few years, the discipline to consistently say "no" is the ultimate secret to longevity.

Foresight News10m ago

If It's Not a Clear Yes, It's a No: A Nine-Year Retrospective by a VC Who Survived Four Cycles

Foresight News10m ago

SemiAnalysis Deep Dive into CXMT: $50 Billion Revenue, An IPO Amidst a Supercycle

SemiAnalysis' in-depth report on ChangXin Memory Technologies (CXMT) details its rapid rise as China's largest upcoming semiconductor IPO. Founded in 2016 by Zhu Yiming, CXMT built its DRAM foundation on acquired patents and talent from the bankrupt German firm Qimonda. It achieved its first annual profit in 2025 after nearly a decade of significant capital support, primarily from patient Hefei municipal investors who fostered a local supply chain. The company is now capitalizing on a strong DRAM supercycle. Its revenue soared from ~$3.3B in 2024 to ~$8.6B in 2025, with Q1 2026 alone reaching ~$7.3B. SemiAnalysis projects full-year 2026 revenue could exceed $50B, driven by soaring ASPs rather than massive market share gains. While CXMT is closing the capacity gap with Micron, its product mix remains heavily focused on commodity DDR/LPDDR, which currently offers higher margins than its nascent HBM business. CXMT faces significant challenges in HBM, struggling with yield and stability for HBM3 8-Hi stacks while lagging behind the big three (Samsung, SK Hynix, Micron) in advanced nodes. However, strategic national priorities for AI self-sufficiency may push it to accelerate HBM capacity. Its complex IPO structure reveals heavy state-backed ownership and voting control over its fabs, with Alibaba appearing as both a key cloud customer and a minority shareholder. The IPO aims to raise ~$4.1B, primarily to strengthen its core DRAM manufacturing base.

marsbit30m ago

SemiAnalysis Deep Dive into CXMT: $50 Billion Revenue, An IPO Amidst a Supercycle

marsbit30m ago

From Corning to Ciena: The 10x Opportunity in the AI Optical Communication Chain

The transition from copper to optical communication in AI data centers is creating significant investment opportunities beyond just chipmakers. The entire photonics supply chain, from glass and fiber to connectors and test equipment, is critical. Corning, a key fiber supplier, has locked in multi-billion dollar, multi-year contracts with major cloud providers (Meta, Amazon, Google, Microsoft, OpenAI, NVIDIA), demonstrating pricing power and scale. Its profit growth is outpacing revenue growth. In the interconnect layer, Amphenol benefits from high growth in AI data centers, driven by strategic acquisitions and operational efficiency, while Credo Technology acts as a bridge between copper and optical solutions, though with high customer concentration risk. At the systems level, Ciena enables higher data capacity on existing fiber lines, with a strong backlog and cloud customer adoption. Further upstream, AXT is a bottleneck supplier of key indium phosphide wafers for lasers but faces geopolitical supply chain risks. VEO Solutions provides essential testing equipment for the entire photonics industry. A new pure-play photonics ETF (FOTO) offers a consolidated investment approach. The core thesis is that the physical limits of copper are driving an inevitable shift to optical technologies, with wealth flowing to essential, often overlooked, suppliers across the photonics value chain.

marsbit42m ago

From Corning to Ciena: The 10x Opportunity in the AI Optical Communication Chain

marsbit42m ago

Collector Crypt's DAU Is Only 800, Yet It's Already One of Crypto's Most Profitable Projects?

"Collector Crypt: A Highly Profitable Crypto Project with Only 800 Daily Active Users?" Collector Crypt (CARDS) is a crypto project tokenizing physical graded trading cards (primarily Pokémon) on Solana, achieving significant real-world profitability and growth. According to a Maelstrom Fund analysis, it generated approximately $53M in annualized profit in May, with a June run-rate nearing $109M, against a $550M FDV. Its core revenue driver is a digital pack-opening 'Gacha' system. The platform bulk-buys cards at a 5-15% discount. Users can open digital packs and choose to keep cards or sell them back to the platform at a 7-15% discount to market price. Most users sell back common cards, creating an efficient model: users get packs with a ~2% positive expected value, while Collector Crypt captures ~4.5% profit. The project aims to disrupt the inefficient $22.2B GMV (Q1 2026) eBay trading card market, which charges sellers 16-20% in total fees. Collector Crypt offers 2% fees, instant settlement, insured custody, and one-click trading. Beyond Gacha, future revenue streams include secondary market trading fees, infrastructure partnerships, and an eBay "snipe" tool. It holds ~$23M in card inventory and ~$10M in cash, and has already begun token buybacks. With a total supply of 2B tokens, effective circulation post-2027 unlocks is estimated at ~1.3B. Trading primarily on DEXs has so far limited large institutional entry. The project is expanding into sports cards and attracting Web2 users. Maelstrom Fund's price target is $4 by summer's end, positioning Collector Crypt at the forefront of migrating collectibles on-chain.

Foresight News54m ago

Collector Crypt's DAU Is Only 800, Yet It's Already One of Crypto's Most Profitable Projects?

Foresight News54m ago

Trading

Spot
Futures

Hot Articles

Discussions

Welcome to the HTX Community. Here, you can stay informed about the latest platform developments and gain access to professional market insights. Users' opinions on the price of AI (AI) are presented below.

活动图片