Why Are We So Persistent in That 'Laborious and Unrewarding' Data Cleaning?

marsbit | Published on 2026-01-24 | Last updated on 2026-01-24

Abstract

In the article "Why Are We So Persistent in That 'Laborious and Unrewarding' Data Cleaning?", the RootData team reflects on its second bounty event, which focused on improving data transparency in Web3. The event drew more than 140 participants and 1,220 submissions, of which 564 valid data points were approved, a 46.2% acceptance rate. Key improvements included identifying team members of projects such as MOMO.FUN and Subhub (who are often not publicly listed), correcting inaccurate token unlock details and TGE timelines, and updating outdated information such as misattributed founders and deprecated social accounts. The author stresses that ensuring data transparency, though challenging, is critical to protecting investors' "right to know." In Web3, where misinformation is common (for example, token unlock data that is inconsistent across platforms), RootData aims to be a reliable source of validated information. The team notes that core team changes around TGE often signal project risk, yet such details are frequently overlooked. To uphold transparency, RootData publishes monthly reports on false fundraising claims, produces in-depth analyses (such as exchange listing reports), and cross-verifies data rigorously, even declining unverified submissions. It also engages with industry leaders such as Binance to align on data accuracy goals. The long-term vision is to turn isolated data points into structured, actionable transparency reports that support informed investmen...

Author: @BlockCookies

Hello everyone, I am the Data Activity Lead at RootData.

The second round of RootData's Bounty Activity has concluded. In this review, rather than just presenting cold numbers, I'd like to discuss one question: why is promoting "data transparency" in Web3 so challenging, yet something that must be done?

First, the numbers for this round: more than 140 unique users participated and submitted 1,220 pieces of feedback, of which 564 were validated as data points, an average approval rate of 46.2%.
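As a quick check of the reported rate, assuming the "average approval rate" is simply the total number of validated data points divided by the total number of submissions (the review does not spell out how the average is computed):

```python
# Round 2 figures as reported in this review
submissions = 1220
validated = 564

# Assumed definition: approval rate = validated / submitted, as a percentage
approval_rate = validated / submissions * 100
print(f"Approval rate: {approval_rate:.1f}%")
```

With these figures the ratio comes out at about 46.2%, matching the number quoted above.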

Overview of Round 2 Bounty Activity Data

This activity helped RootData add nearly 300 "People Behind the Alpha," such as executives and leads at MOMO.FUN, Subhub, boop, and others. These individuals often do not list their positions in their X bios or on LinkedIn, but they may appear at events or be active in communities.

Additionally, we corrected about 120 token unlock data points. Some had inaccurate TGE dates, while others had unlock rules that had not been disclosed promptly; the community's efforts resolved all of these issues.

Furthermore, we made in-depth corrections to 150 existing data points. For instance, we found that the founder of Fanable had been mistakenly recorded as a non-Web3 individual with the same name, and that its Managing Director Sergio had already left; the AINFT project had long since changed its Twitter account...

Why push for transparency in the Web3 space? This data might seem mundane, and RootData is already an expert at aggregating off-chain data, so why spend our own funds and mobilize the community for this kind of "grunt work"?

Honestly, when my boss @yubopan1 assigned me this task, I hesitated too. But one thing he said struck a chord: "From the ICO era to the FTX incident, the biggest tragedy for users has been the lack of a fair 'investment right to know.' As crypto moves toward compliance, data platforms must be at the forefront, acting as that mirror."

As the data lead, I firmly believe his judgment is correct: relying on a single source is not enough to guarantee accuracy. Data that has not been verified across multiple sources is not enough to make RootData a platform that investors trust.

Take token unlock data alone: it is highly fragmented, and the same project might have five different versions across five mainstream unlock platforms.
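To illustrate why no single platform can be trusted on its own, the versions can be compared side by side. The following is a minimal sketch under stated assumptions, not RootData's actual tooling: the platform labels, field names, and sample values are hypothetical, and "most platforms agree" is treated only as a triage signal, not as proof, since several platforms may copy the same error. Any disagreement is flagged for manual review against an official source.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UnlockRecord:
    """One platform's view of a token unlock (all fields are illustrative)."""
    platform: str       # hypothetical platform label
    tge_date: date
    cliff_months: int
    unlock_pct: float   # % of supply unlocked at TGE


def reconcile(records: list[UnlockRecord]) -> tuple[tuple, int, bool]:
    """Group records by their (tge_date, cliff_months, unlock_pct) values.

    Returns the most common version, how many platforms report it,
    and whether platforms disagree at all (i.e. manual review is needed).
    """
    versions = Counter((r.tge_date, r.cliff_months, r.unlock_pct) for r in records)
    top_version, support = versions.most_common(1)[0]
    return top_version, support, len(versions) > 1


# Hypothetical sample: five platforms, four distinct versions of one project's unlock
records = [
    UnlockRecord("platform_a", date(2026, 1, 15), 12, 10.0),
    UnlockRecord("platform_b", date(2026, 1, 15), 12, 10.0),
    UnlockRecord("platform_c", date(2026, 1, 20), 12, 10.0),
    UnlockRecord("platform_d", date(2026, 1, 15), 6, 10.0),
    UnlockRecord("platform_e", date(2026, 1, 15), 12, 15.0),
]

version, support, needs_review = reconcile(records)
print(version, support, needs_review)
```

In this made-up sample only two of five platforms agree, so the record would be routed to a human reviewer rather than published from any single feed.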

As is well known, a Binance listing requires submitting at least three team members. RootData has cataloged more than 18,000 industry figures. How many of them rush to update their resumes right before TGE, and how many "quietly leave" after securing funding?

This round revealed that core team changes are frequent at major projects around TGE. For investors, such changes are often a "barometer" of a project's direction. If no one verifies and discloses them, they get lost in the daily information overload.

To ensure 'transparency' isn't just a slogan, our current implemented solutions include:

Monthly disclosures of false funding intelligence.
Regular in-depth research, such as the recently published "Exchange Listing Decision Report".
Increasing the frequency of dynamic LinkedIn profile scraping and verification.

Moreover, we insist on rigorous review standards. In this round, a user provided detailed information on the River development team, but the source was merely a post by a third-party account on Binance Square. Despite the detailed content, due to the lack of official endorsement or multi-source cross-verification, we still chose not to approve it.
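The review standard described above can be written as a simple decision rule. This is a hypothetical illustration, not RootData's actual review pipeline: it assumes a claim is approved if at least one source is official, or if at least two distinct non-official publishers corroborate it. The threshold `MIN_INDEPENDENT_SOURCES` and the publisher labels are assumptions, and "independent" is approximated here by distinct publisher names, which is weaker than true independence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    publisher: str     # who published the claim (hypothetical labels below)
    is_official: bool  # e.g. the project's own site or verified account


# Assumed threshold: distinct non-official publishers needed to corroborate a claim
MIN_INDEPENDENT_SOURCES = 2


def should_approve(sources: list[Source]) -> bool:
    """Approve if any source is official, or enough distinct publishers corroborate."""
    if any(s.is_official for s in sources):
        return True
    # Repeated posts by the same publisher count only once
    independent = {s.publisher for s in sources}
    return len(independent) >= MIN_INDEPENDENT_SOURCES


# A single third-party post, however detailed, is not enough on its own
print(should_approve([Source("third_party_square_post", False)]))  # False
# Two distinct publishers corroborating the same claim pass the assumed threshold
print(should_approve([Source("outlet_a", False), Source("outlet_b", False)]))  # True
```

Under this rule, the River submission described above would be rejected for the same reason the reviewers gave: one unofficial source and no cross-verification.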

This round focused on "Binance Alpha," and we also reached out to the Binance team. We don't aim to target any specific exchange; on the contrary, we hope to stand together with industry giants.

When we contacted the Binance team to confirm some key data dimensions, the response was very positive: "If there's any information regarding Alpha that needs confirmation, feel free to communicate anytime."

Correcting individual data points is just the beginning. In the future, RootData will connect "discrete data points" into "logically rigorous transparency reports," and even turn them into practical investment strategies.

Transparency is a long-term battle and an inevitable path for Web3 to go mainstream. We need more "data hunters" to join us in lifting the fog. Everyone is welcome to leave comments and join the discussion.
