Author: Huang Shiliang
"Data is the new oil"—this phrase is almost worn out in the AI circle. But in the mainstream narrative, it seems to have nothing to do with us ordinary folks—it's a capital game for tech giants, competing with GPUs and trillions of parameters.
But after thinking it over, I realized this metaphor is actually a very good compass for navigating the AI world.
I. A Severely Misunderstood Metaphor
"Data is the new oil"—this phrase has almost become the bible of the AI era.
But honestly, most people's first reaction upon hearing this is: this is a damn big-company thing, what the hell does it have to do with an ordinary person like me?
Because in the mainstream narrative, "data" means petabyte-scale corpora like the entire internet or Wikipedia; "refining technology" means tens of thousands of H100 GPUs plus a bunch of scientists with million-dollar salaries; and the "final product" is an omniscient, all-powerful God-model like GPT-5.
This logic is fine commercially, but the problem is—it basically says: Don't participate, you're not at the table.
We ordinary people are directly kicked out of the game.
Even darker, there's another version of this saying that pisses me off more the more I think about it:
Data is the new oil: our consumer data is the oil field in Venezuela, and companies like Meituan, Alibaba, and Douyin are Trump's United States.
They accidentally (actually deliberately) come to our place, stick pipes in to extract oil, take our data for free, refine it into "98-octane gasoline" (precise algorithms, big data price discrimination), and then forcefully sell it back to us.
The result is: we become the suckers—not only contributing the raw materials for free, but also helping the platforms count their money after being sold out.
In this version of the story, the players are only the giants. We have neither massive data nor capital, let alone the ability to train a large model. So "data is the new oil" becomes a slogan that sounds awesome but is utterly useless, even somewhat disgusting, to individuals.
II. Change Your Perspective, and There's Hope
I think this consensus is problematic. We need to look at it from a different angle.
If we insist on applying the concept of "data is the new oil" to ordinary people, then the question is no longer "is this metaphor correct", but rather: how exactly does it guide what I do?
The reason the oil industry is awesome is that it has a very clear, unavoidable logical chain:
Find oil fields (exploration) → Build refineries (processing) → Standardize products (gasoline) → Build channels (gas stations) → Sell to users.
For us ordinary people, the "data oil" of the AI era must also be broken down step by step according to this logic. Miss one step, and your AI anxiety will never turn into productivity, only into the mental drain of "scrolling news + saving links + watching others get rich".
Below, I'll break down how ordinary folks should proceed according to this logic.
III. Step One: Where is the Oil Field?—Find the "Micro Rich Mines" Around You
In traditional industries, you go to places like Saudi Arabia, Russia to find oil. But on our path, the oil field is actually right next to you. I think there are at least two major categories.
1. Personal Private Data: Your Own Backyard
This is the most easily overlooked, but the most stable type of data. It doesn't need to be large in scale, but its purity is extremely high.
For example, your work processes, the logic behind your decisions, the pitfalls you've stepped into (failure reviews), and the unwritten rules you've learned from years in the industry.
Also, your digital footprint: notes, code repositories, drafts, emails written over the past decade... these all count.
The value of this lies in: it belongs entirely to you. A "personal digital twin" or "domain expert Agent" trained on this data cannot be replaced by any general-purpose large model.
If in the past 5 years you haven't really used a computer in your work and life, relying solely on a smartphone to get by, then you probably won't evolve into an AI producer, destined to be just an AI consumer.
If you really want to make money with AI, I think you need to buy a computer. Why?
Because without a computer, you most likely have no systematically accumulated data; you are a complete "oil-poor country". Don't expect the few pictures in your phone's gallery, or the dozens of GB of voice messages and fragmented chat records on WeChat, to amount to anything big. There are too many impurities and too little structure. You can't refine qualified 92-octane gasoline from that; at best you might get some 29-octane stuff.
2. Public Data Rich Mines: Assemble Your "Exploration Team"
The second category is data that everyone can see, but 99% of people are just "consuming" rather than "exploring": X.com, public accounts, arXiv, YouTube... these are the "high seas" of the data era.
The internet today, especially social media, is deteriorating too fast. I dare say, definitely over 50%, maybe even over 90% of the content is AGRC (AI Generated Rubbish Content).
These people use AI to mass-produce nonsense, directly polluting the stratum. If you're not conscious during geological exploration, you'll just dig up garbage.
Worse: if you feed garbage to your brain or to AI, what you refine in the end can only be garbage, and it might even clog your refinery.
So to ensure what you dig up isn't AGRC, I suggest you create a strictly curated **"inspiration source portfolio"**. But note: just reading is useless; that is only hoarding crude oil. You must learn **crude oil pre-processing**: run each source through AI to turn it into fuel the machine can read:
Deep Sedimentary Rock (Books): This is the ballast. Set an annual reading list, must include professional classics and literature.
AI Method: Don't just read passively. Use Gemini or ChatGPT as a reading companion: discuss each chapter with it and let it generate thought questions. After finishing, write electronic reading notes and feed them to AI. This is your knowledge base.
Frontier Exploration Zone (Papers & Reports): Browse arXiv or Google Scholar more often. Force yourself to chew through one paper a week in a "paper lunch meeting".
AI Method: Can't digest it raw? Throw the PDF directly to NotebookLM or ChatGPT, let it summarize the core arguments and data for you, and turn the "tough bones" into "rich broth" you can store.
Surface Runoff (News & Information): Use RSS or customized feeds. Scan the headlines and bookmark only the truly awesome ones.
AI Method: Don't just bookmark links. Copy the content, let AI help you tag it, extract keywords, categorize and save it to your note-taking software, otherwise bookmarks just gather dust.
Associated Gas Field (Podcasts & Lectures): Listen to stuff like TED Radio Hour during commutes. Force yourself to attend one or two offline salons each month.
AI Method: When you hear a good point, don't just nod. Use Whisper to transcribe the audio to text, then let AI organize it into structured notes. Sound cannot be searched, but text can.
High-Yield Oil Well (Social Media): Follow a group of real experts on Twitter/X. Regularly clean your follow list, unfollow those posting emotional garbage.
AI Method: See an awesome Thread, copy it directly to AI, let it analyze the logical flaws, or integrate its viewpoints into your knowledge system.
Field Expedition (Life Observation, Field Research): Deliberately practice "viewing life with questions". This is perceptual data that AI crawlers can never scrape.
AI Method: When inspiration strikes, don't type. Just talk, then throw the recording to AI to organize into a diary. Let AI help you turn ramblings into logical insights.
We must develop the habit of picking up the phone at any moment and rattling off a stream of words to Doubao (or a similar AI assistant).
These six sources are your "hybrid oil field". Only if your input is wild and diverse enough, and all pre-processed by AI, will the stuff you refine not be clichéd.
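To make "pre-processing" concrete, here is a minimal Python sketch of the tagging step for one clipped article. It is only an illustration under stated assumptions: the `make_note` and `extract_keywords` names, the note fields, and the tiny stopword list are all made up here, and a crude word-frequency count stands in for the AI call that would do the tagging in practice.

```python
import json
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "and", "for", "are", "with", "that", "this", "from"}


def extract_keywords(text, top_n=2):
    """Return the most frequent non-stopword words as rough tags."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def make_note(source_type, title, text):
    """Turn raw clipped text into a structured, searchable note."""
    return {
        "source_type": source_type,  # e.g. "book", "paper", "news"
        "title": title,
        "keywords": extract_keywords(text),
        "text": text.strip(),
    }


note = make_note(
    "news",
    "Agents at work",
    "Agents call tools. Agents plan tasks. Tools return data to agents.",
)
print(json.dumps(note, indent=2))
```

The point is the shape of the output, not the keyword heuristic: every source ends up as the same kind of record, so a bookmark stops being a link that gathers dust and becomes something you can search and feed back to a model.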
IV. Step Two: Where is the Refining Equipment?—Don't Just Stare at Large Models
Found the oil? The next step is to refine it. Mainstream media hypes you into buying GPUs every day, but for individuals, the real refinery must be your own software stack + thinking process.
1. Large Models are Just a "Boiler"
Paying for a ChatGPT Plus subscription doesn't make us awesome. It's like buying a boiler and then standing next to it admiring the glow, without ever starting work!
Large models like ChatGPT, DeepSeek are, frankly, basic power units, the foundation. They can burn, but that doesn't mean you can produce oil.
2. The Real Refinery is the "Personal Tool System"
An efficient personal refinery needs these components:
Pipelines (Toolchain): things like VS Code, Python, and Skills.
Process (Methodology): This is the core barrier. It's how you write Prompts, how you build a RAG knowledge base, how you make several Agents (skills) cooperate.
The focus is never "how strong the model is", but rather: how do you interact with AI, how do you translate the tacit experience in your brain into instructions AI can understand.
This "personal engineering system" is your refinery, not the model itself.
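As a rough picture of what "building a RAG knowledge base" means at personal scale, the sketch below retrieves the most relevant note for a question and pastes it into a prompt. Everything in it is an assumption for illustration: the sample notes, the function names, and the choice of plain word-count cosine similarity in place of the embedding model a real setup would likely use. It also stops at building the prompt; sending it to a model is left out.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, notes, k=1):
    """Return the k notes most similar to the question."""
    q = vectorize(question)
    ranked = sorted(notes, key=lambda n: cosine(q, vectorize(n)), reverse=True)
    return ranked[:k]


def build_prompt(question, notes, k=1):
    """Put the retrieved notes in front of the question."""
    context = "\n".join(f"- {n}" for n in retrieve(question, notes, k))
    return f"Use only these notes:\n{context}\n\nQuestion: {question}"


# Hypothetical personal notes: the "tacit experience" written down.
NOTES = [
    "Invoice disputes: always confirm the purchase order number before escalating.",
    "Onboarding: new clients get a kickoff call within three days.",
    "Refund requests over 500 need manager approval.",
]

prompt = build_prompt("Who approves a refund over 500?", NOTES)
print(prompt)
```

The model never sees your whole archive, only the few notes that match the question. That is why the quality and structure of the notes matters more than which model sits behind the prompt.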
V. Step Three: The Product is Not the End, Selling it is the Real Battle
This is the cruelest link in the whole chain. Sinopec just needs to transport oil to gas stations, and car owners naturally queue up. But in the AI era, productization and sales are really damn difficult.
1. The "Gasoline" Refined by AI is Extremely Non-Standard
The stuff you refine using "personal data" + "large model" is most likely not universal gasoline, but rather:
- A Python script only you can use
- An article with a unique style
- An AI-processed report on your own medical check-up results
- A set of personalized legal advice
These things are not universal, not standard, and very scenario-specific.
2. The Real Big Problem: Who to Sell To?
So before you start, you must ask the question in reverse: who the hell am I going to sell this thing to? The answer works backwards to determine what oil we should refine.
Sell to yourself (Self-use): Saving time is making money, this is the easiest closed loop to achieve.
Sell to businesses (B2B): Package your Prompt or workflow into a solution. This requires extremely strong pre-sales ability (the ability to hustle and persuade).
Sell to the public (B2C): Make it into an App or content column. This depends on whether you have traffic distribution ability.
Actually: Refining oil (generating content) in the AI era is getting easier and easier, but building gas stations (distribution & sales) is unprecedentedly difficult.
VI. Don't Forget Environmental Protection: Don't Let Waste Bury You
Traditional oil refining produces waste residue, wastewater, and exhaust. If you don't treat it, you will be choked to death by the fumes before the refinery ever makes money.
Data refining is the same: **"cyber pollution"** is extremely serious, and you need an "environmental department" to clean up regularly.
1. Clean Up Expired "Tool Waste"
AI evolves way too damn fast, ridiculously fast.
Of the "Top 10 Must-Use AI Navigation Sites for 2025" you bookmarked last month, five might shut down this week; the AI drawing parameters you are struggling with today might be made obsolete by "one-click generation" tomorrow.
Don't be a "cyber scavenger", hoarding a bunch of outdated tools unwilling to throw away. Uninstall what needs uninstalling, unfollow what needs unfollowing. Tools are for using, not for worshiping.
Hoarding outdated tools is like filling your house with rusty scrap iron, it only slows down your operating speed.
2. Discard Drained "Data Shells"
Many people have "squirrel syndrome": download upon seeing a PDF, bookmark upon seeing a video, fill their hard drives with several TB of materials, and feel like they own the world.
That's not knowledge, that's landfill garbage.
The truly environmentally friendly approach is: use AI to squeeze the "oil" out of PDFs, videos, and long articles. Generate summaries, extract golden quotes, and convert them into your notes.
Once a file is squeezed dry, throw away the original (or archive it to cold storage). Your attention is an extremely expensive, limited resource; don't let these raw files take up your bandwidth.
Only keep "refined fuel", discard "crude oil shells", this is an efficient refinery.
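The "squeeze, then archive" rule can be automated with a few lines of Python. This is a sketch under assumptions of my own: an `inbox` folder of raw files, a `notes` folder where each summary is saved as `<same name>.md`, and a `cold` folder for archives. A raw file is moved out only when its note already exists, so nothing unread gets buried.

```python
import shutil
import tempfile
from pathlib import Path


def archive_squeezed(inbox, notes, cold):
    """Move raw files that already have a matching note into cold storage."""
    cold.mkdir(parents=True, exist_ok=True)
    moved = []
    for raw in sorted(inbox.iterdir()):
        has_note = (notes / f"{raw.stem}.md").exists()
        if raw.is_file() and has_note:
            shutil.move(str(raw), str(cold / raw.name))
            moved.append(raw.name)
    return moved


# Demo in a throwaway directory (all file names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    inbox, notes, cold = root / "inbox", root / "notes", root / "cold"
    inbox.mkdir()
    notes.mkdir()
    (inbox / "paper.pdf").write_text("raw content")
    (inbox / "unread.pdf").write_text("raw content")
    (notes / "paper.md").write_text("summary and quotes")

    moved = archive_squeezed(inbox, notes, cold)
    remaining = sorted(p.name for p in inbox.iterdir())

print("archived:", moved)
print("still to squeeze:", remaining)
```

Whatever is left in the inbox after a run is your real backlog. If that list never shrinks, you are hoarding, not refining.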
3. Cut Off Those "Blood-Sucking Zombie Bills"
AI anxiety makes us do many stupid things, the stupidest of which is: spending money in a hurry to buy a sense of security.
Signing up for classes, buying courses, rushing to conferences, buying Plus memberships... the costs are not low. What's worse, once you subscribe to something, especially with monthly deductions, you often forget to cancel.
I once bought a server for testing. For over three years it silently charged me every month, hidden among a pile of bills, and I had no idea. I had only used it on the day of the test.
Also, in a moment of brain fever, I bought ChatGPT, Gemini, Claude, Perplexity... a bunch of auto-renewals, plus some APIs. The result? Most of the time they just gathered dust.
Damn, what a waste.
These are things that must be cleaned up for "environmental protection". Otherwise, before you refine any sellable oil, your family assets will be drained by these pollutants.
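A simple way to run this audit is to list every recurring charge with the date you last actually used it, then flag anything idle for too long. The sketch below does that in Python. The subscription names, prices, dates, and the 60-day idle threshold are all invented for illustration; plug in your own bill.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Subscription:
    name: str
    monthly_cost: float
    last_used: date


def find_zombies(subs, today, idle_days=60):
    """Subscriptions not used within the last `idle_days` days."""
    return [s for s in subs if (today - s.last_used).days > idle_days]


def yearly_waste(zombies):
    """What the idle subscriptions cost over twelve months."""
    return sum(s.monthly_cost for s in zombies) * 12


# Hypothetical bill: names, prices, and dates are made up.
SUBS = [
    Subscription("chat assistant", 20.0, date(2025, 6, 28)),
    Subscription("test server", 10.0, date(2022, 5, 1)),
    Subscription("search tool", 20.0, date(2025, 1, 15)),
]

today = date(2025, 7, 1)
zombies = find_zombies(SUBS, today)
for z in zombies:
    print(f"cancel? {z.name}: idle {(today - z.last_used).days} days")
print("yearly waste:", yearly_waste(zombies))
```

Seeing the yearly figure instead of the monthly one is the useful part: a charge that looks harmless each month is much harder to ignore once it is multiplied by twelve.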
VII. Final Words: An Action Map
When we strip off the grand exterior of "data is the new oil", it is no longer a distant capital story, but a cold, hard roadmap for ordinary people.
In this era, if you want to win, quickly check your "balance sheet":
- Reserves: Are you still scrolling Douyin? Or are you already consciously accumulating high-quality data through "inspiration sources" + AI assistance? (Remember to avoid AGRC garbage)
- Production Capacity: Do you have your own set of tools and methodology (refinery), and do you know what oil to refine?
- Channels: Have you figured out who you are going to sell these non-standard products to? The answer works backwards to production capacity: whether to refine 92-octane or 98-octane oil.
- Environmental Protection: Are you hoarding a bunch of digital junk? Have you checked your credit card bills to cut off those zombie subscriptions?
Final advice: Forget the news about billions of parameters. Start today: buy a computer, establish your "inspiration data sources", go drill your first micro oil well, and sell to yourself first. Refine an automated tool that reshapes your work so that AI does most of it and you assist.
Actually, I'm also very confused. I've been tinkering with AI for over three years and haven't really refined anything. I only refined an AI to manage my to-do list and another to manage my reading notes. I'm still thinking: what can I refine?