Lessons from Vibe Coding with 30 Billion Tokens

The cover image shows my Codex usage: 25 billion tokens.

Add in DeepSeek, Claude Code, and various proxy APIs, and from last September to today, I’ve consumed 30 billion tokens on vibe coding.

Today my token quota happened to run out, so I’d like to share my experience from this period of heavy AI usage.

These lessons are primarily from programming and software, though they generalize to some extent.

Important Note: There are no tricks in this article. There is no single skill or prompt that will make your vibe coding produce better code with fewer bugs.


1. Think of Yourself as the CEO

If you start treating AI as a productivity tool, the first thing you need to do is shift your mindset.

You have to tell yourself — you are the CEO.

Otherwise, you’re just burning yourself out while watching people you thought were less capable surpass you.

When Cursor first became popular, my Twitter timeline had two camps:

  • One group mocked the quality of vibe coding and seriously questioned whether AI-written code was even usable;
  • The other group couldn’t wait to start experimenting.

Today, I think everyone’s timeline looks like mine — the first group has almost vanished, though a few remain.

Those who oppose vibe coding are almost always engineers, and often the more experienced they are, the more they resist. I’m not sure if there’s a sense of existential threat in their opposition, but I can say with certainty that they resist because they’re too skilled.

They can’t help but review the AI’s buggy code and fix it by hand.

This doesn’t just happen in programming. I believe the same is true in every field: the more professional, meticulous, and conscientious someone is, the more likely they are to develop distrust during AI usage, leading them to abandon or oppose it.

Writers feel that having to revise AI-generated text over and over is a waste of time; designers feel that waiting for AI to generate images is like playing a lottery…

Meanwhile, those who embrace vibe coding are mostly, and I’d go as far as saying almost entirely, non-engineers or junior engineers. To some extent, it’s a case of “ignorance is bliss.”

What used to require months or years of learning, followed by weeks of work, now takes just a sentence or two and a bit of patience —

BANG!

An app is born!

It’s hard not to get addicted to that.

And so a strange scene unfolds: non-engineers start writing software one after another, while engineers keep watching from the sidelines — a different form of the competency trap.

Fortunately, as AI capabilities improve, more and more engineers are coming around. But their professional expertise still holds them back from faster production.


1.1 Don’t Obsess Over Process and Details

When it comes to vibe coders, professional engineers often think: they have no idea what the AI is actually doing, and therefore worry about the future quality of the software.

But if you’re an engineer reading this, try to shift your perspective —

Does your boss understand your work?

Or maybe your boss used to be an engineer, but now doesn’t care about the actual code at all, right? They might discuss architecture a bit, but they fundamentally don’t care about code.

Yes, everyone who isn’t a domain expert but uses AI to produce results in that domain has exactly the same mindset as your boss.

Most people don’t actually care that much about details. A one-pixel offset, a slightly awkward wording, a minor performance issue — most users won’t even notice. If you’re drowning in details because of your expertise, you’re only draining yourself.

I know this sounds like a hot take. The prevailing success narrative values attention to detail, meticulous craftsmanship, and that classic line about only the paranoid surviving.

But stories have to be told that way for books to sell, right?

Demanding 100% from employees is how you get 80% results, right?

Of course, I hope readers won’t think in such binary terms:

  • Either scrutinize every AI output obsessively;
  • Or let everything go completely unchecked.

I think you should dynamically adjust based on your actual situation.

If you’re building software solo, don’t apply professional engineering standards to AI-written code — if it runs, it’s fine. If you’re a small team with real users, then yes, it’s time to start caring about architecture.

Know what you’re building, how many resources you have, how much energy you can invest, then decide your standards for AI.

Code quality should match the user stage. If it looks good enough from the user’s perspective, there’s no need to keep asking the AI to refine it.

Restraining your fixation on details and focusing on the big picture like a CEO will make you far more efficient. Results are what matter most — let AI handle the process.

Often, the professional expertise you take pride in is just an illusion created by workplace specialization. Step outside that office, and it may not be as valuable as you imagine.

To close this section:

When someone has spent decades accumulating deep knowledge and experience in a field, then turns around to see an intern use AI to whip up a “little thing” in 5 minutes — not great, but functional enough — the feelings are inevitably complicated.

That feeling is a kind of helplessness.

I’ve felt it too. But unfortunately, helplessness alone changes nothing.


2. Productivity ≠ Getting Rich

If you don’t have stable cash flow, don’t rush into entrepreneurship. AI isn’t a cheat code for wealth — it’s just cheap, efficient productivity.

Right now, one person maintaining 10 apps is no problem — some of them might even be large-scale projects. One person can create comics or even animations. One person can publish 100 articles a day across social media platforms.

Productivity has exploded again, and social media is flooded with AI-fueled rags-to-riches stories. After “tech nouveau riche,” we might soon coin “AI nouveau riche.”

Because AI’s barrier to entry is just the ability to speak and type, combined with social media’s relentless hype, people start developing an illusion —

I could totally do that.

Yes, in the AI era, it’s not just LLMs that hallucinate — humans do too.

But.

A viral short video’s success doesn’t come from the video itself. Seedance can give you beautiful, animated visuals — that’s all.

A successful piece of software isn’t about the code itself. Some readers might think the key is the idea, the innovation — it isn’t.

Ideas aren’t worth much. At least before execution, an idea is worth about two bucks. Productivity isn’t valuable either — it’s just a resource.

Any project’s success requires getting multiple things right, plus a dash of luck. Yes, sometimes you do everything right and still fail — you just needed a little more luck.

Code isn’t done once it’s written and runs locally.

You need to:

  • Deploy;
  • Get listed on app stores;
  • Market and promote;
  • Continuously iterate and maintain;
  • Monitor competitors’ features.

You might do everything right, but then a competitor goes free and users abandon you overnight.

You can write code as elegant as poetry, but if your monthly reach is only 100, users simply can’t find you.

2.1 Don’t Get Trapped in Your Domain

AI is a cognitive amplifier — it magnifies your strengths and your weaknesses alike. It makes it far too easy to keep working in areas you’re already comfortable with.

On one hand, this is because of its perpetually positive tone; on the other, your professional expertise means it delivers quick interim results more easily.

AI writes subpar code → You quickly decide the fix direction → AI executes again → Success → Dopamine hit → Next feature…

Be careful of the trap created by this endless stream of productivity.

Isn’t it becoming common to compare vibe coding to a drug? I’ve experienced it too — once I start, I can’t stop. I can code from 7 AM to midnight.

What you really need to invest time in are the areas where you’re weakest. That’s where your time with AI should go.

There are too few geniuses who succeed by perfecting a single skill. The odds of most people becoming the next Da Vinci or Steve Jobs are vanishingly small — don’t bet on being the chosen one.

If you’re a programmer planning to build a software product, start with market research and competitive analysis — don’t open your IDE first.

And during competitive analysis, don’t focus on feature lists. Trust me, you don’t even need to download the competitors. Whatever they can do, AI can probably do too — unless you’re building a brand new operating system.

What you really need to focus on is the stuff you’re not good at:

  • How are they driving traffic;
  • How are they pricing;
  • Which regions are they most active in;
  • Which social media platforms are they most active on.

3. Don’t Get Hooked on AI Tips and Tricks

Almost overnight, every timeline I follow converged on a single topic: AI. The one exception is the cosplayers on my feed, the one constant that never follows trends.

Don’t obsess over collecting AI tips, and don’t believe any AI get-rich-quick myths.

People who’ve had their lives changed suddenly by AI do exist. But most I’ve seen already had a foundation of expertise and insight in their field. I’ve seen very few cases of someone with no prior background changing their fortune through AI alone.

I once had over 100 bookmarks saving all sorts of AI tips. But I quickly came to a realization:

I was being an idiot.

Here’s why.

3.1 Nobody Actually Shares Real Secrets

I figured this out early on:

If a tip is truly valuable, it most likely won’t be shared — unless the value it generates is less than the traffic revenue from sharing it.

If you discovered a gold mine, the first thing you’d do is look around to see if anyone else noticed, right?

3.2 Most Content Creators Don’t Care Much About Truth or Value

Most of the authors filling your timeline are professional content creators.

Their job isn’t to produce valuable content — it’s to constantly appear in your feed.

Producing genuinely valuable content is merely a means to that end — and it doesn’t even need to actually be valuable, just look valuable.

Of course, there are people who share genuinely valuable content — just not many.

Given this reality, you can safely filter out 99% of social media content, especially the stuff with beautiful layouts, clickbait titles, and polished cover images.

Nobody has time to download some new skill, spend an afternoon testing it, then write up a beautifully formatted article about it. If they did, they’d never meet their publishing quota.

Many tips are utterly worthless.

Most decent-looking tips are just another form of over-marketing — they don’t charge you money, but your attention is valuable too, isn’t it?

3.3 Your Optimization Tricks Can’t Raise the Model’s Ceiling

Prompts, skills, context management techniques — fundamentally, they’re compensating for LLM weaknesses, not breaking through the model’s ceiling.

The good news is that the models’ raw intelligence is still improving rapidly.

So you’ll find that as models update, many of your old prompts and skills become obsolete — they might even become counter-productive.

From my experience, there’s no need to spend too much time on AI usage techniques.

The time you spend researching tricks would be better spent doing other things, and waiting for the next-gen model from OpenAI / Anthropic.

Didn’t an Anthropic engineer recently say they barely spend time writing prompts anymore? Yet just two or three months ago, a good prompt on Twitter could get hundreds of thousands, even millions of reposts.

A recent example of mine: whether it’s Claude Code or Codex, I can never get mobile app UI to a satisfying level.

I’ve tried Claude Design, Open Design, and various prompts, but none met my standards.

These approaches work reasonably well for web UI, but that’s because the model itself already performs well on the web.

If you’re curious how I plan to solve this — my answer is: wait for the next-gen model, and accept whatever the current one can deliver.

Non-vibe-coders can stop reading here. Everything below is specifically about vibe coding lessons.


4. Codex Is More Reliable Than Claude Code

Since GPT-5, Codex has been my primary coding tool. At work, I occasionally use Claude Code, but whether it’s bad luck or not, nine out of ten times Claude Code performs poorly.

My sole requirement for a coding agent is:

I give a prompt, the coding agent makes changes, and it needs to work on the first run. Minor issues are fine, but it has to run end-to-end.

On this front, Claude’s latest models still struggle.

Earlier versions of Claude, before Sonnet 3.7, were particularly prone to acting on their own:

  • Modifying unrelated files;
  • Bypassing tests;
  • Especially bypassing tests — that really irritated me.

From Sonnet 3.7 to the recent Opus 4.8, things improved slightly, but bugs are still frequent. A single PR often requires multiple rounds of fixes.

In my most recent test, Claude Opus 4.8 exhibited even more unauthorized modifications and prompt deviations than before — worse than previous versions.

What I find especially puzzling is that it seems to have difficulty understanding my intent and the project context.

My projects have always kept AGENTS.md and CLAUDE.md in sync. Speaking of CLAUDE.md — it even fails to follow its own documentation in some cases.

So currently, my daily vibe coding, code review, and security auditing all use the Codex series.

I believe Claude Code is currently overrated. I think there are several reasons:

Migration costs. Claude Code had a massive first-mover advantage. Many people have built comprehensive workflows around it, or have developed enough familiarity with the model to know what tasks it handles well and what tasks will likely fail.

Media inertia. Content creation has momentum. Once a traffic formula is validated, everyone keeps repeating that topic until it completely stops working. And as I mentioned, most content creators don’t particularly care about their content — they may not even care about vibe coding. As long as publishing Claude Code content keeps generating traffic, they’ll keep praising it.

Recent negative press around OpenAI. No comment.

What? You’re asking about Gemini? It feels like Google doesn’t really care about vibe coding, at least not right now.


5. Documentation and Testing Matter More

Prompts are indeed becoming less important.

Before, when I had an agent build a feature, richer descriptions meant better results. Typically I’d include:

  • Business logic;
  • Data flow;
  • UI description.

But with the current Codex 5.5, writing a single sentence versus a detailed spec yields roughly the same results.

This means I iterate on features faster, but testing has become harder.

So recently, I’ve barely worked on new features. Instead, I’ve patiently fixed every bug and security issue in my projects, then started writing comprehensive test scripts until the entire project could pass automated E2E tests.

I believe that in the mid-to-late stages of a project, pausing feature development to invest time in maintaining proper test scripts is more important than having AI churn out new features.

Of course, traditional coding should follow this principle too.

What I want to emphasize is: you can review code less — or even not at all — but writing automated tests requires patience.

Because your code constantly changes through iterations, you need to prevent two situations:

  1. The LLM changing correct code backward to satisfy outdated test scripts;
  2. The LLM writing integration tests without first updating the existing test scripts, so that the resulting test failures trigger those same backward code changes.

LLMs only ever care about what’s in front of them. They have no historical memory, and neither do you — because the volume of iterations is massive and you didn’t write the code.

So here’s what I typically do.

5.1 Maintain Your AGENTS.md Well

The importance of AGENTS.md is probably underestimated. I believe it’s more important than any skill, plugin, or trick — it’s the most important file in the repo.

It’s a document absolutely worth investing effort into, and it’s the only file I actually put effort into maintaining across all my repos.

I set up an automated task to have AI update the document based on commits, with intervals determined by the project’s iteration frequency.

I cycle through reviewing all my projects’ AGENTS.md files roughly every 3 days to a week.
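
A minimal sketch of how such an automated task could gather its input, assuming the repo uses git and the file is named AGENTS.md at the repo root. The helper names (git, commits_since_last_update, build_update_prompt) and the prompt wording are hypothetical; how the prompt reaches the agent is left out because it depends on your tooling.

```python
import subprocess
from typing import List, Optional


def git(*args: str, cwd: str = ".") -> str:
    """Run a git command and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def commits_since_last_update(doc: str = "AGENTS.md", cwd: str = ".") -> List[str]:
    """Subjects of commits made after the last commit that touched `doc`."""
    last = git("log", "-1", "--format=%H", "--", doc, cwd=cwd)
    if not last:
        return []  # doc was never committed, so there is no baseline yet
    log = git("log", "--format=%s", f"{last}..HEAD", cwd=cwd)
    return log.splitlines()


def build_update_prompt(subjects: List[str], doc: str = "AGENTS.md") -> Optional[str]:
    """Turn commit subjects into a prompt for the agent, or None if up to date."""
    if not subjects:
        return None
    bullets = "\n".join(f"- {s}" for s in subjects)
    return (
        f"Update {doc} so it reflects these commits. "
        "Change only sections that are now out of date.\n" + bullets
    )
```

A scheduler (cron, CI, or whatever you already run) could call build_update_prompt(commits_since_last_update()) at the interval that fits the project and skip the run when the result is None.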

5.2 Give the Agent a Rule

When code fails tests, immediately compare the commit history of the business code and test files, analyze based on the “whichever was updated more recently is more likely to be accurate” principle, then report to me. Do not modify anything autonomously.

You might ask:

Code was updated but test scripts weren’t — that makes sense. But test scripts were updated and code wasn’t — how does that happen?

If you’re running 10 worktrees simultaneously, and some features overlap, certain agents will update tests as they work while others won’t.

When you finally merge, this exact situation can occur.

In these cases, you can’t let AI resolve it automatically — it requires human intervention. Have the agents first provide their assessment based on commit order, then hand the decision to you. The final call must always be yours.
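
The rule above can be sketched as a small read-only helper. This is an illustration under assumptions, not the exact instruction I give the agent: it assumes a git repo, compares only the last commit time of one code path and one test path, and the function names (last_commit_time, assess, report) are hypothetical. It never edits files; it only produces a report for a human.

```python
import subprocess


def last_commit_time(path: str, cwd: str = ".") -> int:
    """Unix timestamp of the most recent commit touching `path` (0 if none)."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", "--", path],
        cwd=cwd, capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else 0


def assess(code_time: int, test_time: int) -> str:
    """Apply the heuristic: the side updated more recently is more likely right."""
    if code_time > test_time:
        return "Code is newer: the tests are probably outdated. Awaiting your decision."
    if test_time > code_time:
        return "Tests are newer: the code is probably outdated. Awaiting your decision."
    return "Same commit time: no guess. Awaiting your decision."


def report(code_path: str, test_path: str, cwd: str = ".") -> str:
    """Build the report for a failing test without modifying anything."""
    return assess(last_commit_time(code_path, cwd), last_commit_time(test_path, cwd))
```

Calling report with the failing test file and the business code it covers gives a one-line assessment. The heuristic is only a starting point, which is why every branch ends by handing the decision back.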

Once your project can reliably pass automated E2E tests and even manual regression testing starts feeling redundant, you can subscribe to a coding plan, open multiple worktrees, and start developing features in parallel.

Advice for non-developers: While testing is important, don’t write too many test scripts in the early stages. Test scripts expand the AI’s working scope, because both Codex and Claude Code tend to run tests or update test files after completing work. In the early stages the product changes too much, so test scripts add little value and can sometimes flag intentional feature changes as bugs.

Usually in the early stages, I don’t let AI write any test scripts at all.

Only when I’m confident certain features won’t undergo major changes do I start writing unit tests.

Complex integration tests wait until the product stabilizes; E2E tests wait until the UI design system is also reasonably stable.
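
The staging described above can be written down as a tiny policy, which is handy if you want to paste it into AGENTS.md or check it in a script. This is only a sketch: the stage names are hypothetical, and treating the stages as a strict linear order is a simplifying assumption.

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    EARLY = 0            # product is still changing shape
    FEATURES_STABLE = 1  # some features are unlikely to change much
    PRODUCT_STABLE = 2   # the product as a whole has settled
    UI_STABLE = 3        # the UI design system is reasonably stable too


def allowed_tests(stage: Stage) -> List[str]:
    """Kinds of tests worth letting the AI write at a given stage."""
    kinds = []
    if stage >= Stage.FEATURES_STABLE:
        kinds.append("unit")
    if stage >= Stage.PRODUCT_STABLE:
        kinds.append("integration")
    if stage >= Stage.UI_STABLE:
        kinds.append("e2e")
    return kinds
```

At Stage.EARLY the list is empty, which matches not letting AI write any test scripts at all in the beginning.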


6. Don’t Go Cross-Platform During MVP

My first project was built with Flutter for iOS, Android, and Mini App — a multi-platform release.

It was a painful lesson. I don’t do that anymore.

The inertia from working at companies made me think cross-platform meant more users. But when you’re a small team, cross-platform is a disaster.

Even though Flutter, React Native, and similar solutions are quite good — it’s still a disaster.

The reality is that your first MVP will most likely be a completely different product six months later.

The product iteration process is the Ship of Theseus.

Going cross-platform in the early stages does nothing but slow you down. The more platforms you support, the more confirmations you need for every change.

You can choose a cross-platform tech stack, but in the first phase, focusing on and releasing for just one platform will make you faster.

I’d also suggest: if your product can work on the web, the web is still the best platform for validating an MVP — not mobile.

Both Codex and Claude Code are already very strong at web development. This isn’t just about coding capability; it’s also about UI aesthetics, the convenience of automated testing, and speed of traffic acquisition. Driving traffic to apps is hard.

One final note for this section:

In the past, Flutter was always my go-to client-side tech stack, but I’m now starting to question whether Flutter and React Native are still good choices.

Cross-platform frameworks like Flutter exist precisely to run one codebase everywhere, amplifying productivity. But now productivity isn’t the bottleneck — in an era where everyone’s building SaaS and apps, the experience matters more.

So when my Flutter project consistently failed to deliver a satisfying Liquid Glass effect, I seriously considered writing the iOS client natively in Swift.

For my next app, I’m thinking about going native on every platform.

But I haven’t figured out how to maintain cross-platform interaction consistency. For now, design.md constraints aren’t strong enough — perhaps we need to wait for models to develop better UI understanding.

The consistency solution, much like I’ve said before, might come down to waiting for the next-gen model.


7. Monorepo Is Better Suited for AI

In the beginning, LLM context windows were very limited, so my earliest projects always split front-end and back-end into separate repos.

But now, I use monorepo for everything.

On one hand, the token window isn’t as tight anymore. On the other, it lets the agent see the full picture when making changes.

Separate front-end and back-end repos constantly require time spent coordinating API interfaces.

Now, I just describe the requirement, without needing to spend time debugging interfaces back and forth between agents in two repos.

So even for legacy projects with separate front-end and back-end repos, I now put them into a single large folder and set the coding agent’s project root to that folder.


8. Interaction Design Is LLMs’ Biggest Weakness

Traditional programming still has one advantage in the mobile app space:

It can create emotionally compelling interactive experiences.

Note that I’m not talking about UI — I’m talking about UX / UE.

Current LLMs are still quite poor at UI. No amount of skills, prompts, or plugins can fix this. The core issue is that the underlying capability hasn’t caught up.

While there have been some breakthroughs on the web, it still takes multiple rounds of back-and-forth with the agent.

For mobile apps — a product form that’s far more interaction-heavy — LLMs still fall short.

So if you’re trying to build an app with vibe coding, I’d recommend not spending too much time on this aspect.

Make the interactions simple, logical, and move on.

Unlike UI, which can be communicated through screenshots, UX / UE is extremely difficult to even describe.


9. Final Thoughts

These aren’t recent vibe coding insights — they’re things I’ve often shared with my team, and I’d like to share them with you too.

“A hammer in your hands can drive nails, but in Michelangelo’s hands it can sculpt David. It’s not the hammer that makes the difference.”

In today’s AI era, I think we need a revision:

“A hammer in your hands can drive nails, but in Michelangelo’s hands it can sculpt David. Some people chat with AI, some build apps with it, some create videos with it. It’s not the AI that makes the difference.”

