Today I was a guest in a software engineering class run by an esteemed colleague, where I was asked to describe how I use AI for software development. This was the first time I'd organized my thoughts around my process, and I was surprised to learn that I had a process at all. In retrospect, it seems I've been developing and refining this process, however informally, for over three years (I enjoyed beta access to ChatGPT 3.5).
I believe the process is repeatable and has so far produced useful outputs, at least for my work. I'd like to share it with you, partly in the hope of verifying its usefulness across a broader spectrum of use cases, and partly in the "open-source" spirit that made me fall in love with software development in the early 00s.
"Human knowledge belongs to the world, like Shakespeare, or Aspirin." ~ Teddy Chin, AntiTrust (2001)
I'm calling the process "GAN-coding", in contrast with the widely used and overloaded term "vibe-coding".
Vibe-coding is a slangy term for a style of software development in which a human uses natural-language prompts to instruct an AI model to generate code rather than writing the code by hand.
GAN-coding is a software development methodology in which code is produced through repeated adversarial cycles between generators and discriminators, with humans retaining final authority and correctness enforced through explicit rejection loops rather than trust.
This isn't "GAN" in the machine-learning sense, technically. I'm borrowing the generator/discriminator pattern because it matches the workflow: produce an artifact, then adversarially challenge it.
"Vibe-coding" has a broad scope. In practice it can mean hyper-rigorous AI-assisted coding very much like what I'm labeling "GAN-coding", OR it can mean a one-shot "make me a website that takes payments for a digital product" or anything in between. When I refer to vibe-coding in this article, I'm scoping down to "average-ish" entrepreneurial or professional use of AI-assisted coding.
GAN-coding can be seen as an extension of vibe-coding with some key differences:
| Characteristic | Vibe-coding | Manual human coding | GAN-coding |
|---|---|---|---|
| AI-driven: LLM generates code | Very high | Low (incidental assistance) | Medium-high |
| Natural language (not code) prompts | Very high | Low (inline code completion) | Medium-high (structured prompts including code) |
| Human comprehension of generated code | Low-high | Very high (ideally) | Medium-high (by definition) |
| Applied rigor: code reviews and testing | Low-high | Low-high | High (by definition) |
It's like the "TDD" (test-driven development) cousin of vibe-coding. If you already do strict code review + tests with AI assistance, GAN-coding is largely the same thing, but I've attempted to make it explicit, role-driven, and repeatable.
- **No Single Agent Is Trusted in Isolation.** Every meaningful artifact (design, code, tests, or review) is challenged by an independent agent. Humans and AI models alternate roles as generators and discriminators.
- **Discrimination Is the Bottleneck.** Generation is fast and inexpensive. Evaluation, understanding, and rejection are not. The process is intentionally optimized to preserve human attention for high-leverage decisions.
- **Correctness Is Proven, Not Assumed.** Code must survive adversarial scrutiny: tests that actually verify behavior, reviews that look for failure modes, and diffs that can be reasoned about line by line.
- **Diversity Defends Against Correlated Failure.** Multiple AI models are used intentionally, not interchangeably. Different priors, biases, and blind spots reduce the risk of silent, shared error.
When we adopt these principles and the process I'll describe below, some interesting things happen.
| Characteristic | Vibe-coding | Manual human coding | GAN-coding |
|---|---|---|---|
| Primary failure mode | Confidently shipping broken code | Human error, blind spots, fatigue | Over-constraint or slow convergence |
| Who is accountable | Ambiguous (blame the AI) | Human author(s) | Human discriminators / co-authors |
If the human doesn't fully understand the code, then the human cannot be accountable, or at least has an escape hatch from accountability. If rigorous validation is optional, as is the case in "average-ish" vibe-coding, then failure emerges from false confidence: "The AI scores high on SWE-bench, so this must be good to go."
While far from perfect, manual human coding was the progenitor of most of the Internet until recently. It's the devil we know, and we have process around it. I propose we apply a similarly rigorous process to AI-assisted coding, so we can reap the considerable benefits AI can confer while maintaining some semblance of the human ownership, accountability, and authority that helps align outcomes with intent and value.
A few critical features of the GAN-coding process:
- Explicit human rejection loops and human accountability
- AI-assisted adversarial discrimination cycles driven by human judgement
- Constraint-driven prompting (tests, invariants, contracts)
- Human-owned architectural decisions
The process begins with traditional requirements gathering, specification writing, and architectural design. This phase is intentionally aligned with best practices from manual human coding. The human generates design artifacts, specifications, requirements, etc.
Roles
- Human: Generator
- AI (design-review model): Discriminator
The AI is used to challenge assumptions, surface edge cases, and critique architecture—not to author it. Design concerns are resolved before implementation planning begins.
A canonical prompt is produced that describes the system with sufficient clarity that either a human engineer or an AI system could implement it.
This constraint is deliberate. If only a specific model can interpret the prompt correctly, intent has leaked into the model rather than being captured in the design. In GAN-coding, prompts function as contracts, not vibes.
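To make "prompts as contracts" concrete, here's a minimal sketch of what a canonical implementation prompt might contain, written as a Python string constant purely for illustration. The section headings, the example system, and every constraint in it are hypothetical, not a prescribed format.

```python
# Hypothetical skeleton of a canonical implementation prompt.
# The headings, the example system, and the constraints are illustrative only.
IMPLEMENTATION_PROMPT = """
## Objective
Build a rate-limited HTTP client wrapper for an internal billing API.

## Functional requirements
- Retry idempotent requests up to 3 times with exponential backoff.
- Surface non-retryable errors to the caller unchanged.

## Invariants (must hold after every change)
- No request is ever sent without an Authorization header.
- Total retry delay never exceeds 30 seconds.

## Interfaces / contracts
- Public entry point: BillingClient.post(path: str, body: dict) -> Response
- No new third-party dependencies without approval.

## Out of scope
- Caching, metrics, and connection pooling.

## Acceptance test
- An engineer (or model) with no access to this conversation can implement
  the system from this document alone, and the test suite proves the invariants.
"""
```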
Roles
- AI (design-review model): Generator
- Human: Discriminator
A coding-focused AI model decomposes the work into explicit phases or chunks, ideally with semantic and functional boundaries. The human reviews and approves the plan. Coding does not begin without an approved plan.
Roles
- AI (coding model): Generator
- Human: Discriminator
Pro tip: keep iteration phases small to prevent context drift.
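To illustrate the decomposition step described above, a phased plan might be captured as structured data so the human and the coding model can refer to it unambiguously. This is only a sketch; the fields and phases are invented, continuing the hypothetical billing-client example.

```python
# Hypothetical phased plan for the example prompt above; the structure and
# field names are illustrative, not part of the GAN-coding process itself.
IMPLEMENTATION_PLAN = [
    {
        "phase": 1,
        "name": "Core client and auth-header invariant",
        "boundary": "No retry logic yet; requests either succeed or raise.",
        "done_when": "Tests prove every request carries an Authorization header.",
    },
    {
        "phase": 2,
        "name": "Retry with exponential backoff",
        "boundary": "Only idempotent requests are retried.",
        "done_when": "Tests cover max retries and the 30-second total-delay cap.",
    },
    {
        "phase": 3,
        "name": "Error surfacing and public interface freeze",
        "boundary": "No behavior changes to phases 1 and 2.",
        "done_when": "Contract tests pin the public API.",
    },
]
```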
For each phase:
- AI generates code.
- AI generates a test suite targeting that code.
  - These two steps can be done in reverse order for closer TDD alignment.
- The human scrutinizes the tests (see the sketch after this list):
  - Do they verify behavior or merely exercise code?
  - Do they cover critical paths and error modes?
- The human reviews the code for comprehension, targeting medium-high understanding. Adversarial tests and independent reviews act as safety nets for the parts the human hasn't fully parsed, but such gaps should be tolerated only with extreme caution.
- All changes are committed. At every step, diffs are reviewed manually by the human.
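To show the "verify behavior vs. merely exercise code" distinction, here's a small pytest sketch. The function under test is a toy stand-in I invented for illustration, not code from any real project.

```python
# Hypothetical example; compute_backoff_delays is a toy stand-in for real project code.
import pytest

def compute_backoff_delays(retries: int, base: float = 2.0) -> list[float]:
    """Toy implementation under test: exponential backoff delays."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    return [base * (2 ** i) for i in range(retries)]

def test_backoff_runs_without_error():
    # Merely exercises the code: this would pass even if every delay were wrong.
    compute_backoff_delays(retries=3)

def test_backoff_is_exponential_and_capped():
    # Verifies behavior: pins the actual values and a total-delay invariant.
    delays = compute_backoff_delays(retries=3, base=2.0)
    assert delays == [2.0, 4.0, 8.0]
    assert sum(delays) <= 30.0

def test_negative_retries_are_rejected():
    # Covers an error mode the happy path would miss.
    with pytest.raises(ValueError):
        compute_backoff_delays(retries=-1)
```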
Roles
- AI (coding model): Generator
- Human: Discriminator
Iterations continue until the phase converges.
A different AI model performs a full review of the code and tests. The human is responsible for selecting a discriminator model suitable to the task and providing sufficient context for an effective review: requirements, prioritization, convention, domain knowledge, etc. NOTE: this happens before pushing a PR for automated AI code review in CI/CD.
Roles
- Original AI: Generator (as source of artifacts under review)
- Independent AI: Discriminator
This phase not only defends against human fatigue and blind spots but also injects diversity of thought to reduce correlated failures.
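There's no fixed format for handing the reviewer its context, but as one hedged illustration, the request to the independent model might bundle that context explicitly rather than assuming it shares the coding model's conversation history. Every field name and path below is hypothetical.

```python
# Hypothetical review-request payload for the independent reviewer model.
# Field names, paths, and instructions are illustrative only.
REVIEW_REQUEST = {
    "role": "adversarial_reviewer",
    "instructions": (
        "Look for failure modes, violated invariants, and tests that merely "
        "exercise code. Do not soften findings; false positives are acceptable."
    ),
    "context": {
        "requirements": "docs/implementation_prompt.md",
        "conventions": "docs/style_guide.md",
        "priorities": ["correctness", "security", "maintainability"],
    },
    "artifacts": ["changes.diff", "tests/"],
}
```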
The human reviews the AI reviewer's feedback. Valid concerns are verified by the original coding model.
Roles
- Independent (reviewer) AI: Generator
- Human + Original (coding) AI: Discriminator
While bringing the original coding AI back as a discriminator may invite defensive rejection ("my code is solid"), it also possesses unique context that can greatly reduce false-positive issues. There's an implicit additional step in which the roles are:
- Original (coding) AI: Generator
- Human: Discriminator
Only verified issues are addressed. All subsequent changes are subject to "Phase 4: Iterative Code Generation and Testing".
The system is considered complete only when:
- All planned phases are implemented
- Tests meaningfully enforce invariants
- The human can explain the system at an architectural and code level
- No unresolved discriminator objections remain
| Phase | Generator | Discriminator | Artifact |
|---|---|---|---|
| 1. Design | Human | AI (Design-Review Model) | Design spec |
| 2. Implementation Prompt | AI (Design-Review Model) | Human | Implementation prompt |
| 3. Implementation Planning | AI (Coding Model) | Human | Phased implementation plan |
| 4. Code Generation & Testing | AI (Coding Model) | Human | Code + tests |
| 5. Independent AI Review | AI (Coding Model) | AI (Review Model) | Change requests |
| 6. Review Arbitration | AI (Review Model) | Human + AI (Coding Model) | Verified/rejected changes |
Recursion: Verified changes loop back to Phase 4. Upon phase completion, the cycle repeats for subsequent phases until the project is complete.
```mermaid
flowchart TD
    subgraph Setup["Setup"]
        A["1. Design<br/>Human → AI"] --> B{OK?}
        B -->|No| A
        B -->|Yes| C["2. Prompt<br/>AI → Human"]
        C --> D{OK?}
        D -->|No| C
        D -->|Yes| E["3. Planning<br/>AI → Human"]
        E --> F{OK?}
        F -->|No| E
    end
    F -->|Yes| G
    subgraph Cycle["Recursive Implementation"]
        G["4. Code & Test<br/>AI → Human"] --> H{OK?}
        H -->|No| G
        H -->|Yes| I["5. AI Review<br/>AI → AI"]
        I --> J["6. Arbitration<br/>AI → Human + AI"]
        J --> K{Valid<br/>Changes?}
        K -->|Yes| G
    end
    K -->|No| L{Phase<br/>Done?}
    L -->|No| G
    L -->|Yes| M{More<br/>Phases?}
    M -->|Yes| G
    M -->|No| N["✓ Complete"]
```
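For readers who prefer code to diagrams, here's a purely illustrative Python sketch of the same control flow. Every function is a stub standing in for a model invocation or a human decision; none of this is a real API.

```python
# Purely illustrative sketch of the GAN-coding loop. Each stub stands in for
# a model call or a human judgement so the control flow can run end to end.
def coding_model_generate(phase: str) -> str:
    return f"code + tests for {phase}"          # phase 4 output (placeholder)

def human_approves(artifact: str) -> bool:
    return True                                 # manual diff/test review (placeholder)

def review_model_findings(artifact: str) -> list[str]:
    return []                                   # phase 5 independent review (placeholder)

def human_verifies(finding: str) -> bool:
    return False                                # phase 6 arbitration (placeholder)

def gan_coding(plan: list[str]) -> None:
    for phase in plan:
        while True:
            artifact = coding_model_generate(phase)                  # 4. code + tests
            if not human_approves(artifact):
                continue                                             # rejected: regenerate
            verified = [f for f in review_model_findings(artifact)   # 5. independent review
                        if human_verifies(f)]                        # 6. arbitration
            if verified:
                continue                                             # loop back to phase 4
            break                                                    # phase converged
        print(f"{phase}: converged")

gan_coding(["core client", "retry logic", "interface freeze"])
```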
Is this actually useful? Isn't it boring and expensive?
Yes. Yes.
For many software engineers, whether individuals or teams, this process may net out to reduced productivity. At first. But there's a similar argument to be made for TDD in general, or for the stricter quality controls that naturally come with scaling, successful products: the cost of rigor gets amortized, paid off, and starts returning value over time. (And by the way, some or all of your team members may be doing this already.)
That said, the GAN-coding process as it stands isn't cheap. To calibrate when it's worth the cost, let's posit some dimensions along which we can gauge whether GAN-coding is likely to net more valuable returns.
| Dimension | Vibe-coding | Manual human coding | GAN-coding |
|---|---|---|---|
| Team size | Very small | Small–medium | Medium–large |
| Codebase growth | Fast, uneven, brittle | Slow, deliberate | Medium-fast, structured |
| Onboarding new contributors | Easy initially | Slow but deep | Moderate, principled |
| Consistency across modules | Low | Medium to High if enforced | Medium to High if enforced |
| Failure detection | Late | Medium-early | Early |
| Long-term maintainability | Low–medium | Medium–high | High |
For small teams and prototype products, GAN-coding might be a time-sink with minimal return. Say "no" to GAN-coding during rapid prototyping, or where tests are disproportionately expensive relative to the risks they mitigate.
If speed is practically irrelevant, the codebase is super-stable and mature, or there's little pressure for engineers to stretch beyond their capacity, then manual human coding may be the approach that introduces the least risk. But even then, GAN-coding can provide valuable diversity of thought to detect otherwise hidden failures and tacit assumptions. And when the iterative review cycle and AI assistance inherent in GAN-coding result in more thorough documentation or better test coverage, any team could benefit from the improved maintainability.
GAN-coding isn't for everyone. It doesn't cater to the strengths and weaknesses of every coder equally. Let's look at a few skills that may be relevant (this list is far from exhaustive).
| Skill / Trait | Vibe-coding | Manual human coding | GAN-coding |
|---|---|---|---|
| Prompt articulation | Extremely high | Low or n/a | Extremely high |
| Syntax & language mastery | Low | Extremely high | Medium |
| Typing speed / mechanical | Low or n/a | High | Low or n/a |
| Systems thinking & architecture | Low–medium | High | High-extremely high |
| Critical evaluation & skepticism | Low–medium | Medium-high | Extremely high |
| Debugging & failure analysis | Low | High | Extremely high |
| Speed of ideation | Extremely high | Low–medium | Medium-high |
"Average-ish" vibe-coding rewards expressive intent over technical depth. You don't need to comprehend the outputs. You "load it in your browser" and if it works, it works.
Manual human coding rewards deep expertise; it sometimes rewards an architectural mindset and skepticism; it demands (but doesn't always get) a high level of debugging and failure analysis; and there's a manual dexterity component that makes it uniquely human (for now).
GAN-coding rewards judgement and curiosity.
**No Single Agent Is Trusted in Isolation.**
This core principle requires the GAN-coder to not only be skeptical of the outputs of others, but also their own outputs.
**Discrimination Is the Bottleneck.**
The GAN-coder, by virtue of selecting the methodology, is under strain to extend their innate capacity. Their curiosity drives the resilience required to review yet another line of AI-generated code and also to learn from it. It's only boring if curiosity is exhausted.
**Correctness Is Proven, Not Assumed.**
The GAN-coder is required to set aside their biases, practice objective judgement and be curious about the truth rather than their preferences.
While GAN-coding may not work for everyone, I posit that for those it does suit, it will be greatly beneficial. Here's a recap:
- No agent trusted in isolation.
- Write a spec that either a human or an AI could implement.
- Don't code without an implementation plan.
- Correctness is proven. Use tests and always review them manually.
- Discrimination is the bottleneck. Always run an independent model review before PR.
- Diversity defends against correlated failure. Use different models for different roles.
- The human is always the tiebreaker.
Note there are some nuances around prompt engineering, model selection, and other bits I collectively consider "implementation details." A goal of mine for the GAN-coding process is that it can succeed agnostic of such details, as long as each step of the process is implemented faithfully. The one exception is the prompt in "Phase 2: Implementation Prompt"; that's a detail I feel is worth reiterating.
If you've read this far, I think it means something resonated with you. I look forward to your feedback and input; this is a work in progress, after all. I used something like the GAN-coding process to write this article: I produced outputs that I asked three different frontier AI models to review, validate, and enhance. I didn't write the mermaid diagram by hand, but I edited and iterated on it with an AI collaborator. I pasted the entire output for final review by multiple LLMs.
The final authority and discriminator though, is you :)