How To Hire & Manage AI Agents: GSTACK and Paperclip

Frameworks For Zero-human & Agentic Teams

Jun 17, 2026


Hey, it’s Daniil. This is a subscriber deep dive on the two agent-management tools everyone is arguing about right now, and which one, if any, you actually need.

Last week I counted nine Claude Code tabs open across two monitors. I could not tell you what three of them were doing.

One had been looping on the same failing test for forty minutes while I answered Slack. That mess is the entire reason this post exists.

For two years the question with AI coding was whether the agent could do the work. In my stack, across agency work, media and AI product launches, that question is basically dead.

Claude Code, Codex and Hermes write real code, open real browsers, file real pull requests. The bottleneck moved. It is not the LLM anymore. It is me, trying to remember which agent is doing what, what each one is burning in tokens, and whether anyone reviewed the output before it hit main.

That gap turned into a whole category this year, and it arrived loud. Two open source projects went viral recently, answering the same question from opposite ends.

GSTACK, from Y Combinator president Garry Tan, is sitting near 97,000 GitHub stars. Paperclip, from a developer who goes by dotta, is above 67,000.

Neither is a new agent. Both are the org chart around the agents you already run.

At a glance

In this piece, you will learn:

What GSTACK and Paperclip actually are, in plain language, and how they differ
Which one fixes your real bottleneck: judgment, throughput, or management
How to install and test each one on day one, step by step
Real cases from operators on Reddit and X, and two setups worth stealing
The honest catch nobody selling you a “zero-human company” mentions

Stop reading this as “more AI tools.” Start reading it as layers stacked on the agent.

One layer adds judgment. One adds throughput. One adds management. You want the cheapest layer that fixes your actual bottleneck, not the one with the best launch video.

This whole post assumes your stack is already split into layers. If yours still is not, start here: Your AI Agent Stack Is Spaghetti, It Should Be Lasagna.

GSTACK: judgment for one agent

I will start with the smaller and, to its credit, more honest of the two.

GSTACK is a free, MIT licensed pack of opinionated skills you drop into Claude Code. The install is one line and takes about thirty seconds. Nothing runs in the background, nothing touches your PATH.

What you get is 23 core skills plus 8 power tools. Each is a slash command that loads a specialist persona with its own priorities and constraints.

The personas are the whole idea. A CEO that challenges your scope before you write a line. An engineering manager that locks architecture. A designer that hunts AI slop. A reviewer that looks for production bugs. A QA lead that opens a real browser. A security officer that runs OWASP audits. A release engineer that ships the PR.

The work moves through a fixed loop: Think, Plan, Build, Review, Test, Ship, Reflect.
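If it helps to see that loop as data, here is a minimal sketch. The seven stage names come straight from the list above. Everything else is my own illustration, not gstack's implementation (gstack ships markdown skill files, not Python), and the wrap from Reflect back to Think is an assumption about what "loop" means here.

```python
from enum import Enum


class Stage(Enum):
    """The seven stages named in the article, in order."""
    THINK = 1
    PLAN = 2
    BUILD = 3
    REVIEW = 4
    TEST = 5
    SHIP = 6
    REFLECT = 7


def next_stage(current: Stage) -> Stage:
    """Return the stage after `current`.

    Assumption: Reflect feeds back into Think, so the sequence wraps.
    """
    members = list(Stage)  # Enum iteration preserves definition order
    return members[(members.index(current) + 1) % len(members)]
```

The point of writing it this way is that Review and Test sit between Build and Ship by construction, so there is no path that skips them.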

This is the tweet that kicked it all off, and it is still the cleanest one line summary of the pitch.

Garry Tan@garrytan

I've been having such an amazing time with Claude Code I wanted you to be able to have my *exact* skill setup: Introducing gstack, which you can install just by pasting a short piece of text into your Claude code

8:43 AM · Mar 12, 2026 · 1.01M Views

278 Replies · 477 Reposts · 6.67K Likes

What it solves is real, and I feel it every week. A solo developer has no senior reviewer in the room. Not because they are careless, but because that person does not exist on a team of one.

Under deadline, the review step is the first thing to go. GSTACK fills the empty chairs and makes review the default instead of the thing you meant to do.

Why this skill pack out of the thousands floating around? Because of who shipped it. Garry Tan runs the engine that funded more than 4,000 startups. When that person publishes the exact setup he ships his own code with, I pay attention.

Newer to Claude Code itself? Start with our walkthrough on how Claude Code can be your AI teammate, then come back here.

Now the honest part, which is also the most interesting thing about GSTACK.

Tan’s launch claim was enormous: 600,000 lines of production code in 60 days, part time, while running YC. My first reaction was the same as yours. Sure, buddy.

Then Tan walked it back himself. On the repo today he notes that raw line counts inflate with AI and switches to a “logical code change” metric instead. His own line: “AI wrote most of it. The point isn’t who typed it, it’s what shipped.”

The creator of the viral tool retired his own viral number. That made me trust the tool more, not less.

github.com/garrytan/gstack. It crossed 10,000 stars inside 48 hours and sits near 97,000 by June.

What is actually inside it. The install drops markdown skill files into ~/.claude/skills/gstack/, plus a roughly 58MB browser binary. Everything is a slash command grouped under the seven stage loop.
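Before trusting any install, I like to look at what actually landed on disk. A small sketch for that, assuming only what is stated above: the skills live under ~/.claude/skills/gstack/ and are markdown files. The `.md` extension and the helper name `list_skill_files` are my assumptions, and the layout inside that directory is not something I am describing here.

```python
from pathlib import Path

# Install location as described in the article.
DEFAULT_ROOT = Path.home() / ".claude" / "skills" / "gstack"


def list_skill_files(root: Path = DEFAULT_ROOT) -> list[str]:
    """Return markdown files under `root`, as sorted relative paths.

    Returns an empty list if the directory does not exist,
    for example when gstack is not installed.
    """
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))


if __name__ == "__main__":
    for name in list_skill_files():
        print(name)
```

Because the skills are plain text files, you can read every one of them before you run a single slash command.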

The three skills I lean on hardest:

/qa launches a real browser and clicks through your app. It has three modes: a full pass that runs 5 to 15 minutes, a 30 second smoke test, and a regression mode.

/ship runs the whole release in one command: sync main, run the suite, update the changelog, push, open the PR.

/review triages comments from a PR reviewer like Greptile into valid, already fixed, or false positive, so a second pair of eyes never gets quietly ignored.
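The /ship sequence above (sync main, run the suite, update the changelog, push, open the PR) is easiest to reason about as a fail-fast pipeline: each step runs only if the previous one succeeded. A minimal sketch of that idea follows. The step names are from the article; the callables are hypothetical stand-ins, and this is not how gstack itself is implemented.

```python
from typing import Callable

# A step is a name plus a callable that returns True on success.
Step = tuple[str, Callable[[], bool]]


def run_release(steps: list[Step]) -> list[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: list[str] = []
    for name, action in steps:
        if not action():
            break  # nothing after a failed step is attempted
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical run where the test suite fails.
    demo = [
        ("sync main", lambda: True),
        ("run the suite", lambda: False),
        ("update the changelog", lambda: True),
        ("push", lambda: True),
        ("open the PR", lambda: True),
    ]
    print(run_release(demo))  # ['sync main']
```

The design choice that matters is the `break`: a red test suite means no push and no PR, which is exactly the review step that tends to get skipped under deadline.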

gstack runs entirely inside Claude Code, so the “interface” is just your terminal with these slash commands. The full command list and real output examples are in the repo README.

Real cases

Garry Tan, in production. The most cited case is the creator’s own. Tan reports running 10 to 15 gstack sprints in parallel, roughly 30 minutes per feature sprint, on real services he ships while running YC. It is self reported, and he has since walked back the raw line count, but his public GitHub graph is the receipt. Treat it as the ceiling, not your Tuesday.

The security skill caught a real bug. Tan’s launch included a story of a CTO friend whose gstack security pass flagged a genuine flaw on the first run, and the repo documents catching live issues like cross-site scripting. That is the most repeatable value in the whole pack: not raw speed, but a reviewer that catches the thing you would have shipped.

fagemx ported it to game development. A developer took the Think-to-Ship methodology and rebuilt it as gstack-game, swapping the engineering roles for an economy designer, a UX researcher, and a game-feel diagnostician. People are not just using gstack, they are forking its structure into new domains.

Want the wider map of Claude Code skills, plugins and swarm mode? We broke it down here: Skills, Plugins and Swarm Mode. GSTACK is the opinionated, batteries-included version of that idea.

Paperclip: management for many agents

Paperclip starts where GSTACK stops. I will admit I rolled my eyes at its pitch first.
