Department of Vibe Code Assessment

The Rubric

How we grade your code. No curves, no mercy, no AI influence.

Last optimized: March 22, 2026. Excluded Streamlit/Gradio layout blocks from nesting depth, switched to code-only line counting for god file detection, and added monorepo config files to framework conventions.

Deterministic

Same repo, same score, every time. No randomness, no AI judgment calls on scoring. Run it twice, get the same number.

AI-Free Scoring

AI writes the snarky verdict and nothing else. Every score comes from pattern matching and static analysis against the thresholds on this page.

Always Improving

We run stress tests against real repos, review false positive reports, and tune thresholds continuously. Every flag on a report card has a “false positive?” button that feeds back into our optimization loop.

How Scoring Works

Each dimension starts at 100 points. Every finding subtracts points based on severity:

  • critical: 20 pts
  • high: 12 pts
  • medium: 6 pts
  • low: 2 pts

Multiple findings stack. Scores are clamped to 0–100, then weighted by dimension importance and combined into a final grade.

Grade scale: A (90–100) · B (80–89) · C (70–79) · D (60–69) · F (0–59)
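The deduction math above can be sketched in a few lines of Python. The weights and function names here are illustrative, not the tool's actual API; only the severity points, clamping, and grade cutoffs come from this page.

```python
# Point deductions per finding severity, as listed above.
SEVERITY_POINTS = {"critical": 20, "high": 12, "medium": 6, "low": 2}

# Hypothetical dimension weights matching this page (sum to 1.0).
WEIGHTS = {
    "structure": 0.25, "errors": 0.20, "tests": 0.20,
    "security": 0.15, "deps": 0.10, "docs": 0.10,
}

def dimension_score(findings):
    """Start at 100, subtract per finding, clamp to 0-100."""
    score = 100 - sum(SEVERITY_POINTS[sev] for sev in findings)
    return max(0, min(100, score))

def final_grade(findings_by_dimension):
    """Weight each dimension score and map the total onto a letter."""
    total = sum(
        WEIGHTS[dim] * dimension_score(findings_by_dimension.get(dim, []))
        for dim in WEIGHTS
    )
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if total >= floor:
            return letter, round(total)
    return "F", round(total)
```

Note that six criticals in one dimension already clamp it to zero; further findings there change nothing.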

Code Structure

25% of final score

The biggest weight because it's the biggest AI tell. AI tools write everything into one file, one function, deeply nested. This dimension catches that.

God Files

low

Files that do too much. We count actual code lines — comments, blanks, and docstrings don't count. A 600-line file with 200 lines of comments is really a 400-line file.

Threshold: 400-499 code lines (low), 500-749 (medium), 750+ (high)

What we skip
  • Test files, generated files, vendored code
  • Barrel/index files (re-exports only)
  • Data files: locales, i18n, seed data, fixtures
  • Files with >50% string/template literal content
  • Storybook stories
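The code-only counting described above can be sketched for Python sources like this. It is a naive sketch: it handles `#` line comments and triple-quoted docstring blocks only, and the real analyzer also covers other languages.

```python
def count_code_lines(source: str) -> int:
    """Count lines that are neither blank, comments, nor docstrings.

    Naive sketch: recognizes '#' comments and triple-quoted blocks only.
    """
    count = 0
    in_docstring = False
    for raw in source.splitlines():
        line = raw.strip()
        if in_docstring:
            # Block closes when we see the matching triple quote.
            if '"""' in line or "'''" in line:
                in_docstring = False
            continue
        if not line or line.startswith("#"):
            continue
        if line.startswith(('"""', "'''")):
            # A one-line docstring closes itself; otherwise open a block.
            if not (line.count('"""') == 2 or line.count("'''") == 2):
                in_docstring = True
            continue
        count += 1
    return count
```

On this metric, a 600-line file with 200 comment lines really is counted as a 400-line file.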

Deep Nesting

low

Counts control flow nesting only — if/else/for/while/try/catch. Not object literals, not JSX structure, not config nesting. Single-line returns like `if (x) return y` don't count.

Threshold: 5 levels (low), 6-7 (medium), 8+ (high)

What we skip
  • Test files, generated files, data files
  • Streamlit layout blocks (st.sidebar, st.container, st.columns)
  • Gradio layout blocks (gr.Row, gr.Column, gr.Tab)
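For Python sources, control-flow-only depth can be measured with the standard `ast` module, as in this sketch. The node list is an assumption about what "control flow" covers here, and the real analyzer also handles JS/TS (where single-line guards like `if (x) return y` are excluded).

```python
import ast

# Only control-flow statements count toward depth: if/for/while/try.
# Object literals, JSX, and config nesting have no analogue here.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try)

def max_nesting(source: str) -> int:
    """Return the deepest chain of nested control-flow statements."""
    def depth(node, current):
        here = current + isinstance(node, CONTROL_NODES)
        return max(
            [here] + [depth(child, here) for child in ast.iter_child_nodes(node)]
        )
    return depth(ast.parse(source), 0)
```

A file tripping the low threshold would need five such statements stacked inside each other.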

Duplicate Files

low

Catches copy-paste patterns where the same logic lives in multiple places. Compares content similarity, not just file names.

Threshold: 3+ files with the same base name and ≥50% content overlap

What we skip
  • Framework conventions (page.tsx, layout.tsx, route.ts, index.ts)
  • Monorepo config files expected to repeat across workspaces
  • Template/example directories
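One way to check "same base name, ≥50% content overlap" is stdlib `difflib`, as in this sketch. The function shape is illustrative; the real comparison may normalize whitespace or hash chunks instead.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from pathlib import PurePath

def find_duplicate_groups(files: dict) -> list:
    """files maps path -> content. Flag base names appearing in 3+ files
    where every pair shares at least 50% of its content."""
    by_base = defaultdict(list)
    for path, content in files.items():
        by_base[PurePath(path).name].append(content)
    groups = []
    for base, contents in by_base.items():
        if len(contents) < 3:
            continue
        ratios = [
            SequenceMatcher(None, a, b).ratio()
            for i, a in enumerate(contents)
            for b in contents[i + 1:]
        ]
        if ratios and min(ratios) >= 0.5:
            groups.append(base)
    return groups
```

Comparing content rather than names is what keeps legitimately repeated names (like Next.js `page.tsx`) out of the results when their contents diverge.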

Error Handling

20% of final score

AI-generated code loves to swallow errors. An empty catch block is a bug waiting to happen — we look for evidence that errors are actually being handled, not just silenced.

Empty Catch Blocks

high

A catch block that catches an error and does nothing with it. This is almost never what you want.

Threshold: Any catch block with no code inside
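A rough sketch of this check for JS/TS sources, using a regex in Python. This is illustrative only; a real implementation would parse the code, since regexes miss comment-only bodies and unusual formatting.

```python
import re

# A catch clause (with or without a binding) whose braces are empty.
EMPTY_CATCH = re.compile(r"catch\s*(\([^)]*\))?\s*\{\s*\}")

def count_empty_catches(source: str) -> int:
    """Count catch blocks with no code inside."""
    return len(EMPTY_CATCH.findall(source))
```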

Console-Only Error Handling

medium

Logging an error isn't handling it. If the user can't tell something went wrong, you're just hiding problems.

Threshold: Catch blocks that only console.log the error

What we skip
  • console.warn in catch blocks (assumed intentional graceful degradation)
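A sketch of this pattern, again as a Python regex over JS/TS source. It is deliberately narrow: only a lone `console.log` call inside the catch body matches, so `console.warn` and catches that do real work pass through, per the exception noted above.

```python
import re

# A catch whose only statement is a single console.log(...) call.
CONSOLE_ONLY_CATCH = re.compile(
    r"catch\s*(\([^)]*\))?\s*\{\s*console\.log\([^)]*\);?\s*\}"
)

def count_console_only_catches(source: str) -> int:
    return len(CONSOLE_ONLY_CATCH.findall(source))
```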

Console.log Density

medium

A high density of console.log usually means debug code left behind in production. We're not counting console.warn or console.error — those are often intentional.

Threshold: 5+ console.log calls in a single file

What we skip
  • CLI tools and scripts (scripts/ directories)
  • Build config files (vite.config, webpack.config, next.config, etc.)
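The density check itself is simple, sketched here in Python. The threshold comes from this page; the exclusion of `console.warn` and `console.error` falls out of matching `console.log` specifically.

```python
import re

CONSOLE_LOG = re.compile(r"\bconsole\.log\s*\(")

def log_density_flag(source: str, threshold: int = 5) -> bool:
    """True when a single file contains `threshold` or more console.log calls."""
    return len(CONSOLE_LOG.findall(source)) >= threshold
```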

Test Coverage

20% of final score

Not whether your tests are good — we can't tell that from static analysis. But we can tell if they exist, and roughly how much of your code is covered.

No Tests Found

critical

We look for .test.*, .spec.*, test_*, and files in /test/ directories. If we find nothing, that's a critical finding.

Threshold: Zero test files in the entire repo
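The patterns listed above translate directly into globs, as in this sketch. The exact matcher may differ; this version checks the file name against the three patterns and looks for a `test` directory anywhere in the path.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Glob equivalents of the patterns named above.
TEST_PATTERNS = ("*.test.*", "*.spec.*", "test_*")

def is_test_file(path: str) -> bool:
    p = PurePosixPath(path)
    # Any /test/ directory component marks the file as a test.
    if "test" in p.parts[:-1]:
        return True
    return any(fnmatch(p.name, pat) for pat in TEST_PATTERNS)
```

If no file in the repo satisfies a check like this, the critical finding fires.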

No Test Script

high

Even if test files exist, there should be a way to run them. We look for test, test:unit, test:e2e, and test:integration scripts.

Threshold: Node.js project with no test script in package.json

Low Test-to-Source Ratio

high

Compares total lines of test code to total lines of source code. Below 10% means you're barely testing anything.

Threshold: Below 10% (high), 10-29% (medium)
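Mapping the ratio onto the thresholds above is a two-branch check, sketched here; the function name is illustrative.

```python
def coverage_severity(test_lines: int, source_lines: int):
    """Map the test-to-source line ratio onto the thresholds above."""
    ratio = test_lines / source_lines if source_lines else 0.0
    if ratio < 0.10:
        return "high"
    if ratio < 0.30:
        return "medium"
    return None  # 30% or better: no finding
```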

Security

15% of final score

We can't do a real security audit from static analysis, but we can catch the obvious disasters — committed secrets, hardcoded credentials, and .env files in the repo.

.env File Committed

critical

If we can see your .env file, so can everyone else. This usually means secrets are exposed.

Threshold: Any .env file present in the repo

Hardcoded Secrets

critical

Detects OpenAI keys (sk-...), GitHub tokens (ghp_...), AWS keys (AKIA...), and general api_key/password/secret assignments with string values.

Threshold: API keys, passwords, tokens in source code

What we skip
  • Environment variable references ($DB_PASSWORD)
  • Placeholder values (your-password, example-key, test-secret, dummy_*, fake_*)
  • Error code constants (INCORRECT_PASSWORD)
  • .env.example and .env.template files
  • Example, sample, tutorial, and documentation directories
  • E2E setup scripts, seed files, fixtures, and mock data
  • Public keys in Docusaurus/Algolia configs
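The detection described above can be sketched as token regexes plus a placeholder allowlist. The patterns and allowlist here are rough approximations for illustration, not the analyzer's actual rules.

```python
import re

TOKEN_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),    # OpenAI-style key
    re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),       # AWS access key id
    # Generic assignment of a string to api_key/password/secret.
    re.compile(r"""(?i)(api_key|password|secret)\s*[:=]\s*["'][^"']+["']"""),
]

# Skip env-var references ($...) and obvious placeholder values.
PLACEHOLDERS = re.compile(r"(?i)(your-|example|test-|dummy_|fake_|\$)")

def looks_like_secret(line: str) -> bool:
    if PLACEHOLDERS.search(line):
        return False
    return any(p.search(line) for p in TOKEN_PATTERNS)
```

Requiring a quoted string value is what keeps error-code constants like `INCORRECT_PASSWORD` from matching: the word "password" must be followed by an assignment, not sit inside a string.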

Dependencies

10% of final score

Lower weight because dependency management is genuinely hard and sometimes you just need the packages. But we still flag obvious bloat and hygiene issues.

Excessive Dependencies

high

A high dependency count increases attack surface, bundle size, and the odds of supply chain issues.

Threshold: 60+ production deps (high), 45-59 (medium)

Missing Lock File

high

Without a lock file, every install might get different versions. We look for package-lock.json, yarn.lock, pnpm-lock.yaml, and bun.lockb.

Threshold: Dependencies present but no lock file

Duplicate-Purpose Packages

medium

Having both axios and node-fetch? Both moment and dayjs? Pick one. We check 8 categories: HTTP clients, utilities, dates, frameworks, test runners, loggers, validators, and CSS-in-JS.

Threshold: 2+ packages that do the same thing
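A category table makes this a set-intersection check, as in this sketch. The table below shows three of the categories with a few example packages each; the real list covers all eight and more packages per category.

```python
# Illustrative subset of the duplicate-purpose categories.
PACKAGE_CATEGORIES = {
    "http": {"axios", "node-fetch", "got", "superagent"},
    "dates": {"moment", "dayjs", "date-fns", "luxon"},
    "utils": {"lodash", "underscore", "ramda"},
}

def duplicate_purpose(deps) -> dict:
    """Return categories where 2+ installed packages overlap."""
    installed = set(deps)
    hits = {
        cat: sorted(installed & pkgs)
        for cat, pkgs in PACKAGE_CATEGORIES.items()
    }
    return {cat: pkgs for cat, pkgs in hits.items() if len(pkgs) >= 2}
```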

Documentation

10% of final score

The lightest weight because docs are genuinely optional for small projects. But no README at all? That's a choice.

No README

high

A repo without a README is a repo nobody can use. Even a few lines explaining what it does and how to run it is better than nothing.

Threshold: No README.md file in the repo

Thin README

medium

An auto-generated or placeholder README with just a title and nothing else. We want to see at least what it does and how to run it.

Threshold: Fewer than 5 lines of content
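"Lines of content" here can be read as non-blank lines, as in this sketch; the real check may also discount badges or boilerplate.

```python
def is_thin_readme(text: str) -> bool:
    """Fewer than 5 non-blank lines, per the threshold above."""
    content = [line for line in text.splitlines() if line.strip()]
    return len(content) < 5
```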

Very Few Inline Comments

low

We're not looking for comments on every line. But a large file with almost no comments suggests the author didn't stop to explain any of the non-obvious logic.

Threshold: Below 2% comment ratio in files with 200+ lines
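A sketch of the ratio check, counting line comments only. What counts as a "comment line" is an assumption here; the `#` and `//` prefixes cover Python and JS/TS line comments but not block comments.

```python
def thin_comment_flag(source: str) -> bool:
    """Flag files with 200+ lines and under 2% comment lines."""
    lines = source.splitlines()
    if len(lines) < 200:
        return False  # small files are exempt
    comments = sum(
        1 for line in lines if line.lstrip().startswith(("#", "//"))
    )
    return comments / len(lines) < 0.02
```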

How We Improve

A rubric is only as good as its accuracy. We actively optimize against false positives through 4 feedback loops:

Nightly Automated Stress Tests

Every night at 2am, an automated agent picks 5-10 diverse repos from GitHub — monorepos, framework-heavy apps, repos with generated code, data fixtures, and public API keys — runs them through every analyzer, reads the actual flagged code, and classifies each finding as legitimate or a false positive. Pattern-fixable false positives get auto-fixed with test coverage, and a detailed report is generated. Because this runs nightly, the rubric keeps getting tighter.

False Positive Reports

Every finding on a report card has a flag button. When users report a false positive, we review it and, if confirmed, update the analyzer to handle that pattern. Real feedback from real repos drives real improvements.

Test-Gated Fixes

Every analyzer optimization must pass the full test suite before it ships. If a fix is too broad and breaks an existing test, it gets reverted automatically. No fix goes live without proof it doesn't create new problems.

Framework Awareness

Different frameworks have different conventions. Streamlit apps have deep nesting from layout blocks. Next.js repos have duplicated page.tsx files. Monorepos repeat config files. We teach the analyzer to understand these patterns instead of blindly flagging them.

The goal isn't zero findings — it's zero unfair findings.

What We Skip

Some files are excluded from analysis entirely:

  • node_modules, dist, build, .next, vendor, venv, and other dependency/output directories
  • Generated files (auto-generated headers, .d.ts declarations, .map files, migration files)
  • Lock files (package-lock.json, yarn.lock, etc.) — analyzed for presence, not content
  • Binary files and media assets
  • Vendored UI components (components/ui/, ui/primitives/)