Department of Vibe Code Assessment
The Rubric
How we grade your code. No curves, no mercy, no AI influence.
Last optimized March 22, 2026 — Excluded Streamlit/Gradio layout blocks from nesting depth, switched to code-only line counting for god file detection, added monorepo config files to framework conventions
Deterministic
Same repo, same score, every time. No randomness, no AI judgment calls on scoring. Run it twice, get the same number.
AI-Free Scoring
AI writes the snarky verdict and nothing else. Every score comes from pattern matching and static analysis against the thresholds on this page.
Always Improving
We run stress tests against real repos, review false positive reports, and tune thresholds continuously. Every flag on a report card has a “false positive?” button that feeds back into our optimization loop.
How Scoring Works
Each dimension starts at 100 points. Every finding subtracts points based on severity: the more severe the finding, the bigger the deduction.
Multiple findings stack. Scores are clamped to 0–100, then weighted by dimension importance and combined into a final grade.
Grade scale: A (90–100) · B (80–89) · C (70–79) · D (60–69) · F (0–59)
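The mechanics above can be sketched in a few lines. The per-severity deduction values below are illustrative assumptions (this page doesn't publish the exact point costs); the dimension weights and grade bands are the ones listed on this page.

```python
# Sketch of the scoring pipeline. DEDUCTIONS values are assumed for
# illustration; WEIGHTS and the grade bands come from this page.
DEDUCTIONS = {"low": 5, "medium": 10, "high": 20, "critical": 40}

WEIGHTS = {"structure": 0.25, "error_handling": 0.20, "tests": 0.20,
           "security": 0.15, "dependencies": 0.10, "documentation": 0.10}

def dimension_score(severities):
    """Start at 100, subtract per finding, clamp to 0-100."""
    raw = 100 - sum(DEDUCTIONS[s] for s in severities)
    return max(0, min(100, raw))

def final_grade(findings):
    """findings: dict mapping dimension name -> list of finding severities."""
    total = sum(w * dimension_score(findings.get(dim, []))
                for dim, w in WEIGHTS.items())
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if total >= floor:
            return letter
    return "F"
```

Note how findings stack: two criticals in one dimension drag that dimension to 20, but the 15% weight caps the damage to the final grade.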
Code Structure
25% of final score
The biggest weight because it's the biggest AI tell. AI tools write everything into one file, one function, deeply nested. This dimension catches that.
God Files
Severity: low
Files that do too much. We count actual code lines — comments, blanks, and docstrings don't count. A 600-line file with 200 lines of comments is really a 400-line file.
Threshold: 400+ code lines (low), 500+ (medium), 750+ (high)
What we skip
- Test files, generated files, vendored code
- Barrel/index files (re-exports only)
- Data files: locales, i18n, seed data, fixtures
- Files with >50% string/template literal content
- Storybook stories
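The code-only counting rule can be sketched as below. Only line comments and blanks are handled here; docstring and block-comment stripping is elided, so treat it as an approximation of what the analyzer does.

```python
def count_code_lines(source, line_comment=("#", "//")):
    """Count lines that contain actual code. Blank lines and pure
    comment lines are excluded; a real analyzer would also strip
    docstrings and block comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(line_comment):
            continue  # pure comment line
        count += 1
    return count
```

Under this rule, the 600-line file with 200 comment lines from the example counts as 400 code lines and sits exactly on the low threshold.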
Deep Nesting
Severity: low
Counts control flow nesting only — if/else/for/while/try/catch. Not object literals, not JSX structure, not config nesting. Single-line returns like `if (x) return y` don't count.
Threshold: 5 levels (low), 6–7 (medium), 8+ (high)
What we skip
- Test files, generated files, data files
- Streamlit layout blocks (st.sidebar, st.container, st.columns)
- Gradio layout blocks (gr.Row, gr.Column, gr.Tab)
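For Python files, depth counting can be sketched with the `ast` module. This is a simplification: the real analyzer covers multiple languages, and the single-line-guard exception is omitted here. A nice side effect of counting only if/for/while/try is that Streamlit layout blocks, written as `with st.sidebar:` and the like, never add depth, because `with` isn't in the list.

```python
import ast

# Only control-flow nodes add depth; `with` deliberately excluded.
CONTROL = (ast.If, ast.For, ast.While, ast.Try)

def max_nesting(source):
    """Deepest control-flow nesting in a Python file (sketch)."""
    def depth(node, d=0):
        d += isinstance(node, CONTROL)
        return max([d] + [depth(child, d) for child in ast.iter_child_nodes(node)])
    return depth(ast.parse(source))
```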
Duplicate Files
Severity: low
Catches copy-paste patterns where the same logic lives in multiple places. Compares content similarity, not just file names.
Threshold: 3+ files with same base name and ≥50% content overlap
What we skip
- Framework conventions (page.tsx, layout.tsx, route.ts, index.ts)
- Monorepo config files expected to repeat across workspaces
- Template/example directories
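The grouping-and-overlap rule can be sketched like this. `SequenceMatcher` stands in for whatever similarity metric the analyzer actually uses, and requiring every pair in a group to overlap is one possible reading of the threshold.

```python
import os
from collections import defaultdict
from difflib import SequenceMatcher

def duplicate_file_groups(files, min_count=3, min_overlap=0.5):
    """files: {path: content}. Groups paths by base name, then flags a
    group when 3+ files share the name and each pair's content overlaps
    by at least 50%."""
    by_name = defaultdict(list)
    for path in files:
        by_name[os.path.basename(path)].append(path)
    flagged = []
    for name, paths in by_name.items():
        if len(paths) < min_count:
            continue
        all_similar = all(
            SequenceMatcher(None, files[a], files[b]).ratio() >= min_overlap
            for i, a in enumerate(paths) for b in paths[i + 1:]
        )
        if all_similar:
            flagged.append(sorted(paths))
    return flagged
```

Three `page.tsx` files with near-identical content would trip this in a plain repo; the framework-convention skip list above is what keeps Next.js projects from being punished for it.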
Error Handling
20% of final score
AI-generated code loves to swallow errors. An empty catch block is a bug waiting to happen — we look for evidence that errors are actually being handled, not just silenced.
Empty Catch Blocks
Severity: high
A catch block that catches an error and does nothing with it. This is almost never what you want.
Threshold: Any catch block with no code inside
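A minimal detector for this finding can be sketched with a regex; the real analyzer presumably walks a parse tree, but the rule is the same: a catch whose body contains no statements.

```python
import re

# Matches `catch (e) { }` or optional-binding `catch { }` with a body
# that is empty or whitespace-only.
EMPTY_CATCH = re.compile(r"catch\s*(\([^)]*\))?\s*\{\s*\}")

def has_empty_catch(source):
    return bool(EMPTY_CATCH.search(source))
```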
Console-Only Error Handling
Severity: medium
Logging an error isn't handling it. If the user can't tell something went wrong, you're just hiding problems.
Threshold: Catch blocks that only console.log the error
What we skip
- console.warn in catch blocks (assumed intentional graceful degradation)
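A regex-based sketch of this check is below. Bodies containing nested braces are beyond a regex, and empty bodies are skipped here because they belong to the empty-catch finding above; console.warn bodies pass, matching the skip rule.

```python
import re

# Capture the body of brace-free catch blocks.
CATCH_BODY = re.compile(r"catch\s*(?:\([^)]*\))?\s*\{([^{}]*)\}")
# Body made up exclusively of console.log(...) statements.
ONLY_LOGS = re.compile(r"^\s*(console\.log\([^;]*\);?\s*)+$")

def console_only_catches(source):
    """Catch blocks whose only statements are console.log calls."""
    hits = []
    for m in CATCH_BODY.finditer(source):
        body = m.group(1)
        if body.strip() and ONLY_LOGS.match(body):
            hits.append(m.group(0))
    return hits
```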
Console.log Density
Severity: medium
A high density of console.log usually means debug code left behind in production. We're not counting console.warn or console.error — those are often intentional.
Threshold: 5+ console.log calls in a single file
What we skip
- CLI tools and scripts (scripts/ directories)
- Build config files (vite.config, webpack.config, next.config, etc.)
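The check itself is a per-file count; a naive sketch, assuming occurrences in strings and comments aren't filtered (a real analyzer would filter them):

```python
def flag_log_density(source, threshold=5):
    """True when a file has 5+ console.log calls. console.warn and
    console.error are deliberately not counted."""
    return source.count("console.log(") >= threshold
```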
Test Coverage
20% of final score
Not whether your tests are good — we can't tell that from static analysis. But we can tell if they exist, and roughly how much of your code is covered.
No Tests Found
Severity: critical
We look for .test.*, .spec.*, test_*, and files in /test/ directories. If we find nothing, that's a critical finding.
Threshold: Zero test files in the entire repo
No Test Script
Severity: high
Even if test files exist, there should be a way to run them. We look for test, test:unit, test:e2e, and test:integration scripts.
Threshold: Node.js project with no test script in package.json
Low Test-to-Source Ratio
Severity: high
Compares total lines of test code to total lines of source code. Below 10% means you're barely testing anything.
Threshold: Below 10% (high), 10–29% (medium)
Security
15% of final score
We can't do a real security audit from static analysis, but we can catch the obvious disasters — committed secrets, hardcoded credentials, and .env files in the repo.
.env File Committed
Severity: critical
If we can see your .env file, so can everyone else. This usually means secrets are exposed.
Threshold: Any .env file present in the repo
Hardcoded Secrets
Severity: critical
Detects OpenAI keys (sk-...), GitHub tokens (ghp_...), AWS keys (AKIA...), and general api_key/password/secret assignments with string values.
Threshold: API keys, passwords, tokens in source code
What we skip
- Environment variable references ($DB_PASSWORD)
- Placeholder values (your-password, example-key, test-secret, dummy_*, fake_*)
- Error code constants (INCORRECT_PASSWORD)
- .env.example and .env.template files
- Example, sample, tutorial, and documentation directories
- E2E setup scripts, seed files, fixtures, and mock data
- Public keys in Docusaurus/Algolia configs
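A sketch of the token patterns and the placeholder filter is below. This page names the prefixes but not the full patterns, so the length and charset details here are assumptions, and the placeholder list is a subset of the skip rules above.

```python
import re

# Prefixes from this page; lengths/charsets are assumed.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),    # OpenAI-style key
    re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),       # AWS access key ID
    re.compile(r"""(?i)(api_key|password|secret)\s*[:=]\s*["'][^"']+["']"""),
]

# Partial placeholder list; the real analyzer's list is longer.
PLACEHOLDERS = ("your-", "example", "test-", "dummy_", "fake_")

def find_secrets(line):
    hits = [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(line)]
    return [h for h in hits if not any(ph in h.lower() for ph in PLACEHOLDERS)]
```

Note that `os.environ["DB_PASSWORD"]` never matches: the assignment pattern requires a literal string value after `=` or `:`, which is how environment-variable references escape the flag.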
Dependencies
10% of final score
Lower weight because dependency management is genuinely hard and sometimes you just need the packages. But we still flag obvious bloat and hygiene issues.
Excessive Dependencies
Severity: high
A high dependency count increases attack surface, bundle size, and the odds of supply chain issues.
Threshold: 60+ production deps (high), 45-59 (medium)
Missing Lock File
Severity: high
Without a lock file, every install might get different versions. We look for package-lock.json, yarn.lock, pnpm-lock.yaml, and bun.lockb.
Threshold: Dependencies present but no lock file
Duplicate-Purpose Packages
Severity: medium
Having both axios and node-fetch? Both moment and dayjs? Pick one. We check 8 categories: HTTP clients, utilities, dates, frameworks, test runners, loggers, validators, and CSS-in-JS.
Threshold: 2+ packages that do the same thing
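The check is a set intersection against category tables. Three of the eight categories are shown below with representative packages; the analyzer's actual tables are assumptions here.

```python
# Representative subset of the category tables (assumed membership).
CATEGORIES = {
    "http_clients": {"axios", "node-fetch", "got", "superagent"},
    "dates": {"moment", "dayjs", "date-fns", "luxon"},
    "utilities": {"lodash", "underscore", "ramda"},
}

def duplicate_purpose(dependencies):
    """Returns the categories where 2+ installed packages overlap."""
    deps = set(dependencies)
    return {cat: sorted(deps & pkgs)
            for cat, pkgs in CATEGORIES.items()
            if len(deps & pkgs) >= 2}
```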
Documentation
10% of final score
The lightest weight because docs are genuinely optional for small projects. But no README at all? That's a choice.
No README
Severity: high
A repo without a README is a repo nobody can use. Even a few lines explaining what it does and how to run it is better than nothing.
Threshold: No README.md file in the repo
Thin README
Severity: medium
An auto-generated or placeholder README with just a title and nothing else. We want to see at least what it does and how to run it.
Threshold: Fewer than 5 lines of content
Very Few Inline Comments
Severity: low
We're not looking for comments on every line. But a large file with almost no comments suggests the author didn't stop to explain any of the non-obvious logic.
Threshold: Below 2% comment ratio in files with 200+ lines
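A sketch of the ratio check, counting only `#`/`//` line comments and measuring against non-blank lines (the exact denominator the analyzer uses is an assumption):

```python
def thin_comments(source, min_lines=200, min_ratio=0.02):
    """Flags files of 200+ non-blank lines whose comment ratio is below 2%."""
    lines = [l.strip() for l in source.splitlines() if l.strip()]
    if len(lines) < min_lines:
        return False  # small files are exempt
    comments = sum(l.startswith(("#", "//")) for l in lines)
    return comments / len(lines) < min_ratio
```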
How We Improve
A rubric is only as good as its accuracy. We actively optimize against false positives through 4 feedback loops:
Nightly Automated Stress Tests
Every night at 2am, an automated agent picks 5–10 diverse repos from GitHub: monorepos, framework-heavy apps, repos with generated code, data fixtures, and public API keys. It runs each one through every analyzer, reads the actual flagged code, and classifies every finding as legitimate or a false positive. Pattern-fixable false positives get auto-fixed with test coverage, and a detailed report is generated. Because this runs every night, the rubric keeps getting tighter.
False Positive Reports
Every finding on a report card has a flag button. When users report a false positive, we review it and, if confirmed, update the analyzer to handle that pattern. Real feedback from real repos drives real improvements.
Test-Gated Fixes
Every analyzer optimization must pass the full test suite before it ships. If a fix is too broad and breaks an existing test, it gets reverted automatically. No fix goes live without proof it doesn't create new problems.
Framework Awareness
Different frameworks have different conventions. Streamlit apps have deep nesting from layout blocks. Next.js repos have duplicated page.tsx files. Monorepos repeat config files. We teach the analyzer to understand these patterns instead of blindly flagging them.
The goal isn't zero findings — it's zero unfair findings.
What We Skip
Some files are excluded from analysis entirely:
- node_modules, dist, build, .next, vendor, venv, and other dependency/output directories
- Generated files (auto-generated headers, .d.ts declarations, .map files, migration files)
- Lock files (package-lock.json, yarn.lock, etc.) — analyzed for presence, not content
- Binary files and media assets
- Vendored UI components (components/ui/, ui/primitives/)