
Dependency Triage at Scale: What 50 Deploys a Week Does to Your Patch Queue

Derek Voss 9 min read

There's a particular failure mode that hits engineering teams around the time they start shipping 40 to 60 deploys per week: the CVE backlog becomes effectively infinite. Not because new advisories arrive faster than teams can patch — though that's part of it — but because the triage step itself becomes the bottleneck.

You can patch a confirmed-critical CVE in an afternoon if you know it's the right one. What takes days is the process of deciding which of the 300 advisory matches are worth that afternoon. At 50 deploys per week, the dependency graph is changing constantly. A new direct dependency brings its own transitive tree. A package bump introduces new advisories for the bumped version. The advisory scanner runs on every PR. The queue grows faster than anyone can review it.

The Math Behind the Backlog

Let's make this concrete. Take a Node.js application with 800 transitive dependencies, which is typical for a mid-size Express API with a React frontend in a monorepo. Roughly 3–5 CVEs per week are published against npm packages that commonly appear in dependency trees of this type. Not all of these will hit your specific lockfile, but over a 13-week quarter you're looking at 25–40 new advisory matches entering your software composition analysis (SCA) queue.

A CVSS-sorted triage of one advisory match takes a competent AppSec engineer about 45 minutes: pull the advisory, read the affected version range, check your lockfile version, grep for usages, determine if user-controlled input reaches the affected code, write a disposition note. Call it 40 minutes at best. With two new high-severity advisories per week, that's 80 minutes of triage per week just to keep up — before you touch the backlog.

If triage effort is linear with advisory count and you're already behind, the backlog compounds. A team that ended Q3 20 advisories behind and slips a further 8 each month is at 44 by year end (20 + 3 × 8). The psychological effect is real: when the queue is unworkable, engineers stop engaging with it. The queue becomes a formality, not a workflow.
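The arithmetic above is easy to re-run with your own numbers. The following is a minimal sketch; the function names are ours, and the inputs are the figures quoted in this section rather than measurements.

```python
def weekly_triage_minutes(new_advisories_per_week: int, minutes_per_advisory: int) -> int:
    """Triage time needed each week just to keep pace with new advisories."""
    return new_advisories_per_week * minutes_per_advisory


def project_backlog(starting_backlog: int, net_added_per_month: int, months: int) -> list[int]:
    """Backlog size at the end of each month, assuming a constant net inflow."""
    backlog = starting_backlog
    history = []
    for _ in range(months):
        backlog += net_added_per_month
        history.append(backlog)
    return history


# Two high-severity advisories per week at 40 minutes each.
print(weekly_triage_minutes(2, 40))   # 80

# 20 behind at the end of Q3, slipping 8 per month through October, November, December.
print(project_backlog(20, 8, 3))      # [28, 36, 44]
```

The model is deliberately linear. It ignores the re-triage work that dependency churn adds, so treat the output as a floor.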

What Dependency Churn Adds at High Deploy Velocity

The 50 deploys per week figure matters for a specific reason: at that velocity, your dependency graph is probably changing in ways that aren't always tracked. Engineers bump versions to get bug fixes. Peer-dependency conflicts get papered over with npm install --legacy-peer-deps. New packages get added, fall out of use, and stay in package.json.

Each of these mutations can change your reachability profile. A package that was unreachable last sprint might become reachable if a new code path now calls into it. A version bump made to pick up a fix might land you on a release that carries a CVE the old version didn't have.

This means triage is not a one-time exercise per advisory — it's tied to the current state of your dependency graph. An advisory you marked "unreachable, skip" three months ago might need re-evaluation if your codebase changed how it calls that library. Static snapshots of the advisory queue degrade quickly at high churn rates.
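One way to keep dispositions from going stale is to store, next to each verdict, a fingerprint of the inputs the verdict depended on, and to flag the advisory for re-triage when the fingerprint changes. The sketch below assumes the inputs are the resolved package version and the list of call sites into the package. The Disposition type, the fingerprint function, and the call-site strings are hypothetical; a real reachability tool will expose this information in its own format.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(resolved_version: str, call_sites: list[str]) -> str:
    """Hash the inputs a reachability verdict depended on (assumed inputs)."""
    payload = resolved_version + "\n" + "\n".join(sorted(call_sites))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Disposition:
    advisory_id: str
    verdict: str       # e.g. "unreachable"
    fingerprint: str   # fingerprint at the time the verdict was written


def needs_retriage(disposition: Disposition, resolved_version: str, call_sites: list[str]) -> bool:
    """True when the version or the call sites changed since the verdict."""
    return disposition.fingerprint != fingerprint(resolved_version, call_sites)


# Hypothetical advisory marked unreachable when nothing called into the package.
old = Disposition("ADV-1", "unreachable", fingerprint("1.2.3", []))

print(needs_retriage(old, "1.2.3", []))                               # False
print(needs_retriage(old, "1.2.3", ["src/routes/upload.js:parse"]))   # True
print(needs_retriage(old, "1.3.0", []))                               # True
```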

The Signal Collapse Pattern

We've identified a consistent pattern that we call signal collapse. It happens in roughly this sequence:

  1. Advisory scanner runs on every PR. Output is 200+ matches.
  2. No one can triage 200+ matches per PR cycle, so the output gets treated as background noise.
  3. Engineers learn to merge PRs that have open security advisory warnings because "they always have warnings."
  4. A real exploitable CVE ships to production because it was indistinguishable from the noise.
  5. Post-incident, the team debates whether to tighten the scanner threshold — which would block PRs on things that weren't real risks — or accept that the scanner is ornamental.

Stage 5 is the worst outcome because both options are bad. A CVSS-threshold PR gate blocks on noise and destroys developer trust in the security tool. An ornamental scanner protects nobody.

The escape from signal collapse is reducing the advisory output to confirmed-reachable findings before it hits any human. If the scanner consistently produces 4–8 items, a PR that introduces a new reachable CVE adds one item to a list short enough to read, instead of one line among 200. The noise floor is low enough that the new signal is visible.

Structuring Triage When the Queue Is Already Large

If you're already sitting on a backlog of 80+ advisories, the first step isn't to triage them in CVSS order. Run reachability analysis against your current codebase and classify the entire backlog by reachability status in one pass. In most cases, 70–85% of the backlog will come back as UNREACHABLE against your current call graph.

Move those to a separate review queue with a 30-day cadence: once a month, re-run reachability on the unreachable backlog items. If any flip to reachable (because the codebase changed), pull them into the active queue. If they remain unreachable for 90 days, suppress them with a documented reason.
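The monthly pass described above can be sketched as a three-way sort. This assumes your reachability tool can hand you the set of advisory IDs that are reachable right now, and that you recorded the date each item was first found unreachable. BacklogItem and monthly_review are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta

SUPPRESS_AFTER = timedelta(days=90)


@dataclass(frozen=True)
class BacklogItem:
    advisory_id: str
    unreachable_since: date   # first scan that reported it unreachable


def monthly_review(items: list[BacklogItem], reachable_now: set[str], today: date) -> dict[str, list[str]]:
    """Sort the unreachable backlog into active, review, and suppress buckets."""
    result: dict[str, list[str]] = {"active": [], "review": [], "suppress": []}
    for item in items:
        if item.advisory_id in reachable_now:
            # The codebase changed and the advisory flipped to reachable.
            result["active"].append(item.advisory_id)
        elif today - item.unreachable_since >= SUPPRESS_AFTER:
            # Unreachable for 90 days: suppress, with a documented reason.
            result["suppress"].append(item.advisory_id)
        else:
            result["review"].append(item.advisory_id)
    return result


backlog = [
    BacklogItem("ADV-1", date(2024, 1, 1)),
    BacklogItem("ADV-2", date(2024, 5, 1)),
    BacklogItem("ADV-3", date(2024, 1, 1)),
]
print(monthly_review(backlog, reachable_now={"ADV-3"}, today=date(2024, 6, 1)))
# {'active': ['ADV-3'], 'review': ['ADV-2'], 'suppress': ['ADV-1']}
```

Reachability is checked before age on purpose: an item that has sat unreachable for 90 days and flips today belongs in the active queue, not the suppressed list.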

The active queue is now 12–24 confirmed-reachable items. That's a workable triage surface. Rank those by CVSS + EPSS + fix availability. Items where a version bump closes the CVE with no API breakage are your first sprint's work. Items that require refactoring or where the fix version has its own issues go into backlog with explicit owner assignment.
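How you combine CVSS, EPSS, and fix availability is a judgment call. The sketch below is one possible ordering, not a standard: findings with a non-breaking fix come first, and within each group items are sorted by CVSS multiplied by EPSS. The Finding type, the weighting, and the sample scores are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    advisory_id: str
    cvss: float              # base score, 0.0 to 10.0
    epss: float              # exploitation probability, 0.0 to 1.0
    non_breaking_fix: bool   # a version bump closes it with no API breakage


def rank(findings: list[Finding]) -> list[Finding]:
    """Non-breaking fixes first, then by descending cvss * epss (assumed weighting)."""
    return sorted(
        findings,
        key=lambda f: (not f.non_breaking_fix, -(f.cvss * f.epss), f.advisory_id),
    )


queue = [
    Finding("ADV-1", cvss=9.8, epss=0.02, non_breaking_fix=False),
    Finding("ADV-2", cvss=7.5, epss=0.60, non_breaking_fix=True),
    Finding("ADV-3", cvss=8.1, epss=0.10, non_breaking_fix=True),
]
print([f.advisory_id for f in rank(queue)])   # ['ADV-2', 'ADV-3', 'ADV-1']
```

Note that ADV-1 has the highest CVSS score and still lands last: it needs refactoring and has a low exploitation probability, which is exactly the kind of item that goes into the backlog with an explicit owner.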

We're not saying unreachable advisories should be permanently ignored. They're real vulnerabilities in your dependency tree. The point is they shouldn't compete for sprint capacity with confirmed-reachable, exploitable issues. Keep them tracked, review them on cadence, act on them when the call graph changes.

What 50 Deploys Per Week Actually Requires from AppSec Tooling

A team shipping at this velocity needs a security workflow that runs at deploy speed. That means:

  • Reachability analysis runs as part of the CI pipeline, not as a weekly batch job. Every PR that changes package.json or package-lock.json triggers a full scan.
  • The output is a diff from the previous scan: what changed, not the full list every time. A PR that bumps one dependency to 4.18.3 shows only the advisories affected by that specific change.
  • PR blocking is scoped to confirmed-reachable critical/high findings with fix available. Unreachable items and medium-or-below reachable items do not block merge — they go into the review queue for the next triage session.
  • SBOM generation happens on merge to main, not on request. You always know what's in production.
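The diff and the blocking rule from the list above can be sketched together. This assumes each scan yields a list of findings with a severity, a reachability verdict, and a fix flag. Finding, diff_scans, and gate are hypothetical names, and applying the gate only to changed findings is our assumption.

```python
from dataclasses import dataclass

BLOCKING_SEVERITIES = {"critical", "high"}


@dataclass(frozen=True)
class Finding:
    advisory_id: str
    severity: str        # "critical", "high", "medium", "low"
    reachable: bool
    fix_available: bool


def diff_scans(previous: list[Finding], current: list[Finding]) -> list[Finding]:
    """Findings that are new or whose status changed since the previous scan."""
    seen = set(previous)
    # Comparing whole records means a flip from unreachable to reachable
    # counts as a change, even though the advisory ID is the same.
    return [f for f in current if f not in seen]


def blocks_merge(finding: Finding) -> bool:
    """Block only on confirmed-reachable critical/high findings with a fix."""
    return (
        finding.reachable
        and finding.fix_available
        and finding.severity in BLOCKING_SEVERITIES
    )


def gate(previous: list[Finding], current: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split changed findings into merge blockers and review-queue items."""
    changed = diff_scans(previous, current)
    blocking = [f for f in changed if blocks_merge(f)]
    queued = [f for f in changed if not blocks_merge(f)]
    return blocking, queued


before = [Finding("ADV-1", "high", reachable=False, fix_available=True)]
after = [
    Finding("ADV-1", "high", reachable=True, fix_available=True),       # flipped to reachable
    Finding("ADV-2", "medium", reachable=True, fix_available=True),
    Finding("ADV-3", "critical", reachable=False, fix_available=True),
]
blocking, queued = gate(before, after)
print([f.advisory_id for f in blocking])   # ['ADV-1']
print([f.advisory_id for f in queued])     # ['ADV-2', 'ADV-3']
```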

The engineering team's interaction with security is then: "This PR introduces CVE-2024-XXXX, which is CONFIRMED REACHABLE via the payment endpoint. Patch version available. Fix it before merge." That's actionable in 20 minutes, not 3 hours of investigation to determine if it matters.

Measuring Triage Health

The right metric for triage health isn't "how many advisories are in the queue"; that number is driven partly by factors outside your control, such as the ecosystem's vulnerability rate. The right metric is MTTR (mean time to remediate) for confirmed-reachable critical and high CVEs: the time from when a reachable critical or high finding is first detected to when the patch is deployed.
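Computing this metric needs only two timestamps per finding. The following is a minimal sketch, assuming you can export a detection time and a deploy time for each remediated reachable critical or high finding. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime
from statistics import mean
from typing import Optional

SECONDS_PER_DAY = 86400


def mttr_days(records: list[tuple[datetime, datetime]]) -> Optional[float]:
    """Mean days from first detection to deployed patch; None if no records."""
    if not records:
        return None
    durations = [
        (deployed - detected).total_seconds() / SECONDS_PER_DAY
        for detected, deployed in records
    ]
    return mean(durations)


remediated = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 3, 9, 0)),     # 2 days
    (datetime(2024, 3, 10, 9, 0), datetime(2024, 3, 14, 9, 0)),   # 4 days
]
print(mttr_days(remediated))   # 3.0
```

This only counts findings that have been fixed. Findings that are still open don't show up in the average, so track the age of the oldest open reachable critical alongside it.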

A team with good triage health at 50 deploys per week should be able to hit sub-5-day MTTR for confirmed-reachable critical CVEs. The constraint isn't usually the patching work — it's the time lost before the team knows the item is real and worth fixing immediately. Reachability analysis applied at scan time eliminates that delay.