Introducing Stigg 2.0 Governance: The First Milliseconds-Latency Usage Control Layer for AI

Written by Or Arnon

Last updated July 2, 2026

ENFORCEMENT

It happens more than anyone wants to admit. A single power user, a misbehaving script, or an agent stuck in a retry loop chews through an enterprise customer's entire monthly AI allocation in a day, sometimes in an hour. Nobody notices until the invoice lands, or until support gets the angry email.

By then it's not a technical problem, it's a trust problem. Your customer has to explain to their own finance team why they blew through a budget they thought was capped. You have to explain why your product let that happen. And the enterprise deal you were about to expand now has a new, very specific requirement on the table: "we need to control this ourselves."

Most teams have quietly been setting entitlement limits higher than what customers actually paid for. Not by choice. Enforcing a real cap in the request path means a consistent, distributed counter that resolves in less time than the call it's gating, and until now that was hard enough that most teams skipped it. So they padded the limit and hoped. That's not governance.

The First Governance Layer That Enforces in Real Time

Stigg Governance is the first milliseconds-latency usage control layer built for AI products. It enforces across millions of active counters at once, every user times every agent times every workspace, without the latency or memory blowup that breaks naive implementations. It is not a reporting dashboard that tells you what already happened. It is a decision engine that acts at the moment of consumption.

Enforcement on any dimension. Users, teams, departments, organizations, sites, regions, applications, workspaces, agents, even multiple agents running in parallel against a shared cap. You define the hierarchy; Stigg enforces it.
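To make the idea of a shared cap across a hierarchy concrete, here is a minimal in-memory sketch. It is not Stigg's implementation; the `HierarchicalCounters` class, the scope names, and the limits are all hypothetical. It shows one way to reserve usage against every level of a path at once, so that two agents drawing on the same team cap cannot both slip under it.

```typescript
// Hypothetical in-memory model of hierarchical caps; an illustration only,
// not how Stigg stores or enforces counters.
type Scope = string; // e.g. "org:acme", "team:eng", "agent:a1"

class HierarchicalCounters {
  private used = new Map<Scope, number>();
  private limits: Map<Scope, number>;

  constructor(limits: Map<Scope, number>) {
    this.limits = limits;
  }

  // Reserve `amount` against every scope on the path, or against none of them.
  // This method runs without interleaving in single-threaded JavaScript, which
  // stands in for the atomic operation a distributed store would have to provide.
  tryReserve(path: Scope[], amount: number): boolean {
    // Pass 1: verify that no scope with a limit would be exceeded.
    for (const scope of path) {
      const limit = this.limits.get(scope);
      const current = this.used.get(scope) ?? 0;
      if (limit !== undefined && current + amount > limit) return false;
    }
    // Pass 2: commit the reservation at every level.
    for (const scope of path) {
      this.used.set(scope, (this.used.get(scope) ?? 0) + amount);
    }
    return true;
  }

  usage(scope: Scope): number {
    return this.used.get(scope) ?? 0;
  }
}

const counters = new HierarchicalCounters(
  new Map<Scope, number>([
    ["org:acme", 10_000],
    ["team:eng", 1_500],
  ]),
);

// Two agents share the team cap of 1,500; each asks for 1,000.
const first = counters.tryReserve(["org:acme", "team:eng", "agent:a1"], 1_000);
const second = counters.tryReserve(["org:acme", "team:eng", "agent:a2"], 1_000);
console.log(first, second); // true false
```

The second request is refused at the team level even though the organization still has headroom, and nothing is charged to the second agent. In a real multi-region deployment the hard part is making that check-and-commit step atomic across machines, which this sketch deliberately leaves out.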

Real-time decisioning, in milliseconds. When a user or an agent makes a request, Stigg evaluates entitlements, checks the budget, verifies the limit, and returns a decision before the expensive thing happens. The check typically adds under 50ms at p50 and under 100ms at p99, even across deep entitlement hierarchies. Counters stay consistent across regions and parallel agents, so two agents spending against the same cap cannot both slip under it. This is the same decision-waterfall architecture the frontier AI labs built in-house. Maintaining that waterfall is a cost center, not a product. Stigg lets you own the metering layer through a single API call instead of staffing it forever.

One call, before the model runs:

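// Assumes `client` is an already-initialized Stigg SDK client.
// Ask whether team-eng may consume 1,000 more units of the feature.
const response = await client.v1Beta.customers.entitlements.check('cus-acme', {
  featureId:      'feature-ai-tokens',
  requestedUsage: 1000,
  dimensions:     { teamId: 'team-eng' },
});

if (!response.data.isGranted) {
  // Denied: stop here so the model call never runs.
  throw new Error('Usage limit reached');
}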

You configure whether the check fails open or fails closed when Stigg is unreachable, so you decide what happens to your critical path under failure.
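What fail-open versus fail-closed means in practice can be sketched with a small wrapper. This is an illustration under assumptions, not Stigg's configuration surface: `guardedCheck`, `FailureMode`, and `timeoutMs` are hypothetical names, and `check` stands in for whatever function performs the entitlement call.

```typescript
type FailureMode = "open" | "closed";
type Decision = { granted: boolean; degraded: boolean };

// `check` is any function that resolves to true (granted) or false (denied).
// If it rejects or takes longer than `timeoutMs`, the configured policy decides.
async function guardedCheck(
  check: () => Promise<boolean>,
  mode: FailureMode,
  timeoutMs: number,
): Promise<Decision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("entitlement check timed out")),
      timeoutMs,
    );
  });
  try {
    const granted = await Promise.race([check(), timeout]);
    return { granted, degraded: false };
  } catch {
    // Unreachable or too slow: fail open grants, fail closed denies.
    return { granted: mode === "open", degraded: true };
  } finally {
    clearTimeout(timer);
  }
}

// Example: the check is unreachable and the policy is fail closed.
const unreachable = () => Promise.reject<boolean>(new Error("network down"));
guardedCheck(unreachable, "closed", 50).then((d) => console.log(d));
// { granted: false, degraded: true }
```

The `degraded` flag is there so the caller can log or alert on decisions made without a real answer. Fail open protects availability at the risk of overspend; fail closed protects the budget at the risk of blocking legitimate requests.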

Live cost attribution. Know exactly which team, product, feature, and model is consuming budget. Query it over the API or subscribe to a push stream, with read freshness in seconds, not next month's report.
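As a rough picture of what attribution by dimension involves, the sketch below rolls up a handful of usage events by any combination of dimensions. The `UsageEvent` shape, the `attribute` function, and the sample data are hypothetical and are not taken from Stigg's API.

```typescript
// Hypothetical event shape; field names are assumptions for illustration.
interface UsageEvent {
  team: string;
  product: string;
  feature: string;
  model: string;
  costUsd: number;
}

type Dimension = "team" | "product" | "feature" | "model";

// Sum cost per unique combination of the requested dimensions.
function attribute(events: UsageEvent[], by: Dimension[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    const key = by.map((d) => `${d}=${e[d]}`).join(",");
    totals.set(key, (totals.get(key) ?? 0) + e.costUsd);
  }
  return totals;
}

const events: UsageEvent[] = [
  { team: "eng", product: "app", feature: "chat", model: "large", costUsd: 4 },
  { team: "eng", product: "app", feature: "search", model: "small", costUsd: 1 },
  { team: "sales", product: "app", feature: "chat", model: "large", costUsd: 2 },
];

const byTeam = attribute(events, ["team"]);
const byTeamAndModel = attribute(events, ["team", "model"]);

console.log(byTeam.get("team=eng")); // 5
console.log(byTeamAndModel.get("team=eng,model=large")); // 4
```

A live system would maintain these totals incrementally as events arrive instead of rescanning the full list on every query.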

Model-level controls. Set separate rate limits, quotas, or spend caps per model, per endpoint, per API key, or per customer tier, so a GPT-4-class model can be throttled at 100 req/min while a smaller model runs at 10,000 req/min, with no shared bottleneck.
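One common way to get independent per-model rate limits is a token bucket per model. The sketch below is a single-process illustration with hypothetical model names, not a description of how Stigg implements its limits. Note that a token bucket permits bursts up to its capacity, so "100 req/min" here means a burst of at most 100 followed by a steady refill.

```typescript
// A minimal token bucket; time is passed in explicitly so behavior is deterministic.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerMs: number;

  constructor(perMinute: number, nowMs: number) {
    this.capacity = perMinute;
    this.refillPerMs = perMinute / 60_000;
    this.tokens = perMinute; // start full
    this.last = nowMs;
  }

  tryTake(nowMs: number): boolean {
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsed = Math.max(0, nowMs - this.last);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.last = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const start = 0;
// Hypothetical model names; each model gets its own bucket, so neither can starve the other.
const limiters = new Map<string, TokenBucket>([
  ["large-model", new TokenBucket(100, start)],
  ["small-model", new TokenBucket(10_000, start)],
]);

function allow(model: string, nowMs: number): boolean {
  const bucket = limiters.get(model);
  return bucket ? bucket.tryTake(nowMs) : false; // unknown models are denied
}

// Send 150 back-to-back requests to each model at the same instant.
let largeAllowed = 0;
let smallAllowed = 0;
for (let i = 0; i < 150; i++) {
  if (allow("large-model", start)) largeAllowed++;
  if (allow("small-model", start)) smallAllowed++;
}
console.log(largeAllowed, smallAllowed); // 100 150
```

Because each model has its own bucket, exhausting the large model's limit has no effect on the small model's throughput.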

A self-service governance portal. Budget dashboards, alert configuration, and limit management your customers can operate themselves.

Self-serve Spend Controls, Enterprise-ready

Governance turns "we need usage controls" from a sales blocker into a reason to buy. Enterprise buyers aren't asking if you have a nice dashboard. They're asking if they can cap a runaway agent before it drains $50,000 in inference, and if they can attribute AI spend to the department that caused it. That's no longer a nice-to-have. It's a contract requirement, and it's the sentence that gets six-figure AI deals signed: "you can see and control exactly what you're using."

Explore Governance → https://docs.stigg.io/documentation/governance/overview
