Hybrid Pricing Model Guide: The AI Infrastructure Problem

Combining a flat subscription with usage-based charges creates an infrastructure problem most teams don't anticipate. The product has to decide what each customer is allowed to do before each request executes, track token or compute usage as it happens, enforce limits in real time, and then pass accurate data to billing.

Get any of those steps wrong, and you end up with silent enforcement failures, incorrect invoices when inference costs vary, or engineers pulled in mid-sprint to fix it.

Hybrid pricing is an infrastructure decision. This article covers how it works, what it demands from your infrastructure, and where in-house systems tend to break.

What is a hybrid pricing model?

A hybrid pricing model combines a fixed recurring fee with one or more variable charges tied to usage. The fixed component provides a revenue floor, while the variable component scales with consumption.

In practice, this typically shows up as:

A base plan with overage charges when usage exceeds a threshold
A subscription plus metered credits for AI tokens or API calls
Tiered access combined with usage-based pricing for specific features

The value metric depends on the product. Some meter by credits, tokens, or API calls, while others track storage or seats.

Regardless of the model, the requirement is the same. The system must track usage in real time and apply it against what each customer is allowed to consume.
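As a rough illustration of the base-plus-overage variant, the sketch below computes the charge for one billing period. The plan fields and the numbers are hypothetical and chosen only to show the arithmetic; they do not come from any real price list.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class HybridPlan:
    """Hypothetical plan shape: a flat fee plus a per-unit overage rate."""
    base_fee: Decimal        # fixed recurring charge per period
    included_units: int      # usage covered by the base fee
    overage_rate: Decimal    # price per unit beyond the included amount


def period_charge(plan: HybridPlan, units_used: int) -> Decimal:
    """Return the total charge for one period: base fee plus any overage."""
    overage_units = max(0, units_used - plan.included_units)
    return plan.base_fee + plan.overage_rate * overage_units


plan = HybridPlan(base_fee=Decimal("99.00"), included_units=10_000,
                  overage_rate=Decimal("0.002"))
print(period_charge(plan, 8_000))    # under the threshold: base fee only
print(period_charge(plan, 12_500))   # 2,500 overage units on top of the base fee
```

The formula itself is trivial. The hard part is making sure `units_used` is accurate and matches what the product actually enforced.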

Why a hybrid pricing model creates an engineering problem

A flat subscription is straightforward. A customer subscribes, gets provisioned, and billing runs on a schedule.

Hybrid pricing adds another layer. The system now has to:

Track usage in real time
Apply usage against entitlements
Enforce limits or allow overages
Pass accurate data to billing

These systems have to stay aligned at all times.

When they drift, problems show up quickly:

Customers get billed incorrectly
Limits fail inside the product
Engineering gets pulled in to reconcile the difference

For example, when New Relic introduced usage-based pricing, it spent six months piloting the model with a few dozen customers before launch: testing edge cases, fine-tuning the offering, and rebuilding internal tooling so sales reps could model usage scenarios for each account. The complexity was in the systems needed to support the pricing model, not the model itself.

Where in-house systems break

Most teams start by building. A few database tables, a feature flag check, perhaps a middleware layer. That works at 50 customers. At 500, the same system begins to show strain. Teams rarely get the first version wrong. They underestimate what it will need to handle, or recognize its importance too late.

Now the team is dealing with:

Grandfathered plans with legacy rules
Multiple products with different entitlement models
Usage data coming from multiple services

The system that took a week to build now needs ongoing ownership.

The breaking point tends to come when teams:

Introduce hybrid pricing for the first time
Add a second product with different packaging
Run a pricing experiment that requires more than a config change

At that point, edge cases start to pile up:

Grandfathered plans that behave differently
Trials that override plan limits
Promotional credits that expire before standard credits
Add-ons that increment caps instead of replacing them

Most teams discover these issues mid-sprint, under deadline. A common sequence: the team adopts a usage-based billing tool, then realizes that billing and enforcement are separate problems:

Billing systems count and charge usage
Entitlement systems enforce what customers can access

Teams need both: one to track and charge usage, the other to enforce it inside the product.

Entitlements: The layer billing doesn't cover

The product has to know what a customer is allowed to do before a request is executed. That’s what entitlements handle, and billing systems don’t.

Entitlements are commercial allowances that define what a customer can access and how much they can use, based on a combination of their active plan, any parent or base plan they inherit from, add-ons that can increment or override plan limits, active trials, and any promotional allowances. When sources conflict, the system applies the most generous value.
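A minimal sketch of how such a resolution step might look follows. The `Grant` type and the combination rule are assumptions made for illustration, not Stigg's data model: the most generous replacing value wins, and incrementing add-ons then stack on top.

```python
from dataclasses import dataclass
from typing import Optional

UNLIMITED = None  # in this sketch, a limit of None means "no cap"


@dataclass(frozen=True)
class Grant:
    """One source's allowance for a feature (plan, trial, add-on, promo)."""
    source: str
    limit: Optional[int]       # None means unlimited
    increments: bool = False   # True: adds to the cap instead of replacing it


def resolve_limit(grants: list[Grant]) -> Optional[int]:
    """Resolve the effective limit; returns 0 when nothing grants access."""
    replacing = [g.limit for g in grants if not g.increments]
    extra = [g.limit for g in grants if g.increments]
    if UNLIMITED in replacing or UNLIMITED in extra:
        return UNLIMITED               # unlimited beats any numeric limit
    base = max(replacing, default=0)   # most generous replacing value wins
    return base + sum(extra)           # incrementing add-ons stack on top


grants = [
    Grant("plan", 1_000),
    Grant("trial", 5_000),
    Grant("addon:extra-credits", 500, increments=True),
]
print(resolve_limit(grants))  # 5500: trial wins over plan, add-on stacks on top
```

Keeping this rule in one function, rather than in scattered conditionals, is what makes grandfathered plans and trials tractable later.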

Entitlements differ from RBAC, which is binary: a role either grants a permission or it doesn't. Entitlements are quantitative. At runtime, the system needs to evaluate each request and return:

The feature’s access status
The applicable usage limit
Current consumption against that limit
Whether the customer has unlimited access

A user might have 1,000 credits, 10,000 tokens, or unlimited access, and those values need to be evaluated continuously against live usage data.
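The four values listed above can be sketched as a single result type. The names here are hypothetical and do not reflect any particular SDK. The sketch also assumes that the limit and current usage have already been looked up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntitlementCheck:
    has_access: bool
    usage_limit: Optional[int]   # None means there is no numeric cap
    current_usage: int
    is_unlimited: bool


def check_entitlement(limit: Optional[int], current_usage: int,
                      requested: int = 1, enabled: bool = True) -> EntitlementCheck:
    """Decide whether `requested` more units fit under the customer's limit."""
    unlimited = enabled and limit is None
    if not enabled:
        allowed = False              # feature is not part of the customer's plan
    elif unlimited:
        allowed = True               # no cap to compare against
    else:
        allowed = current_usage + requested <= limit
    return EntitlementCheck(allowed, limit, current_usage, unlimited)


result = check_entitlement(limit=1_000, current_usage=995, requested=10)
print(result.has_access)  # False: 995 + 10 would exceed the 1,000 limit
```

Note that the check takes the requested amount into account. Comparing only current usage against the limit would let the last request overshoot.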

In hybrid pricing, feature gating depends on this data being accurate at runtime. If it isn’t accurate, enforcement becomes unreliable, and upgrade paths break.

That same runtime state also needs to be queryable from the client side. If your frontend can't read current consumption and entitlement thresholds, you can't render usage indicators, gate UI components conditionally, or surface upgrade prompts at the right moment in the user flow.

Billing systems like Stripe and Zuora handle invoicing and payments after usage occurs. They don’t evaluate access or enforce usage limits.

That happens in a separate runtime layer that sits above billing. It decides what is allowed before usage is committed, then passes the results downstream for invoicing.

Low-latency entitlement checks at scale

Entitlement checks run on every request, so latency has a direct impact on product performance. Stigg's architecture handles over a billion metering events monthly.

In Stigg’s architecture, the Sidecar runs inside your cloud environment (BYOC) as a Docker container and stores entitlement data in a persistent Redis cache. That allows enforcement checks to stay local to the application and continue operating during upstream API interruptions.

On cache hits, checks complete in single-digit milliseconds. On cache misses, the Sidecar fetches data from Stigg’s Edge API, typically around 100ms, with a configurable timeout to prevent upstream latency from affecting the application.

This architecture matters because it isolates entitlement enforcement from network instability. Without local caching and fallback behavior, issues in the entitlements layer can propagate directly into the product and impact user experience.
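The general pattern, a read-through cache with a bounded upstream call and a stale fallback, can be sketched as below. This is not the Sidecar's implementation or API. It uses an in-process dictionary instead of Redis, and the `fetch` callable, TTL, and timeout values are assumptions for illustration.

```python
import time
from typing import Callable, Optional

# Hypothetical upstream call: (customer_id, feature_id, timeout_s) -> limit.
# It is expected to raise TimeoutError or ConnectionError on failure.
Fetch = Callable[[str, str, float], Optional[int]]


class CachedEntitlements:
    """Read-through cache that serves stale data when the upstream is down."""

    def __init__(self, fetch: Fetch, ttl_s: float = 60.0, timeout_s: float = 0.2):
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._timeout_s = timeout_s
        self._cache: dict[tuple[str, str], tuple[Optional[int], float]] = {}

    def get_limit(self, customer_id: str, feature_id: str) -> tuple[Optional[int], str]:
        """Return (limit, origin); origin is 'cache', 'upstream' or 'stale'."""
        key = (customer_id, feature_id)
        entry = self._cache.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self._ttl_s:
            return entry[0], "cache"          # fresh hit: no network call
        try:
            limit = self._fetch(customer_id, feature_id, self._timeout_s)
        except (TimeoutError, ConnectionError):
            if entry is not None:
                return entry[0], "stale"      # upstream down: keep enforcing
            raise                             # nothing cached: caller decides
        self._cache[key] = (limit, now)
        return limit, "upstream"
```

What to do when there is no cached value and the upstream is unreachable (fail open or fail closed) is a product decision. The sketch leaves it to the caller.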

How AI features change the entitlements problem

AI features move entitlements from simple access limits to real-time cost control. Credits and tokens have real marginal costs, burn at unpredictable rates, and need to be controlled before usage continues, while the request is still in flight.

A base subscription covers platform access. The moment variable usage enters the picture through tokens consumed, queries run, or compute used, the system has to:

Track usage continuously as it happens
Deduct from credit balances
Enforce limits before costs escalate

Stigg's credit system reflects this added complexity. Credits come in blocks, each with its own:

Expiry windows (for example, monthly resets or promotional periods)
Cost basis (paid vs. promotional credits)
Categories that determine how credits are used
Priority rules that control burn order

Depletion behavior is also configurable:

Hard limits block usage when credits run out
Soft limits allow usage to continue into negative balances

This level of control matters with AI workloads, where a single agent loop can burn through credits in minutes.
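To make burn order and depletion behavior concrete, here is a minimal sketch of credit consumption across blocks. The `CreditBlock` fields, the convention that a lower priority number burns first, and the expiry tie-break are all assumptions for illustration, not Stigg's actual model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CreditBlock:
    name: str
    balance: int
    priority: int                # lower number burns first (assumed convention)
    expires_on: Optional[date]   # None: never expires


class InsufficientCredits(Exception):
    pass


def consume(blocks: list[CreditBlock], amount: int, today: date,
            hard_limit: bool = True) -> int:
    """Burn `amount` credits in priority order; return the uncovered remainder.

    With a hard limit, the call raises and leaves balances untouched when the
    active blocks cannot cover the amount. With a soft limit, the remainder is
    returned so the caller can record it as a negative balance or overage.
    """
    active = [b for b in blocks
              if b.balance > 0 and (b.expires_on is None or b.expires_on >= today)]
    available = sum(b.balance for b in active)
    if hard_limit and available < amount:
        raise InsufficientCredits(f"need {amount}, have {available}")
    # Earliest-expiring blocks break ties within the same priority.
    active.sort(key=lambda b: (b.priority, b.expires_on or date.max))
    remaining = amount
    for block in active:
        burned = min(block.balance, remaining)
        block.balance -= burned
        remaining -= burned
        if remaining == 0:
            break
    return remaining
```

A production system would also record each deduction rather than mutating balances in place. The sketch omits that to keep the burn logic visible.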

With AI, entitlements are about controlling cost, usage, and risk at runtime. That is an engineering problem first, and a pricing decision second.

What hybrid pricing metrics actually tell you about your stack

Hybrid pricing metrics show how well your systems handle usage, enforcement, and scale, alongside revenue.

Metric	What it tells you about your stack
Net revenue retention >100%	Usage is expanding, which increases load on metering systems and exposes weaknesses in event tracking and edge case handling
Revenue mix (subscription vs. usage)	High variability can indicate inconsistent event ingestion, delayed processing, or gaps in entitlement enforcement
Expansion revenue by feature or usage	Shows which entitlements drive upgrades, but only if usage, billing, and entitlement data are properly linked
Entitlement check error rate	Spikes indicate cache misses, stale entitlement data, or enforcement failures that billing won't catch

These metrics are also signals of system health. If they are unstable or hard to explain, the issue is often in metering accuracy, entitlement enforcement, or data consistency across the stack.

Build vs. buy for hybrid pricing infrastructure

The first version of an in-house build works. The second version, after pricing changes, is where the system starts to strain.

The challenge is how the system handles change over time.

Dimension	Build in-house	Buy (entitlements/credit infrastructure)
Data model	Often tied to a single pricing model	Designed for multiple models (seats, usage, credits)
Change management	Every pricing change needs code	Packaging via config; complex pricing changes still engineering-owned
Edge cases	Accumulate in application logic	Modeled centrally (plans, add-ons, overrides)
System consistency	Multiple sources of truth across services	Single source of truth for entitlements
Engineering overhead	Ongoing maintenance and debugging	Upfront integration, lower long-term overhead

The trade-off

Dedicated entitlements infrastructure moves these decisions into the data model instead of scattering them across application code. The trade-off is more integration work upfront in exchange for less operational complexity over time.

Miro launched a credit-based AI model in under 6 weeks using this approach, and Webflow used it to ship localization, add-ons, and pricing updates through configuration rather than deployments.

You can't ship hybrid pricing faster than your stack allows

Most teams struggle with hybrid pricing models because the infrastructure underneath wasn't built to support them.

The entitlements layer, metering system, and billing integration all have to stay in sync. When they don't, pricing experiments stall, enforcement fails at the product level, and engineering ends up owning a system that was never meant to scale this far.

In AI systems, the expensive mistake is usually allowing the wrong request through. That’s why enforcement ends up becoming its own infrastructure layer.

Stigg approaches that layer as a dedicated runtime for entitlements, credits, and usage governance:

Entitlement enforcement across plans, add-ons, trials, and promotional allowances
Real-time metering with low-latency local caching via the Sidecar deployed in your own cloud
Credit management on an append-only ledger, with configurable burn order and hard or soft depletion limits
Product catalog management so packaging changes go through config, not code
Integrations across billing, CPQ, CRM, and data warehouses

If hybrid pricing is on your roadmap and the infrastructure needed to support it isn't in place yet, that’s the right problem to solve first. Explore how Stigg structures the runtime architecture for entitlements, metering, and real-time enforcement across your system.

FAQs

1. What is hybrid pricing?

Hybrid pricing is a model that combines a fixed recurring fee with usage-based charges. The fixed fee covers baseline access, while the variable component scales with usage, such as AI credits, tokens, API calls, or seats.

2. What is the difference between hybrid pricing and usage-based pricing?

The main difference between hybrid pricing and usage-based pricing is that hybrid pricing includes a fixed fee plus usage charges. Usage-based pricing relies entirely on consumption with no base subscription.

3. How does hybrid pricing affect metering infrastructure?

Hybrid pricing affects metering systems by requiring real-time usage tracking aligned with customer entitlements. Without accurate metering, billing becomes unreliable, and product-level enforcement breaks down.

4. Can Stigg support hybrid pricing?

Yes, Stigg supports hybrid pricing as a usage runtime that enforces entitlements, manages credits, and meters usage in real time. It works alongside billing systems like Stripe and Zuora rather than replacing them.

5. What is the difference between entitlements and feature flags in a hybrid pricing model?

The main difference between entitlements and feature flags is that entitlements include usage limits and commercial rules, while feature flags are binary on/off controls. Entitlements adapt to plans, usage, and pricing changes.