AI Credit Infrastructure for LLM Products: From Design to Scale

Many teams think credits are a data model problem. You add a balance field, write a deduction function, and ship it. Three months later, you're debugging double charges, reconciling a ledger that doesn't match finance's numbers, and trying to explain why a customer’s balance hit zero when it shouldn't have.

AI credits are a runtime infrastructure problem. The moment you introduce them, you need real-time deduction, shared pools across features, expiration logic that satisfies finance, and low-latency enforcement on every request.

This guide breaks down how to build AI credits for LLM and compute-heavy products, including the core infrastructure components, trade-offs, and failure points teams hit as they scale.

Why AI credits became the default pricing model for AI products

Pure pay-as-you-go worked when products had one AI feature. As products added more, pricing each feature separately became unmanageable for customers and engineering alike.

Credits solve that with a single abstraction: one currency, consumed across multiple features at different rates. Stability AI, for example, structures its API around prepaid credits consumed across image generation, video, and editing capabilities. Credits are drawn down at different rates depending on the model and task, rather than through a separate pricing structure per capability.

Credits simplify pricing on the surface. Underneath, they require a metering layer that can map one currency across multiple features, each with different costs, and apply those rules in real time.

The first decision: Which credit model to support

The first decision is choosing between subscription-based credits and prepaid credit packs. This choice determines how usage is billed, how entitlements are enforced, and how complex your system becomes.

Recurring credits (subscription-based)

Customers receive a credit allocation as part of their plan, typically monthly or annually. Unused credits roll over or expire based on configuration, and additional credits come through tier upgrades or top-ups.

This works best when customers are willing to commit upfront.

For example, Notion bundles AI features like Notion Agent into its Business Plan at a flat per-seat fee, then meters autonomous Custom Agent workloads separately at $10 per 1,000 Notion credits. This gives workspaces a predictable per-seat baseline for everyday AI, with separately metered credits for agent runs.

Prepaid credit packs

Customers deposit a fixed amount and consume credits until the balance runs out, then top up on demand. This works best when usage is unpredictable and customers need flexibility.

How to choose

The decision depends on how predictable usage is and how much commitment customers are willing to make upfront.

Subscription credits work when customers can forecast their usage and are willing to commit to a plan. The system is simpler because allocations are tied to billing cycles and reset on a schedule. This model suits:

Enterprise customers on annual contracts with defined usage tiers
Products where AI features are embedded in a workflow, not triggered ad hoc
Teams that need predictable revenue and want to avoid top-up friction

Prepaid credits work when usage is unpredictable or spiky, which is common in LLM products where a single job or inference run can consume significant credits in a short window. This model suits:

Developer tools where usage varies significantly between users
Products with infrequent but high-cost inference workloads
Early-stage products where usage patterns are still unknown

Prepaid adds engineering surface area that compounds quickly: balance state has to be managed per customer, top-up flows need to handle partial balances and expired grants, and depletion logic has to account for negative states.

Supporting both models before you understand your usage patterns means building and maintaining that complexity twice, across allocation, enforcement, and billing, without the data to justify it.

The core building blocks behind how AI credits work

A credit system is a set of stateful components that manage balance, usage, and enforcement at runtime. Each component plays a specific role in keeping metering, entitlements, and billing aligned.

Component	What it represents	Engineering considerations
Credit wallet	State container for a customer’s balance	Stores current balance, aggregates grants, and serves as the source of truth for runtime checks
Credit grant	Ledger event that adds credits	Includes metadata like expiry, remaining balance, and consumption priority
Feature consumption rate	Mapping between actions and credit cost	Must be defined centrally and applied consistently across services to avoid drift
Depletion behavior	System response when balance reaches zero	Enforced at runtime (hard vs soft limits) and must align with billing logic
Rollover policy	Rules for unused credits across cycles	Requires scheduled jobs or events to expire or carry forward balances

Together, these components turn a simple credit balance into a real-time system that tracks usage, updates state, and enforces limits on every request.

How credits are granted and tracked

Credits are allocated through multiple sources: recurring grants tied to subscription plans, promotional credits, add-ons, and one-time top-ups. Each source produces a grant, and each grant needs to carry more than a balance.

At minimum, every grant record needs:

Effective date and expiration date: Determines when credits become available and when they expire. Without this, you cannot enforce time-bound promotions or monthly resets correctly.
Cost basis: The price paid per credit at the time of the grant. Promotional credits carry a zero or discounted cost basis. Paid credits carry the purchase rate. Finance needs this to recognize revenue correctly at consumption time, not grant time.
Category: Defines which features or workflows the credits can be applied to. Some grants are unrestricted. Others are scoped to specific capabilities, such as image generation or agent actions. Without categories, you cannot enforce feature-level credit rules at runtime.
Burn priority: Controls which grant is consumed first when a customer has multiple active balances. Promotional credits should typically deplete before paid ones. Credits expiring sooner should be consumed before those with longer windows. These rules need to live in the data model and be enforced consistently, not handled in application code.

Each grant must be recorded as an immutable ledger event. This is what gives finance the audit trail to reconcile what was issued against what was consumed, and what gives your enforcement layer the data it needs to apply the right rules on every inference request.
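The grant fields and the immutable-ledger requirement above can be sketched roughly as follows. This is a minimal illustration, not a specific product's schema: `CreditGrant`, `GrantLedger`, and every field name are hypothetical, and a real system would persist events in a database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)  # frozen: a recorded grant cannot be mutated afterwards
class CreditGrant:
    grant_id: str
    customer_id: str
    amount: int                      # credits issued
    effective_at: datetime           # when the credits become usable
    expires_at: Optional[datetime]   # None means the grant never expires
    cost_basis: Decimal              # price paid per credit (0 for promotional)
    category: Optional[str]          # None means unrestricted
    burn_priority: int               # lower value is consumed first


class GrantLedger:
    """Append-only record of grant events: no update or delete operations."""

    def __init__(self) -> None:
        self._events: list[CreditGrant] = []

    def append(self, grant: CreditGrant) -> None:
        # Reject a second event with the same ID instead of overwriting it.
        if any(g.grant_id == grant.grant_id for g in self._events):
            raise ValueError(f"grant {grant.grant_id} already recorded")
        self._events.append(grant)

    def events(self) -> Tuple[CreditGrant, ...]:
        # Read-only snapshot for audit or reconciliation.
        return tuple(self._events)


ledger = GrantLedger()
ledger.append(CreditGrant(
    grant_id="g-1",
    customer_id="cust-1",
    amount=1000,
    effective_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
    expires_at=None,
    cost_basis=Decimal("0.01"),
    category=None,
    burn_priority=10,
))
```

Corrections in this style are recorded as new events (for example, a negative adjustment) rather than edits to existing ones, which is what keeps the audit trail intact.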

For example, when Miro launched its AI-powered collaboration platform, it needed credits allocated based on plan and seat count, tracked in real time, and reset each billing cycle. Miro needed this without rebuilding its existing architecture from scratch. Using Stigg, the team configured credit allocation, runtime enforcement, and usage visibility in under 6 weeks.

Credit consumption: Where system complexity shows up

Consumption looks simple: a feature is used, credits are deducted, and the balance updates. In practice, this is where most systems break.

Problem area	What’s happening under the hood	What the system must handle
Event delivery	Usage pipelines are at-least-once with retries	Idempotent deduction logic to prevent double-charging
Event identity	Same event may be processed multiple times	Unique event IDs and pipeline-level de-duplication
State consistency	Deductions touch multiple systems (wallet, billing, analytics)	Consistent updates across data stores
Enforcement timing	Usage can outpace enforcement if delayed	Real-time checks at the moment of consumption
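To make the idempotency rows in the table concrete, here is a minimal sketch of a deduction keyed by event ID. The class and method names are hypothetical, and the in-memory dictionary stands in for what would normally be a uniqueness constraint in the same transaction as the balance update.

```python
class IdempotentDeductor:
    """Applies each usage event at most once, even if it is delivered repeatedly."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        # event_id -> balance right after that event was first applied
        self._seen: dict[str, int] = {}

    def deduct(self, event_id: str, credits: int) -> int:
        if event_id in self._seen:
            # Retry of an already-processed event: return the original
            # result and do not debit again.
            return self._seen[event_id]
        self.balance -= credits
        self._seen[event_id] = self.balance
        return self.balance


deductor = IdempotentDeductor(balance=100)
deductor.deduct("evt-1", 5)   # first delivery debits 5
deductor.deduct("evt-1", 5)   # retried delivery is a no-op
```

The important property is that the "have I seen this event" check and the debit succeed or fail together. If they are stored separately, a crash between the two can still produce a double or missed deduction.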

Why this matters

If these systems fail:

Double deductions break trust
Missed deductions create revenue leakage
Delayed enforcement leads to uncontrolled usage and cost spikes

The problems in this table are connected. Under real load, multiple requests can hit the same credit pool simultaneously, each reading the same available balance before any debit is committed. Every request passes, the writes land afterward, and the balance drops below zero once the compute is already spent.

Idempotency, consistency, and enforcement timing are all expressions of the same concurrency problem. Treating them as separate edge cases usually leads to fragmented enforcement spread across services and middleware.
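The race described above goes away when the balance check and the debit happen as one atomic step. The sketch below shows the idea in a single process using a lock. `Wallet`, `try_debit`, and `run_concurrent` are hypothetical names, and in a distributed system the same guarantee would typically come from a conditional update in the data store rather than an in-process lock.

```python
import threading


class Wallet:
    def __init__(self, balance: int) -> None:
        self._balance = balance
        self._lock = threading.Lock()

    def try_debit(self, credits: int) -> bool:
        # Check and debit under the same lock, so no other request can
        # read the balance between the two steps.
        with self._lock:
            if self._balance < credits:
                return False
            self._balance -= credits
            return True

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance


def run_concurrent(wallet: Wallet, requests: int, cost: int) -> list:
    """Fire `requests` debits at the wallet from separate threads."""
    results = []
    results_lock = threading.Lock()

    def worker() -> None:
        ok = wallet.try_debit(cost)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


wallet = Wallet(balance=100)
outcomes = run_concurrent(wallet, requests=50, cost=10)
# Exactly 10 requests are approved and the balance never goes below zero.
```

With a separate read followed by a later write, all 50 requests could observe a balance of 100 and pass. Folding both into one atomic operation is what bounds approvals to what the pool can actually cover.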

Credit lifecycle management: States, rules, and edge cases

Credit lifecycles introduce more states and transitions than most teams expect.

Grants can be recurring, promotional, or one-time. Credits have effective dates, meaning they can be issued before they are available to consume. Expiration logic removes balances after a configured date. Priority rules determine which grant is consumed first. Admins also need the ability to make manual adjustments with a full audit trail.

Expiration requires careful design. Credits that disappear without notice create support tickets. Credits that never expire reduce pricing leverage and impact margin.

For example, Canva uses time-bound credits for certain AI features and trials, encouraging users to engage quickly or upgrade before credits expire. Supporting this requires systems that track expiry at the grant level and surface visibility to users before credits are removed.
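Tracking expiry at the grant level and warning users ahead of time might look something like the sketch below. The `Grant` shape, the function names, and the seven-day warning window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Grant:
    grant_id: str
    remaining: int
    expires_at: Optional[datetime]  # None means the grant never expires


def expiring_within(grants, now: datetime, window: timedelta):
    """Grants with credits left that will expire inside the warning window."""
    return [
        g for g in grants
        if g.remaining > 0
        and g.expires_at is not None
        and now < g.expires_at <= now + window
    ]


def sweep_expired(grants, now: datetime):
    """Zero out expired grants; return (grant_id, credits_removed) pairs to log."""
    removed = []
    for g in grants:
        if g.expires_at is not None and g.expires_at <= now and g.remaining > 0:
            removed.append((g.grant_id, g.remaining))
            g.remaining = 0
    return removed


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
grants = [
    Grant("trial", 30, datetime(2025, 6, 3, tzinfo=timezone.utc)),
    Grant("monthly", 100, datetime(2025, 7, 1, tzinfo=timezone.utc)),
    Grant("old-promo", 10, datetime(2025, 5, 31, tzinfo=timezone.utc)),
    Grant("top-up", 50, None),
]
to_warn = expiring_within(grants, now, timedelta(days=7))
expired = sweep_expired(grants, now)
```

Returning the removed amounts from the sweep, rather than silently zeroing balances, gives the ledger something to record and gives support a trail to point to.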

Priority rules interact directly with expiration logic. A credit expiring in 3 days must be consumed before a paid credit expiring in 30, even if the paid credit was issued first. This burn order must be recalculated dynamically on every deduction as balances and expiration windows change.
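One way to express this burn order is to sort active grants on every deduction and drain them in sequence. The sketch below assumes an ordering of soonest expiry first, then an explicit `burn_priority` as a tiebreaker. That ordering, the `Grant` fields, and the `consume` function are illustrative choices, not the only valid policy.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Grant:
    grant_id: str
    remaining: int
    effective_at: datetime
    expires_at: Optional[datetime]  # None means the grant never expires
    burn_priority: int              # lower value burns first on ties


# Sort key for grants without an expiry, so they are consumed last.
FAR_FUTURE = datetime.max.replace(tzinfo=timezone.utc)


def consume(grants, credits: int, now: datetime):
    """Debit `credits` across active grants; return (grant_id, amount) pairs."""
    active = [
        g for g in grants
        if g.remaining > 0
        and g.effective_at <= now
        and (g.expires_at is None or g.expires_at > now)
    ]
    if sum(g.remaining for g in active) < credits:
        raise ValueError("insufficient credits")
    # Recomputed on every call, since balances and expiry windows change.
    active.sort(key=lambda g: (g.expires_at or FAR_FUTURE, g.burn_priority))
    debits = []
    for g in active:
        if credits == 0:
            break
        take = min(g.remaining, credits)
        g.remaining -= take
        credits -= take
        debits.append((g.grant_id, take))
    return debits


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
paid = Grant("paid", 100, datetime(2025, 1, 1, tzinfo=timezone.utc),
             datetime(2025, 7, 1, tzinfo=timezone.utc), burn_priority=1)
promo = Grant("promo", 30, datetime(2025, 5, 20, tzinfo=timezone.utc),
              datetime(2025, 6, 4, tzinfo=timezone.utc), burn_priority=0)
debits = consume([paid, promo], 50, now)
# The promo grant expiring in 3 days is drained before the older paid grant.
```

Returning the per-grant debits matters as much as the ordering: those pairs are what later link each usage event back to a grant and its cost basis.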

Real-time credit enforcement at runtime

Real-time enforcement turns credits from accounting entries into product controls. It runs on every request, before compute is consumed.

Scenario	What happens	System requirement	Risk if missing
Balance available	Usage event is processed and credits are deducted	Real-time balance check + atomic deduction	Inconsistent state across services
Balance depleted (hard limit)	Request is blocked immediately	Low-latency enforcement at request time	Uncontrolled usage and cost leakage
Balance depleted (soft limit)	Usage continues into negative balance	Support for negative states + reconciliation with billing	Revenue leakage or billing disputes
Overage enabled	Usage continues and is billed separately	Integration between enforcement and billing systems	Misaligned billing and product behavior
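The scenarios in the table can be reduced to a single decision function evaluated before compute runs. This is a sketch under assumptions: the names are hypothetical, and the overage branch assumes the remaining balance is used first with only the shortfall billed separately, which is one of several reasonable interpretations.

```python
from dataclasses import dataclass
from enum import Enum


class DepletionPolicy(Enum):
    HARD = "hard"        # block when the balance cannot cover the request
    SOFT = "soft"        # allow the balance to go negative
    OVERAGE = "overage"  # allow, and bill the shortfall separately


@dataclass(frozen=True)
class Decision:
    allowed: bool
    debit_from_balance: int
    billed_as_overage: int


def decide(balance: int, cost: int, policy: DepletionPolicy) -> Decision:
    if balance >= cost:
        return Decision(True, cost, 0)
    if policy is DepletionPolicy.HARD:
        return Decision(False, 0, 0)
    if policy is DepletionPolicy.SOFT:
        # The full cost is debited, so the balance becomes negative and
        # has to be reconciled with billing later.
        return Decision(True, cost, 0)
    # OVERAGE: spend whatever is left, bill the rest outside the wallet.
    from_balance = max(balance, 0)
    return Decision(True, from_balance, cost - from_balance)


blocked = decide(balance=2, cost=3, policy=DepletionPolicy.HARD)
negative = decide(balance=2, cost=3, policy=DepletionPolicy.SOFT)
overage = decide(balance=2, cost=3, policy=DepletionPolicy.OVERAGE)
```

Keeping the decision as a pure function of balance, cost, and policy makes it straightforward to configure the policy per feature and to test each row of the table in isolation.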

Why request-level enforcement matters for AI agents

AI agents turn a single user action into multiple concurrent credit decisions, which means enforcement has to happen at the request level while execution is still in progress.

One agent action can fan out into sub-calls, tool invocations, and model requests that all draw from the same credit pool simultaneously. Unlike a single-request flow, there is no natural serialization point.

Entry-point checks miss this. Enforcement has to follow each step in the chain, with atomic writes and reads that reflect current credit state.

How Stigg handles it

Stigg is the runtime layer that ties credit balances directly to entitlement enforcement, resolving access decisions before compute is consumed.

Enforcement happens on each request
Depletion behavior is configurable per feature
Product behavior stays aligned with billing and contract terms

Multi-feature credit drawdown: Shared pool enforcement and rate mapping

This is where credit systems move beyond simple metering. Multiple features drawing from the same pool require real-time cost mapping and consistent balance updates across services.

Component	What it does	Engineering requirement
Shared wallet	Single balance used across features	Atomic updates to prevent race conditions
Feature rate mapping	Defines cost per action (e.g., API call vs image)	Centralized, version-controlled configuration
Consumption engine	Applies rates to usage events	Low-latency evaluation at request time
Historical accuracy	Preserves past pricing logic	Versioned rate mappings tied to usage events

In practice, most AI products follow this pattern. A single credit balance is consumed across multiple features such as text generation, image creation, and search, each with different costs.

The challenge is keeping this consistent. The system has to apply the correct rate for each feature at runtime while maintaining a single shared balance across all services.

If rates change, historical usage must still reflect the rate at the time of consumption, not the current one. This requires versioned configurations and immutable usage records.
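A versioned rate mapping can be sketched as a per-feature list of rates with effective dates, looked up by the timestamp of the usage event rather than by the current time. `RateTable` and its methods are hypothetical names, and the feature names and rates below are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class RateTable:
    """Credits-per-unit for each feature, versioned by effective date."""

    def __init__(self) -> None:
        # feature -> list of (effective_from, credits_per_unit), kept sorted
        self._versions: dict[str, list[tuple[datetime, int]]] = {}

    def set_rate(self, feature: str, effective_from: datetime,
                 credits_per_unit: int) -> None:
        versions = self._versions.setdefault(feature, [])
        versions.append((effective_from, credits_per_unit))
        versions.sort()

    def rate_at(self, feature: str, at: datetime) -> int:
        versions = self._versions.get(feature, [])
        starts = [start for start, _ in versions]
        idx = bisect_right(starts, at) - 1  # latest version effective at `at`
        if idx < 0:
            raise LookupError(f"no rate for {feature!r} at {at.isoformat()}")
        return versions[idx][1]

    def cost(self, feature: str, units: int, at: datetime) -> int:
        return units * self.rate_at(feature, at)


rates = RateTable()
rates.set_rate("text_generation", datetime(2025, 1, 1, tzinfo=timezone.utc), 1)
rates.set_rate("text_generation", datetime(2025, 3, 1, tzinfo=timezone.utc), 2)
rates.set_rate("image_creation", datetime(2025, 1, 1, tzinfo=timezone.utc), 10)

# Usage from February is still priced at the rate in force in February.
feb_cost = rates.cost("text_generation", 5, datetime(2025, 2, 1, tzinfo=timezone.utc))
mar_cost = rates.cost("text_generation", 5, datetime(2025, 3, 15, tzinfo=timezone.utc))
```

Because old versions are never overwritten, replaying a historical usage event yields the same credit cost it had when it was first processed.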

Credit-based revenue recognition: Aligning usage, pricing, and finance

Paid credits sit on the balance sheet as a liability and convert to revenue as they're consumed. The core challenge is timing and traceability. When credits roll over, revenue cannot be recognized at grant time. It must be recognized when credits are consumed.

Concept	What it means	System requirement
Cost basis	Price per credit at purchase	Stored per grant and linked to usage
Revenue timing	Recognized on consumption	Usage events tied to financial records
Grant tracking	Multiple credit sources (paid, promo)	Separate accounting per grant type
Auditability	Trace every credit to revenue	Immutable ledger with full history

For example:

$10,000 purchase → 8,500 credits
Effective rate → $10,000 / 8,500 ≈ $1.18 per credit

Each usage event must reference the original grant and cost basis, so finance can recognize revenue correctly.
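As a rough sketch of how per-grant cost basis flows into recognized revenue, the snippet below uses separate, purely hypothetical figures (a $500 purchase of 1,000 credits plus a free promotional grant). The function names are illustrative, and actual recognition rules depend on your accounting policy.

```python
from decimal import Decimal


def cost_basis(price_paid: Decimal, credits: int) -> Decimal:
    """Price per credit at the time of the grant."""
    return price_paid / credits


def recognized_revenue(debits, basis_by_grant) -> Decimal:
    """Sum of credits consumed from each grant times that grant's cost basis.

    debits: iterable of (grant_id, credits_consumed) pairs from usage events.
    """
    return sum(
        (basis_by_grant[grant_id] * credits for grant_id, credits in debits),
        Decimal("0"),
    )


# Hypothetical grants: one paid, one promotional with zero cost basis.
basis_by_grant = {
    "paid": cost_basis(Decimal("500"), 1000),  # $0.50 per credit
    "promo": Decimal("0"),
}

# A usage event that burned 200 promotional credits, then 300 paid credits.
revenue = recognized_revenue([("promo", 200), ("paid", 300)], basis_by_grant)
```

`Decimal` is used instead of floats so that per-credit rates and totals do not accumulate binary rounding error. The promotional credits contribute nothing to revenue even though they represent real usage.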

Edge cases add complexity:

Rollovers → delay revenue recognition
Promotional credits → zero or discounted cost basis
Refunds → require ledger adjustments

What looks like a data modeling decision, such as how grants and usage events are linked, becomes a financial requirement for audit and compliance.

Credit system UX and observability: Making usage visible and auditable

A credit system that works but lacks visibility will still fail. Customers and internal teams need real-time insight into balances, usage, and changes.

Customer-facing requirements:

Live balance visibility across all active credits
Usage history broken down by feature and time
Proactive alerts for low balance or upcoming expiration
Self-serve top-ups without support involvement
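The usage-history and low-balance requirements can be served from the same usage events the ledger already records. The following is a small sketch with hypothetical function names, an assumed event shape of `(day, feature, credits)`, and an arbitrary alert threshold.

```python
from collections import defaultdict
from datetime import date


def usage_by_feature_and_day(events):
    """Aggregate (day, feature, credits) events into per-day, per-feature totals."""
    totals = defaultdict(int)
    for day, feature, credits in events:
        totals[(day, feature)] += credits
    return dict(totals)


def needs_low_balance_alert(balance: int, threshold: int) -> bool:
    """True once the balance is at or below the configured threshold."""
    return balance <= threshold


events = [
    (date(2025, 6, 1), "text_generation", 5),
    (date(2025, 6, 1), "text_generation", 7),
    (date(2025, 6, 1), "image_creation", 20),
    (date(2025, 6, 2), "text_generation", 3),
]
history = usage_by_feature_and_day(events)
alert = needs_low_balance_alert(balance=40, threshold=50)
```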

Admin and internal tooling

Capability	Why it’s important
Audit log (who, what, when)	Required for finance reconciliation and support
Grant and adjustment tracking	Ensures traceability of all balance changes
Feature rate configuration	Controls cost per feature centrally
Manual overrides	Supports promotions, corrections, and edge cases

Without these layers:

Finance cannot audit revenue accurately
Customers lose trust due to lack of visibility
Support load increases due to unclear usage

A credit system needs backend accounting logic, observability, auditability, and customer-facing visibility to operate reliably at scale. And all of that depends on the enforcement layer underneath staying accurate under real production load.

Credits look simple until you're building infrastructure

Most teams only realize the scope of a credit system after they’ve started building it. The first version handles the happy path.

Production load surfaces the rest: idempotency across retries, burn priority across grants, expiration logic that satisfies finance, real-time enforcement, and a transparency layer for customers. Each is its own problem. Together, they turn into months of unplanned work.

Stigg is the usage runtime for AI products. It enforces entitlements, meters usage, and governs AI spend in the request path, not after the bill. The following are built in as first-class primitives:

Credit management with an append-only ledger, configurable burn order, and hard or soft depletion limits
Real-time metering through a BYOC Sidecar, with entitlement decisions served from local cache and uncached reads fetched from Stigg’s Edge API in roughly 100ms
Entitlement enforcement across plans, add-ons, trials, and promotional allowances
Packaging and plan management through config, not code, so product teams can update plans and add-ons without engineering involvement
Pricing model changes (hybrid, usage-based, credits) supported in the same infrastructure, with engineering still owning complex transitions
Integrations across your revenue stack, from billing and CPQ to CRM and data warehouses, so pricing, packaging, and entitlements stay aligned across systems

If credit infrastructure is eating sprint capacity, the architecture is being asked to do things it was never designed for. Stigg's credits runtime handles the ledger, enforcement, and entitlements layer as production infrastructure, so your team ships product instead of billing logic.

FAQs

1. What are AI credits in LLM products?

AI credits are a runtime primitive that tracks and enforces consumption across AI features. In LLM and compute-heavy products, they act as a shared currency drawn down across models, workflows, or agent actions at different rates.

2. What is the difference between AI credits and usage-based pricing?

The main difference between AI credits and usage-based pricing is that credits use a shared currency across features, while usage-based pricing charges directly per unit consumed.

3. What happens when a customer runs out of AI credits?

When a customer runs out of AI credits, the system applies the configured depletion rule. Usage is either blocked immediately (hard limit) or allowed to continue into a negative balance (soft limit).

4. How does credit expiration work technically?

Credit expiration removes unused balances after a configured date. The system tracks expiration per grant and burns credits that expire the soonest first.

5. Should I build or buy a credit system?

Building a credit system is reasonable at first, and many early-stage AI products start there: a balance field, a deduction function, maybe a cron job for expiration. It works until you need to support burn priority across multiple grant types, real-time enforcement under load, finance-grade audit trails, and configurable depletion behavior per feature. Each of those is a project. Together, they're an ongoing maintenance burden that grows as your customer base grows.