AI Credit Infrastructure for LLM Products: From Design to Scale

AI credits in LLM products are a runtime infrastructure problem. Learn how to architect metering, enforcement, ledger design, and failure points at scale.

Sara Nelissen | May 20, 2026 | 9 min read

Many teams think credits are a data model problem. You add a balance field, write a deduction function, and ship it. Three months later, you're debugging double charges, reconciling a ledger that doesn't match finance's numbers, and trying to explain why a customer’s balance hit zero when it shouldn't have.

AI credits are a runtime infrastructure problem. The moment you introduce them, you need real-time deduction, shared pools across features, expiration logic that satisfies finance, and low-latency enforcement on every request.

This guide breaks down how to build AI credits for LLM and compute-heavy products, including the core infrastructure components, trade-offs, and failure points teams hit as they scale.

Why AI credits became the default pricing model for AI products

Pure pay-as-you-go worked when products had one AI feature. As products added more, pricing each feature separately became unmanageable for customers and engineering alike.

Credits solve that with a single abstraction: one currency, consumed across multiple features at different rates. Stability AI, for example, structures its API around prepaid credits consumed across image generation, video, and editing capabilities. The same currency is drawn down at different rates depending on the model and task, rather than each capability getting its own pricing structure.

Credits simplify pricing on the surface. Underneath, they require a metering layer that can map one currency across multiple features, each with different costs, and apply those rules in real time.

The first decision: Which credit model to support

The first decision is choosing between subscription-based credits and prepaid credit packs. This choice determines how usage is billed, how entitlements are enforced, and how complex your system becomes.

Recurring credits (subscription-based)

Customers receive a credit allocation as part of their plan, typically monthly or annually. Unused credits roll over or expire based on configuration, and additional credits come through tier upgrades or top-ups.

This works best when customers are willing to commit upfront.

For example, Notion bundles AI features like Notion Agent into its Business Plan at a flat per-seat fee, then meters autonomous Custom Agent workloads separately at $10 per 1,000 Notion credits. This gives workspaces a predictable per-seat baseline for everyday AI, with separately metered credits for agent runs.

Prepaid credit packs

Customers deposit a fixed amount and consume credits until the balance runs out, then top up on demand. This works best when usage is unpredictable and customers need flexibility.

How to choose

The decision depends on how predictable usage is and how much commitment customers are willing to make upfront.

Subscription credits work when customers can forecast their usage and are willing to commit to a plan. The system is simpler because allocations are tied to billing cycles and reset on a schedule. This model suits:

  • Enterprise customers on annual contracts with defined usage tiers
  • Products where AI features are embedded in a workflow, not triggered ad hoc
  • Teams that need predictable revenue and want to avoid top-up friction

Prepaid credits work when usage is unpredictable or spiky, which is common in LLM products where a single job or inference run can consume significant credits in a short window. This model suits:

  • Developer tools where usage varies significantly between users
  • Products with infrequent but high-cost inference workloads
  • Early-stage products where usage patterns are still unknown

Prepaid adds engineering surface area that compounds quickly: balance state has to be managed per customer, top-up flows need to handle partial balances and expired grants, and depletion logic has to account for negative states.

Supporting both models before you understand your usage patterns means building and maintaining that complexity twice, across allocation, enforcement, and billing, without the data to justify it.

The core building blocks behind how AI credits work

A credit system is a set of stateful components that manage balance, usage, and enforcement at runtime. Each component plays a specific role in keeping metering, entitlements, and billing aligned.

| Component | What it represents | Engineering considerations |
| --- | --- | --- |
| Credit wallet | State container for a customer’s balance | Stores current balance, aggregates grants, and serves as the source of truth for runtime checks |
| Credit grant | Ledger event that adds credits | Includes metadata like expiry, remaining balance, and consumption priority |
| Feature consumption rate | Mapping between actions and credit cost | Must be defined centrally and applied consistently across services to avoid drift |
| Depletion behavior | System response when balance reaches zero | Enforced at runtime (hard vs soft limits) and must align with billing logic |
| Rollover policy | Rules for unused credits across cycles | Requires scheduled jobs or events to expire or carry forward balances |

Together, these components turn a simple credit balance into a real-time system that tracks usage, updates state, and enforces limits on every request.
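Of these components, the rollover policy is the one that runs on a schedule rather than on every request. Here is a minimal cycle-end job in Python. It assumes a hypothetical per-plan `rollover_cap` and a `rollover_eligible` flag on each grant; neither comes from a specific product. The job carries eligible leftovers forward into a new grant and records everything else as an expiration event.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Grant:
    grant_id: str
    remaining: int               # credits still unconsumed on this grant
    expires_on: Optional[date]   # None means the grant never expires
    rollover_eligible: bool      # hypothetical flag, e.g. False for promotional grants


@dataclass(frozen=True)
class LedgerEvent:
    kind: str       # "rollover" or "expire"
    grant_id: str
    credits: int


def close_cycle(grants: List[Grant], cycle_end: date, next_cycle_end: date,
                rollover_cap: int) -> Tuple[List[Grant], List[LedgerEvent]]:
    """Expire or carry forward unused credits on grants that end with this cycle."""
    kept: List[Grant] = []
    events: List[LedgerEvent] = []
    budget = rollover_cap  # hypothetical plan-level cap on carried-over credits
    carried = 0
    for g in grants:
        if g.expires_on is None or g.expires_on > cycle_end:
            kept.append(g)  # still active next cycle, so leave it alone
            continue
        carry = min(g.remaining, budget) if g.rollover_eligible else 0
        budget -= carry
        carried += carry
        if carry:
            events.append(LedgerEvent("rollover", g.grant_id, carry))
        if g.remaining > carry:
            events.append(LedgerEvent("expire", g.grant_id, g.remaining - carry))
    if carried:
        # Carried credits become a new grant that expires at the end of the next
        # cycle and cannot roll over again (an assumed policy choice).
        kept.append(Grant(f"rollover-{cycle_end.isoformat()}", carried,
                          next_cycle_end, rollover_eligible=False))
    return kept, events


grants = [
    Grant("monthly", 300, date(2026, 5, 31), rollover_eligible=True),
    Grant("promo", 50, date(2026, 5, 31), rollover_eligible=False),
    Grant("annual", 1000, date(2026, 12, 31), rollover_eligible=True),
]
kept, events = close_cycle(grants, date(2026, 5, 31), date(2026, 6, 30), rollover_cap=200)
```

The job emits events instead of silently editing balances. This keeps expirations and carry-overs visible in the same audit trail as grants and consumption.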

How credits are granted and tracked

Credits are allocated through multiple sources: recurring grants tied to subscription plans, promotional credits, add-ons, and one-time top-ups. Each source produces a grant, and each grant needs to carry more than a balance.

At minimum, every grant record needs:

  1. Effective date and expiration date: Determines when credits become available and when they expire. Without this, you cannot enforce time-bound promotions or monthly resets correctly.
  2. Cost basis: The price paid per credit at the time of the grant. Promotional credits carry a zero or discounted cost basis. Paid credits carry the purchase rate. Finance needs this to recognize revenue correctly at consumption time, not grant time.
  3. Category: Defines which features or workflows the credits can be applied to. Some grants are unrestricted. Others are scoped to specific capabilities, such as image generation or agent actions. Without categories, you cannot enforce feature-level credit rules at runtime.
  4. Burn priority: Controls which grant is consumed first when a customer has multiple active balances. Promotional credits should typically deplete before paid ones. Credits expiring sooner should be consumed before those with longer windows. These rules need to live in the data model and be enforced consistently, not handled in application code.
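Those four fields map naturally onto a grant record. Here is a minimal sketch in Python. Three choices in it are illustrative assumptions, not a prescribed schema: the `CreditGrant` type, the convention that an empty category set means unrestricted, and the exact burn key (soonest expiry first, then promotional before paid, then oldest first).

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class CreditGrant:
    grant_id: str
    effective_at: datetime           # 1. credits are usable from this moment
    expires_at: Optional[datetime]   # 1. None means the grant never expires
    cost_basis: Decimal              # 2. price paid per credit; Decimal("0") for promos
    categories: FrozenSet[str]       # 3. empty set = unrestricted (assumed convention)
    remaining: int

    @property
    def is_promotional(self) -> bool:
        return self.cost_basis == 0

    def applies_to(self, category: str) -> bool:
        return not self.categories or category in self.categories


def burn_key(g: CreditGrant) -> Tuple[datetime, bool, datetime]:
    """4. Burn priority: soonest expiry first, promotional before paid, then oldest first."""
    return (g.expires_at or datetime.max, not g.is_promotional, g.effective_at)


def burn_order(grants: List[CreditGrant], category: str, now: datetime) -> List[CreditGrant]:
    """Grants that can pay for `category` right now, in the order they should be consumed."""
    usable = [
        g for g in grants
        if g.remaining > 0
        and g.effective_at <= now
        and (g.expires_at is None or g.expires_at > now)
        and g.applies_to(category)
    ]
    return sorted(usable, key=burn_key)


now = datetime(2026, 5, 20)
grants = [
    CreditGrant("paid_annual", datetime(2026, 1, 1), datetime(2026, 12, 31), Decimal("0.85"), frozenset(), 5000),
    CreditGrant("promo", datetime(2026, 5, 1), datetime(2026, 6, 30), Decimal("0"), frozenset(), 100),
    CreditGrant("paid_month", datetime(2026, 5, 1), datetime(2026, 5, 31), Decimal("1.00"), frozenset(), 500),
    CreditGrant("image_only", datetime(2026, 5, 1), datetime(2026, 6, 30), Decimal("0.85"), frozenset({"image"}), 200),
    CreditGrant("not_yet_effective", datetime(2026, 6, 1), None, Decimal("0.85"), frozenset(), 1000),
]
print([g.grant_id for g in burn_order(grants, "text", now)])
# ['paid_month', 'promo', 'paid_annual']
```

Burn order is a sort key on the record itself. That lets every service that deducts credits share one function instead of re-implementing the rules in application code.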

Each grant must be recorded as an immutable ledger event. This is what gives finance the audit trail to reconcile what was issued against what was consumed, and what gives your enforcement layer the data it needs to apply the right rules on every inference request.
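Here is a minimal sketch of the append-only idea, with an in-memory list standing in for a ledger table. Every change, including a manual adjustment, is a new entry. The balance is derived from the entries rather than stored and overwritten. `CreditLedger` and the event kinds are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LedgerEntry:
    seq: int          # position in the ledger; never reused
    customer_id: str
    grant_id: str
    kind: str         # assumed kinds: "grant", "consume", "adjust", "expire"
    delta: int        # positive adds credits, negative removes them
    actor: str        # "system", "billing", or an admin user id for manual adjustments


class CreditLedger:
    """Append-only ledger: entries are never edited or deleted; corrections are new entries."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []

    def append(self, customer_id: str, grant_id: str, kind: str, delta: int,
               actor: str = "system") -> LedgerEntry:
        entry = LedgerEntry(len(self._entries) + 1, customer_id, grant_id, kind, delta, actor)
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[LedgerEntry, ...]:
        return tuple(self._entries)  # immutable snapshot for audit exports

    def balance(self, customer_id: str) -> int:
        # The balance is derived from events, so it can always be re-checked against them.
        return sum(e.delta for e in self._entries if e.customer_id == customer_id)

    def issued_and_consumed(self, customer_id: str) -> Tuple[int, int]:
        """What finance reconciles: credits issued vs. credits consumed."""
        mine = [e for e in self._entries if e.customer_id == customer_id]
        issued = sum(e.delta for e in mine if e.kind == "grant")
        consumed = -sum(e.delta for e in mine if e.kind == "consume")
        return issued, consumed


ledger = CreditLedger()
ledger.append("acme", "g1", "grant", 1000, actor="billing")
ledger.append("acme", "g1", "consume", -150)
ledger.append("acme", "g1", "consume", -50)
ledger.append("acme", "g1", "adjust", 25, actor="admin-42")
```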

For example, when Miro launched its AI-powered collaboration platform, it needed credits allocated based on plan and seat count, tracked in real time, and reset each billing cycle. Miro needed this without rebuilding its existing architecture from scratch. Using Stigg, the team configured credit allocation, runtime enforcement, and usage visibility in under 6 weeks.

Credit consumption: Where system complexity shows up

Consumption looks relatively easy. A feature is used, credits are deducted, and the balance updates. In practice, this is where most systems break.

| Problem area | What’s happening under the hood | What the system must handle |
| --- | --- | --- |
| Event delivery | Usage pipelines are at-least-once with retries | Idempotent deduction logic to prevent double-charging |
| Event identity | Same event may be processed multiple times | Unique event IDs and pipeline-level de-duplication |
| State consistency | Deductions touch multiple systems (wallet, billing, analytics) | Consistent updates across data stores |
| Enforcement timing | Usage can outpace enforcement if delayed | Real-time checks at the moment of consumption |
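For the first two rows, one common approach is to let the database enforce event identity. Each usage event's ID goes under a uniqueness constraint, and the insert runs in the same transaction as the deduction. A redelivered event then fails the insert and never reaches the balance. Here is a sketch using Python's built-in `sqlite3`, with hypothetical `wallets` and `usage_events` tables:

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wallets (customer_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE usage_events (event_id TEXT PRIMARY KEY,
                                   customer_id TEXT NOT NULL,
                                   credits INTEGER NOT NULL);
    """)
    return conn


def record_usage(conn: sqlite3.Connection, event_id: str, customer_id: str, credits: int) -> bool:
    """Charge credits at most once per event_id. Returns False for duplicate deliveries."""
    try:
        with conn:  # one transaction: both statements commit, or both roll back
            conn.execute(
                "INSERT INTO usage_events (event_id, customer_id, credits) VALUES (?, ?, ?)",
                (event_id, customer_id, credits),
            )
            conn.execute(
                "UPDATE wallets SET balance = balance - ? WHERE customer_id = ?",
                (credits, customer_id),
            )
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: this event was already applied, so nothing is charged.
        return False


conn = open_db()
with conn:
    conn.execute("INSERT INTO wallets VALUES ('acme', 1000)")
first = record_usage(conn, "evt-1", "acme", 40)
retry = record_usage(conn, "evt-1", "acme", 40)  # same event redelivered by the pipeline
```

The de-duplication record and the debit commit together. That keeps a crash between "mark as seen" and "deduct" from producing either a double charge or a missed one.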

Why this matters

If these systems fail:

  • Double deductions break trust
  • Missed deductions create revenue leakage
  • Delayed enforcement leads to uncontrolled usage and cost spikes

The problems in this table are connected. Under real load, multiple requests can hit the same credit pool simultaneously, each reading the same available balance before any debit is committed. Every request passes the balance check, the debits land afterward, and the balance drops below zero after the compute has already been spent.

Idempotency, consistency, and enforcement timing are all expressions of the same concurrency problem. Treating them as separate edge cases usually leads to fragmented enforcement spread across services and middleware.
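The usual fix for this read-then-write race is to make the check and the debit a single operation. Here is a sketch with Python's built-in `sqlite3`, assuming a hypothetical `wallets` table. The `WHERE balance >= ?` condition is evaluated as part of the `UPDATE` itself, so a request that would overdraw the balance updates zero rows. A production system would apply the same idea in whatever store holds the wallet.

```python
import sqlite3


def try_debit(conn: sqlite3.Connection, customer_id: str, credits: int) -> bool:
    """Check and debit in a single statement; False means the balance was insufficient."""
    # Racy alternative (avoid): SELECT the balance, compare it in application code,
    # then UPDATE. Two requests can both read the same balance between those steps.
    with conn:
        cur = conn.execute(
            "UPDATE wallets SET balance = balance - ? "
            "WHERE customer_id = ? AND balance >= ?",
            (credits, customer_id, credits),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE wallets (customer_id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO wallets VALUES ('acme', 100)")

first = try_debit(conn, "acme", 60)   # True: 100 -> 40
second = try_debit(conn, "acme", 60)  # False: 40 < 60, so no row is updated
```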

Credit lifecycle management: States, rules, and edge cases

Credit lifecycles introduce more states and transitions than most teams expect.

Grants can be recurring, promotional, or one-time. Credits have effective dates, meaning they can be issued before they are available to consume. Expiration logic removes balances after a configured date. Priority rules determine which grant is consumed first. Admins also need the ability to make manual adjustments with a full audit trail.

Expiration requires careful design. Credits that disappear without notice create support tickets. Credits that never expire reduce pricing leverage and impact margin.

For example, Canva uses time-bound credits for certain AI features and trials, encouraging users to engage quickly or upgrade before credits expire. Supporting this requires systems that track expiry at the grant level and surface visibility to users before credits are removed.

Priority rules interact directly with expiration logic. A credit expiring in 3 days must be consumed before a paid credit expiring in 30, even if the paid credit was issued first. This burn order must be recalculated dynamically on every deduction as balances and expiration windows change.
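Here is a minimal sketch of that deduction in Python, assuming grants are held in memory and re-sorted on every call. The sort key is an assumed policy: soonest expiry first, with promotional before paid on ties. Effective and expiration dates are checked against `now`, so grants that are issued but not yet effective, and grants that have already expired, are skipped.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Grant:
    grant_id: str
    remaining: int
    effective_at: datetime
    expires_at: Optional[datetime]   # None means no expiry
    promotional: bool


def deduct(grants: List[Grant], credits: int, now: datetime) -> List[Tuple[str, int]]:
    """Consume `credits` across usable grants, recomputing burn order on every call.

    Returns (grant_id, credits_taken) pairs. Raises ValueError without changing
    any grant if the usable balance is too small.
    """
    usable = [
        g for g in grants
        if g.remaining > 0
        and g.effective_at <= now                         # skip issued-but-not-yet-effective
        and (g.expires_at is None or g.expires_at > now)  # skip already-expired
    ]
    # Soonest expiry first; on ties, promotional before paid (assumed tie-breaker).
    usable.sort(key=lambda g: (g.expires_at or datetime.max, not g.promotional))
    if sum(g.remaining for g in usable) < credits:
        raise ValueError("insufficient usable credits")
    taken: List[Tuple[str, int]] = []
    for g in usable:
        if credits == 0:
            break
        take = min(g.remaining, credits)
        g.remaining -= take
        credits -= take
        taken.append((g.grant_id, take))
    return taken


now = datetime(2026, 5, 20)
grants = [
    Grant("paid", 500, datetime(2026, 5, 1), datetime(2026, 6, 19), promotional=False),
    Grant("promo", 100, datetime(2026, 5, 10), datetime(2026, 5, 23), promotional=True),
    Grant("next_month", 1000, datetime(2026, 6, 1), None, promotional=False),
]
print(deduct(grants, 150, now))  # [('promo', 100), ('paid', 50)]
```

The paid grant was issued first, but the promotional grant expires in 3 days instead of 30, so it is drained first. The grant that only becomes effective next month is not touched.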

Real-time credit enforcement at runtime

Real-time enforcement turns credits from accounting entries into product controls. It runs on every request, before compute is consumed.

| Scenario | What happens | System requirement | Risk if missing |
| --- | --- | --- | --- |
| Balance available | Usage event is processed and credits are deducted | Real-time balance check + atomic deduction | Inconsistent state across services |
| Balance depleted (hard limit) | Request is blocked immediately | Low-latency enforcement at request time | Uncontrolled usage and cost leakage |
| Balance depleted (soft limit) | Usage continues into negative balance | Support for negative states + reconciliation with billing | Revenue leakage or billing disputes |
| Overage enabled | Usage continues and is billed separately | Integration between enforcement and billing systems | Misaligned billing and product behavior |
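The four scenarios reduce to one decision per request. Here is a sketch in Python, where `DepletionMode`, `Decision`, and the optional `soft_floor` are hypothetical names. The function only makes the decision. The resulting balance change would still need to be written atomically, as the "balance available" row requires.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DepletionMode(Enum):
    HARD = "hard"        # block the request
    SOFT = "soft"        # allow a negative balance; reconcile with billing later
    OVERAGE = "overage"  # keep serving; bill the shortfall separately


@dataclass(frozen=True)
class Decision:
    allowed: bool
    new_balance: int
    overage_credits: int  # credits to report to billing as overage


def decide(balance: int, cost: int, mode: DepletionMode,
           soft_floor: Optional[int] = None) -> Decision:
    """Decide a single request before any compute is spent."""
    if balance >= cost:
        return Decision(True, balance - cost, 0)
    if mode is DepletionMode.HARD:
        return Decision(False, balance, 0)
    if mode is DepletionMode.SOFT:
        # soft_floor is a hypothetical limit on how negative the balance may go.
        if soft_floor is None or balance - cost >= soft_floor:
            return Decision(True, balance - cost, 0)
        return Decision(False, balance, 0)
    # OVERAGE: use whatever balance is left, then bill the rest as overage.
    return Decision(True, min(balance, 0), cost - max(balance, 0))


print(decide(3, 4, DepletionMode.HARD))     # Decision(allowed=False, new_balance=3, overage_credits=0)
print(decide(5, 8, DepletionMode.OVERAGE))  # Decision(allowed=True, new_balance=0, overage_credits=3)
```

Depletion behavior is configured per feature, so a call site would look up `mode` for the feature being used rather than hard-coding it.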

Why request-level enforcement matters for AI agents

AI agents turn a single user action into multiple concurrent credit decisions, which means enforcement has to happen at the request level while execution is still in progress.

One agent action can fan out into sub-calls, tool invocations, and model requests that all draw from the same credit pool simultaneously. Unlike a single-request flow, there is no natural serialization point.

Entry-point checks miss this: a balance that was sufficient when the agent started can be exhausted by the agent's own sub-calls. Enforcement has to follow each step in the chain, with atomic writes and reads that reflect current credit state.
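Here is a sketch of step-level enforcement in Python, using threads to stand in for an agent's concurrent sub-calls. The in-process lock is an assumption that substitutes for an atomic operation in a shared store. The point is that every step checks and debits the same pool at the moment it runs, rather than relying on one check when the agent starts. `SharedCreditPool` and `run_step` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedCreditPool:
    """One credit pool shared by every step of an agent run."""

    def __init__(self, balance: int) -> None:
        self._balance = balance
        self._lock = threading.Lock()

    def try_consume(self, credits: int) -> bool:
        # Read and write happen under one lock, so concurrent steps never act on a stale balance.
        with self._lock:
            if self._balance < credits:
                return False
            self._balance -= credits
            return True

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance


def run_step(pool: SharedCreditPool, step_cost: int) -> str:
    if not pool.try_consume(step_cost):
        return "blocked"  # enforced at this step, not only at the agent's entry point
    return "ran"          # stand-in for the actual model or tool call


pool = SharedCreditPool(20)
with ThreadPoolExecutor(max_workers=8) as ex:
    results = list(ex.map(lambda _: run_step(pool, 3), range(10)))
```

With 20 credits and ten steps costing 3 each, six steps run and four are blocked regardless of scheduling order, and the pool never goes negative.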

How Stigg handles it

Stigg is the runtime layer that ties credit balances directly to entitlement enforcement, resolving access decisions before compute is consumed.

  • Enforcement happens on each request
  • Depletion behavior is configurable per feature
  • Product behavior stays aligned with billing and contract terms

Multi-feature credit drawdown: Shared pool enforcement and rate mapping

This is where credit systems move beyond simple metering. Multiple features drawing from the same pool require real-time cost mapping and consistent balance updates across services.

| Component | What it does | Engineering requirement |
| --- | --- | --- |
| Shared wallet | Single balance used across features | Atomic updates to prevent race conditions |
| Feature rate mapping | Defines cost per action (e.g., API call vs image) | Centralized, version-controlled configuration |
| Consumption engine | Applies rates to usage events | Low-latency evaluation at request time |
| Historical accuracy | Preserves past pricing logic | Versioned rate mappings tied to usage events |

In practice, most AI products follow this pattern. A single credit balance is consumed across multiple features such as text generation, image creation, and search, each with different costs.

The challenge is keeping this consistent. The system has to apply the correct rate for each feature at runtime while maintaining a single shared balance across all services.

If rates change, historical usage must still reflect the rate at the time of consumption, not the current one. This requires versioned configurations and immutable usage records.
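Here is a minimal sketch of versioned rates in Python. Each usage record pins the rate version that priced it, so later rate changes don't rewrite history. The feature names, rates, and the `RATE_VERSIONS` table are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical central rate table: version -> feature -> credits per unit.
RATE_VERSIONS: Dict[int, Dict[str, int]] = {
    1: {"text_generation": 1, "image_generation": 10, "search": 2},
    2: {"text_generation": 1, "image_generation": 8, "search": 3},
}
CURRENT_VERSION = 2


@dataclass(frozen=True)
class UsageRecord:
    event_id: str
    feature: str
    units: int
    rate_version: int  # pinned when the event is metered
    credits: int       # cost computed with that version


def meter(event_id: str, feature: str, units: int, version: int = CURRENT_VERSION) -> UsageRecord:
    # An unknown feature raises KeyError: fail loudly rather than charge 0 credits.
    rate = RATE_VERSIONS[version][feature]
    return UsageRecord(event_id, feature, units, version, rate * units)


def recompute(record: UsageRecord) -> int:
    """Re-derive cost for audits from the version stored on the record, not the current one."""
    return RATE_VERSIONS[record.rate_version][record.feature] * record.units


old = meter("e1", "image_generation", 3, version=1)  # metered before the rate change
new = meter("e2", "image_generation", 3)             # metered under the current rates
```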

Credit-based revenue recognition: Aligning usage, pricing, and finance

Paid credits sit on the balance sheet as a liability and convert to revenue as they're consumed. The core challenge is timing and traceability. When credits roll over, revenue cannot be recognized at grant time. It must be recognized when credits are consumed.

| Concept | What it means | System requirement |
| --- | --- | --- |
| Cost basis | Price per credit at purchase | Stored per grant and linked to usage |
| Revenue timing | Recognized on consumption | Usage events tied to financial records |
| Grant tracking | Multiple credit sources (paid, promo) | Separate accounting per grant type |
| Auditability | Trace every credit to revenue | Immutable ledger with full history |

For example:

  • $8,500 purchase → 10,000 credits
  • Effective rate → $0.85 per credit

Each usage event must reference the original grant and cost basis, so finance can recognize revenue correctly.
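Here is a sketch of the finance side in Python, assuming each consumption event carries the ID of the grant it drew from. Recognized revenue is credits consumed times that grant's cost basis. Promotional grants carry a zero basis, so consuming them recognizes nothing. The types and numbers are illustrative.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import List


@dataclass(frozen=True)
class Grant:
    grant_id: str
    credits: int
    cost_basis: Decimal  # price per credit at grant time; Decimal("0") for promotional grants


@dataclass(frozen=True)
class Consumption:
    event_id: str
    grant_id: str        # each usage event references the grant it drew from
    credits: int


def recognized_revenue(grants: List[Grant], consumptions: List[Consumption]) -> Decimal:
    """Revenue recognized at consumption time, priced at each grant's cost basis."""
    by_id = {g.grant_id: g for g in grants}
    total = sum((by_id[c.grant_id].cost_basis * c.credits for c in consumptions), Decimal("0"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def deferred_revenue(grants: List[Grant], consumptions: List[Consumption]) -> Decimal:
    """Paid credits not yet consumed: still a liability on the balance sheet."""
    paid = sum((g.cost_basis * g.credits for g in grants), Decimal("0"))
    return (paid - recognized_revenue(grants, consumptions)).quantize(Decimal("0.01"))


grants = [Grant("paid", 10_000, Decimal("0.85")), Grant("promo", 500, Decimal("0"))]
consumptions = [
    Consumption("c1", "paid", 1_000),
    Consumption("c2", "promo", 500),
    Consumption("c3", "paid", 200),
]
```

Consuming 1,200 credits from the paid grant recognizes 1,200 × $0.85 = $1,020.00. The 500 promotional credits add nothing to recognized revenue, and the remaining paid balance stays deferred.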

Edge cases add complexity:

  • Rollovers → delay revenue recognition
  • Promotional credits → zero or discounted cost basis
  • Refunds → require ledger adjustments

What looks like a data modeling decision, such as how grants and usage events are linked, becomes a financial requirement for audit and compliance.

Credit system UX and observability: Making usage visible and auditable

A credit system that works but lacks visibility will still fail. Customers and internal teams need real-time insight into balances, usage, and changes.

Customer-facing requirements:

  • Live balance visibility across all active credits
  • Usage history broken down by feature and time
  • Proactive alerts for low balance or upcoming expiration
  • Self-serve top-ups without support involvement
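Here is a sketch of the proactive-alert check in Python. It assumes a hypothetical low-balance threshold and a look-ahead window for expirations; both would normally be configurable per customer or plan. It also assumes the caller passes only grants that are currently active.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Grant:
    grant_id: str
    remaining: int
    expires_on: Optional[date]  # None means no expiry


def balance_alerts(grants: List[Grant], today: date, low_balance_threshold: int,
                   expiry_window_days: int) -> List[str]:
    """Customer-facing alerts for a low total balance and for credits about to expire."""
    alerts: List[str] = []
    total = sum(g.remaining for g in grants)
    if total <= low_balance_threshold:
        alerts.append(f"Low balance: {total} credits left")
    horizon = today + timedelta(days=expiry_window_days)
    # Soonest expiry first, so the most urgent warning is listed first.
    for g in sorted(grants, key=lambda g: g.expires_on or date.max):
        if g.remaining > 0 and g.expires_on is not None and today <= g.expires_on <= horizon:
            alerts.append(f"{g.remaining} credits from {g.grant_id} expire on {g.expires_on.isoformat()}")
    return alerts


grants = [Grant("monthly", 40, date(2026, 5, 25)), Grant("topup", 20, None)]
alerts = balance_alerts(grants, date(2026, 5, 20), low_balance_threshold=100, expiry_window_days=7)
```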

Admin and internal tooling:

| Capability | Why it’s important |
| --- | --- |
| Audit log (who, what, when) | Required for finance reconciliation and support |
| Grant and adjustment tracking | Ensures traceability of all balance changes |
| Feature rate configuration | Controls cost per feature centrally |
| Manual overrides | Supports promotions, corrections, and edge cases |

Without these layers:

  • Finance cannot audit revenue accurately
  • Customers lose trust due to lack of visibility
  • Support load increases due to unclear usage

A credit system needs backend accounting logic, observability, auditability, and customer-facing visibility to operate reliably at scale. And all of that depends on the enforcement layer underneath staying accurate under real production load.

Credits look simple until you're building infrastructure

Most teams only realize the scope of a credit system after they’ve started building it. The first version handles the happy path.

Production load surfaces the rest: idempotency across retries, burn priority across grants, expiration logic that satisfies finance, real-time enforcement, and a transparency layer for customers. Each is its own problem. Together, they turn into months of unplanned work.

Stigg is the usage runtime for AI products. It enforces entitlements, meters usage, and governs AI spend in the request path, not after the bill. Built as a first-class primitive:

  • Credit management with an append-only ledger, configurable burn order, and hard or soft depletion limits
  • Real-time metering through a BYOC Sidecar, with entitlement decisions served from local cache and uncached reads fetched from Stigg’s Edge API in roughly 100ms
  • Entitlement enforcement across plans, add-ons, trials, and promotional allowances
  • Packaging and plan management through config, not code, so product teams can update plans and add-ons without engineering involvement
  • Pricing model changes (hybrid, usage-based, credits) supported in the same infrastructure, with engineering still owning complex transitions
  • Integrations across your revenue stack, from billing and CPQ to CRM and data warehouses, so pricing, packaging, and entitlements stay aligned across systems

If credit infrastructure is eating sprint capacity, the architecture is being asked to do things it was never designed for. Stigg's credits runtime handles the ledger, enforcement, and entitlements layer as production infrastructure, so your team ships product instead of billing logic.

FAQs

1. What are AI credits in LLM products?

AI credits are a runtime primitive that tracks and enforces consumption across AI features. In LLM and compute-heavy products, they act as a shared currency drawn down across models, workflows, or agent actions at different rates.

2. What is the difference between AI credits and usage-based pricing?

The main difference between AI credits and usage-based pricing is that credits use a shared currency across features, while usage-based pricing charges directly per unit consumed.

3. What happens when a customer runs out of AI credits?

When a customer runs out of AI credits, the system applies the configured depletion rule. Usage is either blocked immediately (hard limit) or allowed to continue into a negative balance (soft limit).

4. How does credit expiration work technically?

Credit expiration removes unused balances after a configured date. The system tracks expiration per grant and burns credits that expire the soonest first.

5. Should I build or buy a credit system?

Building a credit system is reasonable at first, and many early-stage AI products start there: a balance field, a deduction function, maybe a cron job for expiration. It works until you need to support burn priority across multiple grant types, real-time enforcement under load, finance-grade audit trails, and configurable depletion behavior per feature. Each of those is a project. Together, they're an ongoing maintenance burden that grows as your customer base grows.