Many teams think credits are a data model problem. You add a balance field, write a deduction function, and ship it. Three months later, you're debugging double charges, reconciling a ledger that doesn't match finance's numbers, and trying to explain why a customer’s balance hit zero when it shouldn't have.
AI credits are a runtime infrastructure problem. The moment you introduce them, you need real-time deduction, shared pools across features, expiration logic that satisfies finance, and low-latency enforcement on every request.
This guide breaks down how to build AI credits for LLM and compute-heavy products, including the core infrastructure components, trade-offs, and failure points teams hit as they scale.
Why AI credits became the default pricing model for AI products
Pure pay-as-you-go worked when products had one AI feature. As products added more, pricing each feature separately became unmanageable for customers and engineering alike.
Credits solve that with a single abstraction: one currency, consumed across multiple features at different rates. Stability AI, for example, structures its API around prepaid credits consumed across image generation, video, and editing capabilities. Credits are drawn down at different rates depending on the model and task, rather than through a separate pricing structure per capability.
Credits simplify pricing on the surface. Underneath, they require a metering layer that can map one currency across multiple features, each with different costs, and apply those rules in real time.
The first decision: Which credit model to support
The first decision is choosing between subscription-based credits and prepaid credit packs. This choice determines how usage is billed, how entitlements are enforced, and how complex your system becomes.
Recurring credits (subscription-based)
Customers receive a credit allocation as part of their plan, typically monthly or annually. Unused credits roll over or expire based on configuration, and additional credits come through tier upgrades or top-ups.
This works best when customers are willing to commit upfront.
For example, Notion bundles AI features like Notion Agent into its Business Plan at a flat per-seat fee, then meters autonomous Custom Agent workloads separately at $10 per 1,000 Notion credits. This gives workspaces a predictable per-seat baseline for everyday AI, with separately metered credits for agent runs.
Prepaid credit packs
Customers deposit a fixed amount and consume credits until the balance runs out, then top up on demand. This works best when usage is unpredictable and customers need flexibility.
How to choose
The decision depends on how predictable usage is and how much commitment customers are willing to make upfront.
Subscription credits work when customers can forecast their usage and are willing to commit to a plan. The system is simpler because allocations are tied to billing cycles and reset on a schedule. This model suits:
- Enterprise customers on annual contracts with defined usage tiers
- Products where AI features are embedded in a workflow, not triggered ad hoc
- Teams that need predictable revenue and want to avoid top-up friction
Prepaid credits work when usage is unpredictable or spiky, which is common in LLM products where a single job or inference run can consume significant credits in a short window. This model suits:
- Developer tools where usage varies significantly between users
- Products with infrequent but high-cost inference workloads
- Early-stage products where usage patterns are still unknown
Prepaid adds engineering surface area that compounds quickly: balance state has to be managed per customer, top-up flows need to handle partial balances and expired grants, and depletion logic has to account for negative states.
Supporting both models before you understand your usage patterns means building and maintaining that complexity twice, across allocation, enforcement, and billing, without the data to justify it.
The core building blocks behind how AI credits work
A credit system is a set of stateful components that manage balance, usage, and enforcement at runtime. Each component plays a specific role in keeping metering, entitlements, and billing aligned.
Together, these components turn a simple credit balance into a real-time system that tracks usage, updates state, and enforces limits on every request.
How credits are granted and tracked
Credits are allocated through multiple sources: recurring grants tied to subscription plans, promotional credits, add-ons, and one-time top-ups. Each source produces a grant, and each grant needs to carry more than a balance.
At minimum, every grant record needs:
- Effective date and expiration date: Determines when credits become available and when they expire. Without this, you cannot enforce time-bound promotions or monthly resets correctly.
- Cost basis: The price paid per credit at the time of the grant. Promotional credits carry a zero or discounted cost basis. Paid credits carry the purchase rate. Finance needs this to recognize revenue correctly at consumption time, not grant time.
- Category: Defines which features or workflows the credits can be applied to. Some grants are unrestricted. Others are scoped to specific capabilities, such as image generation or agent actions. Without categories, you cannot enforce feature-level credit rules at runtime.
- Burn priority: Controls which grant is consumed first when a customer has multiple active balances. Promotional credits should typically deplete before paid ones. Credits expiring sooner should be consumed before those with longer windows. These rules need to live in the data model and be enforced consistently, not handled in application code.
Each grant must be recorded as an immutable ledger event. This is what gives finance the audit trail to reconcile what was issued against what was consumed, and what gives your enforcement layer the data it needs to apply the right rules on every inference request.
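As a rough illustration of the grant fields listed above, here is a minimal Python sketch. The class names (`CreditGrant`, `GrantLedger`), field names, and sample values are hypothetical and chosen for illustration only. A production system would persist these events in a database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)  # frozen: a recorded grant cannot be mutated in place
class CreditGrant:
    grant_id: str
    customer_id: str
    amount: int                     # credits issued
    effective_at: datetime          # credits become usable at this time
    expires_at: Optional[datetime]  # None means the grant never expires
    cost_basis: Decimal             # price paid per credit (0 for promotional)
    category: Optional[str]         # None means unrestricted
    burn_priority: int              # lower value is consumed first

    def is_active(self, now: datetime) -> bool:
        """True when the grant is usable at `now`."""
        if now < self.effective_at:
            return False
        return self.expires_at is None or now < self.expires_at


class GrantLedger:
    """Append-only list of grant events; there is no update or delete."""

    def __init__(self) -> None:
        self._events: list[CreditGrant] = []

    def append(self, grant: CreditGrant) -> None:
        if any(g.grant_id == grant.grant_id for g in self._events):
            raise ValueError(f"grant {grant.grant_id} already recorded")
        self._events.append(grant)

    def active_for(self, customer_id: str, now: datetime) -> list[CreditGrant]:
        return [g for g in self._events
                if g.customer_id == customer_id and g.is_active(now)]


# Hypothetical promotional grant scoped to one feature category.
ledger = GrantLedger()
ledger.append(CreditGrant(
    grant_id="g-promo-1", customer_id="cust-1", amount=500,
    effective_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
    expires_at=datetime(2025, 2, 1, tzinfo=timezone.utc),
    cost_basis=Decimal("0"), category="image_generation", burn_priority=0,
))
```

Keeping the record frozen and the ledger append-only means corrections are expressed as new events, which is what preserves the audit trail.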
For example, when Miro launched its AI-powered collaboration platform, it needed credits allocated based on plan and seat count, tracked in real time, and reset each billing cycle. Miro needed this without rebuilding its existing architecture from scratch. Using Stigg, the team configured credit allocation, runtime enforcement, and usage visibility in under 6 weeks.
Credit consumption: Where system complexity shows up
Consumption looks relatively easy. A feature is used, credits are deducted, and the balance updates. In practice, this is where most systems break.
Why this matters
If these systems fail:
- Double deductions break trust
- Missed deductions create revenue leakage
- Delayed enforcement leads to uncontrolled usage and cost spikes
These failure modes are connected. Under real load, multiple requests can hit the same credit pool simultaneously, each reading the same available balance before any debit is committed. Every request passes, the writes land afterward, and the balance drops below zero once the compute is already spent.
Idempotency, consistency, and enforcement timing are all expressions of the same concurrency problem. Treating them as separate edge cases usually leads to fragmented enforcement spread across services and middleware.
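One common way to address the read-then-write race and retry duplication together is a conditional debit inside a single transaction, keyed by an idempotency key. The sketch below uses Python's built-in `sqlite3` module purely for illustration. The table names, the `deduct` function, and the `InsufficientCredits` exception are assumptions, not part of any specific product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balances (customer_id TEXT PRIMARY KEY, credits INTEGER NOT NULL);
    CREATE TABLE debits (
        idempotency_key TEXT PRIMARY KEY,
        customer_id TEXT NOT NULL,
        amount INTEGER NOT NULL
    );
    INSERT INTO balances VALUES ('cust-1', 100);
""")


class InsufficientCredits(Exception):
    pass


def deduct(conn, customer_id, amount, idempotency_key):
    """Return 'ok', 'duplicate', or 'insufficient'."""
    try:
        with conn:  # one transaction: commit on success, roll back on exception
            try:
                conn.execute(
                    "INSERT INTO debits VALUES (?, ?, ?)",
                    (idempotency_key, customer_id, amount),
                )
            except sqlite3.IntegrityError:
                return "duplicate"  # this request was already applied
            # The balance check and the debit happen in one statement,
            # so no other writer can slip in between them.
            cur = conn.execute(
                "UPDATE balances SET credits = credits - ? "
                "WHERE customer_id = ? AND credits >= ?",
                (amount, customer_id, amount),
            )
            if cur.rowcount == 0:
                raise InsufficientCredits  # also rolls back the debit row
    except InsufficientCredits:
        return "insufficient"
    return "ok"


print(deduct(conn, "cust-1", 60, "req-1"))  # ok
print(deduct(conn, "cust-1", 60, "req-1"))  # duplicate (retry of the same request)
print(deduct(conn, "cust-1", 60, "req-2"))  # insufficient (only 40 credits left)
```

The point of the design is that the check and the write are never separate steps, so a retried or concurrent request cannot pass on a stale balance.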
Credit lifecycle management: States, rules, and edge cases
Credit lifecycles introduce more states and transitions than most teams expect.
Grants can be recurring, promotional, or one-time. Credits have effective dates, meaning they can be issued before they are available to consume. Expiration logic removes balances after a configured date. Priority rules determine which grant is consumed first. Admins also need the ability to make manual adjustments with a full audit trail.
Expiration requires careful design. Credits that disappear without notice create support tickets. Credits that never expire reduce pricing leverage and impact margin.
For example, Canva uses time-bound credits for certain AI features and trials, encouraging users to engage quickly or upgrade before credits expire. Supporting this requires systems that track expiry at the grant level and surface visibility to users before credits are removed.
Priority rules interact directly with expiration logic. A credit expiring in 3 days must be consumed before a paid credit expiring in 30, even if the paid credit was issued first. This burn order must be recalculated dynamically on every deduction as balances and expiration windows change.
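The interaction between expiration and priority can be sketched as a sort followed by a greedy drawdown. The ordering below (soonest expiry first, promotional before paid as a tie-break, non-expiring grants last) is one possible policy and an assumption of this example. The names `GrantBalance`, `burn_order`, and `plan_deduction` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class GrantBalance:
    grant_id: str
    remaining: int
    is_promotional: bool
    expires_on: Optional[date]  # None means the grant never expires


def burn_order(grants, today):
    """Usable grants sorted by the assumed burn policy."""
    usable = [g for g in grants
              if g.remaining > 0 and (g.expires_on is None or g.expires_on > today)]
    return sorted(
        usable,
        key=lambda g: (g.expires_on is None,       # expiring grants first
                       g.expires_on or date.max,   # soonest expiry first
                       not g.is_promotional),      # promo before paid on ties
    )


def plan_deduction(grants, amount, today):
    """Return [(grant_id, credits)] covering `amount`, or raise ValueError."""
    plan, needed = [], amount
    for g in burn_order(grants, today):
        if needed == 0:
            break
        take = min(g.remaining, needed)
        plan.append((g.grant_id, take))
        needed -= take
    if needed > 0:
        raise ValueError("insufficient credits")
    return plan


today = date(2025, 6, 1)
grants = [
    GrantBalance("paid-30d", 100, False, date(2025, 7, 1)),
    GrantBalance("promo-3d", 20, True, date(2025, 6, 4)),
    GrantBalance("paid-no-expiry", 50, False, None),
]
print(plan_deduction(grants, 50, today))
# [('promo-3d', 20), ('paid-30d', 30)]
```

Because the order is computed from current balances and dates on each call, it reflects whatever has expired or depleted since the last deduction.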
Real-time credit enforcement at runtime
Real-time enforcement turns credits from accounting entries into product controls. It runs on every request, before compute is consumed.
Why request-level enforcement matters for AI agents
AI agents turn a single user action into multiple concurrent credit decisions, which means enforcement has to happen at the request level while execution is still in progress.
One agent action can fan out into sub-calls, tool invocations, and model requests that all draw from the same credit pool simultaneously. Unlike a single-request flow, there is no natural serialization point.
Entry-point checks miss this. Enforcement has to follow each step in the chain, with atomic writes and reads that reflect current credit state.
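To make the fan-out case concrete, the sketch below runs eight concurrent agent steps against one pool and performs the check and the debit in a single critical section per step. The in-process lock stands in for whatever atomic primitive a real deployment would use (for example, a conditional database update). `CreditPool` and `run_step` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CreditPool:
    def __init__(self, balance):
        self._balance = balance
        self._lock = threading.Lock()

    def try_debit(self, amount):
        """Check and debit in one critical section; never goes negative."""
        with self._lock:
            if self._balance < amount:
                return False
            self._balance -= amount
            return True

    @property
    def balance(self):
        with self._lock:
            return self._balance


def run_step(pool, cost):
    # Enforcement happens per step, before any compute is spent.
    if not pool.try_debit(cost):
        return "denied"
    # ... the model call or tool invocation would run here ...
    return "ran"


pool = CreditPool(balance=10)
with ThreadPoolExecutor(max_workers=8) as executor:
    # One agent action fanning out into eight sub-calls of 3 credits each.
    results = list(executor.map(lambda cost: run_step(pool, cost), [3] * 8))

print(results.count("ran"), results.count("denied"), pool.balance)  # 3 5 1
```

An entry-point check would have seen a balance of 10 and admitted the whole action. Checking per step admits only the three sub-calls the balance can cover.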
How Stigg handles it
Stigg is the runtime layer that ties credit balances directly to entitlement enforcement, resolving access decisions before compute is consumed.
- Enforcement happens on each request
- Depletion behavior is configurable per feature
- Product behavior stays aligned with billing and contract terms
Multi-feature credit drawdown: Shared pool enforcement and rate mapping
This is where credit systems move beyond simple metering. Multiple features drawing from the same pool require real-time cost mapping and consistent balance updates across services.
In practice, most AI products follow this pattern. A single credit balance is consumed across multiple features such as text generation, image creation, and search, each with different costs.
The challenge is keeping this consistent. The system has to apply the correct rate for each feature at runtime while maintaining a single shared balance across all services.
If rates change, historical usage must still reflect the rate at the time of consumption, not the current one. This requires versioned configurations and immutable usage records.
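A minimal sketch of versioned rates, assuming a hypothetical `RateCard` per version and a `UsageEvent` that stores the version and the credits charged at consumption time. The feature names and rates are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RateCard:
    version: int
    effective_at: datetime
    credits_per_unit: dict  # feature name -> credits charged per unit


@dataclass(frozen=True)  # immutable: the charge is fixed at consumption time
class UsageEvent:
    feature: str
    units: int
    occurred_at: datetime
    rate_version: int
    credits_charged: int


def rate_card_at(cards, when):
    """Latest rate card already in effect at `when`."""
    applicable = [c for c in cards if c.effective_at <= when]
    if not applicable:
        raise LookupError("no rate card in effect")
    return max(applicable, key=lambda c: c.effective_at)


def record_usage(cards, feature, units, occurred_at):
    card = rate_card_at(cards, occurred_at)
    return UsageEvent(feature, units, occurred_at, card.version,
                      card.credits_per_unit[feature] * units)


cards = [
    RateCard(1, datetime(2025, 1, 1, tzinfo=timezone.utc), {"text": 1, "image": 5}),
    RateCard(2, datetime(2025, 3, 1, tzinfo=timezone.utc), {"text": 1, "image": 8}),
]
before = record_usage(cards, "image", 4, datetime(2025, 2, 10, tzinfo=timezone.utc))
after = record_usage(cards, "image", 4, datetime(2025, 3, 5, tzinfo=timezone.utc))
print(before.rate_version, before.credits_charged)  # 1 20
print(after.rate_version, after.credits_charged)    # 2 32
```

Storing the version and the computed charge on the event means a later rate change never rewrites history.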
Credit-based revenue recognition: Aligning usage, pricing, and finance
Paid credits sit on the balance sheet as a liability and convert to revenue as they're consumed. The core challenge is timing and traceability. When credits roll over, revenue cannot be recognized at grant time. It must be recognized when credits are consumed.
For example:
- $10,000 purchase → 8,500 credits
- Effective rate → $10,000 / 8,500 ≈ $1.18 per credit
Each usage event must reference the original grant and cost basis, so finance can recognize revenue correctly.
Edge cases add complexity:
- Rollovers → delay revenue recognition
- Promotional credits → zero or discounted cost basis
- Refunds → require ledger adjustments
What looks like a data modeling decision, such as how grants and usage events are linked, becomes a financial requirement for audit and compliance.
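The link between usage events and grant cost basis can be sketched as a simple calculation. The grant IDs, amounts, and cost bases below are hypothetical numbers chosen for easy arithmetic, and the functions are illustrative, not an accounting implementation.

```python
from decimal import Decimal

# Hypothetical grants: credits issued and price paid per credit.
grants = {
    "paid-1": {"credits": 1000, "cost_basis": Decimal("0.50")},
    "promo-1": {"credits": 200, "cost_basis": Decimal("0")},
}

# Each usage event references the grant it drew from.
usage = [
    {"grant_id": "promo-1", "credits": 200},
    {"grant_id": "paid-1", "credits": 300},
    {"grant_id": "paid-1", "credits": 100},
]


def recognized_revenue(usage, grants):
    """Revenue recognized so far: consumed credits times their grant's cost basis."""
    return sum(
        (grants[u["grant_id"]]["cost_basis"] * u["credits"] for u in usage),
        Decimal("0"),
    )


def deferred_revenue(usage, grants):
    """Remaining liability: value of issued credits not yet consumed."""
    issued = sum(
        (g["cost_basis"] * g["credits"] for g in grants.values()), Decimal("0")
    )
    return issued - recognized_revenue(usage, grants)


print(recognized_revenue(usage, grants))  # 200.00
print(deferred_revenue(usage, grants))    # 300.00
```

The promotional grant contributes nothing to either figure because its cost basis is zero, which is why the cost basis has to travel with every usage event.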
Credit system UX and observability: Making usage visible and auditable
A credit system that works but lacks visibility will still fail. Customers and internal teams need real-time insight into balances, usage, and changes.
Customer-facing requirements:
- Live balance visibility across all active credits
- Usage history broken down by feature and time
- Proactive alerts for low balance or upcoming expiration
- Self-serve top-ups without support involvement
Admin and internal tooling
Without these layers:
- Finance cannot audit revenue accurately
- Customers lose trust due to lack of visibility
- Support load increases due to unclear usage
A credit system needs backend accounting logic, observability, auditability, and customer-facing visibility to operate reliably at scale. And all of that depends on the enforcement layer underneath staying accurate under real production load.
Credits look simple until you're building infrastructure
Most teams only realize the scope of a credit system after they’ve started building it. The first version handles the happy path.
Production load surfaces the rest: idempotency across retries, burn priority across grants, expiration logic that satisfies finance, real-time enforcement, and a transparency layer for customers. Each is its own problem. Together, they turn into months of unplanned work.
Stigg is the usage runtime for AI products. It enforces entitlements, meters usage, and governs AI spend in the request path, not after the bill. Each of the following is built as a first-class primitive:
- Credit management with an append-only ledger, configurable burn order, and hard or soft depletion limits
- Real-time metering through a BYOC Sidecar, with entitlement decisions served from local cache and uncached reads fetched from Stigg’s Edge API in roughly 100 ms
- Entitlement enforcement across plans, add-ons, trials, and promotional allowances
- Packaging and plan management through config, not code, so product teams can update plans and add-ons without engineering involvement
- Pricing model changes (hybrid, usage-based, credits) supported in the same infrastructure, with engineering still owning complex transitions
- Integrations across your revenue stack, from billing and CPQ to CRM and data warehouses, so pricing, packaging, and entitlements stay aligned across systems
If credit infrastructure is eating sprint capacity, the architecture is being asked to do things it was never designed for. Stigg's credits runtime handles the ledger, enforcement, and entitlements layer as production infrastructure, so your team ships product instead of billing logic.
FAQs
1. What are AI credits in LLM products?
AI credits are a runtime primitive that tracks and enforces consumption across AI features. In LLM and compute-heavy products, they act as a shared currency drawn down across models, workflows, or agent actions at different rates.
2. What is the difference between AI credits and usage-based pricing?
The main difference between AI credits and usage-based pricing is that credits use a shared currency across features, while usage-based pricing charges directly per unit consumed.
3. What happens when a customer runs out of AI credits?
When a customer runs out of AI credits, the system applies the configured depletion rule. Usage is either blocked immediately (hard limit) or allowed to continue into a negative balance (soft limit).
4. How does credit expiration work technically?
Credit expiration removes unused balances after a configured date. The system tracks expiration per grant and burns credits that expire the soonest first.
5. Should I build or buy a credit system?
Building a credit system is reasonable at first, and many early-stage AI products start there: a balance field, a deduction function, maybe a cron job for expiration. It works until you need to support burn priority across multiple grant types, real-time enforcement under load, finance-grade audit trails, and configurable depletion behavior per feature. Each of those is a project. Together, they're an ongoing maintenance burden that grows as your customer base grows.
