You integrated with Stripe, but are you confident that it works?

Implementing a Stripe integration is not enough. We learned this the hard way, when bad code we deployed caused our system to return internal server error responses to Stripe. So we've built end-to-end testing to never fear revenue loss again.


This post's purpose is to share our journey of integrating with Stripe from a testing perspective - how our testing methodologies evolved over time, and what the motivations and considerations were along the way for implementing end-to-end (a.k.a. E2E) testing against a real Stripe account (in test mode).

The ultimate goal of those testing methods is to prevent revenue loss by making sure that the Stripe integration works as expected at all times - for example, by preventing bad code from being deployed to production.

Background

We have developed our Stripe integration from the very early stages of our product. Back then the integration was little more than a working POC. The communication with Stripe was synchronous (both incoming and outgoing). As we wanted to develop our product at a fast pace, we invested a balanced effort in writing tests to cover the integration. Those were service tests with one-time, pre-generated mock data.

Back then it was good enough for us, as all integration flows were covered. Since then our architecture has evolved, and one of the big changes was refactoring into asynchronous communication with Stripe (read more here about this change). This change exposed us to scenarios that could not be covered by our existing service tests. We had an incident where bad code was deployed to the Lambda that is invoked via the Stripe webhook and is responsible for acknowledging the message and persisting it into an SQS queue for further processing, causing internal server error (500) responses to Stripe. This led to a situation where changes from Stripe were not reflected in our system.
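For context, here is a minimal sketch of what such a webhook-acknowledging Lambda looks like. This assumes Node.js with AWS SDK v3 behind API Gateway; the handler body and the STRIPE_EVENTS_QUEUE_URL env var are illustrative, not our actual code, and signature verification is omitted for brevity:

```typescript
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

const sqs = new SQSClient({});

// Acknowledge the Stripe webhook quickly and enqueue the event for async processing.
export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  try {
    await sqs.send(
      new SendMessageCommand({
        QueueUrl: process.env.STRIPE_EVENTS_QUEUE_URL!, // hypothetical env var
        MessageBody: event.body ?? "",
      })
    );
    // A 2xx tells Stripe the event was received; processing happens later from the queue.
    return { statusCode: 200, body: "ok" };
  } catch (err) {
    // Any 5xx response makes Stripe retry the event with backoff - this is
    // exactly the failure mode we hit in the incident described above.
    console.error("Failed to enqueue Stripe event", err);
    return { statusCode: 500, body: "failed to enqueue event" };
  }
};
```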

Thanks to our monitoring system we were notified and were able to roll back the Lambda to the previous stable version and perform manual actions to replay the failed events from Stripe, so no customers were affected. During our incident post-mortem, we discussed the consequences of the incident - a potential revenue loss for our customers, since their end-customers couldn't complete new purchases via Stripe Checkout. We all agreed that we should invest additional effort to prevent similar cases in the future, and unanimously decided to enrich our end-to-end (a.k.a. E2E) tests with an additional test case that performs a real Stripe checkout, as that is a crucial flow for us.

Designing the E2E tests

We debated API testing vs. UI testing. While the former is more stable, the latter covers exactly the flow customers go through. We decided to combine the two: UI testing for the critical user flow, and API testing for additional verifications (e.g. that resources were created successfully).

Before diving into implementation, we listed the requirements for the new E2E test. The purpose of these requirements is to let us trust the E2E test while keeping ongoing maintenance effort minimal.

The requirements are:

  • It should include UI testing that covers both the Stigg and Stripe UIs.
  • It should be stable - no flakiness and no false-positive test failures.
  • It should be possible to run multiple tests concurrently.
  • It should finish in under 3 minutes.
  • It should be easy to debug failures.

In the next few sections we will discuss the considerations that shaped our E2E test framework, which is built from 3 tools:

  • RunScope - test orchestration and API verifications.
  • Ghost Inspector - UI testing.
  • Demo app - an internal application that simulates an integration with Stigg.

What scenario should be tested?

In order to keep the test execution time short and avoid test flakiness, we decided to cover a single UI scenario, with the rest of the verifications done via API calls. The covered scenario is an end-customer upgrading from a free plan to a paid plan: they are forwarded to the Stripe Checkout UI, fill in the checkout form with a credit card, and are redirected back to the application, where we verify that the plan was upgraded to the paid plan.

Although we wanted to test a production-ready Stripe account, we settled on using a test-mode Stripe account, so we could use test cards and avoid paying with a real credit card. We justify this decision with the assumption that Stripe itself is well tested, so test mode behaves like production mode.
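As a rough sketch of what running against test mode looks like (using the official stripe Node library; the price ID and URLs below are placeholders, not our actual configuration), a test-mode secret key creates a real Checkout Session, and the UI test then pays with one of Stripe's documented test cards, e.g. 4242 4242 4242 4242:

```typescript
import Stripe from "stripe";

// A test-mode secret key (sk_test_...) keeps everything in Stripe's test mode.
const stripe = new Stripe(process.env.STRIPE_TEST_SECRET_KEY!);

// Create a Checkout Session for the paid plan the E2E test upgrades to.
async function createCheckoutSession(): Promise<string> {
  const session = await stripe.checkout.sessions.create({
    mode: "subscription",
    line_items: [{ price: "price_paid_plan", quantity: 1 }], // placeholder price ID
    success_url: "https://demo-app.example.com/success", // placeholder URLs
    cancel_url: "https://demo-app.example.com/cancel",
  });
  // The UI test navigates here and pays with Stripe's test card 4242 4242 4242 4242.
  return session.url!;
}
```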

Using Ghost Inspector to define our UI testing was very easy to do. A very nice feature they have is a video recording of all the actions and verifications that happened during the UI test.

The rest of the verifications are done via API calls to Stripe - making sure that the customer and the subscription were created correctly in Stripe.
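Here's a hedged sketch of what such API verifications can look like with the stripe Node library (the lookup-by-email approach and the error handling are illustrative, not our exact checks):

```typescript
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_TEST_SECRET_KEY!);

// Verify that the checkout produced a customer with an active subscription.
async function verifyCheckoutSideEffects(customerEmail: string): Promise<void> {
  const { data: customers } = await stripe.customers.list({
    email: customerEmail,
    limit: 1,
  });
  if (customers.length === 0) {
    throw new Error(`No Stripe customer found for ${customerEmail}`);
  }

  const { data: subscriptions } = await stripe.subscriptions.list({
    customer: customers[0].id,
    status: "active",
  });
  if (subscriptions.length === 0) {
    throw new Error("Expected an active subscription after checkout");
  }
}
```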

Shared Stripe account vs. account per test execution?

At the beginning we intended to create a dedicated Stripe account for each test execution, in order to have full separation between test executions and quick cleanup. However, we faced a technical issue: creating Stripe accounts via API is possible only with Stripe Connect, which would have required adding our production Stripe Connect account's secret key to the testing framework. That is not a good practice:

  • Security hole - access to the production Stripe Connect secret key can lead to access to all the connected accounts.
  • Tests could interfere with our production Stripe Connect account, which opens the door to unstable tests.

Due to the above concerns, we settled on a single, manually created Stripe account in test mode, to which all the Stigg environments connect. Since our product architecture already handles multiple Stigg environments connecting to the same Stripe account, supporting this didn't require additional development effort 💪.

When should E2E tests run?

Since we designed our E2E test to finish quickly and run in parallel, we were free to decide when to run it without any constraints. We decided on 2 automatic triggers:

  • Upon every new deployment - this makes sure that newly deployed code passes the tests.
  • Scheduled every 6 hours - this makes sure that factors besides code work well (e.g. connectivity issues).
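For example, the deployment trigger can be a small post-deploy step that starts a test run via the framework's trigger URL and fails the pipeline if the run can't be started. This sketch assumes a Runscope-style trigger URL stored as a pipeline secret; the env var name is hypothetical:

```typescript
// Post-deploy step: kick off the E2E test run, fail the pipeline if it can't start.
async function triggerE2ETest(): Promise<void> {
  // Runscope exposes a per-test "trigger URL"; we keep it as a pipeline secret.
  const response = await fetch(process.env.E2E_TRIGGER_URL!, { method: "POST" });
  if (!response.ok) {
    throw new Error(`Failed to trigger E2E test: HTTP ${response.status}`);
  }
}

triggerE2ETest().catch((err) => {
  console.error(err);
  process.exit(1);
});
```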

Quick debugging of failed tests

In order to be able to quickly debug failed tests, we follow a few rules of thumb:

1. Environments of failed tests are not deleted at the end of the test execution; they are kept for another week before being removed.

2. Upon test failure, a notification is sent to a dedicated Slack channel with all the relevant information that can assist with debugging (see the sketch after this list):

  • General information about the test.
  • A mention of the developer whose commit was last deployed.
  • Links to all the tools / environments for quick access.
  • Screenshot of the last failed UI action.
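A minimal sketch of such a notification, assuming a standard Slack incoming webhook (the fields and env var name are hypothetical):

```typescript
// Post a failure summary to the dedicated Slack channel via an incoming webhook.
interface FailedTestInfo {
  testName: string;
  lastDeployer: string;
  environmentUrl: string;
  screenshotUrl: string;
}

async function notifyTestFailure(info: FailedTestInfo): Promise<void> {
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text:
        `:rotating_light: E2E test failed: ${info.testName}\n` +
        `Last deploy by: ${info.lastDeployer}\n` +
        `Environment: ${info.environmentUrl}\n` +
        `Last UI screenshot: ${info.screenshotUrl}`,
    }),
  });
}
```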

Once we receive a notification of a failed test, we look at it with the highest priority. If it mentions the last deployer, that developer investigates the failure; if it was a scheduled test, the on-call developers investigate.

Tests cleanup

It’s easy to forget to implement test cleanup as it’s not in the main test flow. Tests can run for a few months without cleanup, and everything will work well, until things start to break due to leftovers from older test executions.

Since we used a shared Stripe account, we paid extra attention to this area to avoid such cases. At the end of every successful test execution, we delete the customer and the subscription that were created in Stripe, together with the Stigg environment that was created for the test execution.
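On the Stripe side, the cleanup boils down to two API calls. A sketch assuming a recent version of the stripe Node library (where subscriptions are canceled via subscriptions.cancel); the IDs would come from the test run:

```typescript
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_TEST_SECRET_KEY!);

// Remove the resources a successful test run created in the shared test-mode account.
async function cleanupStripeResources(
  customerId: string,
  subscriptionId: string
): Promise<void> {
  // Cancel the subscription first, then delete the customer.
  await stripe.subscriptions.cancel(subscriptionId);
  await stripe.customers.del(customerId);
}
```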

For failed tests, resources are not immediately deleted, since we want to be able to connect and debug the failure. Those resources are deleted after 1 week, which is more than enough time to investigate test failures.

Conclusions

Implementing a Stripe integration is not enough for the long run. Every time new code is deployed, critical flows may be affected. Implementing E2E tests covering the Stripe integration was crucial for us to deploy new code without fear of interrupting critical flows. On top of that, it empowered our DevOps team to make infrastructure changes independently, since they can simply run the E2E test and verify that it passes.