Automated testing and design systems


Testing is often an underappreciated discipline in product development. It's easy to overlook because it takes many forms and doesn’t produce glamorous output like a design or a coded layout. Validating what’s been built might rest with developers, test engineers, or manual QAs. How we validate can also vary widely, from simple checks that given inputs produce the correct output (and that incorrect inputs are handled) to a full suite of test criteria.

There’s often an inherent disconnect. If a feature has a Business Analyst or similar contributor, criteria may be drawn up before there’s a design, and these guide the feature through the product development process. Still, most often, testing is done too close to the end of the process. Teams that practice Test Driven Development or do a lot of pairing are in a better place: writing tests first moves some of that effort to the beginning of the process, and working collaboratively gives more confidence in the code (so there is often less need, or no need, for PRs if you work that way). How and what we test, when, and by whom can vary massively, and yet it all stems from the definition of the feature at the beginning: what are we making, for whom, why, and to do what? Digging into a feature, we might find components with several states, such as what happens when a user encounters an error or how we show loading to a user. Each state, and how we get there, might need testing.

In a world of design systems, what kinds of automated testing are most appropriate?


Types of testing

There are a number of ways to test our code and output to ensure what we’re building does what it’s designed to do. Let’s go through the main categories of testing and how they might relate to the product you’re building; specifically, we’ll focus on the UI. Where you focus your efforts depends on the value you place on each category, and there are many schools of thought!

Unit tests

The goal of unit testing is to check that, given an input, we receive the expected output. It focuses on very small units of our codebase: individual functions or components. Let’s take the example of a button. It’s a fundamental part of most digital products, and you probably have a swath of variations in your design system.




A unit test would focus on fundamentals:

  • Does the button render?
  • Given label text, does it display?
  • When clicking the button, does it call some action?

Unit tests are usually the most common type of test because they are small and isolated, and so easier to think about and write. Like a button in your design system, when done right, unit tests can be around for a long time. Even visual changes won't impact them, since they generally test how something should work rather than how it looks.
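
As a minimal sketch of the three checks above, assume a hypothetical `createButton` function standing in for a design system button. The function, its props, and the small `expectTrue` helper are all illustrative assumptions; a real component would come from your UI framework and be exercised with your test runner of choice.

```typescript
// Hypothetical stand-in for a design system button; not a real library API.
interface ButtonProps {
  label: string;
  onClick: () => void;
}

interface RenderedButton {
  html: string; // markup the button would produce
  click: () => void; // simulates a user click
}

function createButton(props: ButtonProps): RenderedButton {
  return {
    html: `<button type="button">${props.label}</button>`,
    click: () => props.onClick(),
  };
}

// Tiny assertion helper so the sketch needs no test framework.
function expectTrue(condition: boolean, message: string): void {
  if (!condition) throw new Error(`Failed: ${message}`);
}

// Runs the three unit checks and returns how often the handler was called.
function runButtonUnitTests(): number {
  let clicks = 0;
  const button = createButton({
    label: "Save",
    onClick: () => {
      clicks += 1;
    },
  });

  // Does the button render?
  expectTrue(button.html.startsWith("<button"), "button renders");
  // Given label text, does it display?
  expectTrue(button.html.includes("Save"), "label is displayed");
  // When clicking the button, does it call some action?
  button.click();
  expectTrue(clicks === 1, "click handler is called once");

  return clicks;
}
```

None of these checks mention color, spacing, or typography, which is why a restyle of the button would leave them passing.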

Integration tests

Where unit tests focus on individual parts of a system, the aim of integration tests is to ensure the different components work together.

Let’s take a look at our UI again. Instead of a single button in isolation, let’s look at something composed of multiple components, at the ‘molecule’ or ‘organism’ level if you’re thinking from an atomic design perspective. We’ll look at a search component.




An integration test would focus on how the input and button components work together:

  • If the input is empty, and the user clicks the “Search” button, what happens?
  • When search results are loading, does the spinner show?
  • If there are no results from the search, does the dropdown show the correct label?

Integration tests are focused on the interplay between the smaller components we’ve already tested with unit tests. We’re now testing higher-level interactions the user performs and how they impact the UI. These tests are also very common. They are a little more fragile than unit tests, but they offer a lot of benefit by ensuring the building blocks of your UI work together correctly.
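
The search scenarios above could be sketched roughly as follows. `SearchComponent`, its state names, and the "No results found" label are hypothetical, and the search backend is replaced by a fake function so the example stays self-contained.

```typescript
// Hypothetical search component; state names and messages are assumptions.
type SearchState =
  | { status: "idle" }
  | { status: "loading" } // the spinner is shown
  | { status: "results"; items: string[] }
  | { status: "empty"; label: string }; // the dropdown shows a "no results" label

class SearchComponent {
  query = "";
  state: SearchState = { status: "idle" };
  private readonly search: (query: string) => Promise<string[]>;

  constructor(search: (query: string) => Promise<string[]>) {
    this.search = search;
  }

  // Input component: the user types a query.
  type(text: string): void {
    this.query = text;
  }

  // Button component: the user clicks "Search".
  async clickSearch(): Promise<void> {
    if (this.query.trim() === "") return; // empty input: nothing happens
    this.state = { status: "loading" };
    const items = await this.search(this.query);
    this.state =
      items.length > 0
        ? { status: "results", items }
        : { status: "empty", label: "No results found" };
  }
}

// Walks through the scenarios and records the state after each step.
async function runSearchIntegrationTest(): Promise<string[]> {
  const seen: string[] = [];
  const component = new SearchComponent(async () => []); // fake backend with no results

  await component.clickSearch(); // input is empty
  seen.push(component.state.status);

  component.type("buttons");
  const pending = component.clickSearch();
  seen.push(component.state.status); // checked while the search is in flight
  await pending;
  seen.push(component.state.status);

  return seen;
}
```

With the fake backend returning no results, the recorded states are "idle", "loading", and "empty", matching the three scenarios in order.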

Snapshot tests

In effect, these tests take a snapshot of the generated UI code and compare it before and after changes. The goal is to catch unintended changes in code. The integration tests we’ve written are very explicit and could be written before creating the component, in a test-driven development style, or afterward. Snapshot testing is different; the snapshot is taken when the component already exists.

If we look at our search input from the previous example, let’s say we’re building a UI on the web. A snapshot test could capture the HTML output of the component. A few months later, we decide to update the input component to be more accessible and include a label rather than just a placeholder. When we run the test, it snapshots the newly generated HTML and compares it to the stored snapshot. If they differ, it’s up to the developer to decide whether the change is intentional or something has gone awry.

Your mileage might vary with this method. It’s best suited to stable products and used in situations where you’re trying to keep things the same and avoid regressions. It adds an overhead of frequently reviewing snapshots.
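
To make the mechanism concrete, here is a minimal sketch of a snapshot check. The two render functions and the in-memory store are assumptions made for illustration; snapshot tools typically persist snapshots to files alongside the tests rather than keeping them in memory.

```typescript
// Hypothetical renderers for the search input, before and after the accessibility change.
function renderSearchInputV1(): string {
  return '<input type="search" placeholder="Search">';
}

function renderSearchInputV2(): string {
  return '<label for="q">Search</label><input id="q" type="search">';
}

type SnapshotResult = "written" | "match" | "mismatch";

// In-memory snapshot store, used here only to keep the sketch self-contained.
const snapshots = new Map<string, string>();

function checkSnapshot(name: string, html: string): SnapshotResult {
  const stored = snapshots.get(name);
  if (stored === undefined) {
    snapshots.set(name, html); // first run: record the snapshot
    return "written";
  }
  // Later runs: any difference in the markup is flagged for human review.
  return stored === html ? "match" : "mismatch";
}
```

Calling `checkSnapshot("search-input", renderSearchInputV1())` twice yields "written" and then "match"; passing `renderSearchInputV2()` afterwards yields "mismatch", which is the point where a developer decides whether to accept the new markup.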

Visual regression

The previous testing categories (unit tests, integration tests, and snapshot tests) are designed to run quickly. For a web UI, they achieve that speed by not rendering in a full browser; everything is done in a simulated way. Visual regression tests (VRT) are a bit different. They rely on an actual browser to take actual screenshots of the current browser view and compare them with previous ones to figure out what’s changed and by how much. This could be performed on individual components or complete pages; it’s up to you.




The benefit of this type of testing is that it can catch changes that snapshot testing doesn’t. Where a snapshot test captures the HTML output, a VRT is a better representation of what a user sees. If a color changes, a border gets thicker, a component grows, or we change the alignment in CSS, a visual regression test will let us know. We can also use them to catch disparities between different browsers on the web or between mobile operating systems.
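
The "what’s changed and by how much" part of a visual regression test can be sketched as a pixel comparison. This is a simplified illustration rather than how any particular tool works: `diffRatio` and its `tolerance` parameter are hypothetical, and capturing the screenshots in a real browser is out of scope here.

```typescript
// Fraction of pixels that differ between two same-sized RGBA screenshots.
function diffRatio(
  before: Uint8ClampedArray,
  after: Uint8ClampedArray,
  tolerance = 0,
): number {
  if (before.length !== after.length || before.length % 4 !== 0) {
    throw new Error("Screenshots must be RGBA buffers of the same size");
  }
  const pixelCount = before.length / 4;
  if (pixelCount === 0) return 0;

  let changed = 0;
  for (let i = 0; i < before.length; i += 4) {
    // A pixel counts as changed if any channel differs by more than the tolerance.
    let differs = false;
    for (let c = 0; c < 4; c++) {
      if (Math.abs(before[i + c] - after[i + c]) > tolerance) differs = true;
    }
    if (differs) changed++;
  }
  return changed / pixelCount;
}

// Two-pixel "screenshots": the second pixel changes from black to blue.
const baseline = new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 0, 255]);
const current = new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 255, 255]);
const changedFraction = diffRatio(baseline, current); // 0.5
```

A team would then choose a threshold for `changedFraction` above which the test fails and someone reviews the difference.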

Behavioral or end-to-end (E2E) tests

Using a headless web browser, E2E tests attempt to replicate user behavior through different scenarios to validate that a user journey would achieve an expected result. This is where everything comes together. 

Looking at our product, we could cover user journeys like this:

  • When a user signs up and goes through the onboarding process, they are able to complete the process and verify their account
  • When a user creates a styleguide for the first time, they see a sample styleguide, and clicking on the main CTA takes them to the design uploads page

This type of test is the most powerful because it can cover entire user journeys as a user would see them and can be run using different browsers or operating systems. The downside is the cost. These tests are the most time-consuming to write, as they touch every part of the product you're building. They also take a long time to run because they have to spin up a web browser (maybe more than one) or simulated devices and run through scenarios as a user would. For complex products, tests like this, once in place, can save a lot of time versus manually going through each flow in your product to check for regressions. For less established products, it might be overkill in the early stages to focus on this category and attempt to cover the entire product; instead, it’s often better to pick the core scenarios you care about and test those first.

You may well have some or all of these categories of tests providing coverage at the same time, each checking different aspects of the site or app. It’s important to find out which tests work best for your product and your team. This changes over time. In the early stages of a product, you may find that snapshot tests don’t offer much value, but the behavioral and visual regression suites do, as they’re closer to what the user experiences.


What and how to test

Every feature will have one or more desired outcomes, which we’d call ‘the happy path.’ They’re pretty easy to validate: given the right input, do we get the right output? What often gets overlooked is all the ways that things could go wrong. Collating a wide range of scenarios gives you a suite of tests that can be run to ensure the feature works when it should and fails in the right ways when it should. You might have some criteria like:

This component accepts three props/arguments and should render with the right content but:

  • What if fewer arguments are passed to the component?
  • What if the args that are passed in are of the wrong type (expected a number and got a string, for example)?
  • What if nothing is passed to the component?
  • What if too much or not enough information is in any arguments passed to the component?
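
One way to capture questions like these is a table of scenarios run against the component's input handling. The sketch below assumes a hypothetical card component with three props; `CardProps`, `validateCardProps`, the 40-character title limit, and the truncation behavior are all illustrative choices, not a prescribed design.

```typescript
// Hypothetical component taking three props; names and limits are assumptions.
interface CardProps {
  title: string;
  description: string;
  count: number;
}

const MAX_TITLE_LENGTH = 40;

// Accepts untrusted input and returns either safe props or a list of problems.
function validateCardProps(input: unknown): { props?: CardProps; errors: string[] } {
  if (typeof input !== "object" || input === null) {
    return { errors: ["no props passed"] };
  }
  const raw = input as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof raw.title !== "string" || raw.title === "") errors.push("title missing or not a string");
  if (typeof raw.description !== "string") errors.push("description missing or not a string");
  if (typeof raw.count !== "number" || Number.isNaN(raw.count)) errors.push("count missing or not a number");
  if (errors.length > 0) return { errors };

  const title = raw.title as string;
  return {
    props: {
      // Fail gracefully on very long text by truncating rather than breaking the layout.
      title: title.length > MAX_TITLE_LENGTH ? title.slice(0, MAX_TITLE_LENGTH - 3) + "..." : title,
      description: raw.description as string,
      count: raw.count as number,
    },
    errors,
  };
}

// Scenario table: the happy path plus the ways things could go wrong.
const scenarios: Array<{ name: string; input: unknown; expectedErrors: number }> = [
  { name: "happy path", input: { title: "Hi", description: "Text", count: 3 }, expectedErrors: 0 },
  { name: "fewer arguments", input: { title: "Hi" }, expectedErrors: 2 },
  { name: "wrong type", input: { title: "Hi", description: "Text", count: "3" }, expectedErrors: 1 },
  { name: "nothing passed", input: undefined, expectedErrors: 1 },
  { name: "too much text", input: { title: "x".repeat(200), description: "Text", count: 3 }, expectedErrors: 0 },
];

// Returns the names of scenarios whose outcome differs from what was expected.
function failingScenarios(): string[] {
  return scenarios
    .filter((s) => validateCardProps(s.input).errors.length !== s.expectedErrors)
    .map((s) => s.name);
}
```

Because the scenarios are plain data, the same table could in principle feed design mockups as well as tests.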

These scenarios aren’t purely in the realm of development. As a designer, how many of these might be considered? How might a component fail gracefully? What if a really long string of text is passed to it? Here, there’s scope to bring design and development closer by using the same test data.

What does automated testing bring to design systems?

One of the biggest things that a design system can bring to the consumers of the system is confidence.

As a designer or developer, you want to know that you can rely on the components or properties that you use to solve your problem. This is where automated testing comes into play. Knowing that the components work and render as intended means that you can focus on pulling the right components in to solve your problem. And, when things break, you can be sure that it's not the fault of the individual components themselves.

Finally, testing is a great starting point for aligning the team's expectations. Thinking of test scenarios together (e.g., a 3 Amigos style session) can help with clarity on so-called happy and sad paths. This is also a good opportunity to see how you can use the same test data across design and development, a topic we'll dive into in more detail in another article!


Written collaboratively with colleagues Seth Corker and Chee Diep
