August 15, 2022

Card Sorting vs Tree Testing: what's the best?

A great information architecture (IA) is essential for a great user experience (UX). And testing your website or app’s information architecture is necessary to get it right.

Card sorting and tree testing are the very best UX research methods for exactly this. But the big question is always: which one should you use, and when? Very possibly you need both. Let’s find out with this quick summary.

What are card sorting and tree testing? 🧐

Card sorting is used to test the information architecture of a website or app. Participants group individual labels (cards) into different categories according to criteria that make the most sense to them. Each label represents an item that needs to be categorized. The results provide deep insights to guide the decisions needed to create intuitive navigation, comprehensive labeling and content that is organized in a user-friendly way.

Tree testing is also used to test the information architecture of a website or app. In a tree test, participants are presented with a site structure and a set of tasks they need to complete. The goal for participants is to find their way through the site and complete each task. The test shows whether the structure of your website corresponds to what users expect and how easily (or not) they can navigate and complete their tasks.

What are the differences? 🂱 👉🌴

Card sorting is a UX research method which helps to gather insights about your content categorization. It focuses on creating an information architecture that responds intuitively to users’ expectations: which items go best together, the best options for labeling, and what categories users expect to find on each menu.

Doing a simple card sort can give you all those pieces of information and so much more. You start to understand your users’ thoughts and expectations, and you gather enough insights to develop several information architecture options.

Tree testing is a UX research method that is almost a card sort in reverse. It is used to evaluate an existing information architecture and simply allows you to see what works and what doesn’t.

Using tree testing will provide insights into whether your information architecture is intuitive to navigate, whether the labels are easy to follow and, ultimately, whether your items are categorized in a place that makes sense. It will also show where your users get lost, and how.

What method should you use? 🤷

If you’ve got this far, fine-tuning your information architecture should be a priority. An intuitive IA is an integral component of a user-friendly product: one that is usable and offers an experience users will come back for.

If you are still wondering which method you should use, tree testing or card sorting, the answer is pretty simple: use both.

Just like many great things, these methods work best together. They complement each other, giving you much deeper insights and a more rounded view of how your IA performs and where to make improvements than either method would on its own. We cover more reasons why card sorting loves tree testing in our article which dives deeper into why to use both.

Ok, I'm using both, but which comes first? 🐓🥚

Wanting full, rounded insights into your information architecture is great. And we know that tree testing and card sorting work well together. But is there an order you should do the testing in? It really depends on the particular context of your research - what you’re trying to achieve and your situation. 

Tree testing is a great tool to use when you have a product that is already up and running. By running a tree test first you can quickly establish where there may be issues or snags: places where users get caught and need help. From there you can try to solve potential issues by moving on to a card sort.

Card sorting is a super useful method that can be used at any stage of the design process, from planning to development and beyond, as long as there is an IA structure that can be tested again. Testing against an already existing website navigation can be informative, and testing a reorganization of items (new or existing) can ensure the organization aligns with what users expect.

However, when you decide to implement both of the methods in your research, where possible, tree testing should come before card sorting. If you want a little more on the issue have a read of our article here.

Check out our OptimalSort and Treejack tools. Wherever you might be in the process, we can help you with your research and the best way forward.

Author: Optimal Workshop

Related articles

"Could I A/B test two content structures with tree testing?!"

"Dear Optimal Workshop,
I have two huge content structures I would like to A/B test. Do you think Treejack would be appropriate?"
— Mike

Hi Mike (and excellent question)!

Firstly, yes, Treejack is great for testing more than one content structure. It’s easy to run two separate Treejack studies — even more than two. It’ll help you decide which structure you and your team should run with, and it won’t take you long to set them up.

When you’re creating the two tree tests with your two different content structures, include the same tasks in both tests. Using the same tasks will give an accurate measure of which structure performs best. I’ve done it before and I found that the visual presentation of the results — especially the detailed path analysis pietrees — made it really easy to compare Test A with Test B.

Plus (and this is a big plus), if you need to convince stakeholders or teammates of which structure is the most effective, you can’t go past quantitative data, especially when it’s presented clearly. It’s hard to argue with hard evidence!

Here are two examples of the kinds of results visualizations you could compare in your A/B test. First, the pietree, which shows correct and incorrect paths, and where people ended up:

[Image: Treejack pietree]

And the overall Task result, which breaks down success and directness scores, and has plenty of information worth comparing between two tests:

[Image: Treejack task result]

Keep in mind that running an A/B tree test will affect how you recruit participants — it may not be the best idea to have the same participants complete both tests in one go. But it’s an easy fix — you could either recruit two different groups from the same demographic, or test one group and have a gap (of at least a day) between the two tests.

I’ve one more quick question: why are your two content structures ‘huge’?

I understand that sometimes these things are unavoidable — you potentially work for a government organization, or a university, and you have to include all of the things. But if not, and if you haven’t already, you could run an open card sort to come up with another structure to test (think of it as an A/B/C test!), and to confirm that the categories you’re proposing work for people.

You could even run a closed card sort to establish which content matters most to people (your categories could go from ‘Very important’ to ‘Unimportant’, or ‘Use every day’ to ‘Never use’, for example). You might be able to make your content structure a bit smaller and still keep its usefulness. Just a thought... Of course, you could also try to get this information from your analytics (if available), but be cautious: analytics can only tell you what people did, not what they wanted to do.

All the best Mike!

First Click Testing Data: Correct First Clicks Lead to 3X Higher Task Success

In 2009, Bob Bailey and Cari Wolfson published findings that changed how we approach first click testing and usability testing. They analyzed 12 scenario-based user tests and found that if someone gets their first click right, they're about twice as likely to complete their task successfully. This finding was so compelling that we built First Click Testing (formerly Chalkmark) specifically to help teams test this. But we'd never actually validated their research using our own data, until now.

Turns out, we're sitting on one of the world's largest databases of tree testing results. So we analyzed millions of task responses to see if the "first click predicts success" hypothesis holds up.

It does. Convincingly.

Users who get their first click correct are nearly three times more likely to complete their task successfully (70% vs 24% success rate).

Here's how we validated the original study, what our data shows, and why first clicks matter more than you might think.

Original first click testing study: 87% task success rate

Bob and Cari analyzed data from twelve usability studies on websites and products with varying amounts and types of content, a range of subject matter complexity, and distinct user interfaces. They found that people were about twice as likely to complete a task successfully if they got their first click right, than if they got it wrong:

  • If the first click was correct, the chance of getting the entire scenario correct was 87%.
  • If the first click was incorrect, the chance of eventually getting the scenario correct was only 46%.

Our Tree Testing data: First clicks predict 70% task success rate

We analyzed millions of tree testing responses in our database. We've found that people who get the first click correct are almost three times as likely to complete a task successfully:

  • If the first click was correct, the chance of getting the entire scenario correct was 70%.
  • If the first click was incorrect, the chance of eventually getting the scenario correct was 24%.

To give you another perspective on the same data, here's the inverse:

  • If the first click was correct, the chance of getting the entire scenario incorrect was 30%.
  • If the first click was incorrect, the chance of getting the whole scenario incorrect was 76%.
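To make the arithmetic concrete, here's a minimal Python sketch of how rates like these can be worked out from raw task responses. The function name and the sample data are made up for illustration (the sample is simply sized to reproduce the 70% and 24% figures above), and this isn't a description of how our tools calculate results internally.

```python
from collections import Counter


def first_click_success_rates(responses):
    """Return the task success rate for correct and incorrect first clicks.

    `responses` is an iterable of (first_click_correct, task_success) pairs
    of booleans, one pair per completed task. This shape is an assumption
    made for the example.
    """
    totals = Counter()
    successes = Counter()
    for first_click_correct, task_success in responses:
        totals[first_click_correct] += 1
        if task_success:
            successes[first_click_correct] += 1
    # Success rate within each group (correct vs incorrect first click).
    return {group: successes[group] / totals[group] for group in totals}


# Hypothetical sample: 10 responses with a correct first click (7 successes)
# and 25 responses with an incorrect first click (6 successes).
sample = (
    [(True, True)] * 7
    + [(True, False)] * 3
    + [(False, True)] * 6
    + [(False, False)] * 19
)

rates = first_click_success_rates(sample)
ratio = rates[True] / rates[False]

print(f"Correct first click:   {rates[True]:.0%} success")
print(f"Incorrect first click: {rates[False]:.0%} success")
print(f"Ratio: {ratio:.1f}x")
```

The "nearly three times" figure is just the ratio of the two success rates: 70% divided by 24% is roughly 2.9.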

How Tree Testing measures first click success and task completion

Bob and Cari proved the usefulness of the methodology by linking two key metrics in scenario-based usability studies: first clicks and task success. First Click Testing doesn't measure task success — it's up to the researcher to determine as they're setting up the study what constitutes 'success', and then to interpret the results accordingly. Tree Testing (formerly Treejack) does measure task success — and first clicks.

In a tree test, participants are asked to complete a task by clicking through a text-only version of a website hierarchy, and then clicking 'I'd find it here' when they've chosen an answer. Each task in a tree test has a pre-determined correct answer, as was the case in Bob and Cari's usability studies, and every click is recorded, so we can see participant paths in detail.

Thus, every single time a person completes an individual tree testing task, we record both their first click and whether they are successful or not. When we came to test the 'correct first click leads to task success' hypothesis, we could therefore mine data from millions of tasks.
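As a rough illustration of what can be read from a single recorded path, here's a small Python sketch. The data shape (each click stored as a tuple of labels from the top of the tree down) and the function name `summarize_task` are assumptions for this example, not our actual data format. It also flags whether the participant went straight to their answer ("direct") or moved back up the tree along the way.

```python
def summarize_task(clicks, correct_answer):
    """Summarize one participant's attempt at one tree test task.

    clicks: the nodes visited, in order. Each node is a tuple of labels
        from the top level down, e.g. ("Support", "Returns").
    correct_answer: the node tuple of the pre-determined correct answer.
    Returns None for a skipped task (no clicks recorded).
    """
    if not clicks:
        return None
    # The first click is correct if it is the top-level label that the
    # correct answer sits under.
    first_click_correct = clicks[0] == correct_answer[:1]
    # The task succeeds if the final node chosen is the correct answer.
    success = clicks[-1] == correct_answer
    # The path is direct if every click goes one level deeper from the
    # previous click, i.e. the participant never moved back up the tree.
    direct = all(
        later[:-1] == earlier
        for earlier, later in zip(clicks, clicks[1:])
    )
    return {
        "first_click_correct": first_click_correct,
        "success": success,
        "direct": direct,
    }


# Hypothetical tree and paths, for illustration only.
correct = ("Support", "Returns")
direct_success = summarize_task(
    [("Support",), ("Support", "Returns")], correct
)
indirect_success = summarize_task(
    [("Products",), ("Support",), ("Support", "Returns")], correct
)
direct_fail = summarize_task(
    [("Products",), ("Products", "Warranty")], correct
)
```

Collecting one summary like this per participant per task is all you need to compare success rates for correct and incorrect first clicks.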

To illustrate this, have a look at the results for one task. In the overall Task result, you see a score for success and directness, and a breakdown of whether each Success, Fail, or Skip was direct (they went straight to an answer), or indirect (they went back up the tree before they selected an answer):

Tree testing task results showing success and directness scores

In the pie tree for the same task, you can look in more detail at how many people went the wrong way from a label (each label representing one page of your website):

Pie tree visualization showing first click paths in tree testing

In the First Click tab, you get a percentage breakdown of which label people clicked first to complete the task:

First click data breakdown by label in tree testing

And in the Paths tab, you can view individual participant paths in detail (including first clicks), and can filter the table by direct and indirect success, fails, and skips (this table is only displaying direct success and direct fail paths):

Participant path analysis showing direct success and fail rates

How to run first click tests: Best practices for usability testing

First click analysis is one of the most predictive metrics in usability testing. Whether you're testing wireframes, landing pages, or information architecture, measuring first click success gives you early insight into whether your design will work.

This analysis reinforces something we already knew: first clicks matter. It is worth your time to get that first impression right. You have plenty of options for measuring the link between first clicks and task success in your scenario-based usability tests. From simply noting where your participants go during observations, to gathering quantitative first click data via online tools, you'll win either way. And if you want quantitative first click data, Optimal has you covered. First Click Testing works for wireframes and landing pages, while Tree Testing validates your information architecture.

To finish, here are a few invaluable insights from other researchers on getting the most from first click testing:

About this study

This analysis was conducted in 2015 using millions of task responses from Optimal’s First Click and Tree Testing tools. While the dataset predates recent UI trends, the underlying behavioral principle, that a correct first click strongly predicts task success, remains consistent with modern usability research.

The powerful analysis features in our card sorting tool

You’ve just finished running your card sort. The study has closed and the data is waiting to be analyzed. It’s time to take a look at the analysis side of card sorting, specifically in our tool OptimalSort. Let’s get started.

A note on analysis 📌

When it comes to analysis, there are essentially two types. There’s exploratory analysis (when you look through data to get impressions, pull out useful ideas and be creative) and statistical analysis (which really just comes down to the numbers). These two types of analysis also go by qualitative and quantitative, respectively.

You’re able to get fantastic insights from both forms.

“Remember that you are the one who is doing the thinking, not the technique… you are the one who puts it all together into a great solution. Follow your instincts, take some risks, and try new approaches.” Donna Spencer, Maadmob.

Getting started with analysis 🏁

Whenever you wrap up a study using our card sorting tool, you’ll want to kick off your analysis by heading to the Results Overview section. It’s here that you’ll be able to see how many people actually took part in the study, the average time taken and general statistics about the study itself.

This is useful data to include in presentations to interested stakeholders, just to give them a more holistic view of your research.

Digging into your participant data ⛏

With the Results Overview section out of the way, you can make your way over to the Participants Table. This is where you can find information about the individual people who took part in your card sort. You can also start to filter your data here.

Here are just a few of the different actions that you can take:

  • Review your participants, and include or exclude certain individuals based on their card sorts. This is a useful tool if you want to use your data in different ways.
  • Segment and reload your results. This function can allow you to view data from individuals or groups of your choosing.
  • Add additional card sorts. If you also decided to run manual (in-person) card sorts using printed cards, you can add this data here.

Analyzing open and hybrid card sort data 🕵️

The Categories tab is the best place to go for open and hybrid card sort results. Take some time to scan the categories people came up with and you’ll be able to quickly build up a good understanding of their ‘mental models’, or how they perceived the theme of your cards.

Consider how different the categories might look for cards containing food items, for example. Some participants might create categories reflecting supermarket aisles, while others might create categories reflecting food groups.

A good place to get started here is by refining your data. Standardize any categories that have similar labels (whether the difference is in wording, spelling or capitalization). Hybrid card sorts have some set categories, and these will already be standardized.

Note: Before you start throwing categories with similar labels together, take a closer look to see if people had the same conceptual approach. Here’s an example from our card sorting 101 guide:

Of the 15 groups with the word ‘Animal’ in the label, 13 had a similar set of cards, but two participants had labeled their categories slightly differently (‘Animals and Environment’ and ‘Animals and Nature’) and had thus included extra cards the others didn’t have (‘Glaciers melting faster than previously thought’, for example).
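If you ever work with exported category labels outside the tool, a first pass at spotting labels that differ only in capitalization or spacing could look something like the Python sketch below. This is a hypothetical helper, not a feature of OptimalSort, and it deliberately doesn't try to merge spelling variants or different wording. As the note above says, those need a human look at the cards inside each category.

```python
from collections import defaultdict


def group_similar_labels(labels):
    """Group category labels that differ only in case or extra spaces.

    Returns a dict mapping a normalized key to the original labels that
    share it. Labels with different wording stay in separate groups so
    you can review them by hand.
    """
    groups = defaultdict(list)
    for label in labels:
        # Lowercase the label and collapse runs of whitespace.
        key = " ".join(label.casefold().split())
        groups[key].append(label)
    return dict(groups)


# Made-up labels for illustration.
labels = ["Animals", "animals ", "ANIMALS", "Animals and Nature"]
grouped = group_similar_labels(labels)

for key, originals in grouped.items():
    print(key, "<-", originals)
```

Here the three ‘Animals’ variants end up together, while ‘Animals and Nature’ stays separate for you to check.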

Reviewing the Similarity Matrix 🤔

One really useful tool for understanding how your participants think is the Similarity Matrix. This view shows you the percentage of people who grouped any two cards together.

The most closely related pairings are clustered along the right edge. Higher agreement between participants on which cards go together equates to darker and larger clusters.

There are a few different ways to use the insights from the Similarity Matrix:

  • Put together a draft website structure based on the clusters you see on the right.
  • Identify which card pairings are most common (and as a result should probably go together on your website).
  • Identify which card pairings are least common so you don’t need to waste time considering how they might work on your website.
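To show the idea behind those percentages, here's a minimal Python sketch that counts how often each pair of cards lands in the same group. The data shape (one list of groups per participant, each group a set of card names) and the function name are assumptions for the example, and it assumes each card appears in only one group per participant. It illustrates the calculation only, not the clustering or layout of the matrix itself.

```python
from collections import Counter
from itertools import combinations


def similarity_percentages(sorts):
    """Percentage of participants who grouped each pair of cards together.

    sorts: one entry per participant; each entry is a list of groups,
        and each group is a set of card names.
    Returns {(card_a, card_b): percentage}, with each pair in
    alphabetical order. Pairs nobody grouped together are omitted.
    """
    pair_counts = Counter()
    for groups in sorts:
        for group in groups:
            # Every pair of cards inside one group counts once
            # for this participant.
            for pair in combinations(sorted(group), 2):
                pair_counts[pair] += 1
    participants = len(sorts)
    return {
        pair: 100 * count / participants
        for pair, count in pair_counts.items()
    }


# Four hypothetical participants sorting three food cards.
sorts = [
    [{"Apples", "Bananas"}, {"Milk"}],
    [{"Apples", "Bananas", "Milk"}],
    [{"Apples"}, {"Bananas", "Milk"}],
    [{"Apples", "Bananas"}, {"Milk"}],
]

similarity = similarity_percentages(sorts)
for pair, percent in sorted(similarity.items(), key=lambda item: -item[1]):
    print(pair, f"{percent:.0f}%")
```

In this made-up sample, ‘Apples’ and ‘Bananas’ were grouped together by three of the four participants, so that pairing would be the strongest candidate to sit together on a website.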

Spotting popular card groupings 🔍

Dendrograms are a tool to enable you to spot popular groups of cards, as well as to get a general feel of how similar or different your participants’ card sorts were to each other.

There are two dendrograms to explore:

  • More than 30 card sort participants: The Actual Agreement Method (AAM) dendrogram gives you the data straight: “X% of participants agree with this exact grouping”.
  • Fewer than 30 card sort participants: The Best Merge Method (BMM) tells you “X% of participants agree with parts of this grouping”, and so enables you to extract as much as you can from the data.

Looking for alternative approaches 👀

The Participant-Centric Analysis (PCA) view can be useful when you have a lot of results. It aims to find the most popular grouping strategy, and then find two more popular alternatives among participants who disagreed with the first strategy.

This approach is called Participant-Centric Analysis because every response (from every participant) is treated as a potential solution, and then ranked for similarity with other responses. So if you see a card sort with an 11/43 agreement score, this means 10 other participants sorted their cards into groups similar to these ones.

Taking the next step: Run a card sort and try analysis for yourself 🃏

Now that we’ve taken a bit of a deep dive into the analysis side of card sorting in OptimalSort, it’s time to take the tool for a spin and start generating your own data.

Getting started is easy. If you haven’t already, simply sign up for a free account (you don’t need a credit card) and start a card sort. You can also practice by creating a card sort and sending it out to your coworkers, friends or family. Once you start to see results trickling in, you can start to make sense of the data.

For more information, check out the card sorting 101 guide that we’ve put together, or our introduction to card sorting on the Optimal Workshop Blog.

Happy testing! 
