December 4, 2018

How to interpret your card sort results Part 1: open and hybrid card sorts

Header graphic for the article 'How to interpret your card sort results Part 1: open...'

Cards have been created, sorted and sorted again. The participants are all finished and you’re left with a big pile of awesome data that will help you improve the user experience of your information architecture. Now what?Whether you’ve run an open, hybrid or closed card sort online using an information architecture tool or you’ve run an in person (moderated) card sort, it can be a bit daunting trying to figure out where to start the card sort analysis process.

About this guide

This two-part guide will help you on your way! For Part 1, we’re going to look at how to interpret and analyze the results from open and hybrid card sorts.

  • In open card sorts, participants sort cards into categories that make sense to them and they give each category a name of their own making.
  • In hybrid card sorts, some of the categories have already been defined for participants to sort the cards into but they also have the ability to create their own.

Open and hybrid card sorts are great for generating ideas for category names and labels and understanding not only how your users expect your content to be grouped but also what they expect those groups to be called.In both parts of this series, I’m going to be talking a lot about interpreting your results using Optimal Workshop’s online card sorting tool, OptimalSort, but most of what I’m going to share is also applicable if you’re analyzing your data using a spreadsheet or using another tool.

Understanding the two types of analysis: exploratory and statistical

Similar to qualitative and quantitative methods, exploratory and statistical analysis in card sorting are two complementary approaches that work together to provide a detailed picture of your results.

  • Exploratory analysis is intuitive and creative. It’s all about going through the data and shaking it to see what ideas, patterns and insights fall out. This approach works best when you don’t have the numbers (smaller sample sizes) and when you need to dig into the details and understand the ‘why’ behind the statistics.

  • Statistical analysis is all about the numbers. Hard data that tells you exactly how many people expected X to be grouped with Y and more and is very useful when you’re dealing with large sample sizes and when identifying similarities and differences across different groups of people.

Depending on your objectives - whether you are starting from scratch or redesigning an existing IA - you’ll generally need to use some combination of both of these approaches when analyzing card sort results. Learn more about exploratory and statistical analysis in Donna Spencer’s book.

Start with the big picture

When analyzing card sort results, start by taking an overall look at the results as a whole. Quickly cast your eye over each individual card sort and just take it all in. Look for common patterns in how the cards have been sorted and the category names given by participants. Does anything jump out as surprising? Are there similarities or differences between participant sorts? If you’re redesigning an existing IA, how do your results compare to the current state?If you ran your card sort using OptimalSort, your first port of call will be the Overview and Participants Table presented in the results section of the tool.If you ran a moderated card sort using OptimalSort’s printed cards, now is a good time to double check you got them all. And if you didn’t know about this handy feature of OptimalSort, it’s something to keep in mind for next time!The Participants Table shows a breakdown of your card sorting data by individual participant. Start by reviewing each individual card sort one by one by clicking on the arrow in the far left column next to the Participants numbers.

A screenshot of the individual participant card sort results pop-up in OptimalSort.
Viewing individual participant card sorts in detail.

From here you can easily flick back and forth between participants without needing to close that modal window. Don’t spend too much time on this — you’re just trying to get a general impression of what happened.Keep an eye out for any card sorts that you might like to exclude from the results. For example participants who have lumped everything into one group and haven’t actually sorted the cards. Don’t worry - excluding or including participants isn’t permanent and can be toggled on or off at anytime.If you have a good number of responses, then the Participant Centric Analysis (PCA) tab (below) can be a good place to head next. It’s great for doing a quick comparison of the different high-level approaches participants took when grouping the cards.The PCA tab provides the most insight when you have lots of results data (30+ completed card sorts) and at least one of the suggested IAs has a high level of agreement among your participants (50% or more agree with at least one IA).

A screenshot of the Participant Centric Analysis (PCA) tab in OptimalSort, showing an example study.
Participant Centric Analysis (PCA) tab for an open or hybrid card sort in OptimalSort.

The PCA tab compares data from individual participants and surfaces the top three ways the cards were sorted. It also gives you some suggestions based on participant responses around what these categories could be called but try not to get too bogged down in those - you’re still just trying to gain an overall feel for the results at this stage.Now is also a good time to take a super quick peek at the Categories tab as it will also help you spot patterns and identify data that you’d like to dive deeper into a bit later on!Another really useful visualization tool offered by OptimalSort that will help you build that early, high-level picture of your results is the Similarity Matrix. This diagram helps you spot data clusters, or groups of cards that have been more frequently paired together by your participants, by surfacing them along the edge and shading them in dark blue. It also shows the proportion of times specific card pairings occurred during your study and displays the exact number on hover (below).

A screenshot of the Similarity Matrix tab in OptimalSort, with the results from an example study displaying.
OptimalSort’s Similarity Matrix showing that ‘Flat sandals’ and ‘Court shoes’ were paired by 91% of participants (31 times) in this example study.

In the above screenshot example we can see three very clear clusters along the edge: ‘Ankle Boots’ to ‘Slippers’ is one cluster, ‘Socks’ to ‘Stockings & Hold Ups’ is the next and then we have ‘Scarves’ to ‘Sunglasses’. These clusters make it easy to spot the that cards that participants felt belonged together and also provides hard data around how many times that happened.Next up are the dendrograms. Dendrograms are also great for gaining an overall sense of how similar (or different) your participants’ card sorts were to each other. Found under the Dendrogram tab in the results section of the tool, the two dendrograms are generated by different algorithms and which one you use depends largely on how many participants you have.

If your study resulted in 30 or more completed card sorts, use the Actual Agreement Method (AAM) dendrogram and if your study had fewer than 30 completed card sorts, use the Best Merge Method (BMM) dendrogram.The AAM dendrogram (see below) shows only factual relationships between the cards and displays scores that precisely tell you that ‘X% of participants in this study agree with this exact grouping’.In the below example, the study shown had 34 completed card sorts and the AAM dendrogram shows that 77% of participants agreed that the cards highlighted in green belong together and a suggested name for that group is ‘Bling’. The tooltip surfaces one of the possible category names for this group and as demonstrated here it isn’t always the best or ‘recommended’ one. Take it with a grain of salt and be sure to thoroughly check the rest of your results before committing!

A screenshot of the Actual Agreement Method (AAM) dendrogram in OptimalSort.
AAM Dendrogram in OptimalSort.

The BMM dendrogram (see below) is different to the AAM because it shows the percentage of participants that agree with parts of the grouping - it squeezes the data from smaller sample sizes and makes assumptions about larger clusters based on patterns in relationships between individual pairs.The AAM works best with larger sample sizes because it has more data to work with and doesn’t make assumptions while the BMM is more forgiving and seeks to fill in the gaps.The below screenshot was taken from an example study that had 7 completed card sorts and its BMM dendrogram shows that 50% of participants agreed that the cards highlighted in green down the left hand side belong to ‘Accessories, Bottoms, Tops’.

A screenshot of the Best Merge Method (BMM) dendrogram in OptimalSort.
BMM Dendrogram in OptimalSort.

Drill down and cross-reference

Once you’ve gained a high level impression of the results, it’s time to dig deeper and unearth some solid insights that you can share with your stakeholders and back up your design decisions.Explore your open and hybrid card sort data in more detail by taking a closer look at the Categories tab. Open up each category and cross-reference to see if people were thinking along the same lines.Multiple participants may have created the same category label, but what lies beneath could be a very different story. It’s important to be thorough here because the next step is to start standardizing or chunking individual participant categories together to help you make sense of your results.In open and hybrid sorts, participants will be able to label their categories themselves. This means that you may identify a few categories with very similar labels or perhaps spelling errors or different formats. You can standardize your categories by merging similar categories together to turn them into one.OptimalSort makes this really easy to do - you pretty much just tick the boxes alongside each category name and then hit the ‘Standardize’ button up the top (see below). Don’t worry if you make a mistake or want to include or exclude groupings; you can unstandardize any of your categories anytime.

A screenshot of the categories tab in OptimalSort, showing how categorization works.
Standardizing categories in OptimalSort.

Once you’ve standardized a few categories, you’ll notice that the Agreement number may change. It tells you how many participants agreed with that grouping. An agreement number of 1.0 is equal to 100% meaning everyone agrees with everything in your newly standardized category while 0.6 means that 60% of your participants agree.Another number to watch for here is the number of participants who sorted a particular card into a category which will appear in the frequency column in dark blue in the right-hand column of the middle section of the below image.

A screenshot of the categories tab after the creation of two groupings.
Categories table after groupings called ‘Accessories’ and ‘Bags’ have been standardized.

A screenshot of the Categories tab showing some of the groupings under 'Accessories'.
A closer look at the standardized category for ‘Accessories’.

From the above screenshot we can see that in this study, 18 of the 26 participant categories selected agree that ‘Cat Eye Sunglasses’ belongs under ‘Accessories’.Once you’ve standardized a few more categories you can head over to the Standardization Grid tab to review your data in more detail. In the below image we can see that 18 participants in this study felt that ‘Backpacks’ belong in a category named ‘Bags’ while 5 grouped them under ‘Accessories’. Probably safe to say the backpacks should join the other bags in this case.

A screenshot of the Standardization grid tab in OptimalSort.
Standardization Grid in OptimalSort.

So that’s a quick overview of how to interpret the results from your open or hybrid card sorts.Here's a link to Part 2 of this series where we talk about interpreting results from closed card sorts as well as next steps for applying these juicy insights to your IA design process.

Further reading

Share this article
Author
Optimal
Workshop

Related articles

View all blog articles
Learn more
1 min read

Online card sorting: The comprehensive guide

When it comes to designing and testing in the world of information architecture, it’s hard to beat card sorting. As a usability testing method, card sorting is easy to set up, simple to recruit for and can supply you with a range of useful insights. But there’s a long-standing debate in the world of card sorting, and that’s whether it’s better to run card sorts in person (moderated) or remotely over the internet (unmoderated).

This article should give you some insight into the world of online card sorting. We've included an analysis of the benefits (and the downsides) as well as why people use this approach. Let's take a look!

How an online card sort works

Running a card sort remotely has quickly become a popular option just because of how time-intensive in-person card sorting is. Instead of needing to bring your participants in for dedicated card sorting sessions, you can simply set up your card sort using an online tool (like our very own OptimalSort) and then wait for the results to roll in.

So what’s involved in a typical online card sort? At a very high level, here’s what’s required. We’re going to assume you’re already set up with an online card sorting tool at this point.

  1. Define the cards: Depending on what you’re testing, add the items (cards) to your study. If you were testing the navigation menu of a hotel website, your cards might be things like “Home”, “Book a room”, “Our facilities” and “Contact us”.
  2. Work out whether to run a closed or open sort: Determine whether you’ll set the groups for participants to sort cards into (closed) or leave it up to them (open). You may also opt for a mix, where you create some categories but leave the option open for participants to create their own.
  3. Recruit your participants: Whether using a participant recruitment service or by recruiting through your own channels, send out invites to your online card sort.
  4. Wait for the data: Once you’ve sent out your invites, all that’s left to do is wait for the data to come in and then analyze the results.

That’s online card sorting in a nutshell – not entirely different from running a card sort in person. If you’re interested in learning about how to interpret your card sorting results, we’ve put together this article on open and hybrid card sorts and this one on closed card sorts.

Why is online card sorting so popular?

Online card sorting has a few distinct advantages over in-person card sorting that help to make it a popular option among information architects and user researchers. There are downsides too (as there are with any remote usability testing option), but we’ll get to those in a moment.

Where remote (unmoderated) card sorting excels:

  • Time savings: Online card sorting is essentially ‘set and forget’, meaning you can set up the study, send out invites to your participants and then sit back and wait for the results to come in. In-person card sorting requires you to moderate each session and collate the data at the end.
  • Easier for participants: It’s not often that researchers are on the other side of the table, but it’s important to consider the participant’s viewpoint. It’s much easier for someone to spend 15 minutes completing your online card sort in their own time instead of trekking across town to your office for an exercise that could take well over an hour.
  • Cheaper: In a similar vein, online card sorting is much cheaper than in-person testing. While it’s true that you may still need to recruit participants, you won’t need to reimburse people for travel expenses.
  • Analytics: Last but certainly not least, online card sorting tools (like OptimalSort) can take much of the analytical burden off you by transforming your data into actionable insights. Other tools will differ, but OptimalSort can generate a similarity matrix, dendrograms and a participant-centric analysis using your study data.

Where in-person (moderated) card sorting excels:

  • Qualitative insights: For all intents and purposes, online card sorting is the most effective way to run a card sort. It’s cheaper, faster and easier for you. But, there’s one area where in-person card sorting excels, and that’s qualitative feedback. When you’re sitting directly across the table from your participant you’re far more likely to learn about the why as well as the what. You can ask participants directly why they grouped certain cards together.

Online card sorting: Participant numbers

So that’s online card sorting in a nutshell, as well as some of the reasons why you should actually use this method. But what about participant numbers? Well, there’s no one right answer, but the general rule is that you need more people than you’d typically bring in for a usability test.

This all comes down to the fact that card sorting is what’s known as a generative method, whereas usability testing is an evaluation method. Here’s a little breakdown of what we mean by these terms:

Generative method: There’s no design, and you need to get a sense of how people think about the problem you’re trying to solve. For example, how people would arrange the items that need to go into your website’s navigation. As Nielsen Norman Group explains: “There is great variability in different people's mental models and in the vocabulary they use to describe the same concepts. We must collect data from a fair number of users before we can achieve a stable picture of the users' preferred structure and determine how to accommodate differences among users”.

Evaluation method: There’s already a design, and you basically need to work out whether it’s a good fit for your users. Any major problems are likely to crop up even after testing 5 or so users. For example, you have a wireframe of your website and need to identify any major usability issues.

Basically, because you’ll typically be using card sorting to generate a new design or structure from nothing, you need to sample a larger number of people. If you were testing an existing website structure, you could get by with a smaller group.

Where to from here?

Following on from our discussion of generative versus evaluation methods, you’ve really got a choice of 2 paths from here if you’re in the midst of a project. For those developing new structures, the best course of action is likely to be a card sort. However, if you’ve got an existing structure that you need to test in order to usability problems and possible areas of improvement, you’re likely best to run a tree test. We’ve got some useful information on getting started with a tree test right here on the blog.

Header graphic for the article 'How to Spot and Destroy Evil Attractors in Your Tree...'
Learn more
1 min read

How to Spot and Destroy Evil Attractors in Your Tree (Part 1)

Usability guru Jared Spool has written extensively about the 'scent of information'. This term describes how users are always 'on the hunt' through a site, click by click, to find the content they’re looking for. Tree testing helps you deliver a strong scent by improving organisation (how you group your headings and subheadings) and labelling (what you call each of them).

Anyone who’s seen a spy film knows there are always false scents and red herrings to lead the hero astray. And anyone who’s run a few tree tests has probably seen the same thing — headings and labels that lure participants to the wrong answer. We call these 'evil attractors'.In Part 1 of this article, we’ll look at what evil attractors are, how to spot them at the answer end of your tree, and how to fix them. In Part 2, we’ll look at how to spot them in the higher levels of your tree.

The false scent — what it looks like in practice

One of my favourite examples of an evil attractor comes from a tree test we ran for consumer.org.nz, a New Zealand consumer-review website (similar to Consumer Reports in the USA). Their site listed a wide range of consumer products in a tree several levels deep, and they wanted to try out a few ideas to make things easier to find as the site grew bigger.We ran the tests and got some useful answers, but we also noticed there was one particular subheading (Home > Appliances > Personal) that got clicks from participants looking for very different things — mobile phones, vacuum cleaners, home-theatre systems, and so on:

pic1

The website intended the Personal appliance category to be for products like electric shavers and curling irons. But apparently, Personal meant many things to our participants: they also went there for 'personal' items like mobile phones and cordless drills that actually lived somewhere else.This is the false scent — the heading that attracts clicks when it shouldn’t, leading participants astray. Hence this definition: an evil attractor is a heading that draws unwanted traffic across several unrelated tasks.

Evil attractors lead your users astray

Attracting clicks isn’t a bad thing in itself. After all, that’s what a good heading does — it attracts clicks for the content it contains (and discourages clicks for everything else). Evil attractors, on the other hand, attract clicks for things they shouldn’t. These attractors lure users down the wrong path, and when users find themselves in the wrong place they'll either back up and try elsewhere (if they’re patient) or give up (if they’re not). Because these attractor topics are magnets for the user’s attention, they make it less likely that your user will get to the place you intended. The other evil part of these attractors is the way they hide in the shadows. Most of the time, they don’t get the lion’s share of traffic for a given task. Instead, they’ll poach 5–10% of the responses, luring away a fraction of users who might otherwise have found the right answer.

Find evil attractors easily in your data

The easiest attractors to spot are those at the answer end of your tree (where participants ended up for each task). If we can look across tasks for similar wrong answers, then we can see which of these might be evil attractors.In your Treejack results, the Destinations tab lets you do just that. Here’s more of the consumer.org.nz example:

Pic2

Normally, when you look at this view, you’re looking down a column for big hits and misses for a specific task. To look for evil attractors, however, you’re looking for patterns across rows. In other words, you’re looking horizontally, not vertically. If we do that here, we immediately notice the row for Personal (highlighted yellow). See all those hits along the row? Those hits indicate an attractor — steady traffic across many tasks that seem to have little in common. But remember, traffic alone is not enough. We’re looking for unwanted traffic across unrelated tasks. Do we see that here? Well, it looks like the tasks (about cameras, drills, laptops, vacuums, and so on) are not that closely related. We wouldn’t expect users to go to the same topic for each of these. And the answer they chose, Personal, certainly doesn’t seem to be the destination we intended. While we could rationalise why they chose this answer, it is definitely unwanted from an IA perspective. So yes, in this case, we seem to have caught an evil attractor red-handed. Here’s a heading that’s getting steady traffic where it shouldn’t.

Evil attractors are usually the result of ambiguity

It’s usually quite simple to figure out why an item in your tree is an evil attractor. In almost all cases, it’s because the item is vague or ambiguous — a word or phrase that could mean different things to different people. Look at our example above. In the context of a consumer-review site, Personal is too general to be a good heading. It could mean products you wear, or carry, or use in the bathroom, or a number of things. So, when those participants come along clutching a task, and they see Personal, a few of them think 'That looks like it might be what I’m looking for', and they go that way.Individually, those choices may be defensible, but as an information architect, are you really going to group mobile phones with vacuum cleaners? The 'personal' link between them is tenuous at best.

Destroy evil attractors by being specific

Just as it’s easy to see why most attractors attract, it’s usually easy to fix them. Evil attractors trade in vagueness and ambiguity, so the obvious remedy is to make those headings more concrete and specific. In the consumer-site example, we looked at the actual content under the Personal heading. It turned out to be items like shavers, curling irons, and hair dryers. A quick discussion yielded Personal care as a promising replacement — one that should deter people looking for mobile phones and jewellery and the like.In the second round of tree testing, among the other changes we made to the tree, we replaced Personal with Personal Care. A few days later, the results confirmed our thinking. Our former evil attractor was no longer luring participants away from the correct answers:

Pic3

Testing once is good, testing twice is magic

This brings up a final point about tree testing (and about any kind of user testing, really): you need to iterate your testing —  once is not enough.The first round of testing shows you where your tree is doing well (yay!) and where it needs more work so you can make some thoughtful revisions. Be careful though. Even if the problems you found seem to have obvious solutions, you still need to make sure your revisions actually work for users, and don’t cause further problems. The good news is, it’s dead easy to run a second test, because it’s just a small revision of the first. You already have the tasks and all the other bits worked out, so it’s just a matter of making a copy in Treejack, pasting in your revised tree, and hooking up the correct answers. In an hour or two, you’re ready to pilot it again (to err is human, remember) and send it off to a fresh batch of participants.

Two possible outcomes await.

  • Your fixes are spot-on, the participants find the correct answers more frequently and easily, and your overall score climbs. You could have skipped this second test, but confirming that your changes worked is both good practice and a good feeling. It’s also something concrete to show your boss.
  • Some of your fixes didn’t work, or (given the tangled nature of IA work) they worked for the problems you saw in Round 1, but now they’ve caused more problems of their own. Bad news, for sure. But better that you uncover them now in the design phase (when it takes a few days to revise and re-test) instead of further down the track when the IA has been signed off and changes become painful.

Stay tuned for more on evil attractors

In Part 1, we’ve covered what evil attractors are and how to spot them at the answer end of your tree: that is, evil attractors that participants chose as their destination when performing tasks. Hopefully, a future version of Treejack will be able to highlight these attractors to make your analysis that much easier.

In Part 2, we’ll look at how to spot evil attractors in the intermediate levels of your tree, where they lure participants into a section of the site that you didn’t intend. These are harder to spot, but we’ll see if we can ferret them out.Let us know if you've caught any evil attractors red-handed in your projects.

Header graphic for the article '"Could I A/B test two content structures with tree testing?!"'
Learn more
1 min read

"Could I A/B test two content structures with tree testing?!"

"Dear Optimal Worshop
I have two huge content structures I would like to A/B test. Do you think Treejack would be appropriate?"
— Mike

Hi Mike (and excellent question)!

Firstly, yes, Treejack is great for testing more than one content structure. It’s easy to run two separate Treejack studies — even more than two. It’ll help you decide which structure you and your team should run with, and it won’t take you long to set them up.

When you’re creating the two tree tests with your two different content structures, include the same tasks in both tests. Using the same tasks will give an accurate measure of which structure performs best. I’ve done it before and I found that the visual presentation of the results — especially the detailed path analysis pietrees — made it really easy to compare Test A with Test B.

Plus (and this is a big plus), if you need to convince stakeholders or teammates of which structure is the most effective, you can’t go past quantitative data, especially when its presented clearly — it’s hard to argue with hard evidence!

Here’s two example of the kinds of results visualizations you could compare in your A/B test: the pietree, which shows correct and incorrect paths, and where people ended up:

treejack pietree

And the overall Task result, which breaks down success and directness scores, and has plenty of information worth comparing between two tests:

treejack task result

Keep in mind that running an A/B tree test will affect how you recruit participants — it may not be the best idea to have the same participants complete both tests in one go. But it’s an easy fix — you could either recruit two different groups from the same demographic, or test one group and have a gap (of at least a day) between the two tests.

I’ve one more quick question: why are your two content structures ‘huge’?

I understand that sometimes these things are unavoidable — you potentially work for a government organization, or a university, and you have to include all of the things. But if not, and if you haven’t already, you could run an open card sort to come up with another structure to test (think of it as an A/B/C test!), and to confirm that the categories you’re proposing work for people.

You could even run a closed card sort to establish which content is more important to people than others (your categories could go from ‘Very important’ to ‘Unimportant’, or ‘Use everyday’ to ‘Never use’, for example). You might be able to make your content structure a bit smaller, and still keep its usefulness. Just a thought... and of course, you could try to get this information from your analytics (if available) but just be cautious of this because of course analytics can only tell you what people did and not what they wanted to do.

All the best Mike!

Seeing is believing

Explore our tools and see how Optimal makes gathering insights simple, powerful, and impactful.