There’s no doubt usability is a key element of all great user experiences, how do we apply and test usability principles for a website? This article looks at usability principles in web design, how to test it, practical tips for success and a look at our remote testing tool, Treejack.
A definition of usability for websites 🧐📖
Web usability is defined as the extent to which a website can be used to achieve a specific task or goal by a user. It refers to the quality of the user experience and can be broken down into five key usability principles:
Ease of use: How easy is the website to use? How easily are users able to complete their goals and tasks? How much effort is required from the user?
Learnability: How easily are users able to complete their goals and tasks the first time they use the website?
Efficiency: How quickly can users perform tasks while using your website?
User satisfaction: How satisfied are users with the experience the website provides? Is the experience a pleasant one?
Impact of errors: Are users making errors when using the website and if so, how serious are the consequences of those errors? Is the design forgiving enough make it easy for errors to be corrected?
Why is web usability important? 👀
Aside from the obvious desire to improve the experience for the people who use our websites, web usability is crucial to your website’s survival. If your website is difficult to use, people will simply go somewhere else. In the cases where users do not have the option to go somewhere else, for example government services, poor web usability can lead to serious issues. How do we know if our website is well-designed? We test it with users.
Testing usability: What are the common methods? 🖊️📖✏️📚
There are many ways to evaluate web usability and here are the common methods:
Moderated usability testing: Moderated usability testing refers to testing that is conducted in-person with a participant. You might do this in a specialised usability testing lab or perhaps in the user’s contextual environment such as their home or place of business. This method allows you to test just about anything from a low fidelity paper prototype all the way up to an interactive high fidelity prototype that closely resembles the end product.
Moderated remote usability testing: Moderated remote usability testing is very similar to the previous method but with one key difference- the facilitator and the participant/s are not in the same location. The session is still a moderated two-way conversation just over skype or via a webinar platform instead of in person. This method is particularly useful if you are short on time or unable to travel to where your users are located, e.g. overseas.
Unmoderated remote usability testing: As the name suggests, unmoderated remote usability testing is conducted without a facilitator present. This is usually done online and provides the flexibility for your participants to complete the activity at a time that suits them. There are several remote testing tools available ( including our suite of tools ) and once a study is launched these tools take care of themselves collating the results for you and surfacing key findings using powerful visual aids.
Guerilla testing: Guerilla testing is a powerful, quick and low cost way of obtaining user feedback on the usability of your website. Usually conducted in public spaces with large amounts of foot traffic, guerilla testing gets its name from its ‘in the wild’ nature. It is a scaled back usability testing method that usually only involves a few minutes for each test but allows you to reach large amounts of people and has very few costs associated with it.
Heuristic evaluation: A heuristic evaluation is conducted by usability experts to assess a website against recognized usability standards and rules of thumb (heuristics). This method evaluates usability without involving the user and works best when done in conjunction with other usability testing methods eg Moderated usability testing to ensure the voice of the user is heard during the design process.
Tree testing: Also known as a reverse card sort, tree testing is used to evaluate the findability of information on a website. This method allows you to work backwards through your information architecture and test that thinking against real world scenarios with users.
First click testing: Research has found that 87% of users who start out on the right path from the very first click will be able to successfully complete their task while less than half ( 46%) who start down the wrong path will succeed. First click testing is used to evaluate how well a website is supporting users and also provides insights into design elements that are being noticed and those that are being ignored.
Hallway testing: Hallway testing is a usability testing method used to gain insights from anyone nearby who is unfamiliar with your project. These might be your friends, family or the people who work in another department down the hall from you. Similar to guerilla testing but less ‘wild’. This method works best at picking up issues early in the design process before moving on to testing a more refined product with your intended audience.
Online usability testing tool: Tree testing 🌲🌳🌿
Tree testing is a remote usability testing tool that uses tree testing to help you discover exactly where your users are getting lost in the structure of your website. Treejack uses a simplified text-based version of your website structure removing distractions such as navigation and visual design allowing you to test the design from its most basic level.
Like any other tree test, it uses task based scenarios and includes the opportunity to ask participants pre and post study questions that can be used to gain further insights. Tree testing is a useful tool for testing those five key usability principles mentioned earlier with powerful inbuilt features that do most of the heavy lifting for you. Tree testing records and presents the following for each task:
complete details of the pathways followed by each participant
the time taken to complete each task
first click data
the directness of each result
visibility on when and where participants skipped a task
Participant paths data in our tree testing tool 🛣️
The level of detail recorded on the pathways followed by your participants makes it easy for you to determine the ease of use, learnability, efficiency and impact of errors of your website. The time taken to complete each task and the directness of each result also provide insights in relation to those four principles and user satisfaction can be measured through the results to your pre and post survey questions.
The first click data brings in the added benefits of first click testing and knowing when and where your participants gave up and moved on can help you identify any issues.Another thing tree testing does well is the way it brings all data for each task together into one comprehensive overview that tells you everything you need to know at a glance. Tree testing's task overview- all the key information in one placeIn addition to this, tree testing also generates comprehensive pathway maps called pietrees.
Each junction in the pathway is a piechart showing a statistical breakdown of participant activity at that point in the site structure including details about: how many were on the right track, how many were following the incorrect path and how many turned around and went back. These beautiful diagrams tell the story of your usability testing and are useful for communicating the results to your stakeholders.
Test early and often: Usability testing isn’t something that only happens at the end of the project. Start your testing as soon as possible and iterate your design based on findings. There are so many different ways to test an idea with users and you have the flexibility to scale it back to suit your needs.
Try testing with paper prototypes: Just like there are many usability testing methods, there are also several ways to present your designs to your participant during testing. Fully functioning high fidelity prototypes are amazing but they’re not always feasible (especially if you followed the previous tip of test early and often). Paper prototypes work well for usability testing because your participant can draw on them and their own ideas- they’re also more likely to feel comfortable providing feedback on work that is less resolved! You could also use paper prototypes to form the basis for collaborative design sessions with your users by showing them your idea and asking them to redesign or design the next page/screen.
Run a benchmarking round of testing: Test the current state of the design to understand how your users feel about it. This is especially useful if you are planning to redesign an existing product or service and will save you time in the problem identification stages.
Bring stakeholders and clients into the testing process: Hearing how a product or service is performing direct from a user can be quite a powerful experience for a stakeholder or client. If you are running your usability testing in a lab with an observation room, invite them to attend as observers and also include them in your post session debriefs. They’ll gain feedback straight from the source and you’ll gain an extra pair of eyes and ears in the observation room. If you’re not using a lab or doing a different type of testing, try to find ways to include them as observers in some way. Also, don’t forget to remind them that as observers they will need to stay silent for the entire session beyond introducing themselves so as not to influence the participant - unless you’ve allocated time for questions.
Make the most of available resources: Given all the usability testing options out there, there’s really no excuse for not testing a design with users. Whether it’s time, money, human resources or all of the above making it difficult for you, there’s always something you can do. Think creatively about ways to engage users in the process and consider combining elements of different methods or scaling down to something like hallway testing or guerilla testing. It is far better to have a less than perfect testing method than to not test at all.
Never analyse your findings alone: Always analyse your usability testing results as a team or with at least one other person. Making sense of the results can be quite a big task and it is easy to miss or forget key insights. Bring the team together and affinity diagram your observations and notes after each usability testing session to ensure everything is captured. You could also use Reframer to record your observations live during each session because it does most of the analysis work for you by surfacing common themes and patterns as they emerge. Your whole team can use it too saving you time.
Engage your stakeholders by presenting your findings in creative ways: No one reads thirty page reports anymore. Help your stakeholders and clients feel engaged and included in the process by delivering the usability testing results in an easily digestible format that has a lasting impact. You might create an A4 size one page summary, or maybe an A0 size wall poster to tell everyone in the office the story of your usability testing or you could create a short video with snippets taken from your usability testing sessions (with participant permission of course) to communicate your findings. Remember you’re also providing an experience for your clients and stakeholders so make sure your results are as usable as what you just tested.
I have two huge content structures I would like to A/B test. Do you think Treejack would be appropriate?"
— Mike
Hi Mike (and excellent question)!
Firstly, yes, Treejack is great for testing more than one content structure. It’s easy to run two separate Treejack studies — even more than two. It’ll help you decide which structure you and your team should run with, and it won’t take you long to set them up.
When you’re creating the two tree tests with your two different content structures, include the same tasks in both tests. Using the same tasks will give an accurate measure of which structure performs best. I’ve done it before and I found that the visual presentation of the results — especially the detailed path analysis pietrees — made it really easy to compare Test A with Test B.
Plus (and this is a big plus), if you need to convince stakeholders or teammates of which structure is the most effective, you can’t go past quantitative data, especially when its presented clearly — it’s hard to argue with hard evidence!
Here’s two example of the kinds of results visualizations you could compare in your A/B test: the pietree, which shows correct and incorrect paths, and where people ended up:
And the overall Task result, which breaks down success and directness scores, and has plenty of information worth comparing between two tests:
Keep in mind that running an A/B tree test will affect how you recruit participants — it may not be the best idea to have the same participants complete both tests in one go. But it’s an easy fix — you could either recruit two different groups from the same demographic, or test one group and have a gap (of at least a day) between the two tests.
I’ve one more quick question: why are your two content structures ‘huge’?
I understand that sometimes these things are unavoidable — you potentially work for a government organization, or a university, and you have to include all of the things. But if not, and if you haven’t already, you could run an open card sort to come up with another structure to test (think of it as an A/B/C test!), and to confirm that the categories you’re proposing work for people.
You could even run a closed card sort to establish which content is more important to people than others (your categories could go from ‘Very important’ to ‘Unimportant’, or ‘Use everyday’ to ‘Never use’, for example). You might be able to make your content structure a bit smaller, and still keep its usefulness. Just a thought... and of course, you could try to get this information from your analytics (if available) but just be cautious of this because of course analytics can only tell you what people did and not what they wanted to do.
Usability guru Jared Spool has written extensively about the 'scent of information'. This term describes how users are always 'on the hunt' through a site, click by click, to find the content they’re looking for. Tree testing helps you deliver a strong scent by improving organisation (how you group your headings and subheadings) and labelling (what you call each of them).
Anyone who’s seen a spy film knows there are always false scents and red herrings to lead the hero astray. And anyone who’s run a few tree tests has probably seen the same thing — headings and labels that lure participants to the wrong answer. We call these 'evil attractors'.In Part 1 of this article, we’ll look at what evil attractors are, how to spot them at the answer end of your tree, and how to fix them. In Part 2, we’ll look at how to spot them in the higher levels of your tree.
The false scent — what it looks like in practice
One of my favourite examples of an evil attractor comes from a tree test we ran for consumer.org.nz, a New Zealand consumer-review website (similar to Consumer Reports in the USA). Their site listed a wide range of consumer products in a tree several levels deep, and they wanted to try out a few ideas to make things easier to find as the site grew bigger.We ran the tests and got some useful answers, but we also noticed there was one particular subheading (Home > Appliances > Personal) that got clicks from participants looking for very different things — mobile phones, vacuum cleaners, home-theatre systems, and so on:
The website intended the Personal appliance category to be for products like electric shavers and curling irons. But apparently, Personal meant many things to our participants: they also went there for 'personal' items like mobile phones and cordless drills that actually lived somewhere else.This is the false scent — the heading that attracts clicks when it shouldn’t, leading participants astray. Hence this definition: an evil attractor is a heading that draws unwanted traffic across several unrelated tasks.
Evil attractors lead your users astray
Attracting clicks isn’t a bad thing in itself. After all, that’s what a good heading does — it attracts clicks for the content it contains (and discourages clicks for everything else). Evil attractors, on the other hand, attract clicks for things they shouldn’t. These attractors lure users down the wrong path, and when users find themselves in the wrong place they'll either back up and try elsewhere (if they’re patient) or give up (if they’re not). Because these attractor topics are magnets for the user’s attention, they make it less likely that your user will get to the place you intended. The other evil part of these attractors is the way they hide in the shadows. Most of the time, they don’t get the lion’s share of traffic for a given task. Instead, they’ll poach 5–10% of the responses, luring away a fraction of users who might otherwise have found the right answer.
Find evil attractors easily in your data
The easiest attractors to spot are those at the answer end of your tree (where participants ended up for each task). If we can look across tasks for similar wrong answers, then we can see which of these might be evil attractors.In your Treejack results, the Destinations tab lets you do just that. Here’s more of the consumer.org.nz example:
Normally, when you look at this view, you’re looking down a column for big hits and misses for a specific task. To look for evil attractors, however, you’re looking for patterns across rows. In other words, you’re looking horizontally, not vertically. If we do that here, we immediately notice the row for Personal (highlighted yellow). See all those hits along the row? Those hits indicate an attractor — steady traffic across many tasks that seem to have little in common. But remember, traffic alone is not enough. We’re looking for unwanted traffic across unrelated tasks. Do we see that here? Well, it looks like the tasks (about cameras, drills, laptops, vacuums, and so on) are not that closely related. We wouldn’t expect users to go to the same topic for each of these. And the answer they chose, Personal, certainly doesn’t seem to be the destination we intended. While we could rationalise why they chose this answer, it is definitely unwanted from an IA perspective. So yes, in this case, we seem to have caught an evil attractor red-handed. Here’s a heading that’s getting steady traffic where it shouldn’t.
Evil attractors are usually the result of ambiguity
It’s usually quite simple to figure out why an item in your tree is an evil attractor. In almost all cases, it’s because the item is vague or ambiguous — a word or phrase that could mean different things to different people. Look at our example above. In the context of a consumer-review site, Personal is too general to be a good heading. It could mean products you wear, or carry, or use in the bathroom, or a number of things. So, when those participants come along clutching a task, and they see Personal, a few of them think 'That looks like it might be what I’m looking for', and they go that way.Individually, those choices may be defensible, but as an information architect, are you really going to group mobile phones with vacuum cleaners? The 'personal' link between them is tenuous at best.
Destroy evil attractors by being specific
Just as it’s easy to see why most attractors attract, it’s usually easy to fix them. Evil attractors trade in vagueness and ambiguity, so the obvious remedy is to make those headings more concrete and specific. In the consumer-site example, we looked at the actual content under the Personal heading. It turned out to be items like shavers, curling irons, and hair dryers. A quick discussion yielded Personal care as a promising replacement — one that should deter people looking for mobile phones and jewellery and the like.In the second round of tree testing, among the other changes we made to the tree, we replaced Personal with Personal Care. A few days later, the results confirmed our thinking. Our former evil attractor was no longer luring participants away from the correct answers:
Testing once is good, testing twice is magic
This brings up a final point about tree testing (and about any kind of user testing, really): you need to iterate your testing — once is not enough.The first round of testing shows you where your tree is doing well (yay!) and where it needs more work so you can make some thoughtful revisions. Be careful though. Even if the problems you found seem to have obvious solutions, you still need to make sure your revisions actually work for users, and don’t cause further problems. The good news is, it’s dead easy to run a second test, because it’s just a small revision of the first. You already have the tasks and all the other bits worked out, so it’s just a matter of making a copy in Treejack, pasting in your revised tree, and hooking up the correct answers. In an hour or two, you’re ready to pilot it again (to err is human, remember) and send it off to a fresh batch of participants.
Two possible outcomes await.
Your fixes are spot-on, the participants find the correct answers more frequently and easily, and your overall score climbs. You could have skipped this second test, but confirming that your changes worked is both good practice and a good feeling. It’s also something concrete to show your boss.
Some of your fixes didn’t work, or (given the tangled nature of IA work) they worked for the problems you saw in Round 1, but now they’ve caused more problems of their own. Bad news, for sure. But better that you uncover them now in the design phase (when it takes a few days to revise and re-test) instead of further down the track when the IA has been signed off and changes become painful.
Stay tuned for more on evil attractors
In Part 1, we’ve covered what evil attractors are and how to spot them at the answer end of your tree: that is, evil attractors that participants chose as their destination when performing tasks. Hopefully, a future version of Treejack will be able to highlight these attractors to make your analysis that much easier.
In Part 2, we’ll look at how to spot evil attractors in the intermediate levels of your tree, where they lure participants into a section of the site that you didn’t intend. These are harder to spot, but we’ll see if we can ferret them out.Let us know if you've caught any evil attractors red-handed in your projects.
For those new to the qualitative research space, there’s one question that’s usually pretty tough to figure out, and that’s the question of how many participants to include in a study. Regardless of whether it’s research as part of the discovery phase for a new product, or perhaps an in-depth canvas of the users of an existing service, researchers can often find it difficult to agree on the numbers. So is there an easy answer? Let’s find out.
Here, we’ll look into the right number of participants for qualitative research studies. If you want to know about participants for quantitative research, read Nielsen Norman Group’s article.
Getting the numbers right
So you need to run a series of user interviews or usability tests and aren’t sure exactly how many people you should reach out to. It can be a tricky situation – especially for those without much experience. Do you test a small selection of 1 or 2 people to make the recruitment process easier? Or, do you go big and test with a series of 10 people over the course of a month? The answer lies somewhere in between.
It’s often a good idea (for qualitative research methods like interviews and usability tests) to start with 5 participants and then scale up by a further 5 based on how complicated the subject matter is. You may also find it helpful to add additional participants if you’re new to user research or you’re working in a new area.
What you’re actually looking for here is what’s known as saturation.
Understanding saturation
Whether it’s qualitative research as part of a master’s thesis or as research for a new online dating app, saturation is the best metric you can use to identify when you’ve hit the right number of participants.
In a nutshell, saturation is when you’ve reached the point where adding further participants doesn’t give you any further insights. It’s true that you may still pick up on the occasional interesting detail, but all of your big revelations and learnings have come and gone. A good measure is to sit down after each session with a participant and analyze the number of new insights you’ve noted down.
Interestingly, in a paper titled How Many Interviews Are Enough?, authors Greg Guest, Arwen Bunce and Laura Johnson noted that saturation usually occurs with around 12 participants in homogeneous groups (meaning people in the same role at an organization, for example). However, carrying out ethnographic research on a larger domain with a diverse set of participants will almost certainly require a larger sample.
Ensuring you’ve hit the right number of participants
How do you know when you’ve reached saturation point? You have to keep conducting interviews or usability tests until you’re no longer uncovering new insights or concepts.
While this may seem to run counter to the idea of just gathering as much data from as many people as possible, there’s a strong case for focusing on a smaller group of participants. In The logic of small samples in interview-based, authors Mira Crouch and Heather McKenzie note that using fewer than 20 participants during a qualitative research study will result in better data. Why? With a smaller group, it’s easier for you (the researcher) to build strong close relationships with your participants, which in turn leads to more natural conversations and better data.
There's also a school of thought that you should interview 5 or so people per persona. For example, if you're working in a company that has well-defined personas, you might want to use those as a basis for your study, and then you would interview 5 people based on each persona. This maybe worth considering or particularly important when you have a product that has very distinct user groups (e.g. students and staff, teachers and parents etc).
How your domain affects sample size
The scope of the topic you’re researching will change the amount of information you’ll need to gather before you’ve hit the saturation point. Your topic is also commonly referred to as the domain.
If you’re working in quite a confined domain, for example, a single screen of a mobile app or a very specific scenario, you’ll likely find interviews with 5 participants to be perfectly fine. Moving into more complicated domains, like the entire checkout process for an online shopping app, will push up your sample size.
As Mitchel Seaman notes: “Exploring a big issue like young peoples’ opinions about healthcare coverage, a broad emotional issue like postmarital sexuality, or a poorly-understood domain for your team like mobile device use in another country can drastically increase the number of interviews you’ll want to conduct.”
In-person or remote
Does the location of your participants change the number you need for qualitative user research? Well, not really – but there are other factors to consider.
Budget: If you choose to conduct remote interviews/usability tests, you’ll likely find you’ve got lower costs as you won’t need to travel to your participants or have them travel to you. This also affects…
Participant access: Remote qualitative research can be a lifesaver when it comes to participant access. No longer are you confined to the people you have physical access to, instead you can reach out to anyone you’d like.
Quality: On the other hand, remote research does have its downsides. For one, you’ll likely find you’re not able to build the same kinds of relationships over the internet or phone as those in person, which in turn means you never quite get the same level of insights.
Is there value in outsourcing recruitment?
Recruitment is understandably an intensive logistical exercise with many moving parts. If you’ve ever had to recruit people for a study before, you’ll understand the need for long lead times (to ensure you have enough participants for the project) and the countless long email chains as you discuss suitable times.
Outsourcing your participant recruitment is just one way to lighten the logistical load during your research. Instead of having to go out and look for participants, you have them essentially delivered to you in the right number and with the right attributes.
We’ve got one such service at Optimal, which means it’s the perfect accompaniment if you’re also using our platform of UX tools. Read more about that here.
Wrap-up
So that’s really most of what there is to know about participant recruitment in a qualitative research context. As we said at the start, while it can appear quite tricky to figure out exactly how many people you need to recruit, it’s actually not all that difficult in reality.
Overall, the number of participants you need for your qualitative research can depend on your project among other factors. It’s important to keep saturation in mind, as well as the locale of participants. You also need to get the most you can out of what’s available to you. Remember: Some research is better than none!