Your guide to creating and running effective tree tests
Tree testing tells you how easily people can find information on your website, and exactly where people get lost. Your website visitors rely on your information architecture — how you label and organize your content — to get things done. Tree testing can answer questions like:
Tree testing has two main elements: your tree, and your tasks.
Your tree is a text-only version of your website structure (similar to a sitemap). You ask participants to complete tasks by clicking through your tree and nominating the information they think is correct. The task results will tell you:
Let’s say you work for a government tax website, and you want to know if self-employed people can easily find information relevant to them.
Your tree might look like this:
Which in Treejack will look like this:
You might write a task like this:
You have an idea for a new business, and you want to find out what paperwork you have to complete for tax purposes.
Participants will be presented with the task, and see the top level of the tree:
They will click through the tree until they think they’ve completed the task:
And the data for that task will tell you things like this:
Tree testing is useful whenever you want to find out whether the labels and structure of the information on your website, intranet, or product are easy to understand. You can get valuable insights at all stages in the design process, whether you’re starting from scratch or making a few tweaks to a website you already have in place.
You can test large website structures (with 10+ levels and thousands of labels, for example), small structures (with 3 levels and 22 labels, for example), or any size in between. You can set up one or two large studies, or run multiple smaller studies at the same time. You can set 1 task, or up to 10. You can recruit fewer people, or more. The potential for tree testing to give you useful insights is practically endless.
The clearer you are about your reasons and goals for tree testing, the more relevant and effective your results will be.
Get specific about the information architecture you want tree testing to help with. Do you want to test and improve a whole website, or just parts of it? Do you have just one structure to test to start with, or four variations to compare? Do you want to test your whole website in one go, or create a separate test for each part? Are you working on a complete overhaul, or just tweaking the information in one category? Be clear and specific about what tree testing will help you to improve.
At some point, your customers will get to benefit from all the work you’ve put into tree testing, so it’s vital to establish who these intended users are before you run your tree tests. You’ll also find it easier to recruit suitable participants if you have specifics in mind before you start.
You might have access to a collection of customer personas, or data on the exact demographics of the people who visit your website or use your product. You might have a vague idea of who you’re designing for, and a few questions as well. The more you can find out, the better.
As well as knowing your audience, knowing why you’re improving the website or product will help you to focus your tree testing. You’ll get answers to your ‘Why’ from things like analytics, repeated support queries, phone calls to customer support, customer studies, user interviews, stakeholder feedback, client specifications, and so on. Think of the ‘why’ as your evidence — your justification for why tree testing is necessary.
Think about when tree testing would be the most useful for you as part of the wider design project you’re working on — a useful approach whether you’re working on something small (like your own personal website) or something big (like a government website). Getting your timing right — running tree tests at critical stages of your project — will give you a whole series of results visualizations and benchmarked data to compare, use to inform design decisions, and present as progress and return-on-investment reports to stakeholders.
If you’re improving a website that already exists, running a tree test first will give you early insights into what works and what needs work. It will also give you a clear, quantifiable benchmark to improve on in later iterations. If you test your first tree with 8 tasks, and then test your revised tree with the same 8 tasks, you’ll be able to pinpoint exactly how much your changes have improved the findability of your information.
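Once you have Success scores for both rounds, lining them up task by task makes the benchmark concrete. Here is a minimal sketch, assuming you have exported each round's per-task Success scores yourself; the task names and figures are made up, and `compare_rounds` is a hypothetical helper, not a Treejack feature.

```python
# Illustrative only: made-up Success scores (%) for the same tasks, run
# against the original tree (BEFORE) and the revised tree (AFTER).
BEFORE = {"Task 1": 45, "Task 2": 80, "Task 3": 30}
AFTER = {"Task 1": 70, "Task 2": 78, "Task 3": 55}


def compare_rounds(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Change in Success score per task; only tasks present in both rounds are compared."""
    return {task: after[task] - before[task] for task in before if task in after}


for task, delta in compare_rounds(BEFORE, AFTER).items():
    print(f"{task}: {delta:+} points")
```

Keeping the tasks identical between rounds is what makes these differences meaningful; a reworded task would measure something slightly different.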
If you are creating a new design, run an open card sort to generate ideas for categorising your content and labelling your categories. Once you’ve created a draft IA based on the results of the card sort and your site requirements, run a tree test to see how it performs based on common user tasks.
If you have more than one potential design, test them all with tree testing instead of putting all your effort into trying to perfect just one. Then you’ll have data showing you which trees perform best, and the labels and groupings that trip people up.
Your tree is a text-only version of your website hierarchy (similar to a sitemap). Your category labels (first level, second level, and so on) are known as ‘parent nodes’, and your information labels are known as ‘child nodes’. If you don’t already have access to a sitemap, you can build the tree from the labels and structure of your website. Either way, here are four top tips for getting your tree right.
Creating your tree in a spreadsheet and then importing it into Treejack is a simple way to build the tree you want to test. You may already have access to a spreadsheet sitemap if you’re improving an existing website. Having your tree in a spreadsheet will make testing different versions of the same tree quick and simple.
You can also build your tree easily in Treejack itself. You might find this useful if you’re creating a brand new structure to test, or making small changes to a tree you’ve already built. You can download a CSV file of the tree you build to have a spreadsheet copy as well. To retest the same tree, you can simply replicate the first test and make your changes within Treejack before relaunching.
A tree test will give you data on the specific section of your website that your tree represents. If you want to test your whole website, your first label might say ‘Home’, and the next level labels will represent the links on your homepage:
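If you keep your tree in a spreadsheet, it can help to check its structure before importing it. The sketch below assumes a hypothetical layout, not necessarily Treejack's import format: one label per row, placed in the column that matches its level, so a row's parent is the nearest earlier row one column to the left. `Node`, `parse_tree`, and `show` are illustrative names.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def parse_tree(csv_text: str) -> Node:
    """Build a tree from rows where the column a label sits in is its level.

    Hypothetical layout: one non-empty cell per row; a row's parent is the
    most recent earlier row that sits one column to the left.
    """
    root = Node("(tree)")  # synthetic root so several first-level rows are allowed
    stack = [root]  # stack[d] is the most recent node at depth d
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [(col, text.strip()) for col, text in enumerate(row) if text.strip()]
        if not cells:
            continue  # ignore blank rows
        col, label = cells[0]
        depth = col + 1  # column 0 holds first-level labels
        if depth > len(stack):
            raise ValueError(f"'{label}' skips a level")
        node = Node(label)
        stack[depth - 1].children.append(node)
        del stack[depth:]  # forget deeper nodes from the previous branch
        stack.append(node)
    return root


SAMPLE = """Home,,
,About our company,
,,Our services
,,Our clients
,,Our location
"""


def show(node: Node, indent: int = 0) -> None:
    """Print the tree with two spaces of indentation per level."""
    for child in node.children:
        print("  " * indent + child.label)
        show(child, indent + 1)


show(parse_tree(SAMPLE))
```

Raising an error on a skipped level catches a common spreadsheet slip (a label pasted one column too far right) before it turns into a confusing tree.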
And if you want to test one particular section of your website instead of everything, your first label will act as your ‘homepage’, and your next level labels will represent the main links on that page (in this example, we only wanted to test the ‘Support Centre’):
Tree testing assesses how easily people can find information on your tree, so each task you ask participants to complete will need a correct answer. This answer will be a ‘child node’, and will represent the information you want your website visitors to be able to find easily.
Dave O’Brien points out that ‘any implicit in-page content should be turned into explicit topics in the tree, so that participants can ‘see’ and select those topics.’ For example, you might have a long-scrolling page labelled ‘About our company’ that has three types of information on it: ‘Our services’, ‘Our clients’, and ‘Our location’:
If someone visited your actual website to find out where your office is, for example, they’d only need to click ‘About our company’ and scroll the page to see the information they wanted.
In Treejack, by contrast, you would make ‘About our company’ a parent node in your tree, with the three types of information as its children, and select ‘Our location’ as the correct destination for this task. So this part of your tree would look like this:
Your participants would see this part of the tree like this:
And they would be able to select ‘Our location’ to complete the task successfully:
You can have the best of both worlds because Treejack data shows you the path participants took. You’ll still be able to assess the effectiveness of ‘About our company’ as a parent node, even when you make the content of the page your child nodes.
You want to test the ‘findability’ of your information, so exclude labels from your tree that participants might select as an easy way out of finding the actual answer. Labels like ‘Contact us’, ‘Help’, and ‘Search’ are useful options for people when they use your actual website, but if people select these in your test you won’t be able to infer anything from the data.
Getting the labelling and hierarchy of your content right will reduce the need for people to contact you or use the search function, so that’s something to keep in mind as well.
Getting the tasks right is vital for gathering useful data. You want the tasks to reflect how your users might naturally approach your website, and you also want to make sure you don’t give the answer away by using the same language that’s in your tree. Your tasks will be linked to your testing objectives.
Take the evidence you gathered when setting your objectives, and write tasks that enable you to improve each one.
For example, you might know that your contact centre has been inundated with calls asking where to find ‘Form X’. If you therefore have a task that says ‘You need to complete X application, and you want to know if you can do this online’, the data you gather will tell you:
Tree testing helps you discover if people can find information in your website structure, so every task needs at least one correct destination. If you’re tree testing a large website, particularly one that’s been updated haphazardly over the years, you’ll probably find that information is repeated on different pages, and so you’ll need to select more than one correct destination. Go through the tree thoroughly to make sure you get them all.
You can’t select category labels (or ‘parent nodes’) as correct destinations. Although we do want to know if our category labels make sense, we ultimately want to know if people can navigate those labels to find the right information. This is explained in Step Two as well.
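If your tree lives in a spreadsheet or script, you can check your list of correct destinations against these two rules before launching: every destination must be a child (leaf) node, and repeated information may need several destinations. Here is a minimal sketch, assuming a hypothetical nested-dict representation of the tree (the ‘Support’ section is invented for illustration); identifying destinations by their full path keeps two identically named labels in different sections distinct.

```python
# Hypothetical tree as nested dicts: a label whose value is an empty dict is a
# child (leaf) node; a label with children is a parent (category) node.
TREE = {
    "Home": {
        "About our company": {
            "Our services": {},
            "Our clients": {},
            "Our location": {},
        },
        "Support": {
            "Our location": {},  # the same information repeated in another section
        },
    },
}


def leaf_paths(tree: dict, prefix: tuple[str, ...] = ()) -> set[tuple[str, ...]]:
    """Return the full path (from the top of the tree) to every leaf node."""
    paths: set[tuple[str, ...]] = set()
    for label, children in tree.items():
        path = prefix + (label,)
        if children:
            paths |= leaf_paths(children, path)
        else:
            paths.add(path)
    return paths


def invalid_destinations(tree: dict, correct: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Return nominated correct destinations that are parent nodes or don't exist."""
    leaves = leaf_paths(tree)
    return [path for path in correct if path not in leaves]


print(invalid_destinations(TREE, [
    ("Home", "About our company", "Our location"),
    ("Home", "Support", "Our location"),
    ("Home", "About our company"),  # a parent node, so it gets flagged
]))
```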
You want your tasks to mimic the thought a person might have when they visit your website. So write in a natural, plain English style, and introduce a hypothetical scenario for people to bring to mind.
So instead of writing ‘Click where you think you’d find our office’ you would write ‘You’ve booked a meeting with one of our staff, and you want to find out where to go.’ Keeping the tone conversational will make the task less ‘Click the thing’ and more ‘Find the right thing’.
People will take whatever language clues they can get from your task to make it easier for them. So if you have the phrase ‘application form’ in your task and on your tree, you risk ‘giving away’ the answer. There are two problems with this.
People who select the correct destination might just be matching the phrases rather than actually deciding the information is correct. So if this task scores highly, you can’t infer much from the data.
On top of this, it’s very difficult to know exactly what people have in mind when they arrive on our website, and almost impossible for our labels to match the language every one of our users brings to our website. Here’s an example of what I mean.
Let’s say you have a link on your website labelled ‘Application form for X’. And three people see that link and think ‘Yes, that’s exactly what I need!’. Those three people could have had the following three questions or tasks in their minds:
Each task will give you data on a different part of your tree, so match the number of tasks to the parts of your tree you want to test. And as each task is scored individually, you can have as few as one task if you have one specific part of the tree you want to test.
We recommend a maximum of 10 tasks per tree test for two reasons: More tasks might mean fewer participants complete the entire test. And you run the risk of people becoming too familiar with your tree, which would bias the results for your later tasks.
If you do set more than 8 or so tasks, we recommend selecting the option to randomize the order the tree is presented to people. Then for each task, people will see the labels arranged in a different order.
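To picture what ‘a different order for each task’ means, here is a minimal sketch that shuffles the order of sibling labels at every level while leaving the parent/child structure untouched. It is not how Treejack implements the option; the nested-dict tree and its labels are hypothetical.

```python
import random


def shuffled_tree(tree: dict, rng: random.Random) -> dict:
    """Copy a nested-dict tree, shuffling the order of siblings at every level.

    Parent/child relationships are unchanged; only the display order differs.
    """
    labels = list(tree)
    rng.shuffle(labels)
    return {label: shuffled_tree(tree[label], rng) for label in labels}


TREE = {"Home": {"Tickets": {}, "News": {}, "Shop": {"Kits": {}, "Gifts": {}}}}

rng = random.Random()  # unseeded, so each task gets its own random order
for task_number in range(1, 4):
    order = list(shuffled_tree(TREE, rng)["Home"])
    print(f"Task {task_number}: {order}")
```

Because only the order changes, a participant who learned *where* a label lives in an earlier task still benefits, but can no longer rely on its position on screen.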
If you want to gather more data from a test on the same tree, you could set up a separate test and either recruit different participants, or ask the same people after some time has passed.
The quality of your participants, and therefore the quality of your data, is an important thing to consider when you start recruiting. You want people who are as close to the right demographics as possible, and who are keen and willing to take the activity seriously. Here are three top tips.
You can recruit participants in a bunch of different ways, and how you do so will depend on a few different factors. If you have access to a pool of participants (like employees if you’re working on an internal product, or your customer mailing list) then sending them an email invitation, along with an incentive or chance to win a prize, can be a useful way to get responses. Similarly, you could invite people via your social media channels or add banners to your website.
Keep in mind that if people don’t receive an incentive or are not obligated to participate, you’ll need to invite a whole lot more people than your minimum required.
You can also make use of high-quality recruitment panels, which can be effective if you want fast, pain-free options with minimal effort. You can recruit participants from quite specific demographics, and be confident that the participants will take your study seriously (they are getting paid, after all).
After you’ve launched your tree test, you’ll be given the option to recruit participants via our integrated panel from within your Optimal Workshop account. You’ll then enter your required demographics and be presented with a quote based on the types of participants and the complexity of your study. After hitting Go, you can sit back and watch the results come in while you get on with other work.
Tree testing is a new concept and activity for many people, which makes the clarity of your instructions particularly important. Treejack comes with default instructions that we’ve found are easy for people to understand, and you can tailor them to suit your own participants or the particular features of your tree test. You can do the same thing with the text you write in your invitation email, your tweet, and your Facebook post.
Managing participant expectations will reduce abandonment rates because people will know what they’re in for before accepting. If you say an activity will take 5 minutes at the most, people will take that quite literally. Offering an incentive is a good way to let your participants know how much you value their contribution, as well as ensuring they are committed to completing the whole activity.
We recommend getting at least 30 participants to complete your tree tests, and ideally around 50, so that you can see trends clearly and account for variations and outliers. In Treejack, the more participants you get, the more confident you can be in the accuracy of your quantitative data.
However, you can still get plenty of useful insights with fewer participants, so even if you only have a limited number of participants or limited time, or if you’re on the free plan with a maximum of 10 participants, running a tree test is definitely worthwhile.
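One way to see why more participants means more confidence is to put a range around a task's Success score. The sketch below uses the Wilson score interval, a standard statistical technique swapped in here for illustration (this guide doesn't say Treejack reports it), for a hypothetical task where 70% of participants succeed.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("need at least one participant")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# Hypothetical task where 70% of participants succeed, at three sample sizes.
for n in (10, 30, 50):
    low, high = wilson_interval(round(0.7 * n), n)
    print(f"n={n}: plausible success rate {low:.0%} to {high:.0%}")
```

The range narrows as the number of participants grows, which is the practical meaning of ‘seeing trends clearly’; with 10 participants the range is wide, but a very low or very high score can still point you to a problem.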
Tree testing data is presented in a series of tables and visualizations in Treejack, and is downloadable in either a raw or customized spreadsheet format. The results are designed to give you quick, actionable insights and enable you to delve deeper into the data. Something for everyone.
Tasks are scored individually, so after reviewing the overall results and assessing the validity of your participants, you’ll find it effective to analyze tasks one by one.
In the following sections, we refer to two examples: results from a high-scoring task, and results from a low-scoring task. The high-scoring task is from a study on Manchester United’s official website, which you can read about over here. The low-scoring task is from a study on the USA’s official tax website, which you can read about over here.
The results Overview tells you all the need-to-know information about your entire tree test. It updates in real time so you and anyone you’ve shared your results with can see progress in terms of participants, scores, and time taken.
The Success score is the average percentage of participants who got to the correct destination across all tasks. The Directness score is the average percentage of participants who selected a destination without backtracking. We’ll go into more detail about these scores in the Task results below.
The Participants table displays useful information about every participant who started your tree test, and can be used to clean and filter your tree test data.
Before delving into your results, take some time to review your participants to make sure they are all worth including. You can sort the table according to Time Taken, and exclude participants that you think finished the tree test too quickly, or took too long. Participants who fail to complete all tasks are automatically excluded, but you might decide to include them if you consider their completed tasks to be useful. You have control over who to include and exclude.
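If you download the raw data, a simple rule can shortlist participants worth a second look before you decide who to exclude. Here is a minimal sketch, assuming you have each participant's total time in seconds; the cut-offs (a quarter of the median and four times the median) are arbitrary example values, not a Treejack rule or a recommendation, and the IDs and times are made up.

```python
from statistics import median


def flag_by_time(times: dict[str, float], low: float = 0.25, high: float = 4.0) -> set[str]:
    """Flag participants whose time taken is far below or above the median.

    The 0.25x and 4x cut-offs are arbitrary example values.
    """
    mid = median(times.values())
    return {pid for pid, secs in times.items() if secs < low * mid or secs > high * mid}


# Hypothetical total times in seconds, keyed by participant ID.
TIMES = {"P1": 240, "P2": 30, "P3": 300, "P4": 1500, "P5": 260}
print(sorted(flag_by_time(TIMES)))
```

Using the median rather than the average stops one very slow participant from shifting the cut-offs. Treat the flagged IDs as candidates to review, not automatic exclusions.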
You can also use this table to select and reload results based on answers to questionnaires, or based on identifiers you’ve requested. You can do this any time during your analysis.
Analyzing tasks one by one will help you to see clear links across the data visualizations, and the overall Task result is where you’ll start.
It gives you a success and directness score for each task, the average time participants took to complete the task, and a breakdown of whether each success, fail, or task skip was direct or indirect. Each task also receives an overall score out of 10 (with a score of 8 or above considered well-performing).
The Success score tells you what percentage of participants went to the correct destination on the tree, regardless of how direct they were. The Directness score tells you the percentage of people who went straight to a destination without backtracking, regardless of whether they were correct or incorrect.
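To make the two definitions concrete, here is a minimal sketch that computes both scores for one task from participants' click paths. It is not Treejack's implementation: the `Result` record is hypothetical, a path counts as direct if every click moves exactly one level down, and (as an assumption) any result without backtracking, including a direct skip, counts towards Directness.

```python
from dataclasses import dataclass

Path = list[tuple[str, ...]]  # each click recorded as the full label path from the top


@dataclass
class Result:
    path: Path
    outcome: str  # "success", "fail" or "skip"


def is_direct(path: Path) -> bool:
    """True if every click moves exactly one level down from the previous node."""
    return all(nxt[:-1] == cur for cur, nxt in zip(path, path[1:]))


def task_scores(results: list[Result]) -> tuple[int, int]:
    """Return (Success %, Directness %) for one task, rounded to whole numbers."""
    n = len(results)
    if n == 0:
        raise ValueError("no results for this task")
    successes = sum(r.outcome == "success" for r in results)
    # Assumption: any result without backtracking counts as direct, whatever its outcome.
    direct = sum(is_direct(r.path) for r in results)
    return round(100 * successes / n), round(100 * direct / n)


H, A, B = ("Home",), ("Home", "A"), ("Home", "B")
results = [
    Result([H, A, A + ("a1",)], "success"),        # direct success
    Result([H, B, H, A, A + ("a1",)], "success"),  # indirect success: backed out of B
    Result([H, B, B + ("b1",)], "fail"),           # direct failure
    Result([H], "skip"),                           # skipped without clicking
]
print(task_scores(results))
```

Because the two scores are counted independently, a task can have high Directness and low Success when many people head confidently in the wrong direction.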
We can see that 94% of people selected a correct destination, and that 74% of people went directly there without having to go back through the tree. Great news!
While the success rate is high, it would be worth looking more closely at why 20% of people had to click back through the tree before they selected a correct destination, and in particular, what made 5 people nominate the wrong destination as correct.
The large green circles and lines tell us instantly that most people went directly to the right destinations. In this example, we can see three green lines that lead to three yellow circles — this is because this task had three correct destinations on the tree. We could look at the tiny yellow circles at the end of the branching gray lines, and analyze why people might have incorrectly nominated these answers.
Also, look closely at the label represented by the small, totally-red circle branching out from the left of the Home circle (in this case, the label says ‘Official Membership’). The red represents an incorrect path, and the lack of blue shows that people kept going further before going back up the tree. Why might people have thought they were on the right track?
We can see quickly that only 15% of participants selected the correct answer, which is made up of 7 people (Indirect Success) who had to go back up the tree before they found the answer, and only 1 person who went directly to it. And the Directness score tells us that only 36% of participants went directly to an answer without clicking back through the tree. On top of this, people took an average of 23.3 seconds to finish the task, which is a long time in the world of the web.
We need to understand why 44 people (85%) selected an incorrect destination, and so our focus needs to be on the paths they took (their first clicks, when they went back up the tree, and so on) and their destinations (the answers they incorrectly nominated as correct). We can find out more about these things on the Pietree, the Paths tab, and the First Click tab.
So where did most people think they’d find the answer? On the pietree, the two largest circles representing second-level labels tell us that most people went to either ‘Credits and Deductions’ or ‘Help and Resources’. You can ask yourself here: Why did people expect to find out about tax benefits under these labels?
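The pietree is built from participants' paths. As a rough sketch, assuming a circle's size reflects how many participants passed through that label, and that each path ends at the label the participant nominated, the underlying counts could be tallied like this. It does not reproduce how Treejack computes or draws pietrees, and the paths are hypothetical.

```python
from collections import Counter

Path = list[tuple[str, ...]]  # clicks as full label paths; last entry is the nominated answer


def pietree_counts(paths: list[Path]) -> tuple[Counter, Counter]:
    """Per node: how many participants passed through it, and how many nominated it."""
    visited: Counter = Counter()
    nominated: Counter = Counter()
    for path in paths:
        visited.update(set(path))  # count each participant once per node
        nominated[path[-1]] += 1
    return visited, nominated


H, T, M = ("Home",), ("Home", "Tickets"), ("Home", "Membership")
PATHS = [
    [H, T, T + ("Prices",)],
    [H, M, H, T, T + ("Prices",)],  # went into Membership, then back up
    [H, M, M + ("Benefits",)],
]
visited, nominated = pietree_counts(PATHS)
print(visited.most_common())
print(nominated.most_common())
```

Counting each participant once per label (rather than once per click) keeps a participant who wanders back and forth from inflating a circle.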
This pietree tells us instantly that people went everywhere to find the correct answer, and that very few succeeded. See those branches going out in every direction? That represents people clicking all over the tree. And see that very skinny green line? That represents the number of people who went directly to the right answer. And see the very small yellow circle the green line leads to? That represents the number of people who selected the correct answer.
You have lots of potential starting points for improving this IA, so keep it simple by focusing on the big numbers. The large red circle in the middle shows the percentage of people who went down the wrong path from the Home page.
This indicates that our second-level labels caused a lot of confusion for people trying to complete this task. If we look back at our tree (and the options that caused so much confusion for people), we can see the exact labels that need work:
The Paths table shows us the exact path each participant took through the tree before selecting a destination. You can filter the table by indirect and direct success, indirect and direct failure, and indirect and direct skips, depending on which scores you consider worth exploring in depth.
Even though this is a high-scoring task, it’s worth exploring indirect paths and direct failures in more depth. In this example, which is filtered, we can see that although Participant 14 got to the correct destination, they moved around the tree a lot before they got there. And when we look at the paths of the five people who went directly to an incorrect destination, we can ask ourselves which label made them think they were on the right track.
When we filter the Paths to only show the Direct Failure results, we can look closely at the paths to see which labels led people in the wrong direction.
And when we filter by Indirect Success, we can look in more detail at the labels that people clicked and then moved back from. In these situations, people have seen the options and decided they’d gone the wrong way. You can ask yourself why people may have gone back from these labels.
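If you download the raw path data, you can ask the same question across every participant at once. Here is a minimal sketch, assuming hypothetical rows with an outcome category and a click path (each click stored as its full label path), that filters to Indirect Success and counts the labels people clicked and then backed out of.

```python
from collections import Counter

# Hypothetical Paths-table rows; each click is the full label path from the top.
ROWS = [
    {"id": "P1", "category": "Indirect Success",
     "path": [("Home",), ("Home", "News"), ("Home",), ("Home", "Tickets"),
              ("Home", "Tickets", "Prices")]},
    {"id": "P2", "category": "Direct Success",
     "path": [("Home",), ("Home", "Tickets"), ("Home", "Tickets", "Prices")]},
    {"id": "P3", "category": "Indirect Success",
     "path": [("Home",), ("Home", "News"), ("Home",), ("Home", "Shop"), ("Home",),
              ("Home", "Tickets"), ("Home", "Tickets", "Prices")]},
]


def filter_paths(rows: list[dict], category: str) -> list[dict]:
    """Keep only the rows in one outcome category, like filtering the Paths table."""
    return [row for row in rows if row["category"] == category]


def backed_out_of(rows: list[dict]) -> Counter:
    """Count, per label, how many participants clicked it and then moved back up."""
    counts: Counter = Counter()
    for row in rows:
        path = row["path"]
        # A click that isn't a child of the previous node means they went back.
        counts.update({cur for cur, nxt in zip(path, path[1:]) if nxt[:-1] != cur})
    return counts


print(backed_out_of(filter_paths(ROWS, "Indirect Success")).most_common())
```

Swapping the category to "Direct Failure" and looking at each row's last click gives a similar shortlist for the wrong answers people chose without hesitation.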
First clicks matter. Two pieces of research (including one using Treejack data from millions of participants) have shown that when participants get their first click right in task-based usability tests, they are at least twice as likely to succeed, and sometimes three times as likely. This table gives you that first-click data directly.
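The first-click percentages are a simple tally of where each participant went first. Here is a minimal sketch, assuming hypothetical click paths that all start at the top node; whether participants who never clicked past the top node belong in the denominator is an assumption this sketch makes (it leaves them out), not something the table documents.

```python
from collections import Counter


def first_click_shares(paths: list[list[str]]) -> dict[str, int]:
    """Percentage of participants whose first click below the top node was each label.

    Assumption: participants who never clicked past the top node are left out
    of the denominator.
    """
    firsts = Counter(path[1] for path in paths if len(path) > 1)
    total = sum(firsts.values())
    if total == 0:
        return {}
    return {label: round(100 * count / total) for label, count in firsts.most_common()}


# Hypothetical click paths, each starting at the top node.
PATHS = [
    ["Home", "Credits & Deductions", "Credits"],
    ["Home", "Credits & Deductions"],
    ["Home", "Help & Resources", "Publications"],
    ["Home", "Credits & Deductions", "Deductions"],
]
print(first_click_shares(PATHS))
```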
Here we have a very clear winner for the first place people clicked: 92% clicked ‘Tickets and Prices’. And as this was the correct path, we can be confident that this label leads people in the right direction.
People were asked to find information about special tax credits for military employees — this table tells us that 44% of them expected to find that information by clicking on the label ‘Credits & Deductions’, 13% thought ‘Help & Resources’ would take them there, and 12% thought ‘For tax pros’ was the right way to go. This is very useful data when deciding on the most effective label for that part of your website.
We hope you’ve found this Tree Testing 101 informative and inspiring. Here are a few things you could do next to explore this technique in more depth.