December 5, 2022

Live training: How to benchmark an existing site structure using Treejack

If you missed our live training, don’t worry, we’ve got you covered! In this session, our product experts Katie and Aidan discuss why, how and when to benchmark an existing structure using Treejack.

They also talk through some benchmarking use cases, demo how to compare tasks between different studies, and explain which results are most helpful.

Author: Sarah Flutey

Related articles


Behind the scenes of UX work on Trade Me's CRM system

We love getting stuck into scary, hairy problems to make things better here at Trade Me. One challenge for us in particular is how best to navigate customer reaction to any change we make to the site, the app, the terms and conditions, and so on. Our customers are passionate both about the service we provide — an online auction and marketplace — and its place in their lives, and are rightly forthcoming when they're displeased or frustrated. We therefore rely on our Customer Service (CS) team to give customers a voice, and to respond with patience and skill to customer problems ranging from incorrectly listed items to reports of abusive behavior.

The CS team uses a Customer Relationship Management (CRM) system, Trade Me Admin, to monitor support requests and manage customer accounts. As the spectrum of Trade Me's services and the complexity of the public website have grown rapidly, the CRM system has, to be blunt, been updated in ways which have not always been the prettiest. Links for new tools and reports have simply been added to existing pages, and old tools for services we no longer operate have not always been removed. Thus, our latest focus has been to improve the user experience of the CRM system for our CS team.

And though on the surface it looks like we're working on a product with only 90 internal users, our changes will have flow-on effects for tens of thousands of our members at any given time (out of a total of around 3.6 million members).

The challenges of designing customer service systems

We face unique challenges designing customer service systems. Robert Schumacher from GfK summarizes these problems well. I’ve paraphrased him here and added an issue of my own:

1. Customer service centres are high volume environments — Our CS team has thousands of customer interactions every day, and each team member travels similar paths in the CRM system.

2. Wrong turns are amplified — With so many similar interactions, a system change that adds a minute more to processing customer queries could slow down the whole team and result in delays for customers.

3. Two people relying on the same system — When the CS team takes a phone call from a customer, the CRM system is serving both people: the CS person who is interacting with it, and the caller who directs the interaction. Trouble is, the caller can't see the paths the system is forcing the CS person to take. For example, in a previous job a client’s CS team would always ask callers two or three extra security questions — not to confirm identities, but to cover up the delay between answering the call and the right page loading in the system.

4. Desktop clutter — As a result of the plethora of tools and reports and systems, the desktop of the average CS team member is crowded with open windows and tabs. They have to remember where things are and also how to interact with the different tools and reports, all of which may have been created independently (i.e. they work differently). This presents quite the cognitive load.

5. CS team members are expert users — They use the system every day, and will all have their own techniques for interacting with it quickly and accurately. They've also probably come up with their own solutions to system problems, which they might be very comfortable with. As Schumacher says, 'A critical mistake is to discount the expert and design for the novice. In contact centers, novices become experts very quickly.'

6. Co-design is risky — Co-design workshops, where the users become the designers, are all the rage, and are usually pretty effective at getting great ideas quickly into systems. But expert users almost always end up regurgitating the system they're familiar with, as they've been trained by repeated use of systems to think in fixed ways.

7. Training is expensive — Complex systems require more training, so if your call centre has high churn (ours doesn’t – most staff stick around for years) then you’ll be spending a lot of money.

…and the one I’ve added:

8. Powerful does not mean easy to learn — The ‘it must be easy to use and intuitive’ design rationale is often the cause of badly designed CRM systems. Designers mistakenly design something simple when they should be designing something powerful. Powerful is complicated, dense, and often less easy to learn, but once mastered lets staff really motor.

Our project focus

Our improvement of Trade Me Admin is focused on fixing the shattered IA and restructuring the key pages to make them perform even better, bringing them into a new code framework. We're not redesigning the reports, tools, code or even the interaction for most of the reports, as this would be many years of effort. Watching our own staff use Trade Me Admin is like watching someone juggling six or seven things.

The system requires them to visit multiple pages, hold multiple facts in their head, pattern and problem-match across those pages, and follow their professional intuition to get to the heart of a problem. Where the system works well is on some key, densely detailed hub pages. Where it works badly, staff have to navigate click farms with arbitrary link names, have to type across the URL to get to hidden reports, and generally expend more effort on finding the answer than on comprehending the answer.

Groundwork

The first thing that we did was to sit with CS and watch them work and get to know the common actions they perform. The random nature of the IA and the plethora of dead links and superseded reports became apparent. We surveyed teams, providing them with screen printouts and three highlighter pens to colour things as green (use heaps), orange (use sometimes) and red (never use). From this, we were able to immediately remove a lot of noise from the new IA. We also saw that specific teams used certain links but that everyone used a core set. Initially focussing on the core set, we set about understanding the tasks under those links.
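To show how a highlighter survey like this can turn into a pruning decision, here is a minimal Python sketch. The link names (other than 'SQL Lookup'), the responses, and the rule that a link is a removal candidate only when nobody marked it green or orange are all assumptions for illustration, not our actual data or criteria.

```python
from collections import Counter

# Hypothetical survey responses: one dict per staff member, mapping
# link name -> highlighter colour ("green", "orange" or "red").
responses = [
    {"Member search": "green", "SQL Lookup": "orange", "Old fax report": "red"},
    {"Member search": "green", "SQL Lookup": "green", "Old fax report": "red"},
    {"Member search": "green", "SQL Lookup": "red", "Old fax report": "red"},
]


def tally_colours(responses):
    """Count how often each link was marked with each colour."""
    tallies = {}
    for response in responses:
        for link, colour in response.items():
            tallies.setdefault(link, Counter())[colour] += 1
    return tallies


def removal_candidates(tallies):
    """Links nobody marked green or orange (an assumed pruning rule)."""
    return sorted(
        link
        for link, counts in tallies.items()
        if counts["green"] == 0 and counts["orange"] == 0
    )


tallies = tally_colours(responses)
print(removal_candidates(tallies))
```

Running the same tally per team rather than across everyone would be one way to separate team-specific links from the core set.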

The complexity of the job soon became apparent – with a complex system like Trade Me Admin, it is possible to do the same thing in many different ways. Most CRM systems are complex and detailed enough for there to be more than one way to achieve the same end and often, it’s not possible to get a definitive answer, only possible to ‘build a picture’. There’s no one-to-one mapping of task to link. Links were also often arbitrarily named: ‘SQL Lookup’ being an example. The highly-trained user base are dependent on muscle memory in finding these links. This meant that when asked something like: “What and where is the policing enquiry function?”, many couldn’t tell us what or where it was, but when they needed the report it contained they found it straight away.

Sort of difficult

Therefore, it came as little surprise that staff found the subsequent card sort task quite hard. We renamed the links to better describe their associated actions, and of course, they weren't in the same location as in Trade Me Admin. So instead of taking the predicted 20 minutes, the sort was taking upwards of 40 minutes. Not great when staff are supposed to be answering customer enquiries!

We noticed some strong trends in the results, with links clustering around some of the key pages and tasks (like 'member', 'listing', 'review member financials', and so on). The results also confirmed something that we had observed — that there is a strong split between two types of information: emails/tickets/notes and member info/listing info/reports.

We built and tested two IAs

[Image: pietree results from tree testing]

After card sorting, we created two new IAs, and then customized one of the IAs for each of the three CS teams, giving us IAs to test. Each team was then asked to complete two tree tests, with 50% doing one first and 50% doing the other first. At first glance, the results of the tree test were okay — around 61% — but 'Could try harder'. We saw very little overall difference between the success of the two structures, but definitely some differences in task success. And we also came across an interesting quirk in the results.

Closer analysis of the pie charts with an expert in Trade Me Admin showed that some ‘wrong’ answers would give part of the picture required. In some cases so much so that I reclassified answers as ‘correct’ as they were more right than wrong. Typically, in a real world situation, staff might check several reports in order to build a picture. This ambiguous nature is hard to replicate in a tree test which wants definitive yes or no answers. Keeping the tasks both simple to follow and comprehensive proved harder than we expected.

For example, we set a task that asked participants to investigate whether two customers had been bidding on each other's auctions. When we looked at the pietree (see the screenshot), we noticed some participants had clicked on 'Search Members', thinking they needed to locate the customer accounts, when the task had presumed that the customers had already been found. This is a useful insight into writing more comprehensive tasks that we can take with us into our next tests.

What’s clear from analysis is that although it’s possible to provide definitive answers for a typical site’s IAs, for a CRM like Trade Me Admin this is a lot harder. Devising and testing the structure of a CRM has proved a challenge for our highly trained audience, who are used to the current system and naturally find it difficult to see and do things differently. Once we had reclassified some of the answers as ‘correct’ one of the two trees was a clear winner — it had gone from 61% to 69%. The other tree had only improved slightly, from 61% to 63%.
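The effect of reclassifying answers on a success score can be sketched in a few lines of Python. Everything here is hypothetical: the task names, the destinations (apart from 'Search Members' and 'Review member financials', which appear in our trees) and the numbers are made up to show the arithmetic, and are not our study data.

```python
def success_rate(results, correct):
    """Share of responses whose destination is accepted for that task."""
    hits = sum(1 for task, destination in results if destination in correct[task])
    return hits / len(results)


# Hypothetical responses: (task, destination the participant chose).
results = [
    ("bidding on each other", "Bidding report"),
    ("bidding on each other", "Member > Bid history"),
    ("bidding on each other", "Search Members"),
    ("refund query", "Review member financials"),
    ("refund query", "Notes"),
]

# Strict marking: one accepted destination per task.
strict = {
    "bidding on each other": {"Bidding report"},
    "refund query": {"Review member financials"},
}

# After expert review: a 'more right than wrong' answer is also accepted.
lenient = {
    "bidding on each other": {"Bidding report", "Member > Bid history"},
    "refund query": {"Review member financials"},
}

print(f"strict:  {success_rate(results, strict):.0%}")
print(f"lenient: {success_rate(results, lenient):.0%}")
```

Keeping both sets of accepted answers side by side makes it clear how much of any improvement comes from the reclassification itself.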

There were still elements of our winning structure that were performing sub-optimally, though. Generally, the problems were to do with labelling, where, in some cases, we had attempted to disambiguate those ‘SQL lookup’-type labels but in the process, confused the team. We were left with the dilemma of whether to go with the new labels and make the system initially harder to use for existing staff but easier to learn for new staff, or stick with the old labels, which are harder to learn. My view is that any new system is going to see an initial performance dip, so we might as well change the labels now and make it better.

The importance of carefully structuring questions in a tree test has been highlighted, particularly in light of the ‘start anywhere/go anywhere’ nature of a CRM. The diffuse but powerful nature of a CRM means that careful consideration of tree test answer options needs to be made, in order to decide ‘how close to 100% correct answer’ you want to get.

Development work has begun so watch this space

It's great to see that our research is influencing the next stage of the CRM system, and we're looking forward to seeing it go live. Of course, our work isn't over, and nor would we want it to be! Alongside the redevelopment of the IA, I've been redesigning the key pages from Trade Me Admin, and continuing to conduct user research, including first click testing using Chalkmark.

This project has been governed by a steadily developing set of design principles, focused on complex CRM systems and the specific needs of their audience. Two of these principles are to reduce navigation and to design for experts, not novices, which means creating dense, detailed pages. It's intense, complex, and rewarding design work, and we'll be exploring this exciting space in more depth in upcoming posts.


How to Spot and Destroy Evil Attractors in Your Tree (Part 1)

Usability guru Jared Spool has written extensively about the 'scent of information'. This term describes how users are always 'on the hunt' through a site, click by click, to find the content they’re looking for. Tree testing helps you deliver a strong scent by improving organisation (how you group your headings and subheadings) and labelling (what you call each of them).

Anyone who’s seen a spy film knows there are always false scents and red herrings to lead the hero astray. And anyone who’s run a few tree tests has probably seen the same thing — headings and labels that lure participants to the wrong answer. We call these 'evil attractors'.

In Part 1 of this article, we’ll look at what evil attractors are, how to spot them at the answer end of your tree, and how to fix them. In Part 2, we’ll look at how to spot them in the higher levels of your tree.

The false scent — what it looks like in practice

One of my favourite examples of an evil attractor comes from a tree test we ran for consumer.org.nz, a New Zealand consumer-review website (similar to Consumer Reports in the USA). Their site listed a wide range of consumer products in a tree several levels deep, and they wanted to try out a few ideas to make things easier to find as the site grew bigger.

We ran the tests and got some useful answers, but we also noticed there was one particular subheading (Home > Appliances > Personal) that got clicks from participants looking for very different things — mobile phones, vacuum cleaners, home-theatre systems, and so on:

[Image: tree test results showing clicks on Home > Appliances > Personal from unrelated tasks]

The website intended the Personal appliance category to be for products like electric shavers and curling irons. But apparently, Personal meant many things to our participants: they also went there for 'personal' items like mobile phones and cordless drills that actually lived somewhere else.

This is the false scent — the heading that attracts clicks when it shouldn’t, leading participants astray. Hence this definition: an evil attractor is a heading that draws unwanted traffic across several unrelated tasks.

Evil attractors lead your users astray

Attracting clicks isn’t a bad thing in itself. After all, that’s what a good heading does — it attracts clicks for the content it contains (and discourages clicks for everything else). Evil attractors, on the other hand, attract clicks for things they shouldn’t. These attractors lure users down the wrong path, and when users find themselves in the wrong place they'll either back up and try elsewhere (if they’re patient) or give up (if they’re not). Because these attractor topics are magnets for the user’s attention, they make it less likely that your user will get to the place you intended. The other evil part of these attractors is the way they hide in the shadows. Most of the time, they don’t get the lion’s share of traffic for a given task. Instead, they’ll poach 5–10% of the responses, luring away a fraction of users who might otherwise have found the right answer.

Find evil attractors easily in your data

The easiest attractors to spot are those at the answer end of your tree (where participants ended up for each task). If we can look across tasks for similar wrong answers, then we can see which of these might be evil attractors.

In your Treejack results, the Destinations tab lets you do just that. Here’s more of the consumer.org.nz example:

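[Image: Treejack Destinations tab for the consumer.org.nz study, with the Personal row highlighted in yellow]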

Normally, when you look at this view, you’re looking down a column for big hits and misses for a specific task. To look for evil attractors, however, you’re looking for patterns across rows. In other words, you’re looking horizontally, not vertically. If we do that here, we immediately notice the row for Personal (highlighted yellow). See all those hits along the row? Those hits indicate an attractor — steady traffic across many tasks that seem to have little in common. But remember, traffic alone is not enough. We’re looking for unwanted traffic across unrelated tasks. Do we see that here? Well, it looks like the tasks (about cameras, drills, laptops, vacuums, and so on) are not that closely related. We wouldn’t expect users to go to the same topic for each of these. And the answer they chose, Personal, certainly doesn’t seem to be the destination we intended. While we could rationalise why they chose this answer, it is definitely unwanted from an IA perspective. So yes, in this case, we seem to have caught an evil attractor red-handed. Here’s a heading that’s getting steady traffic where it shouldn’t.
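If you export your destination counts, the 'look along the rows' check can be roughed out in code. The Python sketch below is only an illustration: the data layout, the task names, the counts and the two thresholds (`min_tasks`, `min_share`) are assumptions of ours, not something Treejack provides. It flags headings that pick up a small but steady share of wrong answers across several tasks.

```python
def find_attractor_candidates(destinations, correct, min_tasks=3, min_share=0.05):
    """Return {heading: [tasks]} for headings that attract wrong answers.

    destinations: {task: {heading: number of participants who ended there}}
    correct:      {task: set of headings that count as correct}
    """
    hits = {}
    for task, counts in destinations.items():
        total = sum(counts.values())
        for heading, n in counts.items():
            if total == 0 or heading in correct[task]:
                continue  # only unwanted traffic is of interest
            if n / total >= min_share:
                hits.setdefault(heading, []).append(task)
    # Keep headings that show up across enough different tasks.
    return {h: tasks for h, tasks in hits.items() if len(tasks) >= min_tasks}


# Hypothetical destination counts for four tasks (50 participants each).
destinations = {
    "camera": {"Electronics > Cameras": 40, "Personal": 5, "Computers": 5},
    "drill": {"Tools": 42, "Personal": 4, "Garden": 4},
    "laptop": {"Computers": 44, "Personal": 3, "Electronics > Cameras": 3},
    "vacuum": {"Appliances > Cleaning": 45, "Personal": 5},
}
correct = {
    "camera": {"Electronics > Cameras"},
    "drill": {"Tools"},
    "laptop": {"Computers"},
    "vacuum": {"Appliances > Cleaning"},
}

candidates = find_attractor_candidates(destinations, correct)
print(candidates)
```

A check like this only finds steady traffic. Deciding whether the tasks are genuinely unrelated, and whether the traffic is really unwanted, is still a judgement call.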

Evil attractors are usually the result of ambiguity

It’s usually quite simple to figure out why an item in your tree is an evil attractor. In almost all cases, it’s because the item is vague or ambiguous — a word or phrase that could mean different things to different people. Look at our example above. In the context of a consumer-review site, Personal is too general to be a good heading. It could mean products you wear, or carry, or use in the bathroom, or a number of things. So, when those participants come along clutching a task, and they see Personal, a few of them think 'That looks like it might be what I’m looking for', and they go that way.

Individually, those choices may be defensible, but as an information architect, are you really going to group mobile phones with vacuum cleaners? The 'personal' link between them is tenuous at best.

Destroy evil attractors by being specific

Just as it’s easy to see why most attractors attract, it’s usually easy to fix them. Evil attractors trade in vagueness and ambiguity, so the obvious remedy is to make those headings more concrete and specific. In the consumer-site example, we looked at the actual content under the Personal heading. It turned out to be items like shavers, curling irons, and hair dryers. A quick discussion yielded Personal Care as a promising replacement — one that should deter people looking for mobile phones and jewellery and the like.

In the second round of tree testing, among the other changes we made to the tree, we replaced Personal with Personal Care. A few days later, the results confirmed our thinking. Our former evil attractor was no longer luring participants away from the correct answers:

[Image: second-round results, with Personal Care no longer attracting clicks from unrelated tasks]

Testing once is good, testing twice is magic

This brings up a final point about tree testing (and about any kind of user testing, really): you need to iterate your testing. Once is not enough.

The first round of testing shows you where your tree is doing well (yay!) and where it needs more work so you can make some thoughtful revisions. Be careful though. Even if the problems you found seem to have obvious solutions, you still need to make sure your revisions actually work for users, and don’t cause further problems. The good news is, it’s dead easy to run a second test, because it’s just a small revision of the first. You already have the tasks and all the other bits worked out, so it’s just a matter of making a copy in Treejack, pasting in your revised tree, and hooking up the correct answers. In an hour or two, you’re ready to pilot it again (to err is human, remember) and send it off to a fresh batch of participants.

Two possible outcomes await.

  • Your fixes are spot-on, the participants find the correct answers more frequently and easily, and your overall score climbs. You could have skipped this second test, but confirming that your changes worked is both good practice and a good feeling. It’s also something concrete to show your boss.
  • Some of your fixes didn’t work, or (given the tangled nature of IA work) they worked for the problems you saw in Round 1, but now they’ve caused more problems of their own. Bad news, for sure. But better that you uncover them now in the design phase (when it takes a few days to revise and re-test) instead of further down the track when the IA has been signed off and changes become painful.

Stay tuned for more on evil attractors

In Part 1, we’ve covered what evil attractors are and how to spot them at the answer end of your tree: that is, evil attractors that participants chose as their destination when performing tasks. Hopefully, a future version of Treejack will be able to highlight these attractors to make your analysis that much easier.

In Part 2, we’ll look at how to spot evil attractors in the intermediate levels of your tree, where they lure participants into a section of the site that you didn’t intend. These are harder to spot, but we’ll see if we can ferret them out.

Let us know if you've caught any evil attractors red-handed in your projects.


Web usability guide

There’s no doubt usability is a key element of all great user experiences, but how do we apply and test usability principles for a website? This article looks at usability principles in web design, how to test them, practical tips for success, and our remote testing tool, Treejack.

A definition of usability for websites 🧐📖

Web usability is defined as the extent to which a website can be used to achieve a specific task or goal by a user. It refers to the quality of the user experience and can be broken down into five key usability principles:

  • Ease of use: How easy is the website to use? How easily are users able to complete their goals and tasks? How much effort is required from the user?
  • Learnability: How easily are users able to complete their goals and tasks the first time they use the website?
  • Efficiency: How quickly can users perform tasks while using your website?
  • User satisfaction: How satisfied are users with the experience the website provides? Is the experience a pleasant one?
  • Impact of errors: Are users making errors when using the website and, if so, how serious are the consequences of those errors? Is the design forgiving enough to make it easy for errors to be corrected?

Why is web usability important? 👀

Aside from the obvious desire to improve the experience for the people who use our websites, web usability is crucial to your website’s survival. If your website is difficult to use, people will simply go somewhere else. In the cases where users do not have the option to go somewhere else, for example government services, poor web usability can lead to serious issues. How do we know if our website is well-designed? We test it with users.

Testing usability: What are the common methods? 🖊️📖✏️📚

There are many ways to evaluate web usability and here are the common methods:

  • Moderated usability testing: Moderated usability testing refers to testing that is conducted in-person with a participant. You might do this in a specialised usability testing lab or perhaps in the user’s contextual environment such as their home or place of business. This method allows you to test just about anything from a low fidelity paper prototype all the way up to an interactive high fidelity prototype that closely resembles the end product.
  • Moderated remote usability testing: Moderated remote usability testing is very similar to the previous method but with one key difference: the facilitator and the participant/s are not in the same location. The session is still a moderated two-way conversation, just over Skype or via a webinar platform instead of in person. This method is particularly useful if you are short on time or unable to travel to where your users are located, e.g. overseas.
  • Unmoderated remote usability testing: As the name suggests, unmoderated remote usability testing is conducted without a facilitator present. This is usually done online and provides the flexibility for your participants to complete the activity at a time that suits them. There are several remote testing tools available (including our suite of tools) and once a study is launched these tools take care of themselves, collating the results for you and surfacing key findings using powerful visual aids.
  • Guerilla testing: Guerilla testing is a powerful, quick and low cost way of obtaining user feedback on the usability of your website. Usually conducted in public spaces with large amounts of foot traffic, guerilla testing gets its name from its ‘in the wild’ nature. It is a scaled back usability testing method that usually only involves a few minutes for each test but allows you to reach large numbers of people and has very few costs associated with it.
  • Heuristic evaluation: A heuristic evaluation is conducted by usability experts to assess a website against recognized usability standards and rules of thumb (heuristics). This method evaluates usability without involving the user and works best when done in conjunction with other usability testing methods (e.g. moderated usability testing) to ensure the voice of the user is heard during the design process.
  • Tree testing: Also known as a reverse card sort, tree testing is used to evaluate the findability of information on a website. This method allows you to work backwards through your information architecture and test that thinking against real world scenarios with users.
  • First click testing: Research has found that 87% of users who start out on the right path from the very first click will be able to successfully complete their task while less than half (46%) who start down the wrong path will succeed. First click testing is used to evaluate how well a website is supporting users and also provides insights into design elements that are being noticed and those that are being ignored.
  • Hallway testing: Hallway testing is a usability testing method used to gain insights from anyone nearby who is unfamiliar with your project. These might be your friends, family or the people who work in another department down the hall from you. Similar to guerilla testing but less ‘wild’. This method works best at picking up issues early in the design process before moving on to testing a more refined product with your intended audience.

Online usability testing tool: Tree testing 🌲🌳🌿

Treejack is a remote usability testing tool that uses tree testing to help you discover exactly where your users are getting lost in the structure of your website. It uses a simplified text-based version of your website structure, removing distractions such as navigation and visual design, allowing you to test the design from its most basic level.

Like any other tree test, it uses task-based scenarios and includes the opportunity to ask participants pre- and post-study questions that can be used to gain further insights. Tree testing is a useful tool for testing those five key usability principles mentioned earlier, with powerful inbuilt features that do most of the heavy lifting for you. Tree testing records and presents the following for each task:

  • complete details of the pathways followed by each participant
  • the time taken to complete each task
  • first click data
  • the directness of each result
  • visibility on when and where participants skipped a task
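To make those measures concrete, here is a rough Python sketch of how they could be derived from raw participant paths. The `TaskResult` structure, the sample paths and the definitions used (first click as the first node chosen below the root, 'direct' as never revisiting a node) are assumptions for illustration, not the tool's actual data model or scoring rules.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    path: List[str]        # nodes visited in order, starting at the root
    seconds: float         # time taken on the task
    skipped: bool = False  # participant gave up on the task


def first_click(result):
    """First node chosen below the root, if any."""
    return result.path[1] if len(result.path) > 1 else None


def is_direct(result):
    """Assumed definition: no node visited twice, so no backtracking."""
    return not result.skipped and len(set(result.path)) == len(result.path)


def summarise(results):
    completed = [r for r in results if not r.skipped]
    mean = sum(r.seconds for r in completed) / len(completed) if completed else None
    return {
        "skipped": len(results) - len(completed),
        "direct": sum(is_direct(r) for r in completed),
        "mean_seconds": mean,
        "first_clicks": Counter(first_click(r) for r in completed),
    }


# Hypothetical results for one task.
results = [
    TaskResult(["Home", "Appliances", "Personal"], 12.0),
    TaskResult(["Home", "Electronics", "Home", "Appliances", "Personal"], 30.0),
    TaskResult(["Home"], 5.0, skipped=True),
]
summary = summarise(results)
print(summary)
```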

Participant paths data in our tree testing tool 🛣️

The level of detail recorded on the pathways followed by your participants makes it easy for you to determine the ease of use, learnability, efficiency and impact of errors of your website. The time taken to complete each task and the directness of each result also provide insights in relation to those four principles, and user satisfaction can be measured through the responses to your pre- and post-study questions.

The first click data brings in the added benefits of first click testing, and knowing when and where your participants gave up and moved on can help you identify any issues.

Another thing tree testing does well is bring all the data for each task together into one comprehensive overview that tells you everything you need to know at a glance.

[Image: Tree testing's task overview, all the key information in one place]

In addition to this, tree testing also generates comprehensive pathway maps called pietrees.

Each junction in the pathway is a pie chart showing a statistical breakdown of participant activity at that point in the site structure, including details about how many were on the right track, how many were following the incorrect path and how many turned around and went back. These beautiful diagrams tell the story of your usability testing and are useful for communicating the results to your stakeholders.
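The breakdown behind a single junction can be approximated from participant paths. In the Python sketch below, the tree (given as a child-to-parent map), the correct path and the participant paths are all hypothetical, and counting every move made from a node is just one assumed way to slice the data. The real pietree may well count things differently.

```python
from collections import Counter


def junction_breakdown(paths, node, parent, correct_path):
    """Classify every move participants made from `node`."""
    counts = Counter()
    for path in paths:
        for here, nxt in zip(path, path[1:]):
            if here != node:
                continue
            if nxt == parent.get(node):
                counts["went back"] += 1
            elif nxt in correct_path:
                counts["right track"] += 1
            else:
                counts["wrong path"] += 1
    return counts


# Hypothetical tree, expressed as child -> parent.
parent = {
    "Home": None,
    "Appliances": "Home",
    "Electronics": "Home",
    "Personal": "Appliances",
    "Phones": "Electronics",
}
correct_path = ["Home", "Electronics", "Phones"]

# Hypothetical participant paths for a 'find a mobile phone' task.
paths = [
    ["Home", "Electronics", "Phones"],
    ["Home", "Appliances", "Personal"],
    ["Home", "Appliances", "Home", "Electronics", "Phones"],
]

print(junction_breakdown(paths, "Home", parent, correct_path))
print(junction_breakdown(paths, "Appliances", parent, correct_path))
```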

Usability testing tips 🪄

Here are seven practical usability testing tips to get you started:

  • Test early and often: Usability testing isn’t something that only happens at the end of the project. Start your testing as soon as possible and iterate your design based on findings. There are so many different ways to test an idea with users and you have the flexibility to scale it back to suit your needs.
  • Try testing with paper prototypes: Just like there are many usability testing methods, there are also several ways to present your designs to your participant during testing. Fully functioning high fidelity prototypes are amazing but they’re not always feasible (especially if you followed the previous tip to test early and often). Paper prototypes work well for usability testing because your participant can draw on them and add their own ideas. They’re also more likely to feel comfortable providing feedback on work that is less resolved! You could also use paper prototypes to form the basis for collaborative design sessions with your users by showing them your idea and asking them to redesign or design the next page/screen.
  • Run a benchmarking round of testing: Test the current state of the design to understand how your users feel about it. This is especially useful if you are planning to redesign an existing product or service and will save you time in the problem identification stages.
  • Bring stakeholders and clients into the testing process: Hearing how a product or service is performing directly from a user can be quite a powerful experience for a stakeholder or client. If you are running your usability testing in a lab with an observation room, invite them to attend as observers and also include them in your post session debriefs. They’ll gain feedback straight from the source and you’ll gain an extra pair of eyes and ears in the observation room. If you’re not using a lab or doing a different type of testing, try to find ways to include them as observers in some way. Also, don’t forget to remind them that as observers they will need to stay silent for the entire session beyond introducing themselves so as not to influence the participant, unless you’ve allocated time for questions.
  • Make the most of available resources: Given all the usability testing options out there, there’s really no excuse for not testing a design with users. Whether it’s time, money, human resources or all of the above making it difficult for you, there’s always something you can do. Think creatively about ways to engage users in the process and consider combining elements of different methods or scaling down to something like hallway testing or guerilla testing. It is far better to have a less than perfect testing method than to not test at all.
  • Never analyse your findings alone: Always analyse your usability testing results as a team or with at least one other person. Making sense of the results can be quite a big task and it is easy to miss or forget key insights. Bring the team together and affinity diagram your observations and notes after each usability testing session to ensure everything is captured. You could also use Reframer to record your observations live during each session because it does most of the analysis work for you by surfacing common themes and patterns as they emerge. Your whole team can use it too, saving you time.
  • Engage your stakeholders by presenting your findings in creative ways: No one reads thirty-page reports anymore. Help your stakeholders and clients feel engaged and included in the process by delivering the usability testing results in an easily digestible format that has a lasting impact. You might create an A4-size one-page summary, or maybe an A0-size wall poster to tell everyone in the office the story of your usability testing, or you could create a short video with snippets taken from your usability testing sessions (with participant permission of course) to communicate your findings. Remember you’re also providing an experience for your clients and stakeholders, so make sure your results are as usable as what you just tested.
