UX stats for the faint-hearted: An interview with Jeff Sauro

Audio (mp3: 10.8 MB, 22:31)

Published: 1 August 2012

Gerry Gaffney:

This is Gerry Gaffney with the User Experience podcast.

My guest today has worked for Oracle, PeopleSoft, Intuit and General Electric and consults for Fortune 1000 companies such as Paypal, Walmart and Order Desk. He has a Masters in learning design and technology from Stanford University with a concentration in statistical concepts.

He’s published several peer-reviewed papers and has presented in a wide variety of forums. His most recent book is Quantifying the User Experience: Practical Statistics for User Research, which he co-authored with Jim Lewis. He specialises in making statistical concepts understandable and actionable. Jeff Sauro, welcome to the User Experience podcast.

Jeff Sauro:

Thanks, Gerry, it’s great to be here.

Gerry:

Listeners may remember that we interviewed Jim Lewis on a previous episode about voice user interfaces, speech user interfaces rather. How did you hook up with Jim?

Jeff:

You know, there were not many folks about 10 years ago doing quantification, statistics in usability, and especially now that it’s become user experience. Jim has a long publication record and I sort of tracked him down, and I had some questions because there were so many open questions that I had in how you go about quantifying this sort of nebulous area that we call user experience.

And so we met up together, started working through a lot of these problems and then almost 10 years later it’s culminated in this book.

Gerry:

One thing I really enjoyed about Jim’s work is that he really combines the academic with the practitioner stuff very, very effectively and I think you guys have managed to do it very well in this new book as well.

Jeff:

Great, yeah, he’s got a good methodological and detailed focus that makes writing these sorts of books better.

Gerry:

I’m sure many listeners would feel that one of the great advantages of working with UX is they’ve managed to avoid the scary field of statistics, but apparently you’re telling us that that’s not so.

Jeff:

Yeah, you know, believe me, I get this sort of thing every week when somebody says statistics, user experience? One of the very reasons why I’m in the field of user experience is because I don’t have to deal with the sort of the cold hard impersonal numbers that maybe they were able to avoid in grad school or undergrad, and so, yeah, pretty much every week I get that and you know fortunately I’m able to sort of convince people that it doesn’t take the place of that important design work and that intuition or other areas that I think that they find beneficial, it’s just providing evidence that their designs are better.

Gerry:

But isn’t it abnormal to like statistics?

Jeff:

[Laughs.] In many respects it is. My wife will often look over at me wondering what the hell I’m reading in bed and sometimes I think a lot of people, I get that sense… Statistics I like because it’s sort of where math meets the real world.

It’s not sort of just an abstract game. It’s dealing with the sort of fuzziness that you have; this could happen, this might not happen. And especially applied statistics where you deal with the sort of messy reality of well you don’t always get all the measurements you want, you’ve got a small sample size, your data isn’t always nice and normally distributed and symmetric. And statistics provides a number on “maybe”, and it’s that “maybe” that allows you to help understand the risk and helps understand maybe you’re wrong, and what that “maybe” is.

Gerry:

User experience people often conduct activities with very small numbers of participants and typically we’ve said to clients: Oh you know, it’s not a statistical activity or statistical relevance is not actually particularly important. And we often act as if there’s no place and maybe no need for statistical analysis when we’re talking about five to ten users. Is that the case?

Jeff:

That’s certainly I think the prevailing wisdom and I’m hoping to sort of reduce that, but there’s a lot of fields where the costs of getting participants or subjects are high and you have to deal with small sample sizes.

In the area of fMRI, functional magnetic resonance imaging, you’ve got very expensive equipment and you often are dealing with five to ten people that you can get into these expensive machines and then sort of sit there while they do things. Often the military in those applications they’ll be exploding rockets for example and they can’t do so many of these things, or launches, and they want to find out.

So when you’ve got high costs you can’t just have the luxury of saying I’m going to run a thousand people through an fMRI. And we’ve got the same situation in usability but that in no way precludes us using statistics.

And I like to look at the analogy, it’s like astronomy. It would be nice if we always had access to the really powerful telescopes like the Keck telescope in Hawaii. But often we only have access to binoculars or a naked eye. And what we’re limited to with our binoculars or naked eye is just seeing very big things in the sky. But Galileo made some of the most important discoveries about our solar system, like the moons of Jupiter, by having telescopes no more powerful than many of the binoculars that we have today.

So when we’re dealing with small sample sizes we can use statistics but we’re limited to seeing those big differences between designs or more obvious usability issues. But the good news is that’s usually why we’re getting paid to do what we do is first find and fix the problems that are most noticeable to your users.

There’s usually enough of those that if you can work on those your products are going to be easier to use. And of course there’s cases where you’ve got to find much smaller differences where that Keck telescope is important. So often like large sample A/B tests online. So there’s places where there’s millions of people a day that will come to a website like Paypal. Small differences make a big difference in terms of the bottom line and of course in those places when you want to detect those small differences statistics of course are both helpful and also applicable.
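As a rough illustration of the point about sample size, the sketch below runs a standard pooled two-proportion z-test on the same small difference at two traffic levels. The conversion counts are hypothetical, and the z-test is a common textbook choice, not a method named in the interview.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # erfc gives the two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: the same 5.0% vs 5.5% conversion rates at two sample sizes.
small = two_proportion_p_value(50, 1_000, 55, 1_000)
large = two_proportion_p_value(50_000, 1_000_000, 55_000, 1_000_000)

print(f"1,000 visitors per design:     p = {small:.3f}")
print(f"1,000,000 visitors per design: p = {large:.3g}")
```

With a thousand visitors per design the half-point difference is indistinguishable from noise; with a million per design the same difference is detected easily.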

Gerry:

I think one of the things in the book that was a bit of an eye opener for me was, I can’t remember the exact example, but you chose I think the results of a usability study and you said, based on the findings and doing the number crunching the probability of people having trouble with a particular aspect of the UI was somewhere between 25% and 90% and you think wow that’s a huge variation there. But then, as you point out, the 25% was unacceptable in this particular case so you knew you needed to fix it even though you had that huge range.

But that range is probably the sort of thing that would make practitioners feel uncomfortable.

Jeff:

It is and sometimes just focusing on the precision of the measures is what gets you off into the deep end and you get a little concerned about that.

And it’s stepping back and saying what am I trying to find out? You focus on what’s my hypothesis and do I have data to determine that? And one other example that Jim likes to tell and we talk about as well is back when he was at IBM there was a problem with a printer that they were shipping out. The instruction manual was all printed and the printer was ready to go and as sort of a last ditch effort they thought, well if we put, on top of the instructions right when you sort of open the box up, we’re going to put one of those sheets that says, “Do this first.” So maybe we’ll avoid the problem that we couldn’t correct. And so he set up in a lab just eight people going through and finding out what would they do. Well, six out of eight people would open up the box, take the sheet that said, “Do this first” and throw it over their shoulder and get off to installing the printer.

So with six out of eight people, the 95% confidence interval, so we are using statistics with a small sample size, the 95% confidence interval is between 40% and 94% so it’s massively wide; 54 percentage points wide. Could you imagine a presidential poll with a margin of error that wide around it? But just with six out of eight people we’re 95% confident at least 40% of people are going to throw this over their shoulder and not even read it.

So even though it’s focusing on sort of the lower/upper boundaries, it’s just too high, so you can make some conclusions from that right away.
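The six-out-of-eight interval can be reproduced with a few lines of code. This is a minimal sketch that assumes the adjusted-Wald (Agresti-Coull) method; the interview does not name the formula, but this one yields the quoted 40% to 94%. The function name is illustrative.

```python
import math


def adjusted_wald_interval(successes, n, z=1.96):
    """Adjusted-Wald confidence interval for a proportion (z=1.96 gives 95%)."""
    # Add z^2/2 successes and z^2 trials, then apply the usual Wald formula.
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


low, high = adjusted_wald_interval(6, 8)
print(f"6 of 8 discarded the sheet: {low:.0%} to {high:.0%}")
```

The lower bound is the number that matters for the decision: even in the most optimistic reading of the data, about four in ten people would discard the sheet.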

Gerry:

That’s a great example I think of how we can apply numbers, these statistical analyses, to these small sample sizes.

Now I must admit I’m one of those that consider themselves to be statistically inept but I thoroughly enjoyed reading the book, which took me by surprise. You included an appendix with a crash course in fundamental statistical concepts that I found useful. Do you think that an adequate and practical proficiency in stats is within the grasp of your typical busy UX practitioner?

Jeff:

You know, I think it is. I think where I’m hoping to hook people and where I got hooked is if you start with a very specific research question or business question, whether it be well how do I know if my new design is better than my old design? Or what sample size do I need? Or how do I know I can use a statistical test? Or which statistical test do I need? Sort of starting with a problem and then having the book which provides the solutions for those problems like other practitioners have had I think will certainly help. I think most people too got exposed to statistics and for many of them unfortunately left a bad taste in their mouth but they were exposed to statistics as here’s a series of solutions that someday you might have a problem and this will help.

Well now people are faced with now I have a problem, I want to know what are the chances of this happening? Or what test do I use to determine if there’s a difference?

Now go back and sort of dig into whether it be a chapter or a part, maybe in that crash course, saying here’s enough for you to grasp the concept and then you can dig in the formulas if you want to.

Gerry:

One thing I really enjoyed about the book is that particular focus on what to do. You say if you’re doing this type of test, if you’ve got this number of participants and if these conditions prevail, then use this method.

And while you don’t preclude other methods, you help people I guess to focus down on that particular one. And there are also the worked examples at the end of each chapter which I confess I didn’t go through [laughter] but I can imagine one day actually going through it and raising my skills in that fashion.

Jeff:

Yeah and that’s the sort of thing where it’s: OK I get it, let’s try it on my own. On the one hand it was to provide that. But on the other: hey, it never hurts if you have exercises as examples to try and sell that to some more academic institutions.

Gerry:

I guess that’ll make the book very attractive to tertiary education organisations as well, universities and the like where they are teaching user-centred design.

Jeff:

Right, exactly right because in those cases, and this will often happen, where they have a textbook but then they’ll get in the textbook and it’ll say okay, here’s what you’re going to do but don’t use this statistical procedure, don’t generate a confidence interval using this formula unless your sample size is above 30 or something, or unless you have a large sample size. And then you see a sort of a footnote and it says what you do if you have a small sample size is beyond the scope of this book. And then you’re sort of stuck with no solution.

And so with our book instead of saying it’s beyond the scope of the book it’s saying well you’re probably going to run into this problem. You’re probably going to have this issue. Here’s what works the best and here’s what we found and here’s the literature that supports that and then oh just so you get it, just like in the same kind of didactic fashion at university, here are those exercises to go through.

Gerry:

Okay, so let’s test Jeff Sauro’s teaching capabilities here. Can you explain to us in audio what a normal distribution is?

Jeff:

[Laughs.] Yeah, definitely I have a lot of good visualisations for that so this is a good challenge… What researchers have noticed and scientists have noticed going back a couple hundred years now is if you plot the heights, the weights, the IQ scores, for example, of people on a graph as each one of those representing sort of a dot. Now what you’ll see is that dot over time forms a bell shape and that bell shape basically is conveying that the bulk of heights, the bulk of weights, the bulk of IQ scores tend to cluster around the average. So in the United States the average height for men is about five foot ten inches. I know you guys are metric system but five foot ten inches I think is about 177 centimetres. The bulk of people are going to be within a couple of inches of that and then as you get taller and taller and taller you sort of spread out. This sort of bell-shaped distribution has come to be known as the normal distribution.

It’s sort of a misnomer because it’s called normal but it doesn’t necessarily mean things are normal about what you’re measuring. It’s just maybe more of the standard or a typical distribution.

Gerry:

And then presumably it’s fair to say that most of the statistical analyses are based around that normal distribution.

Jeff:

Yeah, that’s right. It’s sort of at the heart of a lot of what’s called parametric statistics. In many cases we assume that there is this underlying normal distribution in the data. And you know this often presents a problem for practitioners because they’ll say alright Jeff well I graphed my questionnaire data. I graphed my time data; I graphed my conversion rate, completion rates. It looks nothing like that bell you just described. It’s all sort of funky shaped, it’s skewed up. So this normal distribution, this bell shape is all well and good but my data looks nothing like that so what do you do?

Gerry:

That was my next question. [Laughter.]

Jeff:

Oh good, I’m glad I anticipated. Well so the good news is that in most cases what we’re most interested in is actually the shape of the graph, the dots as they’re displayed on the graph, when we graph the average.

So if we graph the average score, the average completion rate and then take another sample and graph that one and take another sample and graph it; over time it’s the distribution of these averages which actually form a very nice bell shape, even if the distribution from which you’re graphing it from is not normal.

So if you have a completion rate or you have questionnaire data that’s all skewed, if you take a series of samples and just graph the average, it’s that average which follows a normal distribution. And this is the most fundamental concept of statistics. It’s called the central limit theorem. And for most usability data, which we talk about in our book, even for relatively small sample sizes it follows this normal distribution. And where it doesn’t we have some sort of here’s what you’re going to do in this particular case.

So I think that’s the first deceiving thing: we go on about the normal distribution, but when you graph your data it’s not normal. It’s the distribution of the sample means that becomes normal, and we show that they’re normal even with small sample sizes.
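The behaviour of sample means described here can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not an analysis from the book: the "task times" are hypothetical values drawn from an exponential distribution (strongly right-skewed), and skewness is used as a simple stand-in for "how far from a symmetric bell shape".

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable


def skewness(values):
    """Skewness: 0 for a symmetric distribution, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical task times in seconds, averaging about a minute.
population = [random.expovariate(1 / 60) for _ in range(100_000)]


def sample_means(sample_size, n_samples=5_000):
    """Draw many samples of the given size and return the mean of each one."""
    return [
        statistics.fmean(random.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


print(f"raw task times: skewness = {skewness(population):.2f}")
for size in (5, 10, 30):
    means = sample_means(size)
    print(f"means of samples of {size}: skewness = {skewness(means):.2f}")
```

The raw values are heavily skewed, while the skewness of the sample means shrinks toward zero as the sample size grows, which is the central limit theorem at work.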

Gerry:

Well done. I think it’s a little bit better in the book but that was pretty good.

Jeff:

[Laughs.] Yeah, yeah I know. There you go.

Gerry:

And I do recommend to people that they read the book if they’re at all interested in this or if they’re scared that clients are going to ask them about this because you do cover it very, very well and admittedly you have to concentrate a bit more than you do reading your average Twitter stream but it’s worth the effort.

One of the things though that occurred to me Jeff is that if we’re to embrace the use of statistics, we also need to embrace uncertainty and risk. Is that fair to say?

Jeff:

Absolutely, absolutely and as I alluded to earlier, statistics puts a number on “maybe” and it’s reminding you that you could be wrong about what you’re doing. You could be making the incorrect decision or being not as precise as you think you are about what you’re trying to estimate about what users are doing, what their attitudes are, how many are going to purchase, how many are going to convert to, say, sign up to a newsletter.

And the important thing is that you’re never 100% certain in statistics. There’s no such thing as 100% certainty and what’s interesting is that people I think unfortunately think if I don’t quantify what I’m doing, if I don’t use numbers, if I don’t use statistics that somehow I’ve avoided this problem of uncertainty in my decisions.

But the truth of the matter is, you have it, you just are dealing with an uncertainty that you’re even uncertain of.

Gerry:

So tell me, what’s the bare minimum that a UX practitioner can get away with? Or perhaps we should all just befriend someone who’s good with numbers.

Jeff:

That certainly doesn’t hurt to make friends with this person that’s quantitatively inclined. In the classes we give and for folks who don’t have a strong background, we tell them in the first couple of chapters of the book, the first three chapters, they can get introduced to the concept of confidence intervals. It tells you how precise you can be about the estimate that you make; about whether the attitudes people have, the time it’s taking them, the questionnaires, the completion. If you can kind of get your idea around if you could compute an average or if you’re computing the percentage of people that are completing a task and if you’re comfortable at least with the concept of confidence intervals as a measure of how precise your estimate is, that by itself will accomplish many statistical feats for you with sort of the smallest amount of computation and mental pain.

Gerry:

For UX practitioners who are maybe reading your book or getting interested in getting involved in this field, besides reading the book what else should they be doing?

Jeff:

There are some additional resources off my website measuringusability.com. We’ve taken a popular tutorial on practical statistics. It follows the book and it’s available for download off the website. So that’s one thing that I know a lot of people like to take those courses in a more teaching fashion than reading from a book. You sort of need both. You need having the book as reference. We also have a companion book that goes with it, that follows our Excel calculator or statistical package called R so it allows you to sort of, here’s your problem, get introduced to the concept and try and solve the problem with some similar examples, the worked examples, follow the tutorial.

And I’m at many conferences. We have a conference this fall in Denver, Colorado in the US. It’s called Lean UX Denver. So we’ll be there giving the tutorial there as well.

Gerry:

Okay so there are plenty of resources and things that people can do. Can I change the topic here a little bit and ask you what aspects of UX we can quantify or deal with statistically?

Jeff:

It’s pretty much at any point along the way there are the traditional things you would think about, like I mentioned earlier, are people completing or people not completing or people converting or not converting and I think that people think of that as oh that’s sort of a quantitative question. That’s great but what people don’t realise too is that you can actually quantify qualitative data and qualitative insights.

So for example there was a recent client that I had that they were interested in finding out why aren’t more people using our mobile credit card website? And we all assumed oh it’s just because they’re finding it difficult to use and so we brought them into the lab and we sort of had this open ended conversation with them about their use and their experience.

And then what we found is that just after 16 people, in many sentences people were sort of saying; well you know I’m just afraid that someone’s going to steal my data over the internet or the cellular network or that somebody else said; I’m just afraid my information’s going to get stolen. Or somebody else said; you know, I just don’t trust that my data’s secure.

Now we didn’t have a question in particular about are you trusting that your data’s not going to get stolen. But in many ways they articulated these comments and we found that 5 out of 16 people said something in a way that they were concerned their data was going to get stolen.

And so we extrapolated that out to the client and said, well look we’re 95% confident between 14% and 56% of your participants probably see this as a primary impediment to using it. So in addition to of course improving the usability, probably a good bang for your buck is a marketing message that lets them know that the cellular network that they’re using is actually more secure than their home-based LAN.
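The 14% to 56% range follows from 5 out of 16 with a short calculation. The working below assumes the adjusted-Wald interval with z = 1.96 for 95% confidence; the interview does not say which formula was used, but this one reproduces the quoted figures.

```latex
\hat{p}_{\mathrm{adj}} = \frac{x + z^2/2}{n + z^2}
                       = \frac{5 + 1.92}{16 + 3.84}
                       \approx 0.349

\hat{p}_{\mathrm{adj}} \pm z \sqrt{\frac{\hat{p}_{\mathrm{adj}}\,(1 - \hat{p}_{\mathrm{adj}})}{n + z^2}}
  = 0.349 \pm 1.96 \sqrt{\frac{0.349 \times 0.651}{19.84}}
  \approx 0.349 \pm 0.210
```

That gives roughly 0.14 to 0.56, or 14% to 56%.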

Gerry:

That’s very interesting because I guess you’re applying quantitative methods to something that, the sort of data that typically we’d consider to be qualitative.

Jeff:

That’s right and I think it’s important for people to understand that I’m in no way asking people to supplant the sort of skills and the methods that they use and that qualitative insight about what the problems are and the interface.

It’s not qualitative insights or statistics. It’s qualitative insights and statistics.

Gerry:

Now how should we present our statistical analyses to product teams and managers because presumably they’re going to be equally likely to be inept?

Jeff:

Yeah, that’s a good question. I think that’s sort of where your intuition and your instincts and the politics of the folks that you’re dealing with are probably as important or more important than the numbers. I don’t always lead off with like p values and confidence intervals in my presentations in the same way that I don’t lead off with a long list of usability problems, depending on the team I’m working with or the audience.

So in many cases what I’ll say is I’ll convey it either visually through error bars or through an asterisk that says there’s a statistical difference or, no, the differences are not statistically meaningful. But it doesn’t mean hit people over the head with it.

Also what I’ll find is someone in your audience or somebody wants to know at some point can we trust this guy’s decisions are not just his gut instinct? And sometimes it comes with just one question whether it be an appendix or you let them know what their p value is or that you ran a statistical test. They now think that maybe this is a little bit more than the black box methods, that maybe there’s a little more credibility to it.

So just because you don’t present them, that maybe you make the decision not to present any statistics, you just present the conclusion, knowing that you’ve done that or at least you yourself knowing that you’ve done it and having that data available provides that sort of quantifiable audit trail that I think elevates the practice for all of us.

Gerry:

Do you think that quantifying UX is something that is more relevant to large organisations?

Jeff:

I think certainly it starts there. I’ll often have a lot of clients who are that way, they’re like; look our managers are managing us with numbers. If they can’t measure it, they can’t manage it. So they’ll find a way or they’ll manage us out. And so certainly it starts that way and that’s because of the sort of inherent constraints of a large organisation. You’ve got distributed teams and so they’re looking at the output of what you’re doing and they want to know; alright we put so many dollars into your group, into improving this, where’s the evidence that we did?

And so you know sometimes it’s the bottom line, sometimes it’s the revenue, sometimes it’s the profits but those are lagging indicators. You can’t do anything about last quarter. They want to know do we have an indication over the next six months, six years things are going to be improved.

And so it certainly starts that way but for the smaller organisations it’s not immaterial, it’s not irrelevant and I think you can certainly get a jump on that.

Gerry:

The book is called Quantifying the User Experience: Practical Statistics for User Research and the authors are my guest Jeff Sauro and Jim Lewis.

Jeff Sauro thanks so much for joining me today on the User Experience podcast.

Jeff:

Thank you Gerry. It was great being here.

Published: July 2012