Published: 5 August 2017
Does metadata matter – and why? Shari Thurow talks about making your content usable and accessible for both humans and machines and the importance of information architecture.
Gerry Gaffney:
This is Gerry Gaffney with the User Experience podcast. My guest today is director of
She has a specific interest in search engines and several years ago wrote Search Engine Visibility as well as co-authoring When Search Meets Web Usability. More recently she's been presenting and talking about making web content usable to both humans and AIs, or artificial intelligences.
Shari Thurow, welcome to the User Experience podcast.
Shari Thurow
Thank you. Nice to be here.
Gerry
Okay, let me get a nasty question out of the way first. I think it's fair to say that a lot of people would see SEO as occupying the same territory as spam, or similar territory. Can you tell us a little about that, and perhaps defend SEO against that allegation?
Shari
Oh sure. I sometimes feel that SEOs are the used car salespeople of the web: there are good used car salespeople, and there are, you know, people who are skunks. The basic way to tell the good SEOs from the "bad" SEOs is this: a person who does search engine optimisation for people who use search engines, with the emphasis on "people", is probably going to do a good job on your website, your mobile site or any kind of web document. A person who is interested only in optimising for search engines is probably going to get your site into trouble, or at least push the limits. So think of it as optimising for people who use search engines versus optimising for search engines only. You want the first group, not the second.
Gerry
Okay. And when you talk about things that might get you into trouble, for listeners who might not be familiar with search engine optimisation, what sort of activities get people into trouble?
Shari
There's something called "keyword stuffing", where you put too many keywords in a document. Here's one way to tell if you're doing it: let's say you're selling cars and you write "car, car, car, car, auto, auto, auto, auto, autos, auto this, auto that." If it sounds unnatural, it probably is unnatural, and that would be keyword stuffing.
The other types of spam involve hiding something: displaying something to human users that you're not displaying to search engines or, vice versa, displaying things to search engines that you're not displaying to human users, with very few exceptions. There are also link schemes, where you need a lot of links to your website so you buy 1,000 or 10,000 of them. These are the types of things that can get your site into trouble with web search engines.
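Shari's test for keyword stuffing is whether the copy sounds natural. One crude way to put a number on that is keyword density, the share of all words taken up by a single term. The sketch below is illustrative only: the 20% threshold and the short stop-word list are arbitrary assumptions, not a rule any search engine publishes.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real tool would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "this", "that", "for", "of", "to",
              "you", "your", "we", "our"}


def keyword_density(text: str) -> dict[str, float]:
    """Share of all words in `text` taken up by each non-stop word."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {w: n / len(words) for w, n in counts.items() if w not in STOP_WORDS}


def flag_stuffing(text: str, threshold: float = 0.2) -> list[str]:
    """Words above the (arbitrary) density threshold, most frequent first."""
    density = keyword_density(text)
    flagged = [w for w, d in density.items() if d > threshold]
    return sorted(flagged, key=lambda w: density[w], reverse=True)


stuffed = "car, car, car, car, auto, auto, auto, auto, autos, auto this, auto that"
natural = ("We sell new and used cars, and our team can help you "
           "find the right car for your budget.")
print(flag_stuffing(stuffed))  # ['auto', 'car']
print(flag_stuffing(natural))  # []
```

A density figure is only a prompt to reread the copy aloud; it cannot tell you whether a sentence reads naturally.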
Gerry
I remember when we were at UXPA, working on the Journal of Usability Studies when we migrated it, you provided a lot of really solid advice on metadata. What is metadata, and what's good metadata?
Shari
Well, metadata is basically information about other data, and there are three different types. There's descriptive metadata, which is the type people are most familiar with: the title, the abstract or description, the authors and any keywords. So let's say somebody wants to find a Journal of Usability Studies PDF on eye tracking, for example. You would put "eye tracking" in the PDF's Keywords field, and you'd spell it the hyphenated way, as one word and as two words. Structural metadata has to do with containers. So think about chapters in a book, episodes in a podcast, things like that.
So structural metadata is important because it can also help you find things more accurately. If you know that there's important information in episode blah blah blah of this podcast, it can help you find that information more quickly.
And then the last type is administrative metadata. Now, a lot of people aren't too interested in this. The one thing about administrative metadata that people should know is the file type, because the file type can tell you a lot about a document. If we know that a document's file extension is .jpeg or .jpg, it's probably a photo. If it's .pdf, of course, it's a PDF. And there are times when a search engine cannot ascertain what a document is about, so it relies on the metadata to tell it what the document is about, and also to reinforce what it has already learnt.
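For the eye-tracking example, a small helper can generate the spaced, hyphenated and one-word spellings so none is forgotten. A minimal sketch; the function names are hypothetical, and you would still paste the resulting string into the PDF's Keywords field (or your authoring tool's equivalent) yourself.

```python
def spelling_variants(phrase: str) -> list[str]:
    """Spaced, hyphenated and closed-up forms of a multi-word phrase."""
    words = phrase.lower().replace("-", " ").split()
    if len(words) < 2:
        return words  # single words have no spacing variants
    forms = [" ".join(words), "-".join(words), "".join(words)]
    return list(dict.fromkeys(forms))  # de-duplicate, keep order


def keywords_field(phrases: list[str]) -> str:
    """Comma-separated Keywords value covering every variant once."""
    values: list[str] = []
    for phrase in phrases:
        for variant in spelling_variants(phrase):
            if variant not in values:
                values.append(variant)
    return ", ".join(values)


print(keywords_field(["Eye-Tracking", "usability"]))
# eye tracking, eye-tracking, eyetracking, usability
```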
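The extension-based guess Shari describes is available in Python's standard `mimetypes` module. This sketch looks only at the file name, not the file's contents, so a misnamed file will be misreported; the short labels it returns are an assumption for readability.

```python
import mimetypes


def describe_file_type(filename: str) -> str:
    """Rough document type inferred from the file name's extension alone."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        return "unknown"
    if mime.startswith("image/"):
        return "image"
    if mime == "application/pdf":
        return "PDF"
    return mime


for name in ["holiday.jpg", "holiday.JPEG", "jus-article.pdf", "notes.zzqq"]:
    print(name, "->", describe_file_type(name))
```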
Gerry
So the metadata... you mentioned PDFs. I guess a lot of people would think of metadata as being associated more with HTML, but presumably pretty much any type of content can, or should, have associated metadata?
Shari
Absolutely. In fact, if you ever want to see metadata that will blow your mind, look at the metadata of movies. There's a tremendous amount of metadata in motion pictures.
Gerry
Yeah, I was wondering how people put together those amazing visualisations showing how many times a particular actor or a piece of music or a scene appeared across an entire genre.
Shari
Hooray for librarians is all I can say for that! [Laughter.]
Metadata has always been important. A lot of search engine optimisers don't think it's as important as it used to be, and I disagree, because I've seen metadata that's essential: without it you couldn't find the document. I've also seen metadata that supports what artificial intelligence, human beings and search engines themselves have already discovered. So reinforcing information that's already there is a very important goal of metadata optimisation.
Gerry
I guess if you go back a number of years, people were complying with something called the Dublin Core. I don't know if that's still in existence, or if it's still authoritative?
Shari
It is. If I had to pick one type of metadata to stick with right now, I'd still stick with Dublin Core. It seems to cover the vast majority of digital documents.
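The Dublin Core Metadata Element Set (version 1.1) defines 15 elements: title, creator, subject, description, date, format and so on. One long-standing convention for embedding them in an HTML page, documented by DCMI, pairs a `schema.DC` link with `DC.`-prefixed meta tags. A sketch of that convention, with placeholder values for a hypothetical article:

```python
from html import escape

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def dublin_core_tags(record: dict[str, list[str]]) -> str:
    """Render a record as a schema.DC link plus DC.* meta tags."""
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        for value in values:
            # escape() also converts quotes, so values cannot break the attribute
            lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)


# Placeholder values for a hypothetical article.
print(dublin_core_tags({
    "title": ["Eye Tracking & Search Results"],
    "subject": ["eye tracking", "eye-tracking", "eyetracking"],
    "format": ["application/pdf"],
}))
```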
Gerry
That must be 20 years old.
Shari
If it's good, it's good. I also think that, because of the internet and the web, people think that if something was published two days ago, or five days ago, or five years ago, it's outdated. That's not necessarily true. There are a lot of principles, and principles are what I try to focus on: what are the principles that UX is all about? What are the principles of SEO? Take the gestalt principles: how old are they? Yet we in usability still follow them.
Gerry
I guess pretty much any HTML document should have metadata, then. But I often think: an HTML document is plain text, robots are going to crawl it and figure out what's in there, so why do I really need the metadata?
Shari
Because the search engine is going to find the content wherever it is in the document. It's not necessarily going to be at the top of the document; it could be in the middle or at the end, and Google might jumble it up so that the search listing doesn't really make sense to a human user. If you put an accurate meta description tag in your HTML file that neatly summarises the document in about 155 characters (at least, that's the current count), Google is more likely to use it in the search listing, and the same goes for Bing and any other major web search engine. I think that's a good exercise for people.
In fact, it has always been a good exercise. I've been doing this for 22 years, and 22 years later it's still a good exercise. It helps you state what you think is most important about the document, and if you can't say it in 155 characters, maybe you ought to rethink how you've written the document.
Gerry
So that's a maximum of 155 characters, is it?
Shari
That's all Google will display in search listings, but you can go longer. It used to be 200 characters, and that's usually where I cap it: at around 200 characters, including punctuation and spaces.
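The numbers in this exchange can be turned into a quick length check. A sketch, assuming plain character counts that include punctuation and spaces, as Shari describes; 155 reflects what Google displayed at the time of the interview and 200 is her personal cap, so both are parameters rather than fixed rules.

```python
DISPLAY_LIMIT = 155  # roughly what Google displayed at the time of the interview
HARD_CAP = 200       # Shari's own upper limit, punctuation and spaces included


def check_description(text: str,
                      display_limit: int = DISPLAY_LIMIT,
                      hard_cap: int = HARD_CAP) -> str:
    """Classify a meta description by its length in characters."""
    length = len(text.strip())
    if length == 0:
        return "missing"
    if length <= display_limit:
        return "ok"
    if length <= hard_cap:
        return "may be truncated in listings"
    return "too long"


print(check_description("Eye tracking shows where people look on search results pages."))
```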
Gerry
So this implies that there's human intervention required for pretty much every piece of content that people would be putting online?
Shari
I would hope so.
Gerry
I'm sure there are… I know there are a lot of people who don't do that, or who do it mechanistically at best.
Shari
Well, there was a way to do it, and it's still somewhat true: you just take the first two sentences after your primary heading and copy and paste them in there. You can also program it to take the first 255 characters of the first paragraph of the content. If you've written a really good introduction or a really good summary, then you don't have to write something different.
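The "first sentences after the primary heading" approach can be automated. A minimal sketch using Python's standard `html.parser`, assuming the page marks its primary heading with `<h1>` and its introduction with a following `<p>`; it cuts on a word boundary rather than mid-word. The class and function names are hypothetical, and the default limit of 155 comes from the interview.

```python
from html.parser import HTMLParser


class FirstParagraphAfterH1(HTMLParser):
    """Collects the text of the first <p> that follows the first <h1>."""

    def __init__(self) -> None:
        super().__init__()
        self.seen_h1 = False
        self.in_p = False
        self.done = False
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "h1":
            self.seen_h1 = True
        elif tag == "p" and self.seen_h1:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True

    def handle_data(self, data):
        if self.in_p:  # includes text inside nested tags such as <em>
            self.parts.append(data)


def truncate_on_word(text: str, limit: int) -> str:
    """Shorten text to at most `limit` characters without splitting a word."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    head = text[: limit + 1]
    head = head.rsplit(" ", 1)[0] if " " in head else head[:limit]
    return head.rstrip(" ,;:")


def description_from_html(html: str, limit: int = 155) -> str:
    parser = FirstParagraphAfterH1()
    parser.feed(html)
    parser.close()
    return truncate_on_word("".join(parser.parts), limit)


page = ("<h1>Eye tracking</h1>"
        "<p>Eye tracking shows <em>where</em> people look on a page.</p>"
        "<p>Second paragraph.</p>")
print(description_from_html(page))  # Eye tracking shows where people look on a page.
```

As Shari notes, this only works when the introduction is already a good summary; it is a fallback, not a replacement for writing one.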
Gerry
So that's a good discipline.
Shari
It is, and it's always been a good discipline. Now, we know this isn't true of certain types of documents, like lyrics and poetry. You know, Shakespeare is not going to suddenly come alive and write a summary of each and every one of his sonnets. So that's where a meta description tag would be very, very helpful.
Gerry
There was talk several years ago that producing metadata had become, if not quite counterproductive, then a complete waste of time, because Google, clearly the dominant search engine then as now (there was really nothing else around at the time), had stopped looking at metadata.
Shari
Well, they did look at metadata for display purposes, for display in search listings, but it also depends on the type of document you have. Say your digital document is a video: Google absolutely looks at the video's metadata and uses it for ranking purposes. Image metadata is also used for ranking purposes. So it depends on the type of document.
The other thing is the meta keywords tag. That's the one people are always saying search engines ignore, but they're only talking about web search engines. They're not necessarily talking about another type of search engine that's extremely important, and that's a site search engine. I've worked on a number of site search engines that use the content and the meta keywords tag to determine rankings.
So I don't dismiss those. If I know a site is eventually going to have a site search engine, I'm just going to get in the habit of writing unique meta keywords tags. Again, it's just getting into the habit of selecting seven to ten, or seven to twelve, depending on the document, of your most important keyword phrases in their commonest spellings (eye tracking was a good example: sometimes it's spelt as one word, sometimes as two words, sometimes it's hyphenated) and putting them in the meta keywords tag. That would help web search engines as well as site search engines determine whether they've got the right document.
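How much weight a site search engine gives the meta keywords tag depends entirely on the product and its configuration. The sketch below is a hypothetical scorer, not any real engine's algorithm: each body-word match counts 1, and an exact match against a meta keyword phrase adds an invented boost of 5. It shows why listing every spelling helps when the body text uses only one.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    body: str
    meta_keywords: list[str] = field(default_factory=list)


def _norm(text: str) -> str:
    return " ".join(text.lower().split())


def score(page: Page, query: str, keyword_boost: float = 5.0) -> float:
    """Hypothetical score: body term hits plus a boost for a meta keyword match."""
    q = _norm(query)
    body_words = re.findall(r"[a-z0-9-]+", page.body.lower())
    term_hits = sum(body_words.count(term) for term in q.split())
    keyword_hit = any(_norm(k) == q for k in page.meta_keywords)
    return term_hits + (keyword_boost if keyword_hit else 0.0)


def search(pages: list[Page], query: str) -> list[str]:
    scored = [(score(p, query), p.url) for p in pages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for s, url in scored if s > 0]


pages = [
    Page("/study", "Results from our eye-tracking study of search pages.",
         ["eye tracking", "eye-tracking", "eyetracking"]),
    Page("/shop", "Eyetracking hardware for sale."),
    Page("/recipes", "Soup recipes."),
]
print(search(pages, "eyetracking"))  # ['/study', '/shop']
```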
Gerry
And I guess this is going to relate to people potentially having some sort of controlled vocabulary as well?
Shari
Exactly. So it really depends, and I'd rather take a broad approach than the narrower approach of "I only care about being number one in Google all the time, or as much as humanly possible." I would rather look at the principles that search engines follow that are not likely to change, and stick to those building blocks.
Yes, they change a little bit, sometimes they change a lot, but for the most part a lot of the things I did 22 years ago I still do now.
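A controlled vocabulary maps variant entry terms to a single preferred term, so that "eyetracking", "eye-tracking" and "eye tracking" are treated as one concept in search and labelling. A minimal sketch with a hypothetical two-entry vocabulary; it refuses any variant that would point at two different preferred terms.

```python
# Hypothetical controlled vocabulary: preferred term -> variant entry terms.
VOCABULARY = {
    "eye tracking": ["eye-tracking", "eyetracking"],
    "card sorting": ["card sort", "card-sort", "cardsort"],
}


def _key(term: str) -> str:
    return " ".join(term.lower().split())


def build_lookup(vocabulary: dict[str, list[str]]) -> dict[str, str]:
    """Map every preferred and variant term to its preferred term."""
    lookup: dict[str, str] = {}
    for preferred, variants in vocabulary.items():
        for term in [preferred, *variants]:
            key = _key(term)
            if lookup.get(key, preferred) != preferred:
                raise ValueError(f"{term!r} points to two preferred terms")
            lookup[key] = preferred
    return lookup


def normalise(term: str, lookup: dict[str, str]) -> str:
    """Return the preferred term, or the cleaned-up input if it is not in the vocabulary."""
    key = _key(term)
    return lookup.get(key, key)


lookup = build_lookup(VOCABULARY)
print(normalise("Eye-Tracking", lookup))  # eye tracking
print(normalise("tree testing", lookup))  # tree testing
```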
Gerry
Now, Shari, you talk about making web content useful for both humans and AIs. How does an AI read web data, and why should we care?
Shari
Well, there are two things: there's artificial intelligence and there's machine learning. They're very similar; in fact, machine learning is part of artificial intelligence, but they're a little bit different. Artificial intelligence is a branch of computer science where you want the computer or other device to be capable of intelligent behaviour, that is, to carry out a task that we would consider smart. Say, a computer reminding you that you have a doctor's appointment: you would love it if a computer could automatically detect that, so that would be a type of artificial intelligence.
Machine learning is an application of artificial intelligence. It's the science of getting computers to act without being explicitly programmed: giving computers access to data and letting them figure things out for themselves. RankBrain and RankNet are examples of machine learning.
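RankNet, as published, learns from pairs of documents where one is known to be more relevant: it models the probability that document i outranks document j as a logistic function of the score difference, P = 1 / (1 + exp(-sigma * (s_i - s_j))), and minimises cross-entropy against the known preference. The toy sketch below shows only that pairwise idea, with a linear scoring function standing in for the neural network of the published work and made-up features. It says nothing about how Google's RankBrain or any production ranker works.

```python
import math


def pair_probability(s_i: float, s_j: float, sigma: float = 1.0) -> float:
    """Modelled probability that document i should rank above document j."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))


def pair_loss(s_i: float, s_j: float, i_above_j: bool, sigma: float = 1.0) -> float:
    """Cross-entropy between the modelled probability and the known preference."""
    p = pair_probability(s_i, s_j, sigma)
    return -math.log(p) if i_above_j else -math.log(1.0 - p)


def dot(w: list[float], x: list[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train_linear_ranker(pairs, n_features, lr=0.1, epochs=200, sigma=1.0):
    """pairs: (features_of_better_doc, features_of_worse_doc) tuples."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x_i, x_j in pairs:
            p = pair_probability(dot(w, x_i), dot(w, x_j), sigma)
            grad = -sigma * (1.0 - p)  # dLoss/d(s_i - s_j) when i should win
            for k in range(n_features):
                w[k] -= lr * grad * (x_i[k] - x_j[k])
    return w


# Toy data: feature 0 marks the preferred document; feature 1 never differs.
pairs = [([1.0, 0.0], [0.0, 0.0]), ([1.0, 1.0], [0.0, 1.0])]
weights = train_linear_ranker(pairs, n_features=2)
print(weights)  # weight 0 ends up positive, weight 1 stays 0.0
```

Nothing here uses a hand-set "score" for relevance: the weights come entirely from the example preferences, which is the sense in which the model figures things out from data.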
Gerry
So you mentioned RankBrain and RankNet; can you tell us about them?
Shari
RankBrain is actually the third biggest ranking factor on Google right now; of course, that's Google's claim. And Google claims that no one can do search engine optimisation for RankBrain because Google doesn't have any "scores" for it, and I put "scores" in quotation marks. I don't agree with that. Just because Google doesn't have a score for its machine learning program doesn't mean that we as human beings can't do anything to support it. The main thing we can do to support artificial intelligence is to have a clear and consistent labelling system, and to reinforce it, support it and maintain it on a regular basis. That way both humans and machines, including Google and Bing, will understand what a document is about and deliver more accurate results.
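One concrete check for a "clear and consistent labelling system" is that each destination is called the same thing everywhere it is linked. A sketch, assuming you have already extracted (page, label, target) triples with a crawler of your choice; the data and function name are hypothetical.

```python
from collections import defaultdict


def inconsistent_labels(links):
    """links: (page_url, link_label, target_url) triples.

    Returns {target_url: [labels]} for destinations labelled more than one way.
    """
    labels = defaultdict(set)
    for _page, label, target in links:
        labels[target].add(" ".join(label.split()))  # ignore stray whitespace
    return {t: sorted(found) for t, found in labels.items() if len(found) > 1}


# Hypothetical crawl output.
links = [
    ("/", "Contact us", "/contact"),
    ("/about", "Contact", "/contact"),
    ("/", "Products", "/products"),
    ("/about", "Products ", "/products"),
]
print(inconsistent_labels(links))  # {'/contact': ['Contact', 'Contact us']}
```

Deciding which label wins is a job for card sorting and tree testing, not for the script.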
Gerry
But there's an interesting conflict there, I guess. On the one hand, we've got Google and Bing saying, well, we're going to ignore your efforts to improve your ranking, and on the other hand, you and other people working in SEO are saying, well, there are things you can do which will improve your ranking.
Shari
That's just the PR department talking. We learn when to ignore the public relations department and when not to. People who are new to search engine optimisation might not understand when something coming from the public relations department should be taken as valid. People like me, who have been in this industry for a long time and understand the fundamentals of search, sometimes just, you know, pshaw, blow it off: it's just PR. And anyway, a clear and consistent labelling system: what's wrong with that?
Gerry
So are you suggesting that Google is being disingenuous in some of its statements about how pages and content and documents are ranked?
Shari
I don't think they're being disingenuous. I think there's what's called a cat-and-mouse game, and it existed long before Google came around: as soon as you tell anybody that something is a ranking factor, a lot of people in the SEO industry will go nuts and try to change every document because Google released that statement. And, as you said before, there are people in the search engine optimisation industry who are the bad used car salesmen, and they'll try to exploit the search engines, which leads to less accurate results. So it's hard for Google, Bing or any web search engine to release anything like that, because there's always the chance it will be terribly abused, and that's still going on. So I don't think they're disingenuous; I just think they have to be very careful in how they word things.
Gerry
So there's an arms race going on.
Shari
Yes, absolutely.
Gerry
The ability to rank and to show results is enormously powerful. It gives Google, in particular, an incredible amount of power, doesn't it?
Shari
It does, but again, you also have to know when to listen to representatives at Google and when not to. I don't care what any Google rep says: if it conflicts with what usability and user experience professionals have been saying for years, I'm not going to do it, no matter how much Google says to. For example, I believe Google encourages people to submit an XML sitemap, which is essentially a list of URLs, and my attitude is that that's giving Google free data. If Google wants my data, Google can pay me for my data. But so many people are convinced that if they don't submit an XML sitemap, their site or their document won't rank, when in reality submitting an XML sitemap has never been a requirement for ranking; it's just a way for Google to get free data. So that would be an example of: okay, Google's saying this, blah blah blah, and then I move on and create a clear and consistent labelling system, because that's what users want.
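For readers who haven't seen one, this is what "essentially a list of URLs" looks like under the sitemaps.org protocol. The sketch shows the format only, with just the required `<loc>` element per URL; it is not a recommendation to submit one, and as Shari notes, submitting a sitemap has never been a ranking requirement.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Serialise URLs as a minimal <urlset> document (loc elements only)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address  # ElementTree escapes & for us
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["https://example.com/", "https://example.com/search?q=a&page=2"]))
```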
Gerry
Can you imagine a circumstance... I remember years ago when Google first appeared and essentially wiped out everyone else in the search engine space overnight. Can you envisage Google going the same way?
Shari
Oh absolutely.
Gerry
Are there any contenders out there at the moment or any technologies that you see threatening their hegemony?
Shari
DuckDuckGo: I know it's a strange-sounding search engine, but it's been gaining ground. I'm also trying to keep an eye on different universities.
People who work at Google are human beings, and human beings are far from perfect. There's plenty of room for another search engine; in fact, I think it's sorely needed. I think Google having the majority is problematic because I see people designing and writing for Google rather than for humans. That's why I have a lot of faith in the UX industry: people in UX deal all the time with people who have different mental models than they do, and I love that honesty. It's one of the reasons I went into UX.
Gerry
I guess when we think about metadata, documents themselves have got visible metadata in their information hierarchy.
Shari
There's nothing wrong with that. It makes content easier to find, easier to discover and locate, easier to use; it makes the document more useful. There's nothing wrong with putting it in there, and penalising a site, or deeming it less valuable, just because that metadata is in there is not something I would endorse.
Gerry
Is that happening? Are sites being penalised for in-content, in-page metadata?
Shari
I have seen a contradiction going on about certain metadata, not so much in text-based documents but in image documents. An image, especially one that comes from a camera, automatically has metadata embedded in it. If that metadata isn't helping the document be found and you want to get rid of it, then by all means get rid of it. But quoting usability experts saying that a page has to load in two seconds or less, and calling that download time: that's ridiculous. That's response time. There are certain documents that by their very nature are going to take longer than two seconds to download, and people are perfectly fine with that as long as the system responds quickly and tells them that this is going to happen. So if people want a high-resolution picture of something, which you would expect on many e-commerce sites, there's going to be a longer download time. But you can always tell people, say on an e-commerce page where they're looking at a low-res image, to click to view the high-resolution picture, and then they'll expect it to take longer to download. So when Google's page speed tester says that this is too slow, I ignore that. I usually say that if you can get 80% or above on Google's or Bing's page speed tester, you're good. Don't worry about it, and don't waste your money trying to improve your score.
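Camera metadata such as EXIF (and often XMP) lives in a JPEG's APP1 segments, ahead of the compressed image data. A minimal, standard-library-only sketch that removes those segments and reports how many bytes they occupied. It assumes a well-formed JPEG, does not handle fill bytes between markers, and is no substitute for a maintained image library. Note that the EXIF Orientation tag is in APP1 too, so stripping it can make some photos display rotated.

```python
import struct

SOI = b"\xff\xd8"   # start of image
EOI = 0xD9          # end of image
SOS = 0xDA          # start of scan: compressed pixel data follows
APP1 = 0xE1         # segment type that holds EXIF (and often XMP) metadata


def strip_app1(jpeg: bytes) -> tuple[bytes, int]:
    """Return (jpeg_without_app1_segments, number_of_bytes_removed)."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(SOI)
    removed = 0
    i = 2
    while i < len(jpeg):
        if jpeg[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = jpeg[i + 1]
        if marker == EOI:
            out += jpeg[i:i + 2]
            break
        if marker == SOS:
            out += jpeg[i:]  # copy the scan data and everything after it unchanged
            break
        (length,) = struct.unpack(">H", jpeg[i + 2:i + 4])  # includes its own 2 bytes
        segment = jpeg[i:i + 2 + length]
        if marker == APP1:
            removed += len(segment)
        else:
            out += segment
        i += 2 + length
    return bytes(out), removed


# Usage with a real file (the path is a placeholder):
# with open("IMG_4821.JPG", "rb") as f:
#     cleaned, removed = strip_app1(f.read())
```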
Gerry
Presumably metadata would not account for any significant percentage of the actual file size of an image though.
Shari
You'd be surprised. It depends on the type of picture: obviously if it's got millions of colours, every bit of metadata you can eliminate matters. Also, a lot of the time with photos, the camera automatically generates a name that doesn't make sense to anybody, so going through and renaming your photos would be a wise idea, especially if you want them to be found online. That applies not just to websites: if you're going to do any kind of social media sharing with certain types of images, I'd pay attention to it.
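Renaming camera-generated names like IMG_4821.JPG can be partly automated once a person has written a short description. A sketch that produces a lowercase, hyphen-separated name and keeps the original extension; the hyphen convention is a common choice assumed here, not a search engine requirement, and the function name is hypothetical.

```python
import re
import unicodedata
from pathlib import Path


def descriptive_filename(original: str, description: str) -> str:
    """Build a lowercase, hyphen-separated file name from a human description."""
    extension = Path(original).suffix.lower()
    # Fold accented letters to ASCII (e.g. "é" -> "e") before slugging.
    ascii_text = (unicodedata.normalize("NFKD", description)
                  .encode("ascii", "ignore").decode("ascii"))
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if not slug:
        raise ValueError("description produced an empty file name")
    return f"{slug}{extension}"


print(descriptive_filename("IMG_4821.JPG", "Red leather sofa, 3-seater"))
# red-leather-sofa-3-seater.jpg
```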
Gerry
Should we worry about metadata for things like Tweets?
Shari
I don't think so.
Gerry
What about images embedded in Tweets or Instagram content?
Shari
I would spend a little time on it. I think if the label and the context are accurate, you're good. But if the label and the context are not communicated, then you might want to pay a little more attention to metadata. Right now I haven't seen Twitter use image metadata, but I will say that whenever I upload an image to any social media, I have it labelled very, very well and I try to make the context as clear as possible.
Gerry
I guess, related to that, accessibility and metadata presumably go hand in glove?
Shari
Absolutely. Accessibility is one of my passions. I think people overlook accessibility when they shouldn't, and they would be singing a whole different song and dancing a whole different dance if they were in the shoes of somebody with accessibility issues. As you know, when you and I met I had just had my knee replaced, so I was just learning to walk again, and I was very fortunate in that I can walk again. Well, there are people who have vision issues, hearing issues or speech issues that are not going to change. Imagine if you were in that person's shoes. I think people overlook that group far too much, not just in the US but everywhere, and that's why I've decided to make it my passion.
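Alt text is where image labelling and accessibility meet most directly, since screen readers announce it in place of the image. A sketch using the standard `html.parser` that lists `<img>` elements with no `alt` attribute, and separately those with an empty one, because `alt=""` is the accepted way to mark a purely decorative image and may be intentional. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ImageAltAudit(HTMLParser):
    """Records <img> elements whose alt text is missing or empty."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[str] = []  # no alt attribute at all
        self.empty: list[str] = []    # alt="": fine only for decorative images

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via the default handle_startendtag.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or "(no src)"
        if "alt" not in attributes:
            self.missing.append(src)
        elif not (attributes["alt"] or "").strip():
            self.empty.append(src)


def audit_images(html: str) -> tuple[list[str], list[str]]:
    parser = ImageAltAudit()
    parser.feed(html)
    parser.close()
    return parser.missing, parser.empty


page = ('<img src="logo.png" alt="Acme logo">'
        '<img src="sofa.jpg">'
        '<img src="divider.gif" alt="" />')
print(audit_images(page))  # (['sofa.jpg'], ['divider.gif'])
```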
Gerry
Do you see the two as being almost exactly the same task?
Shari
I do, because basically what people want is clarity. In fact, I have a phrase for this: I call it KICK instead of KISS, because I think the whole concept of KISS, "Keep it simple, stupid", is misinterpreted. There are things that just can't be simple. Go ahead, make differential equations simple; make organic chemistry simple. I'm waiting for the day they make organic chemistry simple. It's not. But if it's clear, even though it's complex, that's great.
So I believe in KICK, "Keep it clear, kids", instead of KISS. Clarity is more important than simplicity. Sometimes they can be both, but not always.
Gerry
It's really interesting to observe people interacting by voice with the likes of Google Now or Alexa. They structure their queries completely differently from how they would if they were typing. Have you looked at any of that?
Shari
I have, and it's really funny. For me, one of the reasons I love UX and usability, and I'm sure it's the same for you, is that you realise how funny people are. And if you want to observe funny things in usability and UX, observe people using voice. What I hear shouted into mobile phones more than anything else is "No! No! No! No!" … and the curse words, of course. These devices are mostly programmed for a male voice rather than a female voice, and for a younger voice as opposed to an older voice.
So a lot of people with a higher-pitched voice than a standard male voice, and a lot of people who are older than twenty, are going to have a difficult time with voice. I guess that's good if you're a parent and you don't want your children ordering things they're not supposed to. But for the most part, voice search has not addressed accessibility issues that I really think it should.
The other thing is that what people say and what people type or text are two different things. So there isn't the overlap you might see between a mobile phone and a desktop, or between a mobile phone and an Alexa device or a Google Home. It's really funny to watch, though, because there are so many misinterpretations.
Gerry
Indeed. Now, one thing that's been in the back of my mind for a while, and I think it's crystallised a little while you've been talking today, is that IA, information architecture, seems to have been somewhat demoted, in people's minds anyway, as UX has risen.
Shari
That's true, and in my mind it's more important than ever, because with responsive design, adaptive design and ever-smaller screens, information architecture has migrated from broad and shallow to narrow and deep, and we both know that narrow and deep content is more difficult to locate and more difficult to discover. Therefore having a really good information architect is incredibly important for mobile, and it's almost always overlooked. People think the information architecture tests they can do are A/B tests, and I have to pick my jaw up from the floor and say, that's the wrong test.
There are two different types of card sort tests. There are also tree tests, and this is something you do on an ongoing basis with, you know, at least 15 people per user group; over time that number can get smaller, but not in the beginning. And people completely skip this. They just want the mobile site, and they want it to look like this, when they're overlooking the most important thing of all: the skeleton. I use my knee as an analogy: compare a website to a human body. What happens when you knock out the knee, which is part of the skeleton? You fall down. It's the same with a website: you're knocking out the knee and the spine, you're knocking out the support system, which is the information architecture. Part of the problem is that people don't go, "Wow! The information architecture of this website is really cool." Information architecture is at its best when people are getting things done without saying "Wow! This is really cool." People are looking for that coolness factor, when in fact, if users are noticing something about your information architecture, it's usually because something is wrong. It's a shame that people overlook it when they shouldn't.
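In an open card sort, a common first analysis is a similarity (co-occurrence) matrix: for each pair of cards, the share of participants who put them in the same group. A minimal sketch with hypothetical card names and only three participants; a real study would use far more, in line with Shari's figure of at least 15 per user group.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts: list[list[list[str]]]) -> dict[tuple[str, str], float]:
    """sorts: one entry per participant, each a list of groups of card names.

    Returns {(card_a, card_b): share of participants who grouped both cards together},
    with each pair in alphabetical order.
    """
    if not sorts:
        raise ValueError("no participants")
    counts = Counter()
    for groups in sorts:
        for group in groups:
            for pair in combinations(sorted(set(group)), 2):
                counts[pair] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}


# Hypothetical results from three participants.
sorts = [
    [["Shoes", "Boots"], ["Returns", "Shipping"]],
    [["Shoes", "Boots", "Returns"], ["Shipping"]],
    [["Shoes"], ["Boots"], ["Returns", "Shipping"]],
]
matrix = co_occurrence(sorts)
print(matrix[("Boots", "Shoes")], matrix[("Returns", "Shipping")])
```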
Gerry
And it's a shame, too, because it's so cheap and easy to test. There are lots of online tools, Chalkmark and whatever the other ones are; I mean, there's a stack of tools you can use for card sorting and tree testing.
Shari
Oh yeah. In fact, my favourite is from down under, near you: I love OptimalSort. I've used it for years, their online tutorials are fantastic, and the data they offer is fantastic too; people who may not be professional information architects can present it to the higher-ups who have to make those executive decisions. So I love that. But it is a shame that it's overlooked. If you think about it, there really isn't a formal definition of user experience. I've used the UXPA's definition for years, but I've also used Peter Morville's user experience honeycomb for many years, and one of his UX facets is findability. People can't use what they can't find, and on mobile, people use navigation more than they do on desktop. So overlooking information architecture for mobile is insane, and those navigation labels had better be spot on, because you have so much less screen real estate to deal with. So yes, it's a shame, but it's not something I overlook. It's something I emphasise, and I continue to write and speak about it because it's critical.
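Tree-testing tools typically report, per task, a success rate (did the participant end on a correct node?) and directness (did they get there without backtracking?). A sketch of both measures for a single task. The data is hypothetical, and treating any revisited node as backtracking is a simplifying assumption, so give each node a unique ID rather than a label that may repeat across branches.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    path: list[str]    # node IDs visited in order, starting at the root
    correct: set[str]  # node IDs that count as a right answer for the task


def summarise(attempts: list[Attempt]) -> dict[str, float]:
    """Success rate and directness for one tree-test task."""
    if not attempts:
        raise ValueError("no attempts")
    n = len(attempts)
    success = sum(1 for a in attempts if a.path and a.path[-1] in a.correct)
    # Treat any revisited node as backtracking (an assumption of this sketch).
    direct = sum(1 for a in attempts if len(a.path) == len(set(a.path)))
    return {"success_rate": success / n, "directness": direct / n}


# Hypothetical task: "Where would you go to return a pair of shoes?"
attempts = [
    Attempt(["Home", "Support", "Returns"], {"Returns"}),
    Attempt(["Home", "Products", "Home", "Support", "Returns"], {"Returns"}),
    Attempt(["Home", "Products", "Shoes"], {"Returns"}),
]
print(summarise(attempts))
```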
Gerry
Now, for listeners who are already doing UX but are interested in getting into this whole area of metadata, search engine optimisation, understanding how AIs work with their data, and perhaps understanding a little about machine learning… that's rather a big brief. Where should they start?
Shari
There's a good source, and I apologise, most of my list is US-based and English-language, but I will send you a list of resources from other countries. All you have to do is Google "RankBrain FAQ" (RankBrain is one word) along with the term "Search Engine Land". It's called "FAQ: All about Google RankBrain algorithm". It's really well written, and it explains artificial intelligence, machine learning, and what machine learning algorithms at Google take from your documents. I would also recommend looking at UXPA's Usability Body of Knowledge, the Usability BoK. And I'd recommend looking at the user experience honeycomb and making sure you understand all the facets of user experience. Findability is not the only thing people overlook; they also overlook usability. You were talking about card sort tests: every user experience person should know when and how to do a card sort test, or when and how to use an eye-tracking test or a performance test.
There's a time to use those tests, and there are times when you use different methodologies, such as surveys, web analytics data and other things. You need both qualitative and quantitative data. I recommend going to conferences. The search conferences I like are Search Marketing Expo and, if you're a total techie geek type of person, TREC, which is fantastic; it's been going on for many years. There's the Information Architecture Resource library. And, to be honest, even I take courses; I will take courses over again to make sure I understand the basics. I have gotten training from Human Factors International, humanfactors.com, which operates in many different countries, and I've taken many courses from Dr Jakob Nielsen.
So I highly encourage people to start there. I also encourage people to get a mentor if there's an area of UX they don't understand and want to learn more about. I tell people: join UXPA, join the Information Architecture Institute, join SEMPO. There are different organisations where you can find a mentor. In fact, most of my mentors are not from the US, and I have been able to help them as much as they have been able to help me.
Gerry
Shari Thurow, thanks so much for joining me today on the User Experience podcast.
Shari
Thank you and I hope you all have a good day.
[You can view an extensive list of resources on Shari's website.]