Search Analytics for Your Site. Louis Rosenfeld
George Kingsley Zipf, Harvard Linguist and Hockey Star
Of course, we’ve just been looking at a tiny slice of a search log. And as interesting as it is, the true power of SSA comes from collectively analyzing the thousands or millions of such interactions that take place on your site during a given period of time. That’s when the patterns emerge, when trends take shape, and when there’s enough activity to merit measuring—and drawing interesting conclusions.
Nowhere is the value of statistical analysis more apparent than when viewing the Zipf Distribution, named for Harvard linguist George Kingsley Zipf, who, as you’d expect from a linguist, liked to count words.[4] He found that a few terms were used quite often, while many were hardly used at all. We find the same thing when tallying up queries from most to least frequent, as in Figure 2-4.
The Zipf distribution—which emerges when tallying just about any site’s search data—shows that the few most common queries account for a surprisingly large portion of all search activity during any given period. (Remember in Chapter 1, how John Ferrara focused exclusively on those common queries.) You can see how tall and narrow what we’ll call the “short head” is, and how quickly it drops down to the “long tail” of esoteric queries (technically, described as “twosies” and “onesies”). In fact, we’re only showing the first 500 or so queries here; in reality, this site’s long tail would extend into the tens of thousands, many meters to the right of where you sit.
http://www.flickr.com/photos/rosenfeldmedia/5690405271/
Figure 2-4. The hockey-stick-shaped Zipf Distribution shows that a few queries are very popular, while most are not. This example is from Michigan State University, but this distribution is true of just about every Web site and intranet.
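To make the tallying concrete, here is a minimal Python sketch of how such a ranking might be produced, assuming the search log has already been reduced to a plain list of query strings. The `rank_queries` helper and the sample log are hypothetical illustrations, not the MSU data, and treating queries that differ only in case or surrounding whitespace as the same query is an assumption you may want to revisit for your own site.

```python
from collections import Counter


def rank_queries(queries):
    """Return (rank, query, count) tuples, most frequent query first."""
    # Assumption: queries that differ only in case or surrounding
    # whitespace count as the same query; blank entries are skipped.
    normalized = [q.strip().lower() for q in queries if q.strip()]
    counts = Counter(normalized)
    # most_common() orders by count, highest first; ties keep the
    # order in which the queries were first seen.
    return [
        (rank, query, count)
        for rank, (query, count) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical sample log, for illustration only.
sample_log = [
    "campus map", "Campus Map", "housing", "campus map ",
    "webenroll", "housing", "department of surgery",
]

ranked = rank_queries(sample_log)
for rank, query, count in ranked:
    print(rank, query, count)
```

Plotting the counts in rank order for a real log is what produces the hockey-stick shape: a handful of large counts on the left, then a long run of ones and twos.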
It’s equally enlightening to examine the same phenomenon when presented textually, as shown in Table 2-1.
The most common query, campus map, accounts for 1.4% of all the search activity during this time period. That number, 1.4%, doesn’t sound like much, but those top queries add up very quickly—the top 14 most common queries account for 10% of all search activity. (Note to MSU.edu webmaster: better make sure that relevant results come up when users search campus map!)
Table 2-1. The Zipf Distribution Shown Textually
http://www.flickr.com/photos/rosenfeldmedia/5825543717/

Rank | Cumulative % | Count | Query Terms |
---|---|---|---|
1 | 1.40% | 7,218 | campus map |
14 | 10.53% | 2,464 | housing |
42 | 20.18% | 1,351 | webenroll |
98 | 30.01% | 650 | computer center |
221 | 40.05% | 295 | msu union |
500 | 50.02% | 124 | hotels |
7,877 | 80.00% | 7 | department of surgery |

Note how few queries are required to account for 10% of all search activity. (This data is also from Michigan State University.)
That’s incredible: it means that if you invested the small amount of effort needed to ensure that the top 14 queries performed well, you’d improve the search experience for about 10% of all searches. And if, say, half of your site’s users were search dominant,[5] then you’d have just improved the overall user experience by 5% (10% × 50%). Numbers like this can and should be challenged, and 5% may not sound like much. But 5% here, 3% there... these quickly add up.
It bears noting that we just started with a simple report—presented both visually and as a table—and quickly drew some useful conclusions based on the data presented. That there, folks, is analysis. And that’s why reports are only means, not goals.
And equally important, this analysis scales beautifully. Have the time and resources to go beyond the top 14 queries? No problem: tuning the top 42 queries will get you to the 20% mark. About 100 gets you to 30%, and so on.
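The scaling argument can be sketched in a few lines of Python: walk down the counts from most to least frequent, accumulating their share of all searches until a target is reached. This is only an illustration under stated assumptions. The `queries_needed` and `overall_impact` helpers and the sample counts are hypothetical, and `overall_impact` simply restates the 10% × 50% arithmetic used earlier.

```python
def queries_needed(counts, target_share):
    """Return how many of the top queries cover target_share of all searches.

    counts: per-query search counts, in any order.
    target_share: fraction of total search activity, between 0 and 1.
    """
    total = sum(counts)
    if total == 0:
        return None  # an empty log cannot cover anything
    running = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running / total >= target_share:
            return rank
    return None  # only reached if target_share is greater than 1


def overall_impact(search_coverage, search_dominant_share):
    """Share of the overall experience touched by tuning the top queries."""
    return search_coverage * search_dominant_share


# Hypothetical counts with a short head and a long tail.
sample_counts = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]

for target in (0.25, 0.60, 0.75):
    print(target, queries_needed(sample_counts, target))

print(overall_impact(0.10, 0.50))
```

Run against real counts, the same loop would show how many queries you need to tune to reach whatever share of search activity you are aiming for.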
[4] You may not have heard of Zipf, but you’ve probably heard of the 80/20 Rule, the Pareto Principle, or Power Laws. All relate to the hockey-stick curve’s dramatic dropoff from “short head” to long tail.
[5] Usability expert Jakob Nielsen suggests that this is the case; see www.useit.com/alertbox/9707b.html
Ways to Use SSA (and This Book)
So what’s the message here? That SSA is an incredibly important tool for helping you understand what users want from your site. And once you have a sense of what they want, you can evaluate and improve all sorts of things that are there to help users get what they want. For instance, you can improve your site as follows:
Search system: SSA will help you understand how people entered searches, where they were when they entered them, and how they interpreted the search results. (We cover this in Chapter 8.)
Navigation and metadata: Do certain pages generate a lot more