Search Analytics for Your Site. Louis Rosenfeld
Finally, if you’re one of those wearers of many hats, don’t fret: as mentioned earlier, SSA scales wonderfully. Even if you spend only 15 minutes per month looking over the simplest reports (the most frequent queries list and the null results query list), you’ll get something useful out of your analysis. This month’s 15 minutes of tuning can gently grow to 30 minutes next month, and so on. The work itself stays the same; it simply expands to fill whatever time you can make or justify for it.
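To show how little processing the two simplest reports need, here is a minimal Python sketch of both. It assumes the search log has already been reduced to (query, hit count) pairs. That intermediate format, the function name `simplest_reports`, the lowercase normalization, and the sample data are all hypothetical and for illustration only.

```python
from collections import Counter


def simplest_reports(entries, top_n=10):
    """Return (most frequent queries, most frequent null-result queries).

    `entries` is assumed to be an iterable of (query, hit_count) pairs
    already extracted from a search log (a hypothetical input format).
    """
    all_queries = Counter()
    null_queries = Counter()
    for query, hits in entries:
        # Light normalization so "Noise" and "noise " count as one query.
        normalized = query.strip().lower()
        all_queries[normalized] += 1
        if hits == 0:
            null_queries[normalized] += 1
    return all_queries.most_common(top_n), null_queries.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [("noise", 111), ("Noise", 98), ("parking", 12),
              ("nosie", 0), ("nosie", 0), ("refund policy", 0)]
    top, nulls = simplest_reports(sample, top_n=3)
    print("Most frequent:", top)
    print("Null results:", nulls)
```

Ties keep the order in which queries were first seen; a real report might break ties alphabetically instead.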
Your Secret Weapon
Thank your lucky stars: SSA remains safely under the radar. No one owns it, and the people in most organizations who are closest to it (the IT folks who manage the search engine) aren’t likely to worry much about things like user intent. So if you can crack open the data, you (and your organization) will hold the keys to a very powerful secret weapon. Read on.
Anatomy of a Search Log Entry
Avi Rappoport, Search Tools Consulting, http://searchtools.com/
Though most of us now use analytics applications that provide some SSA reporting functionality, you may find yourself having to create your own reports, either because your analytics application doesn’t support your specific needs or because you don’t have access to one at all. In both cases, you’ll need to process the data yourself.
In search engine transaction logs, you’ll find the search query, any search parameters (such as language or date), and the number of matches retrieved by the search engine. Most logs also record the date and time, along with some kind of searcher identifier. Understanding the log format makes it easier to interpret search analytics reports, to recognize what they can and can’t tell you, and to perform special processing for unusual questions.
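In many logs, the query and its parameters are not separate fields at all: they travel inside the requested URL. Here is a minimal Python sketch of pulling them apart with the standard `urllib.parse` module. It assumes a GET-style search form that sends the search terms in a parameter named `q`; that name and the `lang` option in the example are hypothetical, so check your own engine’s URLs.

```python
from urllib.parse import parse_qs, urlsplit


def query_and_params(request_target, query_key="q"):
    """Split a search request URL into (search terms, other parameters).

    `query_key` is an assumption: engines name this parameter differently.
    """
    parts = urlsplit(request_target)
    # parse_qs decodes "+" and %-escapes and returns lists, since a
    # parameter may repeat.
    params = parse_qs(parts.query)
    terms = " ".join(params.pop(query_key, []))
    return terms, params


if __name__ == "__main__":
    print(query_and_params("/search?q=noise+abatement&lang=en"))
```

Keeping the remaining parameters as a dictionary makes it easy to report later on how often searchers use options such as a language or date filter.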
Many search engines conform to the NCSA extended Web server log format,[7] so that’s what we’ll cover here. These text files have a standard field order, with fields separated by spaces. A field that contains internal spaces must be enclosed in double quotes or square brackets.
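The quoting rule is simple enough to implement directly. Here is a minimal Python sketch of a field splitter, assuming fields are separated by single or multiple spaces, a field that starts with a double quote runs to the next double quote, and one that starts with `[` runs to the next `]`. The function name is hypothetical, and escaped quotes inside a field are not handled.

```python
def split_log_fields(line):
    """Split one NCSA-style log line into fields, stripping quotes/brackets."""
    closers = {'"': '"', "[": "]"}
    fields = []
    i, n = 0, len(line)
    while i < n:
        if line[i] == " ":
            i += 1  # skip separator spaces
            continue
        closer = closers.get(line[i])
        if closer:
            # Quoted or bracketed field: read up to the matching closer.
            end = line.find(closer, i + 1)
            if end == -1:  # unterminated: take the rest of the line
                end = n
            fields.append(line[i + 1:end])
            i = end + 1
        else:
            # Plain field: read up to the next space.
            end = line.find(" ", i)
            if end == -1:
                end = n
            fields.append(line[i:end])
            i = end
    return fields


if __name__ == "__main__":
    line = ('XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] '
            '"GET /search?q=noise HTTP/1.1" 200 9429 111')
    for position, field in enumerate(split_log_fields(line), start=1):
        print(position, field)
```

Stripping the delimiters leaves the date and request fields ready to be split further; a regular expression would work as well, but the explicit loop keeps the quoting rule visible.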
However, the NCSA extended format has no place for the hit count (the number of items matched by the search), so search engines tend to slide it into the middle of the entry or tack it onto the end. If your search log format isn’t documented, you may need to do some sleuthing: enter several unique searches that you know will generate no matches, then look for those terms in the search log. The field that reads 0 in each of those entries is likely the hit count.
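That sleuthing can be partly automated. The minimal Python sketch below assumes you have already typed in a few deliberately unmatchable queries (the terms `zzqxv1` and `zzqxv2` and the log lines in the example are hypothetical). It reports which field positions read `0` on every one of those test entries. A position that survives across several test searches is a strong candidate for the hit count.

```python
import re

# A field is a [bracketed] run, a "quoted" run, or a run of non-space characters.
FIELD = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def candidate_hit_fields(lines, test_terms):
    """Return the 1-based field positions that read "0" on every test search.

    A line counts as a test search if it contains any of `test_terms`
    anywhere (a crude substring check, fine for truly unique terms).
    """
    candidates = None
    for line in lines:
        if not any(term in line for term in test_terms):
            continue  # an ordinary search, not one of ours
        fields = FIELD.findall(line)
        zeros = {pos for pos, value in enumerate(fields, start=1) if value == "0"}
        # Keep only positions that were zero on every test search so far.
        candidates = zeros if candidates is None else candidates & zeros
    return sorted(candidates or ())


if __name__ == "__main__":
    # Hypothetical log lines: one real search plus two test searches.
    log = [
        'XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] "GET /search?q=noise HTTP/1.1" 200 9429 111',
        'XX.XX.XX.20 - - [10/Jul/2010:10:30:02 -0800] "GET /search?q=zzqxv1 HTTP/1.1" 200 2210 0',
        'XX.XX.XX.20 - - [10/Jul/2010:10:30:40 -0800] "GET /search?q=zzqxv2 HTTP/1.1" 200 2215 0',
    ]
    print(candidate_hit_fields(log, ["zzqxv1", "zzqxv2"]))  # [8]
```

Using several test searches matters: any single entry might have another field that happens to be 0, but the intersection narrows the candidates quickly.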
BASIC FIELDS
A simple query entry in this log format looks like this:
XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] "GET /search?q=noise HTTP/1.1" 200 9429 111
We can break that down into fields for better analysis, as shown in Table 2-2.
Table 2-2. Fields By Position

| Position | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 |
|---|---|---|---|---|---|---|---|---|
| Meaning | ip | - | - | date/timestamp | search request | response code | bytes | hits |
| Example | XX.XX.XX.14 | - | - | [10/Jul/2010:10:24:13 -0800] | "GET /search?q=noise HTTP/1.1" | 200 | 9429 | 111 |
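Positions #1 through #8 map naturally onto a regular expression with named groups. The Python sketch below assumes the exact layout of the sample entry, with the hit count as the last field; the group names (`ident`, `user`, `hits`, and so on) and the function name are hypothetical labels, not part of any standard. It also splits the timestamp and request into the sub-parts that Table 2-3 describes.

```python
import re
from datetime import datetime

# Positions #1-#8 from Table 2-2. The hit count is assumed to be the last
# field, as in the sample entry; your engine may put it somewhere else.
ENTRY = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<hits>\d+)$'
)


def parse_entry(line):
    """Parse one search log line into a dict, or return None if it doesn't fit."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    # "GET /search?q=noise HTTP/1.1" -> method, URL, protocol version
    parts = entry.pop("request").split(" ", 2)
    if len(parts) != 3:
        return None
    entry["method"], entry["url"], entry["version"] = parts
    # "10/Jul/2010:10:24:13 -0800" -> timezone-aware datetime
    # (%b expects English month abbreviations, as in the default C locale)
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = None if entry["bytes"] == "-" else int(entry["bytes"])
    entry["hits"] = int(entry["hits"])
    return entry


if __name__ == "__main__":
    sample = ('XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] '
              '"GET /search?q=noise HTTP/1.1" 200 9429 111')
    e = parse_entry(sample)
    print(e["url"], e["hits"], e["timestamp"].isoformat())
    # /search?q=noise 111 2010-07-10T10:24:13-08:00
```

Returning `None` for lines that don’t fit lets you count and inspect malformed entries instead of silently dropping them.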
Table 2-3 provides even more detail on each field.
Table 2-3. Details About Fields

| Position | Field | Example | Meaning |
|---|---|---|---|
| #1 | IP or host name | XX.XX.XX.14 | ID of the computer sending the search |
| #2 | auth. user | - | usually empty; RFC 931 authentication |
| #3 | user name | - | usually empty |
| #4a | date | [10/Jul/2010 | date of the query in standard form |
| #4b | time | :10:24:13 | time of the query in standard form |
| #4c | offset | -0800] | time offset from GMT[a] |
| #5a | request | "GET | HTTP request method (form action) |
| #5b | URL | /search | search results page URL |
| #5c | parameters | ?q=noise | search terms and other options |
| #5d | version | HTTP/1.1" | HTTP protocol version (usually the same) |
| #6 | response code | 200 | server response code (if it’s not 200, you are in trouble) |
| #7 | bytes | 9429 | bytes returned (the size |