I'd like to generate a report of the top N search queries from my Apache web logs.
A log entry for a search looks like:
123.456.789.000 - - [22/Sep/2010:13:58:18 -0700] "GET /search?SearchableText=Gateway HTTP/1.1" 200 5857 "http://www.example.com/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:126.96.36.199) Gecko/20100722 Firefox/3.6.8"
How would I go about doing this? I mean, I can do something like:
host="www" file="search" SearchableText="*"
which returns the search terms in date/time order. But it would be nice to show them by frequency and return only the top N (where I can set N to 100, 1000, 10000, etc.).
Frequency would probably also have to ignore case. So, probably lowercase all the results, then tally them up.
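For anyone who wants to sanity-check the counts outside Splunk, the lowercase-then-tally idea can be sketched in Python. This is only a rough sketch under some assumptions: the function name top_searches, the /search path check, and the sample lines are made up for illustration and are not anything Splunk does internally.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Pulls the request target out of a log line: "GET /path?query HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def top_searches(lines, n=100, param="SearchableText"):
    """Return the n most frequent search terms, compared case-insensitively."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        url = urlsplit(match.group(1))
        if url.path != "/search":  # assumed search endpoint, as in the sample entry
            continue
        # parse_qs URL-decodes the values ("help+desk" becomes "help desk")
        for term in parse_qs(url.query).get(param, []):
            term = term.strip().lower()  # fold case before tallying
            if term:
                counts[term] += 1
    return counts.most_common(n)


# Hypothetical sample lines, shortened from the entry shown above
sample = [
    '1.2.3.4 - - [22/Sep/2010:13:58:18 -0700] "GET /search?SearchableText=Gateway HTTP/1.1" 200 5857',
    '1.2.3.4 - - [22/Sep/2010:13:58:20 -0700] "GET /search?SearchableText=gateway HTTP/1.1" 200 5857',
    '1.2.3.4 - - [22/Sep/2010:13:58:25 -0700] "GET /search?SearchableText=GATEWAY HTTP/1.1" 200 5857',
    '1.2.3.4 - - [22/Sep/2010:13:59:01 -0700] "GET /search?SearchableText=help+desk HTTP/1.1" 200 4120',
    '1.2.3.4 - - [22/Sep/2010:13:59:05 -0700] "GET /index.html HTTP/1.1" 200 1024',
]

print(top_searches(sample, n=2))  # [('gateway', 3), ('help desk', 1)]
```

Here n plays the role of N from the question, so it can be set to 100, 1000, 10000, and so on.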
I believe I got it. I'm not sure why it wasn't rendering, but I managed to get it working...
host="www" file="search" SearchableText="*" | top limit=100 SearchableText
So yeah, this works.
A report of top hosts for an error log might be:
index=stuff sourcetype=error_log | top host
Or, if you build the same report manually with stats:
index=stuff sourcetype=error_log | stats count by host | sort -count | head 10
For your case, you need to extract a field called SearchableText. Once you extract it (via rex or interactive field extraction) you can report by it:
host=www source=<your log file> | rex "SearchableText=(?<SearchableText>.*[^ ]) HTTP" | top SearchableText
Make sure you limit your time range to test this out.
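If you want to see what that rex pattern captures before running it over a large time range, here is a small Python check. Treat it as a sketch with assumptions: Python spells named groups as (?P<name>...), the helper name extract is made up, and the second log line with an extra sort parameter is hypothetical.

```python
import re

# Same pattern as the rex above, with Python's (?P<name>...) group syntax
PATTERN = re.compile(r"SearchableText=(?P<SearchableText>.*[^ ]) HTTP")


def extract(line):
    """Return the captured search term, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group("SearchableText") if match else None


# Shortened version of the log entry from the question
line = ('123.456.789.000 - - [22/Sep/2010:13:58:18 -0700] '
        '"GET /search?SearchableText=Gateway HTTP/1.1" 200 5857 "http://www.example.com/"')

# Hypothetical entry with a second query parameter
multi = ('123.456.789.000 - - [22/Sep/2010:13:58:18 -0700] '
         '"GET /search?SearchableText=Gateway&sort=date HTTP/1.1" 200 5857')

print(extract(line))   # Gateway
print(extract(multi))  # Gateway&sort=date
```

The second result shows that the pattern keeps everything up to " HTTP", so if your search URLs can carry other parameters, a narrower capture such as [^& ]+ may be a better fit.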
So, I've just tried:
host="www" file="search" SearchableText="*" | top searchabletext
but it never seems to render the results (the results section just says "Waiting for Search Preview Results").