All of them perform some form of web filtering/blocking, and I'm now being asked to produce a report on the top 50 blocked categories.
The SPL looks something like this:
index IN (Palo, Barra, MCWG) vendor_action="Blocked-URL" earliest=-8d@d latest=-1d@d | top limit=50 category
The problem is that I need to filter out links to a site. For instance, if I type Betfred into Google I get two blocks, although the human never actually went to Betfred. I also have the dilemma of multiple images being called from a single web page, each one being blocked.
So, how do you interpret web logs so that they only count unique calls by a human being to a website, rather than Google lookups or multiple returns whilst visiting another site?
I've tried using dedup against user and URL, but that removes repeat attempts throughout the week along with all the image download requests, so it's not very accurate or scientific.
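For reference, a minimal sketch of that dedup attempt. The field names user and url are assumptions for illustration, as the real names depend on how each vendor's logs are extracted:

```
index IN (Palo, Barra, MCWG) vendor_action="Blocked-URL" earliest=-8d@d latest=-1d@d
| dedup user url
| top limit=50 category
```

Because dedup keeps only one event per user/url pair across the whole search window, a user blocked on the same URL on Monday and again on Friday is counted once, which is why the repeat attempts disappear.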
There has to be a way to work out that the web request is a link click or a URL entry rather than a page lookup, but I'm at a loss.