Solved: Top search results from Drupal

staze · ‎02-06-2012

Okay, I've done this once in Plone, but we've moved to Drupal, and things don't look the same.

Basically, I want to grab the top search terms from a given timeframe. Drupal search URLs look like:

http://site.example.com/search/site/<term>, where <term> is something like "splunk" or "foobar" or whatever.

A log entry looks something like this (in this case I searched for "splunk"; the server is Apache):

111.222.333.444 - - [06/Feb/2012:14:38:07 -0800] "GET /search/site/splunk HTTP/1.1" 200 9289 "http://site.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"

Previously, in Plone, I was using something like:

host="hostname" file="search" SearchableText="*" | eval SearchableText=lower(SearchableText) | top limit=10 SearchableText

But there's no query variable being set like that.
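To see why the Plone-style search no longer matches, it can help to split both URL shapes into path and query. Here is a minimal Python sketch. The Plone-style URL is an assumed example (only the SearchableText parameter name comes from the search above); the Drupal URL follows the shape in the log entry:

```python
from urllib.parse import urlsplit, parse_qs

# Assumed Plone-style URL: the search term travels in the query string.
plone = urlsplit("http://site.example.com/search?SearchableText=splunk")
# Drupal-style URL: the search term is the last path segment.
drupal = urlsplit("http://site.example.com/search/site/splunk")

plone_term = parse_qs(plone.query).get("SearchableText", [None])[0]
drupal_query = parse_qs(drupal.query)          # empty dict: no query string at all
drupal_term = drupal.path.rsplit("/", 1)[-1]   # last path segment

print(plone_term, drupal_query, drupal_term)
```

So for Drupal the term has to come out of the URI path rather than from a query field.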

Thoughts? Help?

araitz · ‎02-06-2012

What is the sourcetype for your Drupal data? It looks like a standard access log. What if you run the following search?

 host="hostname" file="search" | kv access-extractions | eval SearchableText=lower(uri) | top limit=10 SearchableText

UPDATE: the final answer from comments below:

The best thing to do would be to make the 'rex' field extraction a permanent one using props.conf (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf):

[source::.../access_log*]
EXTRACT-access = "/(?<last_part>[^/]+)$" in uri_path

Then you can do:

source="/var/log/apache2/access_log" uri_path="/search/site/*" NOT last_part=*comment*  NOT last_part="favicon.ico" | top limit=10 last_part
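A rough way to check the extraction outside Splunk is to run the same pattern over a few paths in Python. This is only a sketch under assumptions: the sample paths and the helper name top_terms are made up for illustration, Python spells the named group (?P<last_part>...) rather than (?<last_part>...), and the lowercasing step mirrors the eval lower() used elsewhere in this thread rather than anything in the search above.

```python
import re
from collections import Counter

# Same pattern as the EXTRACT above, written in Python's named-group syntax.
LAST_PART = re.compile(r"/(?P<last_part>[^/]+)$")

def top_terms(uri_paths, limit=10):
    """Count the last path segment of /search/site/* paths (hypothetical helper)."""
    counts = Counter()
    for path in uri_paths:
        if not path.startswith("/search/site/"):
            continue  # mirrors uri_path="/search/site/*"
        match = LAST_PART.search(path)
        if match is None:
            continue  # e.g. a trailing slash leaves nothing to capture
        term = match.group("last_part").lower()
        if "comment" in term or term == "favicon.ico":
            continue  # approximates the two NOT clauses
        counts[term] += 1
    return counts.most_common(limit)

# Made-up sample paths, not real log data.
sample = [
    "/search/site/splunk",
    "/search/site/Splunk",
    "/search/site/foobar",
    "/search/site/favicon.ico",
    "/search/site/comment-form",
    "/node/12",
]
print(top_terms(sample))
```

With the made-up sample this prints [('splunk', 2), ('foobar', 1)]: the two case variants of "splunk" merge, and the favicon and comment paths drop out.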

araitz · ‎02-06-2012

Take a look at the Web Intelligence app. These use cases and a lot more are built in, and the app is free and supported: http://splunk-base.splunk.com/apps/28994/splunk-app-for-web-intelligence

staze · ‎02-08-2012

Cool, that worked! Thanks! I've added the stuff to props.conf, but I have to wait for the webintelligence backfill to finish before restarting splunk.

Thanks again!

araitz · ‎02-08-2012

Otherwise, if you do the extraction in-line with rex, the search will be far less efficient. As a rule of thumb, do as much filtering as possible to the left of the first pipe:

source="/var/log/apache2/access_log" uri_path="/search/site/*" | rex field=uri_path "/(?<last_part>[^/]+)$" | eval last_part=lower(last_part) | search NOT last_part=*comment* | eval last_part = mvfilter(last_part != "favicon.ico" ) | top limit=10 last_part

araitz · ‎02-08-2012

The best thing to do would be to make the 'rex' field extraction a permanent one using props.conf (http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf):

[source::.../access_log*]
EXTRACT-access = "/(?<last_part>[^/]+)$" in uri_path

Then you can do:

source="/var/log/apache2/access_log" uri_path="/search/site/*" NOT last_part=*comment*  NOT last_part="favicon.ico" | top limit=10 last_part

staze · ‎02-08-2012

Okay, the last one seems to work (with the rex field). I'm very close; the only issue is that I want to ignore any results that contain the word "comment".

Here's what I have:
source="/var/log/apache2/access_log" uri_path="/search/site/*" | rex field=uri_path "/(?<last_part>[^/]+)$" | eval last_part=lower(last_part) | eval last_part = mvfilter(last_part != "favicon.ico" ) | top limit=10 last_part

The mvfilter is obviously removing "favicon" from the results. And I needed to run the results through "lower" to remove the case duplicates.

Almost....There....

araitz · ‎02-07-2012

So if uri_path is already an extracted field, you don't need the '| kv access-extractions'. You can try this to get query strings:

source="/var/log/apache2/access_log" uri_path="/search/site/*" uri_query=* | top limit=10 uri_query

To get the last part before the query string:

sourcetype="access_combined_wcookie" | rex field=uri_path "\/(?<last_part>[^\/]+)$" | top limit=10 last_part

staze · ‎02-07-2012

I think I'm close. The above didn't work quite right, but this seems close...

source="/var/log/apache2/access_log" uri_path="/search/site/*" | kv access-extractions | eval SearchableText=lower(uri) | top limit=10 SearchableText

Problem is, I'm getting results like:

"/search/site/scholarship". Is there a way to just remove the "/search/site/" part of that result, so I just get the actual search term?

Also, how does one remove certain results? Like, getting a favicon.ico in the results because it happens to get loaded from a location with "/search/site" in the url for some reason...

Thoughts?

And thanks. I've got backfilling going with the webintelligence app... will have to see how that works.
