<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Top search results from Drupal in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66292#M16457</link>
    <description>&lt;P&gt;Take a look at the Web Intelligence app; these use cases and a lot more are built in, and the app is free and supported: &lt;A href="http://splunk-base.splunk.com/apps/28994/splunk-app-for-web-intelligence"&gt;http://splunk-base.splunk.com/apps/28994/splunk-app-for-web-intelligence&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 07 Feb 2012 04:32:36 GMT</pubDate>
    <dc:creator>araitz</dc:creator>
    <dc:date>2012-02-07T04:32:36Z</dc:date>
    <item>
      <title>Top search results from Drupal</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66290#M16455</link>
      <description>&lt;P&gt;Okay, I've done this once in Plone, but we've moved to Drupal, and things don't look the same. &lt;/P&gt;

&lt;P&gt;Basically, I want to grab the top search terms from a given timeframe. Drupal search URLs look like: &lt;/P&gt;

&lt;P&gt;&lt;A href="http://site.example.com/search/site/"&gt;http://site.example.com/search/site/&lt;/A&gt;&amp;lt;searchterm&amp;gt;, where &amp;lt;searchterm&amp;gt; is something like "splunk" or "foobar" or whatever. &lt;/P&gt;

&lt;P&gt;A log entry looks something like this (here I searched for "splunk"; the server is Apache): &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;111.222.333.444 - - [06/Feb/2012:14:38:07 -0800] "GET /search/site/splunk HTTP/1.1" 200 9289 "http://site.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Previously, in Plone, I was using something like: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;host="hostname" file="search" SearchableText="*" | eval SearchableText=lower(SearchableText) | top limit=10 SearchableText
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;But there's no query variable being set like that. &lt;/P&gt;

&lt;P&gt;Thoughts? Help? &lt;/P&gt;</description>
      <pubDate>Mon, 06 Feb 2012 22:48:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66290#M16455</guid>
      <dc:creator>staze</dc:creator>
      <dc:date>2012-02-06T22:48:38Z</dc:date>
    </item>
    <item>
      <title>Re: Top search results from Drupal</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66291#M16456</link>
      <description>&lt;P&gt;What is the sourcetype for your Drupal data?  It looks like a standard access log.  What if you run the following search?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;host="hostname" file="search" | kv access-extractions | eval SearchableText=lower(uri) | top limit=10 SearchableText
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;&lt;STRONG&gt;UPDATE:&lt;/STRONG&gt; the final answer from comments below:&lt;/P&gt;

&lt;P&gt;The best thing to do would be to make the 'rex' field extraction a permanent one using props.conf (&lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf&lt;/A&gt;):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[source::.../access_log*]
EXTRACT-access = /(?&amp;lt;last_part&amp;gt;[^/]+)$ in uri_path
&lt;/CODE&gt;&lt;/PRE&gt;
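
&lt;P&gt;Before relying on the permanent extraction, the regex can be sanity-checked from the search bar. This is only a sketch: it assumes a Splunk version that provides the makeresults command, and the sample uri_path value is made up for illustration:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval uri_path="/search/site/Splunk" | rex field=uri_path "/(?&amp;lt;last_part&amp;gt;[^/]+)$" | eval last_part=lower(last_part) | table uri_path last_part
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If last_part comes back as just the search term, the pattern is doing what is wanted. Once the props.conf extraction has taken effect, last_part can be used like any other field.&lt;/P&gt;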

&lt;P&gt;Then you can do:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source="/var/log/apache2/access_log" uri_path="/search/site/*" NOT last_part=*comment*  NOT last_part="favicon.ico" | top limit=10 last_part
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 07 Feb 2012 02:30:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66291#M16456</guid>
      <dc:creator>araitz</dc:creator>
      <dc:date>2012-02-07T02:30:05Z</dc:date>
    </item>
    <item>
      <title>Re: Top search results from Drupal</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66292#M16457</link>
      <description>&lt;P&gt;Take a look at the Web Intelligence app; these use cases and a lot more are built in, and the app is free and supported: &lt;A href="http://splunk-base.splunk.com/apps/28994/splunk-app-for-web-intelligence"&gt;http://splunk-base.splunk.com/apps/28994/splunk-app-for-web-intelligence&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2012 04:32:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66292#M16457</guid>
      <dc:creator>araitz</dc:creator>
      <dc:date>2012-02-07T04:32:36Z</dc:date>
    </item>
    <item>
      <title>Re: Top search results from Drupal</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66293#M16458</link>
      <description>&lt;P&gt;I think I'm close. The above didn't work quite right, but this seems close... &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source="/var/log/apache2/access_log" uri_path="/search/site/*" | kv access-extractions | eval SearchableText=lower(uri) | top limit=10 SearchableText
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Problem is, I'm getting results like:&lt;/P&gt;

&lt;P&gt;"/search/site/scholarship". Is there a way to just remove the "/search/site/" part of that result, so I just get the actual search term? &lt;/P&gt;

&lt;P&gt;Also, how does one remove certain results? Like, getting a favicon.ico in the results because it happens to get loaded from a location with "/search/site" in the url for some reason... &lt;/P&gt;

&lt;P&gt;Thoughts? &lt;/P&gt;

&lt;P&gt;And thanks. I've got backfilling going with the Web Intelligence app... will have to see how that works. &lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 10:23:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66293#M16458</guid>
      <dc:creator>staze</dc:creator>
      <dc:date>2020-09-28T10:23:53Z</dc:date>
    </item>
    <item>
      <title>Re: Top search results from Drupal</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66294#M16459</link>
      <description>&lt;P&gt;So if uri_path is already an extracted field, you don't need the '| kv access-extractions'.  You can try this to get query strings:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source="/var/log/apache2/access_log" uri_path="/search/site/*" uri_query=* | top limit=10 uri_query
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;To get the last part before the query string:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype="access_combined_wcookie" | rex field=uri_path "\/(?&amp;lt;last_part&amp;gt;[^\/]+)$" | top limit=10 last_part
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 Sep 2020 10:23:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66294#M16459</guid>
      <dc:creator>araitz</dc:creator>
      <dc:date>2020-09-28T10:23:56Z</dc:date>
    </item>
    <item>
      <title>Re: Top search results from Drupal</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66295#M16460</link>
      <description>&lt;P&gt;Okay, the last one seems to work (with the rex field). I'm very close; the only issue is that I want to ignore any results that contain the word "comment". &lt;/P&gt;

&lt;P&gt;Here's what I have: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source="/var/log/apache2/access_log" uri_path="/search/site/*" | rex field=uri_path "/(?&amp;lt;last_part&amp;gt;[^/]+)$" | eval last_part=lower(last_part) | eval last_part = mvfilter(last_part != "favicon.ico" ) | top limit=10 last_part
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The mvfilter is obviously removing "favicon" from the results. And I needed to run the results through "lower" to remove the case duplicates. &lt;/P&gt;

&lt;P&gt;Almost....There....&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 11:21:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66295#M16460</guid>
      <dc:creator>staze</dc:creator>
      <dc:date>2020-09-28T11:21:53Z</dc:date>
    </item>
    <item>
      <title>Re: Top search results from Drupal</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66296#M16461</link>
      <description>&lt;P&gt;The best thing to do would be to make the 'rex' field extraction a permanent one using props.conf (&lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf&lt;/A&gt;):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[source::.../access_log*]
EXTRACT-access = /(?&amp;lt;last_part&amp;gt;[^/]+)$ in uri_path
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Then you can do:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source="/var/log/apache2/access_log" uri_path="/search/site/*" NOT last_part=*comment*  NOT last_part="favicon.ico" | top limit=10 last_part
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Feb 2012 19:56:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66296#M16461</guid>
      <dc:creator>araitz</dc:creator>
      <dc:date>2012-02-08T19:56:52Z</dc:date>
    </item>
    <item>
      <title>Re: Top search results from Drupal</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66297#M16462</link>
      <description>&lt;P&gt;Otherwise, doing it in-line will be far less efficient. As a rule of thumb, do as much filtering as possible to the left of the first pipe:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source="/var/log/apache2/access_log" uri_path="/search/site/*" | rex field=uri_path "/(?&amp;lt;last_part&amp;gt;[^/]+)$" | eval last_part=lower(last_part) | search NOT last_part=*comment* | eval last_part = mvfilter(last_part != "favicon.ico" ) | top limit=10 last_part
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Feb 2012 19:57:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66297#M16462</guid>
      <dc:creator>araitz</dc:creator>
      <dc:date>2012-02-08T19:57:57Z</dc:date>
    </item>
    <item>
      <title>Re: Top search results from Drupal</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66298#M16463</link>
      <description>&lt;P&gt;Cool, that worked! Thanks! I've added the stuff to props.conf, but I have to wait for the Web Intelligence backfill to finish before restarting Splunk. &lt;/P&gt;

&lt;P&gt;Thanks again! &lt;/P&gt;</description>
      <pubDate>Wed, 08 Feb 2012 20:56:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Top-search-results-from-Drupal/m-p/66298#M16463</guid>
      <dc:creator>staze</dc:creator>
      <dc:date>2012-02-08T20:56:33Z</dc:date>
    </item>
  </channel>
</rss>

