<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: categorization based on frequent text in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75088#M181292</link>
    <description>&lt;P&gt;I was referring to the index file that would get generated when I run Splunk on the file containing the incident description. &lt;/P&gt;

&lt;P&gt;Based on the example provided, I am assuming the index file would have the following content :&lt;BR /&gt;
5   aaaa&lt;BR /&gt;
2   backup&lt;BR /&gt;
4   bbbbb&lt;BR /&gt;
3   channel&lt;BR /&gt;
4   cluster&lt;BR /&gt;
2   disk&lt;BR /&gt;
3   fibre&lt;BR /&gt;
3   luns&lt;BR /&gt;
3   multipath&lt;BR /&gt;
where the numbers specify the number of times the string appears in the content.&lt;/P&gt;

&lt;P&gt;I was wondering if I could read this index file to obtain the strings and counts, provided my assumption about the index file contents is correct.&lt;/P&gt;

&lt;P&gt;Thanks !&lt;/P&gt;</description>
    <pubDate>Wed, 20 Jun 2012 04:46:36 GMT</pubDate>
    <dc:creator>subinj</dc:creator>
    <dc:date>2012-06-20T04:46:36Z</dc:date>
    <item>
      <title>categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75080#M181284</link>
      <description>&lt;P&gt;Hi. I have an Excel dump of incident tickets generated from the ticketing tool. &lt;BR /&gt;
Sample incident descriptions from the report:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;"Target: CI-xxxx Stateless event
alarm Event details:  HA recovered
from a total cluster failure in
cluster" &lt;/LI&gt;
&lt;LI&gt;"Server - CI-aaaa generates
Multipath Issue Fibre Channel
information: Multipathing ERROR, not
all luns have 4 paths"&lt;/LI&gt;
&lt;LI&gt;"Servers generate CI-aaaa &amp;amp; CI-bbbbb  - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths"&lt;/LI&gt;
&lt;LI&gt;"Servers generate CI-aaaa &amp;amp; CI-bbbbb  - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths"&lt;/LI&gt;
&lt;LI&gt;"[VMware vCenter - Alarm Cluster high availability error] Insufficient resources to satisfy HA failover level on cluster"&lt;/LI&gt;
&lt;LI&gt;"F drive is having less disk space nagios-ebs: CI-xxxx "&lt;/LI&gt;
&lt;LI&gt;"Low disk space alert on CI-yyyyy"&lt;/LI&gt;
&lt;LI&gt;"Failed backup report for 2nd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb"&lt;/LI&gt;
&lt;LI&gt;"Failed backup report for 3rd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb"&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;There is no exclusive "category" field. My end objective is to perform a Trend Analysis to identify top recurring issues.&lt;BR /&gt;
I could perform a grouping by going through the description fields one by one and identifying the incident type.&lt;/P&gt;

&lt;P&gt;Desired output would be :&lt;/P&gt;

&lt;P&gt;category ---- count of occurrence&lt;/P&gt;

&lt;P&gt;HA ---- 2&lt;/P&gt;

&lt;P&gt;Multipath ---- 3&lt;/P&gt;

&lt;P&gt;disk space ---- 2&lt;/P&gt;

&lt;P&gt;failed backup ---- 2&lt;/P&gt;

&lt;P&gt;Manual grouping would not be feasible for a list of 300+ incidents, though.&lt;/P&gt;
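
&lt;P&gt;To make the desired output concrete, here is a minimal sketch in plain Python (not Splunk) of the keyword-based grouping. The keyword table, the function name and the four sample descriptions are assumptions for illustration only; picking those keywords by hand is exactly the step I would like to avoid.&lt;/P&gt;

```python
from collections import Counter

# Hypothetical keyword table: each category maps to lowercase phrases that
# identify it. Choosing these phrases by hand is the manual step in question.
CATEGORY_KEYWORDS = {
    "HA": ["ha recovered", "ha failover"],
    "Multipath": ["multipath"],
    "disk space": ["disk space"],
    "failed backup": ["failed backup"],
}


def categorize(description):
    """Return the first category whose phrase occurs in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorized"


# Four of the sample descriptions from the list above.
descriptions = [
    "[VMware vCenter - Alarm Cluster high availability error] "
    "Insufficient resources to satisfy HA failover level on cluster",
    "F drive is having less disk space nagios-ebs: CI-xxxx ",
    "Low disk space alert on CI-yyyyy",
    "Failed backup report for 2nd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb",
]

# Tally how many descriptions fall into each category.
category_counts = Counter(categorize(d) for d in descriptions)

for category, count in category_counts.items():
    print(category, "----", count)
```

&lt;P&gt;On these four samples it counts one HA incident, two disk space incidents and one failed backup.&lt;/P&gt;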

&lt;P&gt;I was wondering if Splunk could identify the common significant text from the description fields and return a similar grouping, without the need to key in search strings ?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jun 2012 11:16:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75080#M181284</guid>
      <dc:creator>subinj</dc:creator>
      <dc:date>2012-06-14T11:16:04Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75081#M181285</link>
      <description>&lt;P&gt;Can you provide the data? It's still a little unclear to me what you're trying to accomplish.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jun 2012 12:40:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75081#M181285</guid>
      <dc:creator>Lamar</dc:creator>
      <dc:date>2012-06-14T12:40:07Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75082#M181286</link>
      <description>&lt;P&gt;Thanks for your time Lamar !&lt;BR /&gt;
I have edited my original post to include samples of my requirement. I trust this makes things clearer.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jun 2012 06:17:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75082#M181286</guid>
      <dc:creator>subinj</dc:creator>
      <dc:date>2012-06-15T06:17:02Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75083#M181287</link>
      <description>&lt;P&gt;You mean if Splunk can somehow automatically identify a category for each of these messages and return it? In that case the answer is no. Splunk doesn't know anything about what these logs actually mean; it just indexes them like any other data. Any other intelligence will have to be provided by you (or by someone else who has already provided it through an app or similar).&lt;/P&gt;

&lt;P&gt;If you mean that Splunk could match on individual strings in each message and create fields from that, certainly. You could match on the string "disk space" and put that into a field, same goes for any other string you're interested in.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jun 2012 06:39:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75083#M181287</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2012-06-15T06:39:29Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75084#M181288</link>
      <description>&lt;P&gt;The problem that you'll have with this data is that it doesn't follow a common format.&lt;/P&gt;

&lt;P&gt;Some events have their description after a ":", while other descriptions start at the beginning of the event/line.&lt;/P&gt;

&lt;P&gt;You could create a hash of each event and key off that with a lookup, or something similar.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jun 2012 13:17:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75084#M181288</guid>
      <dc:creator>Lamar</dc:creator>
      <dc:date>2012-06-15T13:17:07Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75085#M181289</link>
      <description>&lt;P&gt;Right about the format - it doesn't have a common template. Thanks Lamar !&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jun 2012 06:40:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75085#M181289</guid>
      <dc:creator>subinj</dc:creator>
      <dc:date>2012-06-19T06:40:15Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75086#M181290</link>
      <description>&lt;P&gt;Thanks Ayn! &lt;/P&gt;

&lt;P&gt;Yes, the first part is what I am looking for, as I currently do not know what the possible incident categories are or which associated strings I should be searching for.&lt;/P&gt;

&lt;P&gt;Would it be feasible to read the index file, from which I could identify the various strings and their number of occurrences?&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jun 2012 07:04:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75086#M181290</guid>
      <dc:creator>subinj</dc:creator>
      <dc:date>2012-06-19T07:04:37Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75087#M181291</link>
      <description>&lt;P&gt;Please clarify what you mean - what index file are you referring to, and which various strings?&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jun 2012 07:46:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75087#M181291</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2012-06-19T07:46:34Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75088#M181292</link>
      <description>&lt;P&gt;I was referring to the index file that would get generated when I run Splunk on the file containing the incident description. &lt;/P&gt;

&lt;P&gt;Based on the example provided, I am assuming the index file would have the following content :&lt;BR /&gt;
5   aaaa&lt;BR /&gt;
2   backup&lt;BR /&gt;
4   bbbbb&lt;BR /&gt;
3   channel&lt;BR /&gt;
4   cluster&lt;BR /&gt;
2   disk&lt;BR /&gt;
3   fibre&lt;BR /&gt;
3   luns&lt;BR /&gt;
3   multipath&lt;BR /&gt;
where the numbers specify the number of times the string appears in the content.&lt;/P&gt;
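
&lt;P&gt;To illustrate the kind of listing I have in mind, here is a minimal sketch in plain Python (not Splunk, and not how Splunk stores its index). It counts words across four of the sample descriptions, so the numbers differ from the full list above; the function and variable names are assumptions for illustration.&lt;/P&gt;

```python
import re
from collections import Counter

# Four of the sample incident descriptions from the original post.
descriptions = [
    "Server - CI-aaaa generates Multipath Issue Fibre Channel information: "
    "Multipathing ERROR, not all luns have 4 paths",
    "Low disk space alert on CI-yyyyy",
    "Failed backup report for 2nd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb",
    "Failed backup report for 3rd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb",
]


def count_terms(texts):
    """Count how often each run of letters (lowercased) appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


term_counts = count_terms(descriptions)

# Print "count term" pairs in alphabetical order of the term.
for term, count in sorted(term_counts.items()):
    print(count, term)
```

&lt;P&gt;Such a raw count also includes words like "for" and "on", so some filtering of common words would still be needed before the counts say anything about categories.&lt;/P&gt;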

&lt;P&gt;I was wondering if I could read this index file to obtain the strings and counts, provided my assumption about the index file contents is correct.&lt;/P&gt;

&lt;P&gt;Thanks !&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jun 2012 04:46:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75088#M181292</guid>
      <dc:creator>subinj</dc:creator>
      <dc:date>2012-06-20T04:46:36Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75089#M181293</link>
      <description>&lt;P&gt;The index is in a proprietary binary format that can't be read in any way like that, so no, the assumption is false.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jun 2012 07:41:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75089#M181293</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2012-06-20T07:41:46Z</dc:date>
    </item>
    <item>
      <title>Re: categorization based on frequent text</title>
      <link>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75090#M181294</link>
      <description>&lt;P&gt;I know this question was asked quite a while ago, but in case anyone stumbles across this in a search I thought I'd mention that Prelert Anomaly Detective for Splunk (&lt;A href="http://splunk-base.splunk.com/apps/68765/prelert-anomaly-detective"&gt;http://splunk-base.splunk.com/apps/68765/prelert-anomaly-detective&lt;/A&gt;) can categorize events based on looking for common words in the raw text.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Apr 2013 12:23:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/categorization-based-on-frequent-text/m-p/75090#M181294</guid>
      <dc:creator>dmr195</dc:creator>
      <dc:date>2013-04-12T12:23:35Z</dc:date>
    </item>
  </channel>
</rss>

