<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Text Clustering in Splunk in Splunk ITSI</title>
    <link>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494091#M2382</link>
    <description>&lt;P&gt;You can use the &lt;CODE&gt;kmeans&lt;/CODE&gt; command for this:&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Kmeans"&gt;https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Kmeans&lt;/A&gt;&lt;BR /&gt;
Or you can have even more control by building a model with the &lt;CODE&gt;Machine Learning Toolkit&lt;/CODE&gt; (MLTK). Once the KMeans model has been built with &lt;CODE&gt;fit&lt;/CODE&gt;, you can inspect it with the &lt;CODE&gt;summary&lt;/CODE&gt; command:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| summary &amp;lt;your_model_name&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;When assigning new points to the appropriate cluster, you can simply apply your model like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;new_points&amp;gt; | apply &amp;lt;your_model_name&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 29 Nov 2019 19:43:39 GMT</pubDate>
    <dc:creator>woodcock</dc:creator>
    <dc:date>2019-11-29T19:43:39Z</dc:date>
    <item>
      <title>Text Clustering in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494086#M2377</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Here is my requirement:&lt;BR /&gt;
I have a file with a column named 'Description'. I need to find the most common word patterns in it. For example:&lt;/P&gt;

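&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Repetitive Pattern&lt;/TH&gt;&lt;TH&gt;Count&lt;/TH&gt;&lt;TH&gt;Percentage&lt;/TH&gt;&lt;TH&gt;Examples&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;Job&lt;/TD&gt;&lt;TD&gt;80&lt;/TD&gt;&lt;TD&gt;15%&lt;/TD&gt;&lt;TD&gt;Job Related with Ticket number&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;Access&lt;/TD&gt;&lt;TD&gt;130&lt;/TD&gt;&lt;TD&gt;20%&lt;/TD&gt;&lt;TD&gt;Access issues&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;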

&lt;P&gt;Any occurrence of "Job" or "Jobs" should be categorized as Job.&lt;BR /&gt;
I have installed the Machine Learning Toolkit and tried to apply TFIDF and KMeans, but I am unable to proceed because I am new to Splunk.&lt;BR /&gt;
Can anyone help me cluster the data described above using KMeans and get the required output?&lt;/P&gt;

&lt;P&gt;Please help.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Nov 2019 11:48:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494086#M2377</guid>
      <dc:creator>lavanya_gurrapu</dc:creator>
      <dc:date>2019-11-27T11:48:14Z</dc:date>
    </item>
    <item>
      <title>Re: Text Clustering in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494087#M2378</link>
      <description>&lt;P&gt;For &lt;STRONG&gt;string matching&lt;/STRONG&gt; you could check this post:&lt;BR /&gt;
&lt;A href="https://www.splunk.com/en_us/blog/tips-and-tricks/you-can-t-hyde-from-dr-levenshtein-when-you-use-url-toolbox.html"&gt;https://www.splunk.com/en_us/blog/tips-and-tricks/you-can-t-hyde-from-dr-levenshtein-when-you-use-url-toolbox.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Intelligent text pattern matching might be a little hard to implement. I will investigate further.&lt;/P&gt;

&lt;P&gt;Hope it helps!!!&lt;/P&gt;</description>
      <pubDate>Wed, 27 Nov 2019 16:20:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494087#M2378</guid>
      <dc:creator>jaime_ramirez</dc:creator>
      <dc:date>2019-11-27T16:20:49Z</dc:date>
    </item>
    <item>
      <title>Re: Text Clustering in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494088#M2379</link>
      <description>&lt;P&gt;If you don't have a list of keywords, you can try the &lt;CODE&gt;cluster&lt;/CODE&gt; command:&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Cluster"&gt;https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Cluster&lt;/A&gt;&lt;/P&gt;
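
&lt;P&gt;As a minimal sketch of the &lt;CODE&gt;cluster&lt;/CODE&gt; approach, assuming the text is already extracted into a field named &lt;CODE&gt;Description&lt;/CODE&gt; (the column name from the question) and using an illustrative similarity threshold of 0.7; both the field name and the threshold are assumptions to adjust for your data:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Your Search Here
| cluster field=Description t=0.7 showcount=true
| table cluster_count Description
| sort - cluster_count
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With &lt;CODE&gt;showcount=true&lt;/CODE&gt;, each remaining event carries a &lt;CODE&gt;cluster_count&lt;/CODE&gt; field holding the size of its cluster, so sorting on it should surface the most common patterns first.&lt;/P&gt;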

&lt;P&gt;But it sounds like you have a limited set so you can do something like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Your Search Here
| eval cluster_keyword = case(
   match(_raw, "(?i)job"), "job",
   match(_raw, "(?i)access"), "access",
   match(_raw, "(?i)ticket"), "ticket",
   true(), "other")
| stats first(_raw) last(_raw) count BY cluster_keyword
| eventstats sum(count) AS total
| eval pct = 100 * count / total
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 27 Nov 2019 17:06:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494088#M2379</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-11-27T17:06:31Z</dc:date>
    </item>
    <item>
      <title>Re: Text Clustering in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494089#M2380</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Thank you for the quick reply.&lt;/P&gt;

&lt;P&gt;First, I want to remove the stop words and group similar words into one category. Next, the most frequently recurring words should be displayed with their counts.&lt;/P&gt;

&lt;P&gt;How can I implement this logic in Splunk? I need to use the KMeans algorithm.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Nov 2019 05:00:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494089#M2380</guid>
      <dc:creator>lavanya_gurrapu</dc:creator>
      <dc:date>2019-11-28T05:00:28Z</dc:date>
    </item>
    <item>
      <title>Re: Text Clustering in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494090#M2381</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I tried the search below to exclude stop words:&lt;BR /&gt;
index=sample| makemv Summary | mvexpand Summary|fields Summary| search Summary NOT [|inputlookup words.csv|rename word as summary1]|top summary1&lt;/P&gt;

&lt;P&gt;No results are returned. Please help me find where I am making a mistake.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Nov 2019 08:42:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494090#M2381</guid>
      <dc:creator>lavanya_gurrapu</dc:creator>
      <dc:date>2019-11-28T08:42:21Z</dc:date>
    </item>
    <item>
      <title>Re: Text Clustering in Splunk</title>
      <link>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494091#M2382</link>
      <description>&lt;P&gt;You can use the &lt;CODE&gt;kmeans&lt;/CODE&gt; command for this:&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Kmeans"&gt;https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Kmeans&lt;/A&gt;&lt;BR /&gt;
Or you can have even more control by building a model with the &lt;CODE&gt;Machine Learning Toolkit&lt;/CODE&gt; (MLTK). Once the KMeans model has been built with &lt;CODE&gt;fit&lt;/CODE&gt;, you can inspect it with the &lt;CODE&gt;summary&lt;/CODE&gt; command:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| summary &amp;lt;your_model_name&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;
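
&lt;P&gt;For reference, a minimal sketch of how such a model might be built for text with MLTK. It assumes a hypothetical lookup file &lt;CODE&gt;tickets.csv&lt;/CODE&gt; with a &lt;CODE&gt;Description&lt;/CODE&gt; column; the model names and the values of &lt;CODE&gt;max_features&lt;/CODE&gt; and &lt;CODE&gt;k&lt;/CODE&gt; are placeholders, not recommendations:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup tickets.csv
| fit TFIDF Description max_features=200 stop_words=english into tfidf_model
| fit KMeans Description_tfidf* k=5 into ticket_clusters
| stats count BY cluster
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The &lt;CODE&gt;stop_words=english&lt;/CODE&gt; option drops common English stop words before the text is vectorized, and &lt;CODE&gt;k&lt;/CODE&gt; should be tuned to roughly the number of categories you expect.&lt;/P&gt;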

&lt;P&gt;When assigning new points to the appropriate cluster, you can simply apply your model like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;new_points&amp;gt; | apply &amp;lt;your_model_name&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 29 Nov 2019 19:43:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-ITSI/Text-Clustering-in-Splunk/m-p/494091#M2382</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-11-29T19:43:39Z</dc:date>
    </item>
  </channel>
</rss>

