<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to extract Most popular words from the source data? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94280#M182419</link>
    <description>&lt;P&gt;I have a source file in which I need to find the most popular English words (&lt;STRONG&gt;excluding prepositions and pronouns&lt;/STRONG&gt;) and display them.&lt;/P&gt;

&lt;P&gt;This is a sample of the text I have: Yaaah...,Goal for Arsenal. City don't deal with the corner and Koscielny smashes home..&lt;/P&gt;

&lt;P&gt;I have more than one file like this, and I need to extract the most popular English words from the text, as in my sample.&lt;/P&gt;

&lt;P&gt;Thanks for helping&lt;/P&gt;</description>
    <pubDate>Thu, 18 Oct 2012 06:14:19 GMT</pubDate>
    <dc:creator>warhead</dc:creator>
    <dc:date>2012-10-18T06:14:19Z</dc:date>
    <item>
      <title>How to extract Most popular words from the source data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94280#M182419</link>
      <description>&lt;P&gt;I have a source file in which I need to find the most popular English words (&lt;STRONG&gt;excluding prepositions and pronouns&lt;/STRONG&gt;) and display them.&lt;/P&gt;

&lt;P&gt;This is a sample of the text I have: Yaaah...,Goal for Arsenal. City don't deal with the corner and Koscielny smashes home..&lt;/P&gt;

&lt;P&gt;I have more than one file like this, and I need to extract the most popular English words from the text, as in my sample.&lt;/P&gt;

&lt;P&gt;Thanks for helping&lt;/P&gt;</description>
      <pubDate>Thu, 18 Oct 2012 06:14:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94280#M182419</guid>
      <dc:creator>warhead</dc:creator>
      <dc:date>2012-10-18T06:14:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract Most popular words from the source data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94281#M182420</link>
      <description>&lt;P&gt;...yes? Where did you get with this so far? Is there a question you'd like to ask?&lt;/P&gt;</description>
      <pubDate>Thu, 18 Oct 2012 07:00:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94281#M182420</guid>
      <dc:creator>Ayn</dc:creator>
      <dc:date>2012-10-18T07:00:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract Most popular words from the source data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94282#M182421</link>
      <description>&lt;P&gt;Interesting... what does the file look like? &lt;BR /&gt;
Is there more than one language involved?&lt;BR /&gt;
All words on a separate line?&lt;/P&gt;

&lt;P&gt;/k&lt;/P&gt;</description>
      <pubDate>Thu, 18 Oct 2012 07:38:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94282#M182421</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2012-10-18T07:38:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract Most popular words from the source data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94283#M182422</link>
      <description>&lt;P&gt;Yes, there is more than one language. It's a collection of online Twitter data, and I need to separate out the most popular English words (excluding prepositions and pronouns).&lt;/P&gt;</description>
      <pubDate>Thu, 18 Oct 2012 07:59:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94283#M182422</guid>
      <dc:creator>warhead</dc:creator>
      <dc:date>2012-10-18T07:59:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract Most popular words from the source data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94284#M182423</link>
      <description>&lt;P&gt;Sorry, but I don't think that you'll be able to reliably filter out French/German/Spanish/etc. automatically.&lt;/P&gt;

&lt;P&gt;I believe, though, that you could break your text into separate events (one event per word) with the following in props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[my_tweets]
SHOULD_LINEMERGE = false
LINE_BREAKER=(\s+)
EXTRACT-tweetword = ^(?&amp;lt;words&amp;gt;.*)$
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And then search like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=my_tweets NOT the NOT a NOT an NOT for NOT by | top 1000 words
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;and then successively add a "NOT someword" term for each unwanted word that turns up in the results.&lt;/P&gt;
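
&lt;P&gt;The same exclude-then-count idea can be sketched outside Splunk. The following is a minimal Python illustration, not something tested against real tweet data: the STOP_WORDS set and the top_words helper are hypothetical names, and the stop list is only a starting point to be extended as unwanted words turn up.&lt;/P&gt;

```python
import re
from collections import Counter

# Hypothetical stop list; extend it as unwanted words turn up in the results.
STOP_WORDS = {"the", "a", "an", "for", "by", "and", "with"}

def top_words(text, limit=1000):
    """Return the most common lowercase words that are not in STOP_WORDS."""
    # [a-z']+ keeps contractions such as "don't" together as one word.
    matches = re.findall(r"[a-z']+", text.lower())
    # Strip stray leading/trailing apostrophes and drop empty leftovers.
    words = [m.strip("'") for m in matches]
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    return counts.most_common(limit)

# Sample text taken from the original question.
sample = ("Yaaah...,Goal for Arsenal. City don't deal with the corner "
          "and Koscielny smashes home..")
for word, count in top_words(sample, limit=5):
    print(word, count)
```

&lt;P&gt;Lowercasing before counting means "The" and "the" land in the same bucket.&lt;/P&gt;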

&lt;P&gt;Sadly, this is one of those times where there is probably a better tool than Splunk.&lt;/P&gt;

&lt;P&gt;/Kristian&lt;/P&gt;</description>
      <pubDate>Thu, 18 Oct 2012 08:40:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94284#M182423</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2012-10-18T08:40:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract Most popular words from the source data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94285#M182424</link>
      <description>&lt;P&gt;For anyone else who might be doing this, I was able to achieve the desired result by using the rex command to extract individual words from the Twitter post body and then piping them to a dynamic lookup table fed by a simple Python script.&lt;/P&gt;

&lt;P&gt;The command to extract each word was:&lt;BR /&gt;
&lt;CODE&gt;rex field=body "(?&amp;lt;WORD&amp;gt;[a-zA-Z]{2,}\s)"&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Jason&lt;/P&gt;</description>
      <pubDate>Sat, 20 Oct 2012 10:08:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94285#M182424</guid>
      <dc:creator>jcampos8782</dc:creator>
      <dc:date>2012-10-20T10:08:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract Most popular words from the source data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94286#M182425</link>
      <description>&lt;P&gt;Interesting use case.&lt;BR /&gt;
Here is a search-time method to do it (still to be tested on a large set of events).&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source=*mybook*
| sort -_time
| rex mode=sed "s/(\.|,|;|=|\"|'|\(|\)|\[|\]| -|!|^-)/ /g"
| eval word=_raw
| makemv delim=" " word
| mvexpand word
| eval word=lower(word)
| eval position=1 | streamstats sum(position) AS position
| table position word
| stats count min(position) max(position) by word
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;To describe the steps: we replace all special characters with spaces, copy the raw text into a field named word, generate a multivalue field using a space as the separator, split each value into a new event, convert it to lowercase, generate a counter for the position of each word in the text, and finally count the values, along with the first and last occurrence of each word.&lt;/P&gt;</description>
      <pubDate>Sun, 21 Oct 2012 05:35:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-extract-Most-popular-words-from-the-source-data/m-p/94286#M182425</guid>
      <dc:creator>yannK</dc:creator>
      <dc:date>2012-10-21T05:35:03Z</dc:date>
    </item>
  </channel>
</rss>

