<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Can Splunk find similar strings in a log? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449873#M127385</link>
    <description>&lt;P&gt;@samlinsongguo&lt;/P&gt;

&lt;P&gt;Splunk can search using wildcards. For example, below are my input events:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1,This string contain mystring
2,This string contain mystrings
3,This string contain my5tring
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The search below returns all three rows:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="test" sourcetype="strings"|search *my*tring*
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This one returns only the first two rows:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="test" sourcetype="strings"|search *mystring*
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And this one returns only the first row:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="test" sourcetype="strings"|search *mystring
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Hope this clarifies.&lt;/P&gt;</description>
    <pubDate>Sat, 21 Jul 2018 15:53:15 GMT</pubDate>
    <dc:creator>renjith_nair</dc:creator>
    <dc:date>2018-07-21T15:53:15Z</dc:date>
    <item>
      <title>Can Splunk find similar strings in a log?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449871#M127383</link>
      <description>&lt;P&gt;Hi &lt;BR /&gt;
Can Splunk do a similar-string search? &lt;BR /&gt;
For example, given the string mystring, I want to return any log that contains a string that looks similar to it, such as my5tring or mystrings.&lt;BR /&gt;
Cheers&lt;/P&gt;</description>
      <pubDate>Sat, 21 Jul 2018 15:24:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449871#M127383</guid>
      <dc:creator>samlinsongguo</dc:creator>
      <dc:date>2018-07-21T15:24:15Z</dc:date>
    </item>
    <item>
      <title>Re: Can Splunk find similar strings in a log?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449872#M127384</link>
      <description>&lt;P&gt;Hi @samlinsongguo,&lt;/P&gt;

&lt;P&gt;Hope this helps: &lt;A href="https://docs.splunk.com/Documentation/Splunk/7.1.2/Search/UseCASEandTERMtomatchphrases"&gt;https://docs.splunk.com/Documentation/Splunk/7.1.2/Search/UseCASEandTERMtomatchphrases&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 21 Jul 2018 15:46:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449872#M127384</guid>
      <dc:creator>thambisetty</dc:creator>
      <dc:date>2018-07-21T15:46:43Z</dc:date>
    </item>
    <item>
      <title>Re: Can Splunk find similar strings in a log?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449873#M127385</link>
      <description>&lt;P&gt;@samlinsongguo&lt;/P&gt;

&lt;P&gt;Splunk can search using wildcards. For example, below are my input events:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1,This string contain mystring
2,This string contain mystrings
3,This string contain my5tring
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The search below returns all three rows:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="test" sourcetype="strings"|search *my*tring*
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This one returns only the first two rows:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="test" sourcetype="strings"|search *mystring*
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And this one returns only the first row:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="test" sourcetype="strings"|search *mystring
&lt;/CODE&gt;&lt;/PRE&gt;
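&lt;P&gt;The Python sketch below shows why each pattern picks up the rows it does. It runs outside Splunk. It treats each whole event as one string and uses the standard library's fnmatch, where * also means "zero or more characters". This is only an approximation. Splunk matches wildcards against the indexed terms of an event rather than the raw line, so results on real data may differ. The helper name matching_rows is made up for illustration.&lt;/P&gt;

```python
from fnmatch import fnmatchcase

# The three sample events from the post above.
events = [
    "1,This string contain mystring",
    "2,This string contain mystrings",
    "3,This string contain my5tring",
]


def matching_rows(pattern, rows):
    """Return the row numbers whose whole text matches the wildcard pattern."""
    # In fnmatchcase, "*" matches any run of characters, including none.
    return [row.split(",")[0] for row in rows if fnmatchcase(row, pattern)]


for pattern in ["*my*tring*", "*mystring*", "*mystring"]:
    print(pattern, "->", matching_rows(pattern, events))
```

&lt;P&gt;This prints rows 1, 2 and 3 for the first pattern, rows 1 and 2 for the second, and only row 1 for the third.&lt;/P&gt;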

&lt;P&gt;Hope this clarifies.&lt;/P&gt;</description>
      <pubDate>Sat, 21 Jul 2018 15:53:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449873#M127385</guid>
      <dc:creator>renjith_nair</dc:creator>
      <dc:date>2018-07-21T15:53:15Z</dc:date>
    </item>
    <item>
      <title>Re: Can Splunk find similar strings in a log?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449874#M127386</link>
      <description>&lt;P&gt;Cute joke in subject.&lt;/P&gt;</description>
      <pubDate>Sat, 21 Jul 2018 17:28:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449874#M127386</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2018-07-21T17:28:35Z</dc:date>
    </item>
    <item>
      <title>Re: Can Splunk find similar strings in a log?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449875#M127387</link>
      <description>&lt;P&gt;Hi - that depends on your criteria for similarity.  &lt;/P&gt;

&lt;P&gt;It sounds like you are looking for something that will find all terms within a certain Levenshtein distance of your string. Here's an app that will calculate that distance between two words: &lt;A href="https://splunkbase.splunk.com/app/1898/"&gt;https://splunkbase.splunk.com/app/1898/&lt;/A&gt;&lt;/P&gt;
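&lt;P&gt;For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. Below is a minimal Python sketch of the textbook dynamic-programming calculation. It illustrates the metric only and is not the code inside the linked app.&lt;/P&gt;

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]


print(levenshtein("mystring", "my5tring"))   # 1 (one substitution)
print(levenshtein("mystring", "mystrings"))  # 1 (one insertion)
print(levenshtein("mystring", "m5strng"))    # 2 (substitute y, delete i)
```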

&lt;P&gt;There is no native Splunk method for finding all such terms, and it would be a very expensive search. However, we can string that expensive search together if you want to try.&lt;/P&gt;

&lt;P&gt;In essence, to find all terms similar to &lt;CODE&gt;"mystring"&lt;/CODE&gt; you would need to search for &lt;CODE&gt;( "*ystring" OR "m*string" OR "my*tring" OR "mys*ring" OR "myst*ing" OR "mystr*ng" OR "mystri*g" OR "mystrin*" )&lt;/CODE&gt;, that is, the word with each character in turn replaced by a wildcard.&lt;/P&gt;
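&lt;P&gt;The idea is to take the word and replace each character in turn with a wildcard, so the OR clause can be generated rather than typed by hand. Below is a small Python sketch that builds it. The helper names one_wildcard_patterns and or_clause are hypothetical.&lt;/P&gt;

```python
def one_wildcard_patterns(word):
    """Return the word with each single character replaced by "*"."""
    return [word[:i] + "*" + word[i + 1:] for i in range(len(word))]


def or_clause(word):
    """Join the quoted patterns into a ( "..." OR "..." ) search clause."""
    quoted = ['"' + p + '"' for p in one_wildcard_patterns(word)]
    return "( " + " OR ".join(quoted) + " )"


print(or_clause("mystring"))
```

&lt;P&gt;Every pattern except the first and last starts with m and ends with g. That is why the shorter search that follows can collapse them into "m*g".&lt;/P&gt;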

&lt;P&gt;Efficiency-wise, you would probably do best to search for...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index=foo ("*ystring" OR "m*g" OR "mystrin*")
| fields ... list the fields you care about ... (_raw and _time will survive this command anyway)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;and then narrow the results with a more specific regular expression. In this case, we've simply translated the search above into a regex that extracts the matching words into a field called &lt;CODE&gt;myalmostmatch&lt;/CODE&gt;.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; | rex max_match=0 "\b(?&amp;lt;myalmostmatch&amp;gt;\w*ystring\w*|m\w*g|mystrin\w*)\b"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In the above expression, &lt;CODE&gt;\w*&lt;/CODE&gt; matches any number of word characters (including none). &lt;CODE&gt;\b&lt;/CODE&gt; matches a word boundary, and &lt;CODE&gt;|&lt;/CODE&gt; represents a logical &lt;CODE&gt;OR&lt;/CODE&gt; between the different alternatives. Thus, this will match any single word that looks roughly like &lt;CODE&gt;mystring&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;Note that this extraction does not specifically handle transpositions such as &lt;CODE&gt;mytsring&lt;/CODE&gt;. However, as long as the word starts with &lt;CODE&gt;m&lt;/CODE&gt; and ends with &lt;CODE&gt;g&lt;/CODE&gt;, it will still be pulled out.&lt;/P&gt;
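&lt;P&gt;The extraction can also be tried outside Splunk. The Python sketch below uses re.findall with the same alternation to show which words each sample line yields. It uses an unnamed group in place of the named myalmostmatch group. The results include the transposed mytsring and the unwanted meeting and mutating. It assumes Python's \b and \w behave like Splunk's PCRE for plain ASCII text. The helper name candidates is hypothetical.&lt;/P&gt;

```python
import re

# Same alternation as the rex above; an unnamed group replaces myalmostmatch.
CANDIDATE = re.compile(r"\b(\w*ystring\w*|m\w*g|mystrin\w*)\b")


def candidates(line):
    """Return every candidate word in line (like rex max_match=0)."""
    return CANDIDATE.findall(line)


for line in ["test one mystring",
             "test two m5strng and mystr1ng",
             "test four mytsring",
             "meeting about mutating data"]:
    print(candidates(line))
```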

&lt;P&gt;Okay, we now have the universe of results, but the middle term &lt;CODE&gt;"m*g"&lt;/CODE&gt; could include &lt;CODE&gt;"meeting"&lt;/CODE&gt; and &lt;CODE&gt;"mutating"&lt;/CODE&gt;. So we have to calculate the Levenshtein distance from &lt;CODE&gt;mystring&lt;/CODE&gt; to each of the potential terms that we extracted.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| rename COMMENT as "give each potential record a unique number so we can put them back together later"
| rename COMMENT as "then split apart the records that got multiple hits. mvexpand would also kill any records that got no hits."
| streamstats count as recno
| mvexpand myalmostmatch

| rename COMMENT as "calculate the levenshtein distance and kill all records that require more than 2 changes to match"
| levenshtein distance "mystring" myalmostmatch
| where distance &amp;lt; 3   

| rename COMMENT as "collapse the myalmostmatch string and the distance field into a single field, then delete them so that we can rejoin the record"
| rename COMMENT as "(mvcombine only merges records that are identical in every field except the one being combined.)"
| eval mymatch="match=".myalmostmatch.";levenshtein=".distance
| fields - myalmostmatch distance
| mvcombine mymatch
&lt;/CODE&gt;&lt;/PRE&gt;
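&lt;P&gt;Below is a self-contained Python sketch of the same flow. It extracts the candidates with the regex, handles one candidate at a time, and computes the Levenshtein distance to mystring. It keeps distances below 3 and then regroups the matches per record. It is an illustration only, not a replacement for the SPL, and the function name similar_words is hypothetical. The sample lines come from the run-anywhere example later in this thread. It keeps the same words and distances as the output shown there.&lt;/P&gt;

```python
import re

# Same candidate regex as the rex in the search above.
CANDIDATE = re.compile(r"\b(\w*ystring\w*|m\w*g|mystrin\w*)\b")


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similar_words(records, target="mystring", max_distance=2):
    """Map record number -> ["match=...;levenshtein=...", ...] for close words."""
    result = {}
    for recno, raw in enumerate(records, start=1):  # streamstats count as recno
        for word in CANDIDATE.findall(raw):         # rex, then mvexpand
            distance = levenshtein(target, word)
            if max_distance >= distance:            # where distance is below 3
                # mvcombine: collect the surviving matches per record
                result.setdefault(recno, []).append(
                    f"match={word};levenshtein={distance}")
    return result


records = [
    "test one mystring",
    "test two m5strng and mystr1ng",
    "test three I'm mortgaging my kid's future",
    "test 5 making my day my5tring",
    "test 6 whatever",
    "test 7 there was no matching word to mystring in test 6",
]

for recno, matches in similar_words(records).items():
    print(recno, matches)
```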

&lt;P&gt;This should give you the basis for getting more or less what you are looking for.&lt;/P&gt;</description>
      <pubDate>Sun, 22 Jul 2018 03:07:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449875#M127387</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2018-07-22T03:07:53Z</dc:date>
    </item>
    <item>
      <title>Re: Can Splunk find similar strings in a log?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449876#M127388</link>
      <description>&lt;P&gt;Here's some run-anywhere code using the jellyfisher app &lt;A href="https://splunkbase.splunk.com/app/3626/#/details"&gt;https://splunkbase.splunk.com/app/3626/#/details&lt;/A&gt; to calculate the Levenshtein distance.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval mydata="test one mystring!!!!test two m5strng and mystr1ng!!!!test three I'm mortgaging my kid's future!!!!test 5 making my day my5tring!!!!test 6 whatever!!!!test 7 there was no matching word to mystring in test 6"" | makemv delim="!!!!" mydata | mvexpand mydata | rename mydata as _raw
|streamstats count | eval _time = _time + count  | fields - count 
| rename COMMENT as "the above just generates test data"

| rex max_match=0 "\b(?&amp;lt;myalmostmatch&amp;gt;\w*ystring\w*|m\w*g|mystrin\w*)\b"
| rename COMMENT as "give each potential record a unique number so we can put them back together later"
 | rename COMMENT as "then split apart the records that got multiple hits. mvexpand would also kill any records that got no hits."
 | streamstats count as recno
 | rename _raw as Raw, _time as Time
 | mvexpand myalmostmatch
 | rename COMMENT as "calculate the levenshtein distance and kill all records that require more than 2 changes to match"
 | eval target="mystring"
 | jellyfisher levenshtein_distance(target,myalmostmatch)
 | rename levenshtein_distance as distance
 | where distance &amp;lt; 3   

 | rename COMMENT as "collapse the myalmostmatch string and the distance field into a single field, then delete them so that we can rejoin the record"
 | rename COMMENT as "(mvcombine only merges records that are identical in every field except the one being combined.)"
 | eval mymatch="match=".myalmostmatch.";levenshtein=".distance
 | fields - myalmostmatch distance
 | mvcombine mymatch
 | rename Raw as _raw, Time as _time
 | sort 0 _time 
 | table _time _raw recno mymatch
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This results in the following output:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;_time                _raw                                                     recno  mymatch
2018-07-21 22:26:36  test one mystring                                          1    match=mystring;levenshtein=0
2018-07-21 22:26:37  test two m5strng and mystr1ng                              2    match=m5strng;levenshtein=2
                                                                                     match=mystr1ng;levenshtein=1
2018-07-21 22:26:39  test 5 making my day my5tring                              4    match=my5tring;levenshtein=1
2018-07-21 22:26:41  test 7 there was no matching word to mystring in test 6    5    match=mystring;levenshtein=0
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 22 Jul 2018 03:30:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449876#M127388</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2018-07-22T03:30:07Z</dc:date>
    </item>
    <item>
      <title>Re: Can Splunk find similar strings in a log?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449877#M127389</link>
      <description>&lt;P&gt;Thank you for your suggestion but it is not exactly I am looking for. I want to search any string that similar to mystring, not just two string I given.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jul 2018 00:22:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449877#M127389</guid>
      <dc:creator>samlinsongguo</dc:creator>
      <dc:date>2018-07-23T00:22:32Z</dc:date>
    </item>
    <item>
      <title>Re: Can Splunk find similar strings in a log?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449878#M127390</link>
      <description>&lt;P&gt;thank you for your details explaination&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jul 2018 02:56:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Can-Splunk-find-similar-strings-in-a-log/m-p/449878#M127390</guid>
      <dc:creator>samlinsongguo</dc:creator>
      <dc:date>2018-07-23T02:56:42Z</dc:date>
    </item>
  </channel>
</rss>

