Splunk Search

Why does search not find data when using a wildcard in the middle of a search term?

bowesmana
Champion

I have JSON data that is indexed and can be searched. This is an example of the data:

Product: {
   BottleSizeMls: 750mls
   BottleSizeName: Bottle
   Id: 0
   Notes: null
   Title: MOSS WOOD Ribbon Vale Merlot, Margaret River 2013
   Winery: null
}

I have four searches; the first three work and the last one does not.

Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*"
Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River 2013"
Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*" Product.Title="*2013"
Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*2013"

I need the wildcard because the third-party data is inconsistent: sometimes it has extra words before the year rather than just the single space.

The failure ONLY happens for one or two wines; the search works in 99.9% of cases.

I have checked the original JSON and there is only a single space in the source data.

Any thoughts on how to diagnose this?

jkat54
SplunkTrust

It has to do with the way Splunk breaks the data into segments when it stores it. By default, the major segmenters include spaces.

A search like field=fu*ar would match events with fubar, fuBar, fubbbbbar, fu1234bar, etc., but does not match "fun at the bar". This is because the latter contains several major segmenters (the spaces). To match "fun at the bar" with wildcards you'd need something like this:

 field=fun* field=*at* field=*the* field=*bar
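
To make the segmentation point concrete, here is a toy Python model. It is an illustration under stated assumptions, not Splunk's implementation: MAJOR_BREAKERS is an assumed subset of the default major breakers (the real list in segmenters.conf is longer, and "*" is left out here only so wildcard patterns survive tokenizing), tokens are lowercased because ordinary term searches are case-insensitive, and a wildcard term counts as matching only if one whole token matches it. It models only the index-term lookup and ignores the later field-value comparison. The helper names tokenize, term_matches and all_terms_match are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# ASSUMPTION: a simplified subset of Splunk's default major breakers.
# "*" is deliberately excluded so that wildcard patterns stay in one piece.
MAJOR_BREAKERS = re.compile(r"[\s\[\]<>(){}|!;,'\"&?+]+")


def tokenize(text: str) -> list[str]:
    """Split text on major breakers and lowercase it (term search is case-insensitive)."""
    return [tok for tok in MAJOR_BREAKERS.split(text.lower()) if tok]


def term_matches(pattern: str, tokens: list[str]) -> bool:
    """A wildcard term matches only if ONE whole token matches it."""
    return any(fnmatchcase(tok, pattern.lower()) for tok in tokens)


def all_terms_match(patterns: list[str], tokens: list[str]) -> bool:
    """Separate terms are ANDed: each one needs its own matching token."""
    return all(term_matches(p, tokens) for p in patterns)


if __name__ == "__main__":
    title = tokenize("MOSS WOOD Ribbon Vale Merlot, Margaret River 2013")
    # Search #4: "river*2013" would need a single token that spans the space.
    print(all_terms_match(tokenize("MOSS WOOD Ribbon Vale Merlot, Margaret River*2013"), title))  # False
    # Search #3: "river*" and "*2013" each match their own token.
    print(all_terms_match(tokenize("MOSS WOOD Ribbon Vale Merlot, Margaret River*") + tokenize("*2013"), title))  # True
    # The fu*ar example from the answer.
    print(term_matches("fu*ar", tokenize("fubar")))           # True
    print(term_matches("fu*ar", tokenize("fun at the bar")))  # False
    print(all_terms_match(["fun*", "*at*", "*the*", "*bar"], tokenize("fun at the bar")))  # True
```

In this model the wine title becomes the tokens moss, wood, ribbon, vale, merlot, margaret, river and 2013, so no single token can satisfy river*2013, while river* and *2013 each find one.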

I seem to remember you can also circumvent this with CASE(), as in field=CASE("fu*bar"), but keep in mind that the search then becomes case-sensitive.

bowesmana
Champion

Really interesting; that would explain it. What is really strange is that this morning search #4 works. Does that mean the data can be segmented differently when it is first indexed into the hot bucket than after it is rolled to a different bucket?

I have seen this in a very few cases, and now that I think about it, the search may have failed only when I had JUST indexed the data. Can the index or segmentation change in any way a day later?

Believe me, I have been re-running search #4 and it works every time. Every time I use Splunk the jigsaw gets bigger and another piece needs to be fitted 🙂

Both CASE() and TERM() work, but oddly enough, TERM with variant #2 above does not find it.

jkat54
SplunkTrust

Yes, absolutely. The data in a hot bucket is in a different format. When it rolls to warm, all sorts of wizardry occurs 🙂

jkat54
SplunkTrust

Check this out for a more in depth exploration of the segmenters topic:

https://conf.splunk.com/files/2016/slides/fields-indexed-tokens-and-you.pdf

cpetterborg
SplunkTrust

Run a stats count by Product.Title and see whether there are any differences in the failing cases. If the values are all the same, then something strange is going on. If there are differences, you should be able to discover the cause.
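
The same check can be mirrored outside Splunk on exported values, which also makes characters that do not show on screen visible. This Python sketch is only an illustration: count_by and hidden_chars are hypothetical helpers, the sample titles are made up, and the non-breaking space is just one example of an invisible difference (nothing in this thread establishes that it was the cause here).

```python
import unicodedata
from collections import Counter


def count_by(values: list[str]) -> Counter:
    """Rough analogue of `stats count by <field>`: one row per exact distinct value."""
    return Counter(values)


def hidden_chars(value: str) -> list[str]:
    """Name every character outside printable ASCII, so 'invisible' differences show up."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in value
        if not (" " <= ch <= "~")
    ]


if __name__ == "__main__":
    # HYPOTHETICAL sample: two titles that look identical when rendered.
    titles = [
        "MOSS WOOD Ribbon Vale Merlot, Margaret River 2013",
        "MOSS WOOD Ribbon Vale Merlot, Margaret River\u00a02013",
    ]
    for title, n in count_by(titles).items():
        print(n, repr(title), hidden_chars(title))
```

If the two rows print separately and the second lists a character such as U+00A0 NO-BREAK SPACE, the values only look the same.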

bowesmana
Champion

I downvoted this post because I upvoted the wrong post, sorry.

jkat54
SplunkTrust

You can simply click on the up arrow to un-upvote. A downvote takes away karma and is generally a bad thing around here unless doing what the author suggests would harm someone's environment.

I upvoted to even out the karma here.

bowesmana
Champion

Sorry, guys. @cpetterborg, I'm not really up on the voting business.

cpetterborg
SplunkTrust

No problem. We're all just here to help out. I'm glad you got an answer to your question.

cpetterborg
SplunkTrust

Thanks, @jkat54!

bowesmana
Champion

That's the point: there are no visible differences between any of the wines. Given that search #3 works, I can't figure out why Splunk is not finding the result with search #4, as they are essentially the same.

jkat54
SplunkTrust

cpetterborg was trying to explain how to "discover" the segmenters issue. People usually find it when they run a stats count by fieldName and get fewer events than when they look at the data alone, even though they know fieldName is present in 100% of the events.
