Splunk Search

Why does search not find data when using a wildcard in the middle of a search term?

bowesmana
SplunkTrust

I have JSON data, which is indexed and can be searched. This is an example of the data:

Product: {
    BottleSizeMls: 750mls
    BottleSizeName: Bottle
    Id: 0
    Notes: null
    Title: MOSS WOOD Ribbon Vale Merlot, Margaret River 2013
    Winery: null
}

I have four searches; the first three work and the last one does not.

1. Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*"
2. Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River 2013"
3. Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*" Product.Title="*2013"
4. Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*2013"

I need the wildcard because the third-party data is inconsistent: it sometimes comes with extra words before the year rather than just a single space.

This ONLY happens for one or two wines; the search works in 99.9% of cases.

I have checked the original JSON and there is only a single space in the source data.

Any thoughts on how to diagnose this?

jkat54
SplunkTrust

It has to do with the way Splunk stores the data in segments. By default, the major segmenters include spaces.

A search like field=fu*ar would match events with fubar, fuBar, fubbbbbar, fu1234bar, etc., but does not match "fun at the bar". This is because the latter has several major segmenters in it. To match "fun at the bar" with wildcards, you'd need something like this:

 field=fun* field=*at* field=*the* field=*bar
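For the wine title in the question, a sketch of the same idea, assuming an index named `your_index` (a hypothetical placeholder). The first line is search #3: its two wildcard terms have no wildcard in the middle, so they only narrow down the candidate events. The `where like()` line then applies the full pattern to the extracted field value at search time (`%` is the multi-character wildcard for `like`), which also covers the case where extra words appear before the year.

```spl
index=your_index Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*" Product.Title="*2013"
| where like('Product.Title', "MOSS WOOD Ribbon Vale Merlot, Margaret River%2013")
```

The `where` step can only remove events from what search #3 already returns, so it is worth adding only when the stricter "River, then anything, then 2013" ordering matters.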

I seem to remember you can also circumvent this with the CASE directive, like this: field=CASE("fu*bar"). Keep in mind, though, that your search then becomes case-sensitive.
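Written out for search #4 from the question, the CASE() form would look something like the line below. This is a sketch only: `your_index` is a hypothetical placeholder, and because the match is case-sensitive, the capitalisation has to match the data exactly.

```spl
index=your_index Product.Title=CASE("MOSS WOOD Ribbon Vale Merlot, Margaret River*2013")
```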

bowesmana
SplunkTrust

Really interesting; that would explain it. What is really strange is that this morning, search #4 now works. Does that mean that when the data is first indexed and sits in the hot bucket, it can behave differently from when it is rolled to a different bucket?

I have seen this in a very few cases, and now that I think about it, it could be that the search failed when I had JUST indexed the data. A day later, can the index or segmentation change in any way?

Believe me, I have been re-running search #4 and it works every time. Every time I use Splunk the jigsaw gets bigger and another piece of the jigsaw needs to be fitted 🙂

Both CASE() and TERM() work, but oddly enough, TERM with variant #2 above does not find it.

jkat54
SplunkTrust

Yes, absolutely. The data in a hot bucket is in a different format; when it rolls to warm, all sorts of wizardry occurs 🙂
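To check whether a failing search lines up with data that is still in a hot bucket, one option is to list the index's buckets and their state with the `dbinspect` command. A sketch, assuming an index named `your_index` (hypothetical):

```spl
| dbinspect index=your_index
| table bucketId, state, startEpoch, endEpoch, eventCount
| sort - endEpoch
```

Comparing the event's timestamp with the `startEpoch`/`endEpoch` range of each bucket shows whether it currently sits in a `hot` bucket; running the same check again after the bucket has rolled to warm gives a before/after comparison for search #4.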

jkat54
SplunkTrust

Check this out for a more in-depth exploration of the segmenters topic:

https://conf.splunk.com/files/2016/slides/fields-indexed-tokens-and-you.pdf

cpetterborg
SplunkTrust

Do a stats count by Product.Title and see whether there are any visible differences in those cases. If the values are all the same, then you have something strange. If there are differences, then you should be able to discover the cause.
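A sketch of that check, assuming an index named `your_index` (hypothetical). The two extra fields are illustrative additions, not part of the suggestion itself: `title_len` exposes differences such as doubled spaces, and `has_unusual_chars` flags any character outside the printable ASCII range (for example a tab or a non-breaking space), which would look identical on screen. Legitimate accented letters would be flagged too, so a "yes" is a hint to look closer, not proof of a problem.

```spl
index=your_index Product.Title="MOSS WOOD*"
| eval title_len=len('Product.Title'), has_unusual_chars=if(match('Product.Title', "[^ -~]"), "yes", "no")
| stats count by Product.Title, title_len, has_unusual_chars
```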

bowesmana
SplunkTrust

I downvoted this post because I upvoted the wrong post, sorry.

jkat54
SplunkTrust

You can simply click on the up arrow to un-upvote. A downvote takes away karma and is generally a bad thing around here unless doing what the author suggests would harm someone's environment.

I upvoted to even out the karma here.

bowesmana
SplunkTrust

Sorry, guys (@cpetterborg); I'm not really up on the voting business.

cpetterborg
SplunkTrust

No problem. We're all just here to help out. I'm glad you got an answer to your question.

cpetterborg
SplunkTrust

Thanks, @jkat54!

bowesmana
SplunkTrust

That's the point: there are no visible differences between any of the wines. Given that search #3 works, I can't figure out why Splunk does not find the result with search #4, as they are essentially the same.

jkat54
SplunkTrust

cpetterborg was trying to explain how to "discover" the segmenters issue. Usually people find it when a stats count by fieldName returns fewer events than they see when looking at the data directly, even though they know fieldName is present in 100% of the events.
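One way to run that comparison for search #4, as a sketch (again with the hypothetical `your_index`): fetch the candidates with the prefix term only, test the full pattern at search time with `like()`, and count the outcome.

```spl
index=your_index Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*"
| eval matches_search_4=if(like('Product.Title', "MOSS WOOD Ribbon Vale Merlot, Margaret River%2013"), "yes", "no")
| stats count by matches_search_4
```

If the `yes` count is higher than the number of events search #4 returns on its own, the data itself matches and the missing events most likely come from the indexed-term lookup rather than from the field values.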
