Solved: Why does search not find data when using wild card in middle of search term?

bowesmana · ‎09-22-2017

I have JSON data, which is indexed and can be searched. This is an example of the data:

Product: {
    BottleSizeMls: 750mls
    BottleSizeName: Bottle
    Id: 0
    Notes: null
    Title: MOSS WOOD Ribbon Vale Merlot, Margaret River 2013
    Winery: null
}

I have four searches; the first three work and the last one does not:

Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*"
Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River 2013"
Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*" Product.Title="*2013"
Product.Title="MOSS WOOD Ribbon Vale Merlot, Margaret River*2013"

I need to use the wildcard because the third-party data is inconsistent and sometimes has extra words before the year rather than just a single space.

This ONLY happens for one or two wines; the search works in 99.9% of cases.

I have checked the original JSON and there is only a single space in the source data.

Any thoughts on how to diagnose?

jkat54 · ‎09-23-2017

It has to do with the way Splunk stores the data in segments. By default, major segmenters include spaces.

A search like field=fu*ar would match events with fubar, fuBar, fubbbbbar, fu1234bar, etc., but it does not match "fun at the bar". This is because the latter has several major segmenters in it. To match "fun at the bar" with wildcards you'd need something like this:

 field=fun* field=*at* field=*the* field=*bar

I seem to remember you can also circumvent this with CASE(), like field=CASE("fu*bar"), but then you have to keep in mind that your search becomes case sensitive.
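To make the explanation above concrete, here is a minimal Python sketch of the idea. It is a toy model, not Splunk's actual implementation: it assumes the only major segmenter is a space, that tokens are lowercased, and that a wildcard term has to match one single token. The function names are made up for illustration, and only the trailing part of the title pattern is modelled.

```python
from fnmatch import fnmatchcase


def segment(value):
    """Split a value into lowercase tokens at spaces (toy stand-in for major segmenters)."""
    return value.lower().split(" ")


def term_matches(pattern, value):
    """True if at least one single token of `value` matches the wildcard pattern."""
    pattern = pattern.lower()
    return any(fnmatchcase(token, pattern) for token in segment(value))


def all_terms_match(patterns, value):
    """True if every wildcard term matches some token, like several field=... terms ANDed together."""
    return all(term_matches(p, value) for p in patterns)


title = "MOSS WOOD Ribbon Vale Merlot, Margaret River 2013"

print(term_matches("fu*ar", "fu1234bar"))           # one token, wildcard inside it
print(term_matches("fu*ar", "fun at the bar"))      # wildcard would have to span spaces
print(all_terms_match(["fun*", "*at*", "*the*", "*bar"], "fun at the bar"))
print(term_matches("River*2013", title))            # like search 4
print(all_terms_match(["River*", "*2013"], title))  # like search 3
```

In this model a wildcard never reaches across a space, which is why the single term River*2013 finds nothing while the two separate terms River* and *2013 both find a token.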

bowesmana · ‎09-23-2017

Really interesting, that would explain it. What is really strange is that, this morning, search #4 now works. Does that mean that when the indexing first occurs and the data is in the hot bucket, it can be different from when it gets rolled to a different bucket?

I have seen this in only a very few cases and, now that I think about it, it could be that the search failed when I had JUST indexed the data. A day later, can the index/segmenters change in any way?

Believe me, I have been re-running search #4 and it works every time... Every time I use Splunk the jigsaw gets bigger and another piece needs to be fitted 🙂

Both CASE() and TERM() work, but oddly enough, TERM with variant #2 above does not find it.

jkat54 · ‎09-23-2017

Yes, absolutely. The data in a hot bucket is in a different format. When it rolls to warm, all sorts of wizardry occurs 🙂

jkat54 · ‎09-23-2017

Check this out for a more in-depth exploration of the segmenters topic:

https://conf.splunk.com/files/2016/slides/fields-indexed-tokens-and-you.pdf

cpetterborg · ‎09-22-2017

Do a stats count by Product.Title and see whether any differences show up in those cases. If they are all the same, then you have something strange. If there are differences, then you should be able to discover the cause.
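The same "count by title and look for differences" check can also be run outside Splunk against the raw JSON. The sketch below is only an illustration in Python: the sample titles are hypothetical, and the non-breaking space in the second one is an assumed example of a difference that is invisible on screen, not something found in the original data.

```python
from collections import Counter


def count_by_title(titles):
    """Count titles keyed by repr() so that hidden characters become visible."""
    return Counter(repr(t) for t in titles)


# Hypothetical sample: the second title contains a non-breaking space (U+00A0)
# where the others have an ordinary space.
titles = [
    "Margaret River 2013",
    "Margaret River\u00a02013",
    "Margaret River 2013",
]

for shown, count in count_by_title(titles).items():
    print(count, shown)
```

If every title collapses into a single row, the values really are identical and the cause lies elsewhere, for example in how the search term is matched against the index.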

bowesmana · ‎09-23-2017

I downvoted this post because I upvoted the wrong post, sorry.

jkat54 · ‎09-23-2017

You can simply click on the up arrow to un-upvote. A downvote takes away karma and is generally a bad thing around here unless doing what the author suggests would harm someone's environment.

I upvoted to even out the karma here.

bowesmana · ‎09-23-2017

Sorry guys @cpetterborg, not really up on the voting business.

cpetterborg · ‎09-24-2017

No problem. We're all just here to help out. I'm glad you got an answer to your question.

cpetterborg · ‎09-23-2017

Thanks, @jkat54!

bowesmana · ‎09-22-2017

That's the point: there are no visible differences between any of the wines. Given that search 3 works, I can't figure out why Splunk is not finding the result with search 4, as they are essentially the same.

jkat54 · ‎09-23-2017

cpetterborg was trying to explain how to "discover" the segmenters issue. Usually people find it when they do a stats count by fieldName and get a lower count than when they look at the data alone, yet they know fieldName is in 100% of the events.
