I ran into some odd behavior in Splunk today.
When I ran the following search, it ran for 30 minutes (there are lots of events involved):
index=A sourcetype=A m=4 OR m=404 OR m=1233
But if I omit "m=4", the search takes only 2 minutes.
I do not understand why this is happening. m is a numeric field, and I was not expecting any difference between the two searches (with m=4 and without m=4).
How do you explain this?
Heh. You have discovered the wonder of "bloom filters". What Splunk is doing under the covers (and I am being REALLY loose in the way I describe it) is checking every page of data that has any of the values that you've listed.
Splunk speeds up its searches by skipping the pages/events that definitely do not contain the keywords you are looking for, and checking only the ones that might. That is really efficient when the values you are looking for are rare. However, when you add m=4, the search engine has to laboriously check every event that has the value 4 in any field, because at that stage it is matching the bare term 4, not "m=4". Those events will be pretty common.
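To make the point concrete, here is a toy Python model of the two-phase behaviour: first a keyword lookup for each value as a bare term, then search-time extraction of m on every candidate. It is a simplification under stated assumptions: a plain inverted index (term to event ids) stands in for Splunk's real bloom filters and tsidx files, the tokenizer that splits on any non-alphanumeric character is assumed, and the sample events are made up.

```python
import re
from collections import defaultdict

# Made-up sample events; real raw data and Splunk's segmentation rules will differ.
EVENTS = [
    "ip=10.1.4.22 m=200 bytes=512",
    "ip=10.1.2.4 m=404 bytes=0",
    "ip=10.4.0.9 m=4 bytes=64",
    "ip=10.9.9.9 m=1233 bytes=4",
    "ip=10.3.3.4 m=7 bytes=99",
    "ip=10.5.5.5 m=200 bytes=100",
]


def tokenize(raw):
    # Assumed tokenizer: split on anything that is not a letter or digit.
    return {t for t in re.split(r"[^A-Za-z0-9]+", raw) if t}


def build_index(events):
    # Inverted index: term -> set of event ids that contain that term.
    index = defaultdict(set)
    for i, raw in enumerate(events):
        for term in tokenize(raw):
            index[term].add(i)
    return index


def extract_m(raw):
    # Search-time field extraction of m (only runs on candidate events).
    match = re.search(r"\bm=(\d+)", raw)
    return match.group(1) if match else None


def search(events, index, values):
    # Phase 1: keyword lookup, so "4" matches a 4 in the IP or byte count too.
    candidates = set()
    for v in values:
        candidates |= index.get(v, set())
    # Phase 2: extract m from every candidate and keep only real matches.
    hits = [i for i in sorted(candidates) if extract_m(events[i]) in values]
    return candidates, hits


if __name__ == "__main__":
    idx = build_index(EVENTS)
    for values in ({"4", "404", "1233"}, {"404", "1233"}):
        candidates, hits = search(EVENTS, idx, values)
        print(sorted(values), "candidates:", len(candidates), "hits:", len(hits))
```

With "4" in the list, five of the six events become candidates even though only three really have m in the wanted set; without it, only the two true matches are candidates. On millions of events that extra phase-2 work is where the time goes.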
Performance is going to be highly data-dependent. When actual performance fails to match theory, believe the actual performance. In this case, if you can't make m an index-time field, you are going to have to experiment with different ways of getting at it at search time.
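Why index-time helps, in the same toy model: an index-time field is stored, roughly, as a single `field::value` term, so a lookup for `m::4` no longer collides with every other 4 in the event. The storage format here is a simplified assumption, the helper names are hypothetical, and the events are made up.

```python
import re
from collections import defaultdict

# Made-up events: the digit 4 appears in IPs and byte counts as well as in m.
EVENTS = [
    "ip=10.1.4.22 m=200 bytes=512",
    "ip=10.1.2.4 m=404 bytes=0",
    "ip=10.4.0.9 m=4 bytes=64",
    "ip=10.9.9.9 m=1233 bytes=4",
]


def build_field_terms(events, field):
    # Hypothetical model of an index-time field: one "field::value" term per event.
    pattern = re.compile(r"\b" + re.escape(field) + r"=(\S+)")
    terms = defaultdict(set)
    for i, raw in enumerate(events):
        match = pattern.search(raw)
        if match:
            terms[f"{field}::{match.group(1)}"].add(i)
    return terms


def lookup(terms, field, values):
    # Union of the posting lists; every candidate is already a true match.
    found = set()
    for v in values:
        found |= terms.get(f"{field}::{v}", set())
    return found


if __name__ == "__main__":
    terms = build_field_terms(EVENTS, "m")
    print(sorted(lookup(terms, "m", ["4", "404", "1233"])))
```

No search-time extraction pass is needed, which is why indexing m (if you are able to) removes the problem rather than working around it.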
Here are the first couple of things I'd try.
If the field m is not present on all the records, then this could help...
index=A sourcetype=A m=* | fields ... list only the fields you want ... | search m=4 OR m=404 OR m=1233
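The order of operations in that search can be sketched in Python as three filtering stages over already-extracted events (dicts of field name to value). This is only an illustration of the logic, with hypothetical names; it says nothing about how Splunk actually executes the pipeline, so only timing it on your data will tell whether it is faster.

```python
from typing import Dict, Iterable, List, Set

Event = Dict[str, str]


def run_pipeline(events: Iterable[Event], keep: Set[str], wanted: Set[str]) -> List[Event]:
    # Stage 1, like "m=*": drop events that have no m field at all.
    with_m = (e for e in events if "m" in e)
    # Stage 2, like "| fields ...": keep only the listed fields (m must be among them).
    trimmed = ({k: v for k, v in e.items() if k in keep} for e in with_m)
    # Stage 3, like "| search m=4 OR m=404 OR m=1233": exact match on the value of m.
    return [e for e in trimmed if e.get("m") in wanted]


if __name__ == "__main__":
    sample = [
        {"m": "4", "host": "a", "raw": "..."},
        {"host": "b"},
        {"m": "7", "host": "c"},
        {"m": "1233", "host": "d", "raw": "..."},
    ]
    print(run_pipeline(sample, {"m", "host"}, {"4", "404", "1233"}))
```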
If there is a unique key field (mykey) for the records you want, this might help...
index=A sourcetype=A [ search index=A sourcetype=A m=* | fields m mykey | search m=4 OR m=404 OR m=1233 | table mykey]
In the subsearch, the code above searches through the index and sourcetype, keeping only the key and m. It then checks that m is one of the values you want and returns ONLY the key field. When it hits the closing bracket ] of the subsearch, the selected results are implicitly returned as if the format command had been used. That produces a string that looks like this:
( ( mykey="firstvalue" ) OR ( mykey="second value" ) OR .... OR ( mykey="last value" ) )
Assuming the keys are unique, that might cut a bunch of time off.
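For illustration, here is a small Python function that builds a string of the same shape as the example above from a list of key values. The function name is hypothetical and this is not Splunk's `format` implementation; in particular, escaping backslashes and double quotes inside values is an assumption made so the output stays a valid quoted string.

```python
from typing import Iterable


def format_or_clause(field: str, values: Iterable[str]) -> str:
    """Build '( ( field="v1" ) OR ( field="v2" ) )' from the subsearch's key values."""
    clauses = []
    for v in values:
        # Assumed escaping: backslash first, then double quotes.
        escaped = v.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'( {field}="{escaped}" )')
    if not clauses:
        # Behaviour for an empty subsearch is not modelled here.
        raise ValueError("need at least one value")
    return "( " + " OR ".join(clauses) + " )"


if __name__ == "__main__":
    print(format_or_clause("mykey", ["firstvalue", "second value", "last value"]))
```

The outer search then only has to look up those specific key values, which is why unique, fairly rare keys are what make this approach pay off.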
There may be some other ways, but those two are the first shots that I'd take.
Thanks for your answer, @DalJeanis. I am still trying to understand your explanation.
You say "when you add m=4, the search engine is going to have to laboriously check any event that has the value 4 in any field", but I am telling Splunk to use m as the field, right?