I have some issues with the results from using | table *
I start with a simple data selection:
This gives me 108 events,
with two different sensortype values: "sens1" and "sens-B".
Of course, this would give me the same result:
sourcetype=senssordata sensortype="sens1" OR sensortype="sens-B"
And it does: it gives the same 108 events.
So far, so good.
Now the strange issue appears.
sourcetype=senssordata sensortype="sens1" OR sensortype="sens-B" | table *
or (sourcetype=senssordata sensortype="sens1" OR sensortype="sens-B" | fieldsummary)
sourcetype=senssordata sensortype="sens*" | table *
or (sourcetype=senssordata sensortype="sens*" | fieldsummary)
These two queries give different output!
The field summaries are not equal, and the table * outputs are not equal,
even though both initial data selections return the same events.
The output of the second query contains many more fields, and those fields don't seem to exist.
The first query seems to output valid data, but the second should do exactly the same.
Can this be explained or is this a bug?
Are you running
sourcetype=senssordata sensortype="sens1" OR sensortype="sens-B" | fieldsummary separately and comparing it to
sourcetype=senssordata sensortype="sens*" | fieldsummary? I only ask because you shouldn't be able to have a
| table * or (sourcetype.... without the query erroring out. I'm wondering whether part of your query is missing or I'm misunderstanding something.
The answer is that, when you search on
sensortype=sens*, the system expands all the fields from the other sensortypes before eliminating the sensortypes that don't match. This leaves a bunch of NULL fields.
table * is not best practice anyway. It is much better to use only the fields that you need for any given query, and to put them in an explicit
fields command after the first pipe, to minimize the amount of extraction done by the system.
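As a minimal sketch of that advice, the search below restricts extraction right after the first pipe. The field list (_time and sensortype) is only an assumption for illustration; substitute whichever fields you actually need.

```spl
sourcetype=senssordata sensortype="sens*"
| fields _time, sensortype
| table _time, sensortype
```

With an explicit field list, the table output should contain only the named columns, regardless of which other fields exist on events that were loaded and later dropped.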
For an understanding of why this unexpected behavior is not a bug, you have to understand how searches and bloom filters actually work under the covers.
If you look at slide 22 of this .conf2017 presentation by MVP Martin Müller (@martin_mueller) at https://conf.splunk.com/files/2017/slides/fields-indexed-tokens-and-you.pdf
...then you will see this wording...
▶ Default assumption: Field values are whole indexed tokens
[ AND java lang NullPointerException ]
▶ Actual field extractions and post-filtering happens after loading raw events
So basically, for the event selection,
sensortype=sens* initially becomes
AND sens*, so the initial part of the search is going to find all events that have
sens* somewhere in them. That is going to literally be every record with a
sensortype= in its _raw, since
sens* will pick up the token
sensortype. It will also pick up any other fields that happen to have values starting with sens.
Since you are using
| table *, the system cannot narrow the extraction to the fields you are asking for, so EVERY field has to be extracted. Once that is done, the events where
sensortype!=sens* get dropped, but the search still knows all the fields that were created/extracted for any of the events.
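One way to check this explanation against your own data (a sketch, assuming the token behavior described above) is to search for the bare term sens* and count the candidate events per sensortype:

```spl
sourcetype=senssordata sens*
| stats count by sensortype
```

If the explanation holds, this should list sensortypes other than sens1 and sens-B. Those would be the events whose fields show up as extra columns in the | table * output.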