Splunk Search
Highlighted

Why does Splunk include null columns in data by default?

Communicator

This is something I've always wondered, and I can't understand the reasoning behind it or how to fix it.

This is my search:

index=_internal "alert_name" source="/opt/splunk/var/log/splunk/scheduler.log" 
| table *

The search yields 144 results. Of those 144 results, there are roughly 30 fields total.

So why, when I run |table * do I get 100+ fields showing up in my table?

The only reason I can think of that makes sense is that Splunk is automatically displaying every field that exists in that source/sourcetype, regardless of if it's being used in the current dataset. If that is the case, what is the reasoning behind that? It seems terribly inefficient and impractical to load a ton of empty fields.

I know there are tricks to get around this problem, such as:

|...base search...
| streamstats count as temp_count
| stats values(*) as * by temp_count
| fields - temp_count

But it seems ridiculous that removing null columns isn't how Splunk works with fields by default.
I feel like I have a fundamental misunderstanding of this, and would appreciate any guidance on not only why it happens, but what I can do only show non-null columns in my data by default in the future.

Below is a snippet of my dataset. As you can see, over half of the fields contain no data at all.
alt text

0 Karma
Highlighted

Re: Why does Splunk include null columns in data by default?

Communicator

If you table everything using a wildcard (not a good idea) you create a column for every field that exists in every event. If a particular event doesn't have that field, you won't get a value for that field (how could you?).

This is expected behaviour. It's not "inefficient and impractical to load a ton of empty fields" because Splunk isn't actually doing that. What you're seeing is schema defined at search time based on what the indexer(s) have returned to the search head.

View solution in original post

Highlighted

Re: Why does Splunk include null columns in data by default?

Esteemed Legend

Splunk infers a schema from all results returned from the base search that is everything before the first pipe. If you later throw some events away, the inferred schema is maintained, until you do a | stats, | chart or | timechart at which point, the schema is reset. That's just the way that it is. I worked on some SPL to automagically drop the all-null fields but was unable to get anything to work.

0 Karma