Getting Data In

CHECK_FOR_HEADER weirdness.

Splunker
Communicator

Hi all,

Running Splunk v4.2.3 and using a Universal Forwarder (4.2.3) to "monitor" a CSV file and forward back to a main indexer for processing.

I'm trying to use the CHECK_FOR_HEADER feature in props.conf (on the indexer) for a CSV file whose header row is a comma-delimited list of column names.

The column header in the raw log is sent out at midnight every night.

I set up a simple $SPLUNK_HOME/etc/system/local/props.conf with:


[my_sourcetype]
CHECK_FOR_HEADER = True

And in the indexer's inputs.conf I explicitly set:


[monitor://C:\file.txt]
sourcetype=my_sourcetype

However, I see no $SPLUNK_HOME/etc/apps/learned/local/transforms.conf, although I do see sourcetypes such as my_sourcetype-9 and my_sourcetype-10 within Splunk, so I think it is working in some capacity.

I've tried adding priority=101 beneath the CHECK_FOR_HEADER entry in props.conf, but it made no difference.

I can define the fields manually with props.conf/transforms.conf, but there are actually a number of CSV files and doing it by hand is a heck of a lot of typing that I'd like to avoid if possible 🙂

Would appreciate any thoughts / comments.

Chris.

sideview
SplunkTrust

After a lot of long roads dealing with this issue in customer installs, I've come to the following conclusions:

1) Given CHECK_FOR_HEADER's behavior of trapping important stanzas in 'learned' and renaming sourcetypes from 'foo' to 'foo-N', CHECK_FOR_HEADER is indeed evil. However, if you have a sourcetype where the header fields are not reliably the same from system to system, it can be considered a necessary evil.

2) It gets worse before it gets better. In addition to breaking when you're using any kind of Splunk forwarder, it also breaks in distributed search. In short, the knowledge on the search head is always what is used to run the various aspects of a search, so as long as those AutoHeader rules are not on the search head, those searches won't run right.

3) Despite all this rampant evil, it's possible to get things back in working order by diligently copying the "learned" stanzas out of "/etc/apps/learned/props.conf" and "/etc/apps/learned/transforms.conf" and putting them, along with all your base CHECK_FOR_HEADER stanzas, onto all indexers and search heads. This will make everything work properly. The main drawback is that whenever your production system decides to output a new kind of header row that the Splunk system has never seen for that sourcetype, the new learned configs will again be trapped on the forwarders and you'll have to do it all again.

4) While you can also set up specific regexes, you'll hit limits there. Depending on how you construct the field extraction, you may run into a 32-character limit on field names as well as a 100-field limit. (Both of these limits are too low for this to be feasible with the data I'm talking about.)
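As a rough illustration of point 3, the copied configuration might look something like the sketch below. The stanza names (my_sourcetype-9, AutoHeader-9) and the field names are hypothetical, and the exact settings Splunk writes into the learned app may differ, so treat this as the general shape only and copy from the files Splunk actually generated.

```
# props.conf (copied to every indexer and search head)
# Base stanza that enables header detection
[my_sourcetype]
CHECK_FOR_HEADER = true

# Learned stanza: name and contents are assumed for illustration
[my_sourcetype-9]
REPORT-AutoHeader = AutoHeader-9

# transforms.conf (copied alongside it)
# Learned transform: field list is assumed for illustration
[AutoHeader-9]
DELIMS = ","
FIELDS = "timestamp", "host_name", "status"
```

Each new header variant produces another numbered pair like this, which is why the copy step has to be repeated whenever the source system changes its header row.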

Specifics:

Specifically, I think it's a necessary evil for Cisco CallManager data. From version to version Cisco changes the field list slightly. Since my app has to work with all possible versions, from old builds like 4.X up through 8.X, and since it's not feasible for me to know ahead of time all the dozens if not hundreds of different headers that have existed over the past ten years, my hand is a little forced. Also, several of the field names are longer than 32 characters and there are over a hundred of them per row.

Obviously, in the long term I am eager for Splunk to come up with a better method of indexing CSVs: one that isn't subject to the current problems but still preserves the nice parts of the current automatic behavior.

gkanapathy
Splunk Employee

The simple answer is that CHECK_FOR_HEADER is essentially useless when you're using a forwarder (regardless of whether it's a Universal, Light, or Heavy Forwarder). The simplest solution, and the only one I use, is to set CHECK_FOR_HEADER = false (to prevent the renaming of the sourcetypes), and then manually configure the appropriate FIELDS and DELIMS settings in props.conf and transforms.conf on the search head.
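A minimal sketch of that manual setup, assuming a sourcetype called my_sourcetype as in the question. The transform name my_sourcetype_csv, the REPORT class name csv_fields, and the three field names are placeholders; substitute the real column names from your header row. CHECK_FOR_HEADER = false belongs wherever the original CHECK_FOR_HEADER stanza lives (the indexer in the question), while the REPORT and the transform are search-time settings that go on the search head.

```
# props.conf
[my_sourcetype]
# Stop header detection so the sourcetype is no longer renamed to my_sourcetype-N
CHECK_FOR_HEADER = false
# Search-time extraction pointing at the transform below (hypothetical name)
REPORT-csv_fields = my_sourcetype_csv

# transforms.conf
[my_sourcetype_csv]
# Split each event on commas and assign values to fields by position
DELIMS = ","
FIELDS = "timestamp", "host_name", "status"
```

With several CSV files, each one needs its own sourcetype stanza and transform, with FIELDS listed in the same order as the columns in that file.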

ftk
Motivator

Which also explains why so many people are having problems with the IIS header extractions...

sideview
SplunkTrust

Agreed. I have run into this myself and I completely agree that the only way for CHECK_FOR_HEADER's weird system to work is when it's all happening on the same machine.
