Getting Data In

File with Header not getting indexed

Parameshwara
Path Finder
[test_header]
INDEXED_EXTRACTIONS = CSV
HEADER_FIELD_LINE_NUMBER = 1
KV_MODE = none
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
pulldown_type = 1
TRANSFORMS-NoHeader = test_header

The first file is indexed as expected, with only the data captured and the header ignored, but subsequent files are not indexed at all.

0 Karma

Parameshwara
Path Finder

At the moment I'm not using the crcSalt setting; as mentioned, I don't want any possibility of logs being re-indexed.

My working configuration...

PROPS.CONF:
[host::testcsvwithheader]
CHECK_METHOD = entire_md5
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = CSV
KV_MODE = none
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
pulldown_type = 1
REPORT-AutoHeader = skipheader

INPUTS.CONF
[monitor:///...]
disabled = false
followTail = 0
host = testcsvwithheader
index = test
sourcetype = testcsvwithheader
initCrcLength = 654
0 Karma

Parameshwara
Path Finder

I'll test out the suggested configuration.

I installed a new instance of Splunk 6.0.2 on my laptop, created a test app, and, using the same configurations, tried indexing the same set of files. It WORKED! My header is 433 characters. I'm a bit stumped, but this feels like a bug.

0 Karma

Parameshwara
Path Finder

[monitor:...]
disabled = false
followTail = 0
host = testheader
index = testheader
sourcetype = testheader

Above is my inputs.conf. I'll check out the "CHECK_METHOD = entire_md5" option, and thanks for pointing out the correct stanza it works with.

marcoscala
Builder

I had a similar problem because the first 260 characters of each file were always the same, due to long headers.

I solved this in the inputs.conf like this:

[monitor:///........./appdir/SD*.ERR_*.Z]
disabled = false
followTail = 0
sourcetype = my_sourcetype
initCrcLength = 330
crcSalt = <SOURCE>

In my case, we had thousands of files being written to the same "appdir", and several times the "ERR" files were skipped because they had the same headers.

Marco

Parameshwara
Path Finder

I read about the crcSalt option and decided not to use it. Thanks.

0 Karma

miteshvohra
Contributor

Using "CHECK_METHOD" and "initCrcLength" is better than using "crcSalt". Be cautious about using crcSalt with rolling log files: it can cause a log file to be re-indexed after it has rolled over, which in turn consumes your indexing license.
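
As a rough illustration of the initCrcLength approach without crcSalt, here is a minimal inputs.conf sketch. The monitor path, index, and sourcetype are hypothetical placeholders, and the value 512 is only an assumption: pick something larger than your own header so that the checked region includes data that differs between files.

```
# inputs.conf (sketch; path, index and sourcetype are hypothetical)
[monitor:///opt/data/csv_with_header/*.csv]
disabled = false
index = test
sourcetype = my_csv_sourcetype
# Default is 256 bytes. Raise it past the length of the shared header
# so that files with identical headers no longer look the same.
initCrcLength = 512
# crcSalt is deliberately left unset to avoid re-indexing rolled files.
```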

marcoscala
Builder

Beware that this option (CHECK_METHOD) is valid only in a stanza like [source::filename].
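
For example, a props.conf stanza along these lines might be what is needed. This is only a sketch: the source path is a hypothetical placeholder, and you should confirm the behaviour of entire_md5 against the props.conf documentation for your version, since checksumming the whole file can cause a file to be re-read in full when it changes.

```
# props.conf (sketch; the source path is hypothetical)
[source::/opt/data/csv_with_header/*.csv]
# Checksum the entire file instead of only its endpoints.
CHECK_METHOD = entire_md5
```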

miteshvohra
Contributor

Add "CHECK_METHOD = entire_md5" to props.conf file and retry.

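Splunk, by default, checks the first and last 256 bytes of the file. When it finds a match, Splunk treats the file as already indexed and indexes only new data, or ignores the file if there is no new data. That is why files that share a long, identical header can be mistaken for files already seen.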

http://docs.splunk.com/Documentation/Splunk/6.0.2/admin/Propsconf

kristian_kolb
Ultra Champion

What does your inputs.conf look like?

0 Karma