Currently working on a fairly complex application, I am indexing many CSV files contained within ZIP files.
This data has the following tabular format:
And so on, up to 128 columns.
Everything was working perfectly with a configuration like:
# your settings
INDEXED_EXTRACTIONS = csv
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
# set by detected source type
KV_MODE = none
pulldown_type = true
# Time zone of HDS data is UTC/GMT
TZ = UTC
In limits.conf, I had to raise the kv limits to allow more than 50 columns to be indexed:
[kv]
# when non-zero, the point at which kv should stop creating new columns
maxcols = 512
# maximum number of keys auto kv can generate
limit = 256
# truncate _raw to this size and then do auto KV
maxchars = 10240
BUT... I lately discovered that the manufacturer's extraction tool (this is big data coming from a storage array) splits a CSV file (mostly for things like devices) into 2 parts within the same file.
At exactly line 1448 of every file concerned, a new header is written, containing the rest of the devices, from 129 to 256 (256 being the maximum technical number of devices per unit).
Splunk can't natively work with that, as mentioned in the docs:
Splunk Enterprise does not support renaming of header fields mid-file. Some software, such as Internet Information Server, supports the renaming of header fields in the middle of the file. Splunk does not recognize changes such as this. If you attempt to index a file which has header fields renamed within the file, Splunk does not index the renamed header fields.
Of course, I understand, and the message is clear enough, but I keep hoping that some advanced technique could work: redirecting some parts of the file to the null queue and not others, or some way to simulate having 2 source types for the same file.
Or perhaps some regex stuff, I don't know yet...
If anyone has an idea of how this could be managed, I'm sure it would be an interesting case for others 🙂
Thanks in advance for any help and answer!
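For what it's worth, one workaround the thread only hints at ("some regex stuff") is to preprocess the extracted CSVs before Splunk ever sees them, splitting each file at the repeated header so every part has a single header. This is only a minimal sketch, not a documented Splunk feature; the function name, the `No.` header prefix, and the sample data are all assumptions:

```python
# Hypothetical helper: split a CSV's text into parts at every repeated
# header line, so each part has exactly one header and can be indexed
# as an ordinary single-header CSV.
def split_at_headers(text, header_prefix='No.'):
    parts, current = [], []
    for line in text.splitlines(keepends=True):
        # a header line that is NOT the first line starts a new part
        if line.startswith(header_prefix) and current:
            parts.append(''.join(current))
            current = []
        current.append(line)
    if current:
        parts.append(''.join(current))
    return parts

# tiny illustrative sample (real files repeat the header at line 1448)
sample = 'No.,time,Device1\n1,00:00,5\nNo.,time,Device129\n1,00:00,7\n'
parts = split_at_headers(sample)
# parts[0] would hold devices 1..128, parts[1] devices 129..256
```

Each resulting part could then be written to its own file (or its own monitored directory) so the existing `INDEXED_EXTRACTIONS = csv` configuration keeps working unchanged.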
You can use a LINE_BREAKER to break the events, like this:
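The answer's exact configuration isn't shown here, but as an illustrative props.conf sketch (the sourcetype name and the `No.` header prefix are assumptions), a lookahead-based LINE_BREAKER that starts a new event at every line beginning with `No.` might look like:

```
[my_array_csv]
SHOULD_LINEMERGE = false
# The first capture group is consumed as the event boundary; the
# zero-width lookahead keeps the "No." header at the start of the
# next event instead of swallowing it.
LINE_BREAKER = ([\r\n]+)(?=No\.)
```

Note that with `INDEXED_EXTRACTIONS = csv`, structured-data parsing may take precedence over line-breaking settings, so this kind of approach may only behave as expected when the structured extraction is handled differently or disabled.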
Found this answer while looking for something else, and I disagree that this can't be handled by Splunk. See my answer for more details.
Just note that with large CSV files you may also have to tweak the limits.conf [kv] stanza values to get all the fields to display in search.
Just found this post:
It seems a line breaker could split my CSV file, as I have a new header like:
No. time Device1 Device2 ...
Tried adding this in data preview:
LINE_BREAKER = ([\r\n]+)"No."
No success yet...
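For later readers, one possible reason the attempt above fails: the literal double quotes around `"No."` only match if the header field is actually quoted in the raw file, and the unescaped dot matches any character. A hedged variant (assuming an unquoted header, and keeping the header with the following event via a lookahead) would be:

```
# escape the dot and drop the literal quotes; the lookahead leaves
# the header line attached to the event that follows it
LINE_BREAKER = ([\r\n]+)(?=No\.)
```

This is only a guess at the mismatch; data preview against a real sample file is the only way to confirm it.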