Hi All,
So, following this excellent blog post, I thought I had found a solution for ingesting a binary logfile with Splunk:
https://www.splunk.com/blog/2011/07/19/the-naughty-bits-how-to-splunk-binary-logfiles.html
Unfortunately nothing is making it into Splunk at all.
inputs.conf
[monitor://$SPLUNK_HOME/etc/apps/TA-myBinary/spool/*.evl]
disabled = 0
index = myBinary
sourcetype = myBinary:evl
followTail = 0
props.conf
[myBinary:evl]
NO_BINARY_CHECK = true
invalid_cause = archive
unarchive_cmd = /opt/splunk/etc/apps/TA-myBinary/bin/decode_evl.py
unarchive_sourcetype = myBinary:evl:unarchived
priority = 10
TIME_FORMAT = %Y-%m-%d %H:%M:%S
SHOULD_LINEMERGE = false
Some of those additional props entries (unarchive_sourcetype, priority) came from searching Answers high and low in attempts to resolve the issue.
Sadly nothing makes it into Splunk and there are no errors during processing.
The script runs just fine and will extract the data manually.
cat ../spool/20170529.evl | ./decode_evl.py > ../spool/20170529.evl.out
Initially I thought it might be my script not writing to stdout, so I made sure it did.
#!/usr/bin/python
import os, sys, json, time
from datetime import datetime
from HTMLParser import HTMLParser
import logging
import pprint
import binascii
..... decoding components......
while 1:
    splunkEvent = readEvent()
    if len(splunkEvent['splunkMessage']) == 0:
        break
    sys.stdout.write(splunkEvent['splunkMessage'] + '\n')

# Flush out any extra data
sys.stdout.flush()
sys.exit()
splunkd.log
06-15-2017 09:06:23.165 +1000 INFO ArchiveProcessor - Handling file=/opt/splunk/etc/apps/TA-myBinary/spool/20170529.evl
06-15-2017 09:06:23.165 +1000 INFO ArchiveProcessor - reading path=/opt/splunk/etc/apps/TA-myBinary/spool/20170529.evl (seek=0 len=59936626)
06-15-2017 09:06:23.324 +1000 INFO ArchiveProcessor - Finished processing file '/opt/splunk/etc/apps/TA-myBinary/spool/20170529.evl', removing from stats
Now that I look at the above timings, something seems off: this particular file takes at least a minute to parse, yet the three log events above all occur within a fraction of a second of each other.
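For what it's worth, one way to rule out the decoder never being invoked at all would be to have it append a line to a side log on every run. This is a hypothetical debugging addition (the log path is made up), not part of the script above:
#!/usr/bin/python
# Hypothetical debug stub: record every invocation so you can tell whether
# splunkd ever actually runs the decoder.
import time

with open('/tmp/decode_evl_invocations.log', 'a') as f:
    f.write('invoked at %s\n' % time.ctime())

# ... the normal decoding would follow here, reading the archive from stdin
# and writing decoded events to stdout ...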
Anyone with any thoughts on what could be wrong here?
According to the docs for props.conf:
unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive".
* This field is only valid on [source::<source>] stanzas.
Perhaps you need to add the unarchive portion in a stanza like:
[source::$SPLUNK_HOME/etc/apps/TA-myBinary/spool/*.evl]
unarchive_cmd = /opt/splunk/etc/apps/TA-myBinary/bin/decode_evl.py
unarchive_sourcetype = myBinary:evl:unarchived
It does look tricky, as some of the configs need to be in source stanzas, and others in sourcetype stanzas.
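For example, the split might look something like this (an untested sketch built from the settings in the original post; the literal path mirrors the one shown in splunkd.log, and placing TIME_FORMAT/SHOULD_LINEMERGE on the unarchived sourcetype is an assumption):
[myBinary:evl]
NO_BINARY_CHECK = true
invalid_cause = archive

[source::/opt/splunk/etc/apps/TA-myBinary/spool/*.evl]
unarchive_cmd = /opt/splunk/etc/apps/TA-myBinary/bin/decode_evl.py
unarchive_sourcetype = myBinary:evl:unarchived

[myBinary:evl:unarchived]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
SHOULD_LINEMERGE = false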
Hi phoenixdigital,
it looks like the option unarchive_cmd in props.conf is only valid on source stanzas and NOT on sourcetype stanzas:
unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive".
* This field is only valid on [source::<source>] stanzas.
Also, this option must be applied at input time.
After reading the docs and the blog, I now wonder which one is wrong ???
Hope this helps ...
cheers, MuS
Bingo! That was exactly the problem.
I was just coming back to correct my post and answer my own question but you all beat me to it.
I did read this a few days ago and was certain I read somewhere it was only valid on sourcetype and not source.
HeHe, again failed on slow typing 🙂 well done Sir!
Thanks again, that was exactly the problem. I'm sure I read somewhere a few days ago that unarchive_cmd had to be applied to sourcetypes only, but there it is in black and white.
It is working perfectly now.
Sorry MuS, I had to unaccept your answer and give it to micahkemp, who beat you by 6 minutes.
No worries! It is only fair to accept the first correct answer 😉
One thing still puzzles me about this "cheat" for ingesting binary data.
What happens if new data is written to the end of these files?
Does it reingest it all over again when new data is appended?
Do I need to keep track of where I left off in my script or will Splunk just send me the new section of data?
I'll find out in a few hours after some more testing no doubt.
My guess would be that Splunk will keep track of the position it last read in the file, just like any other monitored input. Please provide some feedback on this topic.
cheers, MuS