unarchive_cmd for decoding a binary file with a Python script

phoenixdigital
Builder

Hi All,

Following this excellent blog post, I thought I had found a way to ingest a binary log file with Splunk.
https://www.splunk.com/blog/2011/07/19/the-naughty-bits-how-to-splunk-binary-logfiles.html

Unfortunately, nothing is making it into Splunk at all.

inputs.conf

[monitor://$SPLUNK_HOME/etc/apps/TA-myBinary/spool/*.evl]
disabled = 0
index = myBinary
sourcetype = myBinary:evl
followTail = 0

props.conf

[myBinary:evl]
NO_BINARY_CHECK = true
invalid_cause = archive
unarchive_cmd = /opt/splunk/etc/apps/TA-myBinary/bin/decode_evl.py
unarchive_sourcetype = myBinary:evl:unarchived
priority = 10

TIME_FORMAT = %Y-%m-%d %H:%M:%S
SHOULD_LINEMERGE = false

I added some of those extra props entries (unarchive_sourcetype, priority) while trying to resolve the issue, after searching Answers high and low.

Sadly nothing makes it into Splunk and there are no errors during processing.

The script runs fine when I invoke it manually, and it extracts the data:

cat ../spool/20170529.evl | ./decode_evl.py > ../spool/20170529.evl.out

Initially I thought it might be my script not writing to stdout so I made sure it did that.

#!/usr/bin/python

import os, sys, json, time
from datetime import datetime
from HTMLParser import HTMLParser
import logging
import pprint
import binascii

# ..... decoding components ......

while True:
    splunkEvent = readEvent()
    if len(splunkEvent['splunkMessage']) == 0:
        break

    sys.stdout.write(splunkEvent['splunkMessage'] + '\n')

# Flush out any extra data
sys.stdout.flush()

sys.exit()
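For anyone following along, here is a minimal sketch of the stdin-to-stdout shape a script like this takes when it is fed a file through a pipe, as in the manual test above. The record layout (a 4-byte big-endian length prefix followed by a UTF-8 payload) and the name decode_stream are purely hypothetical, since the real .evl decoding is elided from the script above. It is also written for Python 3, whereas the original script is Python 2.

```python
import struct
import sys


def decode_stream(inp, out):
    """Read length-prefixed records from a binary stream and write one
    text line per record. Returns the number of records written.

    ASSUMPTION: the record layout here is hypothetical and only stands in
    for whatever the real .evl format looks like.
    """
    count = 0
    while True:
        header = inp.read(4)
        if len(header) < 4:
            break  # end of input, or a truncated trailing header
        (length,) = struct.unpack(">I", header)
        payload = inp.read(length)
        if len(payload) < length:
            break  # truncated record: stop rather than emit partial data
        out.write(payload.decode("utf-8", errors="replace") + "\n")
        count += 1
    out.flush()  # make sure nothing is left sitting in the buffer
    return count


# As a filter, the script would end with a call like:
#     decode_stream(sys.stdin.buffer, sys.stdout)
# so that raw bytes arrive on stdin and decoded events leave on stdout.
```

Keeping the decoding in a function that takes the streams as arguments makes it easy to test with in-memory buffers before wiring it up to Splunk.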

splunkd.log

06-15-2017 09:06:23.165 +1000 INFO  ArchiveProcessor - Handling file=/opt/splunk/etc/apps/TA-myBinary/spool/20170529.evl
06-15-2017 09:06:23.165 +1000 INFO  ArchiveProcessor - reading path=/opt/splunk/etc/apps/TA-myBinary/spool/20170529.evl (seek=0 len=59936626)
06-15-2017 09:06:23.324 +1000 INFO  ArchiveProcessor - Finished processing file '/opt/splunk/etc/apps/TA-myBinary/spool/20170529.evl', removing from stats

Now that I look at the timings above: this particular file takes at least 1 minute to parse, yet the three log events all occur within a second of each other, which suggests the script is not being run at all.
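To put a number on that gap, here is a small sketch that parses the three splunkd.log timestamps quoted above and computes the time between the first and last. The format string is an assumption inferred from those three lines (month-day-year, milliseconds, UTC offset), and the function names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the three ArchiveProcessor lines quoted above.
STAMPS = [
    "06-15-2017 09:06:23.165 +1000",
    "06-15-2017 09:06:23.165 +1000",
    "06-15-2017 09:06:23.324 +1000",
]

# ASSUMPTION: this layout is inferred from the quoted log lines.
SPLUNKD_TIME_FORMAT = "%m-%d-%Y %H:%M:%S.%f %z"


def parse_splunkd_time(text):
    """Parse one splunkd.log timestamp into a timezone-aware datetime."""
    return datetime.strptime(text, SPLUNKD_TIME_FORMAT)


def elapsed_seconds(stamps):
    """Return the seconds between the earliest and latest timestamp."""
    times = [parse_splunkd_time(s) for s in stamps]
    return (max(times) - min(times)).total_seconds()


print(elapsed_seconds(STAMPS))
```

For these three lines the gap is well under a second, which is far too short for a decode that takes at least a minute when run by hand.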

Anyone with any thoughts on what could be wrong here?

micahkemp
Champion

According to the docs for props.conf:

unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive".
* This field is only valid on [source::<source>] stanzas.

Perhaps you need to add the unarchive portion in a stanza like:

[source::$SPLUNK_HOME/etc/apps/TA-myBinary/spool/*.evl]
unarchive_cmd = /opt/splunk/etc/apps/TA-myBinary/bin/decode_evl.py
unarchive_sourcetype = myBinary:evl:unarchived

It does look tricky, as some of the configs need to be in source stanzas, and others in sourcetype stanzas.

MuS
Legend

Hi phoenixdigital,

It looks like the unarchive_cmd option in props.conf is only valid on source stanzas and NOT on sourcetype stanzas:

unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive".
* This field is only valid on [source::<source>] stanzas.

Also this option must be applied at input time.

After reading the docs and the blog, I now wonder which one is wrong.

Hope this helps ...

cheers, MuS

phoenixdigital
Builder

Bingo! That was exactly the problem.

I was just coming back to correct my post and answer my own question but you all beat me to it.

I did read this a few days ago, and I was certain I had read somewhere that it was only valid on sourcetype stanzas and not on source stanzas.

MuS
Legend

HeHe, again failed on slow typing 🙂 well done Sir!

phoenixdigital
Builder

Thanks again, that was exactly the problem. I'm sure I read somewhere a few days ago that unarchive_cmd had to be applied to sourcetypes only, but there it is in black and white.

It is working perfectly now.

Sorry MuS, I had to unaccept your answer and give it to micahkemp, who beat you by 6 minutes.

MuS
Legend

No worries! It is only fair to accept the first correct answer 😉

phoenixdigital
Builder

One thing has puzzled me about this "cheat" for ingesting binary data:

What happens if new data is written to the end of these files?

Does it reingest it all over again when new data is appended?

Do I need to keep track of where I left off in my script or will Splunk just send me the new section of data?

I'll find out in a few hours after some more testing no doubt.

MuS
Legend

My guess would be that Splunk will keep track of the position where it last was in the file like any other input monitor. Please provide some feedback on this topic.

cheers, MuS
