I've got data coming in that's a hex string (binary fields). They're not delimited, but they do follow a fixed format.
Offset 0, 1 byte = Index
Offset 1, 1 byte = Data Type
Offset 2, 2 bytes = Sequence Number
Offset 4, 4 bytes = Interval
Offset 8, 4 bytes = Timestamp (seconds)
Offset 12, 4 bytes = Timestamp2 (nanoseconds)
Offset 16, 4 bytes = 32 bit counter #1
Offset 20, 4 bytes = 32 bit counter #2
...followed by 30 additional 4 byte counter fields.
From what I understand, I need to use SEDCMD to insert delimiters and then use DELIMS to allow the fields to be extracted? Any help with the syntax would be greatly appreciated, since my sed is about 20 years rusty.
Is that how you want the data to look in Splunk? If not, I highly recommend setting up an input like the one in the blog post Ayn suggested. Parse the data with a Python script and output it with human-readable timestamps, fields, etc. To be honest, I have no clue how that Splunk excerpt could possibly represent the raw data.
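For what it's worth, here is a minimal sketch of that kind of pre-processing script, based only on the field layout given in the question. The byte order (big-endian), the key names, and the sample values are my assumptions, not anything confirmed by the appliance, so adjust as needed.

```python
import struct
from datetime import datetime, timezone

# Layout from the question: index, data type, sequence, interval,
# seconds, nanoseconds, then 32 four-byte counters.
# ASSUMPTION: big-endian ('>'); switch to '<' if the appliance is little-endian.
RECORD = struct.Struct(">BBHIII32I")


def parse_record(hex_string):
    """Turn one hex-encoded record into a single key=value line."""
    raw = bytes.fromhex(hex_string.strip())
    if len(raw) < RECORD.size:
        raise ValueError(f"expected {RECORD.size} bytes, got {len(raw)}")
    index, dtype, seq, interval, secs, nanos, *counters = RECORD.unpack_from(raw)
    # Human-readable UTC timestamp with the nanosecond part appended.
    stamp = datetime.fromtimestamp(secs, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    fields = [
        f"{stamp}.{nanos:09d}Z",
        f"index={index}",
        f"data_type={dtype}",
        f"seq={seq}",
        f"interval={interval}",
    ]
    # Key names counter1..counter32 are hypothetical; pick whatever suits your reports.
    fields += [f"counter{n}={value}" for n, value in enumerate(counters, start=1)]
    return " ".join(fields)


# Synthetic record built for illustration, not real appliance data.
sample = RECORD.pack(0, 1, 3269, 1000, 1370015539, 613288760, 41210896, *([0] * 31)).hex()
line = parse_record(sample)
print(line)
```

Output in this shape (a leading timestamp followed by key=value pairs) is something Splunk can usually pick up without any SEDCMD work, but test it against your own sourcetype settings.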
Yeah, I definitely think the least painful solution to this is to simplify your data before it makes its way into Splunk. It looks like Splunk is trying and failing to parse the data. For example, isn't there a huge chunk of data missing? I count 36 bytes (minus all of the \x's) in each event in your screenshot. There should be a LOT more, correct?
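As a quick sanity check on how much data each event should carry, the arithmetic from the layout in the question can be sketched like this (assuming exactly 32 counters with no padding between fields):

```python
import struct

# Header fields from the question: index, data type, sequence,
# interval, seconds, nanoseconds. '>' disables alignment padding.
HEADER = struct.Struct(">BBHIII")
COUNTERS = 32  # counters #1 and #2 plus the 30 additional ones

record_bytes = HEADER.size + COUNTERS * 4
hex_chars = record_bytes * 2  # two hex digits per byte

print(record_bytes, hex_chars)  # 144 288
```

So a complete record should be 144 bytes, or 288 hex characters if it arrives as a hex string.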
Here's what I'm seeing:
http://i236.photobucket.com/albums/ff31/spongerapple/splunk.jpg
Sorry, that's how it shows up raw in Splunk when it comes in off the wire.
Maybe a better explanation of the data would help?
The data represents performance data (packet counts) from a network appliance. Every millisecond, we send a UDP packet to Splunk that has the number of bytes observed in that time period. The beginning of the packet has some housekeeping info (index, data type, sequence number), two timestamps (seconds and nanoseconds), and then the counter data from 32 "interfaces". The goal is to be able to report against each of the counters over time.
Couldn't you just do this with a rex extraction?
Something like:
rex field=_raw "(?
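The regex above is cut off in the thread, and this is not a reconstruction of it. To illustrate the general idea of fixed-width named captures on a hex string, here is a rough sketch in Python's `re` module. Note that Python writes named groups as `(?P<name>...)`, while Splunk's rex uses `(?<name>...)`; the group names and the sample record are made up for illustration.

```python
import re

# One named group per fixed-width field; widths are in hex digits (2 per byte).
FIXED = re.compile(
    r"^(?P<index>[0-9a-f]{2})"
    r"(?P<data_type>[0-9a-f]{2})"
    r"(?P<seq>[0-9a-f]{4})"
    r"(?P<interval>[0-9a-f]{8})"
    r"(?P<secs>[0-9a-f]{8})"
    r"(?P<nanos>[0-9a-f]{8})"
    r"(?P<counters>(?:[0-9a-f]{8}){32})$",
    re.IGNORECASE,
)


def extract(hex_string):
    """Return a dict of integer fields, or None if the record does not match."""
    m = FIXED.match(hex_string.strip())
    if m is None:
        return None
    fields = {k: int(v, 16) for k, v in m.groupdict().items() if k != "counters"}
    blob = m.group("counters")
    # Split the counter block into 8-digit chunks, one per counter.
    fields["counters"] = [int(blob[i:i + 8], 16) for i in range(0, len(blob), 8)]
    return fields


# Synthetic 288-character record that follows the layout in the question.
sample = "00" "01" "0cc5" "000003e8" "51a8c733" "248e0b38" + "0274d410" + "00000000" * 31
result = extract(sample)
print(result["seq"], result["interval"], result["counters"][0])
```

This only works if the event really is a hex string of the expected length when the extraction runs, which is worth confirming given how the raw events are showing up.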
Thanks all..
Here's a sample record:
00010cc503e851a8c733248e0b380274d41000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
This is how it shows up in Splunk:
\xA4\xFC\xE8Q\xC34,
\xA4\xFD\xE8Q\xC34,
\xA4\xFE\xE8Q\xC34,
\xA4\xFF\xE8Q\xC34,
\xA5
\xA5\xE8Q\xC34,
\xA5\xE8Q\xC34,
\xA5\xE8Q\xC34,
\xA5\xE8Q\xC34,
This blog post might be of interest, even though it's dealing with raw binary data and not just a hex representation of it: http://blogs.splunk.com/2011/07/19/the-naughty-bits-how-to-splunk-binary-logfiles/
If it doesn't, post an example of one of the raw events and I can try to fix my regex.
I sure hope so... that looks vastly simpler than what I've been trying to do. I'll give that a shot.