Extracting fields from undelimited binary data?

rapple1066
Explorer

I've got data coming in that's a hex string (binary fields). They're not delimited, but they do follow a fixed format.

Offset 0, 1 byte = Index
Offset 1, 1 byte = Data Type
Offset 2, 2 bytes = Sequence Number
Offset 4, 4 bytes = Interval
Offset 8, 4 bytes = Timestamp (seconds)
Offset 12, 4 bytes = Timestamp2 (nanoseconds)
Offset 16, 4 bytes = 32-bit counter #1
Offset 20, 4 bytes = 32-bit counter #2
...followed by 30 additional 4-byte counter fields (32 counters in all, so 144 bytes per record).
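A minimal Python sketch of decoding one record with the `struct` module according to the layout above. It assumes the payload is available as a hex string and that the multi-byte fields are unsigned and big-endian (network byte order). The post doesn't state the byte order, so treat that as an assumption. `decode_record` and the dict keys are hypothetical names.

```python
import struct

# Assumed layout (offsets from the question), big-endian ">" and unsigned:
# B = Index (1 byte), B = Data Type (1 byte), H = Sequence Number (2 bytes),
# I = Interval, I = Timestamp (s), I = Timestamp2 (ns), then 32 x I counters.
RECORD_FORMAT = ">BBHIII32I"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 144 bytes, no padding with ">"

HEADER_FIELDS = ("index", "data_type", "sequence", "interval", "ts_sec", "ts_nsec")


def decode_record(hex_string: str) -> dict:
    """Decode one hex-encoded record into a dict of integer fields."""
    raw = bytes.fromhex(hex_string.strip())
    if len(raw) < RECORD_SIZE:
        raise ValueError(f"record is {len(raw)} bytes, expected {RECORD_SIZE}")
    values = struct.unpack_from(RECORD_FORMAT, raw)
    record = dict(zip(HEADER_FIELDS, values[:6]))
    record["counters"] = list(values[6:])  # the 32 counter values, in order
    return record
```

If real packets decode to implausible values, check the byte order (`>` vs `<`) and the field widths against the device's documentation first.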

From what I understand, I need to use SEDCMD to insert delimiters and then use DELIMS to extract the fields. Is that right? Any help with the syntax would be greatly appreciated, since my sed is about 20 years rusty.
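The "insert delimiters, then split" idea comes down to a fixed-width regex substitution: capture each field by its width and write the captures back with a delimiter between them. Here it is sketched with Python's `re` module rather than as Splunk configuration. It assumes the event is the hex text (two hex characters per byte) and uses a comma as the delimiter. `insert_delimiters` and its constants are hypothetical names, and this is not SEDCMD syntax itself.

```python
import re

# Width of each field in hex characters: 1 byte = 2 chars, 2 bytes = 4, 4 bytes = 8.
HEADER_WIDTHS = (2, 2, 4, 8, 8, 8)  # Index, Data Type, Seq, Interval, Timestamp, Timestamp2
COUNTER_WIDTH = 8                   # each 4-byte counter
COUNTER_COUNT = 32


def insert_delimiters(hex_record: str, delim: str = ",") -> str:
    """Insert a delimiter between fixed-width hex fields via one regex substitution."""
    header = "".join(f"(.{{{w}}})" for w in HEADER_WIDTHS)   # e.g. "(.{2})(.{2})(.{4})..."
    counters = f"(.{{{COUNTER_WIDTH}}})" * COUNTER_COUNT
    pattern = re.compile("^" + header + counters + "$")
    group_count = len(HEADER_WIDTHS) + COUNTER_COUNT
    # Replacement is "\g<1>,\g<2>,...": every capture, joined by the delimiter.
    replacement = delim.join(f"\\g<{i}>" for i in range(1, group_count + 1))
    return pattern.sub(replacement, hex_record)
```

The pattern only matches a complete 288-character record. Anything shorter passes through unchanged, which also makes truncated events easy to spot.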

cschmidt0121
Path Finder

Is that how you want the data to look in Splunk? If not, I highly recommend setting up an input like the one in the blog post Ayn suggested: parse the data with a Python script and output it with human-readable timestamps, named fields, etc. To be honest, I have no idea how that Splunk excerpt could possibly represent the raw data.
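A minimal sketch of the "human-readable output" step. It assumes the seconds field is a Unix epoch in UTC and the nanoseconds field is the sub-second part; neither is stated in the thread. `format_event` is a hypothetical name, and the key=value output shape is just one reasonable choice.

```python
from datetime import datetime, timezone


def format_event(ts_sec: int, ts_nsec: int, fields: dict) -> str:
    """Render one decoded record as a single line: ISO-8601 timestamp, then key=value pairs."""
    # datetime only holds microseconds, so the nanosecond part is appended as text.
    stamp = datetime.fromtimestamp(ts_sec, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    parts = [f"{stamp}.{ts_nsec:09d}Z"]
    parts += [f"{key}={value}" for key, value in fields.items()]
    return " ".join(parts)
```

For example, `format_event(0, 5, {"a": 1})` returns `1970-01-01T00:00:00.000000005Z a=1`.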

cschmidt0121
Path Finder

Yeah, I definitely think the least painful solution is to simplify your data before it makes its way into Splunk. It looks like Splunk is trying and failing to parse the data. For example, isn't there a huge chunk of data missing? I count 36 bytes (not counting the \x prefixes) in each event in your screenshot. There should be a LOT more (144 bytes per record, going by your layout), correct?
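Going by the layout in the question, each record has a 16-byte header plus 32 x 4 bytes of counters. That is 144 bytes in total, or 288 characters when hex-encoded. Here is a hypothetical helper that shows how far short a raw event falls:

```python
HEADER_BYTES = 1 + 1 + 2 + 4 + 4 + 4          # Index, Data Type, Seq, Interval, ts, ts2 = 16
COUNTER_BYTES = 32 * 4                        # 32 four-byte counters = 128
RECORD_BYTES = HEADER_BYTES + COUNTER_BYTES   # 144
RECORD_HEX_CHARS = 2 * RECORD_BYTES           # 288 when written as hex text


def missing_bytes(event: bytes) -> int:
    """Return how many bytes a raw binary event is short of one full record (0 if complete)."""
    return max(0, RECORD_BYTES - len(event))
```

For a 36-byte event, `missing_bytes` returns 108. In other words, three quarters of each record would be missing.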

rapple1066
Explorer

Sorry, that's how it shows up raw in Splunk when it comes in off the wire.

Maybe a better explanation of the data would help?

The data is performance data (packet counts) from a network appliance. Every millisecond, we send Splunk a UDP packet containing the number of bytes observed in that time period. The beginning of the packet holds some housekeeping info (Index, Data Type, Sequence Number), then two timestamps (seconds and nanoseconds), and then the counter data for 32 "interfaces". The goal is to be able to report on each of the counters over time.
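The goal is to report on each counter over time. One option is to emit one row per interface per record instead of one wide row, so a report split by interface works naturally. A sketch, assuming the 32 counters are already decoded into a list of ints. `per_interface_rows` and the field names `interface` and `bytes` are hypothetical.

```python
from collections.abc import Iterator


def per_interface_rows(ts_sec: int, ts_nsec: int, counters: list[int]) -> Iterator[str]:
    """Yield one key=value line per interface counter for a single record."""
    # Interfaces are numbered from 1 in the order the counters appear in the packet.
    for interface, byte_count in enumerate(counters, start=1):
        yield f"ts_sec={ts_sec} ts_nsec={ts_nsec} interface={interface} bytes={byte_count}"
```

The trade-off is volume. At one packet per millisecond, this produces about 32,000 rows per second instead of 1,000. The wide one-row-per-record form is more compact, but every counter needs its own field name.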

cschmidt0121
Path Finder

Couldn't you just do this with a rex extraction?

Something like:

rex field=_raw "(?<Index>.{2})(?<DataType>.{2})(?<Sequence>.{4})(?<Interval>.{8})(?<Timestamp>.{8})(?<Timestamp2>.{8})(?<Counter1>.{8})(?<Counter2>.{8})... etc
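For comparison, here is the same fixed-width extraction in Python's `re` module, which writes named groups as `(?P<name>...)`. It is followed by the hex-to-integer conversion that the extracted strings still need. Group names follow the field list in the question. Like the rex above, it stops after the second counter. The hex-digit character class instead of `.` and the name `extract_fields` are my own choices.

```python
import re

# Fixed-width named groups over the hex text: 2 hex characters per byte.
RECORD_RE = re.compile(
    r"(?P<Index>[0-9a-fA-F]{2})"
    r"(?P<DataType>[0-9a-fA-F]{2})"
    r"(?P<Sequence>[0-9a-fA-F]{4})"
    r"(?P<Interval>[0-9a-fA-F]{8})"
    r"(?P<Timestamp>[0-9a-fA-F]{8})"
    r"(?P<Timestamp2>[0-9a-fA-F]{8})"
    r"(?P<Counter1>[0-9a-fA-F]{8})"
    r"(?P<Counter2>[0-9a-fA-F]{8})"
)


def extract_fields(hex_record: str) -> dict:
    """Extract the named hex fields and convert each from base-16 text to an int."""
    match = RECORD_RE.match(hex_record)
    if match is None:
        return {}
    return {name: int(value, 16) for name, value in match.groupdict().items()}
```

In SPL itself, the equivalent conversion after `rex` would be an `eval` using `tonumber(<field>, 16)`.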

rapple1066
Explorer

Thanks, all.

Here's a sample record:

00010cc503e851a8c733248e0b380274d41000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

This is how it shows up in Splunk:

\xA4\xFC\xE8Q\xC34 ,
\xA4\xFD\xE8Q\xC34,
\xA4\xFE\xE8Q\xC34,
\xA4\xFF\xE8Q\xC34,
\xA5
\xA5\xE8Q\xC34,
\xA5\xE8Q\xC34,
\xA5\xE8Q\xC34,
\xA5\xE8Q\xC34,
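It helps to read those lines as bytes. Each `\xNN` appears to be one byte shown in hex, and every other character is a single printable byte (`Q` is 0x51, `4` is 0x34). Here is a small hypothetical helper that turns such a line back into bytes. It assumes the escapes appear exactly as shown above and that every literal character is a single byte.

```python
import re

# Matches either a backslash-x hex escape (group 1) or any single literal character (group 2).
TOKEN_RE = re.compile(r"\\x([0-9A-Fa-f]{2})|(.)", re.DOTALL)


def unescape_display(text: str) -> bytes:
    """Convert text containing backslash-x hex escapes back into the raw bytes it shows."""
    out = bytearray()
    for match in TOKEN_RE.finditer(text):
        escaped, literal = match.groups()
        if escaped is not None:
            out.append(int(escaped, 16))
        else:
            out.extend(literal.encode("latin-1"))  # one byte per literal character
    return bytes(out)
```

The escaped part of the first line decodes to just six bytes (A4 FC E8 51 C3 34), far short of a full record.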

Ayn
Legend

This blog post might be of interest, even though it's dealing with raw binary data and not just a hex representation of it: http://blogs.splunk.com/2011/07/19/the-naughty-bits-how-to-splunk-binary-logfiles/
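One way to act on that idea is a small script that reads the binary data itself and hands Splunk text. Here is a minimal sketch of the reading side. It assumes 144-byte records sit back to back in a binary stream with no framing between them; the thread doesn't describe any framing. For UDP, where each datagram is presumably one record, each datagram's bytes could be hex-encoded the same way. All function names are hypothetical.

```python
import io
import sys
from typing import BinaryIO, Iterator, List, Optional, TextIO

RECORD_BYTES = 144  # 16-byte header + 32 four-byte counters, per the layout in the question


def iter_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield complete fixed-size records from a binary stream; a short tail is dropped."""
    while True:
        chunk = stream.read(RECORD_BYTES)
        if len(chunk) < RECORD_BYTES:
            break  # end of data, or a truncated final record
        yield chunk


def split_records(data: bytes) -> List[bytes]:
    """Convenience wrapper for bytes already in memory."""
    return list(iter_records(io.BytesIO(data)))


def emit_hex(stream: BinaryIO, out: Optional[TextIO] = None) -> int:
    """Write one hex-encoded record per line (text Splunk can index); return the count."""
    out = out if out is not None else sys.stdout
    count = 0
    for record in iter_records(stream):
        out.write(record.hex() + "\n")
        count += 1
    return count
```

Emitting hex keeps each record intact as one printable line, so a fixed-width extraction can then run on it.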

cschmidt0121
Path Finder

If it doesn't, post an example of one of the raw events and I can try to fix my regex.

rapple1066
Explorer

I sure hope so... that looks vastly simpler than what I've been trying to do. I'll give that a shot.