Extracting fields from undelimited binary data?

rapple1066
Explorer

I've got data coming in that's a hex string (binary fields). They're not delimited, but they do follow a fixed format.

Offset 0 , 1 byte = Index

Offset 1, 1 byte = Data Type

Offset 2, 2 bytes = Sequence Number

Offset 4, 4 bytes = Interval

Offset 8, 4 bytes = Timestamp (seconds)

Offset 12, 4 bytes = Timestamp2 (nanoseconds)

Offset 16, 4 bytes = 32 bit counter #1

Offset 20, 4 bytes = 32 bit counter #2

...followed by 30 additional 4 byte counter fields.

From what I understand, I need to use SEDCMD to insert delimiters and then use DELIM to allow the fields to be extracted? Any help on the syntax would be greatly appreciated since my SED is about 20 years rusty.
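Before committing anything to props.conf, the delimiter-insertion idea can be prototyped in plain Python. This is only a sketch based on the field layout above: the function name and the comma delimiter are my choices, and byte widths are doubled because each byte is two hex characters.

```python
import re

# Hex-string layout (2 hex chars per byte): index(1B) type(1B) seq(2B)
# interval(4B) ts_sec(4B) ts_nsec(4B), then 32 x 4B counters.
# Insert commas after each header field and between counters,
# mimicking what an index-time SEDCMD would do.
HEADER = re.compile(r"^(.{2})(.{2})(.{4})(.{8})(.{8})(.{8})")

def delimit(record: str) -> str:
    m = HEADER.match(record)
    if not m:
        raise ValueError("record too short for fixed header")
    head = ",".join(m.groups())
    rest = record[m.end():]
    # split the remainder into 8-hex-char (4-byte) counter fields
    counters = [rest[i:i + 8] for i in range(0, len(rest), 8)]
    return ",".join([head] + counters)
```

The equivalent props.conf rule would be a `SEDCMD-<class> = s/regex/replacement/` line using the same capture groups; I haven't tested the exact sed incantation, so verify it against the Splunk docs before deploying.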

cschmidt0121
Path Finder

Is that how you want the data to look in Splunk? If not, I highly recommend setting up an input like the blog post Ayn suggested. Parse the data with a Python script and output it with human-readable timestamps, fields, etc. To be honest, I have no clue how that Splunk excerpt could possibly represent the raw data.

0 Karma

cschmidt0121
Path Finder

Yeah, I definitely think the least painful solution to this is to simplify your data before it makes its way into Splunk. It looks like Splunk is trying and failing to parse the data - for example, isn't there a huge chunk of data missing? I count 36 bytes (minus all of the \x's) in each event in your screenshot. There should be a LOT more, correct?

0 Karma


rapple1066
Explorer

Sorry... that's how it shows up raw in Splunk when it comes in off the wire.

Maybe a better explanation of the data would help?

The data represents performance data (packet counts) from a network appliance. Every millisecond, we send a UDP packet to Splunk that has the number of bytes observed in that time period. The beginning of the packet has some housekeeping info (Index, data type, sequence #), 2 timestamps (seconds and nanoseconds), and then the counter data from 32 "interfaces". The goal is to be able to report against each of the counters over time.
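If this were fed through a scripted input, the fixed layout described above maps cleanly onto Python's `struct` module. This is a sketch under assumptions: big-endian byte order and the output field names are my guesses, not anything the appliance documents.

```python
import struct

# Layout from the post: index(1B) type(1B) seq(2B) interval(4B)
# ts_sec(4B) ts_nsec(4B) + 32 x 4B counters = 144 bytes per record.
# Big-endian (">") is an assumption; flip to "<" if values look wrong.
RECORD_FMT = ">BBHIII32I"
RECORD_LEN = struct.calcsize(RECORD_FMT)  # 144 bytes = 288 hex chars

def parse_record(hex_record: str) -> dict:
    fields = struct.unpack(RECORD_FMT, bytes.fromhex(hex_record)[:RECORD_LEN])
    index, data_type, seq, interval, ts_sec, ts_nsec = fields[:6]
    return {
        "index": index,
        "data_type": data_type,
        "seq": seq,
        "interval": interval,
        "timestamp": ts_sec + ts_nsec / 1e9,  # epoch seconds, fractional
        "counters": list(fields[6:]),
    }
```

A scripted input would call `parse_record` per UDP payload and emit a key=value line per event, which Splunk extracts without any SEDCMD or rex gymnastics.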

0 Karma

cschmidt0121
Path Finder

Couldn't you just do this with a rex extraction?

Something like:

rex field=_raw "(?&lt;index&gt;.{2})(?&lt;data_type&gt;.{2})(?&lt;seq_num&gt;.{4})(?&lt;interval&gt;.{8})(?&lt;timestamp&gt;.{8})(?&lt;timestamp2&gt;.{8})(?&lt;counter1&gt;.{8})(?&lt;counter2&gt;.{8})"... etc
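The same pattern can be sanity-checked outside Splunk with Python's `re` module. The group names here are my own choice (matched to the field list earlier in the thread), and the widths are in hex characters, two per byte.

```python
import re

# Header portion of the rex above: index(1B) type(1B) seq(2B)
# interval(4B) timestamp(4B) timestamp2(4B), as hex characters.
PAT = re.compile(
    r"(?P<index>.{2})(?P<data_type>.{2})(?P<seq_num>.{4})"
    r"(?P<interval>.{8})(?P<timestamp>.{8})(?P<timestamp2>.{8})"
)

m = PAT.match("00010cc5000003e851a8c733248e0b38")
print(m.groupdict())
```

Note that Splunk's rex uses the `(?<name>...)` spelling for named groups while Python's `re` wants `(?P<name>...)`; the matching behavior is otherwise the same here.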

rapple1066
Explorer

Thanks all..

Here's a sample record:

00010cc503e851a8c733248e0b380274d41000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

This is how it shows up in Splunk:

\xA4\xFC\xE8Q\xC34 ,
\xA4\xFD\xE8Q\xC34,
\xA4\xFE\xE8Q\xC34,
\xA4\xFF\xE8Q\xC34,
\xA5
\xA5\xE8Q\xC34,
\xA5\xE8Q\xC34,
\xA5\xE8Q\xC34,
\xA5\xE8Q\xC34,

0 Karma

Ayn
Legend

This blog post might be of interest, even though it's dealing with raw binary data and not just a hex representation of it: http://blogs.splunk.com/2011/07/19/the-naughty-bits-how-to-splunk-binary-logfiles/

cschmidt0121
Path Finder

If it doesn't, post an example of one of the raw events and I can try to fix my regex.

0 Karma

rapple1066
Explorer

I sure hope so... that looks vastly simpler than what I've been trying to do. I'll give that a shot.

0 Karma