Getting Data In

Can Splunk decode data at index time?

hulahoop
Splunk Employee

If I have a field value that is URL encoded then base-64 encoded, is it possible to have Splunk decode this field before indexing (maybe via a custom processor)? Has anyone done this before? Is it recommended? How difficult is it?

This could probably be done easily with a custom search script at search time, but that is a less desirable approach, since users would need an advanced understanding of Splunk to know to run their searches through the custom command.

Here is a sample event with the body field encoded:

2010-02-26 03:19:29    : LOG: M=Ce3zW5GtsGE= A=anonymous S=48976970336315650 pt=100001 body=T%3d2010-02-26%2003%3a17%3a45%20PST%26L%3di%26M%3d%5bg2mfeedback%5d%26N%3d553%26X%3d%253cG2MFeedback%253e%2520FeedbackTracker%253a%253aupdate()%2520lastUpdateTime%25201267183021171%2520curTime%25201267183051205%2520timeSinceUpdate%252030034%2520currentAttentivenessState%25201%2520_currentSatisfactionState%25202%2520-%2520Tracker%2520025A6658%252c%2520Seconds%2520in%2520great%252039818%253b%2520fair%25200%253b%2520poor%25200%253b%2520attentive%252039818%253b%2520not%25200%0d%0aT%3d
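For reference, I imagine the search-time version would look roughly like this: a custom search command using Splunk's Intersplunk helpers. This is just a sketch; the script name, field name, and decode steps are illustrative.

    #!/usr/bin/env python
    # Hypothetical custom search command (e.g. bin/decodebody.py),
    # registered via commands.conf. Sketch only: adjust the decode
    # steps to match the actual encoding (URL, base64, ...).
    import splunk.Intersplunk as si
    from urllib.parse import unquote

    results, dummyresults, settings = si.getOrganizedResults()
    for result in results:
        if result.get("body"):
            result["body_decoded"] = unquote(unquote(result["body"]))
    si.outputResults(results)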
1 Solution

jrodman
Splunk Employee

In the ooooold days (1.0-ish), Splunk imagined the processors as a customer-available API, but there were a variety of problems: the binary interfaces were too brittle, and the API challenges were not conducive to plugging in arbitrary code.

While it's technically still possible to plug in your own processor by wiring up the XML and building the code just so, it's not easy, and definitely not recommended.

The more loosely coupled approach of handling this in an input script is probably the way to go. You can be fancy and set up a scripted input, which will then be responsible for checkpointing and file handling. My preference is to just have a script that preprocesses foo.log into foo.log.processed, or similar, and have Splunk watch the processed version. It's easy to write, easy to debug, and easy to configure Splunk to use.
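For example, a rough sketch of such a preprocessing script (file and field names illustrative; the double unquote matches the double URL encoding reported later in the thread):

    #!/usr/bin/env python
    # Sketch: preprocess foo.log into foo.log.processed, decoding the
    # body= field so Splunk indexes plain text. Names are illustrative.
    import re
    from urllib.parse import unquote

    def decode(match):
        # the field turned out to be URL-encoded twice, so decode twice
        return "body=" + unquote(unquote(match.group(1)))

    with open("foo.log") as src, open("foo.log.processed", "w") as dst:
        for line in src:
            dst.write(re.sub(r"body=(\S+)", decode, line))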

jrodman
Splunk Employee

On re-read, I don't see any performance concerns. You can achieve your field filtering transparently via a scripted lookup:

http://docs.splunk.com/Documentation/Splunk/5.0/Knowledge/Addfieldsfromexternaldatasources#Set_up_a_...
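A rough sketch of what such a lookup script could look like, assuming a lookup named urldecode as in the snippets below. External lookups read CSV on stdin and write CSV on stdout; the script and field names here are illustrative.

    #!/usr/bin/env python
    # Sketch of an external lookup script (e.g. bin/urldecode.py),
    # wired up in transforms.conf along these lines:
    #   [urldecode]
    #   external_cmd = urldecode.py body_encoded body_decoded
    #   fields_list = body_encoded, body_decoded
    import csv
    import sys
    from urllib.parse import unquote

    reader = csv.DictReader(sys.stdin)
    writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row.get("body_encoded"):
            # decode twice to handle the double URL encoding
            row["body_decoded"] = unquote(unquote(row["body_encoded"]))
        writer.writerow(row)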

gkanapathy
Splunk Employee

Alright, fine, you get the idea.

gkanapathy
Splunk Employee

Hmm, I meant to format that:

 FIELDALIAS-body = body AS body_encoded
 LOOKUP-urldecode = urldecode body_encoded OUTPUT body_decoded AS body

gkanapathy
Splunk Employee

You could also do:

 FIELDALIAS-body = body AS body_encoded
 LOOKUP-urldecode = urldecode body_encoded OUTPUT body_decoded AS body

This will work, since search-time field operations apply in the order EXTRACT, FIELDALIAS, LOOKUP. You could also just change your extraction to extract body as body_encoded, but that might be a pain if you're just using KV_MODE.
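For context, those lines would go in a props.conf stanza for your sourcetype, something like the following (stanza name illustrative):

 [your_sourcetype]
 FIELDALIAS-body = body AS body_encoded
 LOOKUP-urldecode = urldecode body_encoded OUTPUT body_decoded AS body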

hulahoop
Splunk Employee

I implemented the external lookup. The encoding turns out to be double URL encoding, not URL encoding followed by base-64 encoding as originally stated. The lookup mostly works: it presents a new field 'body_decoded' with the decoded value. However, since the decoding is done at search time, searching is awkward. You have to search with 'body_decoded=coolstuff'; a plain keyword search does not work, since the value of the 'body' field was not segmented at index time. We will have to pursue the alternative: preprocessing the log file before indexing. I wish this could be done more easily in Splunk.

hulahoop
Splunk Employee

Thank you, Josh! I think this is the most promising approach. I will give it a try and post results here.
