Getting Data In

Script to filter events at index time

ktn01
Path Finder

Is it possible to use a Python script to perform transforms during event indexing?

My aim is to remove keys from JSON events to reduce volume. I'm thinking of using a Python script that decodes the JSON, modifies the resulting dict, and then encodes the result as new JSON to be indexed.
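For illustration, a minimal sketch of that idea in Python (the event content and key names here are hypothetical):

    import json

    # Hypothetical example: decode a JSON event, drop unwanted keys, re-encode.
    event = json.loads('{"host": "fw01", "action": "allow", "debug_blob": "..."}')
    for key in ("debug_blob",):  # assumed names of the keys to remove
        event.pop(key, None)
    slimmed = json.dumps(event, separators=(",", ":"))  # compact encoding saves bytes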


jawahir007
Communicator

Yes, you can achieve this by using a Python script as a scripted input in Splunk. You can read the data using Python, perform the modifications as you described (decoding the JSON, updating the dictionary, and re-encoding it), and output the modified data.

Here's how it works:

  1. Create a Python Script:

    • Read the incoming data.
    • Apply the necessary transformations.
    • Print the modified JSON to standard output (stdout).
  2. Configure Scripted Input in Splunk:

    • Go to Settings > Data Inputs > Scripts.
    • Add a new scripted input and select your Python script.
    • Set a cron schedule for when the script should run.

The script will run at the configured intervals, fetch the data, apply your changes, and send the transformed data to Splunk for indexing.
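A minimal sketch of such a scripted input, assuming the raw events arrive as one JSON object per line in a local file (the path and key names are hypothetical):

    import json
    import sys

    SOURCE_FILE = "/opt/data/events.json"          # hypothetical path to the raw events
    KEYS_TO_DROP = {"debug", "internal_metadata"}  # assumed keys to strip

    def main():
        with open(SOURCE_FILE) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    event = json.loads(line)
                except ValueError:
                    continue  # skip malformed lines instead of aborting the input
                for key in KEYS_TO_DROP:
                    event.pop(key, None)
                # Splunk indexes whatever the script prints to stdout
                sys.stdout.write(json.dumps(event, separators=(",", ":")) + "\n")

    if __name__ == "__main__":
        main()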

Important Consideration:
The main limitation is that data ingestion will depend on the cron schedule of the scripted input, so real-time or very frequent data processing might not be achievable. Adjust the schedule as needed based on your data update frequency.
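For reference, a matching inputs.conf stanza might look like this (the app path, sourcetype, and schedule are placeholders; interval also accepts a plain number of seconds for simple periodic runs):

    [script://$SPLUNK_HOME/etc/apps/myapp/bin/trim_json.py]
    interval = */5 * * * *
    sourcetype = trimmed_json
    index = main
    disabled = 0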

ktn01
Path Finder

Thank you for your reply.
I can't pre-process the events before ingestion in Splunk because they are sent directly by an appliance to a HEC input.
Christian


isoutamo
SplunkTrust
Is it possible to ask the sender to reduce the content of the HEC events, or is that data also used somewhere else?

gcusello
SplunkTrust

Hi @ktn01 ,

the only solution is to apply INGEST_EVAL rules to your input instead of a Python script.
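For example, a props.conf/transforms.conf pair along these lines could drop keys at ingest time (the sourcetype and key names are placeholders, and json_delete() assumes a Splunk version with the JSON eval functions, 8.1 or later):

    # props.conf
    [my_hec_sourcetype]
    TRANSFORMS-trimjson = trim_json_keys

    # transforms.conf
    [trim_json_keys]
    INGEST_EVAL = _raw=json_delete(_raw, "debug", "internal_metadata")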

Ciao.

Giuseppe

gcusello
SplunkTrust

Hi @ktn01 ,

yes, it's possible, but it isn't something done in Splunk itself because the data is pre-processed before ingestion; I did this for a customer.

Pay attention to one issue: if you change the format of your logs, you have to completely rebuild the parsing rules for your data, because the standard parsing rules are no longer applicable to the new format.

Ciao.

Giuseppe
