Getting Data In

Script to filter events at index time

ktn01
Path Finder

Is it possible to use a Python script to perform transforms during event indexing?

My aim is to remove keys from JSON events to reduce volume. I'm thinking of using a Python script that decodes the JSON, modifies the resulting dict, and then encodes the result as new JSON that will be indexed.

jawahir007
Communicator

Yes, you can achieve this by using a Python script as a scripted input in Splunk. You can read the data using Python, perform the modifications as you described (decoding the JSON, updating the dictionary, and re-encoding it), and output the modified data.

Here's how it works:

  1. Create a Python Script:

    • Read the incoming data.
    • Apply the necessary transformations.
    • Print the modified JSON to standard output (stdout).
  2. Configure Scripted Input in Splunk:

    • Go to Settings > Data Inputs > Scripts.
    • Add a new scripted input and select your Python script.
    • Set a cron schedule for when the script should run.

The script will run at the configured intervals, fetch the data, apply your changes, and send the transformed data to Splunk for indexing.
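
For illustration, here is a minimal sketch of step 1, assuming the data arrives as one JSON object per line in a file; the source path and the keys to drop are hypothetical placeholders:

    #!/usr/bin/env python3
    # Minimal sketch: strip unwanted keys from JSON records so that
    # Splunk indexes a smaller event. SOURCE_FILE and KEYS_TO_DROP are
    # hypothetical placeholders; adapt them to your environment.
    import json
    import sys

    SOURCE_FILE = "/var/data/events.json"          # assumed: one JSON object per line
    KEYS_TO_DROP = {"debug_info", "raw_payload"}   # example keys to remove

    def main():
        with open(SOURCE_FILE, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    event = json.loads(line)    # decode the JSON
                except json.JSONDecodeError:
                    continue                    # skip malformed records
                for key in KEYS_TO_DROP:
                    event.pop(key, None)        # remove the unwanted keys
                # Splunk indexes whatever the script writes to stdout.
                sys.stdout.write(json.dumps(event) + "\n")

    if __name__ == "__main__":
        main()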

Important Consideration:
The main limitation is that data ingestion will depend on the cron schedule of the scripted input, so real-time or very frequent data processing might not be achievable. Adjust the schedule as needed based on your data update frequency.
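
If you prefer configuration files over the UI, the scripted input and its schedule can also be declared in inputs.conf; a minimal sketch, with hypothetical app and script names (interval accepts either a number of seconds or a cron expression):

    [script://$SPLUNK_HOME/etc/apps/my_app/bin/filter_json.py]
    interval = 300
    sourcetype = filtered_json
    index = main
    disabled = 0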

ktn01
Path Finder

Thank you for your reply.
I can't pre-process the events before ingestion into Splunk because they are sent directly by an appliance to a HEC input.
Christian

isoutamo
SplunkTrust

Is it possible to ask the sender to reduce the content of the HEC events, or is that data used somewhere else as well?

gcusello
SplunkTrust

Hi @ktn01 ,

the only solution is to apply INGEST_EVAL rules to your input instead of a Python script.
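
For illustration, a minimal sketch of such a rule, assuming Splunk 9.0 or later (where the json_delete() eval function is available); the sourcetype and key names are placeholders:

    # props.conf
    [my_hec_sourcetype]
    TRANSFORMS-strip_keys = strip_json_keys

    # transforms.conf
    [strip_json_keys]
    INGEST_EVAL = _raw=json_delete(_raw, "debug_info", "raw_payload")

Because INGEST_EVAL runs at index time on the indexer or heavy forwarder, it can reshape HEC events that cannot be modified at the source.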

Ciao.

Giuseppe

gcusello
SplunkTrust

Hi @ktn01 ,

yes, it's possible, but it isn't related to Splunk because the data is pre-processed before ingestion; I did it for a customer.

Pay attention to one issue: by changing the format of your logs, you have to completely rebuild the parsing rules for your data, because the standard parsing rules no longer apply to the new data format.
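
For example, if the reshaped JSON lands under a new sourcetype, its parsing stanza would need to be rebuilt from scratch; a hypothetical props.conf sketch (the sourcetype name, timestamp field, and time format are all assumptions):

    [filtered_json]
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)
    KV_MODE = json
    TIME_PREFIX = "timestamp":\s*"
    TIME_FORMAT = %Y-%m-%dT%H:%M:%S%z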

Ciao.

Giuseppe
