Good day Splunkers,
We have a data flow coming from source A to a Kafka topic. The Splunk connector on Kafka uses an HEC token to forward data from the Kafka topic to a Splunk HF. The sourcetype is specified when configuring the HEC input.
The source events are high-volume and contain many key-value pairs. To manage the high ingestion volume, I need to truncate these events at the heavy forwarder layer before they reach the indexing layer.
Is it possible to choose only selected fields from these events and have them indexed?
Is it possible to apply a script to the sourcetype to format the data coming from the HEC input at the HF level?
@PickleRick Thanks for your response! I'm eager to hear about any solution that is possible using Splunk features (scripts included).
As I said - you can't spawn a script within a Splunk processing pipeline. So you're mostly limited to https://docs.splunk.com/Documentation/Splunk/9.0.4/Data/Anonymizedata
1. No. The TRUNCATE option just cuts the event at a given point and doesn't care about the logical structure of the event. And since it happens relatively early in the event processing pipeline, all the data after the truncation point is irrevocably lost.
2. No. Not while the events are being processed by Splunk's "internal" pipelines. If you want to manipulate the data prior to ingesting them you'd have to implement some form of a data-mangling proxy in front of your HEC input so that you'd first receive the event from your source, cut and splice it and then forward the resulting event to the HEC input.
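To make the proxy idea a bit more concrete, here is a minimal Python sketch of the "cut and splice" step only. It assumes the events arrive as JSON objects, which may not match your data. The field names in `KEEP_FIELDS`, the sourcetype value, and the helper names `trim_event` and `build_hec_payload` are hypothetical. The receiving side and the actual HTTP POST to the HEC endpoint are left out.

```python
import json

# Hypothetical field names; replace with the keys you actually want indexed.
KEEP_FIELDS = frozenset({"timestamp", "user", "status"})


def trim_event(event, keep=KEEP_FIELDS):
    """Return a copy of the event containing only the selected top-level keys."""
    return {key: value for key, value in event.items() if key in keep}


def build_hec_payload(event, sourcetype):
    """Wrap the trimmed event in the JSON envelope used by the HEC event endpoint."""
    payload = {"event": trim_event(event), "sourcetype": sourcetype}
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # Made-up sample event, assuming JSON coming off the Kafka topic.
    raw = {
        "timestamp": "2023-01-01T00:00:00Z",
        "user": "alice",
        "status": "ok",
        "debug_blob": "x" * 50,
    }
    # "my_sourcetype" is a placeholder, not a real sourcetype name.
    print(build_hec_payload(raw, "my_sourcetype"))
```

The proxy would then send the resulting string to your HEC input with the usual token header. Only top-level keys are filtered here, so nested structures would need extra handling.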
Another option: if you know that your data will always be in a pretty strictly defined format, you could use regexes to "extract" only some parts of the event using SEDCMD, but I suppose with this fancy stuff of yours 😉 (I've never worked with Kafka) you're getting some JSON or something like that.
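As a rough illustration of what such a regex-based "extraction" does, here is a small Python sketch that prototypes the substitution outside Splunk. It assumes a made-up key=value event layout with hypothetical `user` and `status` fields. Python's `re` module is not the regex engine Splunk uses, so any expression would need re-testing in an actual SEDCMD setting.

```python
import re

# Hypothetical pattern: keep only the user=... and status=... pairs,
# assuming they always appear in this order within the event.
KEEP_PATTERN = re.compile(r"^.*?(user=\S+).*?(status=\S+).*$")


def keep_selected(raw_event):
    """Replace the whole event with just the captured parts (sed-style s/.../\\1 \\2/)."""
    return KEEP_PATTERN.sub(r"\1 \2", raw_event)


if __name__ == "__main__":
    event = "ts=2023-01-01 user=alice action=login status=ok bytes=123"
    print(keep_selected(event))
    # An event that does not follow the expected layout passes through unchanged.
    print(keep_selected("status=ok user=alice"))
```

The second call shows why the format has to be strictly defined: as soon as the field order changes, the pattern no longer matches and the full event is kept.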
You could try to use indexed fields to extract data using index-time extractions and then truncate events "manually", but that's generally not a very good idea. And in index-time processing you only have regexes and INGEST_EVAL at your disposal, so no fancy search-time parsed fields (which means that manipulating a JSON structure is not easy, if not next to impossible).