Getting Data In

How to keep particular parts of a raw event using regex and drop the rest before indexing?

DanAlexander
Communicator

Hello network,

I hope this message finds you all well.

I have a challenge I would like to solve, and I am sure it can be done with your help.

Problem: We want to index only the highlighted data/KV pairs and drop everything else from the event sample shown below (please let us know if you need the original regex and the obfuscated event sample).

[Screenshot: DanAlexander_1-1684760199142.png, event sample with the key/value pairs to keep highlighted]

The regex properly selects what I would like to keep before the data gets indexed on the indexers (I learned this can be done on the indexers, not just on heavy forwarders, which we would like to avoid in our forwarding topology).

Unfortunately, we cannot do this via the Ingest Actions Ruleset UI on the Cluster Manager (it is designed to drop the entire event matching a regex, not particular parts of that event).

Question: Is there a way we can use props and transforms or any other mechanism to govern the dropping of the unwanted parts from the event above, during the parsing phase and before the data gets indexed?

Thank you!

 


richgalloway
SplunkTrust

You may be able to use SEDCMD settings in props.conf to remove selected text from events before indexing.  This must be done on the heavy forwarders, if the data passes through them.

The same thing may be possible using transforms, which also must be deployed on the HFs if the data passes through them.

If we had actual text (instead of a screenshot), we might be able to test some possible solutions for you.

---
If this reply helps you, Karma would be appreciated.

DanAlexander
Communicator

Hi @richgalloway 

Thanks for the reply. 

Here is the working regex and log sample pair (it works according to regex101):

\[action:"(?<Action>\w+)"|origin:"(?<Origin>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"|layer_name:"(?<Text>\w+)"|dst:"(?<dest>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"|src:"(?<Source>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"

xxxxxxxx - [action:"Accept"; xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx origin:"10.181.11.111"; xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx dst:"192.168.22.9"; xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx src:"10.181.111.111"]

I wanted to learn how to set up the configuration properly first, and then explore where exactly it can be deployed, either on an HF or on the indexers. I heard that the indexers can process the data before it gets indexed (I might be wrong; I just want to avoid an intermediate layer whenever possible), but first things first.

Thank you for your feedback so far. Looking forward to receiving further help.

Regards,

Dan 
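A quick offline check of the regex and sample from the post above, as a minimal Python sketch. Two assumptions: the named groups are rewritten from PCRE's `(?<name>...)` to Python's `(?P<name>...)` because Python's `re` module does not accept the PCRE spelling, and the filler runs in the sample are shortened. The helper name `captured_fields` is made up for this illustration.

```python
import re

# Dan's pattern, with PCRE's (?<name>...) rewritten to Python's (?P<name>...).
# The alternation structure is unchanged.
IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
PATTERN = re.compile(
    r'\[action:"(?P<Action>\w+)"'
    rf'|origin:"(?P<Origin>{IP})"'
    r'|layer_name:"(?P<Text>\w+)"'
    rf'|dst:"(?P<dest>{IP})"'
    rf'|src:"(?P<Source>{IP})"'
)

# Obfuscated sample from the post, with the filler runs shortened.
EVENT = (
    'xxxxxxxx - [action:"Accept"; xxxx origin:"10.181.11.111"; '
    'xxxx dst:"192.168.22.9"; xxxx src:"10.181.111.111"]'
)


def captured_fields(event):
    """For each match, return only the named groups that captured text."""
    return [
        {name: value for name, value in m.groupdict().items() if value is not None}
        for m in PATTERN.finditer(event)
    ]


fields = captured_fields(EVENT)
for match_fields in fields:
    print(match_fields)
```

On this sample the pattern matches four separate times, and each match fills exactly one group (there is no layer_name in the obfuscated text). That is consistent with regex101 highlighting every pair, but it also means no single match ever holds all the values at once.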


richgalloway
SplunkTrust

Yes, indexers do process data before it gets indexed.  That's the primary function of an indexer.  A heavy forwarder is essentially an indexer that does not store data.  Index-time settings must be deployed on the first full Splunk instance (HF or indexer) that sees the data.  You're right to want to avoid an intermediate layer, IMO.

As for how to modify the data, I think SEDCMD is the easiest option.  Put this in the relevant props.conf file:

[mysourcetype]
SEDCMD-stripFields = s/\[action:"(?<Action>\w+)"|origin:"(?<Origin>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"|layer_name:"(?<Text>\w+)"|dst:"(?<dest>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"|src:"(?<Source>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"/\1,\2,\3,\4,\5/

It is intended to keep only the five capture groups, separated by commas.
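One caveat may be worth checking before deploying: because the pattern is an alternation, each match captures only one of the five groups, and a sed-style `s///` without the `g` flag replaces only the first match. The Python sketch below simulates that, under the assumptions that Python's `re.sub` with `count=1` behaves like the un-flagged substitution and that unmatched groups expand to empty strings. The `ANCHORED` pattern is a hypothetical alternative, not part of the original answer; it leaves out layer_name only because the obfuscated sample does not show it, and it assumes the fields always appear in this order on a single line.

```python
import re

IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# The alternation from the answer, with named groups reduced to plain
# numbered groups (the replacement only refers to \1..\5).
ALTERNATION = (
    r'\[action:"(\w+)"'
    rf'|origin:"({IP})"'
    r'|layer_name:"(\w+)"'
    rf'|dst:"({IP})"'
    rf'|src:"({IP})"'
)

# Hypothetical anchored alternative: a single match spans the whole event,
# so everything outside the four groups is discarded by the replacement.
ANCHORED = (
    r'^.*?\[action:"(\w+)"'
    rf'.*?origin:"({IP})"'
    rf'.*?dst:"({IP})"'
    rf'.*?src:"({IP})".*$'
)

# Obfuscated sample from the thread, with the filler runs shortened.
EVENT = (
    'xxxxxxxx - [action:"Accept"; xxxx origin:"10.181.11.111"; '
    'xxxx dst:"192.168.22.9"; xxxx src:"10.181.111.111"]'
)

# count=1 mimics s/// without the g flag: only the first match is replaced.
first_only = re.sub(ALTERNATION, r"\1,\2,\3,\4,\5", EVENT, count=1)
anchored = re.sub(ANCHORED, r"\1,\2,\3,\4", EVENT, count=1)

print(first_only)
print(anchored)
```

In this simulation the alternation rewrites only the `[action:"Accept"` fragment to `Accept,,,,` and leaves the rest of the event in place, while the anchored pattern reduces the event to `Accept,10.181.11.111,192.168.22.9,10.181.111.111`. If real events follow the same shape, the anchored pattern could be tried in the SEDCMD as `s/<pattern>/\1,\2,\3,\4/`. Splunk's regex engine is PCRE rather than Python's `re`, so test against a non-production index first.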


DanAlexander
Communicator

Hi @richgalloway 

Trust my message finds you well.

I have implemented it as advised, but so far I cannot see any change in the structure of the raw event. I put the regex in the props.conf of the TA app for Windows and pushed the new bundle to all indexers, but it still does not work.

Should I put it in etc/system/local/props.conf on each indexer instead? I think that if I make a change in this directory on the cluster master, I would not be able to push the new bundle to all indexers (I might be wrong).

Where am I going wrong?

Absolutely no idea...


richgalloway
SplunkTrust

If you change etc/system/local on each indexer then you will not be able to override that change via the CM.

Use btool on one of the indexers to verify your pushed change is in effect.

Remember that the change applies only to new data.


DanAlexander
Communicator

Hi,

Thanks for your feedback, @richgalloway. Your help is much appreciated. No need to test to give kudos to people who cater for others.

One last question, please: where can I get a comprehensive list of SEDCMD options (like stripFields)? Some samples would be helpful too.

Kind regards,

Dan


richgalloway
SplunkTrust

"One last question, please: where can I get a comprehensive list of SEDCMD options (like stripFields)? Some samples would be helpful too."

stripFields is not an option.  It's a name I made up.  It exists only to distinguish this SEDCMD from others.  See props.conf.spec or https://docs.splunk.com/Documentation/Splunk/9.0.4/Admin/Propsconf#:~:text=data.%0A*%20Default%3A%20...
