Splunk Search

Why does my sed replace command replace too much?

DempseyWilliams
Explorer

I need some help figuring out why my sed replace command is replacing all of the text to the end of the event in Splunk rather than just the specific text I had it look for. As part of a GDPR-compliance project, I was tasked with anonymizing personal names that come through Splunk, which my solution does. But I'm finding that everything after the replaced text is being cut off as well.

In my props.conf file, I've added this section to do the replace.

[host::...*]
SEDCMD-GDPR-anonymize-firstname = s/\"FirstName\"[=:].*\".*?\"/"FirstName":"######"/g

These are JSON messages, so I have Splunk looking for the "FirstName":"Billy", and want it to replace whatever it finds between the double-quotes with the pound signs, which it does.

Here's a sample message that I want to anonymize:

"Beneficiary_LocalID":"TZ056500190","FirstName":"Billy","Location":"Tanzania"

Desired result:

"Beneficiary_LocalID":"TZ056500190","FirstName":"######","Location":"Tanzania"

Actual result:

"Beneficiary_LocalID":"TZ056500190","FirstName":"######"

Do I have something wrong in my regex statement that is causing the rest of the event to be included in the replacement? Any help would be greatly appreciated.

dshpritz
SplunkTrust

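Your regex is a little too greedy: the .* after [=:] runs on to the last quoted string in the event. Try this for the match part of the sed expression:

"FirstName"[=:]"[^"]+"

The [^"]+ part is something called a "negated character class".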
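
To see the difference outside Splunk, here is a minimal Python sketch that applies both patterns to the sample event from the question. It assumes Python's re module treats these particular constructs (greedy .*, lazy .*?, and [^"]) the same way Splunk's sed mode does, which is an assumption made for illustration only. The variable names are hypothetical.

```python
import re

# Sample event from the question.
event = '"Beneficiary_LocalID":"TZ056500190","FirstName":"Billy","Location":"Tanzania"'

# Original pattern: the greedy .* after [=:] runs to the end of the event
# and only backtracks far enough to leave one last quoted string.
greedy = r'"FirstName"[=:].*".*?"'

# Suggested pattern: [^"]+ cannot cross a double quote, so the match
# ends at the closing quote of the FirstName value.
negated = r'"FirstName"[=:]"[^"]+"'

replacement = '"FirstName":"######"'

greedy_result = re.sub(greedy, replacement, event)
negated_result = re.sub(negated, replacement, event)

print(greedy_result)   # everything after FirstName is swallowed
print(negated_result)  # Location survives
```

With the original pattern the Location field disappears, matching the "actual result" in the question. With the negated character class only the name is replaced.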

ballen1
Explorer

Is this still a valid fix?  I've tried something very similar and it didn't work for me.  Please see below:

rex mode=sed "s/\"name":\s\"[^\"]+\"/"name":"###############"/g"

 

DempseyWilliams
Explorer

That appears to have fixed it. I'm still learning regex. Could you give a brief explanation as to what your version is doing compared to what I had?

dshpritz
SplunkTrust

My version is saying "anything that isn't a quote character, repeated one or more times". Once it reaches the next quote, that part of the match stops, and the literal quote that follows in the pattern matches it. This is stricter than your version, where the greedy .* would keep capturing until it hit the final quoted string in the event. HTH!
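
A small Python sketch of where each match ends, using the sample event from the question. The capture groups in the first pattern are added here purely for illustration (they are not in the original SEDCMD), and Python's re module is assumed to behave like Splunk's regex engine for these constructs.

```python
import re

event = '"Beneficiary_LocalID":"TZ056500190","FirstName":"Billy","Location":"Tanzania"'

# Original pattern with illustrative capture groups around .* and .*?
greedy_match = re.search(r'"FirstName"[=:](.*)"(.*?)"', event)

# Negated character class version: stops at the first closing quote.
negated_match = re.search(r'"FirstName"[=:]"[^"]+"', event)

print(greedy_match.group(0))   # whole match runs to the end of the event
print(greedy_match.group(1))   # what the greedy .* consumed
print(greedy_match.group(2))   # what was left for the lazy .*?
print(negated_match.group(0))  # just the FirstName key and value
```

The greedy .* takes everything up to the last quoted string, leaving only Tanzania for the lazy .*? to match, so the whole match reaches the end of the event. The negated class version stops at "Billy".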

DempseyWilliams
Explorer

Awesome! Thanks for the explanation!
