I need some help figuring out why my sed replace command is replacing all of the text to the end of the event in Splunk rather than just the specific text I had it look for. As part of a GDPR-compliance project, I was tasked with anonymizing personal names that come through Splunk, which my solution does. But I'm finding that everything after the replaced text is being cut off as well.
In my props.conf file, I've added this section to do the replace.
[host::...*]
SEDCMD-GDPR-anonymize-firstname = s/\"FirstName\"[=:].*\".*?\"/"FirstName":"######"/g
These are JSON messages, so I have Splunk looking for the "FirstName":"Billy", and want it to replace whatever it finds between the double-quotes with the pound signs, which it does.
Here's a sample message that I want to anonymize:
"Beneficiary_LocalID":"TZ056500190","FirstName":"Billy","Location":"Tanzania"
Desired result:
"Beneficiary_LocalID":"TZ056500190","FirstName":"######","Location":"Tanzania"
Actual result:
"Beneficiary_LocalID":"TZ056500190","FirstName":"######"
Do I have something wrong in my regex statement that is causing the rest of the event to be included in the replacement? Any help would be greatly appreciated.
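Your regex is a little too greedy: the .* right after [=:] can run all the way to the end of the event. Try this for the match side of your s/// instead: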
"FirstName"[=:]"[^"]+"
This is using something called a "negated character class".
Is this still a valid fix? I've tried something very similar and it didn't work for me. Please see below:
rex mode=sed "s/\"name":\s\"[^\"]+\"/"name":"###############"/g"
That appears to have fixed it. I'm still learning regex. Could you give a brief explanation as to what your version is doing compared to what I had?
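In my version, [^"]+ means "one or more characters that aren't a double quote", so the match has to stop at the first closing quote, and the final " in the pattern matches that quote. Your version had a greedy .* right after [=:], which grabs as much as it can and only backs off far enough to leave one last quoted string at the end of the event, so everything after the FirstName value got swallowed into the replacement. HTH!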
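If you want to see the difference outside Splunk, here's a rough sketch in Python using the sample event from the question. Python's re module is not the engine Splunk uses, so treat this as an illustration only; I'm assuming the two patterns behave the same way in both, since they only use basic constructs. The variable names are made up for the example.

```python
import re

# Sample event and replacement text taken from the question.
event = '"Beneficiary_LocalID":"TZ056500190","FirstName":"Billy","Location":"Tanzania"'
replacement = '"FirstName":"######"'

# Original pattern: the greedy .* after [=:] runs to the end of the event
# and only backtracks far enough to leave one final quoted string.
greedy_pattern = r'"FirstName"[=:].*".*?"'

# Suggested pattern: [^"]+ cannot cross a double quote, so the match
# ends at the closing quote of the FirstName value.
negated_pattern = r'"FirstName"[=:]"[^"]+"'

greedy_result = re.sub(greedy_pattern, replacement, event)
negated_result = re.sub(negated_pattern, replacement, event)

print(greedy_result)   # everything after FirstName is lost
print(negated_result)  # only the name is masked
```

The first print should reproduce the truncated "actual result" from the question, and the second should match the desired result with the Location field still in place.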
Awesome! Thanks for the explanation!