We have two indexers: one running version 8.1.5 (which will not be upgraded soon) and one running version 9.1.0.1.
I see that version 9 has a nice feature, "Ingest actions", which is exactly what I need to mask some incoming personal information (PI). The data arrives as JSON and looks something like this:
\"addressLine1\":\"1234 Main Street\",
I need to find some fields and remove their content. Yes, I believe there are backslashes in there. I tested a regex on 9 and added it to the transforms.conf and props.conf files on our 8.1.5 indexer, but the rules didn't work.
In one of my tests the rule caused an entire log entry to change to "999999999". Not quite what I was expecting, but at least we now know Splunk was applying the rule.
This is one of my rules that had no effect:
[address_masking]
REGEX = (?<=\"addressLine1\":\")[^\"]*
FORMAT = \"addressLine1\":\"100 Unknown Rd.\"
DEST_KEY = _raw
I found the relevant docs and am reading them now: "Configure advanced extractions with field transforms" (Splunk Documentation).
Can someone point out what is wrong with the transform above? Thanks!
You don't need ingest actions to mask your data.
You can use either the SEDCMD functionality or a properly crafted TRANSFORM.
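As a rough illustration of why SEDCMD is often the simpler route: it applies a sed-style s/regex/replacement/ substitution, so only the matched text is replaced and the rest of the event survives. The sketch below mimics that with Python's re.sub. The sample event, the masking pattern and the commented props.conf line are assumptions made up for this example (not tested on a Splunk instance), and Python's re engine is only an approximation of the PCRE engine Splunk uses.

```python
import re

# Made-up sample event: JSON carried inside a JSON string, so the inner
# quotes arrive as backslash-quote (\") pairs.
raw = r'{"msg":"{\"addressLine1\":\"1234 Main Street\",\"city\":\"Springfield\"}"}'

# Hypothetical, untested props.conf equivalent of the substitution below:
#   SEDCMD-mask_address = s/(\\"addressLine1\\":\\")[^\\]*/\1(masked)/g
# Group 1 is the literal text \"addressLine1\":\" and [^\\]* then consumes
# the value up to the next backslash (the start of its closing \").
pattern = r'(\\"addressLine1\\":\\")[^\\]*'

# sed-style substitution: only the matched span is rewritten,
# everything else in the event is kept.
masked = re.sub(pattern, r"\1(masked)", raw)
print(masked)
```

The pattern assumes the address value contains no backslashes of its own; an escaped character inside the value would end the match early and leave the rest of the value unmasked.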
There are two things wrong with your TRANSFORM.
1. Your regex does not match your data. Use https://regex101.com/ to test your regexes.
2. The REGEX part of the TRANSFORM definition specifies a regex that must match for the event to be processed by the transform (and can capture parts of it), but DEST_KEY and FORMAT define the whole contents of the resulting field. So if you set DEST_KEY=_raw and the REGEX matches any part of the event, the _whole event_ is overwritten with what you specify in FORMAT, not just the matched part.
In other words, if you did:
REGEX=.
FORMAT=aaaaa
DEST_KEY=_raw
your transform would match every non-empty event (since the REGEX matches any single character), but it would overwrite the whole event, not just one character, with the string "aaaaa".
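This whole-event behaviour can be modelled offline. The Python function below is a hypothetical stand-in for an index-time transform with DEST_KEY = _raw, written from the description above rather than taken from Splunk's code: if REGEX matches anywhere, the event becomes FORMAT, with $1, $2, ... replaced by the capture groups. The helper name and the sample event are made up.

```python
import re

def apply_raw_transform(raw: str, regex: str, fmt: str) -> str:
    """Hypothetical model of an index-time transform with DEST_KEY = _raw.

    If REGEX matches anywhere in the event, the WHOLE event is replaced by
    FORMAT (with $1, $2, ... substituted from the capture groups).
    """
    m = re.search(regex, raw)
    if m is None:
        return raw  # no match: the event is left untouched
    # Substitute each $n token with capture group n (empty string if unset).
    return re.sub(r"\$(\d)", lambda tok: m.group(int(tok.group(1))) or "", fmt)

event = "user=jdoe address=1234 Main Street"
print(apply_raw_transform(event, r".", "aaaaa"))  # aaaaa
print(apply_raw_transform("", r".", "aaaaa"))     # empty event: no match, stays empty
```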
According to regex101 my regex is correct, so the problem must be in the FORMAT.
Well... if your data looks like this:
\"addressLine1\":\"1234 Main Street\",
And your regex looks like this:
(?<=\"addressLine1\":\")[^\"]*
It won't match.
Remember that in regex, backslashes are used to escape special characters. If you need to match the literal string
\"
you need to escape the backslash so that it is matched literally, like this:
\\"
In your regex the backslashes are effectively ignored: there is nothing after them that requires escaping, so each following character is taken literally (exactly as it would be without the backslash).
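A quick offline check of the escaping point, using Python's re module (which should treat these simple escapes the same way PCRE does) and the sample line from the question as a raw string so the backslashes stay literal. The original lookbehind finds nothing, while a version with escaped backslashes finds the value; the [^\\]* value class in the second pattern is an assumption chosen for this sketch.

```python
import re

# One line of the incoming data, with literal backslashes (raw string).
line = r'\"addressLine1\":\"1234 Main Street\",'

# The lookbehind from the original stanza: \" is just an escaped quote,
# so it looks for "addressLine1":" WITHOUT any backslashes.
original = r'(?<=\"addressLine1\":\")[^\"]*'
print(re.search(original, line))  # None: no match

# Escaping the backslash (\\) makes it match a literal backslash.
fixed = r'(?<=\\"addressLine1\\":\\")[^\\]*'
print(re.search(fixed, line).group())  # 1234 Main Street
```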
Also your negated character class
[^\"]
is probably not what you wanted it to be either. The backslash is not needed here, since there is nothing to escape about the quote mark.
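A small Python sketch of the character-class point: [^\"] and [^"] produce the same matches. Because the class still accepts a backslash, a pattern whose lookbehind has been corrected but which keeps [^\"]* captures the value plus the backslash that opens the closing \" pair. The short sample string is made up.

```python
import re

sample = r'a\"b'  # the characters a, \, ", b

# Inside a character class the backslash before " escapes nothing, so
# [^\"] and [^"] are the same class: any character except a quote.
with_backslash = re.findall(r'[^\"]', sample)
without_backslash = re.findall(r'[^"]', sample)
print(with_backslash, without_backslash)  # ['a', '\\', 'b'] ['a', '\\', 'b']

# Consequence: even with a corrected lookbehind, [^\"]* still accepts the
# backslash that starts the closing \" pair.
line = r'\"addressLine1\":\"1234 Main Street\",'
value = re.search(r'(?<=\\"addressLine1\\":\\")[^\"]*', line).group()
print(repr(value))  # '1234 Main Street\\'  (note the stray trailing backslash)
```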
So would this work?
[address_masking]
REGEX = (\\"addressLine1\\":\\")([^\\"]+)(\\")
FORMAT = $1(masked)$3
This would work but possibly not the way you meant.
I suppose you want this part:
([^\\"]+)
to match everything up to, but not including, the closing \".
It doesn't work that way. It matches any sequence of characters that are neither a backslash nor a quote. This means that if your string contains an escaped character (like \'), the match terminates there, and since you explicitly require the \" part immediately after it, the whole regex won't match.
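To illustrate, the Python snippet below runs the proposed pattern against the sample line and against a hypothetical variant whose value contains an escaped character (\'), as in the example above. The second line is made up for illustration.

```python
import re

# The pattern proposed above, as a Python raw string.
pattern = r'(\\"addressLine1\\":\\")([^\\"]+)(\\")'

# Sample line from the question, plus a hypothetical value that
# contains an escaped character (\').
plain = r'\"addressLine1\":\"1234 Main Street\",'
escaped = r'\"addressLine1\":\"O\'Brien Street 5\",'

print(re.search(pattern, plain).group(2))  # 1234 Main Street
# [^\\"]+ stops at the backslash of \', and the next thing required is \"
# (not \'), so there is no match at all.
print(re.search(pattern, escaped))  # None
```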
Oh, and since you're only matching the "static" parts of your events, the match groups you use in FORMAT will contain only those parts, so the resulting field will consist of just them plus "(masked)". That is probably not what you want.
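Here is a sketch of what FORMAT = $1(masked)$3 would produce if combined with DEST_KEY = _raw, using a hypothetical Python model of the transform. The apply_raw_transform helper and the sample event are assumptions, not Splunk code: on a match, the whole event collapses to the three static pieces.

```python
import re

def apply_raw_transform(raw: str, regex: str, fmt: str) -> str:
    """Hypothetical model of a DEST_KEY = _raw transform: on a match, the
    whole event is replaced by FORMAT with $n taken from capture group n."""
    m = re.search(regex, raw)
    if m is None:
        return raw  # no match: the event is left untouched
    return re.sub(r"\$(\d)", lambda tok: m.group(int(tok.group(1))) or "", fmt)

# Made-up event with other fields around the address.
event = r'{"msg":"{\"addressLine1\":\"1234 Main Street\",\"city\":\"Springfield\"}"}'
result = apply_raw_transform(
    event, r'(\\"addressLine1\\":\\")([^\\"]+)(\\")', "$1(masked)$3"
)
print(result)  # \"addressLine1\":\"(masked)\"  (everything else is gone)
```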
You could try to fiddle with negative lookaheads/lookbehinds, like:
(.*?\\"addressLine1\\":\\").*(?<!\\")(\\".*)
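The hint above can be tried offline with the same kind of hypothetical Python model of a DEST_KEY = _raw transform; pairing it with FORMAT = $1(masked)$2 is an assumption. Because the middle .* is greedy, it backtracks only as far as the last \" in the event, so any fields after addressLine1 are swallowed. A lazy .*? variant (not part of the original hint) stops at the first closing \" instead. The sample event is made up.

```python
import re

def apply_raw_transform(raw: str, regex: str, fmt: str) -> str:
    """Hypothetical model of a DEST_KEY = _raw transform: on a match, the
    whole event is replaced by FORMAT with $n taken from capture group n."""
    m = re.search(regex, raw)
    if m is None:
        return raw  # no match: the event is left untouched
    return re.sub(r"\$(\d)", lambda tok: m.group(int(tok.group(1))) or "", fmt)

event = r'{"msg":"{\"addressLine1\":\"1234 Main Street\",\"city\":\"Springfield\"}"}'

# The hint as posted: greedy .* in the middle.
hint = r'(.*?\\"addressLine1\\":\\").*(?<!\\")(\\".*)'
# Variant with a lazy .*? in the middle.
lazy = r'(.*?\\"addressLine1\\":\\").*?(?<!\\")(\\".*)'

greedy_result = apply_raw_transform(event, hint, "$1(masked)$2")
lazy_result = apply_raw_transform(event, lazy, "$1(masked)$2")
print(greedy_result)  # the city field is lost
print(lazy_result)    # only the address value is replaced
```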
I don't want to learn regex; I want to replace personal information with fixed strings. Can someone at Splunk give me the correct expression to use? Testing in other environments doesn't help, and Splunk needs a restart just to try out a rule, so this is really painful.
OK. To put it bluntly: if you don't want to put any effort into it and just want it done, pay someone to do it for you, such as Splunk Professional Services or your local friendly Splunk Partner.
I don't think you understand what this community is. It's not "someone at Splunk". This is a forum where volunteers, not affiliated with Splunk as a company, choose to give their time and effort to _help_ other people, not to do someone else's job for free.
How does this look?
[address_masking]
REGEX = (\\\"addressLine1\\\":\\\")([^\\\"]+)(\\\")
FORMAT = $1(masked)$3
That didn't work either.
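For what it's worth, a quick Python check suggests the extra backslashes change nothing: \\\" is an escaped backslash followed by an escaped quote, so this pattern matches exactly what the double-backslash version does. That assumes transforms.conf hands the REGEX value to the regex engine as written. Note also that, as posted, this stanza (like the previous "So would this work?" one) has no DEST_KEY line, whereas the original had DEST_KEY = _raw.

```python
import re

line = r'\"addressLine1\":\"1234 Main Street\",'

# Earlier version (double backslashes) and latest version (triple).
double = r'(\\"addressLine1\\":\\")([^\\"]+)(\\")'
triple = r'(\\\"addressLine1\\\":\\\")([^\\\"]+)(\\\")'

m2 = re.search(double, line)
m3 = re.search(triple, line)
print(m2.groups() == m3.groups())  # True: both patterns capture the same text
print(m3.group(2))                 # 1234 Main Street
```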