We have a requirement to mask data at index time. The SED below masks the data in _raw, but it does not mask the extracted field "User name". The SEDCMD is on a universal forwarder (Windows), and it works fine for the raw data:
s/(GBW\d{8}\t)(\d{8}\s){0,1}(\w.*?)(\t)/\1\2(masked)\4/g
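The SED substitution can be tried outside Splunk with a short Python sketch. The tab-joined fragments below are hypothetical reconstructions of the sample events posted later in the thread (Severity, Hostname, User name, two ID columns, then the message). Python's re.sub uses the same \1-style backreferences, and since Python 3.5 an unmatched optional group, such as group 2 here, expands to an empty string.

```python
import re

# The SED pattern from the question, unchanged ({0,1} is the same as ?).
SED_PATTERN = re.compile(r"(GBW\d{8}\t)(\d{8}\s){0,1}(\w.*?)(\t)")
SED_REPLACEMENT = r"\1\2(masked)\4"


def apply_sedcmd(raw: str) -> str:
    """Mimic s/.../\\1\\2(masked)\\4/g on a single raw event."""
    return SED_PATTERN.sub(SED_REPLACEMENT, raw)


# Hypothetical tab-delimited fragments modelled on the sample events.
samples = {
    "name only": "INFO\tGBW22223451\tJohn Smith\t0\t0\tWindows login successful.",
    "id and name": "WARN\tGBW22223451\t12345678 John Smith\t0\t0\tTrying to import connection",
    "id only": "INFO\tGBW22223451\t12345678\t0\t0\tWindows login successful.",
    "blank": "INFO\tGBW22223451\t\t0\t0\tWindows authentication server listening",
}

for label, raw in samples.items():
    print(f"{label:12} {apply_sedcmd(raw)!r}")
```

In the "id only" case the ID survives, but the mask lands on the next column (a 0), because \s in group 2 also matches the tab that follows the ID. A blank User name is left unchanged, since (\w.*?) needs at least one word character.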
My props.conf:
[sourcetype]
SEDCMD-username=s/(GBW\d{8}\t)(\d{8}\s){0,1}(\w.*?)(\t)/\1\2(masked)\4/1
FIELD_DELIMITER=tab
HEADER_FIELD_DELIMITER=tab
HEADER_FIELD_LINE_NUMBER=1
MAX_TIMESTAMP_LOOKAHEAD=300
TIMESTAMP_FIELDS=Timestamp
TIME_FORMAT=%Y%m%dT%H%M%S.%3N+%z
TRANSFORMS-anonymize = username-anonymizer
However, the transform does not work. I have tried placing it on the universal forwarder as well as on the intermediate heavy forwarder. I created it based on the response in Solved: How can I anonymize fields of data that has underg... - Splunk Community
transforms.conf:
[username-anonymizer]
REGEX = (?m)^(.*User name\:\:)(\d{8}\s){0,1}(\w.*?)$
FORMAT = $1(masked)
WRITE_META = false
SOURCE_KEY = _meta
DEST_KEY = _meta
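To see what this REGEX/FORMAT pair would produce if it did fire, here is a Python sketch. It assumes the source text contains a line of the form "User name::<value>", which is the shape the REGEX expects; how Splunk actually stores indexed fields in _meta is not modelled. Splunk's $1 in FORMAT is written \1 in Python, and the second template is a hypothetical variant, not part of the posted config.

```python
import re

# REGEX from [username-anonymizer]; Python's re accepts \: as a literal colon.
TRANSFORM_REGEX = re.compile(r"(?m)^(.*User name\:\:)(\d{8}\s){0,1}(\w.*?)$")

# FORMAT = $1(masked), written with Python's backreference syntax.
FORMAT_AS_POSTED = r"\1(masked)"
# Hypothetical variant that also keeps group 2 (the optional 8-digit ID).
FORMAT_KEEP_ID = r"\1\2(masked)"

# Hypothetical source strings in the "User name::<value>" shape.
for value in ["12345678 John Smith", "John Smith", "12345678"]:
    source = f"User name::{value}"
    print(repr(TRANSFORM_REGEX.sub(FORMAT_AS_POSTED, source)),
          repr(TRANSFORM_REGEX.sub(FORMAT_KEEP_ID, source)))
```

Even when it matches, FORMAT = $1(masked) drops group 2, so "12345678 John Smith" becomes "(masked)" rather than the expected "12345678 (masked)". An ID-only value is masked entirely with either template, because (\w.*?) also matches digits.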
Related info: we expect tab-delimited data. The "User name" field sits in the middle of the event, right after Hostname, which is why the regex anchors on GBW (the hostname prefix in this example).
"User name" can be a combination of an ID and a name, and we only want to mask the name:

Value                          Expected masked value
12345678 firstname lastname    12345678 (masked)
12345678 firstname             12345678 (masked)
firstname lastname             (masked)
firstname                      (masked)

The value can also be blank.
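The value-to-mask mapping above can be pinned down as a small Python check before it is translated into a transform. The pattern is an assumption, not something from the thread: an optional 8-digit ID plus whitespace, followed by a name that starts with neither a digit nor whitespace. Unlike (\w.*?), it leaves an ID-only value and a blank value unchanged.

```python
import re

# Hypothetical pattern: optional "<8 digits><whitespace>" prefix (group 1),
# then a name that starts with neither a digit nor whitespace.
USER_NAME_PATTERN = re.compile(r"^(\d{8}\s+)?[^\d\s].*$")


def mask_user_name(value: str) -> str:
    """Keep the optional ID prefix and replace the name with (masked).

    Values with no name part (blank, or an ID on its own) do not match
    the pattern and are returned unchanged.
    """
    return USER_NAME_PATTERN.sub(r"\1(masked)", value)


# The value/expected pairs from the question, plus two edge cases.
cases = {
    "12345678 firstname lastname": "12345678 (masked)",
    "12345678 firstname": "12345678 (masked)",
    "firstname lastname": "(masked)",
    "firstname": "(masked)",
    "": "",
    "12345678": "12345678",
}

for value, expected in cases.items():
    result = mask_user_name(value)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{status:8} {value!r} -> {result!r}")
```

The same idea would presumably map to a Splunk REGEX of ^(\d{8}\s+)?[^\d\s].*$ with FORMAT = $1(masked), but that translation is untested in Splunk.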
Have you already tried the setup below and checked that your regex is correct?
[username-anonymizer]
REGEX = (?m)^(.*User name\:\:)(\d{8}\s){0,1}(\w.*?)$
FORMAT = $1(masked)
WRITE_META = false
SOURCE_KEY = "field:User name"
DEST_KEY = "field:User name"
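One way to check "is the regex correct" for this setup, as a Python sketch: the REGEX still requires the literal text "User name::". If SOURCE_KEY = field:User name hands the transform only the bare field value (an assumption here, not confirmed in the thread), the REGEX can never match, so FORMAT is never applied.

```python
import re

# REGEX from the suggested stanza: it still expects the "User name::" prefix.
SUGGESTED_REGEX = re.compile(r"(?m)^(.*User name\:\:)(\d{8}\s){0,1}(\w.*?)$")

# Assumption: with SOURCE_KEY = field:User name, the transform sees only the
# bare value, without any "User name::" prefix.
bare_value = "12345678 John Smith"
prefixed_value = "User name::12345678 John Smith"

print(SUGGESTED_REGEX.search(bare_value))             # None: no prefix, no match
print(SUGGESTED_REGEX.search(prefixed_value) is not None)
```

If that assumption holds, the REGEX would have to be written against the value alone, without the "User name::" prefix.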
Feel free to provide some sample data. Thank you!
Thank you for your response, @PaulPanther.
I tried what you suggested, but it did not work: the extracted field still contains unmasked data.
Below are the types of events we expect:
Product Assembly Name Product Version Class Name Timestamp Severity Hostname User name User ID WebEngine Request ID Connection ID Task ID Execution ID Report ID Request ID Transformation ID Message Exception Stacktrace
Qlik.NPrinting.Repo 00.00.0.0 Qlik.NPrinting.Repo.Service.AuthenticationService 20221118T163152.532+00:00 INFO GBW22223451 John Smith 0 0 0 0 0 0 0 0 Windows login successful. The user with id n45675643h456556l5c7bu5jw5esd4 has been correctly identified as a Windows domain user with sid S-1-4-10-123457890-1234543-13243554-344545
Qlik.NPrinting.Repo 21.14.5.0 Qlik.NPrinting.Repo.Service.AuthenticationService 20221118T163152.532+00:00 INFO GBW22223451 12345678 0 0 0 0 0 0 0 0 Windows login successful. The user with id n45675643h456556l5c7bu5jw5esd4 has been correctly identified as a Windows domain user with sid S-1-4-10-123457890-1234543-13243554-344545
Qlik.NPrinting.Repo 21.14.5.0 Qlik.NPrinting.Repo.Service.ImportExport.DataConnectionsMatchingService 20221118T163152.532+00:00 WARN GBW22223451 12345678 John Smith 0 0 0 0 0 0 0 0 Trying to import connection Horizon Scanning Rapid 2 MI Connection. Data connection NPrinting Rapid2 does not match↓Missing objects from template: O\fghjy, O\123457890-1234543-13243554-344545, O\123457890-1234543-13243554-344545, O\ttyte, O\fgfggf, O\erewff, O\sdfdf, O\dfdgfg, O\zfgfg, O\DfAd, O\dsfdfh, O\dfdfD, O\dfdZ↓
Qlik.NPrinting.WebEngine 21.14.5.0 Qlik.NPrinting.WebEngine.WebEngineWindowsService 20221118T163152.532+00:00 INFO GBW22223451 0 0 0 0 0 0 0 0 Windows authentication server listening on http://localhost:port/
Have you already tried the other solution suggested by @jeffland?
[username-anonymizer]
REGEX = .+
FORMAT = "User name::masked"
WRITE_META = true
SOURCE_KEY = "field:User name"
DEST_KEY = "field:User name"
[accepted_keys]
is_valid="field:User name"
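The REGEX = .+ approach trades precision for simplicity. A rough Python sketch of the regex side only (it models "the regex matches, so the value is replaced by the constant" and does not model how Splunk writes the indexed field) shows that every non-empty value, including one that starts with the 8-digit ID, collapses to the same constant, while a blank value is left alone.

```python
import re

# REGEX = .+ from the suggested stanza: it matches any non-empty value.
ANY_VALUE = re.compile(r".+")
# Constant output, standing in for FORMAT = "User name::masked".
MASKED = "masked"


def apply_constant_mask(value: str) -> str:
    """Return the constant when the regex matches, else the value untouched."""
    return MASKED if ANY_VALUE.fullmatch(value) else value


for value in ["12345678 John Smith", "John Smith", "12345678", ""]:
    print(repr(value), "->", repr(apply_constant_mask(value)))
```

That is acceptable only if losing the ID is fine; keeping "12345678 (masked)" needs capture groups for the ID part.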
If not, please try getting rid of the spaces first with:
props.conf
[sourcetype]
SEDCMD-replacespace = s/ /_/g
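A quick Python sketch of what s/ /_/g does to hypothetical tab-delimited fragments: it renames "User name" to "User_name" in the header, but it also rewrites every space in the values and message text. The last line assumes, hypothetically, that the username SED from the question runs after this one; the underscore is not matched by \s, so the ID is no longer kept separately and the whole value is masked.

```python
import re


def replace_spaces(raw: str) -> str:
    """Mimic SEDCMD-replacespace = s/ /_/g: replace every space with _."""
    return raw.replace(" ", "_")


# The username SED from the question, for comparison.
USERNAME_SED = re.compile(r"(GBW\d{8}\t)(\d{8}\s){0,1}(\w.*?)(\t)")

# Hypothetical tab-delimited header and event fragments.
header = "Severity\tHostname\tUser name\tUser ID"
event = "WARN\tGBW22223451\t12345678 John Smith\t0"

print(repr(replace_spaces(header)))
print(repr(replace_spaces(event)))
# Assumed ordering: space replacement first, then the username SED.
print(repr(USERNAME_SED.sub(r"\1\2(masked)\4", replace_spaces(event))))
```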