I need to exclude or discard specific field values containing sensitive information from indexed events. Users should not see this data because it is a password and needs to be masked or removed completely. The password field only appears when there is a field "match_element":"ARGS:password", followed by the password in a field called "match_value":"RG9jYXgtODc5MzIvKxs%253D", like this.
Below is the raw event -
"matches":[{"match_element":"ARGS:password","match_value":"RG9jYXgtODc5NzIvKys%253D","is_internal":false}],
These are JSON values, and kv_mode=json is set in order to auto-extract the field values while indexing.
Here I need to mask, remove, or override the match_value field values (RG9jYXgtODc5MzIvKxs%253D and so on). These are passwords entered by users and very sensitive data that can be misused.
I am afraid that if I do anything wrong, the JSON format will break, which in turn will disturb all the logs. Can someone help me with a workaround for this?
I would say it's always better to fix the problem at the source and mask whichever details are not needed there. If you are not able to, then you should have a regex in place, check that the raw events match it, and mask them.
One more way is to have a program running that takes the raw events matching the regex and rewrites them into a different form; that would definitely help you. I am sure by now you have answers from the team of experts, but it all depends on your organization and how well you can implement it.
Below is a link that will help you with masking events:
https://help.splunk.com/en/splunk-enterprise/get-started/get-data-in/9.4/configure-event-processing/...
And what if one event has multiple match_element values ("ARGS:password")? Then it should mask all the corresponding passwords as well; the logic needs to handle that.
This data is just key-value pairs coming from the source. I have set kv_mode=json, which converts them to a JSON-readable format.
@livehybrid @PickleRick what would be the best way to do this? Remove or mask? The user is fine with both, but what is the best approach? And please help me with the final props and transforms; I can see two options above and am confused.
"Remove" what? Just the value? That's no different from masking it. Remove the whole event? That's the easiest way; this you can actually do relatively simply with regex-based matching.
About the earlier performance remark: parsing structured data and then manipulating it are relatively heavy operations. If you have just a few events every now and then, it should be no problem, but if you're going to have a lot of them, I'd still advise an external tool.
Remove the match_value field (which contains the password values) completely, not the complete events.
This is the raw data:
"matches":[{"match_element":"ARGS:password","match_value":"SmFUYWlfZUJhZTc%253D","is_internal":false}]
So is it better to do this -
"matches":[{"match_element":"ARGS:password","match_value":"MASKED","is_internal":false}]
or this -
"matches":[{"match_element":"ARGS:password","is_internal":false}]
In the end, this operation shouldn't break the JSON format.
OK. So your event as a whole is actually _not_ well-formed JSON.
Yes, initially they are key-value pairs and we are changing them to JSON format in Splunk. Please help me with the logic to mask the password values accordingly.
If your events are always formed this way (the fields are in this order, there are no other fields squeezed in the middle, there is just one password per event, and so on), you can use a simple SEDCMD to replace the data. Something like this:
SEDCMD-strip-pass-from-match = s/("match_element"\s*:\s*"ARGS:password"\s*,\s*"match_value":")[^"]+"/\1REDACTED"/
But be aware of all the caveats I mentioned before: as soon as your source, for example, swaps the order of the reported fields, which is perfectly fine from the JSON point of view, it will stop working.
Unfortunately, you're trying to fiddle with structured data, which means that strictly text-based tools might work if the data is formatted in a consistent way (which it doesn't have to be) but might fail if, for example, the field order varies.
The best solution here would be to use an external tool, before ingesting the data into Splunk, that can handle your JSON with an understanding of its structure.
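That field-order caveat is easy to demonstrate outside Splunk. The sketch below is purely illustrative (Python's `re.sub` instead of SEDCMD, but the pattern translates directly); it applies the same regex to the sample event and to a hypothetical variant of it with the two fields swapped:

```python
import re

# The same pattern as in the SEDCMD above, in Python re syntax.
pattern = r'("match_element"\s*:\s*"ARGS:password"\s*,\s*"match_value":")[^"]+"'

raw = ('"matches":[{"match_element":"ARGS:password",'
       '"match_value":"SmFUYWlfZUJhZTc%253D","is_internal":false}]')
masked = re.sub(pattern, r'\g<1>REDACTED"', raw)
# masked now carries "match_value":"REDACTED" instead of the password.

# Hypothetical variant: same JSON content, fields reported in a different order.
swapped = ('"matches":[{"match_value":"SmFUYWlfZUJhZTc%253D",'
           '"match_element":"ARGS:password","is_internal":false}]')
unchanged = re.sub(pattern, r'\g<1>REDACTED"', swapped)
# unchanged still contains the password: the pattern no longer matches.
```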
Hi @Karthikeya
You can use an ingest_eval to do this on the instance that parses the logs (e.g. HF or Indexer) using the following config:
# props.conf
[yourSourcetype]
# You could choose to remove or redact
#TRANSFORMS-removePasswordJson = removePasswordJson
TRANSFORMS-redactPasswordJson = redactPasswordJson
# transforms.conf
[redactPasswordJson]
INGEST_EVAL = _raw=replace(_raw,"\"match_element\":\"ARGS:password\"[^\"]*\"match_value\":\"[^\"]*\"","\"match_element\":\"ARGS:password\",\"match_value\":\"REDACTED\"")
[removePasswordJson]
INGEST_EVAL = _raw=replace(_raw,"\"match_element\":\"ARGS:password\"[^}]*\"match_value\":\"[^\"]*\",?","")
This is the equivalent when run in search, to visualise the output you should get:
|windbag | head 1 | eval _raw="{\"someField\":\"someVal\",\"matches\":[{\"match_element\":\"ARGS:password\",\"match_value\":\"RG9jYXgtODc5NzIvKys%253D\",\"is_internal\":false}]}"
| eval _raw=replace(_raw,"\"match_element\":\"ARGS:password\"[^\"]*\"match_value\":\"[^\"]*\",?","")
| eval _raw=replace(_raw,"\"match_element\":\"ARGS:password\"[^\"]*\"match_value\":\"[^\"]*\"","\"match_element\":\"ARGS:password\",\"match_value\":\"REDACTED\"")
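If it helps to sanity-check the two regexes outside Splunk, here is a rough Python equivalent (Splunk's `replace()` eval function uses PCRE; Python's `re` is close enough for these patterns). Note that the remove variant strips the `match_element` key along with `match_value`:

```python
import json
import re

raw = ('{"someField":"someVal","matches":[{"match_element":"ARGS:password",'
       '"match_value":"RG9jYXgtODc5NzIvKys%253D","is_internal":false}]}')

# redactPasswordJson equivalent: overwrite the password with REDACTED.
redacted = re.sub(r'"match_element":"ARGS:password"[^"]*"match_value":"[^"]*"',
                  '"match_element":"ARGS:password","match_value":"REDACTED"',
                  raw)

# removePasswordJson equivalent: drops match_element and match_value,
# leaving {"is_internal":false} in the matches array.
removed = re.sub(r'"match_element":"ARGS:password"[^}]*"match_value":"[^"]*",?',
                 '', raw)

# Both results still parse as JSON.
json.loads(redacted)
json.loads(removed)
```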
🌟 Did this answer help you? If so, please consider:
Your feedback encourages the volunteers in this community to continue contributing
I'm already using one ingest eval on this sourcetype to route logs to specific indexes. Can we add one more ingest eval here? Will it override the first one, or are they independent?
Yes @Karthikeya
As @PickleRick mentioned, you can use multiple INGEST_EVAL transforms.
As this is JSON, I've written an alternative INGEST_EVAL which relies less on replacing parts of the raw JSON string and uses the JSON functions instead; let me know if this helps!
In the screenshot you can see the raw data (bottom right), the props/transforms (top right), and the output (left).
# props.conf
[yourSourceType]
TRANSFORMS-redactJSONPassword = redactJSONPassword
# transforms.conf
[redactJSONPassword]
INGEST_EVAL = _raw=json_set(_raw,"matches.{".mvfind(json_array_to_mv(json_extract(_raw, "matches")),"ARGS:password")."}.match_value","REDACTED")
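Outside Splunk, the same structure-aware idea (walking the parsed JSON instead of regex-replacing the raw string) looks roughly like this. A hypothetical pre-ingest sketch, assuming the full event is, or can be wrapped into, well-formed JSON:

```python
import json

def redact_passwords(raw: str) -> str:
    """Set match_value to REDACTED for every ARGS:password entry."""
    event = json.loads(raw)
    for entry in event.get("matches", []):
        if entry.get("match_element") == "ARGS:password":
            entry["match_value"] = "REDACTED"
    return json.dumps(event, separators=(",", ":"))

raw = ('{"matches":[{"match_element":"ARGS:password",'
       '"match_value":"RG9jYXgtODc5NzIvKys%253D","is_internal":false}]}')
print(redact_passwords(raw))
```

Because it works on the parsed structure, this approach is immune to field reordering and extra fields squeezed into the middle.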
1. It pains me when I think of the performance. But that's not the most important issue.
2. Obviously, the original post contained only a partial event. Are you sure your eval modifies _all_ occurrences? And only those? Not challenging your solution, just pointing out that structured data is difficult to handle.
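On point 2, one thing worth testing is an event with more than one `ARGS:password` entry (hypothetical sample below). A global regex substitution, like Splunk's `replace()` or Python's `re.sub`, handles all of them in one pass, whereas anything keyed off a single `mvfind` index would, if I read `mvfind` correctly (it returns the first matching index), only touch the first:

```python
import re

# Hypothetical event with two password entries and one unrelated entry.
raw = ('{"matches":['
       '{"match_element":"ARGS:password","match_value":"secret1","is_internal":false},'
       '{"match_element":"ARGS:other","match_value":"keepme","is_internal":false},'
       '{"match_element":"ARGS:password","match_value":"secret2","is_internal":false}]}')

pattern = r'("match_element":"ARGS:password","match_value":")[^"]*'
masked = re.sub(pattern, r'\g<1>REDACTED', raw)
# Both secret1 and secret2 are replaced; keepme is untouched.
```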
What performance issues could I run into? Please let me know.
For logs where the match_element field value is "ARGS:password", the match_value will always be a password, and that should be masked or removed. For the other match_element values, nothing needs to be masked.
You can have multiple transforms containing INGEST_EVAL for the same sourcetype.