We are currently having an issue where our masking transforms do not work because _raw is too long. If we set LOOKAHEAD to a higher value, the masking works.
_raw has request.body at the end of the event.
Since request.body is the only part of the event that matters for the transform, we tried setting it as the SOURCE_KEY, but it doesn't seem to do anything, and we can't see any related logs.
How do we use SOURCE_KEY to limit where the transform's regex is applied?
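For context, the LOOKAHEAD workaround described above would look roughly like this. It is only a sketch: the sourcetype, stanza name, regex, replacement, and LOOKAHEAD value are placeholders, not the actual configuration from this thread.

```ini
# props.conf (sourcetype name is hypothetical)
[my_app:json]
TRANSFORMS-mask_body = mask_request_body

# transforms.conf (stanza name, regex, and replacement are placeholders)
[mask_request_body]
# _raw is the default SOURCE_KEY; shown here only for clarity
SOURCE_KEY = _raw
# LOOKAHEAD defaults to 4096 characters; raise it so the regex can reach
# request.body at the end of a long event
LOOKAHEAD = 65536
REGEX = (?s)^(.*"password"\s*:\s*")[^"]*(".*)$
FORMAT = $1********$2
DEST_KEY = _raw
```

Because the leading `.*` is greedy, a pattern like this masks only the last match in the event, and a larger LOOKAHEAD means more regex work per event.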
Okay thanks for the feedback
Thanks for the reply.
1. Is there a way to check this assumption? "your JSON event is in escaped text mode on disk."
There are a couple of options to make this work outside of Splunk, but they are not ideal.
2. Is there some way to index request.body, or to make it readable as a SOURCE_KEY, in a performant way? Maybe some logic in the forwarder?
For index-time transforms, SOURCE_KEY requires indexed fields. You can't apply a transform to a search-time extracted field because it doesn't exist in the indexing pipeline.
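To illustrate the distinction, here is a rough sketch of SOURCE_KEY values as I understand them from transforms.conf.spec. The stanza names and the indexed field name `request_body` are hypothetical, and the other required settings (REGEX, FORMAT, DEST_KEY) are left out for brevity.

```ini
# transforms.conf (fragments only; stanza and field names are hypothetical)

[example_raw]
# Pipeline key, available at index time (this is the default)
SOURCE_KEY = _raw

[example_metadata]
# Metadata keys are also available at index time
SOURCE_KEY = MetaData:Source

[example_indexed_field]
# The "field:" prefix looks up an already indexed field
SOURCE_KEY = field:request_body

[example_search_time_field]
# A search-time extraction such as request.body does not exist
# in the indexing pipeline, so there is nothing to match against
SOURCE_KEY = request.body
```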
Okay, I think I should look into making request.body an indexed field.
Is this something that can be done in a performant way?
Don't do that. Indexed fields of high cardinality are not a good idea. Oh, and even if you wanted to modify an indexed field, it wouldn't change the raw data.
More info, our stanza in transforms.conf looks like
I think acceptable_keys didn't work because your JSON event is stored as escaped text on disk.
If you want to handle the event as JSON, you must use INGEST_EVAL and the JSON functions. But I expect that in that case you will hit the same limit again when reading the event in, converting it from escaped text to JSON, and writing it back to the stream.
The best option is to do this masking before ingestion, with some tool other than Splunk's props and transforms.
Is it possible to ask the source to mask the data already, or can you use e.g. Ingest Actions, Edge Processor, or Ingest Processor? Another option, outside of the Splunk world, is Cribl.
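A minimal sketch of the INGEST_EVAL approach, assuming a Splunk version whose eval supports `json_extract` and `json_set`, that _raw is valid JSON, and that request.body is a string nested under `request`. The stanza name, the pattern, and the replacement text are placeholders; test on non-production data first.

```ini
# transforms.conf (stanza name, pattern, and replacement are placeholders)
[mask_request_body_eval]
# Extract request.body, mask digit runs of 13 to 16 characters, and write
# the result back into _raw; coalesce keeps the original _raw if any step
# returns null (for example, when the event does not parse as JSON)
INGEST_EVAL = _raw=coalesce(json_set(_raw, "request.body", replace(json_extract(_raw, "request.body"), "[0-9]{13,16}", "[MASKED]")), _raw)
```

The stanza would be attached to the sourcetype with a `TRANSFORMS-<class>` line in props.conf, like any other index-time transform. Whether this avoids the size limit is exactly the open question raised above.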
Yes. Edge Processor seems to be the best shot here (in any case, manipulating structured data like JSON with regexes is risky).