I'm reading files with fixed-width fields into Splunk. For extraction and masking of specific fields I need to address something like "characters 56 to 78". This generally works fine with regex.
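For reference, a fixed-width extraction along those lines can be expressed in props.conf with a regex that skips a fixed number of characters; the sourcetype and field names below are placeholders, not from the original setup:

```ini
[my_fixedwidth_sourcetype]
# Capture characters 56 to 78 (23 chars after skipping the first 55)
# into a hypothetical field called "account_id".
EXTRACT-account_id = ^.{55}(?<account_id>.{23})
```

This only works reliably as long as every character in the line occupies exactly one position, which is exactly the constraint the escaping breaks.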
Unfortunately the file contains some special characters such as German umlauts or NUL chars, which are escaped as \xDC or \x00. Since the number of special characters varies from line to line, this escape mechanism destroys the fixed-width constraint.
How can I prevent the escaping? Alternatively, how can I make Splunk do an escaping that maintains the fixed-width constraint (that is, replace one character with exactly one other character)?
It would be nice to avoid using a custom preprocessor.
Thank you for your support.
I tried adjusting the charset and was able to fix the issue with the German umlauts that way (CHARSET = latin-1).
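For completeness, the charset fix would sit in the same props.conf stanza as the extraction; the sourcetype name here is a placeholder:

```ini
[my_fixedwidth_sourcetype]
# Decode the input as Latin-1 so umlauts map to single characters
# instead of being escaped as \xDC etc.
CHARSET = latin-1
```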
Nevertheless, the NUL char is still escaped, and I guess NUL is invalid in any charset. Since my log source contains a fixed number of these chars, I could make it work anyway.
I would still regard it as a useful feature to be able to disable the escaping completely and replace invalid chars with a simple dot or something similar.
You could add a SEDCMD setting in props.conf to replace your NULs with a different character. I'm not quite sure whether that runs before or after the escaping, though, so you may need to look for both escaped and unescaped NULs and replace both.
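A sketch of what that could look like, assuming the same placeholder sourcetype as above; replacing one byte with one dot keeps the fixed-width layout intact:

```ini
[my_fixedwidth_sourcetype]
# Replace each raw NUL byte with a single dot (one-for-one,
# so column positions are preserved).
SEDCMD-replace_nul = s/\x00/./g
# In case the replacement runs after escaping, also catch the
# literal four-character sequence "\x00". Note this substitutes
# one dot for four characters, so it would shift columns; you may
# need "s/\\x00/..../g" or similar instead if this rule fires.
SEDCMD-replace_escaped_nul = s/\\x00/./g
```

Test it on a sample file first (e.g. with a throwaway index) to see which of the two rules actually matches in your environment.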