Unicode punctuation characters in the range U+2000 to U+206F seem to make Splunk require Simplified Chinese fonts in exported PDFs, so I want to convert these characters to their ASCII equivalents.
I can add the following to the search:
rex field=Course mode=sed "s/(‘|’)/'/g"
where the characters matched above are U+2018 and U+2019, and they are replaced with 0x27 (the ASCII apostrophe). However, I want to put something in props.conf so that this always happens.
How would I do this?
Yes, but keep in mind this is an index time function, so it will change indexed data on the way in... permanently.
[yoursourcetype]
SEDCMD-course = s/(‘|’)/'/g

You can read about it in the props.conf specification, which I have excerpted below:
SEDCMD-<class> = <sed script>
* Only used at index time.
* Commonly used to anonymize incoming data at index time, such as credit card or social
  security numbers. For more information, search the online documentation for "anonymize
  data."
* Used to specify a sed script which Splunk applies to the _raw field.
* A sed script is a space-separated list of sed commands. Currently the following subset of
  sed commands is supported:
    * replace (s) and character substitution (y).
* Syntax:
    * replace - s/regex/replacement/flags
        * regex is a perl regular expression (optionally containing capturing groups).
        * replacement is a string to replace the regex match. Use \n for backreferences,
          where "n" is a single digit.
        * flags can be either: g to replace all matches, or a number to replace a specified
          match.
    * substitute - y/string1/string2/
        * substitutes the string1[i] with string2[i]
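
Because SEDCMD rewrites _raw permanently, it can help to check what the two commands do before touching props.conf. The following is only a rough Python sketch of the semantics described in the excerpt, not Splunk's implementation: the function names and sample strings are made up for illustration, and Splunk's regex engine may differ from Python's in edge cases.

```python
import re

# Analogue of s/(‘|’)/'/g : replace every match of the regex.
# U+2018 and U+2019 are the curly single quotes from the question.
def sed_replace(text: str) -> str:
    return re.sub("(\u2018|\u2019)", "'", text)

# Analogue of y/string1/string2/ : string1[i] is mapped to string2[i].
# The curly double quotes (U+201C, U+201D) are added here as an assumption
# about what else one might want to convert.
Y_TABLE = str.maketrans("\u2018\u2019\u201c\u201d", "''\"\"")

def sed_translate(text: str) -> str:
    return text.translate(Y_TABLE)

if __name__ == "__main__":
    # Hypothetical event text, not taken from real data.
    print(sed_replace("Driver\u2019s Ed \u2018Basics\u2019"))
    print(sed_translate("\u201cIntro\u201d to Driver\u2019s Ed"))
```

The y form is convenient when several characters each map to a single ASCII character, while the s form is needed when the replacement is longer than one character.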