Hi! Hope everyone is well, and thanks in advance for any help.
I'm having problems ingesting Linux audit logs. For some reason, a weird field delimiter is not being interpreted correctly by Splunk. I'm pasting the examples.
How can I get rid of this and get fields "data" and "UID" correctly separated?
I would take a look at the CHARSET setting in "props.conf" on the instance where your input is configured.
CHARSET = <string>
* When set, Splunk software assumes the input from the given [<spec>] is in the specified encoding.
* Can only be used as the basis of [<sourcetype>] or [source::<spec>], not [host::<spec>].
* A list of valid encodings can be retrieved using the command "iconv -l" on most *nix systems.
* If an invalid encoding is specified, a warning is logged during initial configuration and further input from that [<spec>] is discarded.
* If the source encoding is valid, but some characters from the [<spec>] are not valid in the specified encoding, then the characters are escaped as hex (for example, "\xF3").
* When set to "AUTO", Splunk software attempts to automatically determine the character encoding and convert text from that encoding to UTF-8.
* For a complete list of the character sets Splunk software automatically detects, see the online documentation.
* This setting applies at input time, when data is first read by Splunk software, such as on a forwarder that has configured inputs acquiring the data.
* Default (on Windows machines): AUTO
* Default (otherwise): UTF-8
https://docs.splunk.com/Documentation/Splunk/latest/Admin/propsconf
https://docs.splunk.com/Documentation/Splunk/latest/Data/Configurecharactersetencoding
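As a rough sketch of what that could look like, here is a props.conf stanza that sets CHARSET for one sourcetype. The stanza name "my_sourcetype" is a placeholder, and AUTO is only a starting point; if you know the real encoding of the file, use that name instead (it should be one of the names listed by "iconv -l"). Since the setting applies at input time, it would go on the forwarder that reads the file.

```
# props.conf on the instance that reads the file (for example, the forwarder)
# "my_sourcetype" is a placeholder; use the sourcetype assigned to your audit log input
[my_sourcetype]
# Let Splunk try to detect the encoding and convert it to UTF-8
CHARSET = AUTO
```

A restart of that instance is normally needed before the change is picked up, and it only affects data read after the change.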
If you have access to the raw log, you can try pasting it into regex101 and building your own regex to replace the character with a space. REDACT any sensitive data before you paste it into regex101.
Example props.conf on whichever instance parses the data (indexer or heavy forwarder):
[my_sourcetype]
SEDCMD-removeWeirdCharacter = s/<square_character_here>/ /
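If the square turns out to be a non-printable control character, pasting it literally into props.conf can be awkward. A variant of the example above, written with a hex escape, might look like the sketch below. It assumes the character is the ASCII group separator (0x1D), which auditd can write between the raw record and the resolved fields (such as UID) when "log_format = ENRICHED" is set in auditd.conf. That is a guess about this particular log, not something confirmed here, so check the actual byte first (for example with "hexdump -C" on a sample line) and adjust the escape to match.

```
# props.conf on the instance that parses the data
# "my_sourcetype" is a placeholder; \x1D is an ASSUMED byte value, verify it against your raw log
[my_sourcetype]
# Replace every occurrence of the control character with a space before indexing
SEDCMD-removeWeirdCharacter = s/\x1D/ /g
```

The trailing "g" makes the substitution apply to every match in the event rather than only the first. SEDCMD changes only newly indexed data; events already indexed keep the original character.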
Excellent advice! Works!
What is the "weird" character? What settings do you have already configured?
Thanks for answering! I've attached an image of how I'm seeing the character. Did you see it?