I have a dataset going into Splunk where each event is a timestamp followed by a list of key/value pairs, with the values in quotes, like so:
2010-01-01 00:00 key="value" key2="value2" key3="value3"
Some of the values, however, may contain the " character. Is there any way to escape these so that the entire field value is extracted by Splunk? For example, I'd like Splunk to find only one field - text - in the following input, and not two fields - text and status:
2010-01-01 00:00 text="This text contains status="200" and it confuses Splunk"
Is this a log format that you control? In other words, are you asking about best practices for writing out log messages in a format that Splunk will handle natively, or is this just an example of what you have to deal with because somebody else is writing it out?
I'm not sure you can escape the quote, but I know that sometimes Splunk handles this format better:
2010-01-01 00:00 key="value", key2="value2", key3="value3"
If you have a comma between your fields like this, then you may be able to use Splunk's delimited field extractions. (I'm borrowing this from Splunk's built-in stash sourcetype, which is used for summary indexing events that are automatically formatted to look like the key/value message shown above.) The key to this approach is the DELIMS = ",", "=" entry.
Sample props.conf:
[my_source_type]
KV_MODE = none
REPORT-my_fields = kv_comma_sep
Sample transforms.conf:
[kv_comma_sep]
DELIMS = ",", "="
CAN_OPTIMIZE = false
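To sanity-check what a DELIMS-style extraction would produce on the comma-separated line, here's a rough Python sketch. parse_delims is a hypothetical helper that only mimics the splitting behavior; it is not how Splunk implements it:

```python
import re

def parse_delims(event, pair_sep=",", kv_sep="="):
    """Mimic a DELIMS-style extraction: split the event body on
    pair_sep, then split each chunk once on kv_sep."""
    # Drop the leading timestamp (assumed "YYYY-MM-DD HH:MM" here).
    body = re.sub(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}\s*", "", event)
    fields = {}
    for chunk in body.split(pair_sep):
        if kv_sep in chunk:
            key, value = chunk.split(kv_sep, 1)
            fields[key.strip()] = value.strip().strip('"')
    return fields

print(parse_delims('2010-01-01 00:00 key="value", key2="value2", key3="value3"'))
```

Note that this also shows the limitation: a value containing a comma or an embedded key=value pair would still be split incorrectly, which is why the delimiter has to be a character that never occurs in the data.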
Okay, if you have control over the output format, and you have relatively arbitrary field values (e.g., they might actually contain things like name=word in the middle of a field value), I would switch to a multi-line input format and set up a unique delimiter between events. For example, your script would output:
2010-06-10 12:34:56.789
field1=value value value name=something and stuff
fieldnameX=blah asdfasdf something else something something "this" name="this"
fieldthree=5
----%%%----
2010-06-10 12:34:56.890
myfield=value
another=ggggg
----%%%----
etc. And your props for that would be:
SHOULD_LINEMERGE = false
# that's right, *false*
LINE_BREAKER = ([\r\n]*----%%%----[\r\n]*)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})
REPORT-x = y
KV_MODE = none
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
transforms:
[y]
REGEX = (\w+)=([^\r\n]*)
FORMAT = $1::$2
MV_ADD = true
Of course, this only works if your values don't contain newlines or carriage returns. In general, this is just a version of choosing a delimiter that doesn't occur in the data - in this case, a newline. If you have to, you can use a character sequence between fields instead, provided it doesn't occur in the values, and modify the field extraction REGEX accordingly - for example, something like (?s)(\w+)=([\S\s]+?)(?=\n\+\+\+(?:\n|$)) if you divide fields using +++ on a line by itself. In that case you'll need one delimiter sequence between events and a different one between key/value pairs.
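A quick way to convince yourself that the line-anchored extraction behaves as intended (an embedded name=... in the middle of a value does not start a new field) is to simulate it. This Python sketch mimics the LINE_BREAKER split and the per-line REGEX; it is not Splunk itself:

```python
import re

RAW = """2010-06-10 12:34:56.789
field1=value value value name=something and stuff
fieldnameX=blah asdfasdf something else something something "this" name="this"
fieldthree=5
----%%%----
2010-06-10 12:34:56.890
myfield=value
another=ggggg
----%%%----
"""

# Event boundary, analogous to the LINE_BREAKER above.
EVENT_BREAK = re.compile(r"[\r\n]*----%%%----[\r\n]*")
# Per-line key/value extraction; ^ anchors each key to a line start,
# so name=... inside a value can never open a new field.
KV_LINE = re.compile(r"^(\w+)=(.*)$", re.MULTILINE)

def parse_events(raw):
    events = []
    for block in EVENT_BREAK.split(raw):
        if not block.strip():
            continue
        lines = block.splitlines()
        event = {"_time": lines[0]}  # first line is the timestamp
        for m in KV_LINE.finditer("\n".join(lines[1:])):
            event[m.group(1)] = m.group(2)
        events.append(event)
    return events

evts = parse_events(RAW)
print(evts[0]["field1"])
```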
The [] and "" characters are causing problems here...
11.111.11.11 - - [26/Oct/2013:17:04:56 -0700] "POST /abc/abcd/xx HTTP/1.1" 200 885
How can we extract fields from a line like the one above?
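For a line like this, a regex-based extraction tends to work better than delimiters, since the brackets and quotes mark the field boundaries themselves. A Python sketch of one possible pattern - the field names (clientip, status, etc.) are my own choices, and the regex is fitted to this one sample line, not a general Apache access-log parser:

```python
import re

LINE = '11.111.11.11 - - [26/Oct/2013:17:04:56 -0700] "POST /abc/abcd/xx HTTP/1.1" 200 885'

# Bracketed timestamp and quoted request line are matched as whole
# units, so the spaces inside them don't split the fields.
PATTERN = re.compile(
    r'^(?P<clientip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d+) (?P<bytes>\d+)$'
)

m = PATTERN.match(LINE)
print(m.group("clientip"), m.group("status"), m.group("uri"))
```

The same pattern could be used in a transforms.conf REGEX with FORMAT mapping the groups to field names, in the style of the [y] stanza above.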
Hmm. I've updated my answer and added some sample config entries. Basically, we are disabling Splunk's default key="value" extraction and forcing it to use a delimiter-based extraction that takes the commas into consideration. I think this will work better for you.
Yes, I do have control over the log format, in that it is a scripted input. Sadly, adding a comma between fields as per your suggestion did not alleviate the problem.
While I could in theory replace all " characters in the dataset with “ or similar, that could lead to other problems down the line when copy/pasting search results.
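Since the input is scripted, one way to sidestep quoting entirely (in the spirit of the multi-line answer above) is to have the script emit one key per line with a unique event delimiter, so that newlines become the only reserved characters and embedded quotes never need escaping. A rough sketch - write_event and the delimiter choice are illustrative assumptions:

```python
import sys

EVENT_DELIM = "----%%%----"  # must never occur inside a value

def write_event(timestamp, fields, out=sys.stdout):
    """Emit one event in the one-key-per-line format: timestamp line,
    then key=value lines, then the event delimiter."""
    out.write(timestamp + "\n")
    for key, value in fields.items():
        out.write(f"{key}={value}\n")
    out.write(EVENT_DELIM + "\n")

write_event("2010-01-01 00:00:00.000",
            {"text": 'This text contains status="200" and it confuses Splunk'})
```

With the LINE_BREAKER and line-anchored REGEX from the answer above, the quotes in the value pass through untouched, so nothing needs to be rewritten to “ or similar.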