Exactly what characters are allowed in field values?

New Member

We have in-house web apps which log stuff, and are considering moving to Splunk for analysis. This would entail adopting a new log format, which is easy - we can write it out however Splunk wants. We understand this is the canonical format...

 timestamp key1=value1 key2="value two" key3=value3

The problem is that sometimes we need to log a LOT of stuff in the 'value' part. One example is an exception, where we would want to store a fairly large Python traceback (newlines and all), yet still have the value be findable/searchable/readable in reports. Another is logging POST params from a web form; the values might be multiline text, Unicode characters, or whatever.

Does Splunk support a standard system for quoting or encoding multiline text and 'problem' characters in the "value" part of the format? I was expecting to find some well documented system like base64 or URL-encoding supported, but have been unable to find any docs on this.


Splunk Employee

If you are going to be logging potentially multiline values, then I would suggest that you use a different format for those events. You will have to define marker strings for two purposes: one to separate events from each other, and one to separate long values from the rest of the event. For example, I would define a source type as follows:

LINE_BREAKER = ([\r\n]+-+-+==breaker==+---[\r\n]+)
EXTRACT-longkv = (?ms)\v+--kvbegin--:(?<_KEY_1>\w+)\v+(?<_VAL_1>[\V\v]+?(?=\v+---kvend---(?:\v|$)))
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
KV_MODE = auto
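
To get a feel for what these two patterns are meant to match, here is a rough approximation in Python. It is only a sketch, not what Splunk runs: Python's re module has no \v or \V classes and spells named groups differently, so the patterns below are hand translations that treat "vertical whitespace" as just \r and \n. The raw text, the field name traceback, and the group names key and val are made up for illustration.

```python
import re

# Hand translations of the props.conf patterns above (assumption: \v is
# approximated by [\r\n], and [\V\v] by [\s\S]). Not Splunk's own regexes.
LINE_BREAKER = re.compile(r"([\r\n]+-+-+==breaker==+---[\r\n]+)")
LONG_KV = re.compile(
    r"[\r\n]+--kvbegin--:(?P<key>\w+)[\r\n]+"
    r"(?P<val>[\s\S]+?(?=[\r\n]+---kvend---(?:[\r\n]|$)))"
)

# Hypothetical raw data: two events, the first carrying one long field.
raw = (
    "2010-12-17T12:34:56.789 abc=123 xyz=blah\n"
    "--kvbegin--:traceback\n"
    "Traceback (most recent call last):\n"
    '  File "app.py", line 1\n'
    "---kvend---\n"
    "-----==breaker==---\n"
    "2010-12-17T12:22:33.444 fieldname1=somenewvaluesagain\n"
)

# re.split keeps the captured breaker text, so take every other item
# to get just the events.
events = LINE_BREAKER.split(raw)[::2]

# Collect the long key/value pairs found in the first event.
long_fields = {m["key"]: m["val"] for m in LONG_KV.finditer(events[0])}
```

Here the breaker line is dropped when the text is split into events, while the kvbegin/kvend lines stay inside the first event and are what the second pattern keys on.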

Then log entries should be output as:

2010-12-17T12:34:56.789 abc=123 xyz=blah
long field value with other stuff and btw, splunk will handle UTF-8 just fine by default
though you might want to set 
the CHARSET property for a source or
well, here's another value
2010-12-17T12:22:33.444 fieldname1=somenewvaluesagain
2010-12-17T13:12:11.000 something

A line matching the LINE_BREAKER pattern (for example -----==breaker==---) should be output between events, and the kvbegin and kvend markers delimit each long KV pair. Short pairs will still be autoextracted. Note that the breakers between events will be removed by Splunk, but the markers around KV pairs will not (and need to be left in, since the extraction depends on them). The marker strings can of course be changed to anything you like or can stand, as long as the regexes are changed to match.
