I have JSON data prefixed by a syslog header that I would like to index using INDEXED_EXTRACTIONS=json. Here's an example of the data:
May 13 10:26:42 ip-10-11-12-13 myapp-17: {"headers":{"Accept":"*\/*","Accept-Language":"en-gb,en;q=0.5","User-Agent":"Mozilla\/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko\/20100101 Firefox\/29.0"},"date":1399976802,"node":"ip-10-11-12-13","source":"myapp-17","client_ip":"17.18.19.20"}
I need to strip off the prefix that syslog added to the beginning of each line (everything before the first "{" character) and then process the event as JSON:
{
  "client_ip": "17.18.19.20",
  "date": 1399976802,
  "headers": {
    "Accept": "*/*",
    "Accept-Language": "en-gb,en;q=0.5",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0"
  },
  "node": "ip-10-11-12-13",
  "source": "myapp-17"
}
I have tried the following methods:
LINE_BREAKER=((:?^|\n).+?){
SHOULD_LINEMERGE=false
SEDCMD-StripHeader=s/^[^{]+//
;transforms.conf
[StripSyslog]
REGEX = ^[^{]+(.*)$
FORMAT = $1
DEST_KEY = _raw
;props.conf
TRANSFORMS-StripSyslog = StripSyslog
All of these methods work with KV_MODE=json, but none of them work with INDEXED_EXTRACTIONS=json.
What I don't like about KV_MODE=json is that my events lose their hierarchical nature, so the keys in the headers.* collection are mixed in with the other keys. For example, with INDEXED_EXTRACTIONS=json I can search for "headers.User-Agent"="Mozilla/*". More importantly, I can group these headers.* keys to determine their relative frequency, which is not possible with KV_MODE=json since the keys are flattened.
In the splunkd.log file I see this error:
07-15-2014 12:33:16.384 -0400 ERROR JsonLineBreaker - JSON StreamID: 0 having confkey=source::/tmp/myfile.gz|host::17-18-19-20|JsonSyslog|3 had parsing error: Unexpected character while looking for value: 'M'
This tells me that the JsonLineBreaker is probably trying to parse the line before applying any of the aforementioned transformations (the "M" is from "May 13 10:26:42...").
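To illustrate the failure outside Splunk, here is a minimal Python sketch. Python's json module is only a stand-in for the JsonLineBreaker (whose internals I have not seen), and parse_strict is a hypothetical helper name. It shows a strict JSON parser rejecting the line at its very first character, and accepting it once everything before the first "{" is dropped:

```python
import json

# Shortened version of the sample event, with the syslog prefix intact.
line = 'May 13 10:26:42 ip-10-11-12-13 myapp-17: {"date":1399976802,"node":"ip-10-11-12-13"}'


def parse_strict(text):
    """Return (parsed, error_position); error_position is None on success."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc.pos


# The parser gives up at offset 0, which is the "M" of "May".
parsed, pos = parse_strict(line)
print(parsed, pos, line[pos])

# Dropping everything before the first "{" leaves a valid JSON document.
stripped = line[line.index("{"):]
parsed_stripped, pos_stripped = parse_strict(stripped)
print(parsed_stripped["node"], pos_stripped)
```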
Is there any way to apply a transformation before the JsonLineBreaker kicks in, or perhaps to extend that class in order to strip the leader out?
I am looking for a definitive answer here as the obvious workarounds (scripted input, change my data format, "sed -i" the file before input) are not great long-term.
Unfortunately, Splunk has no solution for your case.
INDEXED_EXTRACTIONS is applied when the file is read and the events are parsed, which happens before transforms.conf is applied.
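Because the extraction runs before any transform, the prefix has to be gone before Splunk reads the data. That amounts to the pre-processing workaround already mentioned in the question, not a Splunk feature. A minimal sketch in Python, assuming one event per line and that the first "{" always starts the JSON payload (the function names are hypothetical):

```python
import json


def strip_syslog_prefix(line):
    """Return the JSON part of a syslog-prefixed line, or None if there is none."""
    start = line.find("{")
    if start == -1:
        return None
    payload = line[start:].strip()
    try:
        json.loads(payload)  # validate only; the original text is kept as-is
    except json.JSONDecodeError:
        return None
    return payload


def filter_lines(lines):
    """Yield the cleaned JSON payload of every line that contains one."""
    for line in lines:
        payload = strip_syslog_prefix(line)
        if payload is not None:
            yield payload


sample = [
    'May 13 10:26:42 ip-10-11-12-13 myapp-17: {"node":"ip-10-11-12-13","source":"myapp-17"}\n',
    "May 13 10:26:43 ip-10-11-12-13 myapp-17: not a JSON event\n",
]
cleaned = list(filter_lines(sample))
print(cleaned)
```

Fed with the lines of the original file, the output could be written to a separate file that is then monitored with INDEXED_EXTRACTIONS=json. Lines without a valid JSON object are dropped silently here, which may or may not be what you want.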
Hi kamermans - Did you have any luck with this? I am having a similar issue.