Hello!
I have some json data being generated by a client-side tool:
{
"name": "open_sockets",
"hostIdentifier": "ip-172-30-1-242.ec2.internal",
"calendarTime": "Tue May 24 10:37:31 2016 UTC",
"unixTime": "1464086251",
"columns": {
"family": "2",
"fd": "6",
"local_address": "172.30.1.242",
"local_port": "32886",
"path": "",
"pid": "547",
"protocol": "17",
"remote_address": "4.53.160.75",
"remote_port": "123",
"socket": "52263"
},
"action": "added"
}
When this data is dropped into a flat file on the client and then picked up by the Splunk Universal Forwarder, the field extractions using the _json sourcetype work perfectly. I've since reconfigured the tool to push the data into Amazon S3 via Firehose, and the field extractions no longer work using the _json sourcetype.
The data is unchanged. I've examined the raw logs in the S3 management console and they are the same structure as the previously indexed flat file with no additional data or formatting as far as I can tell.
I've tried a variety of regexes in BREAK_ONLY_BEFORE, BREAK_ONLY_BEFORE_DATE, and MUST_BREAK_AFTER, with no effect.
I currently have two near-identical clients forwarding this information: one using the Splunk UF and one using AWS Firehose, both with the _json sourcetype. The first works fine; the second does not!
I am editing sourcetypes using the GUI; we are imminently moving to Splunk Cloud, and I am training myself to cope with no shell access!
Thanks
Solved it, with a little help from Splunk PS:
[osq]
LINE_BREAKER=(){\"name
And that works.
() is an empty capture group that consumes nothing. Splunk discards whatever the first capture group in LINE_BREAKER matches, so putting {"name inside the group would strip that string from every event.
Apparently I ran into this issue specifically because my prod Splunk infrastructure is running 6.4.0 and my lower environment is on 6.5.
On 6.5, this alone worked perfectly:
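To see why the empty group matters, here is a rough Python emulation of the documented LINE_BREAKER behaviour (text before the first capture group ends one event, the group's match is thrown away, text after it starts the next event). The function name break_events and the sample data are made up for illustration, Python's re module stands in for Splunk's PCRE engine, and the { is escaped here to keep the pattern unambiguous in Python. Treat it as a sketch of the idea, not as how Splunk is implemented.

```python
import re

def break_events(raw, line_breaker):
    """Rough emulation of LINE_BREAKER: split at each match and drop
    the text matched by capture group 1 (hypothetical helper)."""
    events = []
    start = 0
    for match in re.finditer(line_breaker, raw):
        end = match.start(1)          # event ends where group 1 begins
        if end > start:
            events.append(raw[start:end])
        start = match.end(1)          # next event begins after group 1
    if start < len(raw):
        events.append(raw[start:])
    return events

# Two JSON objects on one line, as in the Firehose/S3 case.
raw = ('{"name":"open_sockets","action":"added"}'
       '{"name":"open_sockets","action":"removed"}')

# Empty group: nothing is discarded, each event keeps its {"name prefix.
print(break_events(raw, r'()\{"name'))

# Non-empty group: the {"name text is consumed and lost from every event.
print(break_events(raw, r'(\{"name)'))
```

The first call yields two intact JSON objects; the second yields fragments with the leading {"name missing, which is the failure the empty group avoids.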
[mySourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = none
For 6.4 I had to follow what Gary has recommended. Many thanks to him for sharing his experience.
Here is my props.conf. If you are a beginner, note that the indexer is where you want to put these settings, because event breaking is a parsing step.
[mySourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = none
LINE_BREAKER = (){\"searchString
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
JSON should linemerge, and I know you said you tried the _json sourcetype, but this is a copy of it I'd like you to try instead:
[osq2]
CHARSET=AUTO
INDEXED_EXTRACTIONS=json
KV_MODE=none
SHOULD_LINEMERGE=true
category=Structured
disabled=false
pulldown_type=true
Didn't work sadly...
I have discovered a difference between the two sources:
In the flat file on disk, each JSON object begins on its own line.
In the AWS S3 source, all JSON events occur on the same line.
So I need a way to break events within a single line, where each JSON object begins with {"name":
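For anyone wanting to confirm the same difference outside Splunk, a small Python sketch along these lines can count the objects in a downloaded S3 object regardless of whether they are newline-separated or run together. The helper name split_concatenated_json and the sample strings are hypothetical; only the standard library json.JSONDecoder.raw_decode call is used.

```python
import json

def split_concatenated_json(raw):
    """Decode back-to-back JSON objects, with or without newlines
    between them (hypothetical helper for offline inspection)."""
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(raw):
        # Skip any whitespace or newlines between objects.
        while pos < len(raw) and raw[pos].isspace():
            pos += 1
        if pos >= len(raw):
            break
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(raw, pos)
        objects.append(obj)
    return objects

flat_file = '{"name":"open_sockets","action":"added"}\n{"name":"open_sockets","action":"removed"}\n'
s3_object = '{"name":"open_sockets","action":"added"}{"name":"open_sockets","action":"removed"}'

print(len(split_concatenated_json(flat_file)))
print(len(split_concatenated_json(s3_object)))
print("\n" in s3_object)
```

Both inputs decode to the same objects; the only difference is the missing newline delimiter, which is what the default line breaking relies on.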
I thought my regex would have done this, but clearly not.
Thanks.
Have you tried this regex instead?
'{"name"'
surrounded by single quotes...?
or even '\{"name"'
again surrounded by single quotes but escaping the {
Thanks, tried both but still not breaking.
Am I right in thinking the SHOULD_LINEMERGE directive could be causing Splunk to assume that the entire block of data is a single event? In that case, shouldn't a matching regex in BREAK_ONLY_BEFORE override that and define the individual events?
Oh, sorry, you just hit the nail on the head.
Your thinking is correct, but let's try removing BREAK_ONLY_BEFORE, setting SHOULD_LINEMERGE to false, and using our regex as LINE_BREAKER instead.
SHOULD_LINEMERGE = true and the BREAK_ONLY_* style settings are mainly for TCP/UDP inputs. LINE_BREAKER is for when you don't have the standard carriage returns / line feeds. We might still have issues with indexed extractions and may need to use KV_MODE instead... Let's see. Sorry for the bad syntax, I'm replying from a phone.
Thanks. Attempting to force SHOULD_LINEMERGE=false via the GUI keeps defaulting back to "true" and adding a BREAK_ONLY_BEFORE directive, which is annoying. I have no console access at present to edit props.conf, so I will do this tomorrow back in the office and let you know.
So, currently running with the below in system/local/props.conf
[osq]
NO_BINARY_CHECK = true
disabled = false
KV_MODE = none
SHOULD_LINEMERGE = false
LINE_BREAKER = {\"name/g
And still no breaking... Regex validated using http://www.regextester.com/.
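One possible reason an online tester can pass a pattern that then fails in props.conf: the trailing /g is JavaScript-style flag syntax, and as far as I know Splunk takes a bare regex with no delimiters or flags, so /g would be matched as literal characters. The sketch below uses Python's re module as a stand-in for Splunk's engine (an assumption), with made-up sample data, just to show the effect of treating /g literally.

```python
import re

# Two JSON objects on a single line, similar to the S3 data.
raw = '{"name":"open_sockets","action":"added"}{"name":"open_sockets","action":"removed"}'

# With /g taken literally, the pattern needs the text {"name/g to appear.
matches_with_flag = re.findall(r'\{"name/g', raw)

# Without the suffix, the pattern finds the start of each object.
matches_without_flag = re.findall(r'\{"name', raw)

print(len(matches_with_flag), len(matches_without_flag))
```

Even with the suffix removed, LINE_BREAKER still expects a capture group marking the text to discard between events.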
Another update...
If I copy the data from an event (which contains multiple JSON objects on one line) into a flat file local to my laptop, then upload that file manually into Splunk using the _json sourcetype... event breaking works!
As an update, I have created another sourcetype with the below in the Splunk_TA_aws app:
[osq2]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
NO_BINARY_CHECK = true
TRUNCATE = 0
category = Structured
pulldown_type = 1
BREAK_ONLY_BEFORE = (\{\"name\")/g
disabled = false
Still not getting event breaking. Suggestions welcomed!