Solved: Loading syslog-prefixed JSON

kamermans · ‎06-19-2014

I've got a data source being produced by rsyslog which is in this format:

Jun 19 10:28:25 hostname appname: {"date":12345678,"foo":"bar"}

There is always one event per line. I would like to parse this as JSON, discarding the stuff that syslog added to the beginning of the line (in this case "Jun 19 10:28:25 hostname appname:").

I have tried using LINE_BREAKER to consume and discard this line prefix like this:

LINE_BREAKER=}(\n[^{]+?)

and I've tried using sed:

SEDCMD-stripjsonheader = s/^[^{]+?//g

Neither has worked. Judging by the log file, the JsonLineBreaker is not using the LINE_BREAKER setting, and the SEDCMD is being applied too late:

06-19-2014 14:59:19.736 -0400 ERROR JsonLineBreaker - JSON StreamID: 0 having confkey=source::/file|host::app|SyslogJson|2 had parsing error: Unexpected character while looking for value: 'J'

Is there any way for me to remove this line prefix before parsing?

Thanks!

s2_splunk · ‎06-19-2014

Try this:

SEDCMD-StripHeader = s/^.*(\{.*$)/\1/

This should remove your prefix and keep everything from the opening '{' through the end of the line.
Note the capture group around the part you want to keep.
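
To see what this substitution does outside Splunk, here is a minimal Python sketch. It assumes Python's re module treats this pattern the same way Splunk's regex engine does, and it only uses the sample line from the question; it says nothing about when Splunk applies SEDCMD in its pipeline.

```python
import json
import re

# Sample line from the question (single, non-nested JSON object).
line = 'Jun 19 10:28:25 hostname appname: {"date":12345678,"foo":"bar"}'

# Python equivalent of the sed expression s/^.*(\{.*$)/\1/
# The capture group holds everything from a '{' to the end of the line.
stripped = re.sub(r'^.*(\{.*$)', r'\1', line)

print(stripped)

# The remainder should now parse as JSON.
event = json.loads(stripped)
print(event["foo"])
```

With only one '{' in the line, the capture group starts at that brace, so the syslog prefix is dropped and the rest parses cleanly.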

s2_splunk · ‎06-19-2014

I have just tested it successfully with this props.conf entry:

[answers_json]
SEDCMD-StripHeader = s/^.*(\{.*$)/\1/
KV_MODE=json
pulldown_type=1

jbrodsky_splunk · ‎03-18-2015

Thank you Stefan and Kamermans - I had a customer running into this same issue today and this Answers post allowed me to avoid a ton of testing.

kamermans · ‎07-15-2014

Btw, your regex still fails in my environment due to the greedy ".*" issue: I have nested records, and without ".*?" it starts the record at the last "{" instead of the first one. For the sake of posterity, here is the most efficient regex I've found for this problem: "s/^[^{]+//". Note that I was able to get things working, but it seems INDEXED_EXTRACTIONS=json will not work with a custom SEDCMD or LINE_BREAKER. I will post this as another question.
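
The nested-record problem is easy to reproduce outside Splunk. The following is a small Python sketch, assuming Python's re module behaves like Splunk's regex engine for these patterns; the nested sample line is made up for illustration and is not from real data.

```python
import re

# Hypothetical nested record, modeled on the sample line in the question.
line = 'Jun 19 10:28:25 hostname appname: {"date":12345678,"meta":{"foo":"bar"}}'

# Greedy capture-group version: s/^.*(\{.*$)/\1/
# The leading .* swallows as much as it can, so the group starts at the LAST '{'.
greedy = re.sub(r'^.*(\{.*$)', r'\1', line)

# Lazy version: s/^.*?(\{.*$)/\1/ stops at the FIRST '{'.
lazy = re.sub(r'^.*?(\{.*$)', r'\1', line)

# Negated character class: s/^[^{]+// simply deletes everything before the first '{'.
negated = re.sub(r'^[^{]+', '', line)

print(greedy)
print(lazy)
print(negated)
```

The greedy form leaves only the inner object plus a stray closing brace, while the lazy and negated-class forms both keep the whole record. The negated class needs no capture group or backtracking, which is why it is the simpler choice here.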

s2_splunk · ‎06-19-2014

You are on the right track. There are multiple ways of getting data in and the UI is OK if you want to index files from the same server. If the data comes from a different box, you'll want to use a universal forwarder to watch a file/directory and forward to your indexer. Depends on your architecture.
I'd recommend watching this for starters: http://www.splunk.com/view/education-videos/SP-CAAAGB6
and reading through this http://docs.splunk.com/Documentation/Splunk/latest/Data/WhatSplunkcanmonitor for more details on the various options.

kamermans · ‎06-19-2014

Wow, thanks for verifying! I must be doing something noob-ish as I am just getting started with Splunk. From the GUI I'm going to "Add Data" -> "From files and dirs...", then I choose a file and Preview the data, which brings up a dialog to specify the format, where I choose JSON and drop in the edits to props.conf. How should I be loading the data into Splunk (using the file monitor method)?

I'm using Splunk Enterprise 6.1 on my own server(s).

s2_splunk · ‎06-19-2014

And it works just as well with your SEDCMD/RegEx

suarezry · ‎01-17-2017

Just FYI: like kamermans, I found this match to be greedy; it prefers to go to the last '{'. I used his suggested "SEDCMD-stripjsonheader = s/^[^{]+//" and it worked better for me.

reswob4 · ‎02-05-2018

Thanks for this discussion. Helped big time solve my problem with the same issue.

kamermans · ‎07-15-2014

Please note that the greediness did need to be removed by using ".*?". Also, this technique works for KV_MODE=json but not INDEXED_EXTRACTIONS=json, which I need to use. I've opened a new question for that: http://answers.splunk.com/answers/145388/indexed_extractionsjson-with-transform

s2_splunk · ‎06-19-2014

See my answer below. What version of Splunk are you on?

kamermans · ‎06-19-2014

Ah, very good point on the greedy thing, I do indeed need it to be greedy - thanks 🙂

Unfortunately, I'm getting the same error "had parsing error: Unexpected character while looking for value: 'J'" using your suggestion verbatim, which is strange because the 'J' should be gone in any case. I restarted splunkd after the change and attempted a new file load with that props.conf. Notice that although the data preview failed, I still tried to continue just in case the preview is different than the actual parsing.

s2_splunk · ‎06-19-2014

I beg to respectfully differ. 🙂
I believe your RegEx only matches 'J', because you used a lazy match (?).
So, if you like your RegEx better, try it without the '?':

SEDCMD-stripjsonheader = s/^[^{]+//

Note that I also don't think you need the global flag, so I removed it.
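
The lazy-versus-greedy difference for this pattern can be checked with a short Python sketch. It assumes Python's re module matches these patterns the same way Splunk's regex engine does, and it uses the sample line from the question.

```python
import re

# Sample line from the question.
line = 'Jun 19 10:28:25 hostname appname: {"date":12345678,"foo":"bar"}'

# Original attempt: s/^[^{]+?//g
# With nothing after the lazy +?, the shortest possible match is one character,
# and the ^ anchor prevents any further matches, so only the 'J' is removed.
lazy = re.sub(r'^[^{]+?', '', line)

# Without the '?': s/^[^{]+//
# The greedy + consumes everything up to (not including) the first '{'.
greedy = re.sub(r'^[^{]+', '', line)

print(lazy)
print(greedy)
```

Because the pattern is anchored with ^, it can match at most once per line, so the global flag makes no difference in either case.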

kamermans · ‎06-19-2014

Thanks for the quick response! Unfortunately my regex was correct - it replaces the prefix with an empty string, which is faster than capturing the relevant part and replacing the entire string with that capture group. In addition, your first .* is "greedy", so it will prefer to go to the last "{" instead of the first one. Anyway, I have tested my regex using the "unstructured" data type and I can see that it trims properly, it's just that the JSON line parser seems to be parsing the data before the SED replacement occurs or something. Maybe there is a TRANSFORM required or something?
