Getting Data In

Loading syslog-prefixed JSON

kamermans
Path Finder

I've got a data source being produced by rsyslog which is in this format:

Jun 19 10:28:25 hostname appname: {"date":12345678,"foo":"bar"}

There is always one event per line. I would like to parse this as JSON, discarding the stuff that syslog added to the beginning of the line (in this case "Jun 19 10:28:25 hostname appname:").
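For reference, the desired end result can be sketched outside Splunk in a few lines of Python. This is only an offline illustration of the transformation being asked for, not how Splunk processes the data; the helper name `strip_syslog_prefix` is hypothetical.

```python
import json


def strip_syslog_prefix(line: str) -> str:
    """Return the part of the line that starts at the first '{'."""
    start = line.find("{")
    if start == -1:
        raise ValueError("no JSON object found in line")
    return line[start:]


# Sample line taken from the question above.
line = 'Jun 19 10:28:25 hostname appname: {"date":12345678,"foo":"bar"}'

# Once the syslog prefix is gone, the remainder parses as ordinary JSON.
event = json.loads(strip_syslog_prefix(line))
print(event)  # {'date': 12345678, 'foo': 'bar'}
```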

I have tried using LINE_BREAKER to consume and discard this line prefix like this:

LINE_BREAKER=}(\n[^{]+?)

and I've tried using sed:

SEDCMD-stripjsonheader = s/^[^{]+?//g

Neither has worked. From the log file, it seems that the JsonLineBreaker is not using the LINE_BREAKER setting, and the SEDCMD is applied too late:

06-19-2014 14:59:19.736 -0400 ERROR JsonLineBreaker - JSON StreamID: 0 having confkey=source::/file|host::app|SyslogJson|2 had parsing error: Unexpected character while looking for value: 'J'

Is there any way for me to remove this line prefix before parsing?

Thanks!

1 Solution

s2_splunk
Splunk Employee

Try this:

SEDCMD-StripHeader = s/^.*(\{.*$)/\1/

This should replace the whole line with everything from the opening '{' through the end of the line, which drops your prefix.
Note the use of a capture group for the stuff you want to keep.


s2_splunk
Splunk Employee

I have just tested it successfully with this props.conf entry:

[answers_json]
SEDCMD-StripHeader = s/^.*(\{.*$)/\1/
KV_MODE=json
pulldown_type=1


jbrodsky_splunk
Splunk Employee

Thank you Stefan and Kamermans - I had a customer running into this same issue today and this Answers post allowed me to avoid a ton of testing.

0 Karma

kamermans
Path Finder

Btw, your regex still fails in my environment due to the greedy ".*" issue: I have nested records, and without ".*?" it starts the record at the last "{" instead of the first one. For the sake of posterity, here is the most efficient regex I've found for this problem: "s/^[^{]+//". Note that I was able to get things working, but it seems INDEXED_EXTRACTIONS=json will not work with a custom SEDCMD or LINE_BREAKER. I will post this as another question.

s2_splunk
Splunk Employee

You are on the right track. There are multiple ways of getting data in and the UI is OK if you want to index files from the same server. If the data comes from a different box, you'll want to use a universal forwarder to watch a file/directory and forward to your indexer. Depends on your architecture.
I'd recommend watching this for starters: http://www.splunk.com/view/education-videos/SP-CAAAGB6
and reading through this http://docs.splunk.com/Documentation/Splunk/latest/Data/WhatSplunkcanmonitor for more details on the various options.

0 Karma

kamermans
Path Finder

Wow, thanks for verifying! I must be doing something noob-ish as I am just getting started with Splunk. From the GUI I'm going to "Add Data" -> "From files and dirs...", then I choose a file and Preview the data, which brings up a dialog to specify the format, where I choose JSON and drop in the edits to props.conf. How should I be loading the data into Splunk (using the file monitor method)?

I'm using Splunk Enterprise 6.1 on my own server(s).

s2_splunk
Splunk Employee

And it works just as well with your SEDCMD/RegEx

0 Karma

suarezry
Builder

Just FYI: like kamermans, I found this match to be greedy; it prefers to go to the last '{'. I used his suggested "SEDCMD-stripjsonheader = s/^[^{]+//" and it worked better for me.

0 Karma

reswob4
Builder

Thanks for this discussion. Helped big time solve my problem with the same issue.

0 Karma

kamermans
Path Finder

Please note that the greediness did need to be removed by using ".*?". Also, this technique works for KV_MODE=json but not INDEXED_EXTRACTIONS=json, which I need to use. I've opened a new question for that: http://answers.splunk.com/answers/145388/indexed_extractionsjson-with-transform

0 Karma

s2_splunk
Splunk Employee

See my answer below. What version of Splunk are you on?

0 Karma

kamermans
Path Finder

Ah, very good point on the greedy thing, I do indeed need it to be greedy - thanks 🙂

Unfortunately, I'm getting the same error "had parsing error: Unexpected character while looking for value: 'J'" using your suggestion verbatim, which is strange because the 'J' should be gone in any case. I restarted splunkd after the change and attempted a new file load with that props.conf. Notice that although the data preview failed, I still tried to continue just in case the preview is different than the actual parsing.

0 Karma

s2_splunk
Splunk Employee

I beg to respectfully differ. 🙂
I believe your RegEx only matches 'J', because you used a lazy match (?).
So, if you like your RegEx better, try it without the '?':

SEDCMD-stripjsonheader = s/^[^{]+//

Note that I also don't think you need the global flag, so I removed it.
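The lazy-versus-greedy point can be checked outside Splunk. The sketch below uses Python's `re` module, on the assumption that it treats these two simple patterns the same way as the regex engine behind SEDCMD; it is an illustration, not a test of Splunk itself.

```python
import re

# Sample line taken from the question.
line = 'Jun 19 10:28:25 hostname appname: {"date":12345678,"foo":"bar"}'

# Lazy quantifier with nothing after it: the match stops after the
# minimum of one character, so only the leading 'J' is removed.
lazy = re.sub(r'^[^{]+?', '', line)

# Greedy quantifier: consumes every character up to the first '{'.
greedy = re.sub(r'^[^{]+', '', line)

print(lazy)    # un 19 10:28:25 hostname appname: {"date":12345678,"foo":"bar"}
print(greedy)  # {"date":12345678,"foo":"bar"}
```

Because the pattern is anchored with `^`, it can match at most once per line, which is consistent with the global flag being unnecessary here.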

0 Karma

kamermans
Path Finder

Thanks for the quick response! Unfortunately my regex was correct - it replaces the prefix with an empty string, which is faster than capturing the relevant part and replacing the entire string with that capture group. In addition, your first .* is "greedy", so it will prefer to go to the last "{" instead of the first one. Anyway, I have tested my regex using the "unstructured" data type and I can see that it trims properly, it's just that the JSON line parser seems to be parsing the data before the SED replacement occurs or something. Maybe there is a TRANSFORM required or something?
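The greedy ".*" behaviour with nested records can be reproduced in a short Python sketch. The nested sample line is hypothetical (the original post does not show one), and Python's `re` module is assumed to behave like Splunk's regex engine for these patterns.

```python
import json
import re

# Hypothetical nested record; the question only shows a flat one.
line = 'Jun 19 10:28:25 hostname appname: {"date":12345678,"meta":{"foo":"bar"}}'


def parses(text: str) -> bool:
    """Return True if text is valid JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


# Greedy '.*' backtracks only as far as needed, so the capture
# group starts at the LAST '{' on the line.
greedy_capture = re.sub(r'^.*(\{.*$)', r'\1', line)

# Lazy '.*?' stops at the FIRST '{'.
lazy_capture = re.sub(r'^.*?(\{.*$)', r'\1', line)

# Negated class: deletes the prefix without needing a capture group.
negated_class = re.sub(r'^[^{]+', '', line)

print(greedy_capture, parses(greedy_capture))  # {"foo":"bar"}} False
print(lazy_capture, parses(lazy_capture))      # {"date":12345678,"meta":{"foo":"bar"}} True
print(negated_class == lazy_capture)           # True
```

On a flat record with a single "{", all three patterns give the same result, which may explain why the capture-group version tested fine on the sample line.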
