How to configure line breaking for my sample JSON event?

manderson7
Contributor

I've been through this thread: https://answers.splunk.com/answers/295142/line-breaker-in-single-line-printed-json-doc.html
without any success.

I have JSON data coming in as 1 event, and I need it broken up into the individual events seen on the page.

{"id":588699,"name":null,"status":{"id":2963,"name":"Handled"},"priority":{"id":2873,"name":"Urgent"},"queue":{"id":2144,"name":"Default"},"description":null,"assigned_to":{"id":4120,"username":"user4@company.com"},"asset_list":{"id":4777,"name":"Info Security Threat_Splunk"},"advisory":{"id":199003,"advisory_identifier":"SA74447","title":"Blue Coat Security Analytics Multiple Vulnerabilities","released":"2016-12-21T15:24:53Z","modified_date":"2016-12-21T15:24:53Z","criticality":2,"criticality_description":"Highly critical","solution_status":4,"solution_status_description":"Partial Fix","where":1,"where_description":"From remote","cvss_score":10.0,"cvss_vector":"(AV:N/AC:L/Au:N/C:C/I:C/A:C/E:U/RL:TF/RC:C)","type":0,"is_zero_day":false},"created":"2016-12-21T15:33:09Z","pretty_id":79,"custom_score":null,"last_updated":"2016-12-21T15:40:28Z"},{"id":584252,"name":null,"status":{"id":2963,"name":"Handled"},"priority":{"id":2873,"name":"Urgent"},"queue":{"id":2144,"name":"Default"},"description":null,"assigned_to":{"id":4118,"username":"user3@company.com"},"asset_list":{"id":4657,"name":"PSS Middleware Environment"},"advisory":{"id":195840,"advisory_identifier":"SA73221","title":"Oracle Solaris Multiple Third Party Components Multiple Vulnerabilities","released":"2016-10-19T14:20:02Z","modified_date":"2016-12-19T14:42:30Z","criticality":2,"criticality_description":"Highly critical","solution_status":2,"solution_status_description":"Vendor Patched","where":1,"where_description":"From remote","cvss_score":10.0,"cvss_vector":"(AV:N/AC:L/Au:N/C:C/I:C/A:C/E:U/RL:OF/RC:C)","type":0,"is_zero_day":false},"created":"2016-12-20T13:43:24Z","pretty_id":76,"custom_score":null,"last_updated":"2017-01-11T19:47:09Z"},"

So Secunia puts a comma between each event, and I've read that this keeps Splunk from reading the data as _json. Is that accurate? I've tried a regex that matches the closing brace, comma, and opening brace, without luck.
I also need help in getting the timestamp to map to the created timestamp in the event.
Any help you can provide will be useful. Thank you.

1 Solution

woodcock
Esteemed Legend

If the problem is an extra comma between the events, then you just need to adjust your LINE_BREAKER for this sourcetype in props.conf:

[YourSourcetypeHere]
LINE_BREAKER = }(,){"id":
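
To see what this regex does to the sample, here is a minimal Python sketch that mimics the documented LINE_BREAKER behavior: the text matched by the first capture group is discarded, what precedes it ends one event, and what follows it starts the next. The function name `break_events` and the shortened sample payload are illustrative assumptions, and this is not Splunk's implementation.

```python
import json
import re

# Same pattern as the LINE_BREAKER above, with the braces escaped for clarity.
LINE_BREAKER = re.compile(r'\}(,)\{"id":')

def break_events(raw):
    """Split raw text at each match, dropping only the first capture group."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])  # event ends just before the comma
        start = m.end(1)                      # next event starts just after it
    events.append(raw[start:])
    return events

# Shortened, hypothetical stand-in for the Secunia payload.
sample = (
    '{"id":588699,"status":{"id":2963,"name":"Handled"},"created":"2016-12-21T15:33:09Z"},'
    '{"id":584252,"status":{"id":2963,"name":"Handled"},"created":"2016-12-20T13:43:24Z"}'
)

events = break_events(sample)
for event in events:
    print(json.loads(event)["id"])
```

Because only the comma sits inside the capture group, each resulting event keeps its own opening and closing braces and stays valid JSON.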

mattymo
Splunk Employee

The best bet here is to either

1) use the REST API modular input to call the endpoint and create an event handler that parses the data so that Splunk has an easier time ingesting it, or

2) pre-parse with something like jq to split the one big JSON blob into smaller pieces, so you get the event breaking you want while keeping the JSON structure. Paste your entire blob into https://jqplay.org and see if you can break it out the way you want.

Can you share an entire event on pastebin or Slack? Then we can play with the jq syntax and get it rocking.

- MattyMo

anilsharmahk
New Member

There is no need for an additional REST API input or Python.
props.conf alone is enough,
specifically the line-breaking settings and sed commands (SEDCMD).

I had complex JSON with multiple events and key-value pairs, and sorted all of it out in props.conf.

manderson7
Contributor

I've tried the REST API modular input, but I wasn't able to make it split up the events on each page, nor could I bring in the events from all pages (I'm no good at Python), so I made a bash script instead. That has its own problem, as I can't strip out the commas. If it's not one thing, it's five. Here's the pastebin dump, cleaned up with our company data removed: http://pastebin.com/EJpmwz77

mattymo
Splunk Employee

You just want each of these as events, right?

    {
      "id": 592594,
      "name": null,
      "status": {
        "id": 2963,
        "name": "Handled"
      },
      "priority": {
        "id": 2864,
        "name": "Low"
      },
      "queue": {
        "id": 2144,
        "name": "Default"
      },
      "description": null,
      "assigned_to": {
        "id": 2426,
        "username": "user1@company.com"
      },
      "asset_list": null,
      "advisory": {
        "id": 199009,
        "advisory_identifier": "SA74347",
        "title": "VMware ESXi Script Insertion Vulnerability",
        "released": "2016-12-21T22:28:44Z",
        "modified_date": "2016-12-21T22:28:44Z",
        "criticality": 4,
        "criticality_description": "Less critical",
        "solution_status": 2,
        "solution_status_description": "Vendor Patched",
        "where": 1,
        "where_description": "From remote",
        "cvss_score": 4.3,
        "cvss_vector": "(AV:N/AC:M/Au:N/C:N/I:P/A:N/E:U/RL:OF/RC:C)",
        "type": 0,
        "is_zero_day": false
      }

Do you care about this piece?

{
"count": 66,
"next": "https://api.app.secunia.com/api/tickets/?page=2",
"previous": null,
"results": [

- MattyMo

manderson7
Contributor

I don't need that in Splunk, no. The rest is what I need.

mattymo
Splunk Employee

OK, cool. I might need to catch you in Slack to give you the full tour (though I am a jq n00b too), but here's what I did.

I took your JSON and threw it into jqplay, then used jq's filter syntax, which is like sed/awk for JSON, and told it to "unwrap" the results array so that we get something Splunk can eat as single events:

.results[]

I know, I know, ugh, another thing to learn. But this one is worth it, and like you, I'm not great with Python, so this is easier for me than learning Python to create a handler for the modular input.

Anyway, I took the output, saved it to a file, then ran it through the Add Data wizard. Looks good to me. So basically you would just run the API results through jq as part of your ingest script.
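
For anyone without jq at hand, the unwrap step can be sketched in Python. This is only an illustration of what the filter does, roughly equivalent to `jq -c '.results[]'`: it drops the count/next/previous wrapper and emits one compact JSON object per line. The function name `unwrap_results` and the trimmed sample page are assumptions made up for the example.

```python
import json

def unwrap_results(page_text):
    """Return one compact JSON string per element of the top-level "results" array."""
    page = json.loads(page_text)
    return [json.dumps(item, separators=(",", ":")) for item in page["results"]]

# Hypothetical, heavily trimmed page in the shape shown earlier in the thread.
page_text = """
{"count": 2, "next": null, "previous": null,
 "results": [{"id": 592594, "priority": {"id": 2864, "name": "Low"}},
             {"id": 588699, "priority": {"id": 2873, "name": "Urgent"}}]}
"""

lines = unwrap_results(page_text)
print("\n".join(lines))
```

Each printed line is a complete JSON object, which is the shape Splunk can break into single events without any custom regex.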

- MattyMo

woodcock
Esteemed Legend

@mmodestino; this would make a great blog or conference session. Is there a way that I can see you run through the whole process start to finish?

mattymo
Splunk Employee

I could gladly help with notes on the learning curve from the perspective of someone who knows just enough to get into trouble 😉 I think I know a few Splunkers who could do something like that justice...

- MattyMo

manderson7
Contributor

Yes, this. I'm not sure whether I just add jq .results[] to my bash script or what exactly. Thanks for everyone's help so far.

mattymo
Splunk Employee

Yeah, you need to install jq on the OS. You are working on *nix, right?

It's worth it because it's less a point solution and more another spell in the voodoo book.

- MattyMo

manderson7
Contributor

So that was easy. I added jq to /usr/sbin/, added | jq .results[] to my echo statement and ran the script. Data came in as normal json. How did you know to use .results[]?

mattymo
Splunk Employee

Banged my head on the table till it worked 😉

https://jqplay.org/ - cheat sheet at the bottom of the page
https://stedolan.github.io/jq/tutorial/ - a couple of simple tutorials on the syntax and capabilities.

I spent a night trying to wrestle the same thing for another data source. Basically you are telling it to unwrap the array. The powers are amazing: you can literally format the JSON any which way you want.

- MattyMo

woodcock
Esteemed Legend

If the problem is an extra comma between the events, then you just need to adjust your LINE_BREAKER for this sourcetype in props.conf:

[YourSourcetypeHere]
LINE_BREAKER = }(,){"id":

manderson7
Contributor

I'm going through the Add Data wizard in Splunk, selecting the _json sourcetype, and then adding the LINE_BREAKER you suggested, but that doesn't break the events up. Do I need to add something else? I've also tried creating a new sourcetype that isn't based on _json, using LINE_BREAKER with that value as well as BREAK_ONLY_BEFORE, which was added when I put the value in the REGEX portion of Event Breaks.

woodcock
Esteemed Legend

You should need ONLY LINE_BREAKER and SHOULD_LINEMERGE=false.

manderson7
Contributor

So I created the sourcetype in the GUI with those settings, and when I search the data, it all goes into one event. I edited props.conf and added SHOULD_LINEMERGE=false to the stanza, restarted Splunk, and the events are still in one event. Editing the sourcetype in the GUI shows SHOULD_LINEMERGE = true, but props.conf shows it as false. What should I do?

woodcock
Esteemed Legend

Checklist:
1) Did you deploy this to all the Indexers (or to the forwarders if you are using INDEXED_EXTRACTIONS)?
2) Did you restart splunk after deploying?
3) Are you looking at old events (they will stay broken forever; you need to look ONLY at events that were forwarded after steps 1-2)?
4) Are you sure that your timestamping is correct (i.e. repaired events may be in a different time window, even in the future, so that you are looking for them but they are somewhere else in the wrong place)?
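
On item 4, and the original question about mapping the timestamp to `created`: each event carries earlier timestamps ("released", "modified_date") ahead of "created", so the extractor may well pick the wrong one unless it is pointed at the right field (in props.conf that is typically done with TIME_PREFIX and TIME_FORMAT, which is an untested suggestion here). TIME_FORMAT uses strptime-style patterns, so a quick way to check a candidate pattern is a small Python sketch like the one below. The names `CREATED_FORMAT` and `parse_created` are made up for the example.

```python
from datetime import datetime, timezone

# strptime pattern for values such as "2016-12-21T15:33:09Z" (trailing Z = UTC).
CREATED_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def parse_created(value):
    """Parse an event's "created" value as a timezone-aware UTC datetime."""
    return datetime.strptime(value, CREATED_FORMAT).replace(tzinfo=timezone.utc)

created = parse_created("2016-12-21T15:33:09Z")
print(created.isoformat())
```

If the pattern parses the sample values cleanly here, the same pattern string is a reasonable starting point for the sourcetype's timestamp settings.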

manderson7
Contributor

I'm doing this on a test Splunk install, so it's all on one box. I restarted Splunk. It looks like I needed to re-index the data; that worked. Unfortunately, it didn't prettify some of the events, specifically the events at the beginning of a page. They stay in the faux JSON, I suppose because of the additional count and page lines.

cpetterborg
SplunkTrust

Will this work for your data (It works for the string you provided)?:

sed -e "s/},{/}+{/g" -e "s/}[^}]*$/}/" z.log | tr "+" "\n"

If the rest of your data will work with this, then you should be fine. Without seeing more data it is hard to tell if it will work or not in a more general case. You may have to change the + character to something else if your data may contain a + in it. There are some other ways to do that as well, but this way is pretty clean.
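
One of those other ways, sketched in Python rather than sed: let a JSON decoder walk the line and pull out one object at a time, so no placeholder character is needed and a + inside the data is harmless. This is a different technique from the sed pipeline above, not a translation of it. The function name `split_concatenated` and the tiny sample are assumptions for illustration; `json.JSONDecoder.raw_decode` is the standard-library call that returns a parsed value plus the index where it ended.

```python
import json

def split_concatenated(raw):
    """Yield each top-level JSON object from text shaped like '{...},{...},"'."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = raw.find("{", pos)  # skip commas and trailing debris between objects
        if start == -1:
            return
        # Raises json.JSONDecodeError if the text at `start` is not valid JSON.
        obj, pos = decoder.raw_decode(raw, start)
        yield obj

# Hypothetical sample: two objects, a comma between them, and a stray ," at the end.
sample = '{"id":1,"name":"a+b"},{"id":2,"name":null},"'

objects = list(split_concatenated(sample))
for obj in objects:
    print(json.dumps(obj))
```

Writing each object back out on its own line gives the same one-event-per-line file that the sed and tr pipeline aims for.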

cpetterborg
SplunkTrust

Splunk is pretty picky about the formatting of JSON data. For example, if you have even one character before or after the opening and closing curly braces ('{}'), Splunk will not render the JSON-formatted output when you search the data. If their JSON strings are not formatted just right, that could be why you are not seeing the results you would like. Given that I'm seeing a }," at the end of your JSON string, I would think that is the case here. http://jsonlint.com shows an error because there are two JSON strings on the line, which doesn't bode well for Splunk either.
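
The same check can be run locally. The sketch below uses Python's standard json parser as a stand-in for jsonlint, which is an assumption: it says nothing about how Splunk itself decides, it only shows that a strict parser rejects a line holding more than one object. The helper name and both sample strings are hypothetical.

```python
import json

def is_single_json_document(text):
    """Return True only if the whole string parses as exactly one JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

good = '{"id":588699,"name":null}'
bad = '{"id":588699,"name":null},{"id":584252,"name":null},"'

print(is_single_json_document(good), is_single_json_document(bad))
```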

Can you modify the JSON before you bring it into Splunk, or is this data just how it comes from the application and the file being indexed is immutable? Either way, you will probably have to set things up to modify the JSON data so that Splunk can parse it properly and index it as JSON data.
