How to index full JSON data and automatically extract fields without using field extraction?

Here's the format of the data I have been working on. I've tried using INDEXED_EXTRACTIONS = JSON in props.conf, but less event data is indexed than expected.
{
"d": {
"results": [{
"__metadata": {
"id": "http://sapuri('123456789')",
"uri": "sapuri('123456789')",
"type": "sapuri"
},
"DATETIME": "05/05/2016 18:34:40",
"System_ID": "DE1",
"Client_ID": "200",
"SO_Datetime": "05/05/2016 18:34:40",
"SO_Number": "123456789",
"SO_Item": "000010",
"SO_Type": "ANOR",
"PO_Num": "",
"Sales_Organization": "NP01",
"Distribution_Channel": "01",
"Division": "01",
"Sales_Office": "",
"Sales_Group": "",
"Delivery_Block": "",
"Requested_Delivery_Date": "05/05/2016",
"Order_Reason": "301",
"Header_Net_Value": " 100.00",
"Currency": "USD",
"Product_Number": "000000000000123456",
"Product_Description": "sample-product description",
"Order_Quantity": " 1.000",
"Sales_Unit": "DOS",
"Item_Net_Value": " 100.00",
"Cost_Value": " 0.00",
"Tax_Value": " 7.00",
"Rejection_Code": "",
"Billing_Block": "",
"Pricing_Procedure": "SAMPLE",
"PO_Type": "SAMP",
"Cust_Material": "",
"Item_Category": "SAMP",
"Delivery_Quantity": "0.000 ",
"Confirmed_Quantity": "1.000 ",
"Plant": "7001",
"Customer_Number": "2000010281",
"Address_Code": "0002429053",
"Customer_Name": "abcdefghijklmnop",
"House_Number": "",
"Street": "qrstuvwxyz",
"City": "MIAMI",
"Region": "FL",
"Country_Code": "US",
"Post_Code": "33586-2008",
"Status_Txt": "Billed",
"Status_ID": "4",
"DN_Number": "",
"DN_Item": "",
"DN_Date": "",
"DN_Item_Date": "",
"DN_Material_Num": "",
"DN_Quantity": "",
"DN_Werks": "",
"DN_Point": "",
"DN_Type": "",
"DN_Route": "",
"DN_Bill_Lading": "",
"DN_Shipping_Date": "",
"DN_Ext_Delivery_Num": "",
"DN_Route_Schedule": "",
"DN_Billing_Date": "",
"Bill_Doc": "8123456727",
"Bill_Item": "123410",
"Bill_Fiscal_Year": "0000",
"Bill_Company_Code": "2250",
"Bill_Sales_Org": "AB01",
"Bill_Dist_Channel": "01",
"Bill_Quantity": "1.000 ",
"Bill_Sales_Unit": "DOS",
"Bill_Material_Num": "00123456000102970",
"Bill_Type": "aNF1",
"Bill_Date": "12/05/2016",
"Bill_Createdate": "05/05/2016 18:38:56",
"Bill_Item_date": "05/05/2016 18:38:56",
"Bill_Net_Value": "300.00 ",
"Bill_Payer": "4000014278",
"Bill_Sold_To_Party": "2000010281",
"Bill_Cancelled": "",
"Bill_Ref_Doc": "123457178913",
"Bill_Sales_Doc": "11235678113",
"Bill_Plant": "7001",
"Bill_Item_Net_Value": "100.00 ",
"Accounting_Number": ""
}
]
}
}
Hi,
Please try the settings below in props.conf:
[sourcetype]
BREAK_ONLY_BEFORE = ^{
DATETIME_CONFIG =
NO_BINARY_CHECK = true
TIME_PREFIX = "DATETIME": "
Hi splunkt0n,
You can make the following changes in your props.conf:
[sourcetype]
INDEXED_EXTRACTIONS = NONE
KV_MODE = json
TRUNCATE = 0
MUST_BREAK_AFTER = ]
Let me know if this helps!!!

Thanks, mate, but this doesn't work.

What does your data look like once it's been indexed? Is it properly rendered as JSON in search?

Hi nickhillscpl,
No, it wasn't rendered as JSON. The fields were extracted properly, but the number of events does not match.

How many events are included in each JSON block?
Since the list of keys for each item is quite large, if the results list has more than a few items, it's possible you are hitting the line breaker's truncation limit. That renders the JSON as one big block of unformatted text in search and prevents some items from being extracted.
Try this search to confirm:
index=_internal LineBreakingProcessor Truncating

Thanks for this. Yep, it looks like it exceeds the limit. How can I increase the limit of the line breaker?

In your props.conf on the heavy forwarder/indexer, add TRUNCATE = 0
which removes the limit.
Obviously, you should keep an eye on this, because very large events can impact performance. Ideally, you would set TRUNCATE to a value just above your maximum anticipated event size.
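To pick a value "just above your maximum anticipated size", it can help to measure a few raw events outside Splunk first. Below is a rough Python sketch, not anything Splunk ships: the helper names (`event_size_bytes`, `suggest_truncate`), the 20% headroom, and the sample payloads are all made up for illustration. The one fact it relies on is that TRUNCATE defaults to 10000 bytes.

```python
import json

DEFAULT_TRUNCATE = 10000  # Splunk's default TRUNCATE value, in bytes


def event_size_bytes(raw_event):
    """Size of a raw event in bytes (TRUNCATE counts bytes, not characters)."""
    return len(raw_event.encode("utf-8"))


def suggest_truncate(raw_events, headroom=0.2):
    """Hypothetical helper: largest observed event plus some headroom."""
    largest = max(event_size_bytes(e) for e in raw_events)
    return int(largest * (1 + headroom))


# Made-up sample data shaped like the payload in the question.
record = {"SO_Number": "123456789", "Customer_Name": "abcdefghijklmnop", "City": "MIAMI"}
small = json.dumps({"d": {"results": [record]}})
large = json.dumps({"d": {"results": [record] * 500}})

for name, event in (("small", small), ("large", large)):
    size = event_size_bytes(event)
    verdict = "exceeds the default limit" if size > DEFAULT_TRUNCATE else "fits"
    print(name, size, verdict)

print("suggested TRUNCATE:", suggest_truncate([small, large]))
```

Run it against real payloads saved from the source system rather than the sample data, and treat the result as a starting point only.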

Thanks Nick, I've added TRUNCATE to props.conf and the line-breaking warning is gone. But in the sourcetype preview all events end up in the same row, and I'm seeing just one row.

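You might need to configure a line-breaker regex if Splunk can't spot the boundaries between events.
You could try LINE_BREAKER = (\}\]\}\})
which looks for the closing braces and bracket }]}}
and then starts a new event. Be aware that Splunk discards whatever the first capturing group matches, so you may need to move the group onto the separator that follows }]}} to keep those characters in the event.
NB: if your JSON has whitespace between those characters, you may need to adjust the regex accordingly.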
http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf#Line_breaking
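A regex like this can be tried out offline before touching props.conf. The Python sketch below only approximates the documented LINE_BREAKER behaviour (the text matched by the first capturing group is dropped, and events are whatever lies between those groups); it is not Splunk's implementation. `split_events`, the sample stream, and the alternative pattern are assumptions for illustration, and it assumes each JSON document is followed by a newline. Python's `re` is not PCRE, but the two agree on patterns this simple.

```python
import json
import re


def split_events(data, line_breaker):
    """Approximate LINE_BREAKER: drop the first capture group, keep the rest."""
    events, start = [], 0
    for match in re.finditer(line_breaker, data):
        events.append(data[start:match.start(1)])
        start = match.end(1)
    if start < len(data):
        events.append(data[start:])
    return events


def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Made-up stream: two small documents shaped like the question's payload.
doc = '{"d": {"results": [{"SO_Number": "123456789"}]}}'
stream = doc + "\n" + doc + "\n"

suggested = r"(\}\]\}\})"                   # group covers the closing characters
alternative = r"\}\s*\]\s*\}\s*\}([\r\n]+)"  # group covers only the newline(s)

events_suggested = split_events(stream, suggested)
events_alternative = split_events(stream, alternative)

print([is_valid_json(e) for e in events_suggested])
print([is_valid_json(e) for e in events_alternative])
```

With the group around }]}} the closing characters are cut out of every event, so none of the pieces parses as JSON. With the group moved onto the newline, each event stays a complete document. In Splunk itself, SHOULD_LINEMERGE also affects the result, since line merging can recombine what LINE_BREAKER split.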

Thanks Nick, but this doesn't work either.

Can you try taking a complete event (ideally a selection of events) and running them through a JSON validator like https://jsonlint.com
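If pasting production data into a website isn't an option, the same check can be done locally. Here is a minimal Python sketch using only the standard `json` module. `check_event` is a made-up helper, and the `d.results` path is taken from the sample in the question, so adjust it if your payload differs. It also reports how many items sit in each block, which answers the earlier question about events per JSON block.

```python
import json


def check_event(raw):
    """Return (is_valid, detail) for one raw event string."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        return False, "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    results = []
    # The d.results path is assumed from the sample payload in the question.
    if isinstance(parsed, dict) and isinstance(parsed.get("d"), dict):
        results = parsed["d"].get("results", [])
    return True, "%d item(s) in d.results" % len(results)


# Made-up examples: one complete document and one cut off early.
good = '{"d": {"results": [{"SO_Number": "123456789"}]}}'
truncated = good[:-2]

print(check_event(good))
print(check_event(truncated))
```

An event that fails here after indexing, but passes when taken straight from the source, points at truncation or line breaking rather than at the data itself.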
