How to index full JSON data and automatically extract fields without using field extraction

splunkt0n
New Member

Here's the format of the data I have been working on. I've tried using INDEXED_EXTRACTIONS=JSON in props.conf, but I end up with less event data than expected.

        {
            "d": {
                "results": [{
                        "__metadata": {
                            "id": "http://sapuri('123456789')",
                            "uri": "sapuri('123456789')",
                            "type": "sapuri"
                        },
                        "DATETIME": "05/05/2016 18:34:40",
                        "System_ID": "DE1",
                        "Client_ID": "200",
                        "SO_Datetime": "05/05/2016 18:34:40",
                        "SO_Number": "123456789",
                        "SO_Item": "000010",
                        "SO_Type": "ANOR",
                        "PO_Num": "",
                        "Sales_Organization": "NP01",
                        "Distribution_Channel": "01",
                        "Division": "01",
                        "Sales_Office": "",
                        "Sales_Group": "",
                        "Delivery_Block": "",
                        "Requested_Delivery_Date": "05/05/2016",
                        "Order_Reason": "301",
                        "Header_Net_Value": "        100.00",
                        "Currency": "USD",
                        "Product_Number": "000000000000123456",
                        "Product_Description": "sample-product description",
                        "Order_Quantity": "         1.000",
                        "Sales_Unit": "DOS",
                        "Item_Net_Value": "        100.00",
                        "Cost_Value": "        0.00",
                        "Tax_Value": "        7.00",
                        "Rejection_Code": "",
                        "Billing_Block": "",
                        "Pricing_Procedure": "SAMPLE",
                        "PO_Type": "SAMP",
                        "Cust_Material": "",
                        "Item_Category": "SAMP",
                        "Delivery_Quantity": "0.000 ",
                        "Confirmed_Quantity": "1.000 ",
                        "Plant": "7001",
                        "Customer_Number": "2000010281",
                        "Address_Code": "0002429053",
                        "Customer_Name": "abcdefghijklmnop",
                        "House_Number": "",
                        "Street": "qrstuvwxyz",
                        "City": "MIAMI",
                        "Region": "FL",
                        "Country_Code": "US",
                        "Post_Code": "33586-2008",
                        "Status_Txt": "Billed",
                        "Status_ID": "4",
                        "DN_Number": "",
                        "DN_Item": "",
                        "DN_Date": "",
                        "DN_Item_Date": "",
                        "DN_Material_Num": "",
                        "DN_Quantity": "",
                        "DN_Werks": "",
                        "DN_Point": "",
                        "DN_Type": "",
                        "DN_Route": "",
                        "DN_Bill_Lading": "",
                        "DN_Shipping_Date": "",
                        "DN_Ext_Delivery_Num": "",
                        "DN_Route_Schedule": "",
                        "DN_Billing_Date": "",
                        "Bill_Doc": "8123456727",
                        "Bill_Item": "123410",
                        "Bill_Fiscal_Year": "0000",
                        "Bill_Company_Code": "2250",
                        "Bill_Sales_Org": "AB01",
                        "Bill_Dist_Channel": "01",
                        "Bill_Quantity": "1.000 ",
                        "Bill_Sales_Unit": "DOS",
                        "Bill_Material_Num": "00123456000102970",
                        "Bill_Type": "aNF1",
                        "Bill_Date": "12/05/2016",
                        "Bill_Createdate": "05/05/2016 18:38:56",
                        "Bill_Item_date": "05/05/2016 18:38:56",
                        "Bill_Net_Value": "300.00 ",
                        "Bill_Payer": "4000014278",
                        "Bill_Sold_To_Party": "2000010281",
                        "Bill_Cancelled": "",
                        "Bill_Ref_Doc": "123457178913",
                        "Bill_Sales_Doc": "11235678113",
                        "Bill_Plant": "7001",
                        "Bill_Item_Net_Value": "100.00 ",
                        "Accounting_Number": ""
                    }
                ]
            }
        }

p_gurav
Champion

Hi,

Please try the settings below in props.conf:

[sourcetype]
BREAK_ONLY_BEFORE = ^{
DATETIME_CONFIG =
NO_BINARY_CHECK = true
TIME_PREFIX = "DATETIME": "

deepashri_123
Motivator

Hi splunkt0n,

You can make the following changes in your props.conf:

[sourcetype]
INDEXED_EXTRACTIONS = NONE
KV_MODE = json
TRUNCATE = 0
MUST_BREAK_AFTER = ]

Let me know if this helps!!!

splunkt0n
New Member

Thanks mate, but this doesn't work.

nickhills
Ultra Champion

What does your data look like once it's been indexed? Is it properly rendered as JSON in search?

If my comment helps, please give it a thumbs up!

splunkt0n
New Member

Hi nickhillscpl,

No, it wasn't rendered as JSON. The fields were extracted properly, but the number of events does not match.

nickhills
Ultra Champion

How many events are included in each JSON block?
Since your individual list of keys is quite large, if your list has more than a few items it's possible you are hitting the line breaker's truncation limit, which will render the JSON as a big block of unformatted text in search and will not extract all items.

Try this search to confirm:
index=_internal LineBreakingProcessor Truncating
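
A rough way to see why a block like the one in the question can hit the limit: Splunk's default TRUNCATE is 10000 bytes, so a pretty-printed payload carrying a few records of this width is already over it. The sketch below is plain Python, not Splunk config. The field names, the 85-field record width, and the helper names are made up for illustration, so treat the numbers as an order-of-magnitude check only.

```python
import json

DEFAULT_TRUNCATE = 10000  # Splunk's documented default for TRUNCATE, in bytes


def make_payload(n_results, n_fields=85):
    """Build a hypothetical payload shaped like {"d": {"results": [...]}}."""
    record = {"Field_%02d" % i: "sample value" for i in range(n_fields)}
    return {"d": {"results": [dict(record) for _ in range(n_results)]}}


def payload_bytes(payload):
    """Size of the pretty-printed payload in UTF-8 bytes."""
    return len(json.dumps(payload, indent=4).encode("utf-8"))


def exceeds_truncate(payload, limit=DEFAULT_TRUNCATE):
    """True if the payload, kept as one event, would be cut off at `limit`."""
    return payload_bytes(payload) > limit


# Compare a single-record block with a three-record block.
for n in (1, 3):
    p = make_payload(n)
    print(n, "record(s):", payload_bytes(p), "bytes, over limit:", exceeds_truncate(p))
```

Running the same size check on a real response from the source system would show how many records fit under a given TRUNCATE value.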

splunkt0n
New Member

Thanks for this. Yep, it looks like it exceeds the limit. How can I increase the limit of the line breaker?

nickhills
Ultra Champion

In your props.conf on the heavy forwarder/indexer, add TRUNCATE = 0, which removes the limit.

Obviously, you should keep an eye on this, because very large events can impact performance, so ideally you would set the TRUNCATE value to something just above your maximum anticipated event size.

splunkt0n
New Member

Thanks Nick, I've added TRUNCATE to props.conf and the line-breaking warning is gone. But in the sourcetype preview all events are in the same row, and I'm seeing just one row.

nickhills
Ultra Champion

You might need to configure a line breaker regex if Splunk can't spot the different events.

You could try LINE_BREAKER = (\}\]\}\})

This will look for the closing sequence of braces and brackets }]}} and then create a new event.
NB: if your JSON has spaces or newlines between those characters, you may need to adjust the regex accordingly.

http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf#Line_breaking
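
Two things are worth checking with a regex like this. The props.conf documentation says the text matched by the first capturing group of LINE_BREAKER is discarded, with the text before it ending the previous event and the text after it starting the next one. Also, the sample in the question is pretty-printed, so there is whitespace between the closing characters. The Python sketch below only imitates that documented rule; it is not Splunk's implementation. The `split_events` helper, the shortened records, and the whitespace-tolerant pattern are assumptions for illustration.

```python
import json
import re


def split_events(raw, line_breaker):
    """Imitate LINE_BREAKER: drop capture group 1, split the text around it."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])  # text before group 1 ends an event
        start = m.end(1)                      # text after group 1 starts the next
    if raw[start:].strip():
        events.append(raw[start:])
    return events


def is_json(text):
    """True if `text` parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical, shortened payloads: one pretty-printed stream, one compact.
one = '{\n  "d": {\n    "results": [{\n      "SO_Number": "%s"\n    }\n    ]\n  }\n}\n'
pretty = one % "1" + one % "2"
compact = '{"d":{"results":[{"SO_Number":"1"}]}}{"d":{"results":[{"SO_Number":"2"}]}}'

strict = r'(\}\]\}\})'                    # the pattern suggested above
tolerant = r'\}\s*\]\s*\}\s*\}(\s*)\{'    # assumed alternative, group 1 is only whitespace

for label, data, pattern in [
    ("strict on pretty", pretty, strict),
    ("strict on compact", compact, strict),
    ("tolerant on pretty", pretty, tolerant),
]:
    events = split_events(data, pattern)
    print(label, "->", len(events), "event(s), valid JSON:", [is_json(e) for e in events])
```

In this imitation the strict pattern never matches the pretty-printed stream, and on compact input it removes the closing braces from each event. Putting only the whitespace between payloads in the capture group keeps each event intact. Whether the same pattern behaves this way in LINE_BREAKER (typically with SHOULD_LINEMERGE = false) would still need to be confirmed against real data.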

splunkt0n
New Member

Thanks Nick, but this doesn't work either.

nickhills
Ultra Champion

Can you try taking a complete event (ideally a selection of events) and running it through a JSON validator like https://jsonlint.com?
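
If the data can't be pasted into a website, the same check can be done locally. This is a minimal sketch using Python's standard json module; the `validate_json` name and the sample strings are illustrative only.

```python
import json


def validate_json(text):
    """Return (True, None) if `text` is one valid JSON document,
    otherwise (False, a message giving the line and column of the error)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    return True, None


samples = {
    "complete event": '{"d": {"results": [{"SO_Number": "123456789"}]}}',
    "cut-off event": '{"d": {"results": [{"SO_Number": "123456789"',
    "two events glued together": '{"d": {"results": []}}{"d": {"results": []}}',
}

for name, text in samples.items():
    ok, problem = validate_json(text)
    print(name, "->", "valid" if ok else "invalid (%s)" % problem)
```

An event copied from search that fails here was probably truncated or broken in the wrong place. An event that only validates when taken together with its neighbour points at the line breaking instead.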
