Splunk Search

Splunk not able to extract fields properly

janesh222
Engager

Hi Splunk Experts, 

I have some data coming into splunk which has the following format: 

[{"columns":[{"text":"id","type":"string"},{"text":"event","type":"number"},{"text":"delays","type":"number"},{"text":"drops","type":"number"}],"rows":[["BM0077",35602782,3043.01,0],["BM1604",2920978,4959.1,2],["BM1612",2141607,5623.3,6],["BM2870",41825122,2545.34,7],["BM1834",74963092,2409.0,8],["BM0267",86497692,1804.55,44],["BM059",1630092,5684.5,0]],"type":"table"}]

 

I tried to extract each field so that each value  corresponds to id,event,delays and drops as a table using the below command. 

 

index=result  | rex field=_raw max_match=0 "\[\"(?<id>[^\"]+)\",\s*(?<event>\d+),\s*(?<delays>\d+\.\d+),\s*(?<drops>\d+)" | table id

 event delays drops 

 

I get the result in table format , however it spits out as one whole table and not individual entries and I cannot manipulate the result.  I have tried using mvexpand , however it can only do for one value, so have not been helpful as well. 

 

Does anyone know how we can properly get the table in splunk . 

Labels (3)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

This is a task for spath and JSON functions, not for rex.  Important: Do not treat structured data as strings.  But most importantly, if there is a worse data design in JSON than this, I haven't seen it before. (Trust me, I've seen many bad JSON designs.)  Your developers really go to lengths to show the world how lazy they can be.  After lengthy reverse engineering, I cannot decide whether this is the poorest transliteration of an existing data table into JSON, or the worst imagined construction of a data table in JSON.  Using two separate arrays to describe row and column saves a little bit of space, but unlike in a SQL database, this makes search less efficient. (Unless the application reads the whole thing and reconstruct data into an in-memory SQL database.  It also wastes text to describe data type in each column when you JSON format has enough datatypes to include strings and numbers.

Enough ranting.  If you have any influence over your developers, make them change the event format to something like this instead:

 

[{"id":"BM0077","event":35602782,"delays":3043.01,"drops":0},{"id":"BM0267","event":86497692,"delays":1804.55,"drops":44},{"id":"BM059","event":1630092,"delays":5684.5,"drops":0},{"id":"BM1604","event":2920978,"delays":4959.1,"drops":2},{"id":"BM1612","event":2141607,"delays":5623.3,"drops":6},{"id":"BM1834","event":74963092,"delays":2409,"drops":8},{"id":"BM2870","event":41825122,"delays":2545.34,"drops":7}]

 

With properly designed JSON data, extraction can be simple as 

 

``` extract from well-designed JSON ```
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}

 

Output will your sample data (see emulation below) will be

delaysdropseventid
3043.01035602782BM0077
1804.554486497692BM0267
5684.501630092BM059
4959.122920978BM1604
5623.362141607BM1612
2409874963092BM1834
2545.34741825122BM2870

 This is how to reach good JSON structure from the sample's bad structure:

 

``` transform from bad JSON to well-designed JSON ```
| spath path={}
| mvexpand {}
| fromjson {}
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| eval data = json_object()
| foreach idx mode=multivalue
    [eval row = mvindex(json_array_to_mv(rows), <<ITEM>>), data = json_set(data, json_extract(mvindex(columns, <<ITEM>>), "text"), row)]
| stats values(data) as _raw
| eval _raw = mv_to_json_array(_raw, true())

 

But before your developer can change their mind, you can do one of the following.

1. String together the bad-JSON-to-good-JSON transformation and normal extraction

 

``` transform, then extract ```
``` transform from bad JSON to well-designed JSON ```
| spath path={}
| mvexpand {}
| fromjson {}
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| eval data = json_object()
| foreach idx mode=multivalue
    [eval row = mvindex(json_array_to_mv(rows), <<ITEM>>), data = json_set(data, json_extract(mvindex(columns, <<ITEM>>), "text"), row)]
| stats values(data) as _raw
| eval _raw = mv_to_json_array(_raw, true())
``` extract from well-designed JSON ```
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}

 

2. Transform the poorly designed JSON into table format with a little help from kv aka extract, like this:

 

``` directly handle bad design ```
| spath path={}
| mvexpand {}
| fromjson {}
| fields - _* {} type
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
``` the above is the same as transformation ```
| foreach idx mode=multivalue
    [eval _raw = mvappend(_raw, json_extract(mvindex(columns, <<ITEM>>), "text") . "=" . mvindex(json_array_to_mv(rows), <<ITEM>>))]
| fields - rows columns idx
| extract

 

This will give you the same output as above.

Here is a data emulation you can play with the above methods and compare with real data

 

| makeresults
| eval _raw = "[{\"columns\":[{\"text\":\"id\",\"type\":\"string\"},{\"text\":\"event\",\"type\":\"number\"},{\"text\":\"delays\",\"type\":\"number\"},{\"text\":\"drops\",\"type\":\"number\"}],\"rows\":[[\"BM0077\",35602782,3043.01,0],[\"BM1604\",2920978,4959.1,2],[\"BM1612\",2141607,5623.3,6],[\"BM2870\",41825122,2545.34,7],[\"BM1834\",74963092,2409.0,8],[\"BM0267\",86497692,1804.55,44],[\"BM059\",1630092,5684.5,0]],\"type\":\"table\"}]"
``` data emulation above ```

 

A final note: spath has some difficulty handling array of arrays, so I used fromjson (available since 9.0) in one filter.

View solution in original post

yuanliu
SplunkTrust
SplunkTrust

This is a task for spath and JSON functions, not for rex.  Important: Do not treat structured data as strings.  But most importantly, if there is a worse data design in JSON than this, I haven't seen it before. (Trust me, I've seen many bad JSON designs.)  Your developers really go to lengths to show the world how lazy they can be.  After lengthy reverse engineering, I cannot decide whether this is the poorest transliteration of an existing data table into JSON, or the worst imagined construction of a data table in JSON.  Using two separate arrays to describe row and column saves a little bit of space, but unlike in a SQL database, this makes search less efficient. (Unless the application reads the whole thing and reconstruct data into an in-memory SQL database.  It also wastes text to describe data type in each column when you JSON format has enough datatypes to include strings and numbers.

Enough ranting.  If you have any influence over your developers, make them change the event format to something like this instead:

 

[{"id":"BM0077","event":35602782,"delays":3043.01,"drops":0},{"id":"BM0267","event":86497692,"delays":1804.55,"drops":44},{"id":"BM059","event":1630092,"delays":5684.5,"drops":0},{"id":"BM1604","event":2920978,"delays":4959.1,"drops":2},{"id":"BM1612","event":2141607,"delays":5623.3,"drops":6},{"id":"BM1834","event":74963092,"delays":2409,"drops":8},{"id":"BM2870","event":41825122,"delays":2545.34,"drops":7}]

 

With properly designed JSON data, extraction can be simple as 

 

``` extract from well-designed JSON ```
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}

 

Output will your sample data (see emulation below) will be

delaysdropseventid
3043.01035602782BM0077
1804.554486497692BM0267
5684.501630092BM059
4959.122920978BM1604
5623.362141607BM1612
2409874963092BM1834
2545.34741825122BM2870

 This is how to reach good JSON structure from the sample's bad structure:

 

``` transform from bad JSON to well-designed JSON ```
| spath path={}
| mvexpand {}
| fromjson {}
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| eval data = json_object()
| foreach idx mode=multivalue
    [eval row = mvindex(json_array_to_mv(rows), <<ITEM>>), data = json_set(data, json_extract(mvindex(columns, <<ITEM>>), "text"), row)]
| stats values(data) as _raw
| eval _raw = mv_to_json_array(_raw, true())

 

But before your developer can change their mind, you can do one of the following.

1. String together the bad-JSON-to-good-JSON transformation and normal extraction

 

``` transform, then extract ```
``` transform from bad JSON to well-designed JSON ```
| spath path={}
| mvexpand {}
| fromjson {}
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| eval data = json_object()
| foreach idx mode=multivalue
    [eval row = mvindex(json_array_to_mv(rows), <<ITEM>>), data = json_set(data, json_extract(mvindex(columns, <<ITEM>>), "text"), row)]
| stats values(data) as _raw
| eval _raw = mv_to_json_array(_raw, true())
``` extract from well-designed JSON ```
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}

 

2. Transform the poorly designed JSON into table format with a little help from kv aka extract, like this:

 

``` directly handle bad design ```
| spath path={}
| mvexpand {}
| fromjson {}
| fields - _* {} type
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
``` the above is the same as transformation ```
| foreach idx mode=multivalue
    [eval _raw = mvappend(_raw, json_extract(mvindex(columns, <<ITEM>>), "text") . "=" . mvindex(json_array_to_mv(rows), <<ITEM>>))]
| fields - rows columns idx
| extract

 

This will give you the same output as above.

Here is a data emulation you can play with the above methods and compare with real data

 

| makeresults
| eval _raw = "[{\"columns\":[{\"text\":\"id\",\"type\":\"string\"},{\"text\":\"event\",\"type\":\"number\"},{\"text\":\"delays\",\"type\":\"number\"},{\"text\":\"drops\",\"type\":\"number\"}],\"rows\":[[\"BM0077\",35602782,3043.01,0],[\"BM1604\",2920978,4959.1,2],[\"BM1612\",2141607,5623.3,6],[\"BM2870\",41825122,2545.34,7],[\"BM1834\",74963092,2409.0,8],[\"BM0267\",86497692,1804.55,44],[\"BM059\",1630092,5684.5,0]],\"type\":\"table\"}]"
``` data emulation above ```

 

A final note: spath has some difficulty handling array of arrays, so I used fromjson (available since 9.0) in one filter.

janesh222
Engager

Thanks a lot yuanli, 

 

That worked , you are a genuis. I thought I could never structure it in splunk. 

I did speak to the dev team , apparently the json is structured in that way to feed a particular dashboard system that they use. So it has to be in that structure for that system to consume. However they have agreed to update the structure in the next release which could be a few months away (6 months atleast) . So in the mean time I could work with this bad json until then. 

 

Thanks  a lot again. I had searched in splunk for something like this before and havent seen anything. 

0 Karma
Get Updates on the Splunk Community!

Splunk Enterprise Security 8.0.2 Availability: On cloud and On-premise!

A few months ago, we released Splunk Enterprise Security 8.0 for our cloud customers. Today, we are excited to ...

Logs to Metrics

Logs and Metrics Logs are generally unstructured text or structured events emitted by applications and written ...

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...