Hi Splunk Experts,
I have some data coming into Splunk in the following format:
[{"columns":[{"text":"id","type":"string"},{"text":"event","type":"number"},{"text":"delays","type":"number"},{"text":"drops","type":"number"}],"rows":[["BM0077",35602782,3043.01,0],["BM1604",2920978,4959.1,2],["BM1612",2141607,5623.3,6],["BM2870",41825122,2545.34,7],["BM1834",74963092,2409.0,8],["BM0267",86497692,1804.55,44],["BM059",1630092,5684.5,0]],"type":"table"}]
I tried to extract each field so that each value corresponds to id, event, delays, and drops in a table, using the command below.
index=result
| rex field=_raw max_match=0 "\[\"(?<id>[^\"]+)\",\s*(?<event>\d+),\s*(?<delays>\d+\.\d+),\s*(?<drops>\d+)"
| table id event delays drops
I get the result in table format; however, it comes out as one whole table rather than as individual entries, so I cannot manipulate the result. I have also tried mvexpand, but it only expands one field at a time, so that has not been helpful either.
Does anyone know how we can properly get the table in Splunk?
This is a task for spath and JSON functions, not for rex. Important: do not treat structured data as strings. But most importantly, if there is a worse data design in JSON than this, I haven't seen it. (Trust me, I've seen many bad JSON designs.) Your developers really went to great lengths to show the world how lazy they can be. Even after lengthy reverse engineering, I cannot decide whether this is the poorest transliteration of an existing data table into JSON or the worst invented construction of a data table in JSON. Using two separate arrays to describe columns and rows saves a little space, but unlike in a SQL database, it makes search less efficient (unless the application reads the whole thing and reconstructs the data into an in-memory SQL database). It also wastes text describing the data type of each column, when JSON already has enough data types to distinguish strings from numbers.
Enough ranting. If you have any influence over your developers, make them change the event format to something like this instead:
[{"id":"BM0077","event":35602782,"delays":3043.01,"drops":0},{"id":"BM0267","event":86497692,"delays":1804.55,"drops":44},{"id":"BM059","event":1630092,"delays":5684.5,"drops":0},{"id":"BM1604","event":2920978,"delays":4959.1,"drops":2},{"id":"BM1612","event":2141607,"delays":5623.3,"drops":6},{"id":"BM1834","event":74963092,"delays":2409,"drops":8},{"id":"BM2870","event":41825122,"delays":2545.34,"drops":7}]
With properly designed JSON data, extraction can be as simple as:
``` extract from well-designed JSON ```
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}
Output with your sample data (see emulation below) will be:
delays  | drops | event    | id
3043.01 | 0     | 35602782 | BM0077
1804.55 | 44    | 86497692 | BM0267
5684.5  | 0     | 1630092  | BM059
4959.1  | 2     | 2920978  | BM1604
5623.3  | 6     | 2141607  | BM1612
2409    | 8     | 74963092 | BM1834
2545.34 | 7     | 41825122 | BM2870
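Once your developers do change the format, that extraction runs straight off your index. A minimal sketch, assuming your events still land in index=result as in your rex attempt (the final table just fixes the column order):
``` sketch: extract from well-designed JSON, straight from the index ```
index=result
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}
| table id event delays drops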
Here is how to get from the sample's bad structure to that good JSON structure:
``` transform from bad JSON to well-designed JSON ```
| spath path={}
| mvexpand {}
| fromjson {}
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| eval data = json_object()
| foreach idx mode=multivalue
[eval row = mvindex(json_array_to_mv(rows), <<ITEM>>), data = json_set(data, json_extract(mvindex(columns, <<ITEM>>), "text"), row)]
| stats values(data) as _raw
| eval _raw = mv_to_json_array(_raw, true())
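To make the foreach concrete: for the first row of the sample, idx runs 0 through 3, so each iteration pairs the text of column <<ITEM>> (id, event, delays, drops) with element <<ITEM>> of that row (BM0077, 35602782, 3043.01, 0), leaving data as a single JSON object roughly like {"id":"BM0077","event":"35602782","delays":"3043.01","drops":"0"} (the numbers may come back as JSON strings, since multivalue fields are strings). stats values(data) then collects one such object per row, and mv_to_json_array packs them back into _raw as one well-formed array.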
But before your developers come around and change the format, you can do one of the following.
1. String together the bad-JSON-to-good-JSON transformation and normal extraction
``` transform, then extract ```
``` transform from bad JSON to well-designed JSON ```
| spath path={}
| mvexpand {}
| fromjson {}
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| eval data = json_object()
| foreach idx mode=multivalue
[eval row = mvindex(json_array_to_mv(rows), <<ITEM>>), data = json_set(data, json_extract(mvindex(columns, <<ITEM>>), "text"), row)]
| stats values(data) as _raw
| eval _raw = mv_to_json_array(_raw, true())
``` extract from well-designed JSON ```
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}
2. Transform the poorly designed JSON into a table directly, with a little help from kv (aka extract), like this:
``` directly handle bad design ```
| spath path={}
| mvexpand {}
| fromjson {}
| fields - _* {} type
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
``` up to this point, this is basically the same as the transformation above ```
| foreach idx mode=multivalue
[eval _raw = mvappend(_raw, json_extract(mvindex(columns, <<ITEM>>), "text") . "=" . mvindex(json_array_to_mv(rows), <<ITEM>>))]
| fields - rows columns idx
| extract
This will give you the same output as above.
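If it helps, here is option 2 end to end: the same search with a base search in front and a table at the end to fix the column order. It assumes your events are in index=result as in your rex attempt; swap in the emulation below to test without real data.
``` sketch: directly handle bad design, end to end ```
index=result
| spath path={}
| mvexpand {}
| fromjson {}
| fields - _* {} type
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| foreach idx mode=multivalue
[eval _raw = mvappend(_raw, json_extract(mvindex(columns, <<ITEM>>), "text") . "=" . mvindex(json_array_to_mv(rows), <<ITEM>>))]
| fields - rows columns idx
| extract
| table id event delays drops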
Here is a data emulation you can use to play with the above methods and compare with your real data:
| makeresults
| eval _raw = "[{\"columns\":[{\"text\":\"id\",\"type\":\"string\"},{\"text\":\"event\",\"type\":\"number\"},{\"text\":\"delays\",\"type\":\"number\"},{\"text\":\"drops\",\"type\":\"number\"}],\"rows\":[[\"BM0077\",35602782,3043.01,0],[\"BM1604\",2920978,4959.1,2],[\"BM1612\",2141607,5623.3,6],[\"BM2870\",41825122,2545.34,7],[\"BM1834\",74963092,2409.0,8],[\"BM0267\",86497692,1804.55,44],[\"BM059\",1630092,5684.5,0]],\"type\":\"table\"}]"
``` data emulation above ```
A final note: spath has some difficulty handling arrays of arrays, which is why I used fromjson (available since Splunk 9.0) for that step.
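If you want to see that for yourself, here is a quick sketch built on the emulation: stop the pipeline right after fromjson and look at the fields it creates. rows should come back as a multivalue field with one JSON array per table row (and columns with one JSON object per column), which is exactly what the later mvexpand rows and json_array_to_mv calls rely on.
``` sketch: inspect what fromjson produces from the emulated event ```
| makeresults
| eval _raw = "[{\"columns\":[{\"text\":\"id\",\"type\":\"string\"},{\"text\":\"event\",\"type\":\"number\"},{\"text\":\"delays\",\"type\":\"number\"},{\"text\":\"drops\",\"type\":\"number\"}],\"rows\":[[\"BM0077\",35602782,3043.01,0],[\"BM1604\",2920978,4959.1,2],[\"BM1612\",2141607,5623.3,6],[\"BM2870\",41825122,2545.34,7],[\"BM1834\",74963092,2409.0,8],[\"BM0267\",86497692,1804.55,44],[\"BM059\",1630092,5684.5,0]],\"type\":\"table\"}]"
| spath path={}
| mvexpand {}
| fromjson {}
| table columns rows type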
Thanks a lot, yuanli.
That worked, you are a genius. I thought I could never structure it in Splunk.
I did speak to the dev team; apparently the JSON is structured that way to feed a particular dashboard system they use, so it has to stay in that shape for that system to consume. However, they have agreed to update the structure in the next release, which could be a few months away (at least 6 months), so in the meantime I can work with this bad JSON.
Thanks a lot again. I had searched for something like this in Splunk before and hadn't found anything.