Hi Splunk Experts,
I have some data coming into Splunk in the following format:
[{"columns":[{"text":"id","type":"string"},{"text":"event","type":"number"},{"text":"delays","type":"number"},{"text":"drops","type":"number"}],"rows":[["BM0077",35602782,3043.01,0],["BM1604",2920978,4959.1,2],["BM1612",2141607,5623.3,6],["BM2870",41825122,2545.34,7],["BM1834",74963092,2409.0,8],["BM0267",86497692,1804.55,44],["BM059",1630092,5684.5,0]],"type":"table"}]
I tried to extract each field so that each value corresponds to id, event, delays, and drops in a table, using the command below.
index=result
| rex field=_raw max_match=0 "\[\"(?<id>[^\"]+)\",\s*(?<event>\d+),\s*(?<delays>\d+\.\d+),\s*(?<drops>\d+)"
| table id event delays drops
I get the result in table format; however, it comes out as one whole table rather than as individual entries, so I cannot manipulate the result. I have also tried mvexpand, but it only expands one field at a time, so that has not been helpful either.
Does anyone know how we can properly get the table in Splunk?
This is a task for spath and JSON functions, not for rex. Important: do not treat structured data as strings. But most importantly, if there is a worse data design in JSON than this, I haven't seen it. (Trust me, I've seen many bad JSON designs.) Your developers really went to great lengths to show the world how lazy they can be. Even after lengthy reverse engineering, I cannot decide whether this is the poorest transliteration of an existing data table into JSON or the worst invented construction of a data table in JSON. Using two separate arrays to describe columns and rows saves a little space, but unlike in a SQL database, it makes search less efficient (unless the application reads the whole thing and reconstructs the data into an in-memory SQL database). It also wastes text describing the data type of each column, when JSON already has enough data types to distinguish strings from numbers.
Enough ranting. If you have any influence over your developers, make them change the event format to something like this instead:
[{"id":"BM0077","event":35602782,"delays":3043.01,"drops":0},{"id":"BM0267","event":86497692,"delays":1804.55,"drops":44},{"id":"BM059","event":1630092,"delays":5684.5,"drops":0},{"id":"BM1604","event":2920978,"delays":4959.1,"drops":2},{"id":"BM1612","event":2141607,"delays":5623.3,"drops":6},{"id":"BM1834","event":74963092,"delays":2409,"drops":8},{"id":"BM2870","event":41825122,"delays":2545.34,"drops":7}]
With properly designed JSON data, extraction can be as simple as:
``` extract from well-designed JSON ```
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}
Output with your sample data (see emulation below) will be:
delays  | drops | event    | id
3043.01 | 0     | 35602782 | BM0077
1804.55 | 44    | 86497692 | BM0267
5684.5  | 0     | 1630092  | BM059
4959.1  | 2     | 2920978  | BM1604
5623.3  | 6     | 2141607  | BM1612
2409    | 8     | 74963092 | BM1834
2545.34 | 7     | 41825122 | BM2870
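Once your developers do change the format, that extraction runs straight off your index. A minimal sketch, assuming your events still land in index=result as in your rex attempt (the final table just fixes the column order):
``` sketch: extract from well-designed JSON, straight from the index ```
index=result
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}
| table id event delays drops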
Here is how to get from the sample's bad structure to that good JSON structure:
``` transform from bad JSON to well-designed JSON ```
| spath path={}
| mvexpand {}
| fromjson {}
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| eval data = json_object()
| foreach idx mode=multivalue
[eval row = mvindex(json_array_to_mv(rows), <<ITEM>>), data = json_set(data, json_extract(mvindex(columns, <<ITEM>>), "text"), row)]
| stats values(data) as _raw
| eval _raw = mv_to_json_array(_raw, true())
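To make the foreach concrete: for the first row of the sample, idx runs 0 through 3, so each iteration pairs the text of column <<ITEM>> (id, event, delays, drops) with element <<ITEM>> of that row (BM0077, 35602782, 3043.01, 0), leaving data as a single JSON object roughly like {"id":"BM0077","event":"35602782","delays":"3043.01","drops":"0"} (the numbers may come back as JSON strings, since multivalue fields are strings). stats values(data) then collects one such object per row, and mv_to_json_array packs them back into _raw as one well-formed array.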
But before your developers come around and change the format, you can do one of the following.
1. String together the bad-JSON-to-good-JSON transformation and normal extraction
``` transform, then extract ```
``` transform from bad JSON to well-designed JSON ```
| spath path={}
| mvexpand {}
| fromjson {}
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| eval data = json_object()
| foreach idx mode=multivalue
[eval row = mvindex(json_array_to_mv(rows), <<ITEM>>), data = json_set(data, json_extract(mvindex(columns, <<ITEM>>), "text"), row)]
| stats values(data) as _raw
| eval _raw = mv_to_json_array(_raw, true())
``` extract from well-designed JSON ```
| spath path={}
| mvexpand {}
| spath input={}
| fields - _* {}
2. Transform the poorly designed JSON into a table directly, with a little help from kv (aka extract), like this:
``` directly handle bad design ```
| spath path={}
| mvexpand {}
| fromjson {}
| fields - _* {} type
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
``` up to this point, this is basically the same as the transformation above ```
| foreach idx mode=multivalue
[eval _raw = mvappend(_raw, json_extract(mvindex(columns, <<ITEM>>), "text") . "=" . mvindex(json_array_to_mv(rows), <<ITEM>>))]
| fields - rows columns idx
| extract
This will give you the same output as above.
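If it helps, here is option 2 end to end: the same search with a base search in front and a table at the end to fix the column order. It assumes your events are in index=result as in your rex attempt; swap in the emulation below to test without real data.
``` sketch: directly handle bad design, end to end ```
index=result
| spath path={}
| mvexpand {}
| fromjson {}
| fields - _* {} type
| mvexpand rows
| eval idx = mvrange(0, mvcount(columns))
| foreach idx mode=multivalue
[eval _raw = mvappend(_raw, json_extract(mvindex(columns, <<ITEM>>), "text") . "=" . mvindex(json_array_to_mv(rows), <<ITEM>>))]
| fields - rows columns idx
| extract
| table id event delays drops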
Here is a data emulation you can use to play with the above methods and compare with your real data:
| makeresults
| eval _raw = "[{\"columns\":[{\"text\":\"id\",\"type\":\"string\"},{\"text\":\"event\",\"type\":\"number\"},{\"text\":\"delays\",\"type\":\"number\"},{\"text\":\"drops\",\"type\":\"number\"}],\"rows\":[[\"BM0077\",35602782,3043.01,0],[\"BM1604\",2920978,4959.1,2],[\"BM1612\",2141607,5623.3,6],[\"BM2870\",41825122,2545.34,7],[\"BM1834\",74963092,2409.0,8],[\"BM0267\",86497692,1804.55,44],[\"BM059\",1630092,5684.5,0]],\"type\":\"table\"}]"
``` data emulation above ```
A final note: spath has some difficulty handling arrays of arrays, which is why I used fromjson (available since Splunk 9.0) for that step.
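If you want to see that for yourself, here is a quick sketch built on the emulation: stop the pipeline right after fromjson and look at the fields it creates. rows should come back as a multivalue field with one JSON array per table row (and columns with one JSON object per column), which is exactly what the later mvexpand rows and json_array_to_mv calls rely on.
``` sketch: inspect what fromjson produces from the emulated event ```
| makeresults
| eval _raw = "[{\"columns\":[{\"text\":\"id\",\"type\":\"string\"},{\"text\":\"event\",\"type\":\"number\"},{\"text\":\"delays\",\"type\":\"number\"},{\"text\":\"drops\",\"type\":\"number\"}],\"rows\":[[\"BM0077\",35602782,3043.01,0],[\"BM1604\",2920978,4959.1,2],[\"BM1612\",2141607,5623.3,6],[\"BM2870\",41825122,2545.34,7],[\"BM1834\",74963092,2409.0,8],[\"BM0267\",86497692,1804.55,44],[\"BM059\",1630092,5684.5,0]],\"type\":\"table\"}]"
| spath path={}
| mvexpand {}
| fromjson {}
| table columns rows type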
Thanks a lot, yuanli.
That worked, you are a genius. I thought I could never structure it in Splunk.
I did speak to the dev team; apparently the JSON is structured that way to feed a particular dashboard system they use, so it has to stay in that shape for that system to consume. However, they have agreed to update the structure in the next release, which could be a few months away (at least 6 months), so in the meantime I can work with this bad JSON.
Thanks a lot again. I had searched for something like this in Splunk before and hadn't found anything.