Getting Data In

How do you solve a problem like JSON

Lazarix
Communicator

I'm having serious issues in Splunk related to searching JSON structures.
I really don't understand why JSON isn't easier to search, considering that Splunk recognises the structure and allows you to expand it in syntax-highlighted mode.

The issue I'm having is that it appears to be impossible to produce a simple table of
value.data.timestamp, value.name.localizedValue, value.data.count
ultimately across multiple value.name.localizedValue entries, in order to build a timechart that shows metrics over time, or to use this data inside ITSI as metrics for KPIs.

[Screenshot: one expanded JSON event, showing the value node and its nested data nodes]
Above is one expanded event; inside the value node there are multiple data nodes, each with this structure. Each data node contains one name.localizedValue (effectively the metric name).
I have tried, unsuccessfully, to use spath and mvexpand (admittedly, I don't fully understand these commands or how to use them to best effect) and I can't get them to format the data the way I want to display it. But at the same time, why should we have to put over 500 characters into a search bar in order to search this type of structured data?
If the solution is ultimately to flatten the data so that you can search it, why have JSON at all?

The way I'd expect to see the data is:
Table:
| value.data.timestamp | value.name.localizedValue | value.data.count |
| 2019-01-23T13:10:00Z | CPU Time                  | 15               |
| 2019-01-23T13:11:00Z | CPU Time                  | 16               |

As an aside, is there any way to make this process easier?


ddrillic
Ultra Champion

Field aliases helped us a lot. Something like -

FIELDALIAS-alias01 = "apiRequest.reqInfo.clientID" AS clientID
FIELDALIAS-alias02 = "apiResponse.apiResponse.services{}.servHttpStatus" AS servHttpStatus
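
For context (an assumption on my part, not stated in the post above): FIELDALIAS settings like these normally live in props.conf under the stanza for the relevant sourcetype. The stanza name below is just a placeholder:

# props.conf -- "my_json_sourcetype" is a placeholder stanza name
[my_json_sourcetype]
FIELDALIAS-alias01 = "apiRequest.reqInfo.clientID" AS clientID
FIELDALIAS-alias02 = "apiResponse.apiResponse.services{}.servHttpStatus" AS servHttpStatus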

mhoogcarspel_sp
Splunk Employee

Took this as an exercise in my spare time, but to be honest, I'm sceptical how well this scales.

... your search for events...
| fields _time _raw
| spath path="value{}" 
| mvexpand value{} 
| streamstats count AS level1 
| eval _raw='value{}' 
| spath 
| fields - _raw 
| mvexpand data{}.timeStamp 
| streamstats count AS level2 BY level1 
| eval value=mvindex('data{}.count',level2-1) 
| eval _time=strptime(('data{}.timeStamp'),"%Y-%m-%dT%H:%M:%S%Z") 
| fields _time value name.* type unit 
| table *

FrankVl
Ultra Champion

Ok, so there are even different measurements in the same event.

So:
Step 1: split out the different measurements (value{}).
Step 2: extract the name of each measurement (name.value) and split out the data points (data{}).
Step 3: extract the timestamp and count of each data point.

...your search to get to this data...
| spath output=value path=value{} 
| table value 
| mvexpand value 
| spath input=value path=name.value output=name 
| spath input=value path=data{} output=data 
| table data,name 
| mvexpand data 
| spath input=data 
| fields - data 
| eval _time = strptime(timeStamp,"%Y-%m-%dT%H:%M:%S%Z")
| table _time,name,count
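
If the end goal is the timechart mentioned in the question, one option (added here for illustration; the 1-minute span is just an example) is to append a timechart to the search above:

| timechart span=1m avg(count) by name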

Lazarix
Communicator

This does exactly as expected!
Thank you so much.
I'm starting to get a better appreciation for JSON now; I just wish it were easier to search in Splunk.


kamlesh_vaghela
SplunkTrust

@Lazarix

Can you please try the search below?

YOUR_SEARCH
| spath path=value{} output=data
| mvexpand data
| eval _raw=data
| kv
| spath path=data{} output=data
| mvexpand data
| eval _raw=data
| kv
| table id "name.localizedValue" "name.value" type unit data timeStamp count

My Sample Search:

| makeresults | eval _raw="{\"value\": [{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 15.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 16.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 15.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 13.0}],\"id\": \"1\",\"name\": {\"value\": \"CpuTime\",\"localizedValue\": \"CPU Time\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Seconds\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 3.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 2.0}],\"id\": \"2\",\"name\": {\"value\": \"Requests\",\"localizedValue\": \"Requests\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Count\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 12.0}],\"id\": \"3\",\"name\": {\"value\": \"AppConnections\",\"localizedValue\": \"Connections\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Count\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 0.0}],\"id\": \"4\",\"name\": {\"value\": \"Http4xx\",\"localizedValue\": \"Http 4xx\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Count\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 0.0}],\"id\": \"5\",\"name\": {\"value\": \"Http5xx\",\"localizedValue\": \"Http Server Errors\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Count\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 17.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 17.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 18.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 13.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 17.0}],\"id\": \"6\",\"name\": {\"value\": \"MemoryWorkingSet\",\"localizedValue\": \"Memory working set\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Bytes\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 3.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 2.0}],\"id\": \"7\",\"name\": {\"value\": \"AverageResponseTime\",\"localizedValue\": \"Average Response Time\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Seconds\"}]}" | spath path=value{} output=data | mvexpand data | eval _raw= data | kv | spath path=data{} output=data | mvexpand data | eval _raw=data | kv | table id "name.localizedValue" "name.value" type unit data timeStamp count

Thanks


FrankVl
Ultra Champion

Generally speaking, Splunk works best with more atomic events: events with a single timestamp and then some fields with values. Formatting those as JSON works fine.

Splunk doesn't work as easily when you have events like this, which actually contain multiple events (or data points) in one. You'll then need to do some processing to split them out. Not impossible, but not always straightforward if you're not too experienced with it.

It would help if you could share a sample raw json event, so we can play a bit with that and give you some suggestions on how to best split it into the individual data points.

At a high level, what I would try is to extract the data points into a multivalued field (say, data), then mvexpand on that field, then parse out the individual fields of each data point (timestamp, count), parse the timestamp and assign it to _time. Then you should be able to apply the timechart command; a rough sketch of that approach follows below.
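
In SPL terms, that approach would look roughly like this. This is only a minimal sketch: the path data{} and the field names timeStamp and count are assumptions, since no sample event had been shared at this point in the thread.

...your search for events...
| spath path=data{} output=data
| mvexpand data
| spath input=data
| eval _time=strptime(timeStamp,"%Y-%m-%dT%H:%M:%S%Z")
| timechart span=1m avg(count)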


Lazarix
Communicator

Thanks for the information, Frank.
I've uploaded an example of the JSON here:
https://gist.github.com/IanMoroney/34369128701f67ce0a1b159332356665

I understand the approach around atomic data points, and I agree that Splunk is more suited to that. I suppose I wouldn't expect to be forced to flatten a structured, relational data source in order to read it, as the associations are contained within the structure.
But, you live and learn!


kamlesh_vaghela
SplunkTrust
SplunkTrust

@Lazarix

Can you please share a sample JSON event, so we can help you?


Lazarix
Communicator

mayurr98
Super Champion

Lazarix
Communicator

As mentioned in the question, yes I did, and I also tried combinations of spath and mvexpand.
