Getting Data In

How do you solve a problem like JSON

Lazarix
Communicator

I'm having serious issues in Splunk related to searching JSON structures.
I really don't understand why JSON isn't easier to search, considering that Splunk recognises the structure and allows you to expand it in syntax-highlighted mode.

The issue I'm having is that it appears to be impossible to produce a simple table of
value.data.timestamp, value.name.localizedValue, value.data.count
ultimately across multiple value.name.localizedValue entries, in order to build a timechart that shows metrics over time, or to use this data inside ITSI as metrics for KPIs.

[Screenshot: one expanded JSON event, showing the value node and its nested data nodes]
Above is one expanded event; inside the value node there are multiple data nodes, each with this structure. Each data node contains one name.localizedValue (effectively the metric name).
I have tried, unsuccessfully, to use spath and mvexpand (admittedly, I don't fully understand these commands or how to use them to best effect) and I can't get them to format the data the way I want to display it. But at the same time, why should we have to put over 500 characters into a search bar in order to search this type of structured data?
If the solution is ultimately to flatten the data so that you can search it, why have JSON at all?

The way I'd expect to see the data is:
Table:
| value.data.timestamp | value.name.localizedValue | value.data.count |
| 2019-01-23T13:10:00Z | CPU Time                  | 15               |
| 2019-01-23T13:11:00Z | CPU Time                  | 16               |

As an aside, is there any way to make this process easier?


ddrillic
Ultra Champion

Field aliases helped us a lot. Something like -

FIELDALIAS-alias01 = "apiRequest.reqInfo.clientID" AS clientID
FIELDALIAS-alias02 = "apiResponse.apiResponse.services{}.servHttpStatus" AS servHttpStatus
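
For context (an assumption on my part, not stated in the post above): FIELDALIAS settings like these normally live in props.conf under the stanza for the relevant sourcetype. The stanza name below is just a placeholder:

# props.conf -- "my_json_sourcetype" is a placeholder stanza name
[my_json_sourcetype]
FIELDALIAS-alias01 = "apiRequest.reqInfo.clientID" AS clientID
FIELDALIAS-alias02 = "apiResponse.apiResponse.services{}.servHttpStatus" AS servHttpStatus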

mhoogcarspel_sp
Splunk Employee

Took this as an exercise in my spare time, but to be honest, I'm sceptical how well this scales.

... your search for events...
| fields _time _raw
| spath path="value{}" 
| mvexpand value{} 
| streamstats count AS level1 
| eval _raw='value{}' 
| spath 
| fields - _raw 
| mvexpand data{}.timeStamp 
| streamstats count AS level2 BY level1 
| eval value=mvindex('data{}.count',level2-1) 
| eval _time=strptime(('data{}.timeStamp'),"%Y-%m-%dT%H:%M:%S%Z") 
| fields _time value name.* type unit 
| table *

FrankVl
Ultra Champion

Ok, so there are even different measurements in the same event.

So:
Step 1: split out the different measurements (value{}).
Step 2: extract the name of each measurement (name.value) and split out the data points (data{}).
Step 3: extract the timestamp and count of each data point.

...your search to get to this data...
| spath output=value path=value{} 
| table value 
| mvexpand value 
| spath input=value path=name.value output=name 
| spath input=value path=data{} output=data 
| table data,name 
| mvexpand data 
| spath input=data 
| fields - data 
| eval _time = strptime(timeStamp,"%Y-%m-%dT%H:%M:%S%Z")
| table _time,name,count
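
If the end goal is the timechart mentioned in the question, one option (added here for illustration; the 1-minute span is just an example) is to append a timechart to the search above:

| timechart span=1m avg(count) by name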

Lazarix
Communicator

This does exactly as expected!
Thank you so much.
I'm starting to get a better appreciation for JSON now; I just wish it were easier to search in Splunk.


kamlesh_vaghela
SplunkTrust

@Lazarix

Can you please try the search below?

YOUR_SEARCH
| spath path=value{} output=data
| mvexpand data
| eval _raw=data
| kv
| spath path=data{} output=data
| mvexpand data
| eval _raw=data
| kv
| table id "name.localizedValue" "name.value" type unit data timeStamp count

My Sample Search:

| makeresults | eval _raw="{\"value\": [{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 15.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 16.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 15.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 13.0}],\"id\": \"1\",\"name\": {\"value\": \"CpuTime\",\"localizedValue\": \"CPU Time\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Seconds\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 3.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 2.0}],\"id\": \"2\",\"name\": {\"value\": \"Requests\",\"localizedValue\": \"Requests\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Count\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 12.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 12.0}],\"id\": \"3\",\"name\": {\"value\": \"AppConnections\",\"localizedValue\": \"Connections\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Count\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 0.0}],\"id\": \"4\",\"name\": {\"value\": \"Http4xx\",\"localizedValue\": \"Http 4xx\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Count\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 0.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 0.0}],\"id\": \"5\",\"name\": {\"value\": \"Http5xx\",\"localizedValue\": \"Http Server Errors\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Count\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 17.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 17.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 18.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 13.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 17.0}],\"id\": \"6\",\"name\": {\"value\": \"MemoryWorkingSet\",\"localizedValue\": \"Memory working set\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Bytes\"},{\"data\": [{\"timeStamp\": \"2019-01-23T13:10:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:11:00Z\",\"count\": 3.0},{\"timeStamp\": \"2019-01-23T13:12:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:13:00Z\",\"count\": 2.0},{\"timeStamp\": \"2019-01-23T13:14:00Z\",\"count\": 2.0}],\"id\": \"7\",\"name\": {\"value\": \"AverageResponseTime\",\"localizedValue\": \"Average Response Time\"},\"type\": \"Microsoft.Insights/metrics\",\"unit\": \"Seconds\"}]}" | spath path=value{} output=data | mvexpand data | eval _raw= data | kv | spath path=data{} output=data | mvexpand data | eval _raw=data | kv | table id "name.localizedValue" "name.value" type unit data timeStamp count

Thanks


FrankVl
Ultra Champion

Generally speaking, Splunk works best with more atomic events: events with a single timestamp and then some fields with values. Formatting those as JSON works fine.

Splunk doesn't work as easily when you have events like this, which actually contain multiple events (or data points) in one. You'll then need to do some processing to split them out. Not impossible, but not always straightforward if you're not too experienced with it.

It would help if you could share a sample raw json event, so we can play a bit with that and give you some suggestions on how to best split it into the individual data points.

At a high level, what I would try is to extract the data points into a multivalued field (say, data), then mvexpand on that field, then parse out the individual fields of each data point (timestamp, count), parse the timestamp and assign it to _time. Then you should be able to apply the timechart command; a rough sketch of that approach follows below.
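
In SPL terms, that approach would look roughly like this. This is only a minimal sketch: the path data{} and the field names timeStamp and count are assumptions, since no sample event had been shared at this point in the thread.

...your search for events...
| spath path=data{} output=data
| mvexpand data
| spath input=data
| eval _time=strptime(timeStamp,"%Y-%m-%dT%H:%M:%S%Z")
| timechart span=1m avg(count)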


Lazarix
Communicator

Thanks for the information, Frank.
I've uploaded an example of the JSON here:
https://gist.github.com/IanMoroney/34369128701f67ce0a1b159332356665

I understand the approach around atomic data points, and I agree that Splunk is more suited to that. I suppose I wouldn't expect to be forced to flatten a structured, relational data source in order to read it, as the associations are contained within the structure.
But, you live and learn!


kamlesh_vaghela
SplunkTrust
SplunkTrust

@Lazarix

Can you please share a sample JSON event, so we can help you?


Lazarix
Communicator

mayurr98
Super Champion

Lazarix
Communicator

As mentioned in the question, yes I did, and I also tried combinations of spath and mvexpand.
