How to do bucketing on JSON data?

ksrujana
New Member

I have JSON data similar to the example below:

{
"name":"srini",
"date":"20160801",
"distribution": { "20":1, "10":2, "10":1, "15":2, 
                            "10":3, "15":4, "20":3, "30":4 }
}
{
"name":"srini2",
"date":"20160802",
"distribution": { "3":1, "1":2, "4":1, "1":2, "1":1,
                            "3":3, "1":4, "4":3, "1":4 }
}

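I want to bucket the entries in the distribution field. Each entry goes into a bucket according to the number after the colon: (1,2) for bucket1 and (3,4) for bucket2. The quoted numbers in each bucket should then be summed; for the first record, bucket1 is 20+10+10+15 = 55. The expected result looks like this: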

date        bucket1    bucket2
20160801      55         75
20160802      10          9

javiergn
Super Champion

Hi,

Assuming your field is named json and that the only buckets are the two you mentioned (bucket1 covers the values 1 and 2, bucket2 covers the values 3 and 4), give the following a go and see if that helps:

your base search
| spath input=json path=name
| spath input=json path=date
| spath input=json path=distribution
| rex field=distribution max_match=0 "\"(?<bucket1>\d+)\":[12]"
| rex field=distribution max_match=0 "\"(?<bucket2>\d+)\":[34]"
| stats sum(bucket1) as bucket1, sum(bucket2) as bucket2 by date, name

Example:

| makeresults | fields - _time
| eval json = "
{
\"name\":\"srini\",
\"date\":\"20160801\",
\"distribution\": { \"20\":1, \"10\":2, \"10\":1, \"15\":2, 
                            \"10\":3, \"15\":4, \"20\":3, \"30\":4 }
};
{
\"name\":\"srini2\",
\"date\":\"20160802\",
\"distribution\": { \"3\":1, \"1\":2, \"4\":1, \"1\":2, \"1\":1,
                            \"3\":3, \"1\":4, \"4\":3, \"1\":4 }
}
"
| eval json = split(json, ";") 
| mvexpand json
| spath input=json path=name
| spath input=json path=date
| spath input=json path=distribution
| rex field=distribution max_match=0 "\"(?<bucket1>\d+)\":[12]"
| rex field=distribution max_match=0 "\"(?<bucket2>\d+)\":[34]"
| stats sum(bucket1) as bucket1, sum(bucket2) as bucket2 by date, name
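
If you want to sanity-check the numbers outside Splunk, here is a rough Python sketch of what the two rex commands and the final stats are doing. It is not SPL and not part of the search above; the names records, BUCKET_PATTERNS and bucket_sums are made up for illustration. The distribution objects are kept as raw text because they repeat keys, which is also why the search uses a regex rather than parsing the keys individually.

```python
import re

# The two distribution objects from the question, kept as raw text.
# They repeat keys, so parsing them with json.loads would keep only the
# last value for each repeated key and lose entries.
records = {
    "20160801": '{ "20":1, "10":2, "10":1, "15":2, "10":3, "15":4, "20":3, "30":4 }',
    "20160802": '{ "3":1, "1":2, "4":1, "1":2, "1":1, "3":3, "1":4, "4":3, "1":4 }',
}

# Same patterns as the two rex commands: capture the quoted number when
# the number after the colon is 1 or 2 (bucket1) or 3 or 4 (bucket2).
BUCKET_PATTERNS = {
    "bucket1": re.compile(r'"(\d+)":[12]'),
    "bucket2": re.compile(r'"(\d+)":[34]'),
}


def bucket_sums(distribution_text):
    """Collect every match (like max_match=0) and sum them (like stats sum)."""
    return {
        name: sum(int(m) for m in pattern.findall(distribution_text))
        for name, pattern in BUCKET_PATTERNS.items()
    }


results = {date: bucket_sums(text) for date, text in records.items()}
for date, sums in results.items():
    print(date, sums["bucket1"], sums["bucket2"])
```

With the sample data this gives 55 and 75 for 20160801, and 10 and 9 for 20160802 (3+1+4+1 for the second bucket).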
