Splunk Search

How to write array within array for million records?

gheribhai1234
Engager

Hey Team,
I have a million records to search through.
The record structure is given below.
My requirement is to get the total length of aValues across a million records. For example, if the aValues lengths for two records are 10 and 12, I should display 22.

 

{
  "resp": {
    "meta": {
      "bValues": [
        {
          "aValues": []
        }
      ]
    }
  }
}

 


I tried the Splunk query below, but it's not working for a million records. It only works for a small set, like 10 records:

 

index=myIndex| spath path=resp.meta.bValues{} output=BValues
| stats count by BValues
| spath input=BValues path=aValues{} output=aValues
| mvexpand aValues
| spath input=aValues
| spath input=BValues
| fields - BValues count aValues* | stats count
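For concreteness, the desired computation can be sketched outside Splunk. This is a minimal Python sketch with hypothetical sample data (aValues lengths 10 and 12, matching the example above), not a replacement for the SPL:

```python
# Hypothetical sample events following the structure above; aValues lengths 10 and 12.
events = [
    {"resp": {"meta": {"bValues": [{"aValues": list(range(10))}]}}},
    {"resp": {"meta": {"bValues": [{"aValues": list(range(12))}]}}},
]

# Sum the aValues array lengths across all events and all bValues entries.
total = sum(
    len(b.get("aValues", []))
    for event in events
    for b in event["resp"]["meta"]["bValues"]
)
print(total)  # 10 + 12 = 22
```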

 


tscroggins
Influencer

@gheribhai1234 

By "records," do you mean Splunk events or JSON array elements? An answer using spath or native JSON key-value mode depends on the context.

If you're okay with an approximate value for character length, you can try a regex approach. This assumes all values of aValues are strings and includes whitespace, quotes, and commas. It's not a replacement for a JSON parser.

 

index=myIndex
| rex "(?s)aValues\\s*:\\s*\\[\\s*(?<aValues>['\"].*?['\"])\\s*\\]"
| stats sum(eval(len(aValues))) as len_aValues
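The same regex idea can be tried in Python. This is a hedged sketch against a hypothetical raw event in the unquoted style shown in this thread; it captures the quoted contents of aValues and measures their character length, quotes and commas included:

```python
import re

# Hypothetical raw event text in the unquoted key style shown in this thread.
raw = 'bValues: [ { aValues: ["abc", "de"] } ]'

# Same pattern idea as the rex above: capture the quoted contents of aValues.
m = re.search(r"(?s)aValues\s*:\s*\[\s*(['\"].*?['\"])\s*\]", raw)
length = len(m.group(1)) if m else 0
print(length)  # character length of '"abc", "de"', including quotes and the comma
```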

 

 


gheribhai1234
Engager

By records I mean events. A million events, each with aValues and bValues; both are JSON arrays.
I want to get the total length of aValues across all events.
For example, suppose there are two events, bValues in each event has 2 elements, and the aValues in every bValue has 4 elements.
After this query, I should get the length of aValues across all bValues for all events.
In that example, the output should be 16, as there are 16 aValues in total across all bValues for all events.
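That arithmetic can be sketched in Python (hypothetical data, simplified to just the bValues/aValues nesting described above):

```python
# Hypothetical data shaped as described: 2 events, each with 2 bValues,
# each bValue holding 4 aValues elements.
events = [
    {"bValues": [{"aValues": [1, 2, 3, 4]}, {"aValues": [5, 6, 7, 8]}]}
    for _ in range(2)
]

total = sum(len(b["aValues"]) for event in events for b in event["bValues"])
print(total)  # 2 events * 2 bValues * 4 aValues = 16
```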


PickleRick
SplunkTrust

What do you mean by "length"? Anyway, if you're doing a stats by some field which has several million distinct values... especially long ones...

It simply won't fly. Even though there is no hard limit on stats that I can think of, the number of distinct values you classify by will probably kill your environment, or your search will get terminated when you hit a resource limit.


gheribhai1234
Engager

By length I mean the JSON array size inside aValues and bValues.
I also tried:

index=myIndex | spath path=resp.meta.bValues{}.aValues{} output=allAValues | stats count

It's not giving the correct count. I expected the count of all aValues.

This works for a small count, like 10, but not with a million events.


PickleRick
SplunkTrust

OK, so as a result you want a single number being the cumulative number of aValues inside bValues?

If they're always at the same nesting level it should be relatively simple.

First get all the aValues

index=myIndex | spath resp.meta.bValues{}.aValues{} output=aValues 

Then check how many values are there within a single event

| eval cnt=mvcount(aValues)

 Then finally sum the counts to get a big total

| stats sum(cnt) as "Overall Count"

Is this what you want?

 

Your approach with spath was relatively OK, but stats count counts events, and you probably have multiple values within a single event, so your overall result was off versus what you expected. It's not about the number of events.
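To illustrate the difference outside Splunk, here is a small Python analogue (hypothetical values) of stats count versus summing per-event mvcount:

```python
# Each inner list stands in for the multivalue aValues field of one event.
events_avalues = [
    ["v1", "v2", "v3", "v4"],  # event 1: 4 values in aValues
    ["v5", "v6", "v7", "v8"],  # event 2: 4 values in aValues
]

stats_count = len(events_avalues)                      # like | stats count (counts events)
overall_count = sum(len(mv) for mv in events_avalues)  # like eval cnt=mvcount(aValues) | stats sum(cnt)

print(stats_count, overall_count)  # 2 events vs 8 values
```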

gheribhai1234
Engager

Thanks for the quick response.
The result with the query you suggested is the same as mine.
aValues is a JSON array with multiple fields in each JSON element.
I don't know if it's counting all the fields of each aValues element.

The value I'm getting is much higher with both queries.
Is there any simple function to get the size of a JSON array?
Thanks,
Gheri.


gheribhai1234
Engager

I think the query seems to be working, as you suggested.
I manually checked some of the events using the query below:

index=myIndex | spath path=resp.meta.bValues{}.aValues{} output=aValues | eval cnt=mvcount(aValues) | table cnt, eventId

My bad.
Just reconfirming with you that the above query works with a JSON array in aValues as well, right?
See one sample event below.
Every event has bValues as a JSON array, and every bValue has aValues as a JSON array.
I want to count aValues for all bValues across all events.

{
  "resp": {
    "meta": {
      "bValues": [
        {
          "aValues": [{"e": "d", "s": "h"}, {"g": "i", "r": "t"}],
          "x": "y",
          "a": "b"
        },
        {
          "aValues": [{"e1": "d1", "s1": "h1"}, {"g1": "i1", "r1": "t1"}],
          "x1": "y1",
          "a1": "b1"
        }
      ]
    }
  }
}

 


tscroggins
Influencer

@gheribhai1234 

In your example, aValues is an array of objects:

"aValues": [{"e": "d", "s": "h"}, {"g": "i", "r": "t"}]
"aValues": [{"e1": "d1", "s1": "h1"}, {"g1": "i1", "r1": "t1"}]

Each array has two elements for a total of four:

{"e": "d", "s": "h"}
{"g": "i", "r": "t"}

{"e1": "d1", "s1": "h1"}
{"g1": "i1", "r1": "t1"}

Each element has two keys for a total of eight:

e
s
g
r
e1
s1
g1
r1

Do you want a count of all elements (four) or a count of all keys (eight)?

You can return the element count using:

| stats sum(eval(mvcount(json_array_to_mv(json_extract(_raw, "resp.meta.bValues{}.aValues{}"))))) as element_count

You can return the key count using:

| eval aValues_mv=json_array_to_mv(json_extract(_raw, "resp.meta.bValues{}.aValues{}")), aValues_key_array=mvmap(aValues_mv, json_keys(aValues_mv)), aValues_keys=mvmap(aValues_key_array, json_array_to_mv(aValues_key_array))
| stats sum(eval(mvcount(aValues_keys))) as key_count
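As a sanity check of the counting logic outside Splunk, here is a minimal Python sketch over the two sample arrays. It mirrors the intent of the SPL above (element count vs. key count), not its internals:

```python
import json

# The two sample aValues arrays quoted above.
arrays = [
    '[{"e": "d", "s": "h"}, {"g": "i", "r": "t"}]',
    '[{"e1": "d1", "s1": "h1"}, {"g1": "i1", "r1": "t1"}]',
]

element_count = 0  # analogue of summing mvcount(json_array_to_mv(...))
key_count = 0      # analogue of counting json_keys per element

for text in arrays:
    elements = json.loads(text)
    element_count += len(elements)                    # objects in each array
    key_count += sum(len(obj) for obj in elements)    # keys in each object

print(element_count, key_count)  # 4 elements, 8 keys
```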

It may be possible to flatten the array more cleverly using json_extend, but the examples above illustrate methods for counting elements and keys in complex objects.

You may encounter memory issues over large sets of events. If the counts appear correct for smaller sets but larger sets return one or more errors, I suggest submitting a new question for the error received if an answer isn't already available.


PickleRick
SplunkTrust

I'm not sure about your solution, but spath doesn't handle arrays of arrays well; it simply loses elements. That's why I said that unless you have some restrictions on your JSON structure, it will be very hard to come up with a generalized solution.


gheribhai1234
Engager

Thanks, team.
I am getting the same results from all three Splunk queries that we discussed, so this seems to be the correct result.
I have one doubt about the most recent query you suggested:
json_array_to_mv: does it convert the JSON array so that every array element becomes one value?
For example, [{a:b}, {c:d}, {e:f}] --> {0:{a:b}, 1:{c:d}, 2:{e:f}}, and then when we do mvcount it counts 0, 1, 2, etc.?
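Something like this rough Python analogue of my understanding (one multivalue entry per top-level array element; this is a sketch of the idea, not the Splunk internals):

```python
import json

def json_array_to_mv_analogue(array_text):
    """Rough Python analogue of json_array_to_mv:
    one multivalue entry per top-level array element."""
    return [json.dumps(element) for element in json.loads(array_text)]

mv = json_array_to_mv_analogue('[{"a": "b"}, {"c": "d"}, {"e": "f"}]')
print(len(mv))  # mvcount analogue: 3, one per top-level element
```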

Thanks for helping me with this
@tscroggins @PickleRick 


PickleRick
SplunkTrust

I also thought of using the rex command to extract "atomic" values regardless of the JSON structure, but it would quickly get complicated, since you'd have to account for simple values and arrays, and you'd have to handle simple values within arrays that also contain structures...

 


PickleRick
SplunkTrust

No, it is not. To be honest, I don't know if there is any solution at all to your problem. After all, Splunk is not a programming language, and JSON parsing is meant only for parsing data into a "readable" format; it cannot do sophisticated summaries and such, especially if your data format is not constant...

The main problem is that spath can parse out just a given datapath (in our case, the resp.meta.bValues{}.aValues{} field), which simply gets extracted as a list of strings, without Splunk digging into it and interpreting what is there. So both my solution and yours (I forgot that a count over a multivalue field will indeed give the count of all values, so your solution is more concise) are "good" in the sense that they do what they're asked to do.

But if you want to count all the "leaves" of your JSON structure... well, I can't think of any reasonable way to do so.

You could try extracting the aValues and then extracting all fields and subfields with a plain spath without providing a datapath, but then you're left with the task of filtering the results and checking which ones are atomic values and which are structures that got further spath-ed.

So it's not that straightforward, I'm afraid.


tscroggins
Influencer

@gheribhai1234

@PickleRick does make a good point about SPL not being a typical programming language. If SPL doesn't do what you need, you do have the option of writing custom commands in Python, but that's beyond the scope of this question.
