Hi,
I have documents similar to the one below:
request_id: 12345
revision: 123
other_field: stuff
my_precious: { [-]
1648665400.774453: { [-]
keys: [ [-]
key:key1,
size: 329
]
op: operation_1
}
1648665400.7817056: { [-]
keys: [ [-]
key:key2,
size: 785
]
op: operation_2
}
1648665400.7847242: { [-]
keys: [ [-]
key:key4,
size: 632
]
op: operation_1
}
1648665400.7886434: { [-]
keys: [ [-]
key:key5,
size: 1938
]
op: operation_3
}
1648665400.7932374: { [-]
keys: [ [-]
key:key3,
size: 23
]
op: operation_2
}
I currently have a query that gets the frequency of each key, but how can I display the "size" information alongside it?
My query right now is:
rex (?<keys>"(?<=key:).*?(?=,)") |stats count by keys | sort -count | head 10
which displays the keys with the highest counts, but it doesn't show each key's associated "size".
Can't quite figure this out...any help is appreciated!
Your data looks like conformant JSON, so you should process it as such. The search will be much more maintainable if you use built-in functions. (This question is very similar to https://community.splunk.com/t5/Splunk-Search/How-to-create-a-table-using-SPATH-usage-on-simple-JSON)
Judging from the way you displayed the sample data, Splunk has already extracted fields such as my_precious.1648665400.774453.keys{}.key, my_precious.1648665400.774453.keys{}.size, my_precious.1648665400.774453.op, and so on. (If not, it's very easy to extract them with spath.) The result looks something like this:
_raw | my_precious.1648665400.774453.keys{}.key | my_precious.1648665400.774453.keys{}.size | my_precious.1648665400.774453.op | my_precious.1648665400.7817056.keys{}.key | ... |
{ "request_id": "12345", "revision": "123", "other_field": "stuff", "my_precious": { "1648665400.774453": { "keys": [ { "key": "key1" }, { "size": "329" } ], "op": "operation_1" }, "1648665400.7817056": { "keys": [ { "key": "key2" }, { "size": "785" } ], "op": "operation_2" }, "1648665400.7847242": { "keys": [ { "key": "key4" }, { "size": "632" } ], "op": "operation_1" }, "1648665400.7886434": { "keys": [ { "key": "key5" }, { "size": "1938" } ], "op": "operation_3" }, "1648665400.7932374": { "keys": [ { "key": "key3" }, { "size": "23" } ], "op": "operation_2" } } } | key1 | 329 | operation_1 | key2 |
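If Splunk has not extracted these fields automatically, a bare spath call is usually enough. This is only a minimal sketch, assuming the JSON document is the entire content of _raw and is short enough to fall within spath's extraction limits:

```
| spath ``` no arguments: parse _raw as JSON and extract every path as a field ```
```

Array elements come out with the {} suffix, which is where names like my_precious.1648665400.774453.keys{}.key come from.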
As in the linked solution, we use a combination of rename, foreach, mvappend, mvexpand, split, and mvindex to get each key and its corresponding size into their own fields.
| rename my_precious.*.*.keys{}.* as key_*_*_*, my_precious.*.*.op as key_*_*_op ``` make names friendly ```
| foreach key_*_key
[eval keys = mvappend(keys, <<FIELD>> . "|" . key_<<MATCHSTR>>_size . "|" . key_<<MATCHSTR>>_op)] ``` zip up key, size, and op into one string ```
| mvexpand keys ``` make each key its own event so we can apply stats easily ```
| eval keys = mvmap(keys, split(keys, "|")) ``` untie keys so we can dereference attributes ```
| eval key = mvindex(keys, 0), size = mvindex(keys, 1), op = mvindex(keys, 2)
| stats count by key size
Your sample data will give
key | size | count |
key1 | 329 | 1 |
key2 | 785 | 1 |
key3 | 23 | 1 |
key4 | 632 | 1 |
key5 | 1938 | 1 |
Of course, every count is 1 in the sample. You didn't ask about op, but I thought you might be interested in it at some point, so it is extracted as well.
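To get back to the top-10 list from the original query, the final stats line can be followed by the same sort and head. This is a sketch, assuming each key always reports the same size; if a key appears with several sizes, it will occupy several rows:

```
| stats count by key size ``` key and size come from the mvindex step above ```
| sort - count ``` highest counts first ```
| head 10
```

Adding op to the by clause would break the counts down by operation as well.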
When Splunk processes events in the search pipeline, each step sees only the results of the previous step; Splunk "forgets" what it had before. So after "stats count by something", you have only a summarized table for further processing, not the original events.
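Because only the fields named in the stats command survive, size has to be carried through explicitly, either in the by clause or as an aggregate. A minimal sketch, assuming key and size have already been extracted as single-value fields on each event (those field names are an assumption here, not something the original query produces):

```
| stats count values(size) as size by key ``` values() keeps every distinct size seen for each key ```
| sort - count
| head 10
```

With values(size), a key that was seen with more than one size still gets a single row, and the size column becomes multivalued.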
Anyway, I'm not sure what you want, since you're counting keys but then asking about a single value of size. Something doesn't fit here.