Hi,
I have the following JSON object, which is indexed via the default JSON extraction (INDEXED_EXTRACTIONS):
{
"assetId": 123456,
"cloudProvider": {
"aws": {
"ec2": {
...
},
"tags": [
{
"key": "AAA",
"value": "aaa"
},
{
"key": "BBB",
"value": "bbb"
},
{
"key": "CCC",
"value": "ccc"
}
]
}
}
}
I'm attempting to rewrite the following original search using tstats:
...
| spath output=AWS_TAGS path="cloudProvider.aws"
| stats latest(AWS_TAGS) AS AWS_TAGS by assetId
| spath input=AWS_TAGS output=AWS_TAGS path="tags{}"
| eval AWS_TAGS=mvmap(AWS_TAGS,spath(AWS_TAGS,"key")."::".spath(AWS_TAGS,"value"))
This creates the AWS_TAGS multivalue list with the result like this for each assetId:
AAA::aaa
BBB::bbb
CCC::ccc
The issue with tstats is that the JSON object found at the path 'cloudProvider.aws' does not exist as far as tstats is concerned, i.e. there is no JSON object value for TERM(cloudprovider.aws).
That's why my original search used spath: to explicitly grab the JSON object at 'cloudprovider.aws'. That let me get the latest tags for each assetId and preserve the key-value pairs with mvmap.
tstats only sees the terms cloudprovider.aws.tags{}.key and cloudprovider.aws.tags{}.value.
I could collect those with tstats values(), but the result may or may NOT be the latest. It would also be tricky to line them up as key-value pairs.
I definitely get the fact that tstats looks for terms in tsidx files so _raw is not searched.
I guess the ask here is, any idea how to get the cloudprovider.aws JSON object extracted for tstats at searchtime?
If no key is ever missing its value, you can potentially do something simple to present aws_tags in <key>::<value> format:
| tstats latest(cloudprovider.aws.tags{}.key) as key latest(cloudprovider.aws.tags{}.value) as value where <your filter>
by assetId
| eval idx = mvrange(0, mvcount(key))
| eval AWS_TAGS = mvmap(idx, mvindex(key, idx) . "::" . mvindex(value, idx))
| fields - key value
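To sanity-check the pairing logic without touching real data, here is a minimal sketch that fakes the two multivalue fields with makeresults. The sample values are copied from the question. It assumes that mvrange generates the index list and that key and value have the same length and order.

```
| makeresults
| eval key=split("AAA,BBB,CCC", ","), value=split("aaa,bbb,ccc", ",")
| eval idx=mvrange(0, mvcount(key))
| eval AWS_TAGS=mvmap(idx, mvindex(key, idx) . "::" . mvindex(value, idx))
| fields - key value idx
```

If a key ever arrives without a value, the positions shift and the pairs get mismatched, which is why this only holds when nothing is missing.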
I think the unfortunate reality is that this specific JSON structure for tags becomes equivalent to a multivalue field when converted with spath.
When you apply latest() to a multivalue field like you suggested, it returns only a single value from the multivalue list, so mvcount will always return exactly 1, or NULL if the field is empty.
values() has to be used instead of latest() to capture the full list, BUT there is no guarantee that all of those values come from the latest _time. If a tag was removed 2 days ago, I don't want that tag in my report.
This is why I try to parse _raw to capture whole JSON objects with all tags nested within. This way, the entire JSON object survives the latest() function with its nested tag list intact.
Note that AWS's IMDSv2 metadata has this exact same JSON structure, so the problem persists across the AWS space as well. Since instance tagging is an industry standard, this isn't some random edge case.
What Splunk is really missing is something like a latest_values() function, which would work like values() but return only the values whose _time matches the latest(_time) of a unique identifier field.
You are correct. Thanks for pointing out this subtle behavior of latest. In addition to tstats, I verified that this behavior exists in stats as well; in fact, it applies to any multivalue data, not just JSON arrays. (I don't believe that latest_values would really solve the problem, because | stats values() discards the original order; a latest_list would work, but tstats doesn't support list to begin with.)
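For anyone who wants to check this for themselves, a small run-anywhere sketch along these lines should do it. The assetId and tag keys are just the sample values from the question, and the output field names (latest_key, all_keys, latest_count, all_count) are made up for the comparison.

```
| makeresults
| eval assetId=123456, key=split("AAA,BBB,CCC", ",")
| stats latest(key) as latest_key values(key) as all_keys by assetId
| eval latest_count=mvcount(latest_key), all_count=mvcount(all_keys)
```

Comparing latest_count with all_count shows how much of the list each function keeps.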
Hi @ClubMed
The only way to do this with tstats might be to get the fields extracted in a datamodel first, however I suspect that might defeat the purpose of using tstats as it would be slower than just using your original search.
Another option might be to save your original search as a scheduled report which dumps the key/value/assetid data into a lookup which you could quickly retrieve with | inputlookup.
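As a rough sketch of the lookup approach, the scheduled report could end like this, where ... stands for your original search unchanged. The lookup name hypothetical_aws_tags_lookup is a placeholder, and this assumes a lookup definition by that name already exists. The tags are joined into one string first, on the assumption that "|" never appears in a tag key or value:

```
...
| eval AWS_TAGS=mvjoin(AWS_TAGS, "|")
| table assetId AWS_TAGS
| outputlookup hypothetical_aws_tags_lookup
```

Retrieval would then split the string back into a multivalue field:

```
| inputlookup hypothetical_aws_tags_lookup
| eval AWS_TAGS=split(AWS_TAGS, "|")
```

Because the report overwrites the lookup on each run, the table only ever holds whatever the last scheduled run considered latest.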
Yep, I've conceded to storing the data as KV store lookups (it's a large table).
It is a struggle because I have a personal dislike for lookups: the search logic is abstracted away, and the stanza is a pain in the butt to locate in a savedsearches.conf file.
Why use lookups when tstats gives the result in 3 seconds? Could save tstats as a macro too.