How to handle JSON object extraction with tstats?

ClubMed
Path Finder

Hi,

I have the following JSON object, which is indexed via the default JSON extraction (INDEXED_EXTRACTIONS):

{
    "assetId": 123456,
    "cloudProvider": {
        "aws": {
            "ec2": {
                ...
            },
            "tags": [
                {
                    "key": "AAA",
                    "value": "aaa"
                },
                {
                    "key": "BBB",
                    "value": "bbb"
                },
                {
                    "key": "CCC",
                    "value": "ccc"
                }
            ]
        }
    }
}

 


I'm attempting to rewrite the following original search to use tstats:

...
| spath output=AWS_TAGS path="cloudProvider.aws"
| stats latest(AWS_TAGS) AS AWS_TAGS by assetId
| spath input=AWS_TAGS output=AWS_TAGS path="tags{}"
| eval AWS_TAGS=mvmap(AWS_TAGS,spath(AWS_TAGS,"key")."::".spath(AWS_TAGS,"value"))

 


This produces a multivalue AWS_TAGS field for each assetId, like this:
AAA::aaa
BBB::bbb
CCC::ccc

The issue is that tstats has no JSON object at the path 'cloudProvider.aws', i.e. there is no indexed value for TERM(cloudprovider.aws).

That's why my original search used spath to explicitly grab the JSON object at 'cloudProvider.aws'. That let me take the latest tags for each assetId and preserve the key-value pairs with mvmap.

tstats only sees the terms cloudprovider.aws.tags{}.key and cloudprovider.aws.tags{}.value.

I could collect those with tstats values(), but the values may or may NOT be the latest, and lining them up as key-value pairs would be tricky.

I understand that tstats looks for terms in the tsidx files, so _raw is never searched.

So the ask is: any idea how to get the cloudprovider.aws JSON object extracted for tstats at search time?

 


yuanliu
SplunkTrust

If no key is missing its value, something simple may be enough to present AWS_TAGS in <key>::<value> format:

| tstats latest(cloudprovider.aws.tags{}.key) as key latest(cloudprovider.aws.tags{}.value) as value where <your filter>
  by assetId
| eval idx = mvrange(0, mvcount(key))
| eval AWS_TAGS = mvmap(idx, mvindex(key, idx) . "::" . mvindex(value, idx))
| fields - key value idx
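
The pairing step above can also be written with mvzip, which joins two multivalue fields element by element. A minimal sketch that runs anywhere, using makeresults with hard-coded sample values in place of the tstats output (the key and value contents here are made up for illustration):

```spl
| makeresults
| eval key=split("AAA,BBB,CCC", ","), value=split("aaa,bbb,ccc", ",")
| eval AWS_TAGS=mvzip(key, value, "::")
| fields - key value
```

Like the index-based version, this only lines up correctly when key and value hold the same number of entries in the same order.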

 


ClubMed
Path Finder

I think the unfortunate reality is that this JSON structure for tags is equivalent to a multivalue field once it is converted with spath.

When you apply latest() as you suggested, even to a multivalue field, it returns only a single value from the multivalue list, so mvcount never returns more than 1.

values() has to be used instead of latest() to capture the full list, BUT there is no guarantee that every value actually comes from the latest _time. If a tag was removed 2 days ago, I don't want that tag in my report.

This is why I parse _raw to capture the whole JSON object with all tags nested within it. That way, the entire JSON object survives latest() with its nested tag list intact.

Note that AWS's IMDSv2 metadata has this exact same JSON structure, so the problem persists in the AWS space as well. Instance tagging is an industry standard, so this isn't some random edge case.

What Splunk is really missing is something like a latest_values() function, which would work like values() but return only the values whose _time matches the latest(_time) for a unique identifier field.
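
Until something like that exists, one possible approximation is to bucket by _time in tstats and keep only the newest bucket per assetId. This is a sketch under assumptions, not a tested solution: <your index> is a placeholder, the field names are copied from this thread and must match the casing of your indexed fields, and it assumes at most one event per assetId per second.

```spl
| tstats values(cloudprovider.aws.tags{}.key) AS key values(cloudprovider.aws.tags{}.value) AS value where index=<your index> by assetId, _time span=1s
| eventstats max(_time) AS latest_time by assetId
| where _time == latest_time
| fields - latest_time
```

This drops tags that are absent from the newest event, but values() still deduplicates and sorts each field independently, so key and value cannot be reliably lined up as pairs afterwards. Splitting by one-second buckets can also produce a lot of rows over a long time range.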

yuanliu
SplunkTrust

You are correct.  Thanks for pointing out this subtle behavior of latest.  In addition to tstats, I verified that this behavior exists in stats as well; in fact, this applies to any multivalue data, not just JSON array.  (I don't believe that latest_values will really solve the problem because | stats values() discards original order; a latest_list would work but tstats doesn't support list to begin with.)


KendallW
Communicator

Hi @ClubMed 

The only way to do this with tstats might be to get the fields extracted in a data model first. However, I suspect that would defeat the purpose of using tstats, as it would be slower than just using your original search.

Another option might be to save your original search as a scheduled report which dumps the key/value/assetid data into a lookup which you could quickly retrieve with | inputlookup.
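
A sketch of that lookup approach, assuming a hypothetical CSV lookup named aws_tags_by_asset.csv and that no tag contains a semicolon. The multivalue field is flattened before writing and split again on read:

```spl
<your original search producing assetId and AWS_TAGS>
| eval AWS_TAGS=mvjoin(AWS_TAGS, ";")
| table assetId AWS_TAGS
| outputlookup aws_tags_by_asset.csv
```

Reading it back in a report:

```spl
| inputlookup aws_tags_by_asset.csv
| makemv delim=";" AWS_TAGS
```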


ClubMed
Path Finder

Yep, I've conceded to storing the data as KV Store lookups (it's a large table).

It is a struggle because I personally dislike lookups: the search logic is abstracted away, and the stanza is a pain to locate in a savedsearches.conf file.

Why use lookups when tstats gives the result in 3 seconds? Could save tstats as a macro too.
