Splunk Search

JSON parsing problem

incognito
Explorer

Hello, 

My Splunk query an API and gets a JSON answer.

Here is a sample for 1 Host (the JSON answer is very long ≈ 400 hosts)

{
    "hosts": [
        {
            "hostInfo": {
             "displayName": "host1.fr"
                                   },
             "modules": [
                {
                    "moduleType": "JAVA",
                    "instances": [
                        {
                            "Instance Name": "Test1",
                            "moduleVersion": "1.0"
                        },
                        {
                            "Instance Name": "Test2",
                            "moduleVersion": "1.1"
                        },
                        {
                            "Instance Name": "Test3",
                            "moduleVersion": "1.2"
                        }
                      ]
                  }
              ]
          }
       ]
}
 
First-of-all I have to manually parse this JSON because SPLUNK automatically gets the 1st fields of the 1st host only.
 

With this following search, I manually parse this JSON all the way through the "instances{}" array and I count the number of moduleVersion :
index="supervision_software" source="API" earliest=-1m
| spath path=hosts{}.modules{}.instances{} output=host
| fields - _raw
| mvexpand host
| spath input=host
| stats count(moduleVersion)

It displays a number of 1277 moduleVersion and it is the right number.

On the other hand with the next similar search, when I parse the JSON starting only to the 1st array ("hosts{}"), I am getting a different number of moduleVersion  :
index="supervision_software" source="API" earliest=-1m
| spath path=hosts{} output=host
| fields - _raw
| mvexpand host
| spath input=host
| stats count(modules{}.instances{}.moduleVersion)

It displays a number of 488 moduleVersion but it's incorrect.

Why is there a difference ?

Thank you.

Best regards,

Labels (3)
Tags (1)
0 Karma

isoutamo
SplunkTrust
SplunkTrust
Hi
If your above example is direct copy paste from your whole "json" it is broken. You should validate it e.g. with jq to see those errors (e.g. missing/additional "," between entries).
Until it's real and valid json splunk cannot parse it automatically.
r. Ismo
0 Karma

incognito
Explorer

Hi,

You're right, I missed commas during the copy. The sample is fixed now.

I believe the JSON answer is correct (syntactically) but I'm still getting those 2 different count numbers when parsing it manually with 'spath'.

I tried to let SPLUNK parse it automatically by configuring the sourcetype with those parameters :

incognito_0-1640860188327.png

Splunk parses it, but incorrectly (e.g. by doing 'stats count()' on some fields, the results are incorrect). I was thinking that I might have to adjust the "LINE_BREAKER" or "SHOULD_LINEMERGE" sourcetype parameters because of the complex JSON answer.

Do you have any ideas on how to adjust those params for my case ?

Thank you.

Best regards,

0 Karma

isoutamo
SplunkTrust
SplunkTrust
You should have only one of INDEXED_EXTRACTIONS=json or KV_MODE=json otherwise it will duplicate those jsons.

incognito
Explorer

I deleted the "KV_MODE" param. I have now those : 

incognito_1-1640862266589.png

Although, splunk parses it, I have nonsense results. For example, with the following search I count the number of 'moduleVersion' :
index="supervision_software" source="API" earliest=-1m
| stats count(hosts{}.modules{}.instances{}.moduleVersion)

incognito_2-1640862394832.png

And then with this one, I count the number of 'moduleVersion'  by moduleVersion :
index="supervision_software" source="API" earliest=-1m
| stats count(hosts{}.modules{}.instances{}.moduleVersion) by hosts{}.modules{}.instances{}.moduleVersion

incognito_3-1640862469148.png

I don't understand how splunk parses the JSON ...

0 Karma

isoutamo
SplunkTrust
SplunkTrust

Can you try these?

index="supervision_software" source="API" earliest=-1m
| spath
| stats count(hosts{}.modules{}.instances{}.moduleVersion)

and 

index="supervision_software" source="API" earliest=-1m
| spath  path=hosts{}.modules{}.instances{}.moduleVersion output=moduleVersion2
| stats count by moduleVersion2

Based on my tests those should work.

r. Ismo

0 Karma

incognito
Explorer

Hello, 

index="supervision_software" source="API" earliest=-1m
| spath  path=hosts{}.modules{}.instances{}.moduleVersion output=moduleVersion2
| stats count by moduleVersion2

 Displays :

incognito_1-1640949585203.png

This one is correct (total = 1290 moduleversions). The problem here is that I don't have anymore my 'hostInfo' fields and so I can't sort the moduleversions by host.

 

index="supervision_software" source="API" earliest=-1m
| spath
| stats count(hosts{}.modules{}.instances{}.moduleVersion)

 

 Displays :

incognito_0-1640949494503.png

1345 isn't right (there are around 1270 - 1295 moduleVersion, it depends on when the API call is done).  With this search, even if I try to table displayName, moduleversion and instanceName, it seems like they are not linked at all (but they are in the JSON) because the results are nonsense (e.g. a host has at least 50 moduleversion but it display only 1 for that host, it's like there is no relationships).

Best regards,

0 Karma
Get Updates on the Splunk Community!

Say goodbye to manually analyzing phishing and malware threats with Splunk Attack ...

In today’s evolving threat landscape, we understand you’re constantly bombarded with phishing and malware ...

AppDynamics is now part of Splunk Ideas

Hello Splunkers, We have exciting news for you! AppDynamics has been added to the Splunk Ideas Portal. Which ...

Advanced Splunk Data Management Strategies

Join us on Wednesday, May 14, 2025, at 11 AM PDT / 2 PM EDT for an exclusive Tech Talk that delves into ...