Solved: Fetching statistics: Unique discrepancy details, r...

adikrhd · ‎07-09-2023

Hello Community,
I am fairly new to Splunk, and I am struggling with this. Here is my raw event: these are discrepancy events that show a reported discrepancy in the two JSONs (for the context of this problem, those JSONs are not necessary to be known).
Assuming there are n events similar to what we have in the sample JSON.

{
  "severity": "INFO",
  "time": "2023-07-09 18:53:53.930",
  "Stats": {
    "discrepancy" : 10
  },
  "discrepancyDetails": {
    "record/0": "#DEL",
    "record/1": "#DEL",
    "recordD": "#DEL",
    "recordX": "expected => actual",
    "recordY": "someExpectedVal => null", <-- actual value is null in this case
    "recordN": "someExpectedValN => null"
  }
}

Stats.discrepancy provides the total count, while discrepancyDetails provides the actual discrepancy.

I want to fetch some statistics from this, which involve the following details:

All the unique discrepancyDetails with their respective counts.
Before finding the count, I want to remove all numerical characters from the key. For example, in the same JSON, we have two keys in the discrepancyDetails: "record/1" and "record/2". I want to treat these keys as "record/" and replace the numeric strings with an empty value.
figure out all the keys with null actual (from sample json "expected => actual") values and "#DEL" (deleted) values

I was able to obtain the unique count of all the keys using the following query.

index="demo1" sourcetype="demo2" 
| search discrepancyDetails AND Stats 
| spath "Stats.discrepancy" 
| search "Stats.discrepancy" > 0 
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| transpose

I am unable to figure out points 2 and 3 from the above requirements.

Desired output for requirement 2 considering above sample json:

Unique_key	count
record/	2
recordD	1
recordX	1
recordY	1
recordN	1

Desired output for requirement 3 considering above sample json:

Unique_key	null or #DEL	count
record/ recordD	#DEL	2
recordY recordN	null	2

yuanliu · ‎07-10-2023

The illustration of desired output eliminates a lot of guess work. You are correct. After renaming the original discrepancyDetails.*, I should have excluded them before preceding.

So, my previous search should address requirement 2 when bad.* are excluded (it cannot be combined into the same stats as 3 based on your illustration).

| rex mode=sed "s/\/\d+\"/\/\"/g"
| rename discrepancyDetails.* as bad.*
| fields - bad.*
| spath
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "count"
| transpose header_field=legend column_name=Unique_key

To address 3, values need to be preserved in stats, then table calculated after transpose.

| rename discrepancyDetails.* as bad.*
| fields - bad.*
| rex mode=sed "s/\/\d+\"/\/\"/g"
| spath
| stats values(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "value"
| transpose header_field=legend column_name=Unique_key
| where match(value, "#DEL|=> *null")
| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null"

The following is a data emulation that gives the exact desired outputs. You can play with it and compare with real data.

| makeresults
| eval _raw = "{
  \"severity\": \"INFO\",
  \"time\": \"2023-07-09 18:53:53.930\",
  \"Stats\": {
    \"discrepancy\" : 10
  },
  \"discrepancyDetails\": {
    \"record/0\": \"#DEL\",
    \"record/1\": \"#DEL\",
    \"recordD\": \"#DEL\",
    \"recordX\": \"expected => actual\",
    \"recordY\": \"someExpectedVal => null\",
    \"recordN\": \"someExpectedValN => null\"
  }
}
"
| spath
``` data emulation above ```

View solution in original post

yuanliu · ‎07-10-2023

This question needs some serious clarification. First, the sample data and sample code you illustrated are incompatible: Node discrepancyDetails in the data is a hash of key-value pairs, not an array. Yet the path option in the illustrated SPL uses the {} notation, implying that discrepancyDetails should be array; the use of mvexpand also only has effect on array content. In fact, both spath and mvexpand should have no effect if data is as illustrated. (If the data format is as illustrated, Splunk would have already given you discrepancyDetails.record/0, discrepancyDetails.record/1, etc.) Is discrepancyDetails a hash node or an array node? I will assume the data illustration as correct.

Second, your sample code suggests that the count you wanted is to be grouped by (modified) keys in discrepancyDetails. But the stats command as illustrated should give you no output at all. I can sense two possibilities: you either want

| stats count(discrepancyDetails.*) as discrepancyDetails.*

or

| stats count by discrepancyDetails.*

which one is it? I will assume the former.

Then, I'm still unclear about the 3rd requirement. What is a null actual? Do you mean "expected =>"? There is no way for me to make a meaningful speculation. I will ignore this ask until you can clarify.

Based on my reinterpretation, what you ask (except 3) can be achieved with

| rex mode=sed "s/\/\d+\"/\"/g"
| rename discrepancyDetails.* AS bad.*
| spath
| stats count(discrepancyDetails.*) as discrepancyDetails.*

Hope this helps

adikrhd · ‎07-10-2023

hey @yuanliu ,
yes you are right about mvexpand and {} use in the SPL, it turns out that the following query also returns the same result, please note that the transpose is important.

index="demo1" sourcetype="demo2" 
| search discrepancyDetails AND Stats 
| spath "Stats.discrepancy" 
| search "Stats.discrepancy" > 0 
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| transpose

However, I appreciate your suggestion, but it seems that the provided answer is not yielding the desired results. It appears that the solution is renaming the "discrepancyDetails" fields to "bad," resulting in duplicated "Interested" fields with prefixes "discrepancyDetails." and "bad." in my Splunk fields section. Moreover, the numeric characters in the keys are not being replaced with an empty string. Could you please kindly assist me in finding a more effective solution? Thank you for your understanding.

I have also added desired results for point 2 and 3 in the original post, please give it a try.

yuanliu · ‎07-10-2023

The illustration of desired output eliminates a lot of guess work. You are correct. After renaming the original discrepancyDetails.*, I should have excluded them before preceding.

So, my previous search should address requirement 2 when bad.* are excluded (it cannot be combined into the same stats as 3 based on your illustration).

| rex mode=sed "s/\/\d+\"/\/\"/g"
| rename discrepancyDetails.* as bad.*
| fields - bad.*
| spath
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "count"
| transpose header_field=legend column_name=Unique_key

To address 3, values need to be preserved in stats, then table calculated after transpose.

| rename discrepancyDetails.* as bad.*
| fields - bad.*
| rex mode=sed "s/\/\d+\"/\/\"/g"
| spath
| stats values(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "value"
| transpose header_field=legend column_name=Unique_key
| where match(value, "#DEL|=> *null")
| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null"

The following is a data emulation that gives the exact desired outputs. You can play with it and compare with real data.

| makeresults
| eval _raw = "{
  \"severity\": \"INFO\",
  \"time\": \"2023-07-09 18:53:53.930\",
  \"Stats\": {
    \"discrepancy\" : 10
  },
  \"discrepancyDetails\": {
    \"record/0\": \"#DEL\",
    \"record/1\": \"#DEL\",
    \"recordD\": \"#DEL\",
    \"recordX\": \"expected => actual\",
    \"recordY\": \"someExpectedVal => null\",
    \"recordN\": \"someExpectedValN => null\"
  }
}
"
| spath
``` data emulation above ```

adikrhd · ‎07-17-2023

@yuanliu this worked fine for requirement 3,
just a follow up ask for requirement 3, I want to filter out all the fields that do not contain certain substring in the keys,
I am not really able to figure our how can I filter out this.

for instance, let us say I want to remove all the keys from the result that contain "Y", considering sample example "recordY" would be removed from the result.

yuanliu · ‎07-17-2023

To filter out certain keys, simply add the condition in the last "where" before stats. For example,

| rename discrepancyDetails.* as bad.*
| fields - bad.*
| rex mode=sed "s/\/\d+\"/\/\"/g"
| spath
| stats values(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "value"
| transpose header_field=legend column_name=Unique_key
| where match(value, "#DEL|=> *null") AND NOT match(Unique_key, "Y$") ``` add any negated condition ```
| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null"

adikrhd · ‎07-17-2023

Thanks @yuanliu ,
In requirement 3 what if I want to group remaining keys under different group,
so my result column would have header '#DEL or null or others' and all the other keys that do not fall under '#DEL' and 'null' category should be grouped under 'others' column.
could you help me with this query is well?

yuanliu · ‎07-18-2023

I'm not sure if I understand. Do you mean something like the following?

Unique_key	#DEL or null	other	count
discrepancyDetails.record/ discrepancyDetails.recordD	#DEL		2
discrepancyDetails.recordX		actual	1
discrepancyDetails.recordN discrepancyDetails.recordY	null		2

To get the above, just get rid of "where" command and manipulate field name.

| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| eval other = if(match(value, "#DEL|null"), null(), value)
| eval "#DEL or null" = if(match(value, "#DEL|null"), value, null)
| table Unique_key "#DEL or null" other count

But I'm not sure if the table is useful.

adikrhd · ‎07-18-2023

@yuanliu , I want something like this

Unique_key	#DEL or null or others	count
discrepancyDetails.record/ discrepancyDetails.recordD	#DEL	2
discrepancyDetails.recordX . .	others	1
discrepancyDetails.recordN discrepancyDetails.recordY	null	2

| rex field=value mode=sed "s/.+=> *//"
| eval value = if(match(value, "#DEL|null"), value, "others")
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null or others"

I tried something like this but it doesn't seem to be working not sure why.

adikrhd · ‎07-18-2023

@yuanliu ,
I was able to achieve it through this; I missed the part where the value is a multivalued field. After using mvmap, it worked fine.

| rex field=value mode=sed "s/.+=> *//"
| eval value = mvmap(value, if(match(value, "#DEL|null"), value, "others"))
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null or others"

ITWhisperer · ‎07-10-2023

Requirements 2 and 3 are not clear - perhaps if you shared what your expected output would be, it might become clearer

adikrhd · ‎07-10-2023

hey @ITWhisperer,
I have updated the post with desired results, please give it a try now.

ITWhisperer · ‎07-10-2023

Requirement 3 is still unclear - all keys have values, none of them are null

adikrhd · ‎07-10-2023

hey @ITWhisperer ,

The values are of two types:

"#DEL": This indicates that the field is deleted.
"<expected> => <actual>": This pattern represents the actual and expected values of the field.

Therefore, we need to group all the fields that have "#DEL" and "null(<actual>)" values.

adikrhd · ‎07-12-2023

@ITWhisperer
please let me know if you are still not clear.

ITWhisperer · ‎07-10-2023

Requirement 2

| spath discrepancyDetails
| rex mode=sed field=discrepancyDetails "s/\/\d+/\//g"
| eval keys=json_keys(discrepancyDetails)
| rex mode=sed field=keys "s/[\[\]\\\"]//g"
| eval keys=split(keys,",")
| stats count by keys

Fetching statistics: Unique discrepancy details, removing numerical characters, and identifying null and deleted values

field extraction

stats

table

Introducing the Splunk Community Dashboard Challenge!

Wondering How to Build Resiliency in the Cloud?

Updated Data Management and AWS GDI Inventory in Splunk Observability