Splunk Search

Fetching statistics: Unique discrepancy details, removing numerical characters, and identifying null and deleted values

adikrhd
Path Finder

Hello Community,
I am fairly new to Splunk and I am struggling with this. Below is a sample raw event: these are discrepancy events that report a discrepancy between two JSONs (the two JSONs themselves are not needed for this problem). Assume there are n events similar to the sample JSON below.

 

 

{
  "severity": "INFO",
  "time": "2023-07-09 18:53:53.930",
  "Stats": {
    "discrepancy" : 10
  },
  "discrepancyDetails": {
    "record/0": "#DEL",
    "record/1": "#DEL",
    "recordD": "#DEL",
    "recordX": "expected => actual",
    "recordY": "someExpectedVal => null", <-- actual value is null in this case
    "recordN": "someExpectedValN => null"
  }
}

 

 

Stats.discrepancy provides the total count, while discrepancyDetails provides the actual discrepancy.

I want to fetch some statistics from this, which involve the following details:

  1. All the unique discrepancyDetails with their respective counts.
  2. Before counting, I want to remove all numeric characters from the keys. For example, the sample JSON has two keys "record/0" and "record/1" in discrepancyDetails; I want to treat both as "record/", replacing the numeric portion with an empty string.
  3. Figure out all keys whose actual value is null (in the sample JSON the values follow the pattern "expected => actual") and all keys with "#DEL" (deleted) values.
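For illustration only (this is Python, not SPL), here is a sketch of the requested transformations applied to the sample discrepancyDetails object, assuming it has already been parsed into a plain dict:

```python
import re
from collections import Counter

# Sample discrepancyDetails from the event above, parsed into a dict.
details = {
    "record/0": "#DEL",
    "record/1": "#DEL",
    "recordD": "#DEL",
    "recordX": "expected => actual",
    "recordY": "someExpectedVal => null",
    "recordN": "someExpectedValN => null",
}

# Requirements 1+2: strip numeric characters from each key, then count duplicates.
key_counts = Counter(re.sub(r"\d+", "", k) for k in details)
# 'record/' now appears twice; every other key appears once.

# Requirement 3: keep only keys whose value is "#DEL" or whose actual value
# (the part after "=> ") is null, grouped by that value.
groups = {}
for key, value in details.items():
    actual = re.sub(r".+=> *", "", value)  # "#DEL" stays "#DEL"; "a => b" becomes "b"
    if actual in ("#DEL", "null"):
        groups.setdefault(actual, []).append(re.sub(r"\d+", "", key))
# groups: {"#DEL": ["record/", "record/", "recordD"], "null": ["recordY", "recordN"]}
```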

I was able to obtain the unique count of all the keys using the following query.

 

 

index="demo1" sourcetype="demo2" 
| search discrepancyDetails AND Stats 
| spath "Stats.discrepancy" 
| search "Stats.discrepancy" > 0 
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| transpose

 

 

 

I am unable to figure out points 2 and 3 from the above requirements.

Desired output for requirement 2, considering the sample JSON above:

Unique_key    count
record/       2
recordD       1
recordX       1
recordY       1
recordN       1


Desired output for requirement 3, considering the sample JSON above:

Unique_key          null or #DEL    count
record/, recordD    #DEL            2
recordY, recordN    null            2

yuanliu
SplunkTrust

This question needs some serious clarification.  First, the sample data and sample code you illustrated are incompatible: the discrepancyDetails node in the data is a hash of key-value pairs, not an array.  Yet the path option in the illustrated SPL uses the {} notation, implying that discrepancyDetails should be an array; likewise, mvexpand only has an effect on array content.  In fact, both spath and mvexpand should have no effect if the data is as illustrated. (If the data format is as illustrated, Splunk would have already given you discrepancyDetails.record/0, discrepancyDetails.record/1, etc.)  Is discrepancyDetails a hash node or an array node?  I will assume the data illustration is correct.

Second, your sample code suggests that the count you want is grouped by the (modified) keys in discrepancyDetails.  But the stats command as illustrated should give you no output at all.  I can see two possibilities: you want either

| stats count(discrepancyDetails.*) as discrepancyDetails.*

or

| stats count by discrepancyDetails.*

Which one is it?  I will assume the former.

Then, I'm still unclear about the 3rd requirement.  What is a null actual?  Do you mean "expected =>"?  There is no way for me to make a meaningful speculation.  I will ignore this ask until you can clarify.

Based on my reinterpretation, what you ask (except 3) can be achieved with

| rex mode=sed "s/\/\d+\"/\"/g"
| rename discrepancyDetails.* AS bad.*
| spath
| stats count(discrepancyDetails.*) as discrepancyDetails.*

Hope this helps


adikrhd
Path Finder

hey @yuanliu ,
Yes, you are right about the use of mvexpand and {} in the SPL; it turns out that the following query also returns the same result. Please note that the transpose is important.

 

index="demo1" sourcetype="demo2" 
| search discrepancyDetails AND Stats 
| spath "Stats.discrepancy" 
| search "Stats.discrepancy" > 0 
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| transpose

 

 
I appreciate your suggestion, but the provided answer is not yielding the desired results. The solution renames the "discrepancyDetails" fields to "bad", which leaves duplicated fields with both the "discrepancyDetails." and "bad." prefixes in my Splunk fields section. Moreover, the numeric characters in the keys are not being replaced with an empty string. Could you please help me find a more effective solution? Thank you for your understanding.

I have also added desired results for point 2 and 3 in the original post, please give it a try.


yuanliu
SplunkTrust

The illustration of desired output eliminates a lot of guesswork.  You are correct: after renaming the original discrepancyDetails.*, I should have excluded them before proceeding.

So, my previous search should address requirement 2 when bad.* are excluded (it cannot be combined into the same stats as 3 based on your illustration).

| rex mode=sed "s/\/\d+\"/\/\"/g"
| rename discrepancyDetails.* as bad.*
| fields - bad.*
| spath
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "count"
| transpose header_field=legend column_name=Unique_key
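Outside Splunk, the effect of the sed expression on _raw before re-parsing can be sketched in Python (a rough analogue, not the actual SPL execution; `object_pairs_hook` stands in for the multivalue field spath would produce from duplicated keys):

```python
import json
import re
from collections import Counter

raw = '{"discrepancyDetails": {"record/0": "#DEL", "record/1": "#DEL", "recordD": "#DEL"}}'

# The rex mode=sed step: digits between a slash and the closing quote drop out,
# so "record/0" and "record/1" both become "record/".
cleaned = re.sub(r'/\d+"', '/"', raw)

# Keep the now-duplicated keys as pairs (a plain dict would collapse them;
# in Splunk, spath yields a multivalue field instead).
pairs = json.loads(cleaned, object_pairs_hook=list)
details = dict(pairs)["discrepancyDetails"]

counts = Counter(k for k, _ in details)  # the stats count(...) step
# counts["record/"] == 2, counts["recordD"] == 1
```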

To address 3, the values need to be preserved in stats, and the table is then calculated after the transpose.

| rename discrepancyDetails.* as bad.*
| fields - bad.*
| rex mode=sed "s/\/\d+\"/\/\"/g"
| spath
| stats values(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "value"
| transpose header_field=legend column_name=Unique_key
| where match(value, "#DEL|=> *null")
| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null"
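The post-transpose half of that pipeline can be sketched in Python for clarity. `rows` is a hypothetical stand-in for the (Unique_key, value) pairs the transpose produces; the regexes are the same ones the SPL uses:

```python
import re

# Hypothetical transposed rows after the key cleanup: (Unique_key, value).
rows = [
    ("record/", "#DEL"),
    ("recordD", "#DEL"),
    ("recordX", "expected => actual"),
    ("recordY", "someExpectedVal => null"),
    ("recordN", "someExpectedValN => null"),
]

result = {}
for key, value in rows:
    # the `where match(value, "#DEL|=> *null")` filter
    if not re.search(r"#DEL|=> *null", value):
        continue
    # the `rex mode=sed "s/.+=> *//"` extraction: keep only the actual value
    tail = re.sub(r".+=> *", "", value)
    # the `stats values(Unique_key) ... count by value` grouping
    entry = result.setdefault(tail, {"Unique_key": [], "count": 0})
    entry["Unique_key"].append(key)
    entry["count"] += 1
# result["#DEL"] groups record/ and recordD; result["null"] groups recordY and recordN
```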

The following is a data emulation that gives the exact desired outputs.  You can play with it and compare with real data.

| makeresults
| eval _raw = "{
  \"severity\": \"INFO\",
  \"time\": \"2023-07-09 18:53:53.930\",
  \"Stats\": {
    \"discrepancy\" : 10
  },
  \"discrepancyDetails\": {
    \"record/0\": \"#DEL\",
    \"record/1\": \"#DEL\",
    \"recordD\": \"#DEL\",
    \"recordX\": \"expected => actual\",
    \"recordY\": \"someExpectedVal => null\",
    \"recordN\": \"someExpectedValN => null\"
  }
}
"
| spath
``` data emulation above ```

adikrhd
Path Finder

@yuanliu this worked fine for requirement 3.
Just a follow-up ask for requirement 3: I want to filter out all fields whose keys contain a certain substring, but I am not able to figure out how to filter them out.

For instance, say I want to remove from the result all keys that contain "Y"; given the sample, "recordY" would be removed.


yuanliu
SplunkTrust

To filter out certain keys, simply add the condition in the last "where" before stats. For example,

| rename discrepancyDetails.* as bad.*
| fields - bad.*
| rex mode=sed "s/\/\d+\"/\/\"/g"
| spath
| stats values(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "value"
| transpose header_field=legend column_name=Unique_key
| where match(value, "#DEL|=> *null") AND NOT match(Unique_key, "Y$") ``` add any negated condition ```
| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null"

 

adikrhd
Path Finder

Thanks @yuanliu ,
In requirement 3, what if I want to group the remaining keys under a different group? My result column would then have the header '#DEL or null or others', and all keys that do not fall under the '#DEL' or 'null' categories would be grouped under 'others'.
Could you help me with this query as well?


yuanliu
SplunkTrust

I'm not sure if I understand.  Do you mean something like the following?

Unique_key                                                #DEL or null    other    count
discrepancyDetails.record/, discrepancyDetails.recordD    #DEL                     2
discrepancyDetails.recordX                                                actual   1
discrepancyDetails.recordN, discrepancyDetails.recordY    null                     2

To get the above, just get rid of the "where" command and manipulate the field names.

 

| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| eval other = if(match(value, "#DEL|null"), null(), value)
| eval "#DEL or null" = if(match(value, "#DEL|null"), value, null())
| table Unique_key "#DEL or null" other count

 

But I'm not sure if the table is useful.


adikrhd
Path Finder

@yuanliu , I want something like this

Unique_key                                                #DEL or null or others    count
discrepancyDetails.record/, discrepancyDetails.recordD    #DEL                      2
discrepancyDetails.recordX, ...                           others                    1
discrepancyDetails.recordN, discrepancyDetails.recordY    null                      2

 

| rex field=value mode=sed "s/.+=> *//"
| eval value = if(match(value, "#DEL|null"), value, "others")
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null or others"

I tried something like this, but it doesn't seem to be working, and I'm not sure why.


adikrhd
Path Finder

@yuanliu ,
I was able to achieve it with this; I had missed that value is a multivalue field. After using mvmap, it worked fine.

 

 

| rex field=value mode=sed "s/.+=> *//"
| eval value = mvmap(value, if(match(value, "#DEL|null"), value, "others"))
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null or others"
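The point of the fix is that after `stats values(...)`, `value` is multivalued, so the classification has to be applied per element rather than to the cell as a whole. A Python analogue of the mvmap step (illustrative only):

```python
import re

def classify(v):
    """The inner if(match(value, "#DEL|null"), value, "others") of the mvmap."""
    return v if re.search(r"#DEL|null", v) else "others"

# A multivalue cell, as values() can produce: several strings in one field.
value = ["#DEL", "expected => actual", "null"]

# mvmap applies the expression to each element and rebuilds the multivalue field.
mapped = [classify(v) for v in value]
# mapped == ["#DEL", "others", "null"]
```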

 

 

 


ITWhisperer
SplunkTrust

Requirements 2 and 3 are not clear - perhaps if you shared what your expected output would be, it might become clearer


adikrhd
Path Finder

hey @ITWhisperer,
I have updated the post with desired results, please give it a try now.


ITWhisperer
SplunkTrust

Requirement 3 is still unclear - all keys have values, none of them are null


adikrhd
Path Finder

hey @ITWhisperer ,

The values are of two types:

  1. "#DEL": This indicates that the field is deleted.
  2. "<expected> => <actual>": This pattern represents the actual and expected values of the field.

Therefore, we need to group all the fields that have "#DEL" values or a null <actual> value.


adikrhd
Path Finder

@ITWhisperer 
please let me know if you are still not clear.


ITWhisperer
SplunkTrust

Requirement 2

| spath discrepancyDetails
| rex mode=sed field=discrepancyDetails "s/\/\d+/\//g"
| eval keys=json_keys(discrepancyDetails)
| rex mode=sed field=keys "s/[\[\]\\\"]//g"
| eval keys=split(keys,",")
| stats count by keys
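A rough Python analogue of this approach (digit cleanup, then json_keys-style key extraction, then count), using a subset of the sample keys; illustrative only:

```python
import json
import re
from collections import Counter

discrepancyDetails = '{"record/0": "#DEL", "record/1": "#DEL", "recordD": "#DEL", "recordX": "expected => actual"}'

# The s/\/\d+/\//g step: collapse "record/0" and "record/1" into "record/".
cleaned = re.sub(r"/\d+", "/", discrepancyDetails)

# json_keys() analogue: pull the key list out of the JSON object, keeping
# the now-duplicated keys (a plain dict would collapse them).
pairs = json.loads(cleaned, object_pairs_hook=list)

counts = Counter(k for k, _ in pairs)  # the `stats count by keys` step
```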