Hello Community,
I am fairly new to Splunk, and I am struggling with this. Here is my raw event: these are discrepancy events that show a reported discrepancy in the two JSONs (for the context of this problem, those JSONs are not necessary to be known).
Assuming there are n events similar to what we have in the sample JSON.
{
"severity": "INFO",
"time": "2023-07-09 18:53:53.930",
"Stats": {
"discrepancy" : 10
},
"discrepancyDetails": {
"record/0": "#DEL",
"record/1": "#DEL",
"recordD": "#DEL",
"recordX": "expected => actual",
"recordY": "someExpectedVal => null", <-- actual value is null in this case
"recordN": "someExpectedValN => null"
}
}
Stats.discrepancy provides the total count, while discrepancyDetails provides the actual discrepancy.
I want to fetch some statistics from this, which involve the following details:
I was able to obtain the unique count of all the keys using the following query.
index="demo1" sourcetype="demo2"
| search discrepancyDetails AND Stats
| spath "Stats.discrepancy"
| search "Stats.discrepancy" > 0
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| transpose
I am unable to figure out points 2 and 3 from the above requirements.
Desired output for requirement 2 considering above sample json:
Unique_key | count |
record/ | 2 |
recordD | 1 |
recordX | 1 |
recordY | 1 |
recordN | 1 |
Desired output for requirement 3 considering above sample json:
Unique_key | null or #DEL | count |
record/ recordD | #DEL | 2 |
recordY recordN | null | 2 |
The illustration of desired output eliminates a lot of guess work. You are correct. After renaming the original discrepancyDetails.*, I should have excluded them before preceding.
So, my previous search should address requirement 2 when bad.* are excluded (it cannot be combined into the same stats as 3 based on your illustration).
| rex mode=sed "s/\/\d+\"/\/\"/g"
| rename discrepancyDetails.* as bad.*
| fields - bad.*
| spath
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "count"
| transpose header_field=legend column_name=Unique_key
To address 3, values need to be preserved in stats, then table calculated after transpose.
| rename discrepancyDetails.* as bad.*
| fields - bad.*
| rex mode=sed "s/\/\d+\"/\/\"/g"
| spath
| stats values(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "value"
| transpose header_field=legend column_name=Unique_key
| where match(value, "#DEL|=> *null")
| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null"
The following is a data emulation that gives the exact desired outputs. You can play with it and compare with real data.
| makeresults
| eval _raw = "{
\"severity\": \"INFO\",
\"time\": \"2023-07-09 18:53:53.930\",
\"Stats\": {
\"discrepancy\" : 10
},
\"discrepancyDetails\": {
\"record/0\": \"#DEL\",
\"record/1\": \"#DEL\",
\"recordD\": \"#DEL\",
\"recordX\": \"expected => actual\",
\"recordY\": \"someExpectedVal => null\",
\"recordN\": \"someExpectedValN => null\"
}
}
"
| spath
``` data emulation above ```
This question needs some serious clarification. First, the sample data and sample code you illustrated are incompatible: Node discrepancyDetails in the data is a hash of key-value pairs, not an array. Yet the path option in the illustrated SPL uses the {} notation, implying that discrepancyDetails should be array; the use of mvexpand also only has effect on array content. In fact, both spath and mvexpand should have no effect if data is as illustrated. (If the data format is as illustrated, Splunk would have already given you discrepancyDetails.record/0, discrepancyDetails.record/1, etc.) Is discrepancyDetails a hash node or an array node? I will assume the data illustration as correct.
Second, your sample code suggests that the count you wanted is to be grouped by (modified) keys in discrepancyDetails. But the stats command as illustrated should give you no output at all. I can sense two possibilities: you either want
| stats count(discrepancyDetails.*) as discrepancyDetails.*
or
| stats count by discrepancyDetails.*
which one is it? I will assume the former.
Then, I'm still unclear about the 3rd requirement. What is a null actual? Do you mean "expected =>"? There is no way for me to make a meaningful speculation. I will ignore this ask until you can clarify.
Based on my reinterpretation, what you ask (except 3) can be achieved with
| rex mode=sed "s/\/\d+\"/\"/g"
| rename discrepancyDetails.* AS bad.*
| spath
| stats count(discrepancyDetails.*) as discrepancyDetails.*
Hope this helps
hey @yuanliu ,
yes you are right about mvexpand and {} use in the SPL, it turns out that the following query also returns the same result, please note that the transpose is important.
index="demo1" sourcetype="demo2"
| search discrepancyDetails AND Stats
| spath "Stats.discrepancy"
| search "Stats.discrepancy" > 0
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| transpose
However, I appreciate your suggestion, but it seems that the provided answer is not yielding the desired results. It appears that the solution is renaming the "discrepancyDetails" fields to "bad," resulting in duplicated "Interested" fields with prefixes "discrepancyDetails." and "bad." in my Splunk fields section. Moreover, the numeric characters in the keys are not being replaced with an empty string. Could you please kindly assist me in finding a more effective solution? Thank you for your understanding.
I have also added desired results for point 2 and 3 in the original post, please give it a try.
The illustration of desired output eliminates a lot of guess work. You are correct. After renaming the original discrepancyDetails.*, I should have excluded them before preceding.
So, my previous search should address requirement 2 when bad.* are excluded (it cannot be combined into the same stats as 3 based on your illustration).
| rex mode=sed "s/\/\d+\"/\/\"/g"
| rename discrepancyDetails.* as bad.*
| fields - bad.*
| spath
| stats count(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "count"
| transpose header_field=legend column_name=Unique_key
To address 3, values need to be preserved in stats, then table calculated after transpose.
| rename discrepancyDetails.* as bad.*
| fields - bad.*
| rex mode=sed "s/\/\d+\"/\/\"/g"
| spath
| stats values(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "value"
| transpose header_field=legend column_name=Unique_key
| where match(value, "#DEL|=> *null")
| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null"
The following is a data emulation that gives the exact desired outputs. You can play with it and compare with real data.
| makeresults
| eval _raw = "{
\"severity\": \"INFO\",
\"time\": \"2023-07-09 18:53:53.930\",
\"Stats\": {
\"discrepancy\" : 10
},
\"discrepancyDetails\": {
\"record/0\": \"#DEL\",
\"record/1\": \"#DEL\",
\"recordD\": \"#DEL\",
\"recordX\": \"expected => actual\",
\"recordY\": \"someExpectedVal => null\",
\"recordN\": \"someExpectedValN => null\"
}
}
"
| spath
``` data emulation above ```
@yuanliu this worked fine for requirement 3,
just a follow up ask for requirement 3, I want to filter out all the fields that do not contain certain substring in the keys,
I am not really able to figure our how can I filter out this.
for instance, let us say I want to remove all the keys from the result that contain "Y", considering sample example "recordY" would be removed from the result.
To filter out certain keys, simply add the condition in the last "where" before stats. For example,
| rename discrepancyDetails.* as bad.*
| fields - bad.*
| rex mode=sed "s/\/\d+\"/\/\"/g"
| spath
| stats values(discrepancyDetails.*) as discrepancyDetails.*
| eval legend = "value"
| transpose header_field=legend column_name=Unique_key
| where match(value, "#DEL|=> *null") AND NOT match(Unique_key, "Y$") ``` add any negated condition ```
| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null"
Thanks @yuanliu ,
In requirement 3 what if I want to group remaining keys under different group,
so my result column would have header '#DEL or null or others' and all the other keys that do not fall under '#DEL' and 'null' category should be grouped under 'others' column.
could you help me with this query is well?
I'm not sure if I understand. Do you mean something like the following?
Unique_key | #DEL or null | other | count |
discrepancyDetails.record/ discrepancyDetails.recordD | #DEL | 2 | |
discrepancyDetails.recordX | actual | 1 | |
discrepancyDetails.recordN discrepancyDetails.recordY | null | 2 |
To get the above, just get rid of "where" command and manipulate field name.
| rex field=value mode=sed "s/.+=> *//"
| stats values(Unique_key) as Unique_key count by value
| eval other = if(match(value, "#DEL|null"), null(), value)
| eval "#DEL or null" = if(match(value, "#DEL|null"), value, null)
| table Unique_key "#DEL or null" other count
But I'm not sure if the table is useful.
@yuanliu , I want something like this
Unique_key | #DEL or null or others | count |
discrepancyDetails.record/ discrepancyDetails.recordD | #DEL | 2 |
discrepancyDetails.recordX . . | others | 1 |
discrepancyDetails.recordN discrepancyDetails.recordY | null | 2 |
| rex field=value mode=sed "s/.+=> *//"
| eval value = if(match(value, "#DEL|null"), value, "others")
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null or others"
I tried something like this but it doesn't seem to be working not sure why.
@yuanliu ,
I was able to achieve it through this; I missed the part where the value is a multivalued field. After using mvmap, it worked fine.
| rex field=value mode=sed "s/.+=> *//"
| eval value = mvmap(value, if(match(value, "#DEL|null"), value, "others"))
| stats values(Unique_key) as Unique_key count by value
| table Unique_key value count
| rename value as "#DEL or null or others"
Requirements 2 and 3 are not clear - perhaps if you shared what your expected output would be, it might become clearer
hey @ITWhisperer,
I have updated the post with desired results, please give it a try now.
Requirement 3 is still unclear - all keys have values, none of them are null
hey @ITWhisperer ,
The values are of two types:
Therefore, we need to group all the fields that have "#DEL" and "null(<actual>)" values.
@ITWhisperer
please let me know if you are still not clear.
Requirement 2
| spath discrepancyDetails
| rex mode=sed field=discrepancyDetails "s/\/\d+/\//g"
| eval keys=json_keys(discrepancyDetails)
| rex mode=sed field=keys "s/[\[\]\\\"]//g"
| eval keys=split(keys,",")
| stats count by keys