Solved: Questions regarding datamodel, stats, NOT, and Mac...

Justin1224 · ‎09-21-2016

This is the query I have:

| tstats summariesonly count from datamodel=Threat_Intelligence.Threat_Activity where NOT [| ppf_subsearch_dm("ppf_threat_activity","threat_match_field,threat_match_value",now(),"Threat_Activity")] by Threat_Activity.threat_key | drop_dm_object_name("Threat_Activity") | get_threat_attribution(threat_key) | stats sum(count) as count by threat_category | sort 10 - count

I have a couple questions regarding it:

1) What is the datamodel=Threat_Intelligence.Threat_Activity part doing? If it was just (for example): datamodel=Threat_Intelligence, then it would be counting from the data model node that is named "Threat_Intelligence" (if I'm not mistaken). So what does the .Threat_Activity do to it?

2)Similar to the first question, what is the "by Threat_Activity.threat_key" part doing? I believe the "by" means that it's aggregating by the field "Threat_Activity.threat_key". Again, what is the .threat_key doing there?

3) What is the stats sum(count) as count by threat_category part doing? I've read through the stats page on the Splunk reference site but I'm still not 100% sure what stats sum does. I believe that the other part of that command is renaming what stats sum(count) did as count and aggregating by the field threat_category.

4)Regarding the NOT operator, does the NOT apply to all of: ppf_subsearch_dm("ppf_threat_activity","threat_match_field,threat_match_value",now(),"Threat_Activity") ? Also, what is the square bracket doing there and why does a pipe directly follow the NOT operator?

5) Does anyone have any idea of what any of the macros are doing? I don't have the macro definitions for them and I also don't have access to them. I'm pretty sure that the `summariesonly' one directly following tstats just sets tstats to true. But other than that, I'm lost.

If anyone could help me with all or any one of the questions I have, I would really appreciate it.

somesoni2 · ‎09-21-2016

Q 1 and 2:
There is a data model named "Threat_Intelligence" and there is a child node under that data model called "Threat_Activity".
Wherever you see "datamodel=Threat_Intelligence.Threat_Activity", the search is selecting data from child node Threat_Intelligence.Threat_Activity. (it's a data hierarchy. Consider there is data model for Cars and there is child node for Electric cars, so if you want to query data for Electric cars, you'd use Cars.Electric)
Wherever you see "by Threat_Activity.threat_key", the aggregation is done with group by of field threat_key which is available in Node Threat_Activity.

Q 3: I would suggest to not to post duplicate posts.
Read answer from other post: https://answers.splunk.com/answers/453887/stats-sumcount-by-foo.html

Q 4:
The subsearch [| ppf_subsearch_dm("ppf_threat_activity","threat_match_field,threat_match_value",now(),"Threat_Activity")] (subsearches are enclosed within square brackets, they start with either command search OR a pipe symbol based on the subsearch. What subsearch is running is defined in macro ppf_subsearch_dm, since you don't have access, we'll not know what that does.
The subsearch will return a series of key value pair (for example fieldname="fieldvalue"). If there are multiple rows returned by subsearch, by default they will be added a giant nested OR statemenet (e.g. ((fieldname="fieldvalue1") OR (fieldname="fieldvalue2")...). The NOT operator will be just added in front of the nested OR and will cause results to get filtered where the OR condition is matched. (regular NOT logical operator)

Q 5:
We'll be in same situation as you're. We can't tell what macro does unless we have the definition. For few my guess will be this
drop_dm_object_name("Threat_Activity") = Rename any fields with patttern Threat_Activity.fieldname to fieldname.
get_threat_attribution(threat_key) = Some sort of lookup/case statement which adds fields threat_category based on field threat_key.

View solution in original post

somesoni2 · ‎09-21-2016

Q 1 and 2:
There is a data model named "Threat_Intelligence" and there is a child node under that data model called "Threat_Activity".
Wherever you see "datamodel=Threat_Intelligence.Threat_Activity", the search is selecting data from child node Threat_Intelligence.Threat_Activity. (it's a data hierarchy. Consider there is data model for Cars and there is child node for Electric cars, so if you want to query data for Electric cars, you'd use Cars.Electric)
Wherever you see "by Threat_Activity.threat_key", the aggregation is done with group by of field threat_key which is available in Node Threat_Activity.

Q 3: I would suggest to not to post duplicate posts.
Read answer from other post: https://answers.splunk.com/answers/453887/stats-sumcount-by-foo.html

Q 4:
The subsearch [| ppf_subsearch_dm("ppf_threat_activity","threat_match_field,threat_match_value",now(),"Threat_Activity")] (subsearches are enclosed within square brackets, they start with either command search OR a pipe symbol based on the subsearch. What subsearch is running is defined in macro ppf_subsearch_dm, since you don't have access, we'll not know what that does.
The subsearch will return a series of key value pair (for example fieldname="fieldvalue"). If there are multiple rows returned by subsearch, by default they will be added a giant nested OR statemenet (e.g. ((fieldname="fieldvalue1") OR (fieldname="fieldvalue2")...). The NOT operator will be just added in front of the nested OR and will cause results to get filtered where the OR condition is matched. (regular NOT logical operator)

Q 5:
We'll be in same situation as you're. We can't tell what macro does unless we have the definition. For few my guess will be this
drop_dm_object_name("Threat_Activity") = Rename any fields with patttern Threat_Activity.fieldname to fieldname.
get_threat_attribution(threat_key) = Some sort of lookup/case statement which adds fields threat_category based on field threat_key.

Justin1224 · ‎09-21-2016

Great, thank you!

Questions regarding datamodel, stats, NOT, and Macros in my query

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits

Join the Conversation

Questions regarding datamodel, stats, NOT, and Macros in my query

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits