Splunk Search

Performance issue with query

amoldesai
Explorer

Hi,

I have a query written to find average exceptions per device on monthly basis for my use case. The query returns results as expected but the query performance is very poor. Below is the query and details on it. Request your help in optimising the query:

index="79390-np" sourcetype=np-cache-v2 source="*bp_detail*" [search index="79390-np" sourcetype=np-cache-v2 source="*bp_detail*" | join bpRuleId [search index="79390-np" sourcetype=np-cache-v2 (source="*bp_summary*" and bpPrimaryTechnology="*" ) ]| join deviceId [ search index="79390-np" sourcetype=np-cache-v2 source="*group_member*" groupId="*"] | fields deviceId]| eval month_num=strftime(_time,"%m") | eval Month=strftime(_time,"%b %Y")|stats dc(deviceId) as uniquedevices,dc(source) as sourcecount,count by Month|sort month_num| eval avgdeviceperupload=count/sourcecount | eval avguniquedevices = round(avgdeviceperupload/uniquedevices)| rename avguniquedevices as "Average Exceptions Per Device" | table Month,"Average Exceptions Per Device"
,
Details:
sources:bp_detail,bp_summary,group_member.
bp_detail fields: bpRuleId,deviceId
bp_summary fields: bpRuleId,bpPrimaryTechnology
group_members fields : groupId,deviceId

Thanks

Tags (1)
0 Karma
1 Solution

somesoni2
Revered Legend

Give this a try

index="79390-np" sourcetype=np-cache-v2 source="*bp_detail*" 
[search index="79390-np" sourcetype=np-cache-v2 source="*bp_detail*" OR (source="*bp_summary*" and bpPrimaryTechnology="*" )
| stats values(deviceId) as deviceId by bpRuleId  | where isnotnull(devideId)
| append [search index="79390-np" sourcetype=np-cache-v2 source="*group_member*" groupId="*" | stats count by devideId groupId]
| stats values(groupId) as groupId by deviceId | where isnotnull(groupId) | tabel devideId]
| timechart span=1mon dc(deviceId) as uniquedevices,dc(source) as sourcecount,count 
| eval avgdeviceperupload=count/sourcecount | eval avguniquedevices = round(avgdeviceperupload/uniquedevices)| rename avguniquedevices as "Average Exceptions Per Device"
| eval Month=strftime(_time,"%b %Y") | table Month,"Average Exceptions Per Device"

View solution in original post

0 Karma

somesoni2
Revered Legend

Give this a try

index="79390-np" sourcetype=np-cache-v2 source="*bp_detail*" 
[search index="79390-np" sourcetype=np-cache-v2 source="*bp_detail*" OR (source="*bp_summary*" and bpPrimaryTechnology="*" )
| stats values(deviceId) as deviceId by bpRuleId  | where isnotnull(devideId)
| append [search index="79390-np" sourcetype=np-cache-v2 source="*group_member*" groupId="*" | stats count by devideId groupId]
| stats values(groupId) as groupId by deviceId | where isnotnull(groupId) | tabel devideId]
| timechart span=1mon dc(deviceId) as uniquedevices,dc(source) as sourcecount,count 
| eval avgdeviceperupload=count/sourcecount | eval avguniquedevices = round(avgdeviceperupload/uniquedevices)| rename avguniquedevices as "Average Exceptions Per Device"
| eval Month=strftime(_time,"%b %Y") | table Month,"Average Exceptions Per Device"
0 Karma

amoldesai
Explorer

Thanks a lot. It helps in optimising the query.

0 Karma

Richfez
SplunkTrust
SplunkTrust

OK, so two things stand out. First, using a wildcard at the front of some search criteria (like source="*bp_detail*") isn't making it speedy, but there might not be any choice in that. The second, though, is join. Join is wonderful. But also amazingly inefficient and slow. Whenever possible (which is usually but not always), stats or transaction is preferred.

So, let's strip it down a bit. It looks like the "get my data into one nice event" stuff is joining together bp_detail to bp_summary on bpRuleId, then bp_detail to group_members on deviceId. Does that sound right?

I usually end up with transactions for my data, so that's what I'll use here. A similar technique can be used for stats, too, it just ends up with a bit of a different look.

index="79390-np" sourcetype=np-cache-v2 source="*bp_detail*" OR source="*group_member*" OR source="*bp_summary*"
| transaction bpRuleId deviceId
| ... evals and pretty output things here

First line just grabs all the events that you eventually want to join together.
The second line builds a transaction that says wherever a bpRuleId exists in two events and matches, join it together. Ditto where deviceId exists in two events and matches, join THAT in too. For more on this, read the docs for Transaction very carefully, especially right at the Transaction Usage section. There's also a great answer by gkanapathy that may be of use - if you find that helpful, be sure to upvote it! You'll likely want some maxspan=X and maxpause=Y and/or other things like that in there to make it even speedier, but you can play with those and if you have problems, feel free to ask again.

Once you have all the events linking up to each other properly, well, the rest is mostly what you already have.

0 Karma

amoldesai
Explorer

Thanks for your quick and detailed reply. It was useful to know about how transaction could be used here.
But as I require groupId and technology field(mandatory input to my form) to be explicitly set in the query,I will go with the stats based technique. Thanks.

0 Karma
Get Updates on the Splunk Community!

Observability Highlights | November 2022 Newsletter

 November 2022Observability CloudEnd Of Support Extension for SignalFx Smart AgentSplunk is extending the End ...

Avoid Certificate Expiry Issues in Splunk Enterprise with Certificate Assist

This blog post is part 2 of 4 of a series on Splunk Assist. Click the links below to see the other ...

Using Machine Learning for Hunting Security Threats

REGISTER NOW Seeing the exponential hike in global cyber threat spectrum, organizations are now striving more ...