I have a panel which loads data for last 3 months
and it takes approx 120 secs to load the single panel value - showing the count of advanced users in percentage. Currently, we have implemented the summary index and data model to improve the search performance, but still the query takes approx 45 seconds to show the value in the panel. Need your suggestions on further improving the performance of the query. Below are the details of the same :
1) Following saved search is running every 5 minutes
collecting last 5 minutes
of data into summary index. Please note that we need to show value of last 3 months.
index=box event_type!="*login*" |fields - raw|dedup event_id |table _time,created_by_login,event_type,source_item_name,event_id
2) Data model is created from the summary index.
3) Below is the query which is taking data from datamodel and loads the panel, but still it takes approx 45 seconds to load the value in the panel :
|tstats dc(event_type) as count from datamodel=box_activity by created_by_login
|dedup created_by_login
|replace "Unknown User" with "Shared Links" in created_by_login
|rex field=created_by_login ".@(?<sourcedomain>[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])"
|search sourcedomain="dadomain.com"
|rename created_by_login as User, event_type as Activity
|where User!="BOX Admin"
|sort - count|where count >=7| stats count|appendcols [search index=dctest sourcetype=box-users earliest=-24h latest=now|table S_No|search S_No!="S.No"|sort - S_No|head 1 |eval S_No=S_No]
|eval Advanced=round((count*100)/S_No,0)|fields - count|table Advanced|rename Advanced as "%"
How can we further improve the search performance. Whether we can use loadjob
command here ? I think we can use it, but loadjob
will only load the static values based on constant time picker option. If we change the time picker range, then loadjob
will not work accordingly. Below link has some workaround for loadjob
issue, but I think this requires _time
to be part of the output in the above query. How can I add _time
to the above query ? Please help resolve this issue ?
Thanks
PG
OK, let's start completely over. First of all, realize that these 2 methods are 100% mutually-exclusive, but not incompatibly so.
For summary index
you are scheduled to run Every 5 minutes
for The last 5 minutes
. This is a TERRIBLE plan because typically, events take 2-3 minutes to get into splunk which means that the events that arrive 2-3 minutes late get missed forever. Instead, I would schedule your search to run Every 5 minutes
for earliest=-10m latest=-5m
and this is assuming that your event latency is typical (you should measure it with | eval latency = _index_time - _time | stats avg(latency) max(latency) BY sourcetype
to be sure. That should cover the summary index
use case.
For accelerated data model
, you have not shown us your datamodel definition so we really cannot help there accept to say that the summary index
option should work whether you are doing this option or not. We can optimize your exploitation search a bit, though; try this:
|tstats summariesonly=t dc(event_type) AS count FROM datamodel=box_activity WHERE index=* BY created_by_login
| rex field=created_by_login ".@(?<sourcedomain>[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])"
| search sourcedomain = "dadomain.com"
| where count >= 7 AND created_by_login != "BOX Admin"
| stats count
| appendcols [search index=dctest sourcetype=box-users earliest=-24h latest=now
| search S_No!="S.No"
| sort - S_No
| head 1
| table S_No]
| eval Advanced = round((count * 100) / S_No, 0)
| table Advanced
| rename Advanced as "%"
Sure. I will change the schedule for my saved search, per your suggestion.
Regarding datamodel definition, I have uploaded the images in my above comments, as I cannot directly upload image here. Please have a look at those images, to understand my datamodel definition.
I will try to run the modified query to check, whether there is any improvement in search performance.
You have your bases covered here. If you need to have the timepicker
and not a static panel, then loadjob
will not work. Did you accelerate your datamodel and try adding summariesonly=true
? If not, then try that. Your only other option beyond this to speed this up is to convert to a metrics index
; see here:
https://docs.splunk.com/Documentation/Splunk/latest/Metrics/GetStarted
Somehow our discussed old comments got deleted. Not sure what happened, I cannot see my old comments !!!
I asked some questions :
You are confusing the 2 questions. All of our comments are in the old question. I am resetting here.
1: If you are using summary index or accelerated data model, then TTL makes no difference; that is only if we are using loadjob
.
2: Forget loadjob
.
3: See #4.
4: | tstats summariesonly=t ...
or switch to metrics store.
I accelerated the datamodel and tried using summariesonly=true, but it is not returning any events and any values now ? returning zero. What can be the issue here ?
|tstats summariesonly=true dc(event_type) as count from datamodel=box_activity by created_by_login
|dedup created_by_login
|replace "Unknown User" with "Shared Links" in created_by_login
|rex field=created_by_login ".@(?<sourcedomain>[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])"
|search sourcedomain="dadomain.com"
|rename created_by_login as User, event_type as Activity
|where User!="BOX Admin"
|sort - count|where count >=7| stats count|appendcols [search index=dctest sourcetype=box-users earliest=-6h latest=now|table S_No|search S_No!="S.No"|sort - S_No|head 1 |eval S_No=S_No+13 ]
|eval Advanced=round((count*100)/S_No,0)|fields - count|table Advanced|rename Advanced as "%"
You have to give the ADM a chance to complete. Check the status.
I checked it now, but still it is not returning any events ? How much time we have to wait for the ADM to get complete ? I think its already more than 30 minutes.
It depends on how much data you have, how fast your indexers are and how far back you told it to accelerate. How did you "check it"?
When I click on datamodel. In the Acceleration status it shows as 100% completed. Then I am using the same query with summariesonly=true, but still I am not getting any events. The output of the query is zero.
while enabling acceleration, I gave last 3 months in the acceleration time.
So acceleration is complete but your search isn't working. Peel off |
one by one from the bottom on up until you get it to work.
I think there is some issue in my summary index configuration and datamodel, which I am not able to figure it out. Now, I have disabled "Acceleration", but still I am not able to get the events, zero result. Can you go through my question once again and suggest whether my configuration sequence is proper ? I have configured the summary index for the first time, hence not sure if something is wrong there ?
My Scenario - I want to show %value of users whose event_type is more than 12 activities such as download, upload, preview etc. For that I need 2 fields created_by_name and event_type. So to start with :
Please advise, as I am still confused on my sequence which I have put into question ?
Images of datamodel are uploaded at below links
https://ibb.co/n7FR86x
https://ibb.co/X3ZVpCg
https://ibb.co/rdKZBgB
Are you certain that the data range of the query you are using as a test falls within the date/time range of your acceleration? If you query outside of that range then your query will be as slow as if it was not accelerated.
Lastly, I already tried accelerating the datamodel, but still the query runs slow and takes time to load the result in the panel and also, after accelerating, it is not showing proper value.. showing reduced value in the panel.
Yes, accelerating the datamodel means that you will be running a few minutes behind because it takes time to accelerate. That is why most people use earliest=-65m latest=-5m
when doing an hourly report (shifted back 5 minutes).
@vishaltaneja07011993 , @lakshman239 - I have open a new question for another query for improving the search performance. Please help to resolve.