Monitoring Splunk

Improving the performance of a search powered by a summary index and data model, and using loadjob?

Builder

I have a panel that loads data for the last 3 months and takes approx. 120 seconds to load a single panel value showing the percentage of advanced users. We have implemented a summary index and a data model to improve search performance, but the query still takes approx. 45 seconds to show the value in the panel. I need your suggestions on further improving the query's performance. Details below:

1) The following saved search runs every 5 minutes, collecting the last 5 minutes of data into the summary index. Please note that we need to show the value for the last 3 months.

index=box event_type!="*login*"
| fields - _raw
| dedup event_id
| table _time, created_by_login, event_type, source_item_name, event_id
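For completeness, the panel could also read the summary index directly; a sketch, assuming the summarized events land in index=summary with source set to the saved search name (both the index name and the source value here are assumptions, not taken from my actual config):

| search index=summary source="box_summary_search" earliest=-3mon@mon
| stats dc(event_type) as count by created_by_login

Because the summary already holds deduplicated, trimmed events, this stats runs over far fewer rows than the raw index=box search would.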

2) Data model is created from the summary index.
3) Below is the query that takes data from the data model and loads the panel, but it still takes approx. 45 seconds to load the value:

| tstats dc(event_type) as count from datamodel=box_activity by created_by_login
| dedup created_by_login
| replace "Unknown User" with "Shared Links" in created_by_login
| rex field=created_by_login ".@(?<sourcedomain>[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])"
| search sourcedomain="dadomain.com"
| rename created_by_login as User, event_type as Activity
| where User!="BOX Admin"
| sort - count
| where count >= 7
| stats count
| appendcols
    [ search index=dctest sourcetype=box-users earliest=-24h latest=now
    | table S_No
    | search S_No!="S.No"
    | sort - S_No
    | head 1 ]
| eval Advanced = round((count * 100) / S_No, 0)
| fields - count
| table Advanced
| rename Advanced as "%"

How can we further improve the search performance? Can we use the loadjob command here? I think we can, but loadjob will only load static values based on a constant time picker option; if we change the time picker range, loadjob will not follow it. The link below has a workaround for this loadjob issue, but I think it requires _time to be part of the output of the above query. How can I add _time to the above query? Please help resolve this issue.

https://answers.splunk.com/answers/188469/how-to-get-results-to-load-with-a-time-picker-sett.html#an....

Thanks
PG


Esteemed Legend

OK, let's start completely over. First of all, realize that these two methods (summary indexing and data model acceleration) are completely independent of each other, though not incompatible.

For the summary index, you are scheduled to run every 5 minutes for the last 5 minutes. This is a TERRIBLE plan because events typically take 2-3 minutes to get into Splunk, which means that events arriving 2-3 minutes late are missed forever. Instead, I would schedule your search to run every 5 minutes with earliest=-10m latest=-5m, and this assumes that your event latency is typical (you should measure it with | eval latency = _indextime - _time | stats avg(latency) max(latency) BY sourcetype to be sure). That should cover the summary index use case.
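As a standalone sketch, that latency check looks like this (the index name and 24-hour window are just placeholders; adjust to your environment):

| search index=box earliest=-24h
| eval latency = _indextime - _time
| stats avg(latency) as avg_latency, max(latency) as max_latency, perc95(latency) as p95_latency by sourcetype

If max_latency regularly exceeds 300 seconds, widen the earliest=-10m latest=-5m window accordingly so late arrivals still make it into the summary.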

For the accelerated data model, you have not shown us your data model definition, so we really cannot help there, except to say that the summary index option should work whether you use acceleration or not. We can optimize your consuming search a bit, though; try this:

|tstats summariesonly=t dc(event_type) AS count FROM datamodel=box_activity WHERE index=* BY created_by_login
| rex field=created_by_login ".@(?<sourcedomain>[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])" 
| search sourcedomain = "dadomain.com"
| where count >= 7 AND created_by_login != "BOX Admin"
| stats count
| appendcols [search index=dctest sourcetype=box-users earliest=-24h latest=now
| search S_No!="S.No"
| sort - S_No
| head 1
| table S_No]
| eval Advanced = round((count * 100) / S_No, 0)
| table Advanced
| rename Advanced as "%"

Builder

Sure, I will change the schedule of my saved search per your suggestion.

Regarding the data model definition, I have uploaded images in my comments above, as I cannot directly upload an image here. Please look at those images to understand my data model definition.

I will run the modified query to check whether there is any improvement in search performance.


Esteemed Legend

You have your bases covered here. If you need the time picker rather than a static panel, then loadjob will not work. Did you accelerate your data model and try adding summariesonly=true? If not, try that. Your only other option beyond this to speed things up is to convert to a metrics index; see here:

https://docs.splunk.com/Documentation/Splunk/latest/Metrics/GetStarted

Builder

Somehow our earlier comments got deleted. I am not sure what happened; I cannot see my old comments!

I asked some questions:

  1. What would be the ideal schedule for the saved query that returns 3 months of data, and what would the ideal TTL be?
  2. The link below provides a workaround for using a time picker with loadjob. I tried it, but it did not work; I think I am somehow not getting _time in the saved search whose output is passed to loadjob: https://answers.splunk.com/answers/188469/how-to-get-results-to-load-with-a-time-picker-sett.html#an....
  3. Can the saved search in my question (point no. 3) be further fine-tuned to improve performance?
  4. Where do we specify summariesonly=true? In the query?

Esteemed Legend

You are confusing the two questions. All of our comments are on the old question. I am resetting here.

1: If you are using a summary index or an accelerated data model, then TTL makes no difference; TTL only matters if you are using loadjob.
2: Forget loadjob.
3: See #4.
4: | tstats summariesonly=t ..., or switch to the metrics store.

Builder

I accelerated the data model and tried using summariesonly=true, but now it is not returning any events or values; it returns zero. What could be the issue here?

| tstats summariesonly=true dc(event_type) as count from datamodel=box_activity by created_by_login
| dedup created_by_login
| replace "Unknown User" with "Shared Links" in created_by_login
| rex field=created_by_login ".@(?<sourcedomain>[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])"
| search sourcedomain="dadomain.com"
| rename created_by_login as User, event_type as Activity
| where User!="BOX Admin"
| sort - count
| where count >= 7
| stats count
| appendcols
    [ search index=dctest sourcetype=box-users earliest=-6h latest=now
    | table S_No
    | search S_No!="S.No"
    | sort - S_No
    | head 1
    | eval S_No = S_No + 13 ]
| eval Advanced = round((count * 100) / S_No, 0)
| fields - count
| table Advanced
| rename Advanced as "%"

Esteemed Legend

You have to give the accelerated data model (ADM) a chance to complete. Check its status.


Builder

I checked it now, but it is still not returning any events. How long do we have to wait for the ADM to complete? It has already been more than 30 minutes.


Esteemed Legend

It depends on how much data you have, how fast your indexers are, and how far back you told it to accelerate. How did you "check it"?
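For what it's worth, you can also check acceleration status outside the UI with the summarization REST endpoint; a sketch (the exact output field names can vary by Splunk version):

| rest /services/admin/summarization by_tstats=t splunk_server=local
| fields summary.id, summary.complete, summary.size

A summary.complete value near 1 for your data model's summary indicates the acceleration is fully built; a low value means tstats with summariesonly=true will see only a partial (or empty) summary.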


Builder

When I click on the data model, the acceleration status shows as 100% complete. I then run the same query with summariesonly=true, but I still get no events; the output of the query is zero.

While enabling acceleration, I set the acceleration range to the last 3 months.


Esteemed Legend

So acceleration is complete, but your search isn't working. Peel off the piped commands one by one from the bottom up until you get it to work.
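Peeling off from the bottom, the first thing worth testing in isolation is the bare tstats line (using the data model name from this thread):

| tstats summariesonly=true dc(event_type) as count from datamodel=box_activity by created_by_login

If even this returns nothing, the problem is in the data model definition or its acceleration, not in the later pipeline stages; rerunning it with summariesonly=false (which falls back to raw events) tells you whether the model itself matches events or only the summaries are empty.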

Builder

I think there is some issue in my summary index configuration and data model that I am not able to figure out. I have now disabled acceleration, but I am still not getting any events; zero results. Can you go through my question once again and tell me whether my configuration sequence is correct? I have configured a summary index for the first time, so I am not sure if something is wrong there.

My scenario: I want to show the percentage of users with more than 12 activities (event_type values such as download, upload, preview, etc.). For that I need two fields, created_by_name and event_type. So to start with:

  1. Is my first saved search suitable for feeding the summary index?
  2. How do I verify that the data model is created properly? I can attach screenshots.
  3. Is tstats the proper command to use against the data model?

Please advise, as I am still confused about the sequence I have described in my question.

Images of datamodel are uploaded at below links

https://ibb.co/n7FR86x
https://ibb.co/X3ZVpCg
https://ibb.co/rdKZBgB

Motivator

Are you certain that the date range of the query you are using as a test falls within the date/time range of your acceleration? If you query outside that range, your query will be as slow as if it were not accelerated.
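One quick way to see what time range the summaries actually cover is to bucket the summarized events by day; a sketch using the data model name from this thread:

| tstats summariesonly=true count from datamodel=box_activity by _time span=1d
| sort _time

Days that return no rows fall outside the accelerated range; over those days a summariesonly=true search returns nothing at all, and a summariesonly=false search falls back to slow raw-event scanning.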


Builder

Lastly, I already tried accelerating the data model, but the query still runs slowly and takes time to load the result in the panel. Also, after accelerating, it does not show the proper value; it shows a reduced value in the panel.


Esteemed Legend

Yes, accelerating the data model means you will be running a few minutes behind, because acceleration takes time to catch up. That is why most people use earliest=-65m latest=-5m for an hourly report (shifted back 5 minutes).

Builder

@vishaltaneja07011993 , @lakshman239 - I have opened a new question about another query for improving search performance. Please help resolve it.
