Splunk Search

How to improve performance of a "loadjob" search which takes a long time to fetch the results?

samkaj
Explorer

I am using loadjob to load an already scheduled report that contains more than 2 million results. But when I try to fetch it, it takes an average of ~90 seconds to get the results, whereas I would like to have this returned within 10 seconds, since it is an already computed result set.

| loadjob savedsearch="test:testApp:testReport"

Splunk Version 6.4

How can I improve the performance? Please share your thoughts.

0 Karma

codebuilder
Influencer

When you use loadjob, the search head loads ALL results from your saved search's job artifact. Anything you do with the data after loadjob, such as displaying fields, running stats, etc., is run against the entire result set.

Additionally, any time the dashboard page is refreshed, visited by another user, or re-visited, loadjob is called again. If your artifact bundle is as large as you state, then you're definitely going to hit performance and storage issues.

You might consider modifying your saved search to limit the results, or breaking it down into multiple saved searches.
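For instance, a sketch of a pre-aggregated scheduled search (the index, sourcetype, and field names here are made up for illustration):

index=web sourcetype=access_combined
| stats count AS requests, avg(response_time) AS avg_rt BY host, status

A loadjob against a summary like this returns a few thousand rows instead of millions, so anything you run after it stays fast.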

Alternatively, you can build a search directly in the dashboard and use its results as a "base" search.
This post has a good discussion of it:

https://community.splunk.com/t5/Splunk-Search/Base-search-query-for-different-dashboard-panels/m-p/3...
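As a rough sketch of the base-search pattern in Simple XML (the search strings and IDs are illustrative only, not from your environment):

<dashboard>
  <search id="base">
    <query>index=web sourcetype=access_combined | fields host, status, response_time</query>
    <earliest>-24h</earliest>
    <latest>now</latest>
  </search>
  <row>
    <panel>
      <chart>
        <search base="base">
          <query>stats count BY status</query>
        </search>
      </chart>
    </panel>
  </row>
</dashboard>

Each panel's post-process query runs against the base search's results on the search head instead of re-running the full search per panel.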

----
An upvote would be appreciated and Accept Solution if it helps!
0 Karma

csatech245
Engager

I believe it helps if you separate the loading order so that not everything in the dashboard loads at once. Use the artifact_offset argument of loadjob: leave some panels at artifact_offset=0 (the most recent run) and set others to 1 or 2 (earlier runs), so the panels don't all hit the same artifact at the same time and slow down the dashboard.

For example:

| loadjob savedsearch="tech123:Residential:name of saved/enabled alert" artifact_offset=0
| timechart span=1d count by incident_type

0 Karma

snoobzilla
Builder

Having gone down this path with fewer events, I changed over to an accelerated datamodel with just the fields of concern.
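For example, once the datamodel is accelerated, a tstats query reads only the summarized fields (the datamodel and field names below are hypothetical):

| tstats summariesonly=true count FROM datamodel=Web.Requests WHERE Requests.status>=500 BY Requests.host

Because tstats reads the acceleration summaries rather than raw events, it typically returns in seconds even over tens of millions of events.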

0 Karma

samkaj
Explorer

I am fetching the already compressed result file of the scheduled report through the REST service and applying the filters after the report is fetched, so I am not using any search head UI directly here.

Also, acceleration would apply if I were displaying a report in a dashboard, but in my case I am fetching this report for other purposes outside Splunk. The report contains only the needed data, with no junk, and is computed from more than 25 million events.

Please let me know if this is the right approach, as I have a lot of filters to apply to the data after the report is generated and fetched.

Also, if 2 million is too large, what would be the optimal number of results for the report so that it can be returned in a few seconds?
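One option worth sketching: push the filters into the search string you dispatch over REST, so the filtering happens inside Splunk before the results are streamed out (the filter fields below are hypothetical):

| loadjob savedsearch="test:testApp:testReport" | search status="error" region="EMEA" | fields _time, host, status

This way the REST endpoint returns only the filtered subset rather than the full 2 million rows, and the client-side post-filtering work goes away.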

0 Karma

snoobzilla
Builder

There are a lot of ways to do things in Splunk and I was relating my experience where I was trying to improve performance of a dashboard.

2 million records is a lot to put through the pipeline on every call and expect it to be fast. Some options:
1. Cache the loaded job in a database outside of Splunk and poll it from there.
2. Build an accelerated datamodel containing the results and query Splunk with filters and/or aggregation commands applied to the datamodel, so you only pull out a smaller result set.

If you are trying to join with external data, can you move that data into Splunk and join it there?

0 Karma

somesoni2
Revered Legend

Are you using Search Head Clustering/Search Head Pooling? Also, 2 million records is a lot; there will be some slowness just reading 2 million records off the compressed result file.

0 Karma
