Dashboards & Visualizations

A Saved Search within a Saved Search using loadjob

pj
Contributor

I would like to build a saved search that returns say 200,000 events. The events in this search contain a field called business and this field contains a business name.

From this master saved search I would like to create multiple charts on a dashboard, each chart filtered by a specific business.

So for example, my filtered search would look something along the lines of:

| loadjob savedsearch="username:application:MyMasterSavedSearch" | search business=businessX

For each chart I would use a different businessX. This would greatly reduce the amount of processing required as the big main search runs once and I am simply filtering the main search. In fact, ideally, I would create filtered saved searches that use the main saved search - so a saved search off a saved search if you like. In this way, my dashboard would load instantly.

Ok, so a problem i have run into is that the loadjob command seems to only load 10000 results. I have read the explanation for this on a previous answer, but it doesnt really make sense to me. I can go to the search and load it from the jobs page and get all the results that way, but dont seem to be able to do the same thing using the loadjob command.

If i am doing something completely wrong and there is a better way - let me know. I really dont want to have to create duplicate queries as that would be a maintenance nightmare given the amount of searches i would need.

Many thanks!

0 Karma
1 Solution

vbumgarner
Contributor

The usual approach to this problem is to use summary indexing, but another option would be to use outputcsv and inputcsv. I know you can deal with quite large data sets either of those ways.

There is probably a setting in limits.conf that would affect the maximum rows returned by loadjob, but I'm not quite sure which one it would be.

Can you reduce the number of rows that are required to generated the results you need?

View solution in original post

0 Karma

sideview
SplunkTrust
SplunkTrust

You could solve it on the front-end by using postProcess in a dashboard. Or you could solve it on the back end with summary indexing. It ends up being somewhat similar.

I'm filling in a lot of blanks here, but say you want to end up with the distinct count of users for each business.

1) Save a search

<your search terms> | timechart dc(users) by business

Run this saved search on a schedule. Then you could make a dashboard that uses the postProcess to make one chart per business. You give each chart a different 'postProcess' search and it carves out one slice of the larger dataset.

Since the 'business' field has become a column in the timechart, the postProcess search would actually just be a fields clause:

fields _time business1
fields _time business2
fields _time business3
fields _time business4

2) The other option is to do summary indexing.

Save a search like this, remember the name you picked, and schedule it to run every 15mins and write to the summary index.

<your search terms> | bin _time span=5min | stats dc(users) as userCount by business, _time

And then in your dashboards, for the chart for one particular business, you'd do

index=summary source="<your saved search name>" business="businessX" | timechart sum(userCount) as userCount 
0 Karma

vbumgarner
Contributor

The usual approach to this problem is to use summary indexing, but another option would be to use outputcsv and inputcsv. I know you can deal with quite large data sets either of those ways.

There is probably a setting in limits.conf that would affect the maximum rows returned by loadjob, but I'm not quite sure which one it would be.

Can you reduce the number of rows that are required to generated the results you need?

0 Karma
Get Updates on the Splunk Community!

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...

SignalFlow: What? Why? How?

What is SignalFlow? Splunk Observability Cloud’s analytics engine, SignalFlow, opens up a world of in-depth ...

Federated Search for Amazon S3 | Key Use Cases to Streamline Compliance Workflows

Modern business operations are supported by data compliance. As regulations evolve, organizations must ...