I would like to build a saved search that returns, say, 200,000 events. Each event in this search contains a field called business, which holds a business name.
From this master saved search I would like to create multiple charts on a dashboard, each chart filtered by a specific business.
So for example, my filtered search would look something along the lines of:
| loadjob savedsearch="username:application:MyMasterSavedSearch" | search business=businessX
For each chart I would use a different businessX. This would greatly reduce the amount of processing required, since the big main search runs once and each chart simply filters its results. Ideally, I would even create filtered saved searches that build on the main saved search (a saved search off a saved search, if you like). That way my dashboard would load instantly.
OK, so a problem I have run into is that the loadjob command only seems to load 10,000 results. I have read the explanation for this in a previous answer, but it doesn't really make sense to me. I can open the search from the Jobs page and get all the results that way, but I don't seem to be able to do the same thing using the loadjob command.
If I am doing something completely wrong and there is a better way, let me know. I really don't want to have to create duplicate queries, as that would be a maintenance nightmare given the number of searches I would need.
Many thanks!
The usual approach to this problem is to use summary indexing, but another option would be to use outputcsv and inputcsv. I know you can deal with quite large data sets either of those ways.
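As a rough sketch of the outputcsv/inputcsv route (the file name master_results is just a placeholder), the scheduled master search could end with:

    <your search terms> | outputcsv master_results

and each dashboard panel could then read the file back and filter it:

    | inputcsv master_results | search business=businessX

This avoids re-running the expensive search, at the cost of the data only being as fresh as the last scheduled run.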
There is probably a setting in limits.conf that would affect the maximum rows returned by loadjob, but I'm not quite sure which one it would be.
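If you want to experiment, one candidate worth checking (an assumption on my part, not something I have verified against loadjob specifically) is maxresultrows in the [searchresults] stanza:

    # limits.conf -- candidate setting only; verify against your Splunk version's docs
    [searchresults]
    maxresultrows = 200000

Raising limits like this can increase memory and disk usage, so test on a non-production instance first.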
Can you reduce the number of rows that are required to generate the results you need?
You could solve this on the front end by using postProcess in a dashboard, or on the back end with summary indexing. The two approaches end up being quite similar.
I'm filling in a lot of blanks here, but say you want to end up with the distinct count of users for each business.
1) Save a search
<your search terms> | timechart dc(users) by business
Run this saved search on a schedule. Then you can build a dashboard that uses postProcess to make one chart per business: you give each chart a different postProcess search, and it carves out one slice of the larger result set (see the sketch after the list below).
Since each value of the 'business' field has become its own column in the timechart output, the postProcess search for a given chart can just be a fields clause:
fields _time business1
fields _time business2
fields _time business3
fields _time business4
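Here is a minimal Simple XML sketch of that idea (the saved-search name MyMasterTimechartSearch and the business column names are placeholders): the base search references the scheduled saved search, and each panel post-processes one slice of it.

    <dashboard>
      <!-- Base search: references the scheduled saved search so its results are shared -->
      <search id="base" ref="MyMasterTimechartSearch"></search>
      <row>
        <panel>
          <title>business1</title>
          <chart>
            <!-- Post-process search: keeps only one business column -->
            <search base="base">
              <query>fields _time business1</query>
            </search>
          </chart>
        </panel>
        <panel>
          <title>business2</title>
          <chart>
            <search base="base">
              <query>fields _time business2</query>
            </search>
          </chart>
        </panel>
      </row>
    </dashboard>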
2) The other option is to do summary indexing.
Save a search like this, remember the name you picked, and schedule it to run every 15 minutes with summary indexing enabled so it writes to the summary index.
<your search terms> | bin _time span=5min | stats dc(users) as userCount by business, _time
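For reference, a hedged savedsearches.conf sketch of that scheduling (the stanza name, schedule, and time range are placeholders; adjust to your needs):

    # savedsearches.conf -- sketch only; stanza name and schedule are placeholders
    [Business User Count Summary]
    search = <your search terms> | bin _time span=5min | stats dc(users) as userCount by business, _time
    enableSched = 1
    cron_schedule = */15 * * * *
    dispatch.earliest_time = -15m
    dispatch.latest_time = now
    action.summary_index = 1
    action.summary_index._name = summary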
And then in your dashboards, the chart for one particular business would use
index=summary source="<your saved search name>" business="businessX" | timechart sum(userCount) as userCount