I am not sure if I am even wording this question correctly (which is probably why I didn't find any good results).
What is the best practice for working with large result sets in Splunk without rerunning the search every time you make a small change?
We all know how to use the HiddenPostProcess module in dashboards and run multiple transforming searches off a base search, but how do I do the same thing with ad hoc searches?
I have a very large base search that takes tens of seconds (or even minutes) to run, but I want to be able to adjust (experiment with, fiddle with, etc.) the transforming searches faster than that. I don't care if the data is fresh, but I do need to use a significant chunk of the large result set to produce meaningful results.
TLDR: When using the Search App, can I "cache" the base search, and rerun ad hoc transforming searches against it?
I am sure there is a way to do what I am describing, but I can't find it.
Can anyone help me out?
I have 2 suggestions.
The free answer: Instant Pivot. Make sure your base search has everything that you need, then do all of the post-pipe stuff with pivot. (This should work, but I haven't tested it; IMHO this is the main reason this brand-new feature was created.)
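As a rough sketch of what the post-pipe side looks like once you've saved a data model from Instant Pivot (the data model, object, and field names below are made up for illustration):

```
| pivot MyDataModel MyRootObject count(MyRootObject) AS "Event Count" SPLITROW host
```

You can then tweak the SPLITROW/SPLITCOL and aggregation clauses freely without rerunning the expensive base search, since pivot works off the data model.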
The expensive answer: Tableau. Tableau has a Splunk connector that will allow you to run a base search and pull the raw events back into memory (you can save this to a local file, too!) and do your post-pipe stuff inside Tableau.
In my tests, instant pivot doesn't give you those benefits. When I open in search, it actually reverts to a stats command, which doesn't leverage any of the caching of pivot. But pivot on an accelerated data model would definitely give a lot of this benefit -- check out my .conf preso from last year for an example of how to do that: http://conf.splunk.com/sessions/2014/conf2014_DavidVeuve_Splunk_UsingTrack_SecurityNinjutsu.pdf
Well, a lot of the goal of Instant Pivot is to give you an easy onramp to using pivot. Once you save the data model, you'll have a formal data model that does the caching, can be accelerated, etc. Instant Pivot lets users who might not normally play with pivot onramp to creating data models easily.
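To make the acceleration point concrete: once the data model is accelerated, tstats can answer the post-pipe questions straight from the acceleration summaries instead of the raw events (the data model and field names here are hypothetical):

```
| tstats summariesonly=true count from datamodel=Web_Traffic where Web_Traffic.status>=500 by Web_Traffic.host
```

Changing the where clause or the split-by fields and rerunning this is cheap, because it only reads the summaries.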
You can create a scheduled saved search out of your base search and use the latest search results like this:
| loadjob NameOfYourSavedSearch | ...
See the docs for more details: http://docs.splunk.com/Documentation/Splunk/6.2.6/SearchReference/Loadjob
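Note that loadjob also accepts a fully qualified saved search name in user:app:searchname form, which avoids ambiguity when the same name exists in multiple apps (the names below are made up):

```
| loadjob savedsearch="admin:search:MyBaseSearch"
| stats count by status
```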
Hope this helps ...
You can also use loadjob without a scheduled search. If you run your base search, you will get a search id (you can grab it from the URL or the search job inspector) and reference that id in loadjob, which I've done a bunch of times. You can even click the Share (or Send Job to Background) button on your original search, which bumps the TTL of the job up to 7 days. That means you can continue to manipulate those results over a period of many days.
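For example, if the job inspector shows a search id like 1422034567.123 (that value is made up), you can post-process the cached results of that job directly:

```
| loadjob 1422034567.123
| timechart count by host
```

Each variation you try here only re-reads the cached job results, not the indexes.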
Keep in mind that it usually makes sense to have your original search compute the first level of base statistics (rather than return the raw events) so that the amount of data written to and read from disk on the search head stays manageable -- your loadjob will be much faster if it doesn't have to read 4 GB of search results before doing your processing.
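As a sketch of that pattern (index, sourcetype, and saved search names are hypothetical), schedule the base search to emit first-level statistics rather than raw events:

```
index=web sourcetype=access_combined
| stats count by host, status
```

and then experiment cheaply against the much smaller cached result set:

```
| loadjob savedsearch="admin:search:WebBaseStats"
| where status >= 500
| stats sum(count) AS errors by host
```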