I run several nightly reports that consume about 40MB each. All I really care about in the report is the summary info: the top 10 IP addresses meeting some criteria. I think that Splunk stores the job with the events (otherwise, why 40MB?).
These jobs push me past my disk quota regularly and then some of my other jobs don't run. Is there a way that the job can just store the summary info?
I am mostly interested in a way to do this in the query or on the report.
I have already realized that I could achieve the same thing by writing the results to a lookup table and then making the job shorter-lived.
You can store the results in a lookup (outputlookup).
You can accelerate your search too (if your role allows it).
You can use summary indexing, and save the top 10 with the count every day.
Also, if you do a top, or a sort followed by a trim of your results, you should have less than 40MB of results.
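As a sketch of the summary indexing option (the index name "summary" and source name "nightly_top_ips" are assumptions — adjust them to your environment), the scheduled search could write only the top 10 rows into a summary index with the collect command:

<my long search doing the count> | sort -count | head 10 | collect index=summary source="nightly_top_ips"

A dashboard or later report can then read those small summary events back cheaply:

index=summary source="nightly_top_ips" | sort -count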
my long search doing the count | sort -count | head 10
my long search doing the count | top 10 count
my original query did a
sort limit=10 -count
I just tried sort | head 10, but that also still had huge results. I will look into summary indexes and outputting the summary to a lookup table. As I think about it, using a lookup table may actually make it easier to include previous runs in a dashboard.
I agree with @yannK, but with some additional detail. If you use outputlookup, you will need to trash your search results afterward so that the search artifacts stay minimal, like this:
Your Big Honking Search Here | outputlookup YourLookupHere | where ThisFieldDoesNotExists="So this clause will drop all events"
Or, combining both, like this:
Your Big Honking Search Here | stats count by host | sort 10 -count | outputlookup YourLookupHere | where ThisFieldDoesNotExists="So this clause will drop all events"
A possible workaround is to append, or reinject, your previous lookup results:
<my new search> | outputlookup mylookup append=true
To reinject with a per-host count:
<my new search> | append [ | inputlookup mylookup | rename count AS yesterday_count | table yesterday_count host ] | stats count sum(yesterday_count) AS yesterday_count by host | outputlookup mylookup
I will compare this approach with a summary index.
One plus to either approach is that they may make it easier to display previous summaries in dashboards. With the search jobs, I still had an outstanding item to figure out how to find the previous runs of a job and load those results into the dashboard. Using a lookup table or summary index, I could simply add a field for run date and use the time picker to select against that.
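As a sketch of that run-date idea (run_date is a hypothetical field name, and mylookup follows the earlier examples), the nightly job could stamp each row before writing:

<my long search doing the count> | sort -count | head 10 | eval run_date=strftime(now(), "%Y-%m-%d") | outputlookup append=true mylookup

A dashboard panel could then filter on that field, for example via a dropdown or time-derived token:

| inputlookup mylookup | where run_date="$run_date$"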