Hi everyone. I've got an instance of Splunk 5.0.1 running with a large amount of firewall data coming in daily (roughly 15 GB). I created a relatively simple dashboard with 5 panels, with the intent of scheduling the view for PDF delivery once per week. The view itself is fine; I tested it using a relatively short timespan on my search (e.g. the last 60 mins of data). The problem is that when I generate the view over 1 week's worth of data, it always fails, and I suspect it has something to do with the large amount of data it's trying to search through.
Some further points to add context to the problem...
Each of the 5 panels in the view runs its own search, even though the base search is the same, e.g. index=firewall type=opsec attack="*" | ... After the base search, the results are piped to things like "top src_ip", "top des_ip", and stats. Since each panel uses the same base search, I thought about using post-processing to make things more efficient, but I read somewhere in the documentation that you can't post-process if the base search returns more than 10,000 events. My base search returns close to 2 million matching events over the course of a week. 😞
So... that left me with 5 saved searches, one for each dashboard panel. To try to speed things up I turned on acceleration for the searches and set the summary range to 7 days (since I need to produce the PDF weekly). The acceleration doesn't appear to have had much (if any) effect.
I've also tried opening the view, then going to the Job Manager and clicking Save on each of the jobs the view kicked off, thinking that once they finished I could reopen the view and it would load the cached results. That doesn't work, but I did learn that the searches take roughly 10 hours to complete. 😞
Now I'm pretty sure I'm doing this in a highly inefficient way, and there must be a better approach. Any ideas are welcome; I'm more than happy to provide more technical detail if needed.
Depending on your exact searches, you could merge them into one and then use postprocess. For example, take these two:
index=firewall ... | top src_ip
index=firewall ... | top des_ip
You could merge them into a single base search that counts over both fields at once:
index=firewall ... | stats count by src_ip des_ip
And add a post-process pipe like this to rebuild the top src_ip report (including the percent column that top normally produces):
stats sum(count) as count by src_ip | eventstats sum(count) as sum | eval percent = count*100/sum | fields - sum | sort - count
Do the same for des_ip, and you avoid running the same base search in multiple panels.
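In Splunk 4.x/5.x simple XML, this pattern is wired up with a shared searchTemplate at the dashboard level and a searchPostProcess in each panel. A rough sketch (panel titles, the 7-day time range, and the head 10 limit are my own additions — check the simple XML reference for your exact version):

```xml
<dashboard>
  <label>Firewall Weekly</label>
  <!-- Base search shared by all panels; runs once per dashboard load -->
  <searchTemplate>index=firewall type=opsec attack="*" | stats count by src_ip des_ip</searchTemplate>
  <earliestTime>-7d@d</earliestTime>
  <latestTime>@d</latestTime>
  <row>
    <table>
      <title>Top Source IPs</title>
      <!-- Post-process only reshapes the base results; no re-search of the index -->
      <searchPostProcess>stats sum(count) as count by src_ip | eventstats sum(count) as sum | eval percent = count*100/sum | fields - sum | sort - count | head 10</searchPostProcess>
    </table>
    <table>
      <title>Top Destination IPs</title>
      <searchPostProcess>stats sum(count) as count by des_ip | eventstats sum(count) as sum | eval percent = count*100/sum | fields - sum | sort - count | head 10</searchPostProcess>
    </table>
  </row>
</dashboard>
```

Note that the 10,000-event concern mostly applies to post-processing raw events; because the base search here is already a transforming stats, the result set passed to the post-process is one row per src_ip/des_ip pair, which should be far smaller than 2 million events.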
That probably won't solve the issue entirely though - consider summary indexing.
Staying with the top example, you could run a scheduled search every hour that computes stats count by src_ip and stores the result in a summary index; your weekly report then computes the overall top values from those small hourly summaries instead of the raw events.
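A sketch using the si- summary commands (the saved search name here is a placeholder — by default, events in the summary index get their source set to the name of the saved search that wrote them):

```
# Hourly scheduled search, e.g. cron 5 * * * * over the previous hour,
# with "summary indexing" enabled in the saved search's settings:
index=firewall type=opsec attack="*" | sistats count by src_ip

# Weekly report (7-day time range) over the summary index:
index=summary source="hourly_firewall_src_ip" | stats count by src_ip | sort - count | head 10
```

The weekly search now scans a few thousand pre-aggregated summary events instead of ~2 million raw ones, which should bring your 10-hour runtime down dramatically.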
As for report acceleration, it can work brilliantly, but its effect depends on the specific search. Also, make sure the report acceleration summary has finished building before judging its speed-up — until the summary is complete, accelerated searches fall back to scanning raw data.
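For reference, acceleration for a saved report is controlled by settings like these in savedsearches.conf (the stanza name is a placeholder; these can also be set through Manager » Searches and reports, where you can watch the summary's build status):

```
# savedsearches.conf on the search head -- one stanza per panel search
[Firewall - top src_ip]
search = index=firewall type=opsec attack="*" | top src_ip
auto_summarize = 1
# Summarize far enough back to cover the weekly reporting window:
auto_summarize.dispatch.earliest_time = -7d@h
```

Report acceleration only applies to searches whose reporting commands are streamable up to the transforming command (top, stats, chart, timechart, etc.), so verify each panel search qualifies.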