Hello All,
I've searched Answers and haven't found an answer to my problem; my apologies if I missed one or two.
As the title states, I'm looking for generic search strategies that collect the events needed for a particular scenario or goal. Here is an image that should help describe the data I'm working with:
http://www.freeimagehosting.net/dl8bb
There will be nearly 40 different data sets, all under the same sourcetype. The data is not inserted in real time but in bulk. Each event carries a timestamp in a "start" field that is the same for all events in one insert (denoted by the red dashes in the image). A single insert can contain upwards of 200,000 events. I have control over most of the source data before it reaches Splunk.
Strategies I have working so far:
1) Single data set most recent insert (single green dash in image):
sourcetype="dataset" dataset_name="dataset 1" | streamstats dc(start) as distinct_times | head (distinct_times == 1) | ...
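To make the logic of strategy 1 concrete, here is a small Python model (not SPL) of what streamstats dc(start) followed by head (distinct_times == 1) does to a stream of events. It assumes the events arrive newest insert first, and the dict-shaped events with integer "start" values and "id" keys are hypothetical stand-ins for real Splunk events.

```python
from typing import Dict, Iterable, List


def most_recent_insert(events: Iterable[Dict]) -> List[Dict]:
    """Keep events until a second distinct "start" value appears."""
    seen_starts = set()
    result = []
    for event in events:  # assumed sorted newest insert first
        seen_starts.add(event["start"])  # running distinct count, like streamstats dc(start)
        if len(seen_starts) != 1:        # head stops at the first event where the test is false
            break
        result.append(event)
    return result


# Hypothetical sample: two events from the newest insert, one from an older insert.
events = [
    {"dataset_name": "dataset 1", "start": 300, "id": 1},
    {"dataset_name": "dataset 1", "start": 300, "id": 2},
    {"dataset_name": "dataset 1", "start": 200, "id": 3},
]
print([e["id"] for e in most_recent_insert(events)])  # [1, 2]
```

The early break mirrors why head is cheap here: the search can stop reading as soon as the older insert begins.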
2) Single data set all inserts (light green oval on single row in image):
sourcetype="dataset" dataset_name="dataset 1" | ...
3) All data sets, all inserts (light blue square in image):
sourcetype="dataset" | ...
Strategies I haven't worked out yet:
4) All data sets, only the most recent inserts each (purple ovals in image).
I've considered a strategy that compares the running distinct count of data set names against a pre-calculated total number of data set names:
sourcetype="dataset" | streamstats dc(dataset_name) as distinct_names | head (distinct_names == total_dataset_names) | ...
This strategy has at least one problem, illustrated by Data Set 5 in the image: it is inserted more frequently than the others, so the search would pick up all of its recent inserts while waiting for the distinct count to reach the total number of data set names.
I've also tried using map to run sourcetype="dataset" dataset_name="$dataset_name$" | streamstats dc(start) as distinct_times | head (distinct_times == 1) | ... for every data set name, but because the inserts are so large I ran into subsearch limits.
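The Data Set 5 problem can be reproduced with a small Python model (not SPL) of the distinct-count idea: read events newest first and stop once every data set name has been seen. The function names, integer "start" values, and the two-data-set sample are hypothetical, chosen only to mimic the situation in the image.

```python
from typing import Dict, Iterable, List


def until_all_names_seen(events: Iterable[Dict], total_dataset_names: int) -> List[Dict]:
    """Take events (newest first) until the running distinct count of names hits the total."""
    seen_names = set()
    result = []
    for event in events:
        seen_names.add(event["dataset_name"])
        result.append(event)
        if len(seen_names) == total_dataset_names:
            break  # every data set name has now appeared at least once
    return result


def inserts_per_dataset(events: Iterable[Dict]) -> Dict[str, int]:
    """Count distinct "start" values (i.e. inserts) per data set name."""
    starts: Dict[str, set] = {}
    for event in events:
        starts.setdefault(event["dataset_name"], set()).add(event["start"])
    return {name: len(values) for name, values in starts.items()}


# Hypothetical sample: Data Set 5 has three inserts newer than Data Set 1's latest one.
events = [
    {"dataset_name": "Data Set 5", "start": 500},
    {"dataset_name": "Data Set 5", "start": 500},
    {"dataset_name": "Data Set 5", "start": 400},
    {"dataset_name": "Data Set 5", "start": 300},
    {"dataset_name": "Data Set 1", "start": 250},
    {"dataset_name": "Data Set 1", "start": 250},
]
print(inserts_per_dataset(until_all_names_seen(events, total_dataset_names=2)))
# {'Data Set 5': 3, 'Data Set 1': 1}
```

In this model Data Set 5 contributes three inserts instead of one, and Data Set 1 contributes only the first of its two events, because the cut happens as soon as the last name appears.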
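For reference, the result that 4) is after can be stated as a small Python model (not SPL): for each data set, find its newest "start" value, then keep only the events carrying that value. This is the same selection that running strategy 1 once per data set name would produce, but it makes two passes over the events instead of one subsearch per name. The function name and sample data are hypothetical, and it assumes "start" values compare correctly (for example, epoch seconds).

```python
from typing import Dict, Iterable, List


def latest_insert_per_dataset(events: Iterable[Dict]) -> List[Dict]:
    """Keep, for every data set, only the events from its most recent insert."""
    events = list(events)  # two passes are needed, so materialize the input
    latest: Dict[str, int] = {}
    for event in events:  # pass 1: newest "start" value per data set name
        name, start = event["dataset_name"], event["start"]
        if name not in latest or start > latest[name]:
            latest[name] = start
    # pass 2: keep only events whose "start" equals their data set's newest value
    return [e for e in events if e["start"] == latest[e["dataset_name"]]]


events = [
    {"dataset_name": "Data Set 5", "start": 500},
    {"dataset_name": "Data Set 5", "start": 500},
    {"dataset_name": "Data Set 5", "start": 400},
    {"dataset_name": "Data Set 1", "start": 250},
    {"dataset_name": "Data Set 1", "start": 250},
    {"dataset_name": "Data Set 1", "start": 100},
]
for e in latest_insert_per_dataset(events):
    print(e["dataset_name"], e["start"])
```

Because the grouping is keyed on whatever data set names appear in the events, a new data set would be picked up without a pre-calculated total, which fits the requirement that reports adjust to new names without intervention.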
5) All data sets, month-to-month comparison.
Live search strategies are preferred, but I'm also open to strategies that use saved searches, as long as they work dynamically: if I add a new data set name, the reports, alerts, and views need to adjust without manual intervention.
I'd most like to hear ideas for solving 4), but if there is a better approach to any of the others, I'm all ears. I realize this was a long post, and I appreciate it if you made it through.
Best Regards,
Chris