Is there any benefit to using the old method for summary indexing? Basically, I would like to know the differences between the si commands (the new way) and | collect (the old way), in terms of performance or any other advantage one may have over the other.
I don't think I've compared the performance of the two, but I always prefer enabling summary indexing in the saved search options over using the collect command in-line. The former does the summary indexing work (processing results, generating raw events, etc.) in the background and should perform better than the in-line option. I'd love to hear thoughts from others.
Using summary indexing I saw a large performance benefit (searches were even ten times faster!), for example with BlueCoat logs that produce billions of events every day.
One problem is that it's impossible to delete only part of the summarized logs; you can only delete the full tsidx file (I have asked Splunk to consider adding a delete feature). This means you cannot afford mistakes when summarizing, because you cannot remove them afterwards.
In addition, summary indexing delays access to the data, because logs are indexed twice and you have to wait until they are indexed before you can summarize them.
In my case the delay is larger, because I have to be sure that all the logs have really arrived before summarizing.
There aren't many checks on the summary indexing operation, so there is a risk of indexing a log twice or losing it.
Finally, it takes up more disk space, but in my case that is the lesser problem.
In conclusion: use it, because it is really useful, but be careful!
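One way to handle the late-arrival delay and the duplicate/missing risk described above is to have the scheduled search summarize a fixed, hour-aligned window that ends some time in the past. This is only a sketch: `your_source_index` and the `host` field are placeholders, and the one-hour lag is an assumption to adjust to how late your logs actually arrive.

```
index=your_source_index earliest=-2h@h latest=-1h@h
| sistats count by host
```

Scheduled once per hour with summary indexing enabled, each run would cover exactly one full hour, so consecutive windows should neither overlap nor leave gaps, as long as every scheduled run executes exactly once.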
Thanks for the reply. However, I am interested in the difference between the two methods of implementing summary indexes (si commands vs. | collect), rather than in using summary indexing itself.
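To make the comparison concrete, here is roughly what the two approaches might look like. The names `your_source_index` and `your_summary_index` and the `host` field are placeholders, not taken from anyone's setup in this thread.

The collect way, with a regular transforming command followed by an in-line collect:

```
index=your_source_index earliest=-1h@h latest=@h
| stats count by host
| collect index=your_summary_index
```

The si way, as a scheduled saved search with summary indexing enabled in its options:

```
index=your_source_index
| sistats count by host
```

As far as I understand, the main difference shows up when reporting. With plain stats plus collect, the report has to re-aggregate the stored values itself, for example `index=your_summary_index | stats sum(count) as count by host`. With sistats, the summary holds the intermediate statistics, so the report can simply run the matching command again, for example `index=your_summary_index | stats count by host`.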