Splunk's command types page is missing a few commands, including accum. I would like to know whether accum is a centralized streaming command, a distributable streaming command, or neither. Essentially, I need to know whether it can run on the indexers, so that I don't have to bring the whole event set back to the search head before computing totals. For my use case, this is crucial for developing a scalable solution.
One comment at the bottom of the accum page suggests that it functions similarly to streamstats, which might imply that it is a centralized streaming command. If so, that would be disappointing: what I'm really looking for is a way to offload the computation of some counts/totals to a group of distributed indexers. I am using the Machine Learning Toolkit and working with some large DNS datasets, and it appears that only distributable streaming commands can run on indexers.
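To make the ordering concern concrete, here is a minimal Python sketch (not SPL, and not a description of how Splunk implements accum). It only assumes that accum produces a running total over an ordered stream. The two "indexers" and their values are made up for illustration.

```python
from itertools import accumulate


def running_total(values):
    """Running sum over a single ordered stream of numbers."""
    return list(accumulate(values))


# Hypothetical per-event values, split across two made-up indexers.
indexer_a = [10, 20, 30]
indexer_b = [5, 15]

# Running total computed in one place over the full ordered stream.
central = running_total(indexer_a + indexer_b)

# Running totals computed independently per indexer, then concatenated.
# The second indexer restarts from zero, so the result differs.
naive = running_total(indexer_a) + running_total(indexer_b)

# The per-indexer results only line up once the second indexer's values
# are offset by the first indexer's total, which requires coordination.
offset = sum(indexer_a)
repaired = running_total(indexer_a) + [offset + v for v in running_total(indexer_b)]

print(central)
print(naive)
print(repaired)
```

If accum behaves like this, each indexer cannot produce final values on its own, which would be consistent with it not being distributable. That is my reading, not something the documentation states.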
I am worried that Splunk may lack an important cluster computing capability: being able to compute intermediary statistics on separate nodes (e.g. map-reduce). It sounds like stats and other transforming commands really only run on the search head . . . meaning the entire event set has to be pulled from the distributed indexers into a single node before computing counts, totals, averages, etc.
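The kind of intermediate computation I have in mind looks roughly like the Python sketch below. It is only an illustration of the map-reduce pattern, not Splunk code. The node contents, field layout, and the function names `map_partial` and `reduce_partials` are hypothetical.

```python
from collections import Counter


def map_partial(events):
    """Runs on each node: reduce local (domain, bytes) events to partial counts and totals."""
    counts, totals = Counter(), Counter()
    for domain, nbytes in events:
        counts[domain] += 1
        totals[domain] += nbytes
    return counts, totals


def reduce_partials(partials):
    """Runs on one node: merge the small partial results and derive averages."""
    counts, totals = Counter(), Counter()
    for node_counts, node_totals in partials:
        counts.update(node_counts)  # Counter.update adds counts together
        totals.update(node_totals)
    averages = {domain: totals[domain] / counts[domain] for domain in counts}
    return counts, totals, averages


# Made-up DNS events held on two separate nodes.
node_1 = [("example.com", 100), ("example.org", 40)]
node_2 = [("example.com", 300), ("example.com", 200)]

counts, totals, averages = reduce_partials(
    [map_partial(node_1), map_partial(node_2)]
)
print(counts, totals, averages)
```

Only the per-domain partial counts and totals cross the network here, not the raw events. Averages are derived from the merged sums and counts rather than by averaging per-node averages.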
From a cluster computing perspective, this lack of parallelism is concerning . . . and if there is no way to compute subtotals on indexers or otherwise parallelize the summarization process before data reaches the search head, it would seem that a custom big data platform is still the only answer for high-volume machine learning tasks like this one. I hope I'm wrong about this, or that I missed something in the documentation. Please let me know if you have any other information.