I have time-series data (around 100GB) on a single machine and it grows by around 5GB each day.
How can I deploy Splunk in a scalable way so that I can use the data to build machine learning models (for example, ARIMA) or run regression analysis? The complete dataset cannot fit in memory. I want something similar to what Spark does: distributed data processing.
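To illustrate the kind of out-of-core computation I'm after, here is a rough sketch (pure Python, with a hypothetical chunked data source) that fits a simple linear regression by accumulating sufficient statistics chunk by chunk, so the full dataset never has to be in memory at once:

```python
# Sketch: out-of-core simple linear regression (y = a*x + b) by
# accumulating sufficient statistics over chunks of (x, y) pairs.
# The chunking/data source here is hypothetical; in practice each
# chunk would be read from disk or from a search result set.

def fit_linear_streaming(chunks):
    """chunks: iterable of lists of (x, y) pairs."""
    n = sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Example: points from y = 2x + 1, fed in two chunks
# as if they were read from disk one piece at a time.
chunks = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
print(fit_linear_streaming(chunks))  # (2.0, 1.0)
```

This works on one machine, but I would like the chunk processing to be distributed across nodes, the way Spark partitions work.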
Any ideas on how this can be done?