The relevant data about how CustIDs develop over time is maintained in a lookup (new_custID.csv), which is built and updated with the following search:
index=myindex sourcetype=mysourcetype
| stats earliest(_time) as first_seen latest(_time) as last_seen by CustID
| inputlookup append=t new_custID.csv
| stats min(first_seen) as first_seen max(last_seen) as last_seen by CustID
| outputlookup new_custID.csv
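If that search is saved as a scheduled search (for example running daily or monthly), the lookup stays current. A minimal sketch to inspect its contents, assuming the field names first_seen and last_seen from the search above:
| inputlookup new_custID.csv
| eval first_seen_readable=strftime(first_seen, "%Y-%m-%d %H:%M:%S")
| sort - first_seen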
My goal is to build dashboards that show the predicted data volume depending on the onboarding (connection) projects, in order to get an overview of how much data is ingested into Splunk (licensing costs), so that I can combine these findings with my query:
index=myindex sourcetype=mysourcetype
| eval esize=len(_raw)
| stats sum(esize) as Volume_of_Data
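To relate the ingested volume to the CustIDs in the lookup, the same measurement could be broken down per CustID and per day; a sketch, assuming CustID is an extracted field on these events:
index=myindex sourcetype=mysourcetype
| eval esize=len(_raw)
| timechart span=1d sum(esize) as daily_bytes by CustID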
This raises the following questions:
1) How can the lookup be used to automatically recognize which concrete CustIDs are in operation or have been newly added after a certain point in time (for example, evaluated monthly)? (Extrapolation; see the sketch after this list.)
2) A selection list (dropdown) should offer the points in time contained in the lookup above. After a point in time is selected, the predicted data volume for that point in time should be displayed.
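A minimal sketch for question 1, assuming the lookup fields first_seen/last_seen from above: group CustIDs by the month in which they were first seen. The resulting list of months is also a candidate source for the dashboard dropdown in question 2.
| inputlookup new_custID.csv
| eval first_seen_month=strftime(first_seen, "%Y-%m")
| stats values(CustID) as new_CustIDs count as number_of_new_CustIDs by first_seen_month
| sort first_seen_month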
How does your past license usage look? Does it increase in a linear way? Can you post a screenshot of a timechart of your license usage?
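For reference, a daily license-usage timechart can typically be built from the license master's internal logs; a sketch, assuming access to index=_internal and the standard license_usage.log type=Usage events:
index=_internal source=*license_usage.log* type=Usage
| timechart span=1d sum(b) as bytes
| eval GB=round(bytes/1024/1024/1024, 2)
| fields _time GB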
Hello,
yes, statistically the distribution is more or less the same from day to day (avg = 2.2 GB). On weekends, of course, there is up to about 30% less data volume.
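A sketch of how that daily average and the weekend dip could be checked, assuming the same index and sourcetype as above:
index=myindex sourcetype=mysourcetype
| eval esize=len(_raw)
| bin _time span=1d
| stats sum(esize) as daily_bytes by _time
| eval day_type=if(strftime(_time, "%a") IN ("Sat", "Sun"), "weekend", "weekday")
| stats avg(daily_bytes) as avg_daily_bytes by day_type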
If it follows a linear trend then this should be really easy to do. Check out my .conf talk that is related to this and shows how to build a model in the MLTK. I can't help with your use case until I see a line chart of the growth.
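As a simple starting point before moving to the MLTK, the core predict command can extrapolate a daily-volume timechart; a sketch, where the 30-day horizon is only an example value:
index=myindex sourcetype=mysourcetype
| eval esize=len(_raw)
| timechart span=1d sum(esize) as daily_bytes
| predict daily_bytes future_timespan=30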