I am using the Machine Learning Toolkit to predict how many events each host usually sends.
To do so, I selected the "Predict Numeric Fields" showcase and created the following command:
| tstats count where index=* by host,_time span=1h |eval date_wday=strftime(_time,"%w"), date_hday=strftime(_time,"%H")
This gives me the number of events per host for each hour. I also compute two fields: the weekday and the hour of the day.
But when I run the Linear Regression with "count" as the field to predict and "host", "date_wday" and "date_hday" as the fields used for predicting, the results are very poor.
When I filter on just one host, the prediction works quite well, but as soon as there are several host names, the model does not work.
Any idea how to create a model that takes the name of the host into account? Maybe some preprocessing?
I expect that means each host is a different context with different data and needs its own linear regression. If they all behaved the same way, a model fitted on one host's history would also predict the others. Since your results show that isn't true...
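To illustrate why a single regression struggles here, below is a minimal sketch in Python with NumPy (not SPL, and not what the toolkit runs internally). It assumes the host enters the model as a categorical offset and, for simplicity, treats the hour as a number; the two hosts and their counts are made-up synthetic data. With those assumptions, the pooled model can shift each host up or down but has to share one hourly trend, while a separate fit per host recovers each pattern.

```python
import numpy as np

hours = np.arange(24, dtype=float)

# Synthetic, made-up hourly counts for two hypothetical hosts
# whose volume moves in opposite directions during the day.
count_a = 10.0 * hours + 5.0      # host A: ramps up
count_b = 200.0 - 5.0 * hours     # host B: ramps down


def rmse(y, y_hat):
    """Root mean squared error between observed and fitted values."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def fit_predict(X, y):
    """Ordinary least squares fit, returning the fitted values."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef


# Pooled model: intercept + host indicator + hour.
# The host only gets its own offset; the hour slope is shared.
hour_all = np.concatenate([hours, hours])
is_b = np.concatenate([np.zeros(24), np.ones(24)])
y_all = np.concatenate([count_a, count_b])
X_pooled = np.column_stack([np.ones(48), is_b, hour_all])
pooled_rmse = rmse(y_all, fit_predict(X_pooled, y_all))

# One model per host: intercept + hour, fitted separately.
X_single = np.column_stack([np.ones(24), hours])
per_host_rmse = max(
    rmse(count_a, fit_predict(X_single, count_a)),
    rmse(count_b, fit_predict(X_single, count_b)),
)

print(f"pooled RMSE:   {pooled_rmse:.2f}")
print(f"per-host RMSE: {per_host_rmse:.2e}")
```

In this toy case the pooled fit is far off for both hosts, while the per-host fits are essentially exact. Real hosts will not be this clean, but the mechanism is the same: an offset per host is not enough when the shape of the pattern differs.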
Yes, they all behave differently, but I cannot create a model for each of my 20K forwarders... Can I? I bet there is a cleverer solution.
Do groups of hosts behave similarly? Can you make a model for each group?
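One rough way to check whether such groups exist is to compare the shape of each host's average hourly profile and ignore the absolute volume. The sketch below is plain Python with NumPy, not SPL; the host names, the profiles, the helper `group_hosts_by_profile` and the 0.9 threshold are all hypothetical choices for illustration. It puts a host into the first group whose representative profile it correlates with strongly, and otherwise starts a new group.

```python
import numpy as np


def group_hosts_by_profile(profiles, threshold=0.9):
    """Greedily group hosts whose hourly profiles have a similar shape.

    profiles: dict mapping host name -> 24 average hourly counts.
    Pearson correlation ignores scale and offset, so a quiet host and a
    busy host with the same daily rhythm land in the same group.
    """
    groups = []  # list of (representative profile, member host names)
    for host, profile in profiles.items():
        shape = np.asarray(profile, dtype=float)
        for representative, members in groups:
            if np.corrcoef(representative, shape)[0, 1] >= threshold:
                members.append(host)
                break
        else:
            # No existing group is similar enough: start a new one.
            groups.append((shape, [host]))
    return [members for _, members in groups]


hours = np.arange(24)
day_shape = np.exp(-((hours - 13) ** 2) / 18.0)    # busiest around 13:00
night_shape = np.exp(-((hours - 2) ** 2) / 18.0)   # busiest around 02:00

# Made-up profiles for four hypothetical hosts with very different volumes.
profiles = {
    "web01": 1000 * day_shape + 5,
    "web02": 40 * day_shape + 1,
    "batch01": 300 * night_shape + 2,
    "batch02": 8000 * night_shape + 10,
}

groups = group_hosts_by_profile(profiles)
print(groups)
```

If most hosts end up in a handful of groups, one model per group may be enough. If nearly every host ends up alone, that supports the view that each host needs its own model or baseline.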
I do not have groups... They all behave differently.