Hmm. Well, it is not that the Kalman filter applied over your linear regression is the only thing that is working. What it means is that the predictors (independent variables) in your linear regression are not really independent of the target, in the sense that you are trying to predict something like avg utilization(A+B+C) = k + n * max(A+B+C) + m * min(A+B+C). Not exactly, but I hope you get the drift.
Hence it does not matter what the linear regression model predicts; the Kalman filter LLP is enough, and one is as good as the other. I do believe you need some other predictors from the web layer, as you say, because at the moment your model, though mathematically correct, is merely a straight line fitted over predictors that go directly (by summation) into the variable you are predicting. That is why you are getting an R^2 of 1. You can still use your model, it is not wrong, but think about it: do we really need a model to tell us that avg CPU utilization will dip by a factor of X if one of the constituent CPUs has a dip? Wouldn't it be better if we could say something like 'if there are X more logins on 3 of the 20-30 web servers, THEN the total avg CPU utilization increases by a factor of Y'?
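To make the R^2 point concrete, here is a minimal sketch on made-up data (three hypothetical servers with random utilization, not your actual setup). The target is the average of the same readings that the min, max and sum predictors are computed from, so the fit is perfect by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic CPU utilization (%) for 3 servers over 200 time steps.
cpu = rng.uniform(10, 90, size=(200, 3))

# Target: overall avg utilization across the servers.
target = cpu.mean(axis=1)

# Predictors derived from the very same readings: intercept, min, max, sum.
X = np.column_stack([
    np.ones(len(cpu)),
    cpu.min(axis=1),
    cpu.max(axis=1),
    cpu.sum(axis=1),
])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
pred = X @ coef

# Coefficient of determination.
ss_res = np.sum((target - pred) ** 2)
ss_tot = np.sum((target - target.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("coefficients:", coef)
print("R^2:", r2)
```

The fit puts essentially all the weight on the sum (a coefficient of about 1/3) because avg = sum / 3 exactly, so the model learns the definition of the average rather than anything about the system.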
One more test that you can run: instead of using min, max and sum, just take the avg CPU utilization of each of the 20-30 web servers as the predictors and predict the overall avg CPU utilization. It might give you very similar results, OR it might reveal a trend, for example that an increase in the avg CPU utilization of, say, servers 13, 11 and 5 in fact increases the overall CPU utilization more than you would expect just by applying a unitary or logarithmic dependency. The coefficients of the servers will reveal the extent to which each one affects the overall CPU utilization. This is a fascinating case and I am looking forward to your response. Happy ML!!
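A rough sketch of that per-server test, again on made-up data. Everything here is an assumption for illustration: 20 servers, random utilization, and an overall figure in which servers 5, 11 and 13 (0-based column indices) are deliberately given three times the weight of the others so there is something for the coefficients to reveal.

```python
import numpy as np

rng = np.random.default_rng(1)
n_servers, n_samples = 20, 500

# ASSUMPTION: synthetic per-server avg CPU utilization (%).
per_server = rng.uniform(10, 90, size=(n_samples, n_servers))

# ASSUMPTION: hypothetical weighting in which servers 5, 11 and 13
# influence the overall figure more than the rest, plus a little noise.
weights = np.ones(n_servers)
weights[[5, 11, 13]] = 3.0
weights /= weights.sum()
overall = per_server @ weights + rng.normal(0, 0.5, n_samples)

# Regress the overall utilization on the per-server utilizations.
X = np.column_stack([np.ones(n_samples), per_server])
coef, *_ = np.linalg.lstsq(X, overall, rcond=None)
server_coef = coef[1:]  # drop the intercept

# Servers with the largest coefficients have the most influence.
top3 = sorted(np.argsort(server_coef)[-3:].tolist())
print("per-server coefficients:", np.round(server_coef, 3))
print("most influential servers:", top3)
```

If the overall number were a plain mean of the servers, every coefficient would come out near 1/20 and you would be back to the R^2 = 1 situation; coefficients that stand out from the rest are the interesting part.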