How do I join the time field with a different field name from a CSV lookup file, with the time field specified in an index summary?
You will have to convert _time and date into the internal format using strptime, try the below code-
index=summary *avg_Total_Capacity* | rename avg_Total_Capacity AS Total_GB
| rename avg_Used_GB AS Used_GB| eval _time=strptime(_time,"%Y-%m-%d")
| append
[| inputlookup ServerName_G | fields - Disk
| eval _time=strptime(DateLogged,"%m/%d/%Y")
| rename DiskFreeSpaceMB as Free_GB
| eval Free_GB=(Free_GB/1024)
| rename DiskSizeMB as Total_GB
| eval Total_GB=(Free_GB/PercentFree)*100
| eval Used_GB=(Total_GB-Free_GB)]
| sort - _time
| apply "RFRCapacity_v2"
| timechart span=1d avg(Predict_UsedGB_Future) as Predict_UsedGB_Future
| eval residual = 'Used_GB' - 'predicted(Used_GB)'
| eval baseline=0
| eval Predict_UsedGB_Future='Predict_UsedGB_Future'
You can rename the time field to match the time field name in index summary in the join subsearch.
Joins are expensive and often can be replaced by alternative commands. Could you provide more details here on what's your current search is, what you're trying to accomplish, possibly with sample data/output?
Hey somesoni2,
I'm trying to build a predictive model using RandomForest in Splunk. I'm having issues with the inputlookup subsearch, and am trying to change it to use lookup instead and join the DateLogged field from the csv with _time from the index summary and see if the model will work as expected and push the prediction into the future. Currently its not doing that when I run the apply step on the model., if you have any suggestions for me I'm glad to hear them! Thanks!
Fit ->
index=summary *avg_Total_Capacity* | rename avg_Total_Capacity AS Total_GB
| rename avg_Used_GB AS Used_GB
| append
[| inputlookup ServerName_G | fields - Disk
| convert timeformat="%m/%d/%Y" mktime(DateLogged) as _time
| rename DiskFreeSpaceMB as Free_GB
| eval Free_GB=(Free_GB/1024)
| rename DiskSizeMB as Total_GB
| eval Total_GB=(Free_GB/PercentFree)*100
| eval Used_GB=(Total_GB-Free_GB)]
| sort - _time
| timechart span=1d avg(Total_GB) AS Total_GB avg(Used_GB) AS Used_GB
| eval Total_GB=round(Total_GB,2)
| eval Used_GB=round(Used_GB,2)
| reverse
| streamstats window=60 current=f first(Used_GB) as Predict_UsedGB_Future
| reverse
| fit RandomForestRegressor "Predict_UsedGB_Future" from "_time" into "RFRCapacity_v2"
| eval DesiredCapacity_70PercUsed=(Predict_UsedGB_Future/.7)
| eval Space_To_Add_GB=(DesiredCapacity_70PercUsed-Total_GB)
| eval Predicted_PercUsed=(Predict_UsedGB_Future/Total_GB)*100
Apply ->
index=summary *avg_Total_Capacity* | rename avg_Total_Capacity AS Total_GB
| rename avg_Used_GB AS Used_GB
| append
[| inputlookup ServerName_G | fields - Disk
| convert timeformat="%m/%d/%Y" mktime(DateLogged) as _time
| rename DiskFreeSpaceMB as Free_GB
| eval Free_GB=(Free_GB/1024)
| rename DiskSizeMB as Total_GB
| eval Total_GB=(Free_GB/PercentFree)*100
| eval Used_GB=(Total_GB-Free_GB)]
| sort - _time
| apply "RFRCapacity_v2"
| timechart span=1d avg(Predict_UsedGB_Future) as Predict_UsedGB_Future
| eval residual = 'Used_GB' - 'predicted(Used_GB)'
| eval baseline=0
| eval Predict_UsedGB_Future='Predict_UsedGB_Future'
what is the time/Date format in your lookup table ?
Format for the csv file ->
DateLogged
1 5/9/2016
2 5/9/2016
3 5/10/2016
4 5/10/2016
5 5/11/2016
Format from the summary index ->
_time
1 2018-11-02
2 2018-07-22
3 2018-07-20
4 2018-07-13
5 2018-07-09