I've got some logs I need to join and put on the same row.
I've tried a few different ways and searched the community but I can't seem to get exactly what I need.
There's a log every 10 minutes for each host and each drive on said hosts (there are a lot of hosts and drives). Each log has 2 events for the same time and drive letter. One for free MB and one for percent. Basically I need to join together each set of these two separate events based on the time, host and drive letter of the log. Is this possible?
base query:
index=perfmon host=host1 Category="PERFORMANCE" collection="WIN_PERF" object="LogicalDisk" counter="% Free Space" OR counter="Free Megabytes"
Drive letter is extracted as "instance"
percent and MB are both extracted as "Value"
Returns these logs:
"09/02/2021 21:48:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","Free Megabytes","d:","36092.00"
"09/02/2021 21:48:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","% Free Space","d:","41.47"
"09/02/2021 22:08:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","% Free Space","C:","19.30"
"09/02/2021 22:08:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","Free Megabytes","C:","19767.00"
Desired output:
Time Host Drive FreePercent FreeGB
09/02/2021 21:48:49 host1 d: 41.47 36092.0
09/02/2021 22:08:49 host1 C: 19.30 19767.00
Any help would be appreciated.
Hi
here is one option to do it. I just use your "Returns these logs:" here as input, but you should replace that part with your original query with correct field names.
| makeresults
| eval _raw = "_time, host, type, stype, etype, item, Drive, value
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,d:,36092.00
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,d:,41.47
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,C:,19.30
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,C:,19767.00"
| multikv forceheader=1
``` Above generates sample data ```
| eval FM = if (item = "Free Megabytes", value,null()), FS = if (item="% Free Space", value,null())
| stats values(FM) as FreeGB values(FS) as FreePercent by _time, host Drive
| table _time host Drive FreePercent FreeGB
r. Ismo
Thanks for all the help guys.
Hi
here is one option to do it. I just use your "Returns these logs:" here as input, but you should replace that part with your original query with correct field names.
| makeresults
| eval _raw = "_time, host, type, stype, etype, item, Drive, value
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,d:,36092.00
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,d:,41.47
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,C:,19.30
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,C:,19767.00"
| multikv forceheader=1
``` Above generates sample data ```
| eval FM = if (item = "Free Megabytes", value,null()), FS = if (item="% Free Space", value,null())
| stats values(FM) as FreeGB values(FS) as FreePercent by _time, host Drive
| table _time host Drive FreePercent FreeGB
r. Ismo
Yep, values() is another way to do it. Thought about it myself.
I'm wondering however which way - this or selfjoin is more effective in terms of CPU/memory usage and which has less limitations.
The selfjoin method is - IMHO - more obvious to construct (especially for a person with RDBMS background) but the results of selfjoin can be a bit confusing sometimes (see the example for selfjoin on splunk docs page).
Almost every time stats is better than join. Of course there could be situation where it's another way. Best way to check it is just create both queries and then check with Job inspector which one is better. There are many presentations about Job Inspector and replacing join with stats.
Above is couple of those, but you could found easily more.
r. Ismo
Thanks! Good to know.
Firstly, lose the unneeded fields, then do a selfjoin on host, drive and _time fields.