Solved: Join different events with common values

met · ‎09-02-2021

I've got some logs I need to join and put on the same row.

I've tried a few different ways and searched the community but I can't seem to get exactly what I need.

There's a log every 10 minutes for each host and each drive on said hosts (there are a lot of hosts and drives). Each log has 2 events for the same time and drive letter. One for free MB and one for percent. Basically I need to join together each set of these two separate events based on the time, host and drive letter of the log. Is this possible?

base query:

index=perfmon host=host1 Category="PERFORMANCE" collection="WIN_PERF" object="LogicalDisk" counter="% Free Space" OR counter="Free Megabytes"

Drive letter is extracted as "instance"

percent and MB are both extracted as "Value"

Returns these logs:

"09/02/2021 21:48:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","Free Megabytes","d:","36092.00"

"09/02/2021 21:48:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","% Free Space","d:","41.47"

"09/02/2021 22:08:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","% Free Space","C:","19.30"

"09/02/2021 22:08:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","Free Megabytes","C:","19767.00"

Desired output:

Time Host Drive FreePercent FreeGB

09/02/2021 21:48:49 host1 d: 41.47 36092.0

09/02/2021 22:08:49 host1 C: 19.30 19767.00

Any help would be appreciated.

isoutamo · ‎09-03-2021

Hi

here is one option to do it. I just use your "Returns these logs:" here as input, but you should replace that part with your original query with correct field names.

| makeresults 
| eval _raw = "_time, host, type, stype, etype, item, Drive, value
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,d:,36092.00
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,d:,41.47
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,C:,19.30
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,C:,19767.00"
| multikv forceheader=1
``` Above generates sample data ```
| eval FM = if (item = "Free Megabytes", value,null()), FS = if (item="% Free Space", value,null())
| stats  values(FM) as FreeGB values(FS) as FreePercent by _time, host Drive
| table _time host Drive FreePercent FreeGB

r. Ismo

View solution in original post

met · ‎09-14-2021

Thanks for all the help guys.

isoutamo · ‎09-03-2021

Hi

here is one option to do it. I just use your "Returns these logs:" here as input, but you should replace that part with your original query with correct field names.

| makeresults 
| eval _raw = "_time, host, type, stype, etype, item, Drive, value
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,d:,36092.00
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,d:,41.47
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,C:,19.30
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,C:,19767.00"
| multikv forceheader=1
``` Above generates sample data ```
| eval FM = if (item = "Free Megabytes", value,null()), FS = if (item="% Free Space", value,null())
| stats  values(FM) as FreeGB values(FS) as FreePercent by _time, host Drive
| table _time host Drive FreePercent FreeGB

r. Ismo

PickleRick · ‎09-03-2021

Yep, values() is another way to do it. Thought about it myself.

I'm wondering however which way - this or selfjoin is more effective in terms of CPU/memory usage and which has less limitations.

The selfjoin method is - IMHO - more obvious to construct (especially for a person with RDBMS background) but the results of selfjoin can be a bit confusing sometimes (see the example for selfjoin on splunk docs page).

isoutamo · ‎09-03-2021

Almost every time stats is better than join. Of course there could be situation where it's another way. Best way to check it is just create both queries and then check with Job inspector which one is better. There are many presentations about Job Inspector and replacing join with stats.

Above is couple of those, but you could found easily more.

r. Ismo

PickleRick · ‎09-03-2021

Thanks! Good to know.

PickleRick · ‎09-02-2021

Firstly, lose the unneeded fields, then do a selfjoin on host, drive and _time fields.

Join different events with common values

eval

field extraction

join

subsearch

New This Month in Splunk Observability Cloud - Metrics Usage Analytics, Enhanced K8s ...

Alerting Best Practices: How to Create Good Detectors

Discover Powerful New Features in Splunk Cloud Platform: Enhanced Analytics, ...