Splunk Search

Join different events with common values

met
Engager

I've got some logs I need to join and put on the same row.

I've tried a few different ways and searched the community but I can't seem to get exactly what I need.

There's a log every 10 minutes for each host and each drive on said hosts (there are a lot of hosts and drives). Each log has 2 events for the same time and drive letter. One for free MB and one for percent. Basically I need to join together each set of these two separate events based on the time, host and drive letter of the log. Is this possible? 

 

base query:

index=perfmon host=host1 Category="PERFORMANCE" collection="WIN_PERF" object="LogicalDisk" counter="% Free Space" OR counter="Free Megabytes"

 

Drive letter is extracted as "instance"

percent and MB are both extracted as "Value"

 

Returns these logs:

"09/02/2021 21:48:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","Free Megabytes","d:","36092.00"

"09/02/2021 21:48:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","% Free Space","d:","41.47"

"09/02/2021 22:08:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","% Free Space","C:","19.30"

"09/02/2021 22:08:49","host1","PERFORMANCE","WIN_PERF","LogicalDisk","Free Megabytes","C:","19767.00"

 

 

Desired output:

 

Time                                                       Host       Drive      FreePercent     FreeGB        

 09/02/2021 21:48:49                host1          d:                41.47            36092.0 

09/02/2021 22:08:49                 host1          C:                19.30            19767.00

 

Any help would be appreciated.

 

 

Labels (4)
0 Karma
1 Solution

isoutamo
SplunkTrust
SplunkTrust

Hi

here is one option to do it. I just use your "Returns these logs:" here as input, but you should replace that part with your original query with correct field names.

| makeresults 
| eval _raw = "_time, host, type, stype, etype, item, Drive, value
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,d:,36092.00
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,d:,41.47
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,C:,19.30
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,C:,19767.00"
| multikv forceheader=1
``` Above generates sample data ```
| eval FM = if (item = "Free Megabytes", value,null()), FS = if (item="% Free Space", value,null())
| stats  values(FM) as FreeGB values(FS) as FreePercent by _time, host Drive
| table _time host Drive FreePercent FreeGB

r. Ismo 

View solution in original post

met
Engager

Thanks for all the help guys. 

 

0 Karma

isoutamo
SplunkTrust
SplunkTrust

Hi

here is one option to do it. I just use your "Returns these logs:" here as input, but you should replace that part with your original query with correct field names.

| makeresults 
| eval _raw = "_time, host, type, stype, etype, item, Drive, value
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,d:,36092.00
09/02/2021 21:48:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,d:,41.47
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,% Free Space,C:,19.30
09/02/2021 22:08:49,host1,PERFORMANCE,WIN_PERF,LogicalDisk,Free Megabytes,C:,19767.00"
| multikv forceheader=1
``` Above generates sample data ```
| eval FM = if (item = "Free Megabytes", value,null()), FS = if (item="% Free Space", value,null())
| stats  values(FM) as FreeGB values(FS) as FreePercent by _time, host Drive
| table _time host Drive FreePercent FreeGB

r. Ismo 

View solution in original post

PickleRick
Champion

Yep, values() is another way to do it. Thought about it myself.

I'm wondering however which way - this or selfjoin is more effective in terms of CPU/memory usage and which has less limitations.

The selfjoin method is - IMHO - more obvious to construct (especially for a person with RDBMS background) but the results of selfjoin can be a bit confusing sometimes (see the example for selfjoin on splunk docs page).

isoutamo
SplunkTrust
SplunkTrust

Almost every time stats is better than join. Of course there could be situation where it's another way. Best way to check it is just create both queries and then check with Job inspector which one is better. There are many presentations about Job Inspector and replacing join with stats.

Above is couple of those, but you could found easily more.

r. Ismo

PickleRick
Champion

Thanks! Good to know.

0 Karma

PickleRick
Champion

Firstly, lose the unneeded fields, then do a selfjoin on host, drive and _time fields.

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!