How to group events by a common directory prefix

lee_melvin
Path Finder

I want to group events describing backup job status with other events describing the volumes being backed up. The data conceptually looks like this:

source=backup server=a path=/vol/engineering status=successful level=0 size_gb=23
source=backup server=b path=/scratch status=failed level=0
source=backup server=z path=/regression status=successful level=1 size_gb=1

and

source=volume server=a ndmp_path=/vol/engineering owner=bob 
source=volume server=b ndmp_path=/test owner=jill
source=volume server=c path=/vs0/hr owner=jack
source=volume server=d path=/scratch_nobackup owner=zack

I want to see what is getting backed up, what isn't, and whether something is getting backed up that I don't have in my volume inventory. I can do that with something like stats last() or a join on server and path, i.e.:

source=backup OR source=volume | stats last(*) as * by server, path

and then identify protected data (owner=* size_gb=*) or unprotected data (owner=* NOT size_gb=*) or volumes that are backed up but not in inventory (size_gb=* NOT owner=*).
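To make the three categories concrete, here is a minimal plain-Python model of the merge-then-classify step (not Splunk code). It assumes each event is a dict; merge_by_server_path and classify are hypothetical names, merge_by_server_path is only a rough stand-in for stats last(*) as * by server, path (in this model a later event simply overwrites earlier fields), and the sample rows use path on both sides for simplicity.

```python
def merge_by_server_path(events: list[dict]) -> dict[tuple[str, str], dict]:
    """Fold every event sharing (server, path) into one row.

    Rough stand-in for `stats last(*) as * by server, path`: if two events
    carry the same field, the one processed later wins in this model.
    """
    rows: dict[tuple[str, str], dict] = {}
    for event in events:
        key = (event["server"], event["path"])
        rows.setdefault(key, {}).update(event)
    return rows


def classify(row: dict) -> str:
    """Apply the owner/size_gb tests from the question to one merged row."""
    has_owner = "owner" in row      # contributed by a volume event
    has_size = "size_gb" in row     # contributed by a backup event
    if has_owner and has_size:
        return "protected"
    if has_owner:
        return "unprotected"
    if has_size:
        return "backed up, not in inventory"
    return "unclassified"


if __name__ == "__main__":
    events = [
        {"source": "backup", "server": "a", "path": "/vol/engineering", "size_gb": 23},
        {"source": "volume", "server": "a", "path": "/vol/engineering", "owner": "bob"},
        {"source": "volume", "server": "c", "path": "/vs0/hr", "owner": "jack"},
        {"source": "backup", "server": "z", "path": "/regression", "size_gb": 1},
    ]
    for key, row in merge_by_server_path(events).items():
        print(key, classify(row))
```

This exact-match grouping is what breaks down once a backup path is a subdirectory of the volume path.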

The problem I'm running into is that the backup path is sometimes a subdirectory of the volume inventory path:

source=backup server=q path=/hr/important/data status=successful level=0 size_gb=2
source=volume server=q path=/hr owner=ellen

or

source=backup server=m path=/flower/database/.snapshot/daily status=successful level=0 size_gb=3
source=volume server=m path=/flower/database owner=don

I need to group these together: same server, and the volume path is a prefix of the backup path. But there is no stats last(*) as * by server, one_is_a_prefix_of_the_other(path). I've looked into stats, join, and transaction and haven't turned up anything obvious.
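The grouping rule itself is easy to state as a predicate. The sketch below (plain Python, hypothetical function names) checks "same server, and the volume path is a prefix of the backup path" on whole path components. A bare string-prefix test is not enough: "/scratch" is a string prefix of "/scratch_nobackup" even though it is not a parent directory of it.

```python
def is_path_prefix(volume_path: str, backup_path: str) -> bool:
    """True if volume_path equals backup_path or is an ancestor directory of it."""
    vol = volume_path.rstrip("/") or "/"
    if backup_path == vol:
        return True
    # Require a "/" right after the prefix so "/scratch" does not match "/scratch_nobackup".
    return backup_path.startswith(vol if vol.endswith("/") else vol + "/")


def same_group(volume: dict, backup: dict) -> bool:
    """Same server, and the volume path is a directory prefix of the backup path."""
    return (volume["server"] == backup["server"]
            and is_path_prefix(volume["path"], backup["path"]))


if __name__ == "__main__":
    print(same_group({"server": "q", "path": "/hr"},
                     {"server": "q", "path": "/hr/important/data"}))
    print(is_path_prefix("/scratch", "/scratch_nobackup"))
```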

I can't extract the first component of the path and group by that, because sometimes the volume path has depth 1 (/volume) and sometimes depth 2 (/server_name/volume or /vol/volume).
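A quick illustration of why no single fixed depth works, using a hypothetical key_at_depth helper in plain Python: depth 1 is too coarse for /vol/... volumes (a hypothetical /vol/finance would collapse onto the same key as /vol/engineering), while depth 2 is too fine for /hr, because /hr/important/data truncates to /hr/important.

```python
def key_at_depth(path: str, depth: int) -> str:
    """Truncate an absolute path to its first `depth` components ("/a/b/c", 2 -> "/a/b")."""
    parts = [p for p in path.split("/") if p]
    return "/" + "/".join(parts[:depth])


if __name__ == "__main__":
    # Depth 1: two different volumes collapse onto one key.
    print(key_at_depth("/vol/engineering", 1), key_at_depth("/vol/finance", 1))
    # Depth 2: the backup path no longer lines up with the /hr volume.
    print(key_at_depth("/hr/important/data", 2))
```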

I was able to get close by generating a lookup file for the volumes, then using wildcard functionality in transforms.conf to pull volume fields into the backup data. But that has some complications of its own that I'd rather avoid.
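The post doesn't say what the complications of the lookup approach were. One a wildcard match can run into is shown in this minimal Python model, which approximates a wildcard lookup with fnmatch.fnmatchcase. This is an assumption for illustration only: fnmatch also understands ? and [...], which the sample patterns avoid, and the table rows and lookup_owner are hypothetical. A pattern like /hr* matches /hr/important/data, but it also matches an unrelated sibling such as /hr_archive.

```python
import fnmatch

# Hypothetical lookup rows: (server, path pattern, owner). Patterns use "*" only.
VOLUME_LOOKUP = [
    ("q", "/hr*", "ellen"),
    ("m", "/flower/database*", "don"),
]


def lookup_owner(server: str, path: str) -> list[str]:
    """Return the owner of every row whose server matches and whose pattern matches path.

    fnmatchcase is case-sensitive and lets "*" span "/" characters, so it only
    approximates a simple wildcard lookup.
    """
    return [owner for srv, pattern, owner in VOLUME_LOOKUP
            if srv == server and fnmatch.fnmatchcase(path, pattern)]


if __name__ == "__main__":
    print(lookup_owner("q", "/hr/important/data"))
    print(lookup_owner("q", "/hr_archive"))  # false match: not under /hr
```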

Thanks!

Lee

lguinn2
Legend

Would this work? First build a table using the full path, then use streamstats to create a new path...

source=backup OR source=volume
| fields server path source owner size_gb status
| stats latest(*) as * latest(_time) as timestamp by server path
| streamstats current=f reset_on_change=t last(path) as lastPath by server
| eval new_path = case(isnull(lastPath), path,
                       match(path, "^" . lastPath), lastPath,
                       1==1, path)
| stats last(*) as * by server new_path
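The idea behind this answer (sort by server and path, then fold each path into an earlier path on the same server that it falls under) can be modeled in plain Python. This illustrates the technique and is not a reproduction of streamstats; components and assign_groups are hypothetical names. One deliberate difference: the model sorts on path components rather than the raw string, because in a plain string sort a sibling such as /hr-old lands between /hr and /hr/x ("-" sorts before "/"), which would interrupt the run of paths under /hr.

```python
def components(path: str) -> tuple[str, ...]:
    """Split "/a/b/c" into ("a", "b", "c"), ignoring empty segments."""
    return tuple(p for p in path.split("/") if p)


def assign_groups(rows: list[dict]) -> list[dict]:
    """Attach new_path to each row: the earliest ancestor path seen on the same server."""
    out: list[dict] = []
    root_by_server: dict[str, tuple[str, ...]] = {}
    # Component-wise sort keeps every descendant directly after its ancestor.
    for row in sorted(rows, key=lambda r: (r["server"], components(r["path"]))):
        parts = components(row["path"])
        root = root_by_server.get(row["server"])
        if root is not None and parts[:len(root)] == root:
            new_path = "/" + "/".join(root)          # falls under the current group root
        else:
            root_by_server[row["server"]] = parts    # start a new group
            new_path = "/" + "/".join(parts)
        out.append({**row, "new_path": new_path})
    return out


if __name__ == "__main__":
    rows = [
        {"server": "q", "path": "/hr/important/data", "size_gb": 2},
        {"server": "q", "path": "/hr", "owner": "ellen"},
        {"server": "m", "path": "/flower/database/.snapshot/daily", "size_gb": 3},
        {"server": "m", "path": "/flower/database", "owner": "don"},
    ]
    for r in assign_groups(rows):
        print(r["server"], r["path"], "->", r["new_path"])
```

A final grouping step (like the closing stats ... by server new_path) would then merge each volume row with its backup rows.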

aaraneta_splunk
Splunk Employee

@lee_melvin - Were you able to test out lguinn's solution? Did it work? If yes, please don't forget to resolve this post by clicking on "Accept". If you still need more help, please provide a comment with some feedback. Thanks!

lee_melvin
Path Finder

The 'streamstats | stats last(*)' construct is the concept I was missing.

I ended up with a search rather like this:

source=backup OR source=volume
| eval fs_path=path
| eval backup_path=ndmp_path
| eval path=coalesce(fs_path, backup_path)
| sort 0 server -path source
| streamstats current=f reset_on_change=t values(path) as allpaths by server
| eval idx=mvfind(allpaths, "^" + path)
| eval match_path=if(isnotnull(idx), mvindex(allpaths, idx), path)
| stats last(*) as * by server match_path
| table server path fs_path backup_path

I still have some details to work out, but it's workable.

Thanks!
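The matching step of this final search can be modeled in plain Python to see how it behaves. In this sketch (hypothetical function match_paths, not Splunk code) rows are visited per server in descending path order, and each row sees only the distinct, sorted paths of earlier rows on the same server, roughly what streamstats current=f values(path) by server provides. A startswith() test stands in for mvfind with an anchored regex and is only equivalent when the path contains no regex metacharacters. The boundary_aware flag shows one way to keep /scratch from being grouped with /scratch_nobackup when both sit on the same server (a hypothetical pairing).

```python
def match_paths(rows: list[dict], boundary_aware: bool = False) -> dict[tuple[str, str], str]:
    """Return {(server, path): match_path} for each row."""
    seen: dict[str, set[str]] = {}
    result: dict[tuple[str, str], str] = {}
    # reverse=True gives descending path order within each server; the order of
    # servers does not matter because `seen` is tracked per server.
    for row in sorted(rows, key=lambda r: (r["server"], r["path"]), reverse=True):
        server, path = row["server"], row["path"]
        earlier = sorted(seen.get(server, set()))  # distinct, lexicographically sorted
        if boundary_aware:
            hits = [p for p in earlier if p == path or p.startswith(path.rstrip("/") + "/")]
        else:
            # Plain prefix test, like mvfind(allpaths, "^" + path).
            hits = [p for p in earlier if p.startswith(path)]
        result[(server, path)] = hits[0] if hits else path
        seen.setdefault(server, set()).add(path)
    return result


if __name__ == "__main__":
    print(match_paths([{"server": "q", "path": "/hr/important/data"},
                       {"server": "q", "path": "/hr"}]))
    pair = [{"server": "x", "path": "/scratch"}, {"server": "x", "path": "/scratch_nobackup"}]
    print(match_paths(pair))                       # /scratch grouped with /scratch_nobackup
    print(match_paths(pair, boundary_aware=True))  # kept apart
```

Note that the volume row takes the backup path as its group key here (/hr maps to /hr/important/data), which is fine for grouping but worth remembering when reading the match_path column.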
