I want to group events describing backup job status with other events describing the volumes being backed up. The data conceptually looks like this:
source=backup server=a path=/vol/engineering status=successful level=0 size_gb=23
source=backup server=b path=/scratch test=failed level=0
source=backup server=z path=/regression status=successful level=1 size_gb=1
and
source=volume server=a ndmp_path=/vol/engineering owner=bob
source=volume server=b ndmp_path=/test owner=jill
source=volume server=c path=/vs0/hr owner=jack
source=volume server=d path=/scratch_nobackup owner=zack
I want to see what is getting backed up, what isn't, and whether something is getting backed up that I don't have in my volume inventory. I can do that with something like a stats last() or join server path, i.e.
source=backup OR source=volume | stats last(*) as * by server, path
and then identify protected data (owner=* size_gb=*), unprotected data (owner=* NOT size_gb=*), or volumes that are backed up but not in inventory (size_gb=* NOT owner=*).
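To make the three categories concrete, here is a minimal Python sketch of the classification, run outside Splunk. It assumes each record is the merged result for one (server, path) pair, and that a missing key means no event of that source contributed the field. The function name classify and the sample records are illustrative only.

```python
def classify(record):
    """Label one merged (server, path) record by which fields are present."""
    has_owner = "owner" in record    # supplied only by source=volume events
    has_size = "size_gb" in record   # supplied only by source=backup events
    if has_owner and has_size:
        return "protected"           # owner=* size_gb=*
    if has_owner:
        return "unprotected"         # owner=* NOT size_gb=*
    if has_size:
        return "not_in_inventory"    # size_gb=* NOT owner=*
    return "unknown"


# Hypothetical merged records, loosely based on the sample data above.
merged = [
    {"server": "a", "path": "/vol/engineering", "owner": "bob", "size_gb": 23},
    {"server": "c", "path": "/vs0/hr", "owner": "jack"},
    {"server": "z", "path": "/regression", "size_gb": 1},
]
labels = [classify(r) for r in merged]
```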
The problem I'm running into is that the backup path is sometimes a subdirectory of the volume inventory path:
source=backup server=q path=/hr/important/data status=successful level=0 size_gb=2
source=volume server=q path=/hr owner=ellen
or
source=backup server=m path=/flower/database/.snapshot/daily status=successful level=0 size_gb=3
source=volume server=m path=/flower/database owner=don
I need to group these together -- same server, and volume path is a prefix of the backup path. But there is no stats last(*) as * by server, one_is_a_prefix_of_the_other(path). I've looked into stats, join, and transaction and haven't turned up anything obvious.
I can't extract the first field of the path and group by that, because sometimes the volume path is depth 1 (/volume) and sometimes it is depth 2 (/server_name/volume or /vol/volume).
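For reference, the grouping rule I'm after can be sketched in Python as a longest-prefix match per server, which works regardless of path depth. This is only an illustration of the intended logic, not something Splunk provides. The function name owning_volume and the (server, path) tuple layout are assumptions made for the example.

```python
def owning_volume(server, backup_path, volumes):
    """Return the longest volume path on `server` that is a path-wise
    prefix of `backup_path`, or None if no volume matches."""
    best = None
    for vol_server, vol_path in volumes:
        if vol_server != server:
            continue
        root = vol_path.rstrip("/")
        # Match only on a directory boundary so /hr does not claim /hr2/data.
        if backup_path == root or backup_path.startswith(root + "/"):
            if best is None or len(root) > len(best):
                best = root
    return best


# Volume inventory as (server, path) pairs, taken from the examples above.
volumes = [("q", "/hr"), ("m", "/flower/database"), ("a", "/vol/engineering")]
```

With that inventory, the backup path /hr/important/data on server q maps to /hr, and /flower/database/.snapshot/daily on server m maps to /flower/database.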
I was able to get close by generating a lookup file for the volumes, then using wildcard functionality in transforms.conf to pull volume fields into the backup data. But that has some complications of its own that I'd rather avoid.
Thanks!
Lee
Would this work? First build a table using the full path, then use streamstats to create a new path...
source=backup OR source=volume
| fields server path source owner size_gb status
| stats latest(*) as * latest(_time) as timestamp by server path
| streamstats current=f reset_on_change=t last(new_path) as lastPath by server
| eval new_path = case(isnull(lastPath), path,
match(path, lastPath),lastPath,
1==1,path)
| stats last(*) as * by server new_path
@lee_melvin - Were you able to test out lguinn's solution? Did it work? If yes, please don't forget to resolve this post by clicking on "Accept". If you still need more help, please provide a comment with some feedback. Thanks!
The 'streamstats | stats last(*)' construct is the concept I was missing.
I ended up with a search rather like this:
source=backup OR source=volume
| eval fs_path=path
| eval backup_path=ndmp_path
| eval path=coalesce(fs_path, backup_path)
| sort -path source
| streamstats current=f reset_on_change=t values(path) as allpaths by server
| eval idx=mvfind(allpaths, "^" + path)
| eval match_path=if(isnotnull(idx), mvindex(allpaths, idx), path)
| stats last(*) as * by server match_path
| table server path fs_path backup_path
I still have some details to work out, but it's workable.
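One detail that may be worth checking: the pattern "^" + path is treated as a regular expression, so a "." in a path (as in .snapshot) matches any character, and /hr would also match /hr2/data. The Python sketch below shows the difference between the naive pattern and one that escapes the path and requires a directory boundary. The helper name prefix_pattern is hypothetical, and whether the same tightening is needed in the search depends on the actual paths.

```python
import re


def prefix_pattern(path):
    """Build a regex matching `path` itself or anything below it,
    treating the path literally and stopping at a directory boundary."""
    return re.compile("^" + re.escape(path.rstrip("/")) + "(/|$)")


naive = re.compile("^" + "/hr")   # same shape as "^" + path in the search
strict = prefix_pattern("/hr")

naive_hits_sibling = naive.search("/hr2/data") is not None
strict_hits_sibling = strict.search("/hr2/data") is not None
strict_hits_child = strict.search("/hr/important/data") is not None
```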
Thanks!