Splunk Search

Issues with Joining: Maybe there is a better way?

cboillot
Contributor

We have the following search that stopped working:

| tstats summariesonly=true sum(everything.rawlen) as rawBytes from datamodel=storage_billing by splunk_server,index,everything.bucketId,host 
| rename everything.* as * 
| eval rawMBytes=rawBytes/1024/1024 
| join splunk_server, bucketId 
    [ dbinspect index=* 
    | eval rawSizeMB=rawSize/1024/1024 
    | fields splunk_server, bucketId, path, state, startEpoch, endEpoch, modTime, sizeOnDiskMB,rawSizeMB ] 
| search state=cold 
| eval compression=sizeOnDiskMB/rawSizeMB, newRawMBytes = rawMBytes * compression 
| eventstats sum(rawMBytes), sum(newRawMBytes) by splunk_server, bucketId 
| eval margin_of_error= round( ( sizeOnDiskMB - 'sum(newRawMBytes)' ) / sizeOnDiskMB,4) 
| stats sum(newRawMBytes) as MBytes_Used, count(bucketId) as Bucket_Count by splunk_server,index,state,host 
| eval GBytes_Used=round(MBytes_Used/1024,2) 
| rename host as "Volume Name" 
| rename MBytes_Used as Space 
| eval "Copy Type"="Primary" 
| eval F4="Copy" 
| fields "Volume Name", Space, "Copy Type", F4

We have narrowed the issue down to join splunk_server, bucketId as when we run

| tstats summariesonly=true sum(everything.rawlen) as rawBytes from datamodel=storage_billing by splunk_server,index,everything.bucketId,host 
| rename everything.* as * 
| eval rawMBytes=rawBytes/1024/1024 

or

| dbinspect index=* 
| eval rawSizeMB=rawSize/1024/1024 
| fields splunk_server, bucketId, path, state, startEpoch, endEpoch, modTime, sizeOnDiskMB,rawSizeMB 

Separately, they work just fine. When we try to join then, that's when the search breaks. For reasons, this search is going back 7 years. Our current theory is that it is timing out before completion.

Is there a way to streamline the search? Is join the right way to do this? is there a faster, better way to do this?

0 Karma

woodcock
Esteemed Legend

There is almost always a better way than using join (it should be avoided like the plague that it is). Try this:

| tstats summariesonly=true sum(everything.rawlen) as rawBytes from datamodel=storage_billing by splunk_server,index,everything.bucketId,host 
| rename everything.* as * 
| eval rawMBytes=rawBytes/1024/1024 
| appendpipe
[ |dbinspect index=* 
| eval rawSizeMB=rawSize/1024/1024 
| fields splunk_server, bucketId, path, state, startEpoch, endEpoch, modTime, sizeOnDiskMB,rawSizeMB ] 
| stats values(*) AS * BY splunk_server, bucketId 
| search state=cold 
| eval compression=sizeOnDiskMB/rawSizeMB, newRawMBytes = rawMBytes * compression 
| eventstats sum(rawMBytes), sum(newRawMBytes) by splunk_server, bucketId 
| eval margin_of_error= round( ( sizeOnDiskMB - 'sum(newRawMBytes)' ) / sizeOnDiskMB,4) 
| stats sum(newRawMBytes) as MBytes_Used, count(bucketId) as Bucket_Count by splunk_server,index,state,host 
| eval GBytes_Used=round(MBytes_Used/1024,2) 
| rename host as "Volume Name" 
| rename MBytes_Used as Space 
| eval "Copy Type"="Primary" 
| eval F4="Copy" 
| fields "Volume Name", Space, "Copy Type", F4
0 Karma

cboillot
Contributor

This didn't return anydata. I changed search state=cold to search state=* played around with removing lines to see where the issue or issues are at. I removed line13 ( stats sum(newRawMBytes) as MBytes_Used, count(bucketId) as Bucket_Count by splunk_server,index,state,host ) and got some results, but nothing in the cold status. If i just run the dbinspct sub, i do get results with all four states (how, warm, cold, thawed)

0 Karma

woodcock
Esteemed Legend

You will have to play around a bit, I guess. The main thing is that you use the | appendpipe [ ... ] | stats values(*) AS * BY splunk_server bucketId instead of the join.

0 Karma
Get Updates on the Splunk Community!

October Community Champions: A Shoutout to Our Contributors!

As October comes to a close, we want to take a moment to celebrate the people who make the Splunk Community ...

Community Content Calendar, November Edition

Welcome to the November edition of our Community Spotlight! Each month, we dive into the Splunk Community to ...

Stay Connected: Your Guide to November Tech Talks, Office Hours, and Webinars!

What are Community Office Hours? Community Office Hours is an interactive 60-minute Zoom series where ...