topic Issues with Joining: Maybe there is a better way? in Splunk Search

Issues with Joining: Maybe there is a better way?

cboillot — Wed, 30 Jan 2019 14:59:57 GMT

We have the following search that stopped working:

| tstats summariesonly=true sum(everything.rawlen) as rawBytes from datamodel=storage_billing by splunk_server,index,everything.bucketId,host 
| rename everything.* as * 
| eval rawMBytes=rawBytes/1024/1024 
| join splunk_server, bucketId 
    [ dbinspect index=* 
    | eval rawSizeMB=rawSize/1024/1024 
    | fields splunk_server, bucketId, path, state, startEpoch, endEpoch, modTime, sizeOnDiskMB,rawSizeMB ] 
| search state=cold 
| eval compression=sizeOnDiskMB/rawSizeMB, newRawMBytes = rawMBytes * compression 
| eventstats sum(rawMBytes), sum(newRawMBytes) by splunk_server, bucketId 
| eval margin_of_error= round( ( sizeOnDiskMB - 'sum(newRawMBytes)' ) / sizeOnDiskMB,4) 
| stats sum(newRawMBytes) as MBytes_Used, count(bucketId) as Bucket_Count by splunk_server,index,state,host 
| eval GBytes_Used=round(MBytes_Used/1024,2) 
| rename host as "Volume Name" 
| rename MBytes_Used as Space 
| eval "Copy Type"="Primary" 
| eval F4="Copy" 
| fields "Volume Name", Space, "Copy Type", F4

We have narrowed the issue down to join splunk_server, bucketId as when we run

| tstats summariesonly=true sum(everything.rawlen) as rawBytes from datamodel=storage_billing by splunk_server,index,everything.bucketId,host 
| rename everything.* as * 
| eval rawMBytes=rawBytes/1024/1024

| dbinspect index=* 
| eval rawSizeMB=rawSize/1024/1024 
| fields splunk_server, bucketId, path, state, startEpoch, endEpoch, modTime, sizeOnDiskMB,rawSizeMB

Separately, they work just fine. When we try to join then, that's when the search breaks. For reasons, this search is going back 7 years. Our current theory is that it is timing out before completion.

Is there a way to streamline the search? Is join the right way to do this? is there a faster, better way to do this?

Re: Issues with Joining: Maybe there is a better way?

woodcock — Wed, 30 Jan 2019 15:27:04 GMT

There is almost always a better way than using join (it should be avoided like the plague that it is). Try this:

| tstats summariesonly=true sum(everything.rawlen) as rawBytes from datamodel=storage_billing by splunk_server,index,everything.bucketId,host 
| rename everything.* as * 
| eval rawMBytes=rawBytes/1024/1024 
| appendpipe
[ |dbinspect index=* 
| eval rawSizeMB=rawSize/1024/1024 
| fields splunk_server, bucketId, path, state, startEpoch, endEpoch, modTime, sizeOnDiskMB,rawSizeMB ] 
| stats values(*) AS * BY splunk_server, bucketId 
| search state=cold 
| eval compression=sizeOnDiskMB/rawSizeMB, newRawMBytes = rawMBytes * compression 
| eventstats sum(rawMBytes), sum(newRawMBytes) by splunk_server, bucketId 
| eval margin_of_error= round( ( sizeOnDiskMB - 'sum(newRawMBytes)' ) / sizeOnDiskMB,4) 
| stats sum(newRawMBytes) as MBytes_Used, count(bucketId) as Bucket_Count by splunk_server,index,state,host 
| eval GBytes_Used=round(MBytes_Used/1024,2) 
| rename host as "Volume Name" 
| rename MBytes_Used as Space 
| eval "Copy Type"="Primary" 
| eval F4="Copy" 
| fields "Volume Name", Space, "Copy Type", F4

Re: Issues with Joining: Maybe there is a better way?

cboillot — Wed, 30 Jan 2019 16:19:13 GMT

This didn't return anydata. I changed search state=cold to search state=* played around with removing lines to see where the issue or issues are at. I removed line13 ( stats sum(newRawMBytes) as MBytes_Used, count(bucketId) as Bucket_Count by splunk_server,index,state,host ) and got some results, but nothing in the cold status. If i just run the dbinspct sub, i do get results with all four states (how, warm, cold, thawed)

Re: Issues with Joining: Maybe there is a better way?

woodcock — Wed, 30 Jan 2019 16:48:35 GMT

You will have to play around a bit, I guess. The main thing is that you use the | appendpipe [ ... ] | stats values(*) AS * BY splunk_server bucketId instead of the join.