Splunk Search

Issues with Joining: Maybe there is a better way?

cboillot
Contributor

We have the following search that stopped working:

| tstats summariesonly=true sum(everything.rawlen) as rawBytes from datamodel=storage_billing by splunk_server,index,everything.bucketId,host 
| rename everything.* as * 
| eval rawMBytes=rawBytes/1024/1024 
| join splunk_server, bucketId 
    [ dbinspect index=* 
    | eval rawSizeMB=rawSize/1024/1024 
    | fields splunk_server, bucketId, path, state, startEpoch, endEpoch, modTime, sizeOnDiskMB,rawSizeMB ] 
| search state=cold 
| eval compression=sizeOnDiskMB/rawSizeMB, newRawMBytes = rawMBytes * compression 
| eventstats sum(rawMBytes), sum(newRawMBytes) by splunk_server, bucketId 
| eval margin_of_error= round( ( sizeOnDiskMB - 'sum(newRawMBytes)' ) / sizeOnDiskMB,4) 
| stats sum(newRawMBytes) as MBytes_Used, count(bucketId) as Bucket_Count by splunk_server,index,state,host 
| eval GBytes_Used=round(MBytes_Used/1024,2) 
| rename host as "Volume Name" 
| rename MBytes_Used as Space 
| eval "Copy Type"="Primary" 
| eval F4="Copy" 
| fields "Volume Name", Space, "Copy Type", F4

We have narrowed the issue down to join splunk_server, bucketId as when we run

| tstats summariesonly=true sum(everything.rawlen) as rawBytes from datamodel=storage_billing by splunk_server,index,everything.bucketId,host 
| rename everything.* as * 
| eval rawMBytes=rawBytes/1024/1024 

or

| dbinspect index=* 
| eval rawSizeMB=rawSize/1024/1024 
| fields splunk_server, bucketId, path, state, startEpoch, endEpoch, modTime, sizeOnDiskMB,rawSizeMB 

Separately, they work just fine. When we try to join then, that's when the search breaks. For reasons, this search is going back 7 years. Our current theory is that it is timing out before completion.

Is there a way to streamline the search? Is join the right way to do this? is there a faster, better way to do this?

0 Karma

woodcock
Esteemed Legend

There is almost always a better way than using join (it should be avoided like the plague that it is). Try this:

| tstats summariesonly=true sum(everything.rawlen) as rawBytes from datamodel=storage_billing by splunk_server,index,everything.bucketId,host 
| rename everything.* as * 
| eval rawMBytes=rawBytes/1024/1024 
| appendpipe
[ |dbinspect index=* 
| eval rawSizeMB=rawSize/1024/1024 
| fields splunk_server, bucketId, path, state, startEpoch, endEpoch, modTime, sizeOnDiskMB,rawSizeMB ] 
| stats values(*) AS * BY splunk_server, bucketId 
| search state=cold 
| eval compression=sizeOnDiskMB/rawSizeMB, newRawMBytes = rawMBytes * compression 
| eventstats sum(rawMBytes), sum(newRawMBytes) by splunk_server, bucketId 
| eval margin_of_error= round( ( sizeOnDiskMB - 'sum(newRawMBytes)' ) / sizeOnDiskMB,4) 
| stats sum(newRawMBytes) as MBytes_Used, count(bucketId) as Bucket_Count by splunk_server,index,state,host 
| eval GBytes_Used=round(MBytes_Used/1024,2) 
| rename host as "Volume Name" 
| rename MBytes_Used as Space 
| eval "Copy Type"="Primary" 
| eval F4="Copy" 
| fields "Volume Name", Space, "Copy Type", F4
0 Karma

cboillot
Contributor

This didn't return anydata. I changed search state=cold to search state=* played around with removing lines to see where the issue or issues are at. I removed line13 ( stats sum(newRawMBytes) as MBytes_Used, count(bucketId) as Bucket_Count by splunk_server,index,state,host ) and got some results, but nothing in the cold status. If i just run the dbinspct sub, i do get results with all four states (how, warm, cold, thawed)

0 Karma

woodcock
Esteemed Legend

You will have to play around a bit, I guess. The main thing is that you use the | appendpipe [ ... ] | stats values(*) AS * BY splunk_server bucketId instead of the join.

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

Data Persistence in the OpenTelemetry Collector

This blog post is part of an ongoing series on OpenTelemetry. What happens if the OpenTelemetry collector ...

Introducing Splunk 10.0: Smarter, Faster, and More Powerful Than Ever

Now On Demand Whether you're managing complex deployments or looking to future-proof your data ...

Community Content Calendar, September edition

Welcome to another insightful post from our Community Content Calendar! We're thrilled to continue bringing ...