Splunk Search

How to write a SPL query to identify top 20 largest indexes / sourcetypes by size (In GB's)?

Yashvik
Explorer

Hello All,
I need to identify the top log sources which are sending large data to Splunk. Tried Licence master dashboard which isn't helping much. 

My requirement is to create a table which contains following fields. e.g: sourcetype, vol_GB, index, percentage.

Labels (2)
Tags (3)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

I found an errore, even if it runs on my search, please try again this and check all the rows:

index=_internal source=*license_usage.log* type="Usage" 
| eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) 
| eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) 
| eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) 
| bin _time span=1d 
| stats sum(b) as b by _time, pool, s, st, h, idx   
| bin span=1d _time 
| stats values(st) AS sourcetype sum(b) AS volumeB by _time idx 
| rename idx AS index
| eval volumeB=round(volumeB/1024/1024/1024,2)
| sort 20 -volumeB

Ciao.

Giuseppe

View solution in original post

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

please try this search:

index=_internal source=*license_usage.log* type="Usage" 
| eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) 
| eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) 
| eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) 
| bin _time span=1d 
| stats sum(b) as b by _time, pool, s, st, h, idx   
| bin span=1d _time 
| stats sum(b) AS volumeB by _time idx st 
| eval volumeB=round(volumeB/1024/1024/1024,2)
| sort 20 -volumeB

Ciao.

Giuseppe

0 Karma

Yashvik
Explorer

Hello @gcusello 
Thanks a lot for the swift response. the query gives the most of info I was looking for. However, it contains multiple entries for single index. Lets say If Index A has 3 sourcetypes, it appears in 3 rows. 
Can we group them in a single row? 
e.g:

_timeindexsourcetypeVol_GBpercentage
17th SepMain st1
st2
st3
100G10.00%
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

if you're not interested to the value for each sourcetype, but only to know which sourcetypes are in an index, you cam modify the above search in this way:

index=_internal source=*license_usage.log* type="Usage" 
| eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) 
| eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) 
| eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) 
| bin _time span=1d 
| stats sum(b) as b by _time, pool, s, st, h, idx   
| bin span=1d _time 
| stats values( st) AS sourcetype sum(b) AS volumeB by _time idx 
| rename idx AS index
| eval volumeB=round(volumeB/1024/1024/1024,2)
| sort 20 -volumeB

Ciao.

Giuseppe

0 Karma

Yashvik
Explorer

Hello @gcusello 
Thanks for the response. Unfortunately, I see only empty values for sourcetype column.  other 3 fields showing the info. 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

very strange!

as you can see it works on my Splunk

gcusello_0-1695025353570.png

did you exactly copied my search?

Ciao.

Giuseppe

0 Karma

Yashvik
Explorer

Hi @gcusello 
I used the same search which you shared above and didn't made any changes. I will share the screenshot shortly as I am getting some errors in uploading the picture.

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

I found an errore, even if it runs on my search, please try again this and check all the rows:

index=_internal source=*license_usage.log* type="Usage" 
| eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) 
| eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) 
| eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) 
| bin _time span=1d 
| stats sum(b) as b by _time, pool, s, st, h, idx   
| bin span=1d _time 
| stats values(st) AS sourcetype sum(b) AS volumeB by _time idx 
| rename idx AS index
| eval volumeB=round(volumeB/1024/1024/1024,2)
| sort 20 -volumeB

Ciao.

Giuseppe

Yashvik
Explorer

Thanks a lot for the response @gcusello , it works.

0 Karma
Get Updates on the Splunk Community!

Join Us for Splunk University and Get Your Bootcamp Game On!

If you know, you know! Splunk University is the vibe this summer so register today for bootcamps galore ...

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

.conf24 is taking place at The Venetian in Las Vegas from June 11 - 14. Continue reading to learn about the ...

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...