Splunk Search

How to write a SPL query to identify top 20 largest indexes / sourcetypes by size (In GB's)?

Yashvik
Explorer

Hello All,
I need to identify the top log sources which are sending large data to Splunk. Tried Licence master dashboard which isn't helping much. 

My requirement is to create a table which contains following fields. e.g: sourcetype, vol_GB, index, percentage.

Labels (2)
Tags (3)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

I found an errore, even if it runs on my search, please try again this and check all the rows:

index=_internal source=*license_usage.log* type="Usage" 
| eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) 
| eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) 
| eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) 
| bin _time span=1d 
| stats sum(b) as b by _time, pool, s, st, h, idx   
| bin span=1d _time 
| stats values(st) AS sourcetype sum(b) AS volumeB by _time idx 
| rename idx AS index
| eval volumeB=round(volumeB/1024/1024/1024,2)
| sort 20 -volumeB

Ciao.

Giuseppe

View solution in original post

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

please try this search:

index=_internal source=*license_usage.log* type="Usage" 
| eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) 
| eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) 
| eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) 
| bin _time span=1d 
| stats sum(b) as b by _time, pool, s, st, h, idx   
| bin span=1d _time 
| stats sum(b) AS volumeB by _time idx st 
| eval volumeB=round(volumeB/1024/1024/1024,2)
| sort 20 -volumeB

Ciao.

Giuseppe

0 Karma

Yashvik
Explorer

Hello @gcusello 
Thanks a lot for the swift response. the query gives the most of info I was looking for. However, it contains multiple entries for single index. Lets say If Index A has 3 sourcetypes, it appears in 3 rows. 
Can we group them in a single row? 
e.g:

_timeindexsourcetypeVol_GBpercentage
17th SepMain st1
st2
st3
100G10.00%
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

if you're not interested to the value for each sourcetype, but only to know which sourcetypes are in an index, you cam modify the above search in this way:

index=_internal source=*license_usage.log* type="Usage" 
| eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) 
| eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) 
| eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) 
| bin _time span=1d 
| stats sum(b) as b by _time, pool, s, st, h, idx   
| bin span=1d _time 
| stats values( st) AS sourcetype sum(b) AS volumeB by _time idx 
| rename idx AS index
| eval volumeB=round(volumeB/1024/1024/1024,2)
| sort 20 -volumeB

Ciao.

Giuseppe

0 Karma

Yashvik
Explorer

Hello @gcusello 
Thanks for the response. Unfortunately, I see only empty values for sourcetype column.  other 3 fields showing the info. 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

very strange!

as you can see it works on my Splunk

gcusello_0-1695025353570.png

did you exactly copied my search?

Ciao.

Giuseppe

0 Karma

Yashvik
Explorer

Hi @gcusello 
I used the same search which you shared above and didn't made any changes. I will share the screenshot shortly as I am getting some errors in uploading the picture.

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Yashvik,

I found an errore, even if it runs on my search, please try again this and check all the rows:

index=_internal source=*license_usage.log* type="Usage" 
| eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) 
| eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) 
| eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) 
| bin _time span=1d 
| stats sum(b) as b by _time, pool, s, st, h, idx   
| bin span=1d _time 
| stats values(st) AS sourcetype sum(b) AS volumeB by _time idx 
| rename idx AS index
| eval volumeB=round(volumeB/1024/1024/1024,2)
| sort 20 -volumeB

Ciao.

Giuseppe

Yashvik
Explorer

Thanks a lot for the response @gcusello , it works.

0 Karma
Get Updates on the Splunk Community!

Cloud Platform | Customer Change Announcement: Email Notification Will Be Available ...

The Notification Team is migrating our email service provider since currently there’s no support ...

Mastering Synthetic Browser Testing: Pro Tips to Keep Your Web App Running Smoothly

To start, if you're new to synthetic monitoring, I recommend exploring this synthetic monitoring overview. In ...

Splunk Edge Processor | Popular Use Cases to Get Started with Edge Processor

Splunk Edge Processor offers more efficient, flexible data transformation – helping you reduce noise, control ...