Getting Data In

Tracking indexing per source - without _internal index access

sonicZ
Contributor

Looking to measure heavy sources and track how much is getting indexed per day by source.
The main problem is that our Splunk admin team cannot give us access to the _internal index, so I cannot run the standard _internal metrics searches such as:

 index=_internal sourcetype=splunkd source=*metrics.log* group=per_source_thruput

 

Curious how accurate measuring actual log sizes with Splunk commands would be compared to the _internal index stats. We don't need 100% accurate results, just a ballpark estimate, for example that one source might be indexing 500-600 GB per day or 1-1.5 TB per day.
Thinking of trying something like:

 

index=aws-index sourcetype=someSource
source="/some/source/file.log"
| eval raw_len=len(_raw)
| eval raw_len_kb = raw_len/1024
| eval raw_len_mb = raw_len/1024/1024
| eval raw_len_gb = raw_len/1024/1024/1024
| eval raw_len_tb = raw_len/1024/1024/1024/1024
| stats sum(raw_len_mb) as MB sum(raw_len_gb) as GB sum(raw_len_tb) as TB by source

0 Karma
1 Solution

richgalloway
SplunkTrust
SplunkTrust

That method is close enough, but will be slow since you have to read every event to get its size.

To improve performance ever so slightly, add up the length of _raw then convert to MB/GB/TB at the end.

index=aws-index sourcetype=someSource
source="/some/source/file.log"
| eval raw_len=len(_raw)
| stats sum(raw_len) as B by source
| eval MB = B/1024/1024, GB = B/1024/1024/1024, TB = B/1024/1024/1024/1024
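
Since the original question asks for volume per day, a minimal sketch of the same search bucketed by day follows. It reuses the index, sourcetype, and source from the question; the rounding to two decimals is just a presentation choice. Keep in mind that len(_raw) counts characters rather than bytes, so events with multibyte characters will be somewhat undercounted, which should still be fine for a ballpark figure.

```spl
index=aws-index sourcetype=someSource
source="/some/source/file.log"
| eval raw_len=len(_raw)
| bin _time span=1d
| stats sum(raw_len) as B by _time, source
| eval GB = round(B/1024/1024/1024, 2)
```

Each output row is then one source on one day, so the time range picker controls how many days you get back.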
---
If this reply helps you, Karma would be appreciated.


sonicZ
Contributor

Thanks, that will most likely help a bit!
Planning to run this a few times per day so we can populate the results in a .csv lookup table as well.
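
A rough sketch of what that scheduled search might look like is below. The lookup file name source_volume_daily.csv is hypothetical, and the sketch assumes the search is scheduled once per day over the previous full day (earliest=-1d@d latest=@d) rather than several times a day, so that appending does not write overlapping rows for the same day.

```spl
index=aws-index sourcetype=someSource
source="/some/source/file.log" earliest=-1d@d latest=@d
| eval raw_len=len(_raw)
| bin _time span=1d
| stats sum(raw_len) as B by _time, source
| eval GB = round(B/1024/1024/1024, 2)
| outputlookup append=true source_volume_daily.csv
```

The accumulated history could then be read back with | inputlookup source_volume_daily.csv.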

0 Karma

somesoni2
SplunkTrust
SplunkTrust

Why not ask your admin team to set up a summary index for license usage logs and give you access to that summary index? That way you can have access to that data without having access to the whole _internal index. Something like this:

https://community.splunk.com/t5/Getting-Data-In/How-to-create-a-summary-Index-that-will-give-license...
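
As an illustration only (not necessarily what the linked post does), the admin-side scheduled search could look roughly like the sketch below. The summary index name license_summary is hypothetical and would have to be created by the admins first. The fields b, idx, st, and s are the bytes, index, sourcetype, and source fields written to license_usage.log; the s field may be empty on busy systems where the license manager squashes source values, in which case only per-index or per-sourcetype totals would be available.

```spl
index=_internal source=*license_usage.log* type=Usage earliest=-1d@d latest=@d
| stats sum(b) as bytes by idx, st, s
| collect index=license_summary
```

With read access to that index, per-source volume would then be something like index=license_summary | stats sum(bytes) as B by s | eval GB = round(B/1024/1024/1024, 2).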

 

0 Karma

sonicZ
Contributor

Trying to get our Splunk admin team to do anything here is like pulling teeth 🙂 but summary indexing might work, thanks for that. Unfortunately it will probably take them weeks to get to it.

0 Karma