How to estimate today's indexing volume

DMohn
Motivator

Hi Splunkers,

I want to create an instance overview dashboard, and one KPI should be today's estimated indexing volume. The traffic varies greatly by time of day (significantly more during working hours, less at night), which makes it hard to simply sum up the data indexed so far and extrapolate from it.

Currently I am trying to get a value by using a rolling average with streamstats like this:

 index=_internal source=*metrics.log* sourcetype=splunkd group=per_host_thruput host=indexhost* earliest=@d | timechart per_day(kb) as daily | streamstats window=0 avg(daily) as davg

The davg value is displayed in a single value panel. My problem is that the value is much too low in the morning hours and too high in the early afternoon. I have already tried using data from the last 24h to get a better average, but with limited success.

Is there a way to properly account for how the traffic changes over the course of the day?
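
The morning-too-low / afternoon-too-high pattern is what any extrapolation produces when it assumes a constant indexing rate. A common alternative (not something proposed in this thread) is to weight by a historical hourly profile: if past days show that, say, 30% of a day's volume is typically indexed by 10:00, today's estimate is the volume indexed so far divided by 0.3. The Python sketch below assumes you have already exported per-hour KB totals for a few past days; the function names and the sample traffic are hypothetical.

```python
from typing import Sequence


def estimate_linear(indexed_kb: float, hours_elapsed: float) -> float:
    """Naive extrapolation: assumes data arrives at a constant rate all day."""
    return indexed_kb * 24.0 / hours_elapsed


def hourly_fractions(history: Sequence[Sequence[float]]) -> list[float]:
    """Average share of a whole day's volume that is indexed in each hour 0..23.

    `history` holds one sequence of 24 per-hour KB totals for each past day.
    """
    shares = [0.0] * 24
    for day in history:
        day_total = sum(day)
        for hour, kb in enumerate(day):
            shares[hour] += kb / day_total
    return [s / len(history) for s in shares]


def estimate_profiled(indexed_kb: float, full_hours: int,
                      fractions: Sequence[float]) -> float:
    """Divide today's volume by the share a typical day has reached by now.

    `full_hours` is the number of complete hours since midnight (1..24).
    """
    if full_hours < 1:
        raise ValueError("need at least one complete hour of data")
    return indexed_kb / sum(fractions[:full_hours])


if __name__ == "__main__":
    # Hypothetical traffic: 1 KB/h at night, 3 KB/h from 08:00 to 17:59 (44 KB/day).
    day = [3.0 if 8 <= h < 18 else 1.0 for h in range(24)]
    fractions = hourly_fractions([day, day])
    for now in (10, 15):
        so_far = sum(day[:now])
        print(now, round(estimate_linear(so_far, now), 1),
              round(estimate_profiled(so_far, now, fractions), 1))
    # 10 33.6 44.0   (linear is too low in the morning)
    # 15 46.4 44.0   (linear is too high in the early afternoon)
```

The per-hour history would have to come from a search over previous days with an hourly span; the profile is only as good as the assumption that today looks like the days it was built from (weekends, for example, may need their own profile).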

hexx
Splunk Employee

If you are looking to estimate the usage of your license quota, the only source of truth is the events in the license_usage.log file as recorded on your license master. The panels of the License Usage view in the Distributed Management Console provide authoritative searches for this.

Now, if you are looking to estimate your daily indexing throughput (whether or not the data counts against your license quota), I would recommend using the events of group=thruput name=index_thruput in metrics.log, like so:

index=_internal group=thruput name=index_thruput | timechart span=1d sum(kb) AS daily_KB
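
For reference, the daily sum in that search can be reproduced offline from raw metrics.log lines. This Python sketch assumes a metrics.log line shape like the commented example (a timestamp, then " Metrics - " followed by comma-separated key=value pairs); that format and the function name are assumptions, and it buckets by the date written in the log rather than by your Splunk time zone.

```python
import re
from collections import defaultdict

# Assumed metrics.log line shape (illustrative, not copied from a real log):
# 05-03-2016 10:15:22.123 +0200 INFO  Metrics - group=thruput, name=index_thruput, kb=15.6, ev=37
PAIR = re.compile(r"(\w+)=([^,\s]+)")
MARKER = " Metrics - "


def daily_index_kb(lines):
    """Sum kb of group=thruput name=index_thruput events per calendar day."""
    totals = defaultdict(float)
    for line in lines:
        stamp, found, payload = line.partition(MARKER)
        if not found:
            continue  # not a metrics event
        fields = dict(PAIR.findall(payload))
        if fields.get("group") != "thruput" or fields.get("name") != "index_thruput":
            continue  # skip per_*_thruput (sampled) and unrelated groups
        day = stamp.split()[0]  # date as written in the log, e.g. 05-03-2016
        totals[day] += float(fields.get("kb", 0))
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "05-03-2016 10:15:22.123 +0200 INFO  Metrics - group=thruput, name=index_thruput, kb=10.5, ev=20",
        "05-03-2016 10:15:22.124 +0200 INFO  Metrics - group=per_index_thruput, series=main, kb=99.0, ev=5",
        "05-03-2016 10:15:52.123 +0200 INFO  Metrics - group=thruput, name=index_thruput, kb=4.5, ev=9",
        "05-04-2016 00:00:21.001 +0200 INFO  Metrics - group=thruput, name=index_thruput, kb=2.0, ev=3",
    ]
    print(daily_index_kb(sample))  # {'05-03-2016': 15.0, '05-04-2016': 2.0}
```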

Do not use events of group=per_*_thruput to determine license usage or indexing throughput accurately, as those represent a sampled measurement.

DMohn
Motivator

I have marked this as accepted, as it comes closest to what I needed to achieve.

We built a dashboard on the group=thruput name=index_thruput metrics and applied some averaging to get reasonable results.

Thanks!

marina_rovira
Contributor

Hi there!

In case it helps: for licensing, you can consult the report Splunk provides under Settings -> Licensing -> Usage report.

If you open the searches behind each of its panels, you may find something useful there.

gyslainlatsa
Motivator

Hi DMohn,

The LicenseManager search will not count data such as index=_internal and index=_audit, because that volume doesn't count against your license. The per_host search, on the other hand, does count it.

However, you can use the per_index_thruput numbers and then filter out the indexes whose names begin with an underscore.

index=_internal source=*metrics.log* group=per_index_thruput | eval MB=kb/1024 | stats sum(MB) AS MB by series | rename series AS index | search index!=_* | sort MB | addcoltotals | fillnull value="[ Total Indexed Volume ] last 24 hours" index
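
The steps of that search can be mirrored in Python to make each stage explicit. This is a sketch that assumes you already have (series, kb) pairs from group=per_index_thruput events; the function name and total label are illustrative. As hexx notes in the accepted answer, per_*_thruput events are sampled, so totals built this way are approximate.

```python
from collections import defaultdict

TOTAL_LABEL = "[ Total Indexed Volume ]"


def per_index_mb(events):
    """MB per index with underscore-prefixed (internal) indexes removed.

    `events` is an iterable of (series, kb) pairs from per_index_thruput events,
    where `series` is the index name.
    """
    mb = defaultdict(float)
    for series, kb in events:
        mb[series] += kb / 1024  # eval MB=kb/1024 | stats sum(MB) by series
    rows = sorted(
        ((index, total) for index, total in mb.items() if not index.startswith("_")),
        key=lambda row: row[1],
    )  # drop indexes starting with "_", then sort ascending by MB
    rows.append((TOTAL_LABEL, sum(total for _, total in rows)))  # addcoltotals + fillnull
    return rows


if __name__ == "__main__":
    events = [("main", 2048), ("_internal", 10240), ("web", 1024), ("main", 1024)]
    for index, total in per_index_mb(events):
        print(f"{index}: {total:.1f} MB")
    # web: 1.0 MB
    # main: 3.0 MB
    # [ Total Indexed Volume ]: 4.0 MB
```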

If I run this search against yesterday's data and compare it to the LicenseManager search from today (necessary because the LicenseManager runs just after midnight and reports on the previous day), the numbers are very close to each other but, oddly, not equal. I'm not sure why.

For other solutions, follow this link:

https://answers.splunk.com/answers/140/how-do-i-determine-my-indexing-volume-by-host-source-or-sourc...

You can also use other searches to break the volume down by sourcetype, host, or source per pool.

See http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume

mendesjo
Path Finder

Yet another query that returns nothing when you plug it in 😞

DMohn
Motivator

Hi,

Thanks for your reply. However, that search only sums up the last 24 hours. What I need is a prediction for the current day.

Any idea how to accomplish that?
