How to estimate today's indexing volume

Motivator

Hi Splunkers,

I want to create an instance overview dashboard, and one KPI should be today's estimated indexing volume. The traffic varies greatly over the day (significantly more during working hours, less at night), which makes it hard to simply sum up the data indexed so far and extrapolate from that.

Currently I am trying to get a value by using a rolling average with streamstats like this:

 index=_internal source=*metrics.log* sourcetype=splunkd group=per_host_thruput host=indexhost* earliest=@d | timechart per_day(kb) as daily | streamstats window=0 avg(daily) as davg

The davg value is being displayed as a single value. My problem is: the value will be much too low in the morning hours, and too high in the early afternoon. I have already tried using data from the last 24h to get a better average, but with limited success.

Any chance to properly consider the changing traffic per day here?

Re: How to estimate today's indexing volume

Motivator

Hi DMohn,

The LicenseManager search will not count things like index=_internal and index=_audit data, because that volume doesn't count against your license, whereas the per_host search does count it.

However, you can use the per_index_thruput numbers and then filter out the indexes that have leading underscores.

index=_internal source=*metrics.log splunk_server="*" group="per_index_thruput" | eval MB=kb/1024 | stats sum(MB) by series | rename series as index | search index!=_* | sort sum(MB) | addcoltotals | fillnull value="[ Total Indexed Volume ] last 24 hours" index

If I run this search against yesterday's data and compare it to the LicenseManager's search from today (necessary because the LicenseManager runs just after midnight and reports on yesterday), the numbers are very close to each other but, oddly, not equal. I'm not sure why.

For other solutions, follow this link:

https://answers.splunk.com/answers/140/how-do-i-determine-my-indexing-volume-by-host-source-or-sourc...

You can also use the newer searches that break the volume down by sourcetype, host, or source, per pool; see:

http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume

Re: How to estimate today's indexing volume

Motivator

Hi,

Thanks for your reply. However, the stated search only sums up the last 24 hours. What I need is a prediction for the current day.

Any idea how to accomplish that?

Re: How to estimate today's indexing volume

Explorer

Another query that returns nothing when you plug it in 😞

Re: How to estimate today's indexing volume

Contributor

Hi there!

If it helps with licensing, you can consult the report provided by Splunk: Settings -> Licensing -> Usage report.

If you open the searches behind each panel, you may find something useful there.

Re: How to estimate today's indexing volume

Splunk Employee

If you are looking to estimate the usage of your license quota, the only source of truth is the events of the license_usage.log file as they are recorded on your license master. The panels of the License Usage view in the Distributed Management Console provide authoritative searches on this matter.

Now, if you are looking to estimate your daily indexing throughput (whether or not the data counts against your license quota), I would recommend using the events of group=thruput name=index_thruput in metrics.log, like so:

index=_internal group=thruput name=index_thruput | timechart span=1d sum(kb) AS daily_KB

Do not attempt to use events of group=per_*_thruput to accurately determine license usage or indexing thruput as those represent a sampled measurement.

Re: How to estimate today's indexing volume

Motivator

I have marked this as accepted, as it comes closest to what I needed to achieve.

We have built a dashboard on the group=thruput name=index_thruput metrics, and did some averaging to get reasonable results.
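
A rough sketch of what such averaging can look like (not the exact dashboard search; the seven-day lookback and all field names such as sod, is_today and estimated_today_kb are assumptions for illustration). It scales today's volume so far by the ratio of full-day volume to the volume indexed up to the same time of day, taken over the previous seven days:

```
index=_internal group=thruput name=index_thruput earliest=-7d@d latest=now
| eval sod = _time - relative_time(_time, "@d")
| eval now_sod = now() - relative_time(now(), "@d")
| eval is_today = if(_time >= relative_time(now(), "@d"), 1, 0)
| stats sum(eval(if(is_today=1, kb, 0))) AS today_so_far_kb
        sum(eval(if(is_today=0 AND sod<=now_sod, kb, 0))) AS hist_elapsed_kb
        sum(eval(if(is_today=0, kb, 0))) AS hist_total_kb
| eval estimated_today_kb = round(today_so_far_kb * hist_total_kb / hist_elapsed_kb, 0)
```

Here sod is the number of seconds since midnight for each event, so hist_elapsed_kb only counts historical volume up to the current time of day. This treats every day alike; if weekends differ a lot from weekdays, restricting the history to the same weekday may give a better estimate. Shortly after midnight hist_elapsed_kb is small, so the estimate will be noisy.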

Thanks!