Getting Data In

Major discrepancy between per_sourcetype_thruput and tcpin_connections - which is right?

Jason
Motivator

I'm looking at a Splunk instance right now that is getting 99+% of its data as one particular sourcetype, from two heavy forwarders.

Running a search over the last 7 days on index="_internal" source="*metrics.log" per_sourcetype_thruput | eval GB=kb/1024/1024 | timechart span=1d sum(GB) by series gives a peak of 188GB/day on Thursday.

But a search for index="_internal" source=*metrics.log group=tcpin_connections | eval GB=kb/1024/1024 | timechart span=1d sum(GB) by sourceHost (stacked) over the same period shows a similar curve, but with a peak of almost 500GB/day on Thursday!

What is going on here?

Which metric is correct?

Is the heavy forwarder really adding an additional 150% to the amount of bandwidth used? (Regardless of what actually gets indexed)
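For reference, a single search along these lines (a sketch based on the two searches above, untested) should put both series on one chart for a direct comparison:

```
index=_internal source=*metrics.log (group=per_sourcetype_thruput OR group=tcpin_connections)
| eval GB=kb/1024/1024
| timechart span=1d sum(GB) by group
```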

1 Solution

jbsplunk
Splunk Employee

They are both correct, but they measure different things. The discrepancy arises because per_sourcetype_thruput in metrics.log is capped by a setting in limits.conf that defines the number of series to collect every 30 seconds; the default is 10. So if you have more than 10 sourcetypes, you won't get the full picture of the total thruput across all sourcetypes. That doesn't mean the metric isn't useful, just that it's incomplete. You might get a better idea by looking at per_index_thruput, as most people probably don't have more than ten indexes. Anyway, that's the source of the discrepancy. Here is the setting from the docs:

http://docs.splunk.com/Documentation/Splunk/latest/admin/Limitsconf

[metrics]

maxseries = <integer>
 * The number of series to include in the per_x_thruput reports in metrics.log.
 * Defaults to 10.
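If you want per_sourcetype_thruput to cover more series, you could raise that limit in a local limits.conf (a sketch; pick a value that covers your sourcetype count, and note that a restart is typically required for limits.conf changes to take effect):

```
# $SPLUNK_HOME/etc/system/local/limits.conf
[metrics]
# Report up to 50 series per per_x_thruput group instead of the default 10
maxseries = 50
```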

Additionally, here is a page that is full of useful searches for troubleshooting data volume issues:

http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume

Specifically, I think you'll find the search for 'Counting event sizes over a time range' to be of use.
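As a rough sketch of that approach (the exact search on the wiki may differ, and this can be expensive over large indexes), you can estimate indexed volume from raw event sizes rather than from metrics samples:

```
index=* earliest=-7d@d latest=@d
| eval bytes=len(_raw)
| timechart span=1d sum(eval(bytes/1024/1024/1024)) as GB by sourcetype
```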


Jason
Motivator

Nope, they're about the same as per_sourcetype_thruput. I didn't mention this before, but it may be helpful: this indexer is only receiving data from heavy forwarders. (edited OP)


hexx
Splunk Employee

What numbers do you see for index=_internal source=*metrics.log per_index_thruput | eval GB=kb/1024/1024 | timechart span=1d sum(GB)? Are they closer to what you find from tcpin_connections?



jbsplunk
Splunk Employee

That may or may not be true. It is completely feasible that the aggregate size of the sourcetypes which aren't in the top 10 would cause significant differences between these measurements. I think per_index_thruput is probably a better measurement if you've got under 10 indexes configured.


Jason
Motivator

I figured maxseries would not be an issue, since the test data sourcetype is consistently orders of magnitude larger than any other type of data coming into the indexer.
