Getting Data In

What is the maximum data throughput per second that a single indexer can handle?

jichen
Explorer

Hi, we have run into a tough issue. One system generates more than 10MB/s of logs during a particular peak period and sends them through a forwarder to the index server; the data is then delayed by half an hour or more. In our scenario, the index server has 2 CPUs with 16 cores, 16GB of RAM, and 15000rpm SAS HDDs in RAID 5. I noticed that its load is not heavy: CPU below 6 percent, load average about 0.1, HDD IOPS at 60. The index server configuration: maxKBps=0 in limits.conf, and the queue max size set to 2048MB in server.conf. On the forwarder side I configured the throughput with maxKBps=0.

With the above configuration, the maximum bandwidth between the forwarder and the indexer is about 2MB/s.

So I have some questions:
1. Can Splunk run faster (push data throughput well above 2MB/s)? I don't see any bottleneck in the current hardware performance. Is the indexing engine single-threaded? Is it possible to disable automatic field discovery on specific source types or indexes to speed up data throughput?
2. The Splunk process doesn't seem to consume many resources while indexing data. Is it limited by the indexing process's own capability, or does the software deliberately spare resources for other routine jobs such as real-time searches and reports?
3. If I have a single log file of enormous size (3MB*3600*24=253GB per day), is it possible to make it searchable in near real time?
4. In your experience, what is the maximum data throughput per second that a single indexer can reach?
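
For reference, the daily volume figure in question 3 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `daily_volume_mb` is made up for illustration, and the 253GB figure assumes 1024 MB per GB (with 1000 MB per GB the same rate comes to about 259GB).

```python
def daily_volume_mb(rate_mb_per_s: float) -> float:
    """Total megabytes written in one day at a constant rate."""
    seconds_per_day = 3600 * 24
    return rate_mb_per_s * seconds_per_day


mb_per_day = daily_volume_mb(3)      # 3MB/s sustained, as in question 3
print(mb_per_day)                    # total in MB
print(mb_per_day / 1024)             # in GB, using 1024 MB per GB
print(mb_per_day / 1000)             # in GB, using 1000 MB per GB
```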

jrodman
Splunk Employee

It's common to hit numbers like 10MB/s on modern hardware in Splunk 5 or 6, though there are many variables, and with some data you might see lower numbers without anything being wrong. I have seen scenarios where 20MB/s was achieved. 2MB/s sounds like problem territory.

We don't have "Quality of Service"-like controls to prioritize one large file over all other data, so if the system can't handle the aggregate data, the largest single datasource may lag.
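
To get a rough feel for how such lag builds up, here is a back-of-the-envelope sketch in Python. It is a deliberately simplified model, not a description of how Splunk queues data internally: it assumes constant rates, and that the source goes quiet once the peak ends. The function name `lag_after_peak_s` and the 450-second peak duration are hypothetical; only the 10MB/s and 2MB/s rates come from the question.

```python
def lag_after_peak_s(inflow_mb_s: float, drain_mb_s: float, peak_s: float) -> float:
    """Seconds the last event of the peak waits behind the accumulated backlog.

    Assumes constant rates and no new data after the peak (a simplification).
    """
    # Backlog only grows when data arrives faster than it is indexed.
    backlog_mb = max(inflow_mb_s - drain_mb_s, 0) * peak_s
    return backlog_mb / drain_mb_s


# 10MB/s generated, 2MB/s actually indexed, hypothetical 450 s peak.
print(lag_after_peak_s(10, 2, 450) / 60)   # lag in minutes
# If the indexer kept up with the source, no backlog would form.
print(lag_after_peak_s(10, 10, 450))
```

Under these assumptions, a peak of only seven and a half minutes is already enough to produce a half-hour delay at the observed rates, which is in line with the lag reported in the question.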

There are many potential bottlenecks in the system, and it's hard to diagnose this without full support contact. Of course you probably dealt with this problem long ago, but I'm answering because the core question about expected throughput is important.

kristian_kolb
Ultra Champion

Interesting. You seem to have done your homework. Have you contacted Splunk Support?

Could there be other limits in the Universal Forwarder (UF) that you could overcome by switching to a Heavy Forwarder? This is not my strong suit, just thinking ...
