Getting Data In

Historical Data Indexing Speed

clyde772
Communicator

Hey Splunkers!

I have a question: we are testing Splunk's indexing speed. We are doing that with a forwarder and an indexer, trying to ingest 500GB of data that resides on the forwarder. One thing we realized is that

  1. We had to control the bandwidth limit to maximize the transport of the data,

but there also seems to be a mechanism that limits the indexing rate even after we removed the bandwidth limit. It was fast at first, then as it indexed the older data the rate dropped significantly. So the question is:

Regardless of environment, how can we set up Splunk to maximize data ingestion speed?

In other words, how do we maximize the indexing speed for historical data sent over the network from a forwarder?

Thanks in advance for your answers~! Happy summer~!


Drainy
Champion

Basically you want to tune maxKBps in limits.conf on the forwarder:
http://docs.splunk.com/Documentation/Splunk/latest/admin/Limitsconf

This controls the rate at which the forwarder can send data over the network, but it will also be limited by the network itself and by I/O on the local device, i.e. how quickly it can read the data (if it is a heavily used machine, for example, it may experience some delay).
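For reference, this is roughly what the relevant stanza looks like (a minimal sketch based on the documented [thruput] setting; 0 removes the cap, and the universal forwarder's default is a fairly low fixed rate, so it is worth checking):

    # $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder
    [thruput]
    # 0 = no forwarder-side throughput cap
    maxKBps = 0

Restart the forwarder after changing it so the new limit takes effect.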

On the indexer end you want to install Splunk on Splunk to monitor for blocked queues (or just search for them), because if Splunk cannot write to disk quickly enough it will begin to block each of its queues in turn, until eventually it blocks the TCP input and the forwarder has to queue its data locally.
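If you would rather not install an app, a search along these lines against the indexer's internal metrics should surface blocked queues (a sketch only, using the field names that appear in the standard metrics.log queue events):

    index=_internal source=*metrics.log* group=queue blocked=true
    | stats count by host, name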

To avoid the indexing queue blocking (the final queue, which writes to disk), you need to ensure you have a sufficiently high number of IOPS available. How many depends entirely on what other data the machine is dealing with and its overall load, but you really want 800-1200 IOPS as a minimum. If the data cannot be written to disk quickly enough, the queue will block.

What specification of machine are you using for the indexer? If you are looking at 500GB then you should really have more than one indexer; in fact, you should probably have quite a few to spread the load. Remember there is also processing load as the indexer parses the arriving data and performs any index-time extractions you may have (which are hopefully minimal 😉 ).
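If you do add indexers, the forwarder can load-balance across them from outputs.conf; something like the following sketch (the hostnames and group name here are placeholders, not from your environment):

    # $SPLUNK_HOME/etc/system/local/outputs.conf on the forwarder
    [tcpout]
    defaultGroup = my_indexers

    [tcpout:my_indexers]
    # forwarder automatically load-balances across the listed indexers
    server = indexer1.example.com:9997, indexer2.example.com:9997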
