Getting Data In

How much data can I index per second on a single indexer?

Chris_R_
Splunk Employee

I've already got my single indexer spec'd to handle under 100 GB a day, and it meets the requirements. However, I am getting blocked queues at certain times of the day. What gives?

1 Solution

Chris_R_
Splunk Employee

Splunk recommends indexing anywhere from 3 to 10 MB per second on a single indexer. Keep in mind that the upper limit of 10 MB per second assumes very fast hardware: 15k RPM disks, a RAID 0+1 array, and fast bonnie++ results.

Your system may be indexing within the recommendations for a box spec'd for under 100 GB per day, but if you have blocked index queues at certain times, you may be indexing too much data during those time frames.
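
To put the daily and per-second figures on the same scale, here is a rough sketch (not part of the original answer) that converts a daily volume into an average per-second rate with makeresults and eval. The 100 GB value is simply the figure from the question, the field names are made up for illustration, and 1 GB is assumed to be 1024 MB.

```spl
| makeresults
| eval gb_per_day=100
| eval avg_mb_per_sec=round(gb_per_day*1024/86400, 2)
| table gb_per_day avg_mb_per_sec
```

That works out to roughly 1.2 MB per second if the data arrived evenly over 24 hours. Real traffic rarely does, so a box that is comfortable on the daily total can still be pushed past what it can index during bursts.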

Check your queues with this search during the problem time frames:
index=_internal source="*metrics.log*" group=queue | timechart perc95(current_size) by name

If you want to drill down and find the maximum kbps indexed at that time:
index="_internal" source="*metrics.log*" per_index_thruput | timechart span=1h max(kbps) by series | addtotals

You can then identify which hosts are sending the most data:
index=_internal source="*metrics.log*" per_host_thruput | eval mb=(kb/1024) | timechart span=1h sum(mb) by series | addtotals

For further assistance and recommendations on how to increase performance, open a case with Support.


gkanapathy
Splunk Employee

Most of the time, if you are not reaching the target kbps indexed (i.e., three to six kb per second; 10 is possible, but not easy to achieve), it's either because of your disk performance or because you have poor index-time rules. To achieve the best index throughput, you should optimize:

  • Timestamp extraction: use explicit timestamp prefixes, formats, and lookaheads as much as possible
  • Line breaking rules: try to use LINE_BREAKER and avoid line merging (SHOULD_LINEMERGE) if possible, and keep the merging rules simple if not
  • Index-time transforms: have as few index-time transforms (for sources, hosts, index, or other fields) as possible, and keep them simple
  • Regular expressions: Make sure your regular expressions are PCRE-efficient
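
As a rough illustration of the first two bullets, a props.conf stanza along these lines spells out the timestamp and line-breaking rules so Splunk does not have to guess them. The sourcetype name and the timestamp layout are assumptions made up for this example (single-line events that begin with a timestamp such as 2024-01-15 10:30:00); adjust them to your own data.

```ini
# Hypothetical sourcetype name, for illustration only
[my_app_log]
# Assumption: the timestamp is the first thing on each event
TIME_PREFIX = ^
# Assumption: timestamps look like 2024-01-15 10:30:00
TIME_FORMAT = %Y-%m-%d %H:%M:%S
# Only scan the 19 characters the timestamp above occupies
MAX_TIMESTAMP_LOOKAHEAD = 19
# Break events on newlines and skip the line-merging pass entirely
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
```

With SHOULD_LINEMERGE set to false, the LINE_BREAKER regex alone decides where events start and end, which is usually the cheaper path at index time.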

sonicZ
Contributor

Hey Gerald, I know this is a really old question, but did you mean a target indexed value of 3-6 "kbps" or "mbps"?

