
How much data can I index per second on a single indexer?

Chris_R_
Splunk Employee

I've already spec'd my single indexer to handle under 100 GB a day, and it meets the requirements. However, I am getting blocked queues at certain times of the day. What gives?


Chris_R_
Splunk Employee

Splunk recommends indexing anywhere from 3 to 10 MB per second on a single indexer. Keep in mind that the upper limit of 10 MB/s is only reached on very fast hardware: 15k RPM disks, a RAID 0+1 array, and fast bonnie++ results.

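Your system may be indexing within the recommendations for a box spec'd at under 100 GB per day, but if the indexqueue blocks at certain times, you may be sending in too much data during those time frames. A daily total that is within spec says nothing about how evenly the data arrives; short peaks can exceed what the indexer can handle per second.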
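A quick back-of-the-envelope calculation shows how a daily total that fits the spec can still produce peaks above the per-second guidance. The sketch below is plain arithmetic in Python; it assumes 1 GB = 1024 MB and a hypothetical burst in which 40% of a 100 GB day arrives within a single hour. The numbers are illustrative, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_mb_per_sec(gb_per_day: float) -> float:
    """Average rate if the day's data arrived evenly over 24 hours (assumes 1 GB = 1024 MB)."""
    return gb_per_day * 1024 / SECONDS_PER_DAY


def burst_mb_per_sec(gb_per_day: float, fraction: float, hours: float) -> float:
    """Rate needed if `fraction` of the daily volume arrives within `hours` hours (hypothetical scenario)."""
    return gb_per_day * 1024 * fraction / (hours * 3600)


if __name__ == "__main__":
    daily_gb = 100
    print(f"average: {average_mb_per_sec(daily_gb):.2f} MB/s")
    # Hypothetical burst: 40% of the day's data lands in one hour
    print(f"burst:   {burst_mb_per_sec(daily_gb, 0.40, 1):.2f} MB/s")
```

Spread evenly, 100 GB/day is only about 1.2 MB/s, well below the 3-10 MB/s guidance, but the hypothetical burst needs about 11.4 MB/s, above even the upper limit quoted for very fast hardware.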

Check your queues with this search during the problem time frames:
index=_internal source="*metrics.log*" group=queue | timechart perc95(current_size) by name
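The search uses perc95 rather than an average because a queue that is nearly empty most of the time but fills up at peak looks far less busy on average. The Python sketch below illustrates this with made-up current_size samples for one hypothetical time bucket, using the nearest-rank percentile definition; Splunk's perc95 may compute percentiles differently, so treat this only as an illustration of why the tail matters.

```python
import math


def perc_nearest_rank(values, p):
    """Nearest-rank p-th percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Hypothetical current_size samples for one queue in one time bucket:
# mostly near-empty, but filling up during a burst.
samples = [0, 0, 2, 5, 10, 40, 300, 800, 950, 1000]

print(sum(samples) / len(samples))     # mean: 310.7
print(perc_nearest_rank(samples, 95))  # 95th percentile: 1000
```

The mean hides the burst, while the 95th percentile reports the queue at its fullest, which is what you want to see during the problem time frames.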

If you want to drill down and find the maximum kbps indexed per index at that time:
index="_internal" source="*metrics.log*" per_index_thruput | timechart span=1h max(kbps) by series | addtotals

You can then identify the hosts sending the most data (for example, forwarders with heavy traffic):
index=_internal source="*metrics.log*" per_host_thruput | eval mb=(kb/1024) | timechart span=1h sum(mb) by series | addtotals
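The per-host search converts kb to MB, sums it per host per hour, and addtotals then adds a Total column across hosts for each hour. The Python sketch below mirrors that arithmetic on a few made-up records; the host names and kb values are hypothetical and only show how a single heavy host can dominate an hour's total.

```python
from collections import defaultdict

# Hypothetical per_host_thruput samples: (hour bucket, series = host name, kb)
records = [
    ("09:00", "web01", 512_000),
    ("09:00", "web01", 256_000),
    ("09:00", "db01", 102_400),
    ("10:00", "web01", 51_200),
    ("10:00", "db01", 20_480),
]


def mb_by_host_per_hour(rows):
    """Sum kb per (hour, host) converted to MB, like: eval mb=(kb/1024) | timechart sum(mb) by series."""
    totals = defaultdict(float)
    for hour, series, kb in rows:
        totals[(hour, series)] += kb / 1024
    return dict(totals)


def hourly_totals(per_host):
    """Add up all hosts for each hour, similar in spirit to addtotals."""
    hours = defaultdict(float)
    for (hour, _series), mb in per_host.items():
        hours[hour] += mb
    return dict(hours)


per_host = mb_by_host_per_hour(records)
print(per_host)
print(hourly_totals(per_host))
print("heaviest:", max(per_host, key=per_host.get))
```

In this made-up data, web01 accounts for 750 of the 850 MB indexed in the 09:00 hour, which is the kind of pattern the real search helps you spot.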

For further assistance and recommendations on how to increase performance, open a case with Splunk Support.


gkanapathy
Splunk Employee

Most of the time, if you are not reaching the target kbps indexed (i.e., three to six kb per second; 10 is possible, but not easy to achieve), it's either because of your disk performance or because you have poor index-time rules. To achieve the best indexing throughput, you should optimize:

  • Timestamp extraction: use explicit timestamp prefixes, formats, and lookaheads (TIME_PREFIX, TIME_FORMAT, and MAX_TIMESTAMP_LOOKAHEAD in props.conf) as much as possible
  • Line breaking rules: use LINE_BREAKER and avoid line merging (SHOULD_LINEMERGE = false) if possible; if you must merge lines, keep the merging rules simple
  • Index-time transforms: keep index-time transforms (for source, host, index, or other fields) as few and as simple as possible
  • Regular expressions: make sure your regular expressions are efficient under PCRE
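To make the LINE_BREAKER advice concrete: per the props.conf documentation, LINE_BREAKER is a regex whose first capturing group marks the boundary between events and is discarded, so with line merging turned off, multi-line events can be split in one pass. The Python sketch below only emulates that documented behavior with Python's re module; it is not Splunk's implementation, and the pattern (newlines followed by an ISO-style date) is a hypothetical example for a log whose events start with a timestamp.

```python
import re

# Hypothetical LINE_BREAKER-style pattern: a run of newlines followed by a date such as "2024-01-01 ".
# Only the first capture group (the newlines) is treated as the event boundary and dropped;
# the date that follows stays at the start of the next event.
BREAKER = re.compile(r"([\r\n]+)\d{4}-\d{2}-\d{2} ")


def break_events(stream: str) -> list[str]:
    """Split raw text into events at each match of BREAKER's first capture group."""
    events, start = [], 0
    for m in BREAKER.finditer(stream):
        events.append(stream[start:m.start(1)])  # previous event ends where the newlines begin
        start = m.end(1)                          # next event begins right after the newlines
    events.append(stream[start:])
    return [e for e in events if e]


raw = (
    "2024-01-01 10:00:00 ERROR boom\n"
    "  at frame1\n"
    "  at frame2\n"
    "2024-01-01 10:00:01 INFO ok\n"
)
for event in break_events(raw):
    print(repr(event))
```

The two indented stack-trace lines stay with the ERROR event because they do not start with a date, so no separate merge step is needed to reassemble it.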

sonicZ
Contributor

Hey Gerald, I know this is a really old question, but did you mean a target indexed value of 3-6 "kbps" or a target of 3-6 "mbps"?
