Getting Data In

Backed Up Indexers

vadud3
Path Finder

According to my Deployment Monitor app, one of my indexers shows as backed up. I need help finding out whether this is due to a slow disk or an overly complex regex.

I am providing the following logs as evidence of the issue.

indexer's splunkd.log

11-16-2010 17:23:14.625 INFO  TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
11-16-2010 17:23:39.674 INFO  TailingProcessor -   ...continuing.
11-16-2010 17:23:43.608 WARN  DateParserVerbose - Accepted time (Fri Oct 22 05:29:30 2010) is suspiciously far away from the previous event's time (Tue Nov 16 17:11:52 2010), but still accepted because it was extracted by the same pattern.  Context="source::/var/genesys/GVHC/GVHC_Stat_Server2.log.20101116_163632_193.log|host::tuk1cc-g2|genesys_statserver_log|remoteport::37189"
                1041 similar messages suppressed.  First occurred at: Tue Nov 16 17:18:38 2010
11-16-2010 17:23:43.608 WARN  DateParserVerbose - Failed to parse timestamp for event.  Context="source::/var/genesys/QwestHSI/QwestHSI_UR_Server2.log.20101116_045930_983.log|host::cer1cc-g2|genesys_urserver_log|remoteport::47624" Text="   AttributeCustomerID 'QwestHSI'..."
                25777 similar messages suppressed.  First occurred at: Tue Nov 16 17:18:38 2010
11-16-2010 17:23:43.608 WARN  DateParserVerbose - Failed to parse timestamp for event.  Context="source::/var/genesys/QwestHSI/QwestHSI_UR_Server2.log.20101116_045930_983.log|host::cer1cc-g2|genesys_urserver_log|remoteport::47624" Text="   AttributeANI    '606837491'..."
11-16-2010 17:23:43.608 WARN  DateParserVerbose - Failed to parse timestamp for event.  Context="source::/var/genesys/QwestHSI/QwestHSI_UR_Server2.log.20101116_045930_983.log|host::cer1cc-g2|genesys_urserver_log|remoteport::47624" Text="   AttributeDNIS   '8665313546'..."

indexer's metrics.log

11-16-2010 17:26:41.086 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size=1000, filled_count=11, empty_count=7307, current_size=1000, largest_size=1000, smallest_size=1
11-16-2010 17:26:41.086 INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size=1000, filled_count=4, empty_count=15508, current_size=1000, largest_size=1000, smallest_size=1
11-16-2010 17:27:35.067 INFO  Metrics - group=queue, name=aggqueue, blocked=true, max_size=1000, filled_count=1, empty_count=0, current_size=1000, largest_size=1000, smallest_size=817
11-16-2010 17:27:35.067 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size=1000, filled_count=0, empty_count=0, current_size=1000, largest_size=0, smallest_size=1000
11-16-2010 17:27:35.067 INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size=1000, filled_count=0, empty_count=0, current_size=1000, largest_size=0, smallest_size=1000

forwarder's metrics.log

11-16-2010 17:27:16.324 INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size=1000, filled_count=4, empty_count=1132, current_size=1000, largest_size=1000, smallest_size=1
11-16-2010 17:27:16.324 INFO  Metrics - group=tcpout_connections, apa-splunk, blocked=true, current_entries_count=1000, queue_size=1000
11-16-2010 17:28:18.903 INFO  Metrics - group=queue, name=aggqueue, blocked=true, max_size=1000, filled_count=0, empty_count=0, current_size=1000, largest_size=0, smallest_size=1000
11-16-2010 17:28:18.903 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size=1000, filled_count=0, empty_count=0, current_size=1000, largest_size=0, smallest_size=1000
11-16-2010 17:28:18.904 INFO  Metrics - group=queue, name=parsingqueue, blocked=true, max_size=1000, filled_count=0, empty_count=0, current_size=1000, largest_size=0, smallest_size=1000
11-16-2010 17:28:18.904 INFO  Metrics - group=queue, name=tcpout_apa-splunk, blocked=true, max_size=1000, filled_count=0, empty_count=0, current_size=1000, largest_size=0, smallest_size=1000
11-16-2010 17:28:18.904 INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size=1000, filled_count=0, empty_count=0, current_size=1000, largest_size=0, smallest_size=1000
11-16-2010 17:28:18.904 INFO  Metrics - group=tcpout_connections, apa-splunk, blocked=true, current_entries_count=1000, queue_size=1000

Here is a snapshot of the Deployment Monitor status:

http://picpaste.com/pics/indexer-ObvwPnn9.1289928881.png

Here is a bit more detail on the queues on the indexer:

http://picpaste.com/pics/splunk-1vsSa4RV.1289929280.png


gkanapathy
Splunk Employee

A queue is blocked if the queues downstream from it are blocked. The furthest-downstream queue that shows as blocked here is the indexqueue on the indexer. Below that there are only two things: the thruput throttle (set in limits.conf) and the OS/disk itself. (Neither of these is really a queue, and neither is recorded in metrics.log.) The thruput throttle might have been set accidentally in limits.conf ([thruput] maxKBps), though this is unlikely. If that's not the problem, then it points to slowness in writing out to disk.
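For reference, the throttle lives in a [thruput] stanza of limits.conf; a minimal sketch of what to look for (the value shown is illustrative only):

# $SPLUNK_HOME/etc/system/local/limits.conf
[thruput]
# maxKBps caps how fast the indexing pipeline processes data.
# 0 means unlimited; a small non-zero value here would throttle
# indexing and could back up every queue above it.
maxKBps = 0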

vadud3
Path Finder

I see no outputs.conf file in etc/system/local.


vadud3
Path Finder

Yes, very late. With auto load balancing, 97% of the data gets indexed by the other indexer and only 3% by this one. Both indexers are on the same subnet. Let me check the outputs.conf.


gkanapathy
Splunk Employee

Also, is it permanently stuck, or just slow? That is, do events eventually make it through and get indexed, just late?


gkanapathy
Splunk Employee

Actually, there is one other possibility: your indexer could be configured to index and forward as well. That would be another accidental config, specifying an output group in outputs.conf.
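A sketch of what such an accidental config might look like (the group name and server address below are made up for illustration):

# $SPLUNK_HOME/etc/system/local/outputs.conf  (hypothetical example)
[tcpout]
# Defining a default output group makes this indexer forward data onward,
# which can block its own queues if the downstream target is slow or unreachable.
defaultGroup = some_other_group

[tcpout:some_other_group]
server = other-indexer.example.com:9997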


vadud3
Path Finder

I see maxKBps = 0 for the indexers, so I guess it boils down to a slow disk on the slow indexer.


Archana
Splunk Employee

An indexer is "backed up" if its parsingQueue is over 50% full most of the time. Based on your queue stats, that appears to be the case (parsingQueue size is frequently above 500 and often at 1000).

It's very likely that one of the regexes used to parse your events is too complex or inefficient.
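One way to confirm this from the data itself (a sketch, assuming metrics.log is indexed into _internal as usual; the host value is a placeholder for your indexer's name) is a search like:

index=_internal source=*metrics.log host=<your_indexer> group=queue name=parsingqueue
| eval fill_pct = 100 * current_size / max_size
| timechart span=1m avg(fill_pct) AS parsingqueue_fill_pct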

vadud3
Path Finder

Well, actually, since I have two indexers built the same way, the regex should not be the issue. It might just be slow disks on one of the servers.
