Getting Data In

Indexer missing events from forwarder

Explorer

Scenario: 1x load balancer, 2x light forwarders, 1x indexer.

alt text

The goal is to make it possible to reboot a single Splunk instance and not lose events. Here is the output.conf on the forwarders:

[tcpout:x.x.x.xx]
autoLB = false
server = x.x.x.x
x
useACK = true
maxQueueSize = 400MB
sendCookedData = true

To test this scenario I produce a controlled number of events and then verify they are indexed. Restarting either forwarder while sending events does not incur loss. Though, if I restart the indexer I always lose a small number of events.

I have verified that the events are received by the forwarders.

It is my understanding that using maxQueueSize and autoLB together should eliminate lost events. (http://docs.splunk.com/Documentation/Splunk/4.3.3/Deploy/Protectagainstlossofin-flightdata)

Here is what the splunk.log looks like:

06-28-2012 10:17:12.818 -0900 WARN TcpOutputFd - Connect to x.x.x.x:x failed. Connection refused
06-28-2012 10:17:12.818 -0900 ERROR TcpOutputFd - Connection to host=x.x.x.x:x failed
06-28-2012 10:17:12.818 -0900 WARN TcpOutputFd - Connect to x.x.x.x:x failed. Connection refused
06-28-2012 10:17:12.818 -0900 ERROR TcpOutputFd - Connection to host=x.x.x.x:x failed
06-28-2012 10:17:12.818 -0900 WARN TcpOutputProc - Applying quarantine to idx=x.x.x.x:x numberOfFailures=4
06-28-2012 10:17:12.818 -0900 WARN TcpOutputProc - Applying quarantine to idx=x.x.x.x:x numberOfFailures=5
06-28-2012 10:18:42.847 -0900 INFO TcpOutputProc - Removing quarantine from idx=x.x.x.x:x
06-28-2012 10:18:42.850 -0900 INFO TcpOutputProc - Connected to idx=x.x.x.x:x using ACK.

In this particular test I sent 150 events, 144 were indexed

What am I missing?

EDIT:

The indexer log is shown below. The log begins as I issue the restart command. In this particular test 67 of 80 events were indexed.


06-28-2012 17:11:10.296 -0400 INFO IndexProcessor - handleSignal 2: Disabling streaming searches.
06-28-2012 17:11:10.296 -0400 INFO IndexProcessor - request state change RUN to SHUTDOWNSIGNALED
06-28-2012 17:11:10.296 -0400 WARN HTTPServer - Disabling all acceptors due to shutdown
06-28-2012 17:11:10.951 -0400 INFO IndexProcessor - active realtime streams have hit 0 during shutdown
06-28-2012 17:11:11.288 -0400 WARN HTTPServer - Shutdown triggered via callback.
06-28-2012 17:11:11.288 -0400 WARN HTTPServer - HTTPServer shutting down. Terminating eloop.
06-28-2012 17:11:11.288 -0400 WARN HTTPServer - Closing all acceptors due to shutdown
06-28-2012 17:11:11.288 -0400 INFO loader - Shutdown HTTPDispatchThread
06-28-2012 17:11:11.289 -0400 INFO ShutdownHandler - Shutting down splunkd
06-28-2012 17:11:11.289 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
Thruput"
06-28-2012 17:11:11.289 -0400 INFO ShutdownHandler - Component 'ThruputProcessor' completed shutdown immediately
06-28-2012 17:11:11.289 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelTcpInput"
06-28-2012 17:11:11.289 -0400 INFO TcpInputProc - Stopping IPv4 port x
06-28-2012 17:11:11.289 -0400 INFO TcpInputProc - Stopping IPv4 port x
06-28-2012 17:11:11.289 -0400 INFO TcpInputProc - Stopping IPv4 port x
06-28-2012 17:11:11.289 -0400 INFO TcpInputProc - Stopping IPv4 port x
06-28-2012 17:11:11.289 -0400 INFO TcpInputProc - Stopping IPv4 port x
06-28-2012 17:11:11.289 -0400 INFO TcpInputProc - Stopping IPv4 port x
06-28-2012 17:11:12.084 -0400 INFO TcpInputProc - Waiting for connection from src=x to close before shutting down TcpInputProcessor.
06-28-2012 17:11:12.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4587; will try again
06-28-2012 17:11:14.279 -0400 WARN ProcessTracker - Failed to start process, PGID 4588; will try again
06-28-2012 17:11:17.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4589; will try again
06-28-2012 17:11:21.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4590; will try again
06-28-2012 17:11:22.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4591; will try again
06-28-2012 17:11:24.283 -0400 WARN ProcessTracker - Failed to start process, PGID 4592; will try again
06-28-2012 17:11:26.174 -0400 ERROR DispatchCommand - Failed to start the search process.
06-28-2012 17:11:28.279 -0400 WARN ProcessTracker - Failed to start process, PGID 4593; will try again
06-28-2012 17:11:36.283 -0400 WARN ProcessTracker - Failed to start process, PGID 4594; will try again
06-28-2012 17:11:40.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4595; will try again
06-28-2012 17:11:41.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4596; will try again
06-28-2012 17:11:43.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4597; will try again
06-28-2012 17:11:45.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4598; will try again
06-28-2012 17:11:45.279 -0400 WARN ProcessTracker - Failed to start process, PGID 4599; will try again
06-28-2012 17:11:52.279 -0400 WARN ProcessTracker - Failed to start process, PGID 4600; will try again
06-28-2012 17:11:52.287 -0400 WARN ProcessTracker - Failed to start process, PGID 4601; will try again
06-28-2012 17:11:54.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4602; will try again
06-28-2012 17:11:57.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4603; will try again
06-28-2012 17:11:57.279 -0400 WARN ProcessTracker - Failed to start process, PGID 4604; will try again
06-28-2012 17:11:57.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4605; will try again
06-28-2012 17:11:57.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4606; will try again
06-28-2012 17:11:57.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4607; will try again
06-28-2012 17:12:00.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4608; will try again
06-28-2012 17:12:02.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4609; will try again
06-28-2012 17:12:02.279 -0400 WARN ProcessTracker - Failed to start process, PGID 4610; will try again
06-28-2012 17:12:04.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4611; will try again
06-28-2012 17:12:06.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4612; will try again
06-28-2012 17:12:06.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4613; will try again
06-28-2012 17:12:06.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4614; will try again
06-28-2012 17:12:06.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4615; will try again
06-28-2012 17:12:06.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4616; will try again
06-28-2012 17:12:06.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4617; will try again
06-28-2012 17:12:06.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4618; will try again
06-28-2012 17:12:06.282 -0400 WARN ProcessTracker - Failed to start process, PGID 4619; will try again
06-28-2012 17:12:06.282 -0400 WARN ProcessTracker - Failed to start process, PGID 4620; will try again
06-28-2012 17:12:06.282 -0400 WARN ProcessTracker - Failed to start process, PGID 4621; will try again
06-28-2012 17:12:06.282 -0400 WARN ProcessTracker - Failed to start process, PGID 4622; will try again
06-28-2012 17:12:06.303 -0400 WARN ProcessTracker - Failed to start process, PGID 4623; will try again
06-28-2012 17:12:11.279 -0400 WARN ProcessTracker - Failed to start process, PGID 4624; will try again
06-28-2012 17:12:18.283 -0400 WARN ProcessTracker - Failed to start process, PGID 4625; will try again
06-28-2012 17:12:18.284 -0400 WARN ProcessTracker - Failed to start process, PGID 4626; will try again
06-28-2012 17:12:20.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4627; will try again
06-28-2012 17:12:23.279 -0400 WARN ProcessTracker - Failed to start process, PGID 4628; will try again
06-28-2012 17:12:23.283 -0400 WARN ProcessTracker - Failed to start process, PGID 4629; will try again
06-28-2012 17:12:23.287 -0400 WARN ProcessTracker - Failed to start process, PGID 4630; will try again
06-28-2012 17:12:23.291 -0400 WARN ProcessTracker - Failed to start process, PGID 4631; will try again
06-28-2012 17:12:23.294 -0400 WARN ProcessTracker - Failed to start process, PGID 4632; will try again
06-28-2012 17:12:25.278 -0400 WARN timeinvertedIndex - splunk-optimize is already running for this index /opt/splunk/var/lib/splunk/
internaldb/db/hotv1677.
06-28-2012 17:12:25.278 -0400 WARN timeinvertedIndex - splunk-optimize is already running for this index /opt/splunk/var/lib/splunk/syslog/db/hotv1358.
06-28-2012 17:12:26.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4633; will try again
06-28-2012 17:12:26.443 -0400 ERROR DispatchCommand - Failed to start the search process.
06-28-2012 17:12:26.454 -0400 ERROR DispatchCommand - Failed to start the search process.
06-28-2012 17:12:27.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4634; will try again
06-28-2012 17:12:28.278 -0400 WARN timeinvertedIndex - splunk-optimize is already running for this index /opt/splunk/var/lib/splunk/perfmon/db/hotv112.
06-28-2012 17:12:32.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4635; will try again
06-28-2012 17:12:32.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4636; will try again
06-28-2012 17:12:32.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4637; will try again
06-28-2012 17:12:32.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4638; will try again
06-28-2012 17:12:32.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4639; will try again
06-28-2012 17:12:32.280 -0400 WARN ProcessTracker - Failed to start process, PGID 4640; will try again
06-28-2012 17:12:32.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4641; will try again
06-28-2012 17:12:32.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4642; will try again
06-28-2012 17:12:32.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4643; will try again
06-28-2012 17:12:32.281 -0400 WARN ProcessTracker - Failed to start process, PGID 4644; will try again
06-28-2012 17:12:32.281 -0400 WARN timeinvertedIndex - splunk-optimize is already running for this index /opt/splunk/var/lib/splunk/nagios/db/hotv127.
06-28-2012 17:12:37.278 -0400 WARN ProcessTracker - Failed to start process, PGID 4645; will try again
06-28-2012 17:12:41.288 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelTcpOutput"
06-28-2012 17:12:41.288 -0400 INFO ShutdownHandler - Component 'TcpOutputProcessor' completed shutdown immediately
06-28-2012 17:12:41.288 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
UdpInput"
06-28-2012 17:12:41.288 -0400 INFO ShutdownHandler - Component 'UDPInputProcessor' completed shutdown immediately
06-28-2012 17:12:41.288 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelFifoInput"
06-28-2012 17:12:41.288 -0400 INFO UDPInputProcessor - Unregistering metrics callback for: udpin
connections
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelWinEventLogInput"
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
Scheduler"
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - Component 'SavedSearchScheduler' completed shutdown immediately
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - Component 'DispatchReaper' completed shutdown immediately
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelTailing"
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
SyslogOutput"
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - Component 'SyslogOutputProcessor' completed shutdown immediately
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelHTTPOutput"
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - Component 'HTTPOutputProcessor' completed shutdown immediately
06-28-2012 17:12:41.289 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
TailingXP"
06-28-2012 17:12:41.289 -0400 INFO TailingProcessor - Received shutdown signal.
06-28-2012 17:12:41.289 -0400 INFO TailingProcessor - Will reconfigure input.
06-28-2012 17:12:41.289 -0400 INFO TailingProcessor - Calling addFromAnywhere in TailWatcher=0x7fb26886ab40.
06-28-2012 17:12:41.289 -0400 INFO TailingProcessor - Shutting down with TailingShutdownActor=0x878fef0 and TailWatcher=0x7fb26886ab40.
06-28-2012 17:12:41.289 -0400 INFO TailingProcessor - Pausing BatchReader module...
06-28-2012 17:12:41.289 -0400 INFO BatchReader - State transitioning from 0 to 1 (pseudoPause).
06-28-2012 17:12:41.289 -0400 INFO BatchReader - State transitioning from 1 to 2 (pseudoPause).
06-28-2012 17:12:41.289 -0400 INFO TailingProcessor - Removing TailWatcher from eventloop...
06-28-2012 17:12:41.289 -0400 INFO TailingProcessor - ...removed.
06-28-2012 17:12:41.289 -0400 INFO TailingProcessor - Eventloop terminated successfully.
06-28-2012 17:12:41.292 -0400 INFO TailingProcessor - Signalling shutdown complete.
06-28-2012 17:12:41.292 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelBatchReader"
06-28-2012 17:12:41.292 -0400 INFO BatchReader - State transitioning from 2 to 3 (signalShutdown).
06-28-2012 17:12:41.292 -0400 INFO BatchReader - Terminating threadpool.
06-28-2012 17:12:41.293 -0400 INFO ShutdownHandler - Component 'BatchReader' completed shutdown immediately
06-28-2012 17:12:41.293 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
Archive"
06-28-2012 17:12:41.943 -0400 WARN pipeline - Exiting pipeline udp gracefully: got eExit from processor udp
06-28-2012 17:12:42.067 -0400 WARN pipeline - Exiting pipeline fifo gracefully: got eExit from processor fifo
06-28-2012 17:12:42.085 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelAuditTrailManager"
06-28-2012 17:12:42.085 -0400 INFO ShutdownHandler - Component 'AuditTrailManager' completed shutdown immediately
06-28-2012 17:12:42.085 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
AuditTrailQueueServiceThread"
06-28-2012 17:12:42.085 -0400 INFO ShutdownHandler - Component 'QueueServiceThread' completed shutdown immediately
06-28-2012 17:12:42.085 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelFSChangeMonitor"
06-28-2012 17:12:42.085 -0400 WARN pipeline - Exiting pipeline archivePipe gracefully: got eExit from processor archiveProcessor
06-28-2012 17:12:42.106 -0400 INFO TcpInputProc - Closing splunk 2 splunk IPv4 port x
06-28-2012 17:12:42.106 -0400 INFO TcpInputProc - Closing splunk 2 splunk IPv4 port x
06-28-2012 17:12:42.106 -0400 INFO TcpInputProc - Closing splunk 2 splunk IPv4 port x
06-28-2012 17:12:42.106 -0400 INFO TcpInputProc - Closing splunk 2 splunk IPv4 port x
06-28-2012 17:12:42.106 -0400 INFO TcpInputProc - Closing splunk 2 splunk IPv4 port x
06-28-2012 17:12:42.106 -0400 INFO TcpInputProc - Closing splunk 2 splunk IPv4 port x
06-28-2012 17:12:42.106 -0400 ERROR TcpInputProc - Error encountered for connection from src=10.1.120.48:38420. Local side shutting down
06-28-2012 17:12:42.713 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
FSChangeManagerProcessor"
06-28-2012 17:12:42.714 -0400 INFO ShutdownHandler - Component 'FSChangeManagerProcessor' completed shutdown immediately
06-28-2012 17:12:42.714 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelSelect"
06-28-2012 17:12:42.714 -0400 INFO ShutdownHandler - Component 'PipeFlusher' completed shutdown immediately
06-28-2012 17:12:42.714 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
Database1"
06-28-2012 17:12:42.714 -0400 INFO IndexProcessor - request state change SHUTDOWNSIGNALED to SHUTDOWNINPROGRESS
06-28-2012 17:12:42.714 -0400 INFO PipeFlusher - Flushing pipelines...
06-28-2012 17:12:42.714 -0400 INFO PipeFlusher - Finished triggering pipeline flush.
06-28-2012 17:12:42.714 -0400 INFO IndexProcessor - shutting down: start
06-28-2012 17:12:42.714 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: _audit
06-28-2012 17:12:42.716 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: _internal
06-28-2012 17:12:42.721 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: _thefishbucket
06-28-2012 17:12:42.721 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: blackberry
06-28-2012 17:12:42.722 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: email
06-28-2012 17:12:42.722 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: history
06-28-2012 17:12:42.722 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: main
06-28-2012 17:12:42.729 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: modsecurity
06-28-2012 17:12:42.729 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: msexchange
06-28-2012 17:12:42.778 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: nagios
06-28-2012 17:12:42.781 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: os
06-28-2012 17:12:42.782 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: perfmon
06-28-2012 17:12:42.785 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: rss
06-28-2012 17:12:42.786 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: summary
06-28-2012 17:12:42.786 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: summary
forwarders
06-28-2012 17:12:42.788 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: summaryhosts
06-28-2012 17:12:42.789 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: summary
indexers
06-28-2012 17:12:42.790 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: summarypools
06-28-2012 17:12:42.790 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: summary
sources
06-28-2012 17:12:42.793 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: summarysourcetypes
06-28-2012 17:12:42.796 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: syslog
06-28-2012 17:12:42.799 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: test
06-28-2012 17:12:42.799 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: windows
06-28-2012 17:12:42.799 -0400 INFO databasePartitionPolicy - Handling shutdown or signal in DatabasePartitionPolicy: _blocksignature
06-28-2012 17:12:42.805 -0400 INFO BucketMover - 0 jobs canceled from BucketMoverExecutor
06-28-2012 17:12:42.805 -0400 INFO IndexProcessor - request state change SHUTDOWN
INPROGRESS to SHUTDOWNCOMPLETE
06-28-2012 17:12:42.805 -0400 INFO IndexProcessor - shutting down: end
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelLoadLDAPUsers"
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'LoadLDAPUsersThread' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
MetricsManager"
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'MetricsManager' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevelPipeline"
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'Pipeline:dev-null' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'Pipeline:indexerPipe' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'Pipeline:exec' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'Pipeline:fschangemanager' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'Pipeline:tcp' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'Pipeline:merging' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'Pipeline:typing' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'Pipeline:stashparsing' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'Pipeline:parsing' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel
Queue"
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'parsingQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'parsingQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'parsingQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'parsingQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'parsingQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'indexQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'parsingQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'aggQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'typingQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - Component 'indexQueue' completed shutdown immediately
06-28-2012 17:12:42.805 -0400 INFO ShutdownHandler - shutting down level "ShutdownLevel_Exec"
06-28-2012 17:12:42.805 -0400 WARN pipeline - Exiting pipeline indexerPipe gracefully: got eExit from processor indexer
06-28-2012 17:12:42.807 -0400 INFO ShutdownHandler - Shutdown complete in 91.52 seconds
06-28-2012 17:12:42.831 -0400 WARN pipeline - Exiting pipeline exec gracefully: got eExit from processor exec
06-28-2012 17:12:43.075 -0400 INFO TcpInputProc - Finishing shutdown. Timed out in 90 seconds
06-28-2012 17:12:43.075 -0400 WARN pipeline - Exiting pipeline tcp gracefully: got eExit from processor tcp
06-28-2012 17:12:43.526 -0400 WARN pipeline - Exiting pipeline stashparsing forcefully
06-28-2012 17:12:43.581 -0400 WARN pipeline - Exiting pipeline dev-null forcefully
06-28-2012 17:12:43.713 -0400 WARN pipeline - Exiting pipeline fschangemanager gracefully: got eExit from processor fschangemanager
06-28-2012 17:12:43.714 -0400 WARN pipeline - Exiting pipeline parsing forcefully
06-28-2012 17:12:43.714 -0400 WARN pipeline - Exiting pipeline merging forcefully
06-28-2012 17:12:43.714 -0400 WARN pipeline - Exiting pipeline typing forcefully
06-28-2012 17:12:43.714 -0400 WARN loader - All pipelines finished.

06-28-2012 17:12:52.291 -0400 INFO loader - Splunkd starting (build 123586).

EDIT # 2

Added the following to the inputs.conf:

queueSize=50KB
persistentQueueSize=400MB

Doing this resulted in duplication: 100 events triggered, 169 indexed

Screenshot:

alt text

1 Solution

Splunk Employee
Splunk Employee

I believe that if you just use "useACK =true" in your forwarder's outputs.conf (no need to touch persistentQueueSize in your forwarder's inputs.conf), you would not suffer from any events lost even when restarting your indexers.

When using that "useACK=true", your indexer get some duplicates events due to the momentary discontinuity between your forwarder and indexer, but this is far better than losing some events. And to deal with those duplicates, you could just use dedup to get the unique events out of it. This should just solve it.

useACK=true guarantees full delivery, try that to see how it goes

View solution in original post

Explorer

I got an answer from Splunk support. It turns out that duplication is expected behavior.

Chad,

Unfortunately, there is no solution to
the duplicated events. With the
"useACK" functionality, there's times
where the forwarder doesn't get the
ack message after the indexer has
written the event to the log. This is
expected behavior.

Brian

Here is the results of Splunk support's test:

24460 Messages Sent
24444 Messages received by the forwarder (tcpdump)
25276 Messages received by the indexer
832 Duplicated messages

Splunk Employee
Splunk Employee

I believe that if you just use "useACK =true" in your forwarder's outputs.conf (no need to touch persistentQueueSize in your forwarder's inputs.conf), you would not suffer from any events lost even when restarting your indexers.

When using that "useACK=true", your indexer get some duplicates events due to the momentary discontinuity between your forwarder and indexer, but this is far better than losing some events. And to deal with those duplicates, you could just use dedup to get the unique events out of it. This should just solve it.

useACK=true guarantees full delivery, try that to see how it goes

View solution in original post

Communicator
0 Karma

Explorer

I get a weird duplication issue when I try that. Triggered 100 events, see 169 on the indexer. (see screenshot above)

0 Karma

Explorer

@Ayn - No, there are no TCP errors during normal operation. I have not seen events lost during normal operation - only during restarts.

0 Karma

Legend

Umm, the portions of splunkd.log I was referring to was rather what happens BEFORE you issue a restart command - that is, in what is supposed to be normal operations, do you see errors on the indexer side with regards to TCP connections?

0 Karma

Legend

Do you see corresponding events on the indexer that might give you clues about why it is refusing connections?

0 Karma