Production Splunk stopped showing search results

Explorer

Hi team,

We have been facing an issue with Splunk for the past 6 hours. Our Splunk suddenly stopped showing results on all dashboards, and the Splunk node (a CentOS box) is not responding properly either. I am attaching the entire splunkd.log.
I am seeing a lot of WARN messages in splunkd.log. Any input on this? It's a production cluster.
Quick help would be appreciated.
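With this many WARN messages, a tally by component helps show which part of the pipeline is complaining. Below is a minimal Python sketch, assuming the splunkd.log line layout visible in this thread (date, time, UTC offset, level, component, then ` - ` and the message). `count_by_component` is a hypothetical helper, not part of Splunk.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumed splunkd.log layout: "MM-DD-YYYY HH:MM:SS.mmm +/-HHMM LEVEL Component - message"
LINE_RE = re.compile(
    r"^\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+) - "
)


def count_by_component(lines: Iterable[str], level: str = "WARN") -> Counter:
    """Count lines logged at `level`, keyed by the component that emitted them."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == level:
            counts[match.group("component")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_by_component.py $SPLUNK_HOME/var/log/splunk/splunkd.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for component, count in count_by_component(log_file).most_common(10):
            print(f"{count:8d}  {component}")
```

TailReader retry messages are logged at INFO, not WARN, so pass `level="INFO"` to include them in the tally.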

Re: Production Splunk stopped showing search results

Ultra Champion

Where is this splunkd.log from? It seems to be from a forwarder.

We can see many messages such as:

12-09-2018 01:07:08.755 -0500 INFO  TailReader - Could not send data to output queue (parsingQueue), retrying...
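In the longer excerpts later in this thread, a TailReader `Could not send data to output queue (parsingQueue), retrying...` line is later followed by a TailReader `...continuing.` line, so the gap between the two gives a rough measure of how long file tailing was stalled. A minimal Python sketch, assuming that pairing and the timestamp format shown above; `tailreader_stalls` is a hypothetical name.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Timestamp prefix as it appears in splunkd.log, e.g. "12-09-2018 02:03:01.494 -0500"
TS_RE = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})")
RETRY_TEXT = "Could not send data to output queue"
RESUME_TEXT = "...continuing."


def tailreader_stalls(lines: Iterable[str]) -> List[Tuple[datetime, float]]:
    """Return (stall start, stall length in seconds) for each retry/continue pair.

    A retry with no later "...continuing." line (still stalled at the end of
    the input) is not reported.
    """
    stalls: List[Tuple[datetime, float]] = []
    started: Optional[datetime] = None
    for line in lines:
        if "TailReader" not in line:
            continue
        match = TS_RE.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%m-%d-%Y %H:%M:%S.%f %z")
        if RETRY_TEXT in line and started is None:
            started = ts  # first retry of a new stall
        elif RESUME_TEXT in line and started is not None:
            stalls.append((started, (ts - started).total_seconds()))
            started = None
    return stalls
```

Stalls that keep getting longer over the day would point to a persistent bottleneck downstream of the parsing queue rather than a brief spike.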

Re: Production Splunk stopped showing search results

Contributor

@prathapkcsc

From the logs I can deduce the following:

  1. The queues are blocked and full, so no data can be indexed or searched.
  2. I would check whether a resource-intensive search is running somewhere and impacting read/write operations.
  3. You could do a quick restart of the box, but that would only interrupt the search for a limited time; if the search runs continuously, the issue will start again.
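To confirm which queues are blocked and how full they get, the queue metrics in `$SPLUNK_HOME/var/log/splunk/metrics.log` can be summarised. This sketch assumes queue lines of the form `group=queue, name=parsingqueue, blocked=true, max_size_kb=512, current_size_kb=511, ...`; treat those field names as an assumption to check against your own metrics.log. `queue_report` is a hypothetical helper.

```python
import re
from typing import Dict, Iterable

# key=value pairs as written in metrics.log (assumed format)
KV_RE = re.compile(r"(\w+)=([^,\s]+)")


def queue_report(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Per queue: number of samples, how many were flagged blocked, peak fill %."""
    report: Dict[str, Dict[str, float]] = {}
    for line in lines:
        if "group=queue" not in line:
            continue
        fields = dict(KV_RE.findall(line))
        name = fields.get("name")
        if name is None:
            continue
        entry = report.setdefault(name, {"samples": 0, "blocked": 0, "max_fill_pct": 0.0})
        entry["samples"] += 1
        if fields.get("blocked") == "true":
            entry["blocked"] += 1
        try:
            fill = 100.0 * float(fields["current_size_kb"]) / float(fields["max_size_kb"])
        except (KeyError, ValueError, ZeroDivisionError):
            continue  # size fields missing or unusable on this line
        entry["max_fill_pct"] = max(entry["max_fill_pct"], fill)
    return report
```

Queues fill from the last one backwards, so the blocked queue furthest down the pipeline is usually the best place to start looking.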

Re: Production Splunk stopped showing search results

Contributor

Also, let us know where this log came from; without that it's hard to pinpoint the problem, and we can only give a broad perspective.

Re: Production Splunk stopped showing search results

Explorer

The above log is from the Splunk master. Yesterday I restarted the Splunk master, but after 1 hour everything went bad again, as described above.

Re: Production Splunk stopped showing search results

Explorer

I am also seeing the following messages repeated continuously (look at the data source in each line):
AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded - data_source="/var/log/hive/metrics-hivemetastore/metrics.log", data_host="sfiappnwh026.statefarm-dss.com", data_sourcetype="metrics"
12-09-2018 02:01:52.337 -0500 WARN DateParserVerbose - A possible timestamp match (Tue Dec 24 06:51:04 2019) is outside of the acceptable time window. If this timestamp is correct, consider adjusting MAX_DAYS_AGO and MAX_DAYS_HENCE. Context: source::/var/log/hive/metrics-hivemetastore/metrics.log|host::sfiappnwh026.statefarm-dss.com|metrics|83303
12-09-2018 02:01:52.338 -0500 WARN AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded - data_source="/var/log/hive/metrics-hivemetastore/metrics.log", data_host="sfiappnwh026.statefarm-dss.com", data_sourcetype="metrics"
12-09-2018 02:02:24.493 -0500 INFO TailReader - ...continuing.
12-09-2018 02:02:24.503 -0500 WARN DateParserVerbose - Failed to parse timestamp. Defaulting to timestamp of previous event (Sun Dec 9 01:45:01 2018). Context: source::/var/log/rabbitmq
queuesizecheck.out|host::sfisvlnwh007.statefarm-dss.com|breakabletext|233395
12-09-2018 02:03:01.494 -0500 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
12-09-2018 02:03:27.099 -0500 WARN PeriodicReapingTimeout - Spent 85084ms updating search-related banner messages
12-09-2018 02:03:27.105 -0500 INFO PipelineComponent - MetricsManager:probeandreport() took longer than seems reasonable (96007 milliseconds) in callbackRunnerThread. Might indicate hardware or splunk limitations.
12-09-2018 02:03:28.717 -0500 WARN AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded - data_source="/var/log/hive/metrics-hivemetastore/metrics.log", data_host="sfiappnwh026.statefarm-dss.com", data_sourcetype="metrics"
12-09-2018 02:03:28.717 -0500 WARN AggregatorMiningProcessor - Changing breaking behavior for event stream because MAX_EVENTS (256) was exceeded without a single event break. Will set BREAK_ONLY_BEFORE_DATE to False, and unset any MUST_NOT_BREAK_BEFORE or MUST_NOT_BREAK_AFTER rules. Typically this will amount to treating this data as single-line only. - data_source="/var/log/hive/metrics-hivemetastore/metrics.log", data_host="sfiappnwh026.statefarm-dss.com", data_sourcetype="metrics"
12-09-2018 02:03:28.792 -0500 INFO TailReader - ...continuing.
12-09-2018 02:03:28.872 -0500 WARN AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded - data_source="/var/log/hive/metrics-hivemetastore/metrics.log", data_host="sfiappnwh026.statefarm-dss.com", data_sourcetype="metrics"
12-09-2018 02:03:28.872 -0500 WARN AggregatorMiningProcessor - Changing breaking behavior for event stream because MAX_EVENTS (256) was exceeded without a single event break. Will set BREAK_ONLY_BEFORE_DATE to False, and unset any MUST_NOT_BREAK_BEFORE or MUST_NOT_BREAK_AFTER rules. Typically this will amount to treating this data as single-line only. - data_source="/var/log/hive/metrics-hivemetastore/metrics.log", data_host="sfiappnwh026.statefarm-dss.com", data_sourcetype="metrics"
12-09-2018 02:03:33.795 -0500 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
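Two mechanisms show up in these messages: a timestamp (Dec 24 2019) rejected because it lies more than the allowed number of days in the future relative to Dec 9 2018, and line merging that hits the 256-line cap because the metrics.log data never produces an event break. The Python sketch below is a simplified model of both, not Splunk's actual implementation; it assumes the props.conf defaults MAX_DAYS_AGO = 2000 and MAX_DAYS_HENCE = 2, and the "line starts with a date" test is a hypothetical stand-in for timestamp-based breaking. It does not model the fallback to single-line mode that Splunk reports after the cap is exceeded.

```python
import re
from datetime import datetime, timedelta
from typing import List

# Assumed props.conf defaults; check the effective values with btool
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_EVENTS = 256  # line cap reported by AggregatorMiningProcessor above

# Hypothetical "line starts with a date" test, standing in for timestamp-based breaking
DATE_START_RE = re.compile(r"^\d{2}-\d{2}-\d{4} ")


def in_time_window(candidate: datetime, now: datetime,
                   days_ago: int = MAX_DAYS_AGO, days_hence: int = MAX_DAYS_HENCE) -> bool:
    """Simplified acceptance check: candidate must lie within [now - ago, now + hence]."""
    return now - timedelta(days=days_ago) <= candidate <= now + timedelta(days=days_hence)


def merge_lines(lines: List[str], max_events: int = MAX_EVENTS) -> List[List[str]]:
    """Group lines into events, breaking before a dated line or once max_events lines pile up."""
    events: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if current and (DATE_START_RE.match(line) or len(current) >= max_events):
            events.append(current)  # natural break (dated line) or forced break (cap reached)
            current = []
        current.append(line)
    if current:
        events.append(current)
    return events
```

For example, 600 undated lines come out as events of 256, 256 and 88 lines, which is why a line-breaking or timestamp setting that actually matches this source is worth checking before raising any limit.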

Re: Production Splunk stopped showing search results

Contributor

Okay, if these are from the cluster master, can you share the log files from one of the indexers? If the master is unable to forward its data to the indexers (IDX), we need to determine whether it's an indexer issue.

Re: Production Splunk stopped showing search results

Explorer

Hi,
Below is the Splunk universal forwarder log. 10.61.1.81 is the master node. I suspect 10.61.1.81 is also acting as the indexer.

12-10-2018 04:16:20.474 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.61.1.78_8089_sfiappnwh027.statefarm-dss.com_sfiappnwh027.statefarm-dss.com_E56FB3B9-46F9-4430-8F91-AA8496ED0C2A
12-10-2018 04:17:20.475 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.61.1.78_8089_sfiappnwh027.statefarm-dss.com_sfiappnwh027.statefarm-dss.com_E56FB3B9-46F9-4430-8F91-AA8496ED0C2A
12-10-2018 04:17:57.809 +0000 WARN TcpOutputProc - Forwarding to indexer group default-autolb-group blocked for 200 seconds.
12-10-2018 04:18:20.478 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.61.1.78_8089_sfiappnwh027.statefarm-dss.com_sfiappnwh027.statefarm-dss.com_E56FB3B9-46F9-4430-8F91-AA8496ED0C2A
12-10-2018 04:19:20.482 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.61.1.78_8089_sfiappnwh027.statefarm-dss.com_sfiappnwh027.statefarm-dss.com_E56FB3B9-46F9-4430-8F91-AA8496ED0C2A
12-10-2018 04:19:37.826 +0000 WARN TcpOutputProc - Forwarding to indexer group default-autolb-group blocked for 300 seconds.
12-10-2018 04:20:20.485 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.61.1.78_8089_sfiappnwh027.statefarm-dss.com_sfiappnwh027.statefarm-dss.com_E56FB3B9-46F9-4430-8F91-AA8496ED0C2A
12-10-2018 04:20:27.834 +0000 WARN TcpOutputProc - Raw connection to ip=10.61.1.81:9997 timed out
12-10-2018 04:20:27.834 +0000 INFO TcpOutputProc - Ping connection to idx=10.61.1.81:9997 timed out. continuing connections
12-10-2018 04:21:17.842 +0000 WARN TcpOutputProc - Forwarding to indexer group default-autolb-group blocked for 400 seconds.
12-10-2018 04:21:20.489 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.61.1.78_8089_sfiappnwh027.statefarm-dss.com_sfiappnwh027.statefarm-dss.com_E56FB3B9-46F9-4430-8F91-AA8496ED0C2A
12-10-2018 04:22:20.492 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.61.1.78_8089_sfiappnwh027.statefarm-dss.com_sfiappnwh027.statefarm-dss.com_E56FB3B9-46F9-4430-8F91-AA8496ED0C2A
12-10-2018 04:22:57.859 +0000 WARN TcpOutputProc - Forwarding to indexer group default-autolb-group blocked for 500 seconds.
12-10-2018 04:23:20.495 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.61.1.78_8089_sfiappnwh027.statefarm-dss.com_sfiappnwh027.statefarm-dss.com_E56FB3B9-46F9-4430-8F91-AA8496ED0C2A
12-10-2018 04:24:20.498 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.61.1.78_8089_sfiappnwh027.statefarm-dss.com_sfiappnwh027.statefarm-dss.com_E56FB3B9-46F9-4430-8F91-AA8496ED0C2A
12-10-2018 04:24:37.876 +0000 WARN TcpOutputProc - Forwarding to indexer group default-autolb-group blocked for 600 seconds.
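The `Raw connection to ip=10.61.1.81:9997 timed out` line suggests the forwarder could not even complete a TCP connection to the receiving port. A plain reachability test run from the forwarder host can help separate a network or listener problem from an indexer that accepts connections but is not draining its queues. Minimal Python sketch; `can_connect` is a hypothetical helper, and the host and port are taken from the log above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout subclasses OSError)
        return False
```

Calling `can_connect("10.61.1.81", 9997)` on the forwarder host and getting `True` only shows that the port accepts connections; an indexer with full queues can still accept the connection and stop reading from it.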

Re: Production Splunk stopped showing search results

Splunk Employee

If you're using a heavy forwarder (HF), go to Settings > Monitoring Console > Indexing > Indexing Performance. In the Snapshot panel, check the status of your indexing queues.
If you're not using an HF, follow the same steps on your indexer(s) to see what is blocking.

Re: Production Splunk stopped showing search results

Contributor

Hello @prathapkcsc

From the UF's logs, data going to the HF/IDX is blocked, mainly because of the queues. You need to find out why the queues are blocked on the indexer side: possible causes are low disk space, an expensive search, or network connectivity problems.
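Low disk space is quick to rule out. Splunk stops indexing when free space on the volume holding its indexes drops below `minFreeSpace` in the `[diskUsage]` stanza of server.conf; the 5000 MB threshold below is an assumed default, so confirm it against your own configuration. Minimal Python sketch; `check_free_space` is a hypothetical helper.

```python
import shutil
from typing import Tuple

# Assumed default of server.conf [diskUsage] minFreeSpace, in megabytes
MIN_FREE_SPACE_MB = 5000


def check_free_space(path: str, min_free_mb: float = MIN_FREE_SPACE_MB) -> Tuple[float, bool]:
    """Return (free space in MB, True if below min_free_mb) for the filesystem holding path."""
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mb, free_mb < min_free_mb
```

On an indexer with a default Linux install, `check_free_space("/opt/splunk/var/lib/splunk")` would check the default index location; point it at wherever your indexes actually live.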
