How To Investigate Hiccups (With Evidence)

morethanyell · ‎03-07-2019

Hello everyone!

I've tried looking at the _internal splunkd logs but couldn't make sense of them. My boss is asking why there was suddenly an abnormal gap in the events we're ingesting during a particular period. Ingestion is back to normal now, but we're trying to figure out what happened, and I couldn't work it out.

Thanks in advance for any input you can provide.

nickhills · ‎03-08-2019

Assuming your _internal logs go back far enough...

Take a look at metrics.log - it shows the rate and volume of data processed by both your indexers and your forwarders.
You should be able to chart this data to find the gaps, and from that work out the time windows when your logs dried up.
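
As a rough illustration of the "find the gaps" step: suppose you chart throughput with a search along the lines of `index=_internal source=*metrics.log* group=per_index_thruput | timechart span=5m sum(kb)` and export the result. The sketch below then picks out the windows where volume dropped to nothing. The function name `find_gaps`, the 5-minute span and the sample values are all made up for the example, and any empty value in the export is assumed to have been loaded as 0.

```python
from datetime import datetime, timedelta


def find_gaps(buckets, min_kb=0.0):
    """Return (start, end) windows where volume stayed at or below min_kb.

    buckets: list of (bucket_start, kb) tuples sorted by time, one per
    chart span. `end` is the start of the first bucket with data again.
    """
    gaps = []
    gap_start = None
    last_time = None
    for bucket_time, kb in buckets:
        if kb <= min_kb:
            if gap_start is None:
                gap_start = bucket_time  # first quiet bucket opens a gap
        elif gap_start is not None:
            gaps.append((gap_start, bucket_time))  # data is back, close it
            gap_start = None
        last_time = bucket_time
    if gap_start is not None:
        gaps.append((gap_start, last_time))  # still quiet at the last bucket
    return gaps


# Hypothetical export: six 5-minute buckets, three of them empty.
start = datetime(2019, 3, 7, 10, 0)
values = [120.5, 98.0, 0.0, 0.0, 0.0, 110.2]
buckets = [(start + timedelta(minutes=5 * i), kb) for i, kb in enumerate(values)]

gaps = find_gaps(buckets)
for gap_start, gap_end in gaps:
    print(f"no data from {gap_start} until {gap_end}")
```

Raising `min_kb` above zero lets you catch periods where data slowed to a trickle rather than stopping completely.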

Once you have these times, head over to the internal logs and look for messages about blocked queues, or major indexing issues...
However, if the problem was elsewhere (like your network) you're probably going to have very little to go on, but an absence of errors may help you identify that the issue was not with Splunk.
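
For the blocked-queue check, a search such as `index=_internal source=*metrics.log* group=queue blocked=true` restricted to the gap window is usually the quickest route. If you only have the raw metrics.log file to hand, something like the following sketch can tally blocked-queue lines per queue name. It assumes the usual comma-separated `key=value` layout of metrics.log lines; the helper name `blocked_queue_counts` is hypothetical and the sample lines are invented purely to show the shape of the input, not taken from a real system.

```python
import re
from collections import Counter

# Matches key=value pairs; values end at a comma or whitespace.
KEY_VALUE = re.compile(r"(\w+)=([^,\s]+)")


def blocked_queue_counts(lines):
    """Count metrics.log queue lines flagged blocked=true, per queue name."""
    counts = Counter()
    for line in lines:
        fields = dict(KEY_VALUE.findall(line))
        if fields.get("group") == "queue" and fields.get("blocked") == "true":
            counts[fields.get("name", "unknown")] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    "03-07-2019 10:15:32.123 +0000 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    "03-07-2019 10:15:32.123 +0000 INFO  Metrics - group=queue, name=parsingqueue, max_size_kb=6144, current_size_kb=0",
    "03-07-2019 10:16:03.456 +0000 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
]

counts = blocked_queue_counts(sample)
print(counts.most_common())
```

A queue that shows up as blocked repeatedly inside the gap window is a good place to start digging; if nothing shows up at all, that points away from Splunk itself.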

Ideally, of course, you'd be 'Splunking' your network hardware too, which would give you something else to look at (if you have it).

If my comment helps, please give it a thumbs up!

morethanyell · ‎03-11-2019

Thank you. I will try this.
