We have a dashboard panel showing network traffic. I want to override the default span values Splunk uses, so that the span depends on the selected time range:
last 60min: span=1m
last 24h: span=15m
last 7 days: span=1h
last 30 days: span=4h
all time: span=1d
Our first version of the panel used a hardcoded span of 15m, but that doesn't work well when you're looking at 30 days or all time.
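To make the intended mapping concrete, here is a minimal Python sketch of the lookup described above. It only illustrates the selection logic and is not Splunk configuration; the names `SPAN_BY_RANGE`, `ALL_TIME_SPAN`, and `pick_span` are made up for this example, and treating each listed range as an inclusive upper bound is an assumption.

```python
from datetime import timedelta

# Upper bound of each time range and the span wanted for it,
# taken from the list in the post. Names are hypothetical.
SPAN_BY_RANGE = [
    (timedelta(minutes=60), "1m"),
    (timedelta(hours=24), "15m"),
    (timedelta(days=7), "1h"),
    (timedelta(days=30), "4h"),
]
ALL_TIME_SPAN = "1d"


def pick_span(time_range):
    """Return the span for a selected time range.

    `time_range` is a timedelta, or None for "all time".
    Assumption: ranges longer than 30 days fall back to the all-time span.
    """
    if time_range is None:
        return ALL_TIME_SPAN
    for upper_bound, span in SPAN_BY_RANGE:
        if time_range <= upper_bound:
            return span
    return ALL_TIME_SPAN
```

A range between two listed values, for example 3 hours, gets the span of the next larger range (15m here), which keeps the number of buckets bounded.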
We have ~50 hosts that are placed at various locations outside our data center. To receive logs from these hosts, we have set up a virtual machine on EC2 that relays the logs to our Splunk platform.
From time to time we see that the amount of indexed data drops (the number of events is more or less the same since the bulk of the data is perfmon-events from Windows Servers). When this happens we can see in metrics.log (on the ec2 host) that some of the queues are blocked:
05-28-2015 21:00:52.585 +0200 INFO Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=1845, largest_size=1845, smallest_size=1845
05-28-2015 21:00:52.585 +0200 INFO Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=1848, largest_size=1848, smallest_size=1848
Restarting Splunk solves the issue, but it returns after a random number of days.
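For spotting these episodes without reading metrics.log by hand, a small Python sketch along the following lines could help. It assumes the line layout shown in the two samples above (comma-separated `key=value` pairs after `Metrics - `); the function names and the fill-ratio calculation are illustrative choices, not part of Splunk.

```python
import re

# Matches key=value pairs such as "name=indexqueue" or "max_size_kb=500".
KEY_VALUE = re.compile(r"(\w+)=([^,\s]+)")

# The two sample lines quoted in the post.
SAMPLE_LINES = [
    "05-28-2015 21:00:52.585 +0200 INFO Metrics - group=queue, name=indexqueue, "
    "blocked=true, max_size_kb=500, current_size_kb=499, current_size=1845, "
    "largest_size=1845, smallest_size=1845",
    "05-28-2015 21:00:52.585 +0200 INFO Metrics - group=queue, name=typingqueue, "
    "blocked=true, max_size_kb=500, current_size_kb=499, current_size=1848, "
    "largest_size=1848, smallest_size=1848",
]


def parse_queue_metric(line):
    """Return the key/value fields of a group=queue line, or None for other lines."""
    fields = dict(KEY_VALUE.findall(line))
    if fields.get("group") != "queue":
        return None
    return fields


def blocked_queues(lines):
    """Return (queue name, fill ratio) for every line that reports blocked=true."""
    result = []
    for line in lines:
        fields = parse_queue_metric(line)
        if fields is None or fields.get("blocked") != "true":
            continue
        fill_ratio = int(fields["current_size_kb"]) / int(fields["max_size_kb"])
        result.append((fields["name"], fill_ratio))
    return result


if __name__ == "__main__":
    for name, fill_ratio in blocked_queues(SAMPLE_LINES):
        print(f"{name}: blocked, {fill_ratio:.1%} full")
```

Running this over a longer stretch of metrics.log and noting when the first blocked line appears may help correlate the problem with events on the EC2 side.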
I'm trying to grasp how the queues work from http://wiki.splunk.com/Community:HowIndexingWorks, and as far as I understand, the indexqueue is what writes data to disk?
Even though we're "just" relaying data through the EC2 host, does all the log data pass through each queue, get written to disk, and then get forwarded?
Because the issue is not constant but appears at random times, I suspect the root cause might be problems with EC2 degrading performance (Amazon doing maintenance without our knowledge, high load on the underlying EBS volumes, etc.). Am I on the right track here, or are there other reasons that are more likely?
Is there any tuning that could be done to avoid these issues, or is it just a "throw more/better hardware at it" problem?
We would really appreciate some feedback. We want to start indexing our IIS logs as well. This will significantly increase the volume of events indexed, but we can't enable it before we're sure our architecture is stable.