I am having trouble getting
_audit to be forwarded properly when it passes through more than one forwarder. Any ideas on what I should try, or why this doesn't work?
Here is a simple example layout using 3 servers (a, b, c), where splunk-a and splunk-b are normal forwarders (i.e. not light-weight forwarders) and splunk-c is the central indexer.
Splunk server diagram: splunk-a --> splunk-b:9997 --> splunk-c:9997
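To make the chain concrete, here is a minimal sketch of what the relevant config on each hop looks like (the group names dmz_forwarder and central_indexer are just illustrative, not my actual config):

# outputs.conf on splunk-a -- send everything to the intermediate forwarder in the DMZ
[tcpout]
defaultGroup = dmz_forwarder
[tcpout:dmz_forwarder]
server = splunk-b:9997

# outputs.conf on splunk-b -- relay on to the central indexer on the trusted network
[tcpout]
defaultGroup = central_indexer
[tcpout:central_indexer]
server = splunk-c:9997

# inputs.conf on splunk-b and splunk-c -- listen for forwarded (cooked) data
[splunktcp://9997]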
In my setup, all three servers are now running 4.0.10. I recently upgraded
splunk-b from 3.4.x, and I am seeing some different results, but still not what I would expect.
What I'm finding is that internal events from splunk-a are being dropped. With Splunk 3.4 these events were at least getting indexed on splunk-b, but in both cases the internal events are not being passed on to splunk-c as I would expect them to be. All events that are generated on splunk-b itself are properly forwarded and indexed on splunk-c, as expected.
I can update this post with my inputs.conf and outputs.conf upon request.
Background info: Our setup is such that we have a central Splunk indexer (splunk-c) on our trusted network. We also have a central forwarding Splunk instance in our DMZ (splunk-b), which forwards all Splunk DMZ events (i.e. from splunk-a) to the trusted network. (From a networking/firewall perspective, this seems safer than letting just any machine in the DMZ forward events to our internal central Splunk indexer. But I'm willing to rethink this if there are good reasons.)
Update: I somehow found a way to get the internal events to be forwarded through splunk-b. It seems like adding the _TCP_ROUTING entry for the receiving splunktcp port in inputs.conf made the difference. That input now looks like:

[splunktcp://9997]
_TCP_ROUTING = *
sourcetype = tcp-9997
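As I understand it, _TCP_ROUTING names the tcpout group(s) in outputs.conf that data arriving on this input should be routed to, with * meaning all defined groups. If you only want to route to one group, I believe the explicit form would look roughly like this (central_indexer being whatever your tcpout group is actually called):

# inputs.conf on splunk-b
[splunktcp://9997]
_TCP_ROUTING = central_indexer
sourcetype = tcp-9997

# outputs.conf on splunk-b
[tcpout:central_indexer]
server = splunk-c:9997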
Of course, there is still something very weird about this. The internal events seem to lag between 2 and 7 minutes behind the rest of the events. (In other words, the latest events from the search
host=splunk-a sourcetype=access_common will be from within the past 30 seconds or so, whereas the latest events from
host=splunk-a index=_internal will often be many minutes old.)
Update #2: This issue has gone away after upgrading
splunk-b to 4.1.x. So if anyone else is running into similarly weird forwarding behavior, I'd recommend upgrading to 4.1 based on my experience.
I also had an issue with all events being doubled up on
splunk-c (at one point I thought that only the
_internal events were being doubled). Again, the upgrade to 4.1 fixed this.
I updated the post. (All forwarders are heavy.) I'm not using the
SplunkForwarder app, but the same config values are being set.
splunk-a is running a "semi-light" forwarder; we have a custom forwarding app that disables unused features, but all events are still processed locally and forwarded as cooked events, so I think it would still be considered "heavy". (I worked with Splunk support to confirm the validity of our custom forwarding app.)
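For anyone curious, the kind of settings our custom app pushes are roughly along these lines (a hedged sketch of a trimmed-down heavy forwarder, not the exact contents of our app):

# outputs.conf -- forward cooked events but don't keep a local copy
[indexAndForward]
index = false

# web.conf -- no need for Splunk Web on a dedicated forwarder
[settings]
startwebserver = 0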
Splunk has historically had a policy of not forwarding events belonging to indexes whose names begin with underscores. This includes _internal.
To get _internal events forwarded in the light forwarder case (where there is no local indexing), this weird '_TCP_ROUTING = *' setting at the input layer was applied to override that policy and cause the events to be forwarded regardless.
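These days that policy is expressed as forwardedindex.* filters in the [tcpout] stanza of outputs.conf, and adjusting those is the more logical way to get underscore indexes forwarded. Something along these lines (illustrative values, not necessarily the shipped defaults):

# outputs.conf on the forwarder
[tcpout]
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
# add the internal indexes back onto the whitelist so they get forwarded
forwardedindex.2.whitelist = (_audit|_internal)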
Meanwhile there is some sort of bug floating around regarding forwarding in 4.0.10, which you are probably seeing a manifestation of in your delays. More research / answering is required to address this. I think support should be engaged.
Thanks for your response. (Good work with the new
forwardedindex.* settings; that does seem like a much more logical approach.) For the moment, I think I'll wait and see if 4.0.11 (once it's released) fixes anything. Or perhaps give 4.1.1 a try (I hesitate to install any
x.y.0 version of anything on a production system).
Upgrading my forwarding systems to 4.1.x resolved my forwarding issues. (I also updated the post; see "Update #2".)