Hi, we have a rather complicated setup, part of which uses an intermediate forwarder (a full forwarder) to pass events from a series of lightweight forwarders to a central indexer.
We would like to do all our transformations and data processing on the indexer, with the intermediate forwarder acting as nothing more than a relay point for the lightweight forwarders that point to it.
The problems I've been facing are rather strange. Originally I noticed that Splunk was not doing any transformations, field extractions, or timestamping on the data coming through the intermediate forwarder. I am assuming that as it passes through the forwarder the data gets "cooked", so when it arrives at the indexer, the indexer assumes it does not need to do anything other than add it to the index.
So I changed the outputs.conf on the intermediate forwarder to send raw data.
[tcpout]
defaultGroup = blahblah
disabled = false

[tcpout:blahblah]
server = indexer:9996
sendCookedData = false
I was hoping this would spur the indexer into doing the data processing. However, it just led to splunkd crashing on the intermediate forwarder every time it tried to connect to the indexer, after spewing the messages below a few hundred times (crash report here).
Taken from the intermediate forwarder:

01-31-2011 01:52:59.846 WARN TcpOutputProc - TcpSendThread: Connection to server 10.137.7.20:9996, fd:23 lost - retrying: Broken pipe
01-31-2011 01:52:59.846 INFO TcpOutputProc - attempting to connect to indexer:9996...
Taken from the indexer (note the garbage where it expected a Splunk-to-Splunk header):

<Monday Januar<90>^Yó^C! from hostname=intermediateforwarder, ip=XXXXXXXXXX, port=34447
01-31-2011 08:47:51.676 INFO TcpInputProc - Hostname=intermediateforwarder closed connection
01-31-2011 08:47:52.444 ERROR TcpInputProc - Received unrecognized signature
This led me to revert to sending cooked data, which worked (i.e. Splunk did not crash and the data made it through, but unanalysed).
Current outputs.conf on the intermediate forwarder:

[tcpout]
defaultGroup = blahblah
disabled = false

[tcpout:blahblah]
server = indexer:9996
Then I went down the route of trying to force the indexer to reanalyze the data, by adding route=... to inputs.conf as below.
[default]
host = indexer

[splunktcp://9996]
route = has_key:_utf8:parsingQueue;has_key:_linebreaker:parsingQueue;absent_key:_utf8:parsingQueue;absent_key:_linebreaker:parsingQueue;
This nearly works: all the data now gets transformed and has its fields extracted, but about 40% of the data is not being correctly timestamped (events land exactly X hours in the future). It's strange because the date format is exactly the same, and the same sourcetypes, and even the same sources of data, are being indexed correctly and in the future seemingly at random.
I tried disabling all timestamp extraction on the indexer and just logging the events as they came in; however, this still resulted in "future events".
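For reference, the "disable timestamp extraction" attempt was along these lines (a minimal sketch; the sourcetype name is a placeholder, and DATETIME_CONFIG = CURRENT simply stamps each event with the indexer's own clock instead of parsing a time out of the event):

```ini
# props.conf on the indexer -- [my_sourcetype] is a placeholder stanza name
[my_sourcetype]
# Skip timestamp extraction entirely; use the indexer's current time at index time
DATETIME_CONFIG = CURRENT
```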
As such I am completely out of ideas on what my next move should be.
I know the obvious solution would be to do all the transforms on the intermediate forwarder, but this is something we would like to avoid doing.
If you have any ideas I'd be glad to hear them.
PS: our environment is a mix of Solaris and Linux (Red Hat); all Splunk instances are 4.1.6.
I know this is likely not what you want to hear, but I suggest just putting your props.conf and transforms.conf entries for your extractions etc. onto your intermediate forwarder.
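If it helps, I mean moving your existing entries over unchanged; for example, an index-time transform would look something like this (sketch only — the sourcetype, transform name, and regex are placeholders):

```ini
# props.conf on the intermediate forwarder -- [my_sourcetype] is a placeholder
[my_sourcetype]
TRANSFORMS-mask = mask_account

# transforms.conf on the intermediate forwarder
[mask_account]
# Rewrites _raw at parse time; placeholder pattern masking an account number
REGEX = (.*)acct=\d+(.*)
FORMAT = $1acct=xxxx$2
DEST_KEY = _raw
```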
Alternatively, what you want is to set up your intermediate forwarder as a light forwarder (LWF) instead of a heavy forwarder. The most significant difference between light and heavy forwarders is precisely that a heavy forwarder parses the data.
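On 4.x this is typically done by enabling the bundled SplunkLightForwarder app on the full install (check that the app is present under $SPLUNK_HOME/etc/apps on your version before relying on this; the admin credentials below are placeholders):

```shell
# Assumes a full Splunk 4.x install on the intermediate forwarder
$SPLUNK_HOME/bin/splunk enable app SplunkLightForwarder -auth admin:changeme
$SPLUNK_HOME/bin/splunk restart
```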
There are two things you have to modify about the base LWF installation, however:
Re-enable Splunk TCP input so you can receive forwarded data. In default-mode.conf:
[pipeline:tcp] disabled = false
Disable or increase the throughput throttle. In limits.conf:
[thruput] maxKBps = 0
This will give you exactly what you're asking for: a forwarder that forwards data without parsing it.