We have set up Splunk in our environment, and logs come in from different geographies (US/UK/Asia). The logs all have different timestamps, so we use a light forwarder to convert them all to the current server time with this setting in $SPLUNKHOME/etc/apps/search/local/props.conf:
DATETIME_CONFIG = CURRENT
inputs.conf and outputs.conf are also configured properly, and at first everything works fine.
But after a few hours, I no longer see any data coming from some of the machines (UK/Asia). I checked splunkd.log on the light forwarder, and there wasn't any ERROR in it.
I also checked metrics.log on the forwarder: it appears to be updated with each update on the UK/Asia machines, but no data is reaching the Splunk receiver.
On the Splunk receiver, splunkd.log contains this ERROR:
09-17-2012 08:05:20.470 -0400 ERROR SearchResults - Unable to write to file '/opt/splunk/etc/users/abcd/search/history/hostname.csv'. Retried 5 times, period=500 ms. error='No such file or directory'
but I don't think it is related to this issue.
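Because the forwarder's splunkd.log showed no ERROR, it can help to scan it for WARN lines as well, since those are easy to miss by eye. The sketch below is a minimal, hypothetical helper; it assumes every splunkd.log line follows the layout of the sample line above (date, time, UTC offset, level, component, " - ", message) and that warnings use the level name WARN. Lines that don't match are skipped.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Layout assumed from the sample receiver line:
# "MM-DD-YYYY HH:MM:SS.mmm +/-ZZZZ LEVEL Component - message"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<tz>[+-]\d{4}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<component>\S+) - "
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    component: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it doesn't match the assumed layout."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    timestamp = f"{m['date']} {m['time']} {m['tz']}"
    return LogEntry(timestamp, m["level"], m["component"], m["message"])


def problems(lines: Iterable[str], levels=("WARN", "ERROR")) -> Iterator[LogEntry]:
    """Yield parsed entries whose level is in `levels` (WARN and ERROR by default)."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level in levels:
            yield entry
```

Usage would be along the lines of `for e in problems(open(path, encoding="utf-8")): print(e.timestamp, e.level, e.component, e.message)`, where `path` points at the forwarder's splunkd.log.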
All clients and the Splunk receiver run on Linux; the forwarder runs on Windows 2008.
Can someone please help me figure out how to debug this issue and what could be causing it?
I have restored the system to a working state many times, but the problem always comes back.
I have narrowed the problem down to a communication issue between JBoss AS and the forwarder.
Here is the problem again, from the start:
We have set up JBoss AS7 in our environment. Our application servers are located in different geographies (US/UK, etc.). The Splunk server and forwarders are located in the US, and the geographies are connected by VPN. Once the setup is ready, we restart the application servers so that they connect to the Splunk forwarders in raw mode.
This works fine, but after an hour or so of sending log data from the application servers to the forwarders, the UK servers stop sending data to the Splunk forwarders.
There isn't any ERROR in splunkd.log on the forwarder, and metrics.log shows that it still has a tcpin connection from the UK machine.
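Since the forwarder still reports a tcpin connection while no data arrives, one way to reproduce the long-lived connection independently of JBoss is to open a single TCP connection from a UK server and keep writing to it for hours. The sketch below is a hypothetical probe, not a Splunk tool: `heartbeat_probe`, the label, host and port are illustrative assumptions. Each line carries a sequence number and timestamp, so a search on the Splunk server shows the last heartbeat that was indexed. Note that a sender often cannot detect a silently dropped path (writes may keep succeeding into the local buffer), so the receiving side is where the result should be read.

```python
import socket
import time
from typing import Callable


def heartbeat_probe(host: str, port: int, interval: float, count: int,
                    label: str = "probe",
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Send `count` heartbeat lines over ONE long-lived TCP connection.

    The connection is opened once and reused, like an application server that
    connects at startup, rather than reconnecting for each line. Returns how
    many lines were handed to the socket before any send error.
    """
    sent = 0
    with socket.create_connection((host, port), timeout=10.0) as sock:
        for seq in range(count):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            line = f"{stamp} {label} heartbeat seq={seq}\n"
            try:
                sock.sendall(line.encode("utf-8"))
            except OSError as exc:
                # A reset or timeout surfaces here; a silent drop may not.
                print(f"send failed at seq={seq}: {exc}")
                break
            sent += 1
            if seq + 1 < count:
                sleep(interval)  # wait between heartbeats on the same connection
    return sent
```

For example, `heartbeat_probe("forwarder.example.com", 9999, interval=60, count=180)` (hypothetical host and port) would run for about three hours on one connection; comparing it with a one-off send shows whether only long-lived connections are affected.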
I created a shell script that sends data from the UK server to the same TCP port on the forwarder, and that data showed up fine on the Splunk server. So something must be going wrong in the communication between JBoss AS and the forwarder.
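For reference, a one-off test like the shell script described above can be sketched in Python as follows. This is an illustrative equivalent, not the original script: `send_test_events`, the host and the port are assumptions. It opens a fresh TCP connection, writes newline-terminated lines to the raw input, and closes the connection.

```python
import socket
from typing import Sequence


def send_test_events(host: str, port: int, lines: Sequence[str],
                     timeout: float = 5.0) -> int:
    """Send newline-terminated lines over a fresh TCP connection, then close it.

    Returns the number of bytes sent; raises OSError if the connection fails.
    """
    payload = "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)  # closing the socket on exit ends the connection
    return len(payload)
```

A call such as `send_test_events("forwarder.example.com", 9999, ["uk connectivity test"])` (hypothetical host and port) exercises a new connection each time, which is exactly what a long-running application server does not do.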
I enabled the Deployment Monitor app on the server. It shows the forwarder as healthy, with a consistent connection between the Splunk server and the forwarder. So why has my application server data stopped showing up on the server?
The "All Forwarders" status in the Deployment Monitor app is:
my_forwarder heavy forwarder 4.3.4 Linux 09/19/12 12:15:58 PM 09/19/12 12:15:58 PM active 26.0500 0.0338
The "All Indexer" status in the Deployment Monitor app is:
my_indexer normal 09/19/12 12:00:38 PM 0.0000
So where could things be going wrong?