Here's my setup: I have three clustered indexers, two search heads, a deployment server, and several Heavy Forwarders (three Windows and three Linux). I've been collecting Windows logs remotely from the HFs via WMI with no problems for a while. This week, I decided to install a universal forwarder on two servers as a pilot in preparation for further deployments.
After installing, I found I was getting no log events at all. So I commenced troubleshooting.
First I checked to see if the indexers were receiving data by running tcpdump and I saw the logs and metrics coming over the wire to the indexers. CHECK
Then I checked to see if the records were in ANY index by running the following search:
index = * host=hostnames
This returned nothing. So I searched:
And while this returned multiple events, none were FROM those machines.
Then, I checked to see if there were records in the _internal index from those servers. CHECK
Then, I looked to see if any of those _internal records contained errors. No entries that said ERROR, so tentative CHECK
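As a rough sketch of that check, a search along these lines would summarize any warnings or errors the forwarder has logged about itself. It assumes the default splunkd sourcetype with its usual log_level and component fields; <uf_hostname> is a placeholder for the forwarder's host name, not a real value.

```
index=_internal host=<uf_hostname> sourcetype=splunkd (log_level=ERROR OR log_level=WARN)
| stats count by component, log_level
| sort - count
```

Including WARN as well as ERROR may surface messages that a plain text search for "ERROR" would miss.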
Then I looked on each server where the UF was installed and checked splunkd.log for errors. Just one:
AuditTrailManager - Private key error Error opening C:\Program Files\SplunkUniversalForwarder\etc\auth\audit\private.pem: The system cannot find the path specified.
But I was kind of expecting this, as I told the UF to use Splunk's own internal certificate during install. Not sure if this is a factor....
So no other errors.
Here's C:\Program Files\SplunkUniversalForwarder\etc\apps\Splunk_TA_Windows\local\inputs.conf
[WinEventLog://Application]
disabled = 0
index = wineventlog

[WinEventLog://Security]
disabled = 0
index = wineventlog

[WinEventLog://System]
disabled = 0
index = wineventlog

[WinEventLog://Windows Powershell]
disabled = 0
index = wineventlog
Here's C:\Program Files\SplunkUniversalForwarder\etc\system\local\outputs.conf
# BASE SETTINGS
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = ip1:9997, ip2:9997, ip3:9997

## autolbsettings
autoLB = true
autoLBFrequency = 15
forceTimebasedAutoLB = true
Some other posts have mentioned that there could be a permissions issue. Is there a way to verify that? I installed this UF with the same domain admin account that the HFs are using to pull logs via WMI, so there shouldn't be a permissions issue?
What other steps can I take to fix this?
I'm currently having this issue. I am seeing metrics logs coming in when searching "index=_internal" on my cluster.
But I do not see the data coming in. At one time it did work, and it seems things sometimes work? I am currently trying to troubleshoot. Setup: Windows Universal Forwarder 6.4.3 (port 9997) -> intermediate forwarder 6.4.3 -> index cluster.
Any other information to help isolate this issue? Thank You
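One thing that might help narrow it down is checking which instance the forwarder is actually connecting to and how much it is sending. This is only a sketch: it assumes the receiving instances write the usual tcpin_connections entries to metrics.log and forward their own _internal data, and <uf_hostname> is a placeholder.

```
index=_internal source=*metrics.log* group=tcpin_connections hostname=<uf_hostname>
| stats sum(kb) AS total_kb, latest(_time) AS last_seen by host, hostname, sourceIp, fwdType, version
| convert ctime(last_seen)
```

Here host is the receiving instance (for example the intermediate forwarder) and hostname is the sending forwarder, so the output shows which hop is seeing traffic.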
OK, I ended up opening a ticket on this, doing some more troubleshooting and giving them a diag, but no smoking gun was really found.
Then I had some work things come up and I didn't get to work on the problem for a couple of days. During that time things started working, and events from all three main Windows logs showed up in my indexers, but not for any reason I could tell, as I hadn't had a chance to implement the latest suggestions from tech support.
I think in the end, the msi needed to be installed (at least in my environment) by running it from an administrator command prompt and choosing to install it as the local administrator account. Then, after it finished installing, I waited a few hours/days and events started showing up in my indexes.
I'm NOT saying this is the right answer, but I didn't want to leave this question hanging.
Thanks to @jkat54 for all the help.
Sorry I went cold on you. I lost visibility on the question. So some group policy or something was getting in the way. I've always installed Splunk under non-privileged accounts, and each time I've run into a different issue that was related to some silly something / policy implemented by who knows who and who knows when, etc. One time I spent weeks trying to solve something and it turned out the vendor had disabled service accounts somehow. You could add them, give them passwords etc, but when you tried to use one as a service account, whatever service it was would fall on its face... SMH.. As with everything computer, you just never know...
The events are not in the _internal log.
Furthermore, I performed a general search index=* host=hostname and found that I HAD gotten some results.
From 2pm 3 May 2016 to midnight 3 May 2016, I received about 100,000+ events per hour. Then it has dropped off to maybe one event per hour.
and even then, it's only been the events from the system log.
I just tried the SPL99687 suggestion from http://docs.splunk.com/Documentation/Forwarder/latest/Forwarder/KnownIssues
and when I stopped and restarted splunk, THOSE TWO log entries showed in the search. But still nothing else.
double checking spelling again....
I should say I've been looking at the following posts, but have not gotten a solution from them yet:
So here are the results:
runas /noprofile /env /netonly /user:domain\username "c:\windows\system32\eventvwr.msc"
RUNAS ERROR: Unable to run - eventvwr.msc
193: eventvwr.msc is not a valid Win32 application.
To verify I ran just eventvwr.msc. That worked
I ran runas /noprofile /env /netonly /user:domain\username "notepad.exe"
I tried both of the above from the command prompt AND the elevated command prompt with the exact same results.
as follow up... adding permission for users can be tricky:
well then the account has permission 😉
Are there a LOT of events in the logs? maybe from 2006 and beyond... if so it will take a while for the newer events to be read (depends on everything from size of the box to network throughput) etc. but events older than 6 years might be getting rolled to frozen as soon as they arrive, etc.
The server is three years old and yes I forgot to put that limit into the .conf files.
Is there a way to determine where those events are going? If they are in any index?
While I had turned off the UF last night, it's been running now for four hours today and still nothing is showing up (I just checked)
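To see where events from that server ended up, a search like the following could be run over All time. It is a sketch only, with <uf_hostname> as a placeholder; it covers both regular and internal indexes, limited to the ones your role is allowed to search.

```
(index=* OR index=_*) host=<uf_hostname>
| stats count, min(_time) AS earliest_event, max(_time) AS latest_event by index, sourcetype, source
| convert ctime(earliest_event) ctime(latest_event)
```

If the events carry old timestamps, a short time range would hide them, which is why All time is worth trying here.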
OK I checked _internal and no entries. So then I re-ran the search for anything from that server, just to see if anything turned up (index=wineventlog host=hostnames) and lo and behold ONE event showed up!
But only one and from the system log.
So I expanded the search to all time (because why not) and it seems that your previous theory was right, I have VERY few events after midnight 3 May 2016, hundreds of thousands of events between 2pm and midnight and then almost nothing prior to that. 2pm on 3 May was about when I installed the UF.
So that leads to another question, can I stop the UF, add in the history limit of 3 days and restart? Or at this point will it ignore that config?
And why isn't it getting any logs after midnight?
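The hourly pattern described above can be looked at more closely by comparing each event's timestamp with the time it was indexed. A minimal sketch, assuming the wineventlog index from the inputs.conf earlier in the thread and <uf_hostname> as a placeholder:

```
index=wineventlog host=<uf_hostname>
| eval lag_seconds = _indextime - _time
| bin _time span=1h
| stats count, avg(lag_seconds) AS avg_lag_seconds, max(lag_seconds) AS max_lag_seconds by _time, source
```

A large lag would suggest the forwarder is still working through older events, while a small lag with a low count would point to something else limiting what gets collected.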