A user of splunk attempted a search of index="os".
It returns nothing after Dec 23. (Yes, this went unnoticed for this long; we were on a single version of RedHat until recently.)
Splunk servers are all RHEL 7.9
Version: 8.2.4
Build: 87e2dda940d1
Clients are all RHEL 7.9 or 8.5
1 - probably
2 - if by that you mean /opt/splunk/splunkforwarder, yes that is the default on all clients in our environment
3 - I see almost all of our servers using the search given
4 - Haven't touched our config files since installation. I have done splunk updates and OS patching, both using a shutdown / patch-or-update / restart sequence that has been approved directly by Splunk.
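Roughly, per host (forwarder path on the clients; exact packages vary per run):
/opt/splunkforwarder/bin/splunk stop
yum update -y        # OS patching, or installing the new Splunk RPM on update runs
/opt/splunkforwarder/bin/splunk start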
I expect I will get nowhere here, as the answers so far have presumed knowledge that the admin team here was _never given_. Again: we were supposed to have training on days 4 and 5 of installation, but since days 1 and 2 were taken up doing tasks that we were told had to be done before the install could happen (even though we had asked beforehand what to do before the installation), we DID NOT GET TRAINING.
I know the very basics. But nothing more.
Sorry for unclear questions 😉
As you have done both OS and splunk updates, that could be the reason (especially if you are starting those via init.d).
When you run that previous query with a time span after that patching time, are you still seeing those UF hosts? (E.g. add earliest=-1d to the first line.)
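E.g. something like this (with the host filter adjusted to your environment):
index=_internal earliest=-1d host!=<your splunk server>
| stats max(_time) as _time count by host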
If not, then it's possible that those splunk processes are waiting for some confirmation before they are in fully functional mode on the source system. You can check it by logging in to one of those hosts and then trying the following:
sudo -u <your splunk user> bash
/opt/splunkforwarder/bin/splunk status
If it's running normally, this should give you a process id etc.
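E.g. output along the lines of:
splunkd is running (PID: 12345).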
You could also check what you have in /opt/splunkforwarder/var/log/splunk/splunkd.log.
There should be information about what the UF has done and whether there are any errors preventing splunk from starting.
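One quick way to pick out problems there (plain grep; adjust the line count to taste):
grep -iE "error|warn" /opt/splunkforwarder/var/log/splunk/splunkd.log | tail -n 20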
If it's not running, then just start it with
/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes
Run this as your splunk user.
r. Ismo
Next step.
Ran another one-liner to check splunk status.
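Something like this (host pattern and user are from our setup; adjust to yours):
ansible all -b --become-user=splunk -m command -a "/opt/splunkforwarder/bin/splunk status"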
While there are a few which do not have the license accepted (which I will fix shortly), all others are running properly, and the logs show current activity.
OK. We're getting somewhere.
Step by step, maybe we'll find something.
@isoutamo's search should give you the number of events (which is a bit less important at the moment) and the latest event you ingested from a given server. Are those times long ago or are they fairly recent?
As you say you have the /opt/splunk/splunkforwarder directory on those servers (which is good), check if you have a process called splunkd running. For example, by running
ps auxw | grep splunkd
If you're getting results, that means the forwarder is running but the events are not being ingested. But if you're not getting results, the forwarder process is not running at all. Maybe during your OS upgrade operations you did a reboot and the system was not configured to start the forwarder automatically?
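If that turns out to be the case, enabling boot-start should take care of the autostart part. Run it as root; the -user value is whatever account your forwarder runs as (splunk assumed here):
/opt/splunkforwarder/bin/splunk enable boot-start -user splunk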
Ran an ansible one-liner against all my clients. All are running splunkd.
Will check the next suggestion shortly.
(work got busy)
You should check which nodes have sent data to it before that. Then check from the MC, or query the _internal logs, to see whether those are still sending any data to your splunk node. If not, then log into those hosts and check if splunk is running on them, whether there are missing inputs, or something else like a FW blocking sending/receiving events.
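For example, something like this should list hosts that have sent to index=os at some point but have since gone quiet (a sketch; tune the 90d window and the 7d threshold):
index=os earliest=-90d
| stats max(_time) as latest count by host
| where latest < relative_time(now(), "-7d")
| convert ctime(latest)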
r. Ismo
Huh. On admin nodes I get either a reading error or a message about 0 population, but splunk is running on all splunk servers.
I think we're getting some things confused a bit here.
Please correct me if I'm wrong anywhere.
1) Your index=os is supposed to collect data from remote (i.e. not hosting splunk infrastructure) hosts
2) You are using Universal Forwarders installed on those hosts to get events from them
The obvious questions (some of which @isoutamo already asked, but I'm not sure if you interpreted them correctly) are:
1) Have there been any changes introduced to your splunk environment around the time the events stopped being ingested?
2) If you are indeed using Universal Forwarders - are they running on the hosts you are getting events from? Not on your splunk servers! On those non-splunk servers you're pulling your logs from.
3) If your Universal Forwarders are running, verify that logs are being ingested from those hosts into the _internal index - @isoutamo showed you a search for it
4) If you have recent events in _internal but don't have recent events in the os index, well... either you're filtering them out on your splunk servers (less probable) or they are simply not being ingested from the source machines (more probable). In either case you'd have to go through your config, starting from looking through inputs.conf on your UFs (a sample stanza below).
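For reference, a monitor stanza in inputs.conf on a UF typically looks like this (hypothetical path and sourcetype shown; yours will differ):
[monitor:///var/log/messages]
index = os
sourcetype = syslog
disabled = 0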
And you see internal logs from those on your splunk server?
index=_internal host!=<your splunk server>
| stats max(_time) as _time count by host
Or just use the MC (Monitoring Console) on your node (I suppose that you have enabled forwarder monitoring in it).
r. Ismo
Finally getting around to this.
On all servers where splunk is installed, the splunkd service is running.
(an example from grep:
{server}-106 | CHANGED | rc=0 >>
splunk 5131 1 0 Apr07 ? 01:18:04 splunkd -p 8089 start
splunk 5233 5131 0 Apr07 ? 00:00:00 [splunkd pid=5131] splunkd -p 8089 start [process-runner]
)
Running the
index=_internal host!=utility-log-* | stats max(_time) as _time count by host
returns many servers.
Again: the only changes made to the clients and servers have been patching, updating and rebooting. Yes, _some_ of the servers did not have the license accepted (--accept-license), but that's been fixed.
So it looks like I have no hope of getting this fixed due to the way Splunk does "support".
Hi
If you are running it with systemd, then this should work:
- name: "Start Splunk via service"
service:
name: "{{ splunk_service_name }}"
state: started
become: yes
become_user: "{{ privileged_user }}"
when:
- splunk.enable_service
- ansible_system is match("Linux")
Also, after you have installed/updated splunk, you must add/update boot-start again (read: first remove, then add; the remove half is sketched after the next task), like:
- name: "Enable service via boot-start - Linux (init)"
command: "{{ splunk.exec }} enable boot-start -user {{ splunk.user }} --accept-license --answer-yes --no-prompt"
become: yes
become_user: "{{ privileged_user }}"
when:
- ansible_system is match("Linux") and not splunk_systemd
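The "first remove" half would be the counterpart task; a sketch following the same pattern (splunk disable boot-start is the CLI opposite of enable boot-start):
- name: "Disable service via boot-start - Linux (init)"
  command: "{{ splunk.exec }} disable boot-start"
  become: yes
  become_user: "{{ privileged_user }}"
  when:
    - ansible_system is match("Linux") and not splunk_systemd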
And if outside of systemd:
- name: "Start Splunk via CLI"
command: "{{ splunk.exec }} start --accept-license --answer-yes --no-prompt"
register: start_splunk
changed_when: start_splunk.rc == 0 and 'already running' not in start_splunk.stdout
when: not splunk.enable_service
until: start_splunk.rc == 0
retries: 5
delay: 10
become: yes
become_user: "{{ splunk.user }}"
These are the same as / based on the splunk-ansible repo (https://github.com/splunk/splunk-ansible).
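If you drop those tasks into a playbook, running it is then just (hypothetical file names):
ansible-playbook -i your_inventory splunk_boot_start.yml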
r. Ismo
I presume those are all ansible plays.
Thank you.
Now to run those and see how they _will_ fail.