Why is inputs.conf not indexing /var/log/messages?

alfredoh14
Engager

Hello,

I have an odd issue that seems to have been resolved, but I would like to know its root cause.
I inherited a Splunk configuration in which one of the stanzas in inputs.conf is:
[monitor:///var/log/messages*]
sourcetype=syslog
index = os
disabled = 0

When I run ls -l on /var/log/messages*, I get the following:
-rw-------. 1 root root 7520499 Sep 23 07:15 messages
-rw-------. 1 root root 4795535 Aug 28 01:45 messages-20220828
-rw-------. 1 root root 6636499 Sep 4 01:42 messages-20220904
...

When I run an SPL search on any of the possible sources (the stanza uses "*", so there could be several), I get no results except for source=messages.
I get no results for source=messages-20220828, even if I extend the time range with earliest=-365d.

When rsyslog ran and rotated the messages log file this past week, at about 2 am on Saturday, Splunk stopped indexing the messages log file.
Linux kept writing to the messages log file, so that side seems to be working as expected.
The last log entry Splunk recorded was:
_time = 2022-09-18 01:46:40
_raw = Sep 18 01:46:40 ba-dev-web rsyslogd: [origin software="rsyslogd" swVersion="8.24.0-57.el7_9.3" x-pid="1899" x-info="http://www.rsyslog.com"] rsyslogd was HUPed

I restarted the splunkforwarder on the affected server, which fixed the issue: Splunk started indexing the messages log entries again.

Because restarting the forwarder manually is not an adequate permanent solution, I created the stanza below:
[monitor:///var/log/messages]
index = test
disabled = 0


I do not believe I need the "*" because:
1) the messages* sources are not being indexed by Splunk anyway (only source=messages is), so why use "*"?
2) we do not need to index the messages backup log files.
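
To make the difference between the two stanza paths concrete, here is a minimal Python sketch. It assumes that a `*` in a monitor path matches any characters within a single path segment; the helper names (`stanza_path_to_regex`, `matched_files`) are hypothetical, and Python's `re` module only stands in for Splunk's own matching. It shows which files each path would select, not whether Splunk would actually index them.

```python
import re

# File names taken from the `ls -l` output above.
FILES = [
    "/var/log/messages",
    "/var/log/messages-20220828",
    "/var/log/messages-20220904",
]


def stanza_path_to_regex(path):
    """Approximate a monitor path as an anchored regex.

    Assumption: `*` matches anything except a path separator.
    """
    escaped = re.escape(path).replace(r"\*", "[^/]*")
    return re.compile("^" + escaped + "$")


def matched_files(path, files=FILES):
    """Return the files that the given monitor path would select."""
    pattern = stanza_path_to_regex(path)
    return [f for f in files if pattern.match(f)]


for stanza_path in ("/var/log/messages*", "/var/log/messages"):
    print(stanza_path, "->", matched_files(stanza_path))
```

Under that assumption, the path with "*" selects the rotated copies as well as the live file, while the path without it selects only /var/log/messages.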


When I came to work today, 18 hours after the "fix" (restarting the Splunk forwarder), my stanza was still working and indexing log entries as expected, but the previous one:
[monitor:///var/log/messages*]
no longer indexes log entries.

Using the working stanza, I determined that the last entries before Splunk stopped indexing were the following
(the first column is _time and the next column is _raw):
2022-09-22 14:03:38 Sep 22 14:03:38 ba-prod-web audisp-remote: queue is full - dropping event
2022-09-22 14:03:38 Sep 22 14:03:38 ba-prod-web systemd: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
2022-09-22 14:03:38 Sep 22 14:03:38 ba-prod-web systemd: Stopping Systemd service file for Splunk, generated by 'splunk enable boot-start'...
2022-09-22 14:03:38 Sep 22 14:03:38 ba-prod-web splunk: Dying on signal #15 (si_code=0), sent by PID 1 (UID 0)
2022-09-22 14:03:38 Sep 22 14:03:38 ba-qa-web audisp-remote: queue is full - dropping event
2022-09-22 14:03:37 Sep 22 14:03:37 ba-qa-web audisp-remote: queue is full - dropping event
2022-09-22 14:03:36 Sep 22 14:03:36 ba-qa-web audisp-remote: queue is full - dropping event
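
The "Dying on signal" entry in the list above can be decoded with a short Python sketch. The regular expression and the `decode` helper are hypothetical and written only against this one log line. Signal 15 is SIGTERM, and UID 0 is root. On a systemd host PID 1 is normally systemd itself, so together with the neighbouring "Stopping Systemd service file for Splunk" entries this would be consistent with a service stop or restart rather than a crash. That reading is an interpretation, not something the log lines alone prove.

```python
import re
import signal

# Log line copied from the list above.
LINE = (
    "Sep 22 14:03:38 ba-prod-web splunk: Dying on signal #15 "
    "(si_code=0), sent by PID 1 (UID 0)"
)

# Hypothetical pattern, written only for this message format.
PATTERN = re.compile(
    r"Dying on signal #(\d+) \(si_code=(-?\d+)\), sent by PID (\d+) \(UID (\d+)\)"
)


def decode(line):
    """Extract the signal name, si_code, sender PID and sender UID, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    signum, si_code, pid, uid = (int(group) for group in match.groups())
    return {
        "signal": signal.Signals(signum).name,  # numeric signal -> symbolic name
        "si_code": si_code,
        "sender_pid": pid,
        "sender_uid": uid,
    }


print(decode(LINE))
```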

The last entry for the stanza that stopped working was:
2022-09-22 14:03:37 Sep 22 14:03:37 ba-qa-web audisp-remote: queue is full - dropping event

All the other monitor and scripted inputs on that server are working, except for the one above.
The forwarder version is 7.2.3.
I am running other forwarders with this version that index messages log entries, and they are working as expected.
The stanza I used was copied and pasted from the Splunk_TA_nix add-on (except that I removed the other log files and kept just messages), so IMO it should be "best practice".


I have a few questions:
1. What might be the reason the stanza with "*" no longer works while the one without it does?
2. Am I correct to believe that we do not need the stanza with "*"? What consequences of not using "*" might I be unaware of?
3. Why would PID 1 (UID 0, root) kill Splunk? (I believe this is why Splunk stopped indexing the messages log file again the second time.)
4. Any insights that help explain this issue would be greatly appreciated. As far as I know right now, using my stanza should be good practice if we do not need the backup messages log files, but I am concerned that I am missing something.

 


youngsuh
Contributor

The most recent TA ships an inputs.conf with monitor stanzas like these:

# Add-on upgrade 8.7 has new monitors. Comment out the old monitors and add the current ones.
[monitor:///Library/Logs]
disabled = 0
index = $someindex$

[monitor:///var/log]
whitelist=(\.log|log$|secure|messages|auth|mesg$|cron$|acpid$|\.out)
# Customization: exclude rotated aide.log-<date>.gz files
blacklist=(aide\.log-\d{8}\.gz|anaconda\.syslog)
disabled = 0
index = $someindex$

[monitor:///var/adm]
whitelist=(\.log|log$|messages)
disabled = 0
index = $someindex$

[monitor:///etc]
whitelist=(\.conf|\.cfg|config$|\.ini|\.init|\.cf|\.cnf|shrc$|^ifcfg|\.profile|\.rc|\.rules|\.tab|tab$|\.login|policy$)
disabled = 0
index = $someindex$

A1: You only need to ingest the current file, not the previous (rotated) ones.
A2: Using * casts a very wide net. You should use whitelist = $REX$ (a regular expression) instead.
A3: Work with your Linux SA, or look at the logs, to figure out what caused the kill. Was it a person or a job?
A4: I would recommend using what is in the TA for messages.
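
One thing worth checking before adopting a whitelist is what it actually matches. The sketch below is a rough illustration in Python, on the assumption that a whitelist is an unanchored regular expression tested against the full file path. Python's `re` stands in for Splunk's regex engine here, and `CURRENT_ONLY` is a hypothetical tighter pattern, not something taken from the TA.

```python
import re

# Paths built from the `ls -l` output in the question.
PATHS = [
    "/var/log/messages",
    "/var/log/messages-20220828",
    "/var/log/messages-20220904",
]

# Whitelist copied from the [monitor:///var/log] stanza above.
TA_WHITELIST = r"(\.log|log$|secure|messages|auth|mesg$|cron$|acpid$|\.out)"

# Hypothetical tighter pattern: only a file named exactly "messages".
CURRENT_ONLY = r"/messages$"


def allowed(whitelist, paths=PATHS):
    """Return the paths that an unanchored whitelist regex would let through."""
    pattern = re.compile(whitelist)
    return [p for p in paths if pattern.search(p)]


print("TA whitelist:", allowed(TA_WHITELIST))
print("current only:", allowed(CURRENT_ONLY))
```

Under that assumption, the unanchored `messages` alternative in the TA whitelist also lets the rotated messages-YYYYMMDD files through, so a pattern anchored with `$` may be closer to what A1 describes.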
