Solved: Why am I getting unexpected behavior specifying fi...

xdaxdb · ‎10-22-2014

I am not getting expected behavior when specifying inputs.

All my logs are in a folder called "/syslog/"

1.3M -rw-r--r--   1 root root 1.3M Oct 22 09:42 cron
8.6M -rw-r--r--   1 root root 8.6M Oct 21 17:26 cron.1413937561.gz
4.3M -rw-r--r--   1 root root 4.2M Oct 22 09:42 maillog
1.2M -rw-r--r--   1 root root 1.2M Oct 21 17:08 maillog.1413936480.gz
8.1M -rw-r--r--   1 root root 8.1M Oct 22 09:42 messages
868K -rw-r--r--   1 root root 866K Oct 22 04:59 messages.1413979164.gz
872K -rw-r--r--   1 root root 869K Oct 22 06:40 messages.1413985201.gz
872K -rw-r--r--   1 root root 871K Oct 22 08:20 messages.1413991204.gz
636K -rw-r--r--   1 root root 632K Oct 22 09:42 secure
1.8M -rw-r--r--   1 root root 1.8M Oct 21 17:07 secure.1413936453.gz

I have an input defined as:

[monitor:///syslog/messages*]
disabled = false
followTail = 0
host =
host_regex =
sourcetype = syslog
index = messages

I want all log files called "messages" or "messages.*.gz" to be indexed in the messages index. However, currently all logs in the directory are being indexed in the messages index.

Do I need to only specify directories as inputs? I thought I could specify a file.

Edit -

Using the splunk list monitor command, I can sort of see the problem. One of my inputs is being treated as a directory and is matching files it seems like it shouldn't, while the other inputs are seen as files and are not matching anything even though it seems like they should.

Monitored Directories:
...
        /syslog/secur*
                /syslog/cron
                /syslog/cron.1413937561.gz
                /syslog/maillog
                /syslog/maillog.1413936480.gz
                /syslog/messages
                /syslog/messages.1413936430.gz
                /syslog/messages.1413942613.gz
                /syslog/messages.1413948715.gz
                /syslog/messages.1413954849.gz
                /syslog/messages.1413960903.gz
                /syslog/messages.1413967086.gz
                /syslog/messages.1413973094.gz
                /syslog/messages.1413979164.gz
                /syslog/messages.1413985201.gz
                /syslog/messages.1413991204.gz
                /syslog/messages.1413997262.gz
                /syslog/messages.1414003369.gz
                /syslog/messages.1414009642.gz
                /syslog/messages.1414015924.gz
                /syslog/secure
                /syslog/secure.1413936453.gz
Monitored Files:
        $SPLUNK_HOME/etc/splunk.version
        /syslog/boo*
        /syslog/cro*
        /syslog/maillo*
        /syslog/message*
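
For readers trying to interpret the listing above: Splunk's documentation describes a wildcard monitor path as being split into the longest wildcard-free base directory plus an implicit whitelist regex. The Python sketch below approximates that translation. The helper name split_wildcard_monitor is hypothetical, and the conversion rules used here ("*" becomes "[^/]*", "..." becomes ".*") are a simplification of the documented behavior, not Splunk's actual code.

```python
import re

def split_wildcard_monitor(path):
    """Approximate how a wildcard monitor path splits into (base_dir, regex).

    Hypothetical helper: a simplification of the documented behavior,
    not Splunk's implementation.
    """
    segments = path.split("/")
    # The base is the longest leading run of segments with no wildcard.
    base = []
    for seg in segments:
        if "*" in seg or "..." in seg:
            break
        base.append(seg)
    rest = segments[len(base):]
    if not rest:
        return path, None  # no wildcard: the path is monitored as given
    parts = []
    for seg in rest:
        # Escape literal text, then restore the two wildcard forms.
        esc = re.escape(seg)
        esc = esc.replace(re.escape("..."), ".*").replace(re.escape("*"), "[^/]*")
        parts.append(esc)
    base_dir = "/".join(base) or "/"
    regex = "^" + re.escape(base_dir.rstrip("/")) + "/" + "/".join(parts) + "$"
    return base_dir, regex

# Wildcard inputs as they appear in the list monitor output above.
stanzas = ["/syslog/secur*", "/syslog/boo*", "/syslog/cro*",
           "/syslog/maillo*", "/syslog/message*"]
for s in stanzas:
    base_dir, regex = split_wildcard_monitor(s)
    print(s, "->", base_dir, regex)
```

Under these assumptions every wildcard stanza reduces to the same base directory, /syslog, each with its own implicit whitelist. That is consistent with one input showing up as a monitored directory while the others do not, although the sketch alone does not prove that this is the cause.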

xdaxdb · ‎10-23-2014

Don't try to put files in the same directory into different indexes. Either put the logs destined for different indexes in different directories or put them all in the same index.

Using the whitelist approach with multiple indexes won't work because your [monitor:///] stanzas will have the same name and Splunk seems to ignore all but the last stanza.

Regarding host = being blank: I was instructed to do that by a Splunk engineer. I haven't had any issues with host tagging. I would rather have no host tag than have the tag default to the syslog server itself, because that throws off my reports. The syslog server itself is a server that needs to be managed and monitored.

jrodman · ‎10-23-2014

This answer is unfortunately incorrect. It is perfectly legitimate and normal to direct data from one directory to different indexes.

xdaxdb · ‎10-23-2014

Why can't I comment on people's answers? That seems counterproductive.

jrodman · ‎10-23-2014

I'm not sure. Is there a karma point requirement for that?

jrodman · ‎10-22-2014

The input stanza you show will provide (mostly) the behavior you expect. The choice of host = is not very desirable, as it will fall through to a non-existent host if the syslog regex does not find one in the events.

In order to understand what's going on, you'll probably have to provide the full set of input specifications across the whole system. This might work better as a support ticket.

ShaneNewman · ‎10-22-2014

Instead of trying to match the filenames in the monitor path, try setting up a whitelist stanza in the inputs.conf

[monitor:///syslog]
whitelist = messages
disabled = false
followTail = 0
sourcetype = syslog
index = messages

Setting it up this way, you will index all files that have messages in the file name. If you find that it is not playing well with the messages\.\d+\.gz files:

whitelist = messages.*

Hope this helps!
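
To see what the whitelist patterns above would let through, here is a rough Python check against file names from the question. It assumes the whitelist is an unanchored regex tested against the full file path, and it uses Python's re module as a stand-in for Splunk's regex engine, which should behave the same for patterns this simple. The last pattern in the loop is a hypothetical stricter variant, not something proposed in this thread.

```python
import re

# File names taken from the directory listing in the question.
paths = [
    "/syslog/cron",
    "/syslog/cron.1413937561.gz",
    "/syslog/maillog",
    "/syslog/maillog.1413936480.gz",
    "/syslog/messages",
    "/syslog/messages.1413979164.gz",
    "/syslog/secure",
    "/syslog/secure.1413936453.gz",
]

def whitelisted(pattern, candidates):
    """Return the paths that an unanchored regex would let through."""
    rx = re.compile(pattern)
    return [p for p in candidates if rx.search(p)]

patterns = (
    "messages",                    # first suggestion above
    r"messages.*",                 # fallback suggestion above
    r"messages\.\d+\.gz",          # rotated files only
    r"/messages(\.\d+\.gz)?$",     # hypothetical: anchored to the file name
)
for pattern in patterns:
    print(pattern, "->", whitelisted(pattern, paths))
```

Because the match is unanchored, "messages" and "messages.*" select the same files here, so the fallback pattern should not change anything. The anchored variant is only worth considering if other file names in the directory could contain the word "messages".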

jrodman · ‎10-23-2014

This will work, but won't work for more than one file type, which was the goal here.

xdaxdb · ‎10-22-2014

Don't edit my question unless you plan to answer it.

This subject makes it sound like I didn't do any research at all.

xdaxdb · ‎10-23-2014

Well, it's your site. But I think the subject line implies things that are not really in line with what I am asking. This is my first time setting up Splunk. I would rather have someone tell me I'm using the wrong approach than help me waste hours or days trying to shoehorn it in the way I initially tried to do it.

ppablo · ‎10-22-2014

Sorry @xdaxdb my edit bothered you, but the nature of my job is to optimize posts for searchability. If I didn't go through editing posts, people searching for similar content would have to filter through thousands upon thousands of questions with vague titles that aren't actual questions. Then users post the exact same questions that have been posted and answered previously, recreating the wheel. I have to edit posts to make it an actually searchable topic relevant to the content. Your issue is you want to monitor specific files rather than entire directories, but can't get your configuration to work. I changed the title on how to configure files as data inputs because that is what you are trying to do and find an answer to which wasn't clear in the previous title of "inputs specification".

There are users here in the Answers community with varying Splunk experience, so someone is always going to have a piece of knowledge that others do not, and people are willing to help each other out. That's what this space is all about. Many users will have the same issue of not being able to monitor files specifically versus directories, so rather than looking through a sea of somewhat relevant questions, they can find the exact question and answer they need.

jrodman · ‎10-22-2014

In general, answers works better when the question statement guides other readers towards the specific class of problem. The question form is a good way to ensure that it frames the problem in a general way.

Although answers is sometimes used for problem analysis instead of howto, it's kind of an awkward fit because problem analysis frequently requires back-and-forth.

That said, editors do try to reformulate questions sometimes in order to make them more approachable to potential experts.

Why am I getting unexpected behavior specifying files as data inputs versus directories?
