Why am I getting unexpected behavior specifying files as data inputs versus directories?

Explorer

I am not getting expected behavior when specifying inputs.

All my logs are in a folder called "/syslog/"

1.3M -rw-r--r--   1 root root 1.3M Oct 22 09:42 cron
8.6M -rw-r--r--   1 root root 8.6M Oct 21 17:26 cron.1413937561.gz
4.3M -rw-r--r--   1 root root 4.2M Oct 22 09:42 maillog
1.2M -rw-r--r--   1 root root 1.2M Oct 21 17:08 maillog.1413936480.gz
8.1M -rw-r--r--   1 root root 8.1M Oct 22 09:42 messages
868K -rw-r--r--   1 root root 866K Oct 22 04:59 messages.1413979164.gz
872K -rw-r--r--   1 root root 869K Oct 22 06:40 messages.1413985201.gz
872K -rw-r--r--   1 root root 871K Oct 22 08:20 messages.1413991204.gz
636K -rw-r--r--   1 root root 632K Oct 22 09:42 secure
1.8M -rw-r--r--   1 root root 1.8M Oct 21 17:07 secure.1413936453.gz

I have an input defined as:

[monitor:///syslog/messages*]
disabled = false
followTail = 0
host =
host_regex =
sourcetype = syslog
index = messages

I want all log files called "messages" or "messages.*.gz" to be indexed in the messages index. However, all logs in the directory are currently being indexed in the messages index.

Do I need to only specify directories as inputs? I thought I could specify a file.

Edit:

I ran the splunk list monitor command and I can partly see the problem. One of my inputs is being treated as a directory and is matching files that it seems it shouldn't, while the other inputs are seen as files and are not matching anything even though it seems they should.

Monitored Directories:
...
        /syslog/secur*
                /syslog/cron
                /syslog/cron.1413937561.gz
                /syslog/maillog
                /syslog/maillog.1413936480.gz
                /syslog/messages
                /syslog/messages.1413936430.gz
                /syslog/messages.1413942613.gz
                /syslog/messages.1413948715.gz
                /syslog/messages.1413954849.gz
                /syslog/messages.1413960903.gz
                /syslog/messages.1413967086.gz
                /syslog/messages.1413973094.gz
                /syslog/messages.1413979164.gz
                /syslog/messages.1413985201.gz
                /syslog/messages.1413991204.gz
                /syslog/messages.1413997262.gz
                /syslog/messages.1414003369.gz
                /syslog/messages.1414009642.gz
                /syslog/messages.1414015924.gz
                /syslog/secure
                /syslog/secure.1413936453.gz
Monitored Files:
        $SPLUNK_HOME/etc/splunk.version
        /syslog/boo*
        /syslog/cro*
        /syslog/maillo*
        /syslog/message*

Explorer

Don't try to put files in the same directory into different indexes. Either put the logs destined for different indexes in different directories or put them all in the same index.

Using the whitelist approach with multiple indexes won't work because your [monitor:///] stanzas will have the same name and Splunk seems to ignore all but the last stanza.
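
As a loose illustration of the duplicate-stanza problem described above, here is a minimal sketch. It uses Python's configparser, not Splunk's own conf parser, so it only approximates how two stanzas with an identical header might collapse into one set of settings. The stanza contents are hypothetical.

```python
import configparser

# Hypothetical inputs.conf: two stanzas share the same header,
# each trying to send a different file type to a different index.
INPUTS_CONF = """
[monitor:///syslog]
whitelist = messages
index = messages

[monitor:///syslog]
whitelist = secure
index = secure
"""


def load_stanzas(text):
    # strict=False merges duplicate section headers instead of raising;
    # this is configparser behavior, used here only as an analogy.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


stanzas = load_stanzas(INPUTS_CONF)
for name, settings in stanzas.items():
    print(name, settings)
```

In this sketch only one stanza survives and the later values overwrite the earlier ones, which is consistent with the "all but the last stanza" behavior reported above.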

Regarding host = being blank: I was instructed to do that by a Splunk engineer. I haven't had any issues with host tagging. I would rather have no host tag than have the tag default to the syslog server itself, because that throws off my reports. The syslog server itself is a server that needs to be managed and monitored.

Splunk Employee

This answer is unfortunately incorrect. It is perfectly legitimate and normal to direct data from one directory to different indexes.

Explorer

Why can't I comment on people's answers? That seems counterproductive.

Splunk Employee

I'm not sure. Is there a karma point requirement for that?

Splunk Employee

The input stanza you show will provide (mostly) the behavior you expect. The choice of host = is not very desirable, as it will fall through to a non-existent host if the syslog regex does not find one in the events.

To understand what's going on, you'll probably have to provide the full set of input specifications across the whole system. This might work better as a support ticket.
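
As a rough check on which files the stanza path /syslog/messages* should pick up, here is a minimal sketch. It assumes the wildcard translation described in Splunk's documentation ("*" stays within one path segment, "..." recurses through directories) and assumes the resulting pattern is matched against the whole path. The function name wildcard_to_regex is hypothetical, and the file list is a subset of the directory listing in the question.

```python
import re


def wildcard_to_regex(path):
    # Assumed translation: "..." -> ".*", "*" -> "[^/]*",
    # everything else is treated as a literal character.
    out = []
    i = 0
    while i < len(path):
        if path.startswith("...", i):
            out.append(".*")
            i += 3
        elif path[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(path[i]))
            i += 1
    # Anchoring at both ends is an assumption of this sketch.
    return re.compile("^" + "".join(out) + "$")


FILES = [
    "/syslog/cron",
    "/syslog/cron.1413937561.gz",
    "/syslog/maillog",
    "/syslog/messages",
    "/syslog/messages.1413979164.gz",
    "/syslog/secure",
    "/syslog/secure.1413936453.gz",
]

pattern = wildcard_to_regex("/syslog/messages*")
matched = [f for f in FILES if pattern.match(f)]
print(matched)
```

Under these assumptions only the two messages files match, so a stanza that pulls in cron, maillog, or secure is likely being affected by some other input definition.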

Motivator

Instead of trying to match the filenames in the monitor path, try monitoring the directory and setting a whitelist in the inputs.conf stanza:

[monitor:///syslog]
whitelist = messages
disabled = false
followTail = 0
sourcetype = syslog
index = messages

Setting it up this way, you will index all files that have messages in the file name. If you find that it is not playing well with the messages\.\d+\.gz files:

whitelist = messages.*
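
To compare the two whitelist values above, here is a minimal sketch. It assumes the whitelist is an unanchored regular expression tested against the full file path, as Splunk's documentation describes. The helper select, the stricter third pattern, and the file /syslog/old_messages_backup are hypothetical, added only for illustration.

```python
import re

FILES = [
    "/syslog/cron",
    "/syslog/maillog",
    "/syslog/messages",
    "/syslog/messages.1413979164.gz",
    "/syslog/old_messages_backup",  # hypothetical file, not in the question
    "/syslog/secure",
]


def select(whitelist, paths):
    # Unanchored search: the pattern may match anywhere in the path.
    rx = re.compile(whitelist)
    return [p for p in paths if rx.search(p)]


loose = select(r"messages", FILES)
with_suffix = select(r"messages.*", FILES)
# Stricter alternative: only "messages" or "messages.<digits>.gz"
# as the final path component.
strict = select(r"/messages(\.\d+\.gz)?$", FILES)

print(loose)
print(with_suffix)
print(strict)
```

In this sketch "messages" and "messages.*" select exactly the same files, because an unanchored pattern already matches anything containing "messages". Only an anchored pattern narrows the selection.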

Hope this helps!

Splunk Employee

This will work, but won't work for more than one file type, which was the goal here.

Explorer

Don't edit my question unless you plan to answer it.

This subject makes it sound like I didn't do any research at all.

Explorer

Well, it's your site. But I think the subject line implies things that are not really in line with what I am asking. This is my first time setting up Splunk. I would rather have someone tell me I'm using the wrong approach than help me waste hours or days trying to shoehorn it in the way I initially tried to do it.

Community Manager

Sorry my edit bothered you, @xdaxdb, but the nature of my job is to optimize posts for searchability. If I didn't edit posts, people searching for similar content would have to filter through thousands upon thousands of questions with vague titles that aren't actual questions. Users would then post the exact same questions that have already been asked and answered, reinventing the wheel. I have to edit posts so that each one is a searchable topic relevant to its content. Your issue is that you want to monitor specific files rather than entire directories, but can't get your configuration to work. I changed the title to ask how to configure files as data inputs because that is what you are trying to do and find an answer to, which wasn't clear from the previous title of "inputs specification".

There are users here in the Answers community with varying levels of Splunk experience, so someone will always have a piece of knowledge that others do not, and people are willing to help each other out. That's what this space is all about. Many users will have the same issue of not being able to monitor specific files versus directories, so rather than looking through a sea of somewhat relevant questions, they can find the exact question and answer they need.

Splunk Employee

In general, Answers works better when the question statement guides other readers toward the specific class of problem. Phrasing the title as a question is a good way to ensure that it frames the problem in a general way.

Although Answers is sometimes used for problem analysis instead of how-to questions, it's kind of an awkward fit because problem analysis frequently requires back-and-forth.

That said, editors do try to reformulate questions sometimes in order to make them more approachable to potential experts.