followTail returning jumbled mess

Communicator

I’m currently getting a new log source ready for production, and I almost have it working except for one issue. I’m forwarding email logs; the email application appends each new entry to the log file. I’m using the followTail directive, which works, but the appended data arrives at the indexer jumbled and cooked-looking, while the original data in the file is not. Below is the inputs.conf file from the indexer and a screenshot of what I’m seeing. Please help. Thanks!

[default]
host = TEST_SERVER

[monitor:///home/dcarmack/myLogs2]
disabled = false
host = TEST
sourcetype = TEST
index = default
followTail = 1

Splunk Employee

"the source type equals the file path": did you mean that sourcetype or source equals the file path?

Communicator

Yes, sorry for the confusion: source = the file path.

Splunk Employee

Output like this means that the data isn't valid UTF-8 when it arrives at the indexer.

I find it very odd that "source" is not properly set for this data. When forwarding and receiving, we typically expect source, sourcetype and host to be properly set by the forwarder. What does this directory structure look like?

As a side note, followTail is rarely a desired setting. Splunk automatically resumes reading a file where it left off. followTail tells Splunk to reset that point to the end of the file instead of to where it last read up to.
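For reference, here is a minimal sketch of the monitor stanza from the question with followTail removed, so that Splunk's normal read-position tracking decides where to resume. It assumes the path and the other values from the question are otherwise correct.

```ini
# Monitor stanza from the question, without followTail.
# Splunk records how far it has read in each monitored file and
# resumes from that point, so followTail is not needed just to
# avoid re-reading data that was already indexed.
[monitor:///home/dcarmack/myLogs2]
disabled = false
host = TEST
sourcetype = TEST
index = default
```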

This setting could be related if there is a bad interaction with archived files (which don't look like text) or with files in a character set that needs a long history of preceding bytes to decode (that doesn't seem to be the case here).
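If the gzip archives sit in the same monitored directory as the live log, one option is to exclude them from the monitor input. This is a sketch that assumes two things: the archives end in `.gz`, and they only duplicate events already indexed from the live file.

```ini
# Hypothetical: skip rotated gzip archives in the monitored directory.
# blacklist is a regular expression matched against each file's full path.
[monitor:///home/dcarmack/myLogs2]
blacklist = \.gz$
```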

Communicator

The data is XML and comes from an email security appliance. Each entry has a common header. Yes, the files are archived using gzip.

Splunk Employee

It will show as tcp:5000 either if the data arrives on a raw TCP input or if the forwarder isn't properly applying the source at input time. I don't suspect that it's raw TCP. I'm more curious about the file-reading code. Are these archive files?

Splunk Employee

Because your data shows source tcp:5000, it looks to me like it's being sent to and received on a plain TCP port, number 5000. I'm not sure where that would come from. Perhaps you have some rogue .conf files lying around.
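To check this, compare the two kinds of receiving stanza that can appear in inputs.conf on the indexer. The port numbers below are illustrative assumptions. What matters is which stanza type owns the port the forwarder sends to.

```ini
# Raw TCP input: Splunk treats incoming bytes as plain text and, by
# default, sets source = tcp:5000. Cooked data from a forwarder arriving
# here would likely show up as garbled events.
[tcp://5000]

# Forwarder receiving input: understands the forwarder protocol and keeps
# the host, source and sourcetype that the forwarder assigned.
[splunktcp://9997]

# To find which .conf file defines a given stanza, run
#   $SPLUNK_HOME/bin/splunk btool inputs list --debug
# which prints each setting next to the file it came from.
```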

The followTail behavior may be an artifact of how your files are being written. Perhaps they are being modified near the top of the file when new entries are appended?

Splunk Employee

I'm more curious about the files within the directory /home/dcarmack/myLogs2. I'm also wondering why the files are being reindexed, as that should not happen. Do they share a common header? What does the data inside the files look like?
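On the common-header question: Splunk decides whether it has already seen a file by computing a checksum over the beginning of the file. If every file starts with the same header, different files can look identical. A commonly used mitigation is crcSalt. The sketch below assumes the files in this directory really do share a header.

```ini
# Add the full source path to the checksum Splunk computes over the
# start of each file, so files with identical headers are still treated
# as distinct. Caution: renamed or rotated files will then look new and
# may be re-indexed.
[monitor:///home/dcarmack/myLogs2]
crcSalt = <SOURCE>
```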

Communicator

When I don't use followTail, the entire file gets re-indexed. One other thing I should mention: when the original file is indexed, the source type equals the file path, but when the data that's appended to the file gets indexed, the source equals tcp:5000. As for the directory structure, the forwarder is installed in /home/dcarmack and the log files are located in /home/dcarmack/myLogs2.

Splunk Employee

followTail has little to do with this. It seems to me that you are sending data to a standard TCP port (a [tcp://...] input), not a Splunk TCP port (a [splunktcp://...] input). Is that your intention? If you're using a Splunk forwarder, you should not do that. Standard TCP ports are for raw TCP log streams.
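For comparison, here is a minimal sketch of the forwarder side. outputs.conf should point at the port opened by a [splunktcp://...] stanza on the indexer, not at a [tcp://...] port. The group name, indexer hostname and port are placeholders, not values from this thread.

```ini
# outputs.conf on the forwarder (hypothetical group name, host and port).
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
# Must match a [splunktcp://9997] stanza on the indexer, not a [tcp://5000] one.
server = indexer.example.com:9997
```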

Communicator

No, I'm using the [splunktcp:] stanza on my indexer.