Why does data stop getting indexed for a monitored file when Splunk is restarted, and how do I fix this?

antlefebvre
Communicator

I am attempting to monitor a file that is fairly large and on a UNC file share. It appears that the file only indexes up to the point at which I reboot the Splunk indexer that is monitoring the file. I am not using a universal forwarder. I configured the file input directly from a Splunk indexer/search head.

How would I make Splunk continue to monitor the file and add the data from after the Splunk reboot?

The file also grows very large, to over 200 MB.

The source is a NetApp CIFS XML formatted log file.

Thanks in advance,

emiller42
Motivator

200 MB isn't that big a file, so size likely isn't the problem. The XML format may be, depending on how the log is actually written. Is each event in the log a complete, self-contained XML element? Or is the whole log a single XML document with all the events nested inside one root element?

For example:

<event>
    <attrib>foo</attrib>
</event>
<event>
    <attrib>bar</attrib>
</event>

or

<logs>
    <event>
        <attrib>foo</attrib>
    </event>
    <event>
        <attrib>bar</attrib>
    </event>
</logs>

The latter will confuse Splunk's tailing processor, because new events are inserted before the closing root tag rather than appended after the existing content, so the end of the file never changes. Splunk will either stop reading the file or re-read the entire thing every time new data is inserted.
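To make the difference concrete, here is a small Python sketch of how the two layouts grow. It does not reproduce Splunk's tailing logic; it only checks whether the old file content is still a prefix of the new content, which is what any reader that resumes from a saved offset has to assume. The helper names (`append_flat`, `append_wrapped`, `is_pure_append`) and the `<logs>`/`<event>` tags are hypothetical, taken from the illustration above rather than from the real NetApp format.

```python
CLOSE = "</logs>\n"  # closing root tag from the illustrative example


def append_flat(log: str, value: str) -> str:
    """Flat layout: each new event is written after everything already there."""
    return log + f"<event>\n    <attrib>{value}</attrib>\n</event>\n"


def append_wrapped(log: str, value: str) -> str:
    """Wrapped layout: the new event is inserted before the closing root tag."""
    assert log.endswith(CLOSE)
    body = log[: -len(CLOSE)]
    event = f"    <event>\n        <attrib>{value}</attrib>\n    </event>\n"
    return body + event + CLOSE


def is_pure_append(old: str, new: str) -> bool:
    """True if a reader that stopped at len(old) can simply keep reading."""
    return new.startswith(old)


flat_old = append_flat("", "foo")
flat_new = append_flat(flat_old, "bar")

wrapped_old = append_wrapped("<logs>\n" + CLOSE, "foo")
wrapped_new = append_wrapped(wrapped_old, "bar")

print(is_pure_append(flat_old, flat_new))        # True
print(is_pure_append(wrapped_old, wrapped_new))  # False
```

In the wrapped case the bytes just before the previous end of file are overwritten and the file always ends with the same closing tag, so a saved read position no longer points at a clean boundary.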

antlefebvre
Communicator

Thanks for that info. I think I may be running into the latter. Is there any way to "fix" the tailing processor issue you have described?

emiller42
Motivator

Short answer: don't tail them. Don't even try to ingest them until they're done being written. If NetApp is rolling the files, monitor only the rolled files, not the active one. You then need to work out appropriate parsing so that each file is split into events correctly, if Splunk doesn't already do so.
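If the built-in parsing turns out not to split the rolled files cleanly, one option is to pre-process each completed file into one event per line before Splunk reads it. The following is a minimal Python sketch of that idea, not a Splunk feature. It assumes the wrapped layout from the example above; the `event` tag name and the `split_events` helper are hypothetical and would need to match the real NetApp schema.

```python
import io
import xml.etree.ElementTree as ET


def split_events(source, tag="event"):
    """Yield each <tag> element of a completed XML log as its own string."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            elem.tail = None  # drop whitespace that follows the closing tag
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # free the element's children once emitted


# Stand-in for a rolled (finished) log file; a real file path also works.
SAMPLE = b"""<logs>
    <event>
        <attrib>foo</attrib>
    </event>
    <event>
        <attrib>bar</attrib>
    </event>
</logs>
"""

events = list(split_events(io.BytesIO(SAMPLE)))
for event in events:
    # Collapse each event onto a single line so it can be line-broken easily.
    print(" ".join(event.split()))
```

Because `iterparse` needs a well-formed document, this only works on files that are finished being written, which matches the advice to leave the active file alone.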

If there are other logging formats available, they may be worth investigating as well.

woodcock
Esteemed Legend

Check your Splunk logs; you will almost certainly see warnings and errors regarding forwarding and queues. Research and resolve those, and you should be able to get it working normally.
