Getting Data In

Splunk not detecting local files recursively.

millarma
Path Finder

I have a couple hundred log files that I pulled from client computers using PowerShell, and I am experimenting with having Splunk index them. It was working prior to upgrading to 6.6.

Basically, if I monitor a file directly, it works, but Splunk is not recursing into sub-directories. I have never indexed these files before. On the data inputs screen it detects the files, but no events are parsed.

I think it has to do with the path of the log files. Because I am lazy, I copied recursively with a filter, resulting in a long path, e.g. C:\splunkdragonlogs\top25828\CCW03310\**CCW03310**\AppData\Roaming\Nuance\NaturallySpeaking12

I use a regex to define the host as the 'user', as you see bolded above.

I have tried editing inputs.conf to set recurse = true, although that should be the default anyway.
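For reference, one of my stanzas looks roughly like this (simplified; the full file is posted further down the thread):

[monitor://C:\splunkdragonlogs\top25828]
# host_regex captures the user directory out of the long path
host_regex = \w+:\\\w+\\\w+\\\w+\d+\\(\w+)
recurse = true
index = dgn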

Any thoughts on things to explore?

1 Solution

woodcock
Esteemed Legend

Start with ./splunk list inputstatus and ./splunk list monitor, but the problem is almost certainly that there are too many files to sort through. One quick way to test is to do ./splunk restart on your forwarder: do most of the files start to catch up and then stop updating? Somewhere in the thousands of files, a forwarder takes so long sorting through and keeping track of everything that it cannot keep up with the actual task of forwarding.

Usually the solution is simple: make sure that your housekeeping design is deleting or archiving files that are no longer going to change, so that they disappear from the places Splunk is monitoring. If that cannot be done (the files must stay in place), then you can use this trick (be sure to UpVote):
https://answers.splunk.com/answers/309910/how-to-monitor-a-folder-for-newest-files-only-file.html
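One setting that can produce that "newest files only" effect (it may or may not be exactly what the linked answer uses) is ignoreOlderThan in inputs.conf; a minimal sketch, with the path and window purely illustrative:

[monitor://C:\splunkdragonlogs\top25828]
recurse = true
index = dgn
# skip any file whose modification time is older than 7 days
# (the 7d window is illustrative; tune it to how long your files keep changing)
ignoreOlderThan = 7d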

Also check inodes; your splunk user should probably be running with ulimit set to unlimited (or something quite large). Hitting those limits can also cause an inability to handle large numbers of files and directories.
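On a *nix forwarder, a quick check looks something like this (the account name splunk is an assumption, and the original poster is on Windows, where handle limits work differently, so this applies to *nix deployments only):

# show the open-file-descriptor limit for the account running splunkd
su - splunk -c 'ulimit -n'

# to raise it persistently, add entries like these to /etc/security/limits.conf:
#   splunk  soft  nofile  65536
#   splunk  hard  nofile  65536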


millarma
Path Finder

Thank you. I think this was the issue. There were thousands of extra directories that, while empty, would keep the TailingProcessor busy.


millarma
Path Finder

I am the OP. Please find my inputs.conf below. However, you should know that the files are now showing up.

I have done nothing in the meantime. Can you help me understand why? I would hazard a guess that they weren't done indexing the last time I looked. This makes me think that files do not become searchable until the entire data input has been indexed. Is that so?

How would one know if files were in the process of being indexed? Thank you all for your help.

[monitor://C:\splunkdragonlogs\top25sinceJune1]
disabled = false
host_regex = \w+:\\\w+\\\w+\\\w+\d+\\(\w+)
index = dgn
recurse = true

[monitor://C:\splunkdragonlogs\top25810*]
disabled = false
host_regex = \w+:\\\w+\\\w+\\\w+\d+\\(\w+)
index = dgn
sourcetype = dgn

[monitor://C:\splunkdragonlogs\top25828*]
disabled = false
host_regex = \w+:\\\w+\\\w+\\\w+\d+\\(\w+)
index = dgn
sourcetype = dgn

[monitor://C:\dgnlogs\top25828]
disabled = false
host_regex = \w+:\\\w+\\\w+\\\w+\d+\\(\w+)
index = dgn
sourcetype = dgn

[monitor://C:\dgnlogs\top25sinceJune1]
disabled = false
host_regex = \w+:\\\w+\\\w+\\\w+\d+\\(\w+)
index = dgn
sourcetype = dragonlog

[monitor://C:\dgnlogs\PathDragonLogs]
disabled = false
host_regex = \w+:\\\w+\\\w+\\\w+\d+\\(\w+)
index = dgn
sourcetype = dgn-clone
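For my last question above: would a search along these lines show whether sources are still sending data? A sketch, assuming the forwarder's internal logs reach the indexer (the series filter just matches my paths):

index=_internal source=*metrics.log group=per_source_thruput series=*splunkdragonlogs*
| timechart sum(kb) by series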


woodcock
Esteemed Legend

Show us your inputs.conf. All of it.


tlam_splunk
Splunk Employee

1) Please check splunkd.log and see whether there is a TailingProcessor stanza for your folder when you start up Splunk, e.g.
TailingProcessor - Parsing configuration stanza: monitor://xxxx/xxx/xxx

2) Try adding the '...' and '*' wildcards to the monitor stanza and see if that helps.
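For example, something along these lines (path illustrative, based on the paths in the question; '...' recurses through any number of subdirectory levels, while '*' matches only within a single path segment):

[monitor://C:\splunkdragonlogs\top25828\...\*.log]
index = dgn
sourcetype = dgn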


mattymo
Splunk Employee

I'd start with ./splunk list inputstatus and check out what your inputs are saying.

Or check out index=_internal source=*splunkd.log ERROR OR WARN

- MattyMo