
many small files

HansK
Path Finder

Hi

I have an application which logs every transaction in a separate file. The Splunk forwarder sends these to the indexer, but on busy days the forwarder uses all the CPU (100%).

I think this is because it tails so many files. If I set "ignoreOlderThan = 2d" it gets better, but I still have high CPU usage.

Is there a way to tell Splunk not to tail these files, but to send them to the indexer once and then forget about them?
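The input is just a monitor stanza on the directory, roughly like this (simplified):

inputs.conf
[monitor:///var/logdump]
ignoreOlderThan = 2d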

1 Solution

HansK
Path Finder

One of these settings fixed it; files are flying through now, in combination with batch.

limits.conf
[inputproc]
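# raise the file descriptor cache well above its default of 100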
max_fd = 10000

I kept seeing the forwarder parse files in its logs, but the file count in /var/logdump/ was not decreasing. So maybe it holds the files open until it has parsed them all, and was hitting max_fd in the meantime.

props.conf
[source::/var/logdump/*]
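# one fixed sourcetype for every file here, instead of a newly learned sourcetype per file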
sourcetype = nuance_recognizer
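# give this stanza precedence over other stanzas that match these files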
priority = 101

I changed this because Splunk seemed to be creating a new sourcetype for every file.


dshakespeare_sp
Splunk Employee

I think the sourcetype was the key.
Once the backlog has cleared, I would suggest backing max_fd off to a value in the hundreds, as too high a value could well affect performance. As you know, max_fd defines the maximum number of file descriptors that Splunk will keep open, to capture any trailing data from files that are written to very slowly.

You can see whether you are hitting the max_fd limit, as there will be errors in splunkd.log such as:
TailingProcessor - File descriptor cache is full (100), trimming..


lguinn2
Legend

I suggest that you write a script (separate from Splunk) that moves all files that have not been modified in 24 hours to a different directory. Have the script run once or twice each day.

Then you would still have the older files, but Splunk would no longer see them in the directory it is monitoring.

I think this might reduce Splunk's workload even more than ignoreOlderThan does.

As you have things set now, Splunk still has to check the last-modified date of each file before it determines whether or not to index it. If the files were moved, it would save Splunk this step.
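Something like this, run from cron once or twice a day (Python; the archive path is just an example):

#!/usr/bin/env python3
# Move files that have not been modified in 24+ hours out of the
# monitored directory, so Splunk no longer has to consider them.
import os
import shutil
import time

SRC = "/var/logdump"          # directory the forwarder monitors
DST = "/var/logdump-archive"  # example archive path, adjust as needed
CUTOFF = time.time() - 24 * 60 * 60

os.makedirs(DST, exist_ok=True)
for name in os.listdir(SRC):
    path = os.path.join(SRC, name)
    if os.path.isfile(path) and os.path.getmtime(path) < CUTOFF:
        shutil.move(path, os.path.join(DST, name))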


HansK
Path Finder

I wrote a script which copies the files to a sinkhole location for the batch option. Still very high CPU.

It must be something in the files themselves that is causing the high CPU.


MuS
Legend

Hi HansK

If you don't need the files after the forwarder has picked them up, there is the option to use a batch stanza instead of a monitor stanza. batch is a one-time, destructive input: Splunk indexes the files and then removes them. Read more here.
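A minimal sketch of such a stanza (the spool path is just an example):

inputs.conf
[batch:///var/logdump-spool]
move_policy = sinkhole
sourcetype = nuance_recognizer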

cheers,

MuS

HansK
Path Finder

Thanks for the suggestion, MuS; it was new to me.
Batch is not an option, as I need to keep the original files.
