Getting Data In

Is there a limit / best practice to how many data inputs Splunk can monitor?

jwquah
Path Finder

Hi all,

Are there recommended guidelines or best practices on the optimal number of apps or data inputs (monitor) that a single instance should have? I gather that the apps don't really matter, but I've noticed that when a single instance has a lot of data inputs via file monitoring or data received from a forwarder, indexing slows down, seemingly because Splunk first has to work through everything the tailing processor is watching. Are my thoughts correct?

Thank you.

1 Solution

lguinn2
Legend

Personally, I try to keep the number of monitor inputs to 5000 or fewer. You can see how many you have with the CLI command

splunk list monitor

Run this command wherever you are collecting the data - most likely on a forwarder.
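
For a rough count, you can pipe that output through wc. This is a quick sketch, assuming a *nix host with $SPLUNK_HOME pointing at your Splunk install - the output usually includes a few header lines, so treat the number as an approximation:

cd $SPLUNK_HOME/bin
./splunk list monitor | wc -l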

Remember that Splunk will continue to monitor files even after they have been rotated or are no longer being written to. (How would Splunk know that they won't be used again?) So you really should set up a rotation policy for your logs that deletes them or moves them to another directory once they have aged. This keeps Splunk focused on the live log files; I have seen this technique make a very large performance improvement in multiple cases. I believe the slowdown occurs on the instance where inputs.conf lives - at the data collection point. It would cause the data to show up in the indexes more slowly, but the fix is probably at the forwarder. Hopefully you are not monitoring that many files on the indexer itself!

I haven't re-checked this number in a few years, so it may be different now, if monitor inputs have scaled like the other parts of Splunk. But you are on the right track - you want to watch the monitor inputs.

Finally, be sure that your indexer meets the specifications in the Installation and Capacity Planning manuals. If not, things like this can have a larger negative impact than they should.

khourihan_splunk
Splunk Employee

Just to pile on: make sure you check your ulimits. That setting (*nix only) determines the maximum number of files you can monitor.
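
On Linux, one quick way to sanity-check this (a sketch - `pgrep -o splunkd` just grabs the oldest splunkd PID) is to compare your shell's limit with the limit the running splunkd process actually got:

ulimit -n
cat /proc/$(pgrep -o splunkd)/limits | grep -i 'open files'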

jwquah
Path Finder

Thanks for the info!

Well, that instance actually hosts a set of 'dev' apps that our team uses to test and develop before deploying them out. As such, it is just a 'dummy' instance that acts as a search head, indexer, and forwarder. The slowness is actually in indexing; for example, if Splunk is monitoring directoryX and new files arrive at 7 AM, they get indexed much later rather than immediately.

How would a rotation policy help? Does that mean that if files are older than, say, 2 months, it'd be good to move them to an archive location so that Splunk no longer 'sees' them and as such no longer monitors them?

If I do a simple line count of the output of splunk list monitor, I end up with 13271, so it's probably safe to say that this instance is monitoring over 10,000 files. The ulimit is set to 4096 - is that enough, or should it be at least doubled or more to get close to the number of inputs?

0 Karma

lguinn2
Legend

Here are the usual settings that are recommended for ulimits:

ulimit -a (displays the current settings)
ulimit -c (core file size): 1073741824 (1 GB) or unlimited
ulimit -n (open files): 48 x default (48 x 1024 = 49,152) or 65536
ulimit -u (max user processes): 12 x default (12 x 1024 = 12,288) or 258048
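
If you want these to persist for the account that runs Splunk, one common approach is entries in /etc/security/limits.conf. The sketch below assumes the user is named "splunk", and the numbers are just the example values above - adjust both to your system:

splunk  soft  nofile  65536
splunk  hard  nofile  65536
splunk  soft  nproc   258048
splunk  hard  nproc   258048

Then restart Splunk from a fresh login session so the new limits actually apply to splunkd.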

Also, turn OFF Transparent Huge Pages (THP) as it may degrade performance on some Linux kernels.
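
To check and (temporarily) disable THP on most current Linux kernels - a sketch; the sysfs path differs on some older RHEL kernels, and the change does not survive a reboot unless you also set it at boot time:

cat /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag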

@khourihan_splunk is right - and for any indexer, this is probably more important than anything else.

Oh, and in the example where it says "default", I mean whatever the default value for that ulimit setting is on your system.

lguinn2
Legend

YES: "Does that mean that if files are older than, say, 2 months, it'd be good to move them to an archive location so that Splunk no longer 'sees' them and as such no longer monitors them?"

That's it exactly. You could use the Linux logrotate utility - or anything else that you have. You should keep at least the current log plus the prior log, because it is possible (though unlikely) that Splunk could still be reading a file while it is rotated. After that, you could move the old logs to somewhere like /var/old_log, or delete them, or whatever...
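
If logrotate works for you, a minimal stanza might look like the sketch below. The path and schedule are hypothetical, and olddir should sit outside the directory that Splunk is monitoring so the rotated copies drop out of the tailing processor's view:

/var/log/myapp/*.log {
    weekly
    rotate 8
    olddir /var/old_log
    missingok
    notifempty
}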

0 Karma

jwquah
Path Finder

OK, cool! Thanks for the leads - I poked around in splunkd.log and I'm not seeing any ulimit errors like the ones at http://docs.splunk.com/Documentation/Splunk/6.2.4/Troubleshooting/ulimitErrors. The logs, however, do show: INFO TailingProcessor - File descriptor cache is full (100), trimming...

If so, am I correct with the observations below?
- The issue is due to the number of files being monitored by the TailingProcessor
- We don't see any ulimit errors, so we should be alright there
- No logrotate. Most files are actually CSV dumps, so doing a logrotate isn't really possible, but some kind of archiving might be. How do we usually go about 'rotating' CSV reports?
- Would having multiple sourcetypes cause it? I noticed our users left it as the default for CSV and there are quite a number of different sourcetypes
- We could try increasing max_fd, which defaults to 100

[inputproc]

max_fd = <integer>
* Maximum number of file descriptors that Splunk will keep open, to capture any trailing data from files that are written to very slowly.
* Defaults to 100.
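
If we go down that route, I assume the override would look something like this (that stanza appears to come from limits.conf, and 256 is just a number to start with; splunkd would need a restart afterwards):

# $SPLUNK_HOME/etc/system/local/limits.conf
[inputproc]
max_fd = 256
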
0 Karma

lguinn2
Legend

For rotating your log files, just write a script that looks for files with a mod time older than one week - and move the files to another directory.
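
For example, something along these lines - the paths are made up, and the destination should sit outside whatever path Splunk monitors:

#!/bin/sh
# Move CSV dumps that haven't been modified in 7 days out of the monitored directory
SRC=/data/csv_dumps
DEST=/data/csv_archive
mkdir -p "$DEST"
find "$SRC" -maxdepth 1 -type f -name '*.csv' -mtime +7 -exec mv {} "$DEST" \;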

Number of sourcetypes shouldn't make a difference in indexing performance. But if you are loading the same data week after week, just with a different file name - it will be quite useful to have the same sourcetype assigned to the data from the different files. Then you could write reports, etc. based on the sourcetype and re-use them.

If you up the max_fd, you should also up the ulimit. In fact, I would go ahead and set all the ulimits as recommended - you may not be getting errors, but it might still help performance.

0 Karma

jwquah
Path Finder

Thanks lguinn2, that sounds about right, as we're already seeing improvements from this. We'll continue testing to see what limit works best for us! 🙂

0 Karma