I am trying to monitor, via a batch job, approximately 300 gzip files; each file is about 4GB uncompressed, roughly 350GB gzipped in total. The forwarder all of this is running on is only using 10% CPU on a single core of an 8-core box, 25 IOPS, and 10% memory, and is only forwarding the data at about 1MB/s.
This is substantially slower than expected, and I am questioning why it is only using one core on this box instead of more of the available resources to work through these batch jobs. I have already set [thruput] to 0, but that has not seemed to help.
Please let me know what, if anything, I can do to speed up this process. It has been running for 28 hours and has only processed about 120GB of those files.
Hard to know what the exact problem is, but I would start by checking the splunkd logs on both the forwarder and the indexer to see if you have hit any limits.
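For example, a search along these lines should surface warnings and errors reported by the forwarder (host=your_forwarder_host is just a placeholder, adjust to your environment):

index=_internal source=*splunkd.log* host=your_forwarder_host (ERROR OR WARN)
| stats count by component

That's only a starting point, but it narrows down which component is complaining.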
Also, when you say you have [thruput] = 0, do you mean your limits.conf is
[thruput]
maxKBps = 0
?
Sorry, yes. maxKBps = 0 is there.
Cool. So check those splunkd logs (hopefully you have them available in index=_internal). Also, I don't suppose you have any licensing issues processing that much data?
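Two other checks that might help (both searches are sketches, and host=your_forwarder_host is a placeholder): the first looks at the forwarder's own throughput metrics, the second at whether any of its queues are filling up, which would point to a bottleneck downstream of the forwarder rather than in reading the files:

index=_internal source=*metrics.log* host=your_forwarder_host group=thruput name=thruput
| timechart avg(instantaneous_kbps)

index=_internal source=*metrics.log* host=your_forwarder_host group=queue
| timechart perc95(current_size) by name

If the queues sit near their maximum size, the forwarder is being held back rather than failing to read the data.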
No licensing issue; the license is pretty large. But this is what I just noticed in the _internal index:
IndexConfig - Max bucket size is larger than destination path size limit. Please check your index configuration. idx=MyIndex; bucket size in (from maxDataSize) 10240 MB, homePath.maxDataSizeMB=10000, coldPath.maxDataSizeMB=0
Have a look at the config options for maxDataSize (http://docs.splunk.com/Documentation/Splunk/6.2.0/admin/Indexesconf). You should have it set to auto_high_volume. Not 100% sure if that's the heart of the issue, but it is probably something you should change.
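As a rough sketch only (the stanza name MyIndex comes from your error message, and the path limits here are assumptions, not a recommendation), the warning goes away once the bucket size allowed by maxDataSize fits within homePath.maxDataSizeMB, e.g. in indexes.conf:

[MyIndex]
maxDataSize = auto_high_volume
# 0 means no per-path size cap; alternatively set a value of at least 10240
homePath.maxDataSizeMB = 0
coldPath.maxDataSizeMB = 0

Double-check the docs for your version before changing anything like this.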
Have a read of this too: http://wiki.splunk.com/Community:Troubleshooting_Monitor_Inputs
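Along the lines of that wiki page, you can also ask the forwarder itself what its tailing processor thinks about each monitored file. Assuming you have CLI access to the forwarder (you may not on a managed host), running

./splunk list inputstatus

from the Splunk bin directory shows, per file, whether it has been fully read, is still being read, or was skipped.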
Since my environment is SplunkCloud, I would need to open a ticket to check and change that configuration, wouldn't I?
I'm not sure of the differences with SplunkCloud. I would definitely get in touch with Splunk.