Getting Data In

Why is the monitoring of 350GB of gzip files via a batch job running slow, with the forwarder sending data at only 1MB/s?

JScordo
Path Finder

I am trying to monitor, via a batch job, approximately 300 gzip files. Each file is about 4GB uncompressed, and the whole set is about 350GB gzipped. The forwarder handling all of this is running at only 10% CPU on a single core of an 8-core box, with 25 IOPS and 10% memory, and it is forwarding the data at only about 1MB/s.

This is very slow, and I am wondering why it is using only 1 core on this box instead of using more resources to perform these batch jobs. I have already set [thruput] to 0, but it has made no difference.

Please let me know what, if anything, I can do to speed up this process. It has been running for 28 hours and has processed only about 120GB of those files.
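
As a rough check, the figures in the post are consistent with each other. This is a minimal sketch of the arithmetic. It assumes the 120GB and 350GB figures are both gzipped sizes and that 1GB = 1024MB; the post states neither.

```python
# Effective throughput implied by the numbers in the post.
# Assumption: 120 GB and 350 GB both refer to compressed (gzipped) size.
processed_gb = 120
total_gb = 350
elapsed_hours = 28

# Average rate so far, in MB/s (assuming 1 GB = 1024 MB).
rate_mb_s = processed_gb * 1024 / (elapsed_hours * 3600)

# Time left if the same average rate holds for the remaining files.
remaining_hours = (total_gb - processed_gb) / (processed_gb / elapsed_hours)

print(f"effective rate: {rate_mb_s:.2f} MB/s")
print(f"estimated time remaining: {remaining_hours:.1f} hours")
```

Under those assumptions the average works out to a little over 1MB/s, which matches the forwarding rate reported above. The remaining files would take roughly two more days at the same pace.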

jplumsdaine22
Influencer

Hard to know what the exact problem is, but I would start by checking the splunkd logs for the forwarder and indexer to see if you have hit your limits.

Also, when you say you have [thruput]=0, do you mean your limits.conf is

[thruput]
maxKBps = 0

?

JScordo
Path Finder

Sorry, yes. maxKBps = 0 is there.

jplumsdaine22
Influencer

Cool. So check those splunkd logs (hopefully you have them available in index=_internal). Also I don't suppose you have any licensing issues processing that much data?

JScordo
Path Finder

No licensing issue; the license is pretty large. But this is the issue I just noticed in the _internal index:

IndexConfig - Max bucket size is larger than destination path size limit. Please check your index configuration. idx=MyIndex; bucket size in (from maxDataSize) 10240 MB, homePath.maxDataSizeMB=10000, coldPath.maxDataSizeMB=0
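
Read literally, the warning says that a single bucket (10240 MB) can be larger than the entire homePath allowance (10000 MB). Here is a minimal sketch that pulls the numbers out of the quoted message and compares them. Treating a limit of 0 as "no limit" is an assumption about how indexes.conf handles these settings, and `exceeds` is a hypothetical helper name used only for illustration.

```python
import re

# The warning text as quoted in the thread.
message = (
    "IndexConfig - Max bucket size is larger than destination path size limit. "
    "Please check your index configuration. idx=MyIndex; "
    "bucket size in (from maxDataSize) 10240 MB, "
    "homePath.maxDataSizeMB=10000, coldPath.maxDataSizeMB=0"
)

# Extract the three sizes (in MB) from the message.
bucket_mb = int(re.search(r"\(from maxDataSize\) (\d+) MB", message).group(1))
home_mb = int(re.search(r"homePath\.maxDataSizeMB=(\d+)", message).group(1))
cold_mb = int(re.search(r"coldPath\.maxDataSizeMB=(\d+)", message).group(1))


def exceeds(bucket, limit):
    # Assumption: a limit of 0 means "unlimited", so it is never exceeded.
    return limit != 0 and bucket > limit


print("bucket exceeds homePath limit:", exceeds(bucket_mb, home_mb))
print("bucket exceeds coldPath limit:", exceeds(bucket_mb, cold_mb))
```

Whether this mismatch has anything to do with the slow forwarding is not established in the thread. The sketch only shows what the warning is comparing.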

jplumsdaine22
Influencer

Have a look at the config options for maxDataSize (http://docs.splunk.com/Documentation/Splunk/6.2.0/admin/Indexesconf). You should have it set to auto_high_volume. I'm not 100% sure that's the heart of the issue, but it is probably something you should change.

Have a read of this too: http://wiki.splunk.com/Community:Troubleshooting_Monitor_Inputs

JScordo
Path Finder

Since my environment is SplunkCloud, I would need to open a ticket to check and change that configuration, wouldn't I?

jplumsdaine22
Influencer

I'm not sure of the differences with SplunkCloud. I would definitely get in touch with Splunk.
