
How to make Splunk process around 3GB of data faster with the continuous monitor file and directory option

sonunarula
New Member

I am trying to get Splunk to index around 3GB of data using the "continuously monitor a directory" option: files (approx. 5000-7000 of them, about 500KB each) are placed into a directory, and Splunk monitors this directory.
Splunk takes around 12-15 minutes to index this amount of data on an Intel Core 2 Duo CPU with 3GB of RAM.
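For reference, those numbers work out to a rough average indexing rate. A minimal sketch, assuming decimal units and taking "around 3GB" as exactly 3000 MB:

```python
def throughput_mb_per_s(total_mb: float, minutes: float) -> float:
    """Average indexing rate in MB/s for `total_mb` indexed over `minutes`."""
    return total_mb / (minutes * 60)


TOTAL_MB = 3000  # "around 3GB", taken as 3000 MB (assumption: decimal units)

for minutes in (12, 15):
    print(f"{minutes} min -> {throughput_mb_per_s(TOTAL_MB, minutes):.2f} MB/s")
```

This prints about 4.17 MB/s for the 12-minute run and 3.33 MB/s for the 15-minute run.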

Is there any way to make it faster? Would a Splunk forwarder help here?

kristian_kolb
Ultra Champion

From the hardware specs, that looks more like a 2007 laptop than a modern server. If so, you probably have rather slow disks, which are likely the culprit here.

You could try a [batch] input instead of [monitor] in inputs.conf. With [batch], files are deleted once they have been indexed, so Splunk does not have to keep track of (historical) files that are never updated.
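A minimal sketch of what such a stanza could look like, rendered with Python's standard configparser so the result is easy to check. The directory path and sourcetype below are hypothetical placeholders; move_policy = sinkhole is the setting that tells a batch input to delete each file after indexing it.

```python
import configparser
import io


def batch_stanza(path: str, sourcetype: str) -> str:
    """Render an inputs.conf [batch] stanza for `path` as text."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names as written (default lowercases them)
    config[f"batch://{path}"] = {
        # Delete each file once it has been indexed.
        "move_policy": "sinkhole",
        "sourcetype": sourcetype,
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


# Hypothetical directory and sourcetype; substitute your own.
print(batch_stanza(r"C:\splunk_input", "my_sourcetype"))
```

The printed text is what would go into inputs.conf in place of the existing [monitor://...] stanza for that directory.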

See the "Getting data in" section of the docs.

http://docs.splunk.com/Documentation/Splunk/5.0/Data/Monitorfilesanddirectories


UPDATE:

You'll need to describe your setup a bit better. I was assuming that you had a laptop with a full Splunk installation, and that the files you index are collected from the local file system.

A forwarder is a component that is installed on the machine where the logs are stored/generated; it then sends the logs to a Splunk indexer.

A Universal Forwarder will not improve processing speed, but a Heavy Forwarder could, if there are tasks that can be offloaded from the indexer. What a Heavy Forwarder can do for you is the work done in the parsing phase (line breaking, timestamping, index-time transforms, etc.). But that mostly depends on whether your current configuration requires the indexer to perform many such tasks.

However, I still think that the hardware is the issue here. If you are running Splunk on an old laptop, you cannot expect stellar performance.


UPDATE2:

Well, you can't do meaningful performance tests when the test rig is so much less powerful than what would be required in a production scenario. If you're talking about peaks or scheduled batches, you could probably make do with one powerful machine.

If you're going to have a sustained rate of log data at ~30GB/hour, you're definitely going to need better hardware. You'll probably want 4-6 (or more) indexers working in parallel, since you'll most likely also want to search the indexed data.
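To put ~30GB/hour in perspective, here is a quick sketch of the daily volume and how it would divide across 4-6 indexers. The even split is an assumption; real load balancing only approximates it.

```python
def per_indexer_gb_per_day(gb_per_hour: float, indexers: int) -> float:
    """Daily volume each indexer handles if the load is split evenly (assumption)."""
    return gb_per_hour * 24 / indexers


RATE_GB_PER_HOUR = 30  # sustained rate discussed above

print(f"total: {RATE_GB_PER_HOUR * 24} GB/day")
for n in (4, 5, 6):
    print(f"{n} indexers -> {per_indexer_gb_per_day(RATE_GB_PER_HOUR, n):.0f} GB/day each")
```

That is 720 GB/day in total, or roughly 120-180 GB/day per indexer, before accounting for the search load on the same machines.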

There are no hard limits on how much data you can index per hour, since this is just a matter of how much money you can throw at the problem :-). There are organizations using Splunk to index dozens of TB of log data per day.

For more information regarding recommended hardware, please see:

http://docs.splunk.com/Documentation/Splunk/5.0/Installation/CapacityplanningforalargerSplunkdeploym...

Hope this helps,

Kristian

kristian_kolb
Ultra Champion

see update above /k

sonunarula
New Member

The actual requirement is to check whether Splunk can process 5000-7000 files of approx. 500KB each in 5-7 minutes. After that, we have to implement a business case.
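A quick sketch of the throughput this target implies, assuming 500KB per file and decimal units (1 MB = 1000 KB):

```python
FILE_KB = 500  # approximate file size from the requirement


def required_mb_per_s(files: int, minutes: float, file_kb: float = FILE_KB) -> float:
    """Rate in MB/s needed to index `files` files of `file_kb` KB in `minutes`."""
    total_mb = files * file_kb / 1000
    return total_mb / (minutes * 60)


best = required_mb_per_s(5000, 7)   # fewest files, most time
worst = required_mb_per_s(7000, 5)  # most files, least time
print(f"required: {best:.2f} to {worst:.2f} MB/s")
```

That is roughly 6 to 12 MB/s, compared with about 3.3 to 4.2 MB/s implied by indexing 3GB in 12-15 minutes, so the test machine would need to be roughly 1.4x to 3.5x faster.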

For this I am using a Windows XP machine with the Splunk 4.3.2 trial version, and the files being processed are on the local system.

Can you please suggest what system configuration is required for this scenario, or whether any Splunk configuration could achieve it?

kristian_kolb
Ultra Champion

see update above /k

sonunarula
New Member

The batch option did not improve processing speed either. Could a forwarder help here? Please advise.