Re: How to make Splunk process fast around 3GB of data for continuous monitor file and directory option

sonunarula · 10-17-2012

With Splunk, I am trying to process around 3GB of data using the "continuous monitor directory" option: files of about 500KB each (approx. 5000-7000 files) are placed into a directory, and Splunk monitors that directory.
Splunk takes around 12-15 minutes to index this amount of data on an Intel Core 2 Duo CPU with 3GB of RAM.
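As a back-of-the-envelope check, the figures above (about 3GB in 12-15 minutes) can be turned into an hourly indexing rate. This is a minimal Python sketch; `gb_per_hour` is a hypothetical helper for illustration, not part of Splunk.

```python
def gb_per_hour(gigabytes: float, minutes: float) -> float:
    """Hypothetical helper: turn 'X GB indexed in Y minutes' into GB/hour."""
    return gigabytes / minutes * 60


# Figures from the post: ~3GB indexed in 12-15 minutes.
slowest = gb_per_hour(3, 15)
fastest = gb_per_hour(3, 12)
print(f"observed rate: {slowest:.0f}-{fastest:.0f} GB/hour")  # observed rate: 12-15 GB/hour
```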

Is there any way to make it faster? Would a Splunk forwarder help with this?

kristian_kolb · 10-18-2012

From the hardware specs, that looks more like a 2007 laptop than a modern server. If so, you probably have rather slow disks, which are likely the culprit here.

You could have a look at a [batch] input instead of [monitor] in inputs.conf. Then files will be deleted once indexed, and Splunk will not have to keep track of (historical) files that are never updated.
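A minimal batch input might look like the stanza printed by the Python sketch below (configparser is used only to keep the example self-contained; normally you would just edit inputs.conf by hand). The directory `C:\splunk_drop` is a hypothetical placeholder. `move_policy = sinkhole` is required for [batch] inputs and is what makes Splunk delete each file after indexing it, so only point it at files you can afford to lose.

```python
import configparser
import io

# Hypothetical folder; replace with the directory Splunk should ingest.
DROP_DIR = r"C:\splunk_drop"

config = configparser.ConfigParser(interpolation=None)
config.optionxform = str  # keep setting names exactly as written
# A [batch://...] stanza reads each file once; with sinkhole, the file is deleted afterwards.
config[f"batch://{DROP_DIR}"] = {"move_policy": "sinkhole"}

buf = io.StringIO()
config.write(buf)
print(buf.getvalue())
```

Running it prints `[batch://C:\splunk_drop]` followed by `move_policy = sinkhole`; those two lines are what would go into a local inputs.conf.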

See the "Getting data in" section of the docs.

http://docs.splunk.com/Documentation/Splunk/5.0/Data/Monitorfilesanddirectories

UPDATE:

You'll need to describe your setup a bit better. I was assuming that you had a laptop with a full Splunk installation, and that the files you index are collected from the local file system.

A forwarder is a component that is installed on the machine where the logs are stored/generated, and which then sends the logs to a Splunk indexer.

A Universal Forwarder will not improve processing speed, but a Heavy Forwarder could, if there are tasks that can be offloaded from the indexer. What a Heavy Forwarder can take over is the work done in the parsing phase (line breaking, timestamping, index-time transforms, etc.). But that mostly depends on whether your current configuration requires many such tasks to be performed by the indexer.

However, I still think that the hardware is the issue here. If you are running Splunk on an old laptop, you cannot expect stellar performance.

UPDATE2:

Well, you can't do performance tests when the test rig is so much less powerful than what would be required in a production scenario. If you're talking about peaks or scheduled batches, you could probably make do with 1 powerful machine.

If you're going to have a sustained rate of log data at ~30GB/hour, you're definitely going to need better hardware. You'll probably want 4-6 (or more) indexers working in parallel, since you'll most likely also want to search the indexed data.
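For scale: ~30GB/hour corresponds to about 3GB every 6 minutes, sustained around the clock. A quick Python sketch of what that means per day and per indexer, assuming the load is spread evenly across the indexers (an idealization; real deployments rarely balance that perfectly):

```python
SUSTAINED_GB_PER_HOUR = 30  # ~30GB/hour, as quoted in the post

daily_gb = SUSTAINED_GB_PER_HOUR * 24
print(f"daily volume: {daily_gb} GB/day")  # daily volume: 720 GB/day

# Assumption: data is distributed evenly over the indexers.
for indexers in (4, 5, 6):
    per_indexer = daily_gb / indexers
    print(f"{indexers} indexers: {per_indexer:.0f} GB/day each")
```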

There are no hard limits on how much data you can index per hour, since this is just a matter of how much money you can throw at the problem :-). There are organizations using Splunk to index dozens of TB of log data per day.

For more information regarding recommended hardware, please see:

http://docs.splunk.com/Documentation/Splunk/5.0/Installation/CapacityplanningforalargerSplunkdeploym...

Hope this helps,

Kristian

kristian_kolb · 10-18-2012

see update above /k

sonunarula · 10-18-2012

The actual requirement is to check whether Splunk can process 5000-7000 files, each approx. 500KB in size, in 5-7 minutes. Then we have to implement a business case.
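To compare this requirement with the earlier test, here is the arithmetic as a small Python sketch. It assumes decimal units (1GB = 1,000,000KB) and treats 500KB as the exact file size; `required_gb_per_hour` is a hypothetical helper, and the 12-15GB/hour baseline comes from the 3GB in 12-15 minutes reported in the question.

```python
FILE_SIZE_KB = 500  # approx. size of each file, from the post


def required_gb_per_hour(files: int, minutes: float) -> float:
    """Hypothetical helper: GB/hour needed to index `files` files within `minutes`."""
    total_gb = files * FILE_SIZE_KB / 1_000_000  # decimal units: 1GB = 10^6 KB
    return total_gb / minutes * 60


easiest = required_gb_per_hour(5000, 7)  # fewest files, most time
hardest = required_gb_per_hour(7000, 5)  # most files, least time
print(f"required rate: {easiest:.1f}-{hardest:.1f} GB/hour")

# Baseline from the question: 3GB in 12-15 minutes = 12-15 GB/hour.
OBSERVED_MIN, OBSERVED_MAX = 12, 15
print(f"speed-up needed: {easiest / OBSERVED_MAX:.1f}x-{hardest / OBSERVED_MIN:.1f}x")
```

So even the easiest end of the requirement needs roughly 1.4x the speed seen on the test machine, and the hardest end about 3.5x.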

For this I am using a Windows XP machine with the Splunk 4.3.2 trial version, and the files being processed are on the local system.

Can you please suggest what system configuration is required to achieve this scenario, or whether any Splunk configuration could work for this?

kristian_kolb · 10-18-2012

see update above /k

sonunarula · 10-18-2012

There is no improvement in processing speed with the batch option either. Can a forwarder be helpful here? Please advise.
