Will splitting our data into separate indexes provide better performance of real-time searches?

campbellj1977
Explorer

We are currently running into issues where our indexers become overloaded and cannot keep up with all of the search and indexing work when many real-time searches are running. We believe this is partly caused by real-time searches scanning raw data before it is indexed, with all of our data combined in one place. One index currently holds about 70% of all our incoming data. Logic tells me that smaller indexes should mean quicker searches and less incoming raw data to be searched. I tested this and it seems to be true, although I don't really trust the results from my lab because it shares storage and CPU with other servers running as VMs.

I was hoping that someone from Splunk or the community could confirm my findings. My test setup is listed below.
Eventgen to create sample logs: 10 MB over a 5-minute span
Postfix tcpdump written to a file and then sent to Splunk: 5 MB in about 5 minutes

With all the data going into the same index, searching for key values from the postfix tcpdump and/or eventgen data took about 30% less time than when the data was split into 2 separate indexes.
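A comparison like this is easier to trust when each search is timed several times and the medians are compared, since a shared lab can produce noisy single runs. A minimal sketch in Python; the function name `relative_change` and the timings are hypothetical placeholders, not measurements from this test:

```python
from statistics import median

def relative_change(runs_a, runs_b):
    """Fractional change in median search time of layout B relative to layout A.

    A negative result means layout B was faster.
    """
    base = median(runs_a)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return (median(runs_b) - base) / base

# Hypothetical search durations in seconds (made up for illustration).
layout_a = [10.0, 12.0, 11.0]
layout_b = [7.0, 8.0, 9.0]

change = relative_change(layout_a, layout_b)
print(f"median search time changed by {change:+.0%}")
```

Using the median rather than a single run reduces the effect of one slow search caused by other VMs competing for the same storage.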

So, to sum up: will splitting our indexes improve real-time search performance and reduce processing time?

MuS
SplunkTrust

Hi campbellj1977,

Well, this is tricky to answer for sure, because your findings are most likely true for ordinary searches, reports, alerts, and dashboards, but not for real-time searches. This is because real-time searches search through events as they stream into Splunk Enterprise for indexing. When you kick off a real-time search, Splunk Enterprise scans incoming events that contain index-time fields that indicate they could be a match for your search.

So, it is likely that you will not get a performance benefit out of splitting. But how about adding a second indexer to improve overall performance, or troubleshooting what exactly gets your indexer overloaded or blocked? If you're on Splunk 6.2, you can use the internal Distributed Management Console for this: http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/ConfiguretheMonitoringConsole#What_is_the_di...
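One way to see which queue is backing up is to count `blocked=true` queue entries in the indexer's metrics.log. A rough sketch in Python, assuming the usual comma-separated `key=value` layout of `group=queue` lines; the function name and the sample lines are illustrative, not taken from the poster's system:

```python
import re
from collections import Counter

# Matches key=value pairs such as "name=parsingqueue" or "blocked=true".
KEY_VALUE = re.compile(r"(\w+)=([^,\s]+)")

def blocked_queue_counts(lines):
    """Count how often each queue is reported as blocked."""
    counts = Counter()
    for line in lines:
        fields = dict(KEY_VALUE.findall(line))
        if fields.get("group") == "queue" and fields.get("blocked") == "true":
            counts[fields.get("name", "unknown")] += 1
    return counts

# Illustrative lines in the assumed metrics.log format.
sample_lines = [
    "05-01-2015 10:00:00.000 +0000 INFO Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=512, current_size_kb=511",
    "05-01-2015 10:00:00.000 +0000 INFO Metrics - group=queue, name=indexqueue, max_size_kb=500, current_size_kb=0",
    "05-01-2015 10:00:30.000 +0000 INFO Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=512, current_size_kb=511",
    "05-01-2015 10:00:30.000 +0000 INFO Metrics - group=thruput, name=index_thruput, instantaneous_kbps=12.5",
]

counts = blocked_queue_counts(sample_lines)
for queue_name, blocked in counts.most_common():
    print(queue_name, blocked)
```

A queue that is blocked much more often than the others is a good place to start troubleshooting.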

Take a look at the pipeline image below to get an overview of the Splunk input pipeline:

[Image: Splunk input pipeline diagram]

Regarding your comment about the forwarder: with a universal forwarder, the data skips only part of the parsing queue and still goes through all the other queues. If you're using a heavy forwarder in front, the data skips the parsing, merging, and typing stages, and the part of the indexerPipe up to tcpoutput. This would only help if your indexer has blocked parsing queues, because the heavy forwarder would take over the parsing, merging, and typing load.

Your real-time search will still pick up the events right at the stage where they stream into the indexer. But it may benefit from the indexer having less load when you're using the heavy forwarder in front.
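To make the forwarder comparison concrete, here is a deliberately simplified model in Python of which stages stay on the indexer for each forwarder type. The stage list, the names, and the mapping are assumptions that only restate the description in this answer; this is not how Splunk represents its pipeline internally:

```python
# Simplified, hypothetical list of the indexer-side stages named in this thread.
INDEXER_STAGES = ["parsing", "merging", "typing", "indexing"]

# Stages each forwarder type takes over, following the description above.
# Assumption: the universal forwarder's partial parsing is ignored here,
# so it offloads no complete stage in this model.
OFFLOADED_BY = {
    "none": set(),
    "universal": set(),
    "heavy": {"parsing", "merging", "typing"},
}

def stages_left_on_indexer(forwarder_type):
    """Return the stages the indexer still runs for the given forwarder type."""
    offloaded = OFFLOADED_BY[forwarder_type]
    return [stage for stage in INDEXER_STAGES if stage not in offloaded]

for forwarder_type in ("none", "universal", "heavy"):
    print(forwarder_type, stages_left_on_indexer(forwarder_type))
```

In this model only the heavy forwarder changes what the indexer has to do, which matches the point that it helps mainly when the parsing queues are the bottleneck.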

Hope this helps ...

cheers, MuS

campbellj1977
Explorer

What if data is already partially cooked from an intermediate forwarder?

MuS
SplunkTrust

see the updated answer....
