Re: Distributed Search Question.

mehal · 10-14-2012

Hello All,

I have a question about distributed search. I have one dedicated search head that gathers data from multiple indexers. There are 8 indexers, which between them have already indexed data from 1987 to 2012. To be precise, the 1st indexer holds data from 1987 to 1990, the 2nd holds data from 1991 to 1993, and so on, and indexer 8 holds data from 2008 to 2012. I want to query the entire time range. When I run the query, it searches backwards from 2012 to 1987, but processing is very slow because it only gathers data from indexer 8, then from indexer 7, and so on down to indexer 1. The query I am using takes a long time (around an hour).

Is there any way to search all 8 indexers simultaneously and have the search head combine the results, reducing the overall search time?

kristian_kolb · 10-15-2012

If you've set up your indexers as search peers, then you're already doing simultaneous searches. But (and somebody wiser than me should correct me if I'm wrong) I think the search head will take a look at your search query, which could look like the one below, and say:

index=horses earliest=-15y

Which are my search peers?
Ask them for the most recent time slice of data (since results are presented newest-first).
Once data has (at least partially) come in, move on to the previous time slice.

So in your case the search head asks all 8 indexers for the last hour's worth of horses; indexers 1-7 will say "NO HORSES HERE" and indexer 8 will send its horses. Then it asks them all for the hour preceding that, and again 1-7 will say "NO HORSES HERE" and indexer 8 will send. This goes on until you've gone back to the data for 2007, at which point indexer 7 will give you horses while 1-6 and 8 say "NO HORSES HERE", and so on.
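To make that effect concrete, here is a toy Python model. It is not Splunk code and calls no Splunk API; it only counts how many indexers have anything to contribute to each time slice. Two things are assumptions made for illustration: the slices are one year wide rather than one hour, and the years are split into contiguous runs as evenly as possible, since the thread only gives the exact ranges for indexers 1, 2 and 8.

```python
# Toy model only: this is not Splunk code and does not call any Splunk API.
YEARS = list(range(1987, 2013))  # 1987..2012 inclusive
N_INDEXERS = 8


def partition_by_time(years, n):
    """Give each indexer one contiguous run of years (hypothetical even split)."""
    layout = {idx: set() for idx in range(n)}
    for i, year in enumerate(sorted(years)):
        layout[i * n // len(years)].add(year)
    return layout


def spread_evenly(years, n):
    """Give every indexer a share of every year."""
    return {idx: set(years) for idx in range(n)}


def busy_indexers_per_slice(layout, years):
    """For each time slice, newest first, count the indexers that hold data for it."""
    return [
        sum(1 for held in layout.values() if year in held)
        for year in sorted(years, reverse=True)
    ]


if __name__ == "__main__":
    by_time = busy_indexers_per_slice(partition_by_time(YEARS, N_INDEXERS), YEARS)
    spread = busy_indexers_per_slice(spread_evenly(YEARS, N_INDEXERS), YEARS)
    print("time-partitioned, busy indexers per slice:", by_time)
    print("evenly spread, busy indexers per slice:  ", spread)
```

In this model the time-partitioned layout never has more than one indexer working on a slice, while the evenly spread layout keeps all 8 busy on every slice.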

This is a design feature, since in most cases you want to get to the data fast, and normally, the most recent data is the most interesting data.

Also, if the search head had to deal with all the data at once, i.e. if you could tell the indexers to

"Give me YOUR most recent hour's worth of horses"

instead of

"Give me your horses for -n to -(n+1) hours"

it would probably consume a lot more system resources on the search head (depending on the actual query, and what type of reporting commands you use in your search pipeline).

In hindsight it would have been (much) better to spread the data from 1987-2012 across all 8 indexers, since they would then all be sending you horses simultaneously, whereas you now (effectively) only use one of them at a time.

If you have the time and license space, or if you need to run these searches often, you could consider clearing out your index and re-indexing the logs, making sure that all indexers get an equal share of the data.
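One simple way to picture "an equal share" is to deal the source files out to the indexers round-robin, so that each indexer ends up holding a slice of every period. The Python sketch below only illustrates the idea; it is not how Splunk routes data, and the file names and indexer names are hypothetical.

```python
from itertools import cycle


def assign_round_robin(sources, indexers):
    """Deal the sources out to the indexers one at a time, like cards."""
    assignment = {name: [] for name in indexers}
    for source, name in zip(sources, cycle(indexers)):
        assignment[name].append(source)
    return assignment


if __name__ == "__main__":
    # Hypothetical names: one archive per month, 1987 through 2012.
    sources = [f"logs-{y}-{m:02d}.gz" for y in range(1987, 2013) for m in range(1, 13)]
    indexers = [f"indexer{i}" for i in range(1, 9)]
    for name, files in assign_round_robin(sources, indexers).items():
        print(name, len(files), "files, from", files[0], "to", files[-1])
```

With the sources listed in chronological order, every indexer receives files from the start of the range through to the end, so any time slice of a search has data on all of them.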

Hope this helps,

Kristian

PS. Sorry about the horse thing, but I got tired of writing 'data'.

mehal · 10-15-2012

Hi Kristian.kolb,

Your point makes sense. I have now almost finished distributing the data across all the indexers in random fashion. I will re-index the data on all 8 indexers, and hopefully that will sort out my problem.

Thanks for the big big help.

Mehal
