Deployment Architecture

Backfill across multiple indexers. Is it possible? Is it necessary?

deeboh
Path Finder

Hey folks. I'll put the disclaimer out up front that I have a convoluted setup, but I'm putting this question to you anyway. Is it possible or necessary to backfill data across multiple search heads? Let me describe my problem. I apologize in advance if I'm using incorrect Splunk jargon.

I have the fun situation of keeping multi-year Splunk'd data across a primary datacenter and a disaster recovery (DR) site. Let's talk about the primary site. The primary site is being upgraded and there are two proposals: mine, which involves 4 bare-metal servers, and IT's, which involves 15 VMs (ugh). Either way, I have my current data, which includes a ton of summary index data. Assuming my 3-indexer solution wins, I will set these up with a single search head and configure the indexers as its search peers. I don't suppose there is a simple way to copy a single block of summary data and logically split it evenly three ways to take advantage of distributed search performance gains, so I intend to scrub my index data and backfill it in my new setup.

So here's the question and the dilemma. If the IT solution wins out, there will be 3 search heads and 12 indexers. The 3 search heads will be fronted by an F5 LTM VIP that load-balances logins across them. How do I backfill across three search heads and keep the data in sync? Come to think of it, how does a summary search behave in the face of 3 possible search heads? I think I face a situation where the view of my data depends on which search head I land on. No, I can't set the F5 to sticky sessions, because then I'd have not only my summary searches on one search head but also up to 70 users doing ad hoc searches; we'd crush the puny VM.

My IT guys swear they've got the backup and mirroring down for primary-to-DR failover (gulp...), but that's another story. I need some guidance on how to backfill my old summary indexes and how to properly distribute or replicate my new summary index data.

Should I call support :D...

Thanks,

Curtis


Lucas_K
Motivator

I would have just run the backfill script on the search head. As long as the search head is outputting events to all of those indexers anyway, the backfilled results will be distributed across the available indexers as normal.

The limitation I have found is that the maximum number of CPUs you can use for this is 16. I believe this is a hard-coded limit in the fill_summary_index.py script itself. I am unsure of the reason for it, or whether you could simply change the number on line 268.
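For reference, a typical backfill run looks roughly like the following, executed on the search head via the Splunk CLI. The app name, saved-search name, and credentials below are placeholders; check `fill_summary_index.py --help` in your version for the exact options it supports:

```shell
# Backfill a summary index by re-running its populating saved search
# over a historical window (all names/credentials below are placeholders).
#   -name  : the scheduled saved search that populates the summary index
#   -et/-lt: the earliest/latest time of the window to backfill
#   -j     : maximum concurrent searches (the limit discussed above)
#   -dedup : skip time ranges that already have summary data
splunk cmd python fill_summary_index.py \
    -app my_app \
    -name "my_summary_search" \
    -et -30d -lt now \
    -j 8 \
    -dedup true \
    -auth admin:changeme
```

The `-dedup true` flag matters when re-running a backfill after a partial failure: it keeps the script from double-writing summary events for ranges that already completed.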

The F5 VIP is just load-balancing the web front end, right? If so, it has no bearing on backfill data.

Your search heads should be configured for distributed search anyhow. That way, all the search heads will know which search head is performing which search, so everything should stay in sync.
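For context, the search peers each search head talks to are listed in a `[distributedSearch]` stanza in distsearch.conf; peers are normally added through the UI or the `splunk add search-server` CLI command, which writes a stanza like this (hostnames here are placeholders):

```
[distributedSearch]
servers = idx1.example.com:8089, idx2.example.com:8089, idx3.example.com:8089
```

Each search head needs the full peer list so that any of the three can dispatch a summary search across all twelve indexers.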

As far as I have seen, the backfill script won't distribute jobs to search heads other than the one you're running the script on. So to get the best performance from all your available search heads, you may have to do some intelligent division of jobs across them. Easiest would be to split by time range and by saved search.
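One simple way to do that division by time: carve the overall backfill window into equal contiguous chunks and hand one chunk to the backfill run on each search head. A minimal sketch, assuming the three-search-head setup from the post (the function name and dates are mine, purely for illustration):

```python
from datetime import datetime, timedelta


def split_backfill_window(start, end, num_heads):
    """Divide [start, end) into num_heads contiguous, equal chunks,
    one per search head, for parallel fill_summary_index.py runs."""
    chunk = (end - start) / num_heads
    return [(start + i * chunk, start + (i + 1) * chunk)
            for i in range(num_heads)]


# Example: backfill 90 days of summary data across 3 search heads.
end = datetime(2013, 4, 1)
start = end - timedelta(days=90)
for i, (et, lt) in enumerate(split_backfill_window(start, end, 3), 1):
    # Each line becomes the -et/-lt pair for the run on that search head.
    print(f"search head {i}: -et {et:%Y-%m-%dT%H:%M:%S} -lt {lt:%Y-%m-%dT%H:%M:%S}")
```

Splitting further by saved search (different summary-populating searches on different heads) composes naturally with this: the outer loop is per saved search, the inner split is per time chunk.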

Replication, on the other hand... I'll leave that for someone else to answer 😉 (v5.0.1-based summary indexes?)
