Indexer, indexes, indexing?! What's what?

adukes_splunk · 09-12-2019

Does anyone have any good resources about indexes and index management?
Before I set up a bunch of indexes, I'd like to know more about how indexes impact my deployment.

adukes_splunk · 09-12-2019

The Splunk Product Best Practices team provided this response. Read more about How Crowdsourcing is Shaping the Future of Splunk Best Practices.

Indexer, indexes, indexing?! What's what?

Well, to start, an indexer stores indexed data in indexes after such data is indexed. However, the real work happens when you run a search and the indexer fetches the indexed data from the indexes. If you're still with us, great. If not, don't panic, because we'll show you how Splunk creates and manages data repositories (indexes), and review the courses designed to help Splunk Administrators keep Splunk installations happy, healthy, and growing.

How multiple indexers help you scale

Believe it or not, the more indexers you have, the better! Slow indexing? Add indexers! Slow searching? Add indexers! That's because Splunk forwarders distribute data to each of your indexers. That data-distribution results in opportunities for parallelized processing when you need to search that data. In other words, you win when you have a bunch of machines working on portions of your search rather than one machine trying to handle it all on its own.
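To make the "many machines, one search" idea concrete, here is a toy Python model of it. It is only a sketch: the round-robin `distribute` function is a simplified stand-in for forwarder load balancing, the function names and sample events are made up for illustration, and none of this is Splunk code or a Splunk API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def distribute(events, indexer_count):
    """Spread events across indexers round-robin (a simplification of forwarder load balancing)."""
    indexers = [[] for _ in range(indexer_count)]
    for slot, event in zip(cycle(range(indexer_count)), events):
        indexers[slot].append(event)
    return indexers


def search_one(indexer_events, term):
    """Each indexer scans only its own share of the data."""
    return [event for event in indexer_events if term in event]


def distributed_search(indexers, term):
    """Fan the search out to every indexer in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(indexers)) as pool:
        partials = list(pool.map(search_one, indexers, [term] * len(indexers)))
    return sorted(hit for partial in partials for hit in partial)


# Hypothetical sample data: every fourth event is an error.
events = [f"event {i} status={'error' if i % 4 == 0 else 'ok'}" for i in range(12)]
indexers = distribute(events, indexer_count=3)
hits = distributed_search(indexers, "status=error")
```

With three indexers, each one holds and scans four of the twelve events, and the merged result contains the same hits a single machine would have found by scanning all twelve.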

You may ask, "But what happens to my data if one of those indexers goes down?" Great question! The Splunk indexer clustering feature manages multiple copies of the data to increase resiliency for your Splunk-ed data.

Now that you know enough to be smart and safe, take a moment to understand the relationship between indexers, buckets, and indexer clusters. These concepts will help you effectively plan and scale your deployments with Splunk Enterprise components.

Things to know

Indexers play a key role in how data moves through Splunk deployments.

An indexer is a Splunk Enterprise instance that transforms incoming raw event data into searchable events and stores them in an index. Each index can contain a variety of data and is made up of buckets, that is, smaller collections of data and their associated index files.

An indexer cluster, or the Splunk implementation of index replication, is a group of indexers configured to replicate the data of other indexers in the cluster group to ensure the system has redundant copies of all data. By maintaining multiple, identical copies of data, indexer clusters and index replication prevent data loss and ensure that data is available for searching. Key benefits include: data availability, data fidelity, data recovery, disaster recovery and search affinity.

Replication factor is the number of copies of data that the cluster maintains. For example, to ensure that your system can tolerate a failure of two peers, you would configure a replication factor of 3, which means that the cluster stores three identical copies of each bucket on separate nodes. As the replication factor increases, you need to run more indexers and provision more storage for the indexed data. The good news is data replication itself requires little processing power, so you can take advantage of the multiple indexers in a cluster to ingest and index more data.
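The arithmetic in that example can be sketched in a few lines of Python. The function names and the 500 GB figure are hypothetical, and the storage estimate is deliberately rough: it assumes every copy takes the same space as the original, so treat it as a starting point rather than a sizing formula.

```python
def replication_factor_for(tolerated_peer_failures):
    """Copies needed so that one copy survives the given number of peer failures."""
    if tolerated_peer_failures < 0:
        raise ValueError("tolerated_peer_failures must be zero or greater")
    return tolerated_peer_failures + 1


def minimum_peers(replication_factor):
    """Each copy lives on a separate node, so you need at least this many peers."""
    return replication_factor


def rough_storage_gb(indexed_gb, replication_factor):
    """Rough estimate that assumes every copy is the same size as the original."""
    return indexed_gb * replication_factor


rf = replication_factor_for(2)          # tolerate two failed peers
peers = minimum_peers(rf)               # at least three peers
storage = rough_storage_gb(500, rf)     # 500 GB of indexed data is a made-up figure
```

Here `rf` comes out as 3, matching the example above, and the rough storage estimate for the hypothetical 500 GB is 1500 GB across the cluster.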

Things to do

Begin your education track for Splunk Administrators. Enroll in the Education Track for Splunk Administrators to learn the concepts, tasks, and best practices for managing a single Splunk instance or a distributed deployment.
Lift the hood on the indexer. Find the buckets that store your data. How many hot buckets do you have? Do you have many warm buckets or just a few? See Managing Indexers and Clusters of Indexers.
Calculate the replication factor you need. Weigh the trade-offs and performance benefits of adding peers to your indexer cluster.
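For the "lift the hood" step, one way to count hot and warm buckets is to look at the directory names under an index's home path. The sketch below assumes the naming convention I recall from the Splunk documentation (hot buckets start with `hot_`, warm buckets start with `db_`, and replicated copies start with `rb_`); verify it against your version before relying on it. The function name and the sample directory names are hypothetical.

```python
from collections import Counter


def count_home_path_buckets(dir_names):
    """Count hot and warm buckets from directory names in an index's home path.

    Assumed convention (check your Splunk version):
      hot_*  -> hot bucket
      db_*   -> warm bucket that originated on this indexer
      rb_*   -> warm bucket replicated from another peer
    Anything else is ignored.
    """
    counts = Counter()
    for name in dir_names:
        if name.startswith("hot_"):
            counts["hot"] += 1
        elif name.startswith(("db_", "rb_")):
            counts["warm"] += 1
    return counts


# Hypothetical listing; on a real indexer you might pass os.listdir(<home path>) instead.
sample = [
    "hot_v1_7",
    "db_1568246400_1568160000_5",
    "db_1568160000_1568073600_4",
    "rb_1568246400_1568160000_2_EXAMPLE-GUID",
    "notes.txt",
]
counts = count_home_path_buckets(sample)
```

For the sample listing this yields one hot bucket and three warm buckets. Cold buckets live in a separate directory, so they are not counted here.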

gjanders · 09-12-2019

Both the title and the answer refer to "How managing indexes helps you scale your deployment", but you are actually referring to indexers or search peers in the answer.

Perhaps you can update this to "How managing indexers ... "? Initially, I expected some advice on index naming standards, the number of indexes in a deployment, et cetera.


adukes_splunk · 09-26-2019

Hi @gjanders, I see your point! The post is about adding indexers. I've updated the title and the heading to more accurately reflect the topic.
