Solved: Where should I run my report that populates a summary index?

markwymer · ‎02-05-2016

Hi all,

I've got a simple search and filter that gets piped into the collect command to create a summary index. I've saved it as a report and scheduled it to run every hour on one of my non-clustered search heads. The data is being extracted from an index that lives on two load-balanced indexers.

Everything is fine and, for this particular search, subsequent search times have dramatically reduced.

My question is: would it be better to run the scheduled report from the two indexers, spreading the summary index load across them, or must the summary index be stored on the search head?

If I can run the report on indexer1, will it pick up the indexed data from indexer2? Or can/should I run it on both?

Sorry, I just thought of a new question! If I run the report on a 'don't index and forward' Search head, will it automatically send the summary index data to the indexers?

Sorry for so many questions but your help, as always, is gratefully received.
Mark.

Jeremiah · ‎02-05-2016

So, let's break down what is happening when you schedule and run a summary search. When you run the search, like any search in Splunk, it is distributed to the search peers (indexers) configured on the search head. The search head then takes the results of the search and stores them in stash files that it puts into the $SPLUNK_HOME/var/spool/splunk directory. By default, Splunk has set up a batch input for this directory, so when the files are dropped into the spool directory, Splunk indexes them and then deletes them. If the search head is configured to forward its data, it will not index the files locally, but will instead forward them to whatever its forwarding destination is. What that means is that where the summary search runs is not tied to where the summary results end up being indexed. Now in practice, most everyone has their search heads configured to send data to their indexers, the same indexers they have configured as search peers. That's a best practice.
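
To make the spool mechanism a bit more concrete: the batch input mentioned above is defined in the default inputs.conf. From memory it looks roughly like the stanza below, so treat this as a sketch and check $SPLUNK_HOME/etc/system/default/inputs.conf on your own version rather than relying on it.

```ini
# Sketch of the default batch input that watches the spool directory.
# Reproduced from memory; the exact defaults may differ between versions.
[batch://$SPLUNK_HOME/var/spool/splunk]
# sinkhole = delete each file once it has been read
move_policy = sinkhole
crcSalt = <SOURCE>
```

Because this is an ordinary input, whatever the search head's outputs.conf says about forwarding applies to the stash files as well.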

So let's look at the scenarios you mentioned:

If you run the summary search on a search head, and that search head is configured to forward data to your indexers, then the summary data will be load balanced across the indexers. This is what you want to do. There isn't any need to think about "distributing" the searches to the indexers, or distributing the results across the indexers; the search head takes care of that for you.

If you run the summary search on a search head, and that search head is not configured to forward data, the summary results will be indexed locally on the search head. You might want to do this, but then you'll have to deal with storage on the search head, and as the summary result set grows, your searches against it won't scale accordingly: you'll lose the benefit of distributed search.

If you run the summary search on an indexer, the summary data will remain on that indexer. You don't want to do this, because you have multiple indexers and a search run directly on one of them only sees that indexer's data, so your results will be incomplete. In general, you don't want to execute any searches directly on your indexers. Let the search head distribute them.
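
For the first scenario (the one you want), the scheduled report lives on the search head. A minimal sketch of what that could look like in savedsearches.conf follows. The stanza name, the base search and the index name my_summary are all made up for illustration, so substitute your own.

```ini
# savedsearches.conf on the search head (hypothetical names throughout)
[Populate my_summary hourly]
# Base search and summary index name are assumptions for this sketch
search = index=main sourcetype=access_combined | stats count by host | collect index=my_summary
# Run at the top of every hour over the previous full hour
enableSched = 1
cron_schedule = 0 * * * *
dispatch.earliest_time = -1h@h
dispatch.latest_time = @h
```

The search head fans the base search out to the indexers, and the collect output goes wherever the search head's outputs.conf sends it.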

markwymer · ‎02-05-2016

Thanks Jeremiah, that answers everything and gives some very useful background information too.

renjith_nair · ‎02-05-2016

We have almost the same setup and would suggest:

Run the searches on the search head (it's made for that)
Forward the summary index to your load balanced indexers (indexes are supposed to be on indexers 🙂 )

Configuration for the search head as a forwarder (outputs.conf):

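# Turn off indexing on the search head
[indexAndForward]
index = false

[tcpout]
defaultGroup = my_search_peers
# Forward every index, including the internal ones
forwardedindex.filter.disable = true
# Do not keep a local copy of forwarded data
indexAndForward = false

# Replace with your own indexers and receiving port
[tcpout:my_search_peers]
server = 10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
autoLB = true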
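
One thing to watch with this setup: the summary index has to exist on the indexers that receive the forwarded events, otherwise they have nowhere to put them. A minimal sketch of the indexes.conf stanza, assuming a summary index called my_summary (a made-up name, use whatever your collect command writes to):

```ini
# indexes.conf on each indexer (index name is hypothetical)
[my_summary]
homePath   = $SPLUNK_DB/my_summary/db
coldPath   = $SPLUNK_DB/my_summary/colddb
thawedPath = $SPLUNK_DB/my_summary/thaweddb
```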

---
What goes around comes around. If it helps, hit it with Karma 🙂

markwymer · ‎02-05-2016

Thanks Nair,

Definitely the answer that I was hoping for.

I don't have the infrastructure in my test environment to try this out, so I thought I would ask the question before diving straight into my live indexers/search heads.

Brgds,
Mark.
