Solved: Why indexer really needs knowledge objects from SH ?

MichalG1 · 03-29-2024

Hello Team,

As per https://docs.splunk.com/Documentation/Splunk/9.2.0/DistSearch/Knowledgebundlereplication

"The search head needs to distribute this material to its search peers so that they can properly execute queries on its behalf"

Or

"The knowledge bundle consists of a set of files that the search peers ordinarily need in order to perform their searches"

Could you please give me one example of why we really need it? I had the impression that, to return search results to the SH, an indexer needs only the SPL query and its locally indexed data and metadata.

One of my guesses for a good example was lookup files, but I guess the indexer should not need any lookup files, since that job is done by the search head, not the indexer. The same goes for other knowledge objects (KOs) such as tags, event types, and macros: those objects should not be needed on the indexer to perform a search; they are used by the search head to enrich the data returned by the indexer.

Another theory: we distribute those files not to help with searching, but with parsing and indexing (for example, using props.conf and transforms.conf). Maybe that is the case?

Extra question about the conf files delivered in the bundle: if I understand correctly, those settings are held in memory only and do not modify any existing conf files on the indexer? But at the same time they modify the in-memory settings for (for example) indexes.conf? If so, I should be able to run "splunk btool indexes list" and see something different than "splunk show", so that I can compare the on-disk configuration files against the settings sent in the bundle and applied in memory? What are the best practices here?

What am I missing?

Thanks,

Michal

MichalG1 · 03-31-2024

Thanks @richgalloway

So just to confirm:

"To know what results to return to the SH, the peers need to know the values of the tags, eventtypes, and macros used in the query. "

Example: "index=_audit eventtype=splunk_access". Since event types are evaluated at search time (not index time), the indexer does not have a definition for that event type. Because of this, the SH needs to push the definition of that event type to the indexer:

[splunk_access]
search = index=_audit "action=login attempt" NOT "action=search"

Once that is done, the indexer can expand the original SPL query to: index=_audit (index=_audit "action=login attempt" NOT "action=search"), and can then execute the query correctly.

The same would happen with most of the other knowledge objects, including all the search-time field extractions.

So the summary would be: the search head needs to push knowledge objects to the indexer because, for the indexer, those are "unknown variables/names". The indexer does not have those definitions and does not know how to expand or execute SPL queries that use those KOs. This applies only to search-time operations and objects defined on the SH (index-time configurations such as TRANSFORMS should already be on the indexer).
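To make the expansion step concrete, here is a minimal Python sketch. It is only a toy model of the idea described above, not how Splunk actually implements it: the function names load_eventtypes and expand_eventtypes are made up for illustration, and splitting the query on whitespace is a simplifying assumption that would not handle quoted phrases in the outer query.

```python
import configparser

# Hypothetical eventtypes.conf content, as it might arrive in the knowledge bundle.
EVENTTYPES_CONF = """
[splunk_access]
search = index=_audit "action=login attempt" NOT "action=search"
"""


def load_eventtypes(conf_text):
    """Parse conf-style stanzas into {eventtype_name: search_string}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {name: parser[name]["search"] for name in parser.sections()}


def expand_eventtypes(query, eventtypes):
    """Replace each eventtype=<name> term with its definition in parentheses.

    Assumption: terms are separated by single spaces (toy tokenizer).
    """
    expanded = []
    for term in query.split():
        if term.startswith("eventtype="):
            name = term.split("=", 1)[1]
            if name not in eventtypes:
                # Without the bundle, the name is an unknown variable.
                raise KeyError(f"no definition for eventtype {name!r}")
            expanded.append("(" + eventtypes[name] + ")")
        else:
            expanded.append(term)
    return " ".join(expanded)


if __name__ == "__main__":
    definitions = load_eventtypes(EVENTTYPES_CONF)
    print(expand_eventtypes("index=_audit eventtype=splunk_access", definitions))
```

Calling expand_eventtypes with an empty dictionary raises KeyError, which mirrors the "unknown name" situation an indexer would be in without the definition.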

@richgalloway, could you please confirm that all of this is correct?

Thanks!

richgalloway · 03-31-2024

Yes, you understand correctly.

---
If this reply helps you, Karma would be appreciated.

MichalG1 · 03-31-2024

Thanks a lot @richgalloway

That SH behavior seems unnecessarily complicated.

Instead of sending all of those KO bundles to the indexers, couldn't the SH first expand the SPL query (resolving all of the search-time names/variables) and then send it to the indexers?

Thanks,

Michal

richgalloway · 03-31-2024

It may be complicated, but I think it's necessary. Perhaps it could be better, though.

Even if the SH did expand the query (and maybe it does) before sending to the peers, that's just a part of what the bundle is used for. Search-time field extractions and lookups done by the indexers make the query more efficient.
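As a rough illustration of why peer-side extraction helps, here is a toy Python sketch. The status field, the regular expression, and the peer_search and search_head functions are all hypothetical and are not Splunk internals. The point is only that a peer that knows the extraction rule can filter locally and send back far fewer events.

```python
import re

# Hypothetical search-time extraction rule, as it might arrive in the bundle.
EXTRACT_STATUS = re.compile(r"status=(?P<status>\d{3})")


def peer_search(raw_events, wanted_status):
    """Runs on one indexer: extract the field locally, return only matches."""
    matches = []
    for raw in raw_events:
        found = EXTRACT_STATUS.search(raw)
        if found and found.group("status") == wanted_status:
            matches.append({"_raw": raw, "status": found.group("status")})
    return matches


def search_head(peers, wanted_status):
    """Runs on the SH: fan the search out to every peer and merge the results."""
    results = []
    for peer_events in peers:
        results.extend(peer_search(peer_events, wanted_status))
    return results


if __name__ == "__main__":
    # Two pretend peers, each holding its own slice of the data.
    peers = [
        ["GET /a status=200", "GET /b status=500"],
        ["GET /c status=500", "no status field here"],
    ]
    for event in search_head(peers, "500"):
        print(event)
```

If the peers did not know the extraction rule, each one would have to return every event and the search head would have to do all of the extraction and filtering itself.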

richgalloway · 03-31-2024

Most of the work of a query is done by the indexers, so they need to know as much about the search as possible. That is what the knowledge bundle is for. To know what results to return to the SH, the peers need to know the values of the tags, eventtypes, and macros used in the query. They also need to know which fields to extract and how to extract them. It's all part of the map/reduce process, where the search activity is divided among many peers to make the query faster.

Information sent in the bundle does not modify the settings in the indexer. The bundle supplements the information the peer reads from its .conf files. That supplementary data is not visible to either btool or splunk show.
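One way to picture "supplements but does not modify" is a layered lookup. The Python sketch below is an analogy only: the dictionaries stand in for on-disk .conf settings and bundle contents, all names are hypothetical, and the lookup order between the two layers is an assumption rather than documented Splunk precedence.

```python
from collections import ChainMap

# Hypothetical stand-in for what the peer read from its own .conf files.
local_conf = {"local_errors": "index=main log_level=ERROR"}

# Hypothetical stand-in for definitions delivered in the knowledge bundle.
bundle_conf = {
    "splunk_access": 'index=_audit "action=login attempt" NOT "action=search"',
}


def effective_for_search(local, bundle):
    """Layer the bundle over the local settings for one search.

    ChainMap only references both dictionaries; neither one is copied or
    changed. Assumption: the bundle layer is consulted first.
    """
    return ChainMap(bundle, local)


if __name__ == "__main__":
    view = effective_for_search(local_conf, bundle_conf)
    print("visible to the search:", sorted(view))
    print("still on disk only:   ", sorted(local_conf))
```

The search sees both layers, while anything that inspects only local_conf (the analogue of btool reading the files on disk) never sees the bundle entries.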
