Where are sourcetype definitions stored in a distributed environment, and how do I manage them?
For example:
When I create a sourcetype using Splunk Web, I assume it is stored on the search heads and later somehow migrated to the indexers. I can edit these sourcetypes directly in the Splunk Web UI, but the Web UI won't show any sourcetypes defined on my indexers.
If I want to achieve selective parsing (e.g. index only WARN logs), then since event processing occurs at the indexer, I will have to define this sourcetype on the indexers. These changes won't be visible to me at all, since the indexers are not accessible via Splunk Web (I am using Splunk Cloud). This is a major pain point in improving data quality, since the sourcetype configuration is not at all visible to us.
What is the best practice? How do I manage my sourcetypes, especially on Splunk Cloud? Is there any way to keep sourcetypes on the indexers and search heads synchronised?
Sourcetypes are not defined, so they are not stored anywhere; any arbitrary string can be used. Each one is simply a field value that is generally hard-coded inside the inputs.conf file that defines the input. Sometimes a sourcetype is overridden by settings inside props.conf on the first full instance of Splunk Enterprise that handles the event.
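As a rough illustration of the two places mentioned above, the fragments below show a sourcetype hard-coded on an input and then overridden at parse time. The monitor path, the index, the transform name set_myapp_sourcetype, and the sourcetype names myapp:raw and myapp:log are all made up for this example; only the setting names come from Splunk's configuration files.

```ini
# inputs.conf (on the forwarder that reads the file)
# Hypothetical path, index, and sourcetype name.
[monitor:///var/log/myapp/app.log]
sourcetype = myapp:raw
index = main

# props.conf (on the first full Splunk Enterprise instance that parses the event)
# Events arriving as myapp:raw are passed through the transform below.
[myapp:raw]
TRANSFORMS-override_st = set_myapp_sourcetype

# transforms.conf (same instance as props.conf)
# Rewrites the sourcetype metadata for events that contain "WARN".
[set_myapp_sourcetype]
REGEX = WARN
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::myapp:log
```

Nothing here "registers" the sourcetype anywhere; the string simply becomes the value of the sourcetype field on each event.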
Regarding sourcetypes: yes, the ones you create in Splunk Web are stored on the search heads. The sourcetype of each event is also indexed in the .tsidx files, which is why the indexers know about them.
Search peers (indexers) receive a so-called knowledge bundle from the search head(s) when a search is run; it contains the search-time configuration they need for that search.
Regarding parsing: you're right. Everything that needs to be done before an event is indexed has to go on the first instance that does the parsing. In your case that is an indexer, but it could also be a heavy forwarder.
For an overview, Aplura has a nice cheat sheet about "where to put props": https://www.aplura.com/wp-content/uploads/where_to_put_props.pdf
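For the "index only WARN logs" case from the question, a minimal sketch of the usual nullQueue filtering pattern looks like the following. It assumes a hypothetical sourcetype named myapp:log, and it has to live on the first parsing instance (indexer or heavy forwarder), not on the search head.

```ini
# props.conf
# Hypothetical sourcetype name. Transforms run left to right, so the
# "discard everything" rule must come before the "keep WARN" rule.
[myapp:log]
TRANSFORMS-filter = setnull, keepwarn

# transforms.conf
# Send every event to the nullQueue (i.e. drop it) ...
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

# ... then route events containing WARN back to the indexQueue.
[keepwarn]
REGEX = WARN
DEST_KEY = queue
FORMAT = indexQueue
```

The REGEX in keepwarn is deliberately loose; in practice you would probably anchor it to the log-level position in your event format so that the word "WARN" inside a message body does not match by accident.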
Thank you for sharing the document. I have one doubt.
Most of my sourcetypes are defined on the search head via the Splunk Web UI. They use line-breaking attributes such as BREAK_ONLY_BEFORE and LINE_BREAKER.
When I change any of these attributes, I can see the change in event breaking within a few minutes.
Now, these line-breaking key-value pairs need to be used by the indexer before the events are indexed.
How is it that I make a change in the Splunk Web UI and it gets reflected in event processing, when the change should have been made at the indexer level? Any idea how this works in the backend?