I have a sourcetype-based definition in which I set INDEXED_EXTRACTION=JSON. There are 10 sources configured under this sourcetype. Let us say one of the 10 is not in JSON format. How can I use the same sourcetype without applying INDEXED_EXTRACTION=JSON to that particular source alone? I thought of using a source::-based stanza in props with the other attributes and leaving out the INDEXED_EXTRACTION attribute. In that case, will the attribute still be picked up from the sourcetype declaration?
Hi
A sourcetype defines your log file schema. For that reason it must be different for different log files or log event types. Here is an excellent description of it by Mark McCullough (Splunk Slack #bestpractices):
--8<--
I think I've finally figured out how to explain to "I know Splunk!" types what the sourcetype field means in a way that doesn't cause them to want to pick _json for everything that uses JSON syntax: "It's like a reference to a XSD file for XML. It specifies what fields are required, what fields are permitted, and the overall structure of the event."
--8<--
There are also naming standards for KOs (knowledge objects) in Splunk, which help you manage them. In most cases I use the naming schema "owner:system/vendor:app:subsystem:log file:#". There is no need to keep all of those parts, but a name usually has at least three of them plus a number as a suffix. When the log format changes later, I just increment the last digit by one.
In most cases, when you have a Splunk system with even a couple of different business or tech systems, you should use this kind of naming standard for all your KOs, such as apps, indexes, saved searches, alerts, etc. This will help you, and at the very least it helps your Splunk admins.
r. Ismo
I'm not 100% sure what you want. As you can see in the docs (https://docs.splunk.com/Documentation/Splunk/Latest/Admin/Propsconf), you can define settings based on sourcetype, source or host so that some of the settings can be selectively applied to your sources.
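As a rough illustration of per-source settings, a props.conf fragment might look like the one below. The sourcetype name, the path, and the TRUNCATE values are hypothetical. According to the props.conf spec, [source::<source>] settings override [<sourcetype>] settings for the same attribute. This sketch deliberately does not show a way to switch INDEXED_EXTRACTIONS off for a single source, since nothing in this thread confirms that is possible.

```ini
# props.conf (sketch; names, path and values are hypothetical)

# Settings shared by every source that uses this sourcetype
[my_sourcetype]
TRUNCATE = 10000

# Settings applied only to events from this one source path;
# for the same attribute, a source:: stanza wins over the sourcetype stanza
[source::/var/log/myapp/legacy.log]
TRUNCATE = 50000
```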
But the main question here is: why do you want INDEXED_EXTRACTIONS (there is an "S" at the end, it's important!)? INDEXED_EXTRACTIONS is sometimes unavoidable (if the log file has a variable column order and the field order is determined by the header row, it is the only reasonable way to ingest such a file), but often search-time parsing is enough, and with Splunk search-time operations are generally the preferred method. So why not KV_MODE=json instead of INDEXED_EXTRACTIONS?
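A minimal sketch of the search-time alternative, assuming a hypothetical sourcetype called my_sourcetype. KV_MODE is a search-time setting, so the stanza belongs wherever searches run (typically the search head).

```ini
# props.conf on the search head (sketch; the sourcetype name is hypothetical)
[my_sourcetype]
# Extract JSON fields at search time instead of at index time
KV_MODE = json
```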
Hi @nags,
It isn't possible: a sourcetype defines the specification of a data source (INDEXED_EXTRACTIONS is part of that specification), so you cannot use the same data definition for different data sources.
As a workaround, you could use similar sourcetype names (e.g. my_sourcetype and my_sourcetype_json), so that in searches you can use:
sourcetype=my_sourcetype*
to match both of them.
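The two-sourcetype workaround could be sketched roughly as follows. The sourcetype names come from the example above; the monitor paths are hypothetical, and the split across props.conf and inputs.conf is one possible layout, not necessarily how the original poster's inputs are set up.

```ini
# props.conf (sketch)

# The nine JSON sources use this sourcetype
[my_sourcetype_json]
INDEXED_EXTRACTIONS = JSON

# The one non-JSON source uses this sourcetype
[my_sourcetype]
# no INDEXED_EXTRACTIONS here; other settings for the non-JSON source go in this stanza

# inputs.conf (sketch; monitor paths are hypothetical)

[monitor:///var/log/myapp/json]
sourcetype = my_sourcetype_json

[monitor:///var/log/myapp/legacy.log]
sourcetype = my_sourcetype
```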
Ciao.
Giuseppe