This is a follow-up question to the solution in this thread:
I'm trying to do exactly what the original question asked, but I need to apply different DELIMS/FIELDS values to the different sourcetypes I create this way.
The solution says that once the new sourcetype is created, "...just use additional transforms entries with regular expressions that fit the specific subset of data...". Does this mean that if I want to extract further fields from the new sourcetype, I can only do that with TRANSFORMS from that point forward? Or can I put a new stanza further down in props.conf for [my_new_st] and use additional REPORT or EXTRACT settings that apply only to that new sourcetype?
For example, can I do something like the following?
Description: first split the individual events into new sourcetypes based on a regex match on the 5th field, then do different field extractions for each of the new sourcetypes.
props.conf:
[syslog]
TRANSFORMS-create_sourcetype1 = create_sourcetype1
TRANSFORMS-create_sourcetype2 = create_sourcetype2
[sourcetype1]
REPORT-extract = custom_delim_sourcetype1
[sourcetype2]
REPORT-extract = custom_delim_sourcetype2
transforms.conf:
[create_sourcetype1]
REGEX = ^(?:[^ \n]* ){5}(my_log_name_1:)\s
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::sourcetype1
[create_sourcetype2]
REGEX = ^(?:[^ \n]* ){5}(my_log_name_2:)\s
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::sourcetype2
[custom_delim_sourcetype1]
DELIMS = " "
FIELDS = d_month,d_date,d_time,d_source,d_logname,d_info,cs_url,cs_bytes,cs_port
[custom_delim_sourcetype2]
DELIMS = " "
FIELDS = d_month,d_date,d_time,d_source,d_logname,d_info,cs_username,sc_http_status
Something like that.
Explanation: apart from the maintenance work that happens behind the scenes, Splunk generally processes data in two pipelines.
One set of operations happens while an event is being ingested; these are the so-called index-time operations. After the event is indexed, search-time operations run whenever you search the indexes and process the results further.
So during indexing you rewrite the sourcetype metadata field using TRANSFORMS, and the event is indexed with the new sourcetype.
Then, when you search, the event is parsed according to the search-time extractions (REPORT and EXTRACT settings) defined for its sourcetype. Those are defined separately for each of the "new" sourcetypes.
This is actually quite a typical use case: split a "combined" sourcetype into separate ones during indexing, then define different search-time configurations for those sourcetypes.
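To make the two stages concrete, here is a small Python sketch that mimics them. It is only a toy model, not how Splunk is implemented: the function names index_event and search_event and the sample log lines are made up for illustration, and only the regexes and field lists are taken from the configs in the question.

```python
import re

# Index-time rules, modeled on the [create_sourcetypeN] stanzas in the question.
INDEX_TIME_TRANSFORMS = [
    (re.compile(r"^(?:[^ \n]* ){5}(my_log_name_1:)\s"), "sourcetype1"),
    (re.compile(r"^(?:[^ \n]* ){5}(my_log_name_2:)\s"), "sourcetype2"),
]

# Search-time field lists, modeled on the [custom_delim_sourcetypeN] stanzas.
SEARCH_TIME_FIELDS = {
    "sourcetype1": ["d_month", "d_date", "d_time", "d_source", "d_logname",
                    "d_info", "cs_url", "cs_bytes", "cs_port"],
    "sourcetype2": ["d_month", "d_date", "d_time", "d_source", "d_logname",
                    "d_info", "cs_username", "sc_http_status"],
}


def index_event(raw, sourcetype="syslog"):
    """Toy index-time stage: overwrite the sourcetype if a rule matches."""
    for pattern, new_sourcetype in INDEX_TIME_TRANSFORMS:
        if pattern.search(raw):
            sourcetype = new_sourcetype
    # Only the raw text and the final sourcetype are stored.
    return {"_raw": raw, "sourcetype": sourcetype}


def search_event(stored):
    """Toy search-time stage: pick extractions by the stored sourcetype only."""
    names = SEARCH_TIME_FIELDS.get(stored["sourcetype"], [])
    # Space-delimited split, paired with the field names in order.
    return dict(zip(names, stored["_raw"].split(" ")))


if __name__ == "__main__":
    # Hypothetical sample events, invented for this sketch.
    samples = [
        "Jan 12 10:15:01 host1 proxy my_log_name_1: /index.html 512 8080",
        "Jan 12 10:15:02 host1 proxy my_log_name_2: alice 200",
        "Jan 12 10:15:03 host1 kernel something_else: hello",
    ]
    for raw in samples:
        stored = index_event(raw)
        print(stored["sourcetype"], search_event(stored))
```

Note that search_event never sees the word "syslog" for the first two events. It works only from what was stored, which is the point of the explanation above.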
OK, got it. So if I'm understanding you correctly, configs similar to my example should split my syslog events by regex at index time. Then, at search time, Splunk applies the REPORT/EXTRACT settings according to the new sourcetypes that the TRANSFORMS already wrote to the index. Is that correct?
Yes. During ingestion you overwrite the original sourcetype. From then on, Splunk has no idea what the original sourcetype was. At search time it behaves exactly as if you had ingested the events with the new sourcetypes from scratch. Splunk has no knowledge at search time of what happened at index time; it only sees the indexed results of the index-time operations.