Solved: How to properly split fields from syslog events th...

StephenD1

This is a followup question to the solution on this thread:

https://community.splunk.com/t5/Getting-Data-In/create-multiple-sourcetypes-from-single-syslog-sourc...

I'm trying to do exactly what the original question asked but I need to apply different DELIM/FIELDS values to the different sourcetypes I create this way.

The solution says that once the new sourcetype is created "...just use additional transforms entries with regular expressions that fit the specific subset of data..." does this mean that if I want to further extract fields from the new sourcetype I can only do that using TRANSFORMS from that point forward or would I be able to put a new stanza further down in the props.conf for [my_new_st] and use additional REPORTs or EXTRACTs that only apply to that new sourcetype?

For example, can I do something like the following?:
Description: first split the individual events based on the value regex-matched on the 5th field then do different field extracts for each of the new sourcetypes.

props.conf:

[syslog]
TRANSFORMS-create_sourcetype1 = create_sourcetype1
TRANSFORMS-create_sourcetype2 = create_sourcetype2

[sourcetype1]
REPORT-extract = custom_delim_sourcetype1

[sourcetype2]
REPORT-extract = custom_delim_sourcetype2

transforms.conf:

[create_sourcetype1]
REGEX = ^(?:[^ \n]* ){5}(my_log_name_1:)\s
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::sourcetype1

[create_sourcetype2]
REGEX = ^(?:[^ \n]* ){5}(my_log_name_2:)\s
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::sourcetype2

[custom_delim_sourcetype1]
DELIMS = " "
FIELDS = d_month,d_date,d_time,d_source,d_logname,d_info,cs_url,cs_bytes,cs_port

[custom_delim_sourcetype2]
DELIMS = " "
FIELDS = d_month,d_date,d_time,d_source,d_logname,d_info,cs_username,sc_http_status

PickleRick

Something like that.

Explanation - Splunk works (except for all the maintenance stuff that happens behind the scenes) generally in two pipelines.

One set of things happens during event's ingestion - so called index-time operations. And after the event is indexed there are search-time operations which happen during searching from indexes and further processing.

So during indexing you rewrite the sourcetype metadata field using TRANSFORMs. The event is getting indexed with the new sourcetype.

Then when you search the event it is getting parsed according to the sourcetype-defined search-time extractions (REPORT and EXTRACT settings). And they are defined separately for each of "new" sourcetypes.

This is actually a quite typical use case - split a "combined" sourcetype during indexing into separate ones and define different search-time configurations for those sourcetypes.

View solution in original post

PickleRick

Something like that.

Explanation - Splunk works (except for all the maintenance stuff that happens behind the scenes) generally in two pipelines.

One set of things happens during event's ingestion - so called index-time operations. And after the event is indexed there are search-time operations which happen during searching from indexes and further processing.

So during indexing you rewrite the sourcetype metadata field using TRANSFORMs. The event is getting indexed with the new sourcetype.

Then when you search the event it is getting parsed according to the sourcetype-defined search-time extractions (REPORT and EXTRACT settings). And they are defined separately for each of "new" sourcetypes.

This is actually a quite typical use case - split a "combined" sourcetype during indexing into separate ones and define different search-time configurations for those sourcetypes.

StephenD1

Ok, got it. So if I'm understanding you correctly, configs similar to my example should work to split my syslog events based on the regex during index-time and then when Splunk goes back to process the REPORT/EXTRACTs it should match fields to the new sourcetypes at search-time based on the already indexed sourcetypes from the TRANSFORMS, correct?

PickleRick

Yes. During ingestion you overwrite the original sourcetype. Since then Splunk has no idea of the original sourcetype whatsoever. During search time it behaves the same as if you'd ingested it with the new sourcetypes from scratch. Splunk has no idea during search time what happens during index-time. It only sees indexed effects of the index-time operations.

How to properly split fields from syslog events that were split to different sourcetypes?

field extraction

indexer

props.conf

sourcetype

syslog

transforms.conf

New Case Study Shows the Value of Partnering with Splunk Academic Alliance

How to Monitor Google Kubernetes Engine (GKE)

Index This | How can you make 45 using only 4?