How to properly split fields from syslog events that were split to different sourcetypes?

StephenD1
Explorer

This is a follow-up question to the solution in this thread:

https://community.splunk.com/t5/Getting-Data-In/create-multiple-sourcetypes-from-single-syslog-sourc...

I'm trying to do exactly what the original question asked, but I need to apply different DELIMS/FIELDS values to the different sourcetypes I create this way.

The solution says that once the new sourcetype is created "...just use additional transforms entries with regular expressions that fit the specific subset of data...". Does this mean that if I want to extract further fields from the new sourcetype, I can only do that using TRANSFORMS from that point forward? Or would I be able to put a new stanza further down in props.conf for [my_new_st] and use additional REPORT or EXTRACT settings that apply only to that new sourcetype?

For example, can I do something like the following?
Description: first split the individual events based on a regex match on the 5th field, then apply different field extractions to each of the new sourcetypes.

props.conf:

[syslog]
TRANSFORMS-create_sourcetype1 = create_sourcetype1
TRANSFORMS-create_sourcetype2 = create_sourcetype2

[sourcetype1]
REPORT-extract = custom_delim_sourcetype1

[sourcetype2]
REPORT-extract = custom_delim_sourcetype2

transforms.conf:

[create_sourcetype1]
REGEX = ^(?:[^ \n]* ){5}(my_log_name_1:)\s
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::sourcetype1

[create_sourcetype2]
REGEX = ^(?:[^ \n]* ){5}(my_log_name_2:)\s
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::sourcetype2

[custom_delim_sourcetype1]
DELIMS = " "
FIELDS = d_month,d_date,d_time,d_source,d_logname,d_info,cs_url,cs_bytes,cs_port

[custom_delim_sourcetype2]
DELIMS = " "
FIELDS = d_month,d_date,d_time,d_source,d_logname,d_info,cs_username,sc_http_status

PickleRick
SplunkTrust

Something like that.

Explanation: apart from the maintenance work that happens behind the scenes, Splunk generally processes events in two phases.

The first set of operations happens while an event is being ingested: the so-called index-time operations. After the event is indexed, search-time operations run whenever you search the indexes and process the results further.

So during indexing you rewrite the sourcetype metadata field using TRANSFORMS. The event is then indexed with the new sourcetype.
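
The index-time step can be pictured with a small Python sketch. It is only a rough model, not how Splunk is implemented: the two patterns are copied from the transforms.conf in the question, while the function name `assign_sourcetype` and the sample events are invented for illustration. Note that `{5}` skips five space-terminated tokens, so the sample lines carry an extra made-up token (`local0`) that puts the log name in sixth position. If the log name really is the fifth token with single spaces between tokens, as the FIELDS lists in the question suggest, the count would presumably need to be 4.

```python
import re

# Patterns copied from the question's transforms.conf. {5} skips five
# space-terminated tokens, so the log name must be the sixth token.
SOURCETYPE_RULES = [
    (re.compile(r"^(?:[^ \n]* ){5}(my_log_name_1:)\s"), "sourcetype1"),
    (re.compile(r"^(?:[^ \n]* ){5}(my_log_name_2:)\s"), "sourcetype2"),
]


def assign_sourcetype(raw_event, original="syslog"):
    """Return the sourcetype an event would be indexed with (simplified model)."""
    for pattern, new_sourcetype in SOURCETYPE_RULES:
        if pattern.search(raw_event):
            return new_sourcetype
    # No rule matched: the event keeps its original sourcetype.
    return original


# Hypothetical sample events, invented for illustration only.
events = [
    "Dec 10 12:00:00 host1 local0 my_log_name_1: allowed http://example.com/ 512 8080",
    "Dec 10 12:00:01 host1 local0 my_log_name_2: denied alice 403",
    "Dec 10 12:00:02 host1 local0 other_log: nothing to route",
]

# Only the resulting sourcetype is stored with the event; the fact that
# it arrived as "syslog" is not kept anywhere in this model.
indexed = [(assign_sourcetype(event), event) for event in events]
for sourcetype, event in indexed:
    print(sourcetype, "|", event)
```

Events that match neither pattern simply stay in the original syslog sourcetype.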

Then, when you search, the event is parsed according to the search-time extractions defined for its sourcetype (REPORT and EXTRACT settings). Those are defined separately for each of the "new" sourcetypes.
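
As a rough illustration of the search-time side, the Python sketch below looks up a field list by the already-indexed sourcetype and splits the raw event on spaces. The FIELDS lists are copied from the question; the function name `extract_fields` and the sample events are made up, and the splitting is a simplification that does not claim to reproduce Splunk's exact DELIMS behavior (for example with repeated delimiters or surplus values).

```python
# FIELDS lists copied from the question's transforms.conf, keyed by the
# sourcetype whose REPORT setting references them.
FIELDS_BY_SOURCETYPE = {
    "sourcetype1": ["d_month", "d_date", "d_time", "d_source", "d_logname",
                    "d_info", "cs_url", "cs_bytes", "cs_port"],
    "sourcetype2": ["d_month", "d_date", "d_time", "d_source", "d_logname",
                    "d_info", "cs_username", "sc_http_status"],
}


def extract_fields(sourcetype, raw_event):
    """Split an event on spaces and name the pieces (simplified model)."""
    names = FIELDS_BY_SOURCETYPE.get(sourcetype)
    if names is None:
        # No extraction is configured for this sourcetype.
        return {}
    values = raw_event.split(" ")
    # zip() stops at the shorter list, so surplus values are dropped here.
    return dict(zip(names, values))


# Hypothetical events, shaped to line up with the FIELDS lists above.
event1 = "Dec 10 12:00:00 host1 my_log_name_1: allowed http://example.com/ 512 8080"
event2 = "Dec 10 12:00:01 host1 my_log_name_2: denied alice 403"

print(extract_fields("sourcetype1", event1))
print(extract_fields("sourcetype2", event2))
print(extract_fields("syslog", event1))
```

The lookup uses only the sourcetype stored with the event, which is why each new sourcetype can carry its own field list.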

This is actually quite a typical use case: split a "combined" sourcetype into separate ones during indexing, then define different search-time configurations for those sourcetypes.

StephenD1
Explorer

OK, got it. So if I'm understanding you correctly, configs similar to my example should split my syslog events by the regex at index time. Then, at search time, Splunk applies the REPORT/EXTRACT settings according to the new sourcetypes that the TRANSFORMS already wrote at index time. Correct?

PickleRick
SplunkTrust

Yes. During ingestion you overwrite the original sourcetype. From then on, Splunk has no record of the original sourcetype whatsoever. At search time it behaves exactly as if you had ingested the events with the new sourcetypes from the start. Splunk has no idea at search time what happened at index time; it only sees the indexed results of the index-time operations.
