Getting Data In

Is it possible to regex a sourcetype on a per-file basis?

freern
New Member

One of our 3rd party apps has some pretty unfriendly logging. The app carries out somewhere between 20-30 jobs, each of which has its own log. The issue we have is that all logs are written to one directory, and the log files themselves are named like this:

20200213.445933.log

The only way to distinguish between job log files is a header within each log that includes a description. A further issue is that every line in the file is prefixed with a date and time, which results in Splunk splitting every line into a separate event even when the true event spans several lines. For example:

[2020-02-13 15:00:34] #########################################################
[2020-02-13 15:00:34] # Log File Path:  /data/logs/jobs/20200213.445933.log
[2020-02-13 15:00:34] # Creation Date:  Thu Feb 13 15:00:34 GMT 2020
[2020-02-13 15:00:34] # Description:    DQ:Import DQ CAR Files
[2020-02-13 15:00:34] # Parameters: --terminatetime 175000 -mapping 52000 -daemon yes -rb true 
[2020-02-13 15:00:34] #########################################################
[2020-02-13 15:00:34] 'INIT' actions:
[2020-02-13 15:00:34]  Collect Files
[2020-02-13 15:00:34] Collect Files Action
[2020-02-13 15:00:34] Connected: ftp://***********************
[2020-02-13 15:00:34] Filter: ^BT.*\.CAR
[2020-02-13 15:00:35] Files found: 0
[2020-02-13 15:00:35] Retrieving batches for mapping : DQ CAR Records
[2020-02-13 15:00:35] Found no Batch files to import
[2020-02-13 15:00:35] No 'CLSE' actions
[2020-02-13 15:01:35] 'INIT' actions:
[2020-02-13 15:01:35]  Collect Files
[2020-02-13 15:01:35] Collect Files Action
[2020-02-13 15:01:35] Connected: ftp://***********************
[2020-02-13 15:01:35] Filter: ^BT.*\.CAR
[2020-02-13 15:01:35] Files found: 0
[2020-02-13 15:01:35] Retrieving batches for mapping : DQ CAR Records
[2020-02-13 15:01:35] Found no Batch files to import
[2020-02-13 15:01:35] No 'CLSE' actions
[2020-02-13 15:02:45] 'INIT' actions:
[2020-02-13 15:02:45]  Collect Files
[2020-02-13 15:02:46] Collect Files Action
[2020-02-13 15:02:46] Connected: ftp://***********************
[2020-02-13 15:02:46] Filter: ^BT.*\.CAR
[2020-02-13 15:02:46] Files found: 0
[2020-02-13 15:02:46] Retrieving batches for mapping : DQ CAR Records
[2020-02-13 15:02:46] Found no Batch files to import
[2020-02-13 15:02:46] No 'CLSE' actions
[2020-02-13 15:03:47] 'INIT' actions:
[2020-02-13 15:03:47]  Collect Files
[2020-02-13 15:03:47] Collect Files Action
[2020-02-13 15:03:47] Connected: ftp://***********************
[2020-02-13 15:03:47] Filter: ^BT.*\.CAR
[2020-02-13 15:03:47] Files found: 0
[2020-02-13 15:03:47] Retrieving batches for mapping : DQ CAR Records
[2020-02-13 15:03:47] Found no Batch files to import
[2020-02-13 15:03:47] No 'CLSE' actions

One event would actually look like this:

 [2020-02-13 15:00:34] 'INIT' actions:
 [2020-02-13 15:00:34]  Collect Files
 [2020-02-13 15:00:34] Collect Files Action
 [2020-02-13 15:00:34] Connected: ftp://***********************
 [2020-02-13 15:00:34] Filter: ^BT.*\.CAR
 [2020-02-13 15:00:35] Files found: 0
 [2020-02-13 15:00:35] Retrieving batches for mapping : DQ CAR Records
 [2020-02-13 15:00:35] Found no Batch files to import
 [2020-02-13 15:00:35] No 'CLSE' actions

Our 3rd party developer has advised that this cannot be changed, so the only option is to work around it in Splunk somehow.

I was wondering if it is possible to regex out the description in each log and assign it as a sourcetype. Each sourcetype could then have its own event splitting rules. Is this possible?


maciep
Champion

This one is a bit tricky, I think. Breaking the events up based on the 'INIT' line should be easy enough; tying those events back to the description is tougher. In Splunk, we have to create an event first from the data streaming in. We're not parsing the file as a whole, we're processing it as it streams by, so we can't grab the header, store it, and then tag other events with that data... at least no way that I know of.

BUT, we can do some things. For example, we can set the entire header aside as its own event and give it a different sourcetype, and then create events for the job actions. Later, when we search the data, we should be able to tie those two kinds of events together based on host and source, since they will match. At that point, we can sort of glue the description back in.

Not sure if it's the best method, but maybe it's one. This quick example assumes the sourcetype set when you ingest the logs is app:job:log, and it creates one called app:job:header... both of which are easily changeable, of course.

Indexer/Parse Config (On your indexers)
props.conf

[app:job:log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\[[^\]]+\]\s*'INIT')
TIME_PREFIX = ^\[
MAX_TIMESTAMP_LOOKAHEAD = 20
TRANSFORMS-header_sourcetype = set_app_job_header_sourcetype

transforms.conf

[set_app_job_header_sourcetype]
REGEX = ^\[[^\]]+\]\s*#{5}
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::app:job:header
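If you want to sanity-check those two regexes before deploying the conf files, a quick local test with Python's re module behaves as expected. This is just a sketch against a trimmed copy of the sample log, not anything Splunk itself runs:

```python
import re

# Trimmed sample: a header block followed by two job iterations
sample = (
    "[2020-02-13 15:00:34] #########################################################\n"
    "[2020-02-13 15:00:34] # Description:    DQ:Import DQ CAR Files\n"
    "[2020-02-13 15:00:34] #########################################################\n"
    "[2020-02-13 15:00:34] 'INIT' actions:\n"
    "[2020-02-13 15:00:35] No 'CLSE' actions\n"
    "[2020-02-13 15:01:35] 'INIT' actions:\n"
    "[2020-02-13 15:01:35] No 'CLSE' actions"
)

# Same pattern as LINE_BREAKER: break on newlines that precede an 'INIT' line
events = re.split(r"[\r\n]+(?=\[[^\]]+\]\s*'INIT')", sample)
print(len(events))  # 3: the header block plus two job events

# Same pattern as the transform: does the first event start with five #'s?
print(bool(re.match(r"^\[[^\]]+\]\s*#{5}", events[0])))  # True
```

So the header block ends up as its own event (which the transform re-sourcetypes), and each 'INIT'...'CLSE' run becomes one event.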

Search Config (on your search heads)
props.conf

[app:job:header]
REPORT-header_fields = app_job_header_fields

transforms.conf

[app_job_header_fields]
REGEX = \]\s+#\s+([^:]+):\s*([^\r\n]+)
FORMAT = $1::$2
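As a quick local check of that extraction (plain Python again, not Splunk), running the regex over a header event pulls out one field per "#" line, including Description:

```python
import re

# A header event, as the app:job:header sourcetype would see it
header = (
    "[2020-02-13 15:00:34] #########################################################\n"
    "[2020-02-13 15:00:34] # Log File Path:  /data/logs/jobs/20200213.445933.log\n"
    "[2020-02-13 15:00:34] # Description:    DQ:Import DQ CAR Files\n"
    "[2020-02-13 15:00:34] #########################################################"
)

# Same pattern as the REPORT transform; FORMAT = $1::$2 means field::value
fields = {m.group(1): m.group(2)
          for m in re.finditer(r"\]\s+#\s+([^:]+):\s*([^\r\n]+)", header)}
print(fields["Description"])  # DQ:Import DQ CAR Files
```

The all-# border lines don't match (there's no whitespace after the first #), so only the key/value lines produce fields. Note that some of the extracted field names, like "Log File Path", will contain spaces.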

And then, assuming all that works, a simple sample search might look like:

index=app sourcetype=app:job:*
| eventstats values(Description) as Description by host source
| where sourcetype="app:job:log"
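To illustrate what that eventstats is doing, here's a toy sketch of the glue in plain Python (with a made-up host value): for each host+source pair, the Description found on the header event is copied onto the matching job events, and the header events are then filtered out.

```python
# Toy events: one header event plus two job events from the same file
events = [
    {"host": "fwd01", "source": "/data/logs/jobs/20200213.445933.log",
     "sourcetype": "app:job:header", "Description": "DQ:Import DQ CAR Files"},
    {"host": "fwd01", "source": "/data/logs/jobs/20200213.445933.log",
     "sourcetype": "app:job:log"},
    {"host": "fwd01", "source": "/data/logs/jobs/20200213.445933.log",
     "sourcetype": "app:job:log"},
]

# eventstats values(Description) by host source: collect the per-group value
desc = {(e["host"], e["source"]): e["Description"]
        for e in events if "Description" in e}

# where sourcetype="app:job:log": keep job events, now carrying Description
jobs = [dict(e, Description=desc[(e["host"], e["source"])])
        for e in events if e["sourcetype"] == "app:job:log"]
print(len(jobs), jobs[0]["Description"])  # 2 DQ:Import DQ CAR Files
```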

richgalloway
SplunkTrust

It's possible to change the sourcetype at parse time, but Splunk will not then apply the new sourcetype's index-time settings.

---
If this reply helps you, Karma would be appreciated.