Getting Data In

Why are there completely different formats in the same logfile?

sini
Explorer

Hi all,

We have an application that produces logfiles into which the output of other programs is inserted (it is pulled from stdout when the other program is executed). We are only interested in the stdout generated by the SQL statements of one of those programs; these are multiline entries in a specific format. Basically, an SQL event starts with a date and ends at the date that starts the next SQL event. We have a regex that captures all the SQL lines we are interested in, but we cannot see a way to ignore the rest of the logfile: all routing to nullQueue or SEDCMD takes place after timestamp recognition and event breaking, and the other entries either mess up the event breaking or get attached to the SQL events if we specify a time configuration that only matches the SQL statements.

Basically, all lines not matching ^(\d+|\t+|\s\s+|CREATE|SELECT|DROP|UPDATE|INSERT|FROM|TBLPROPERTIES|\)).* need to be excluded before any timestamp recognition or event breaking is applied.
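
To illustrate what that line-level exclusion would keep and drop, here is a minimal Python sketch. The regex is copied from the post; the sample lines and the function name keep_sql_lines are made up for illustration and will not match the real log content.

```python
import re

# Pattern copied verbatim from the post; lines that do not match are discarded.
KEEP = re.compile(
    r"^(\d+|\t+|\s\s+|CREATE|SELECT|DROP|UPDATE|INSERT|FROM|TBLPROPERTIES|\)).*"
)


def keep_sql_lines(lines):
    """Return only the lines that match the SQL pattern (hypothetical helper)."""
    return [line for line in lines if KEEP.match(line)]


# Hypothetical sample lines; the real logfile format is not shown in the post.
sample = [
    "2023-01-05 10:00:00 SQL start",
    "CREATE TABLE t (",
    "\tid INT",
    ")",
    "INFO some other program output",
    "SELECT * FROM t",
]

for line in keep_sql_lines(sample):
    print(line)
```

Note that the pattern also keeps any line starting with a tab or with two or more whitespace characters, so indented output from the other programs would pass through as well.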

To make it clear again: the problem is that all events, including those we want to get rid of, are multiline events with different starts and ends, and the dates for the event types appear in different locations and formats. Hence the exclusion must occur before merging takes place.

Is this possible? 

Regards


richgalloway
SplunkTrust

As you know, timestamp extraction and event breaking happen early in the processing pipeline and the order cannot be changed.

Is it possible to break events based on the end of the SQL rather than the beginning of the next SQL?

Consider using Cribl (cribl.io) to filter out unwanted events before they get to Splunk.

---
If this reply helps you, Karma would be appreciated.

PickleRick
SplunkTrust

You could use INGEST_EVAL and/or CLONE_SOURCETYPE in transforms.conf.

https://conf.splunk.com/files/2020/slides/PLA1154C.pdf

sini
Explorer

Hi,

Thanks for confirming. I might as well just use different scripted inputs to get exactly what I need. The file isn't written constantly so it's sufficient to parse it once a day and then send the required contents to the Indexer.

Regards
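
A minimal sketch of the pre-filtering step such a scripted input could perform, assuming the script simply copies matching lines to stdout for Splunk to ingest. The function name filter_stream and the sample text are hypothetical; the regex is the one from the original question. Scheduling, tracking what has already been sent, and the logfile path are left out.

```python
import io
import re
import sys

# Pattern copied verbatim from the original question.
KEEP = re.compile(
    r"^(\d+|\t+|\s\s+|CREATE|SELECT|DROP|UPDATE|INSERT|FROM|TBLPROPERTIES|\)).*"
)


def filter_stream(src, dst):
    """Copy lines matching KEEP from src to dst and return how many were written."""
    kept = 0
    for line in src:
        if KEEP.match(line):
            dst.write(line)
            kept += 1
    return kept


# Demonstration with an in-memory stand-in for the logfile (hypothetical content).
demo_log = io.StringIO(
    "2023-01-05 10:00:00 SQL start\n"
    "SELECT * FROM t\n"
    "INFO unrelated program output\n"
)
filter_stream(demo_log, sys.stdout)
```

In an actual scripted input, the in-memory demo would be replaced by opening the real logfile and passing the file object as src, so that only the SQL lines reach event breaking and timestamp recognition.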
