
Why are there completely different formats in the same logfile?

sini
Explorer

Hi all,

We have an application that produces logfiles into which the output of other programs is inserted (it is captured from stdout when those programs run). We are only interested in the stdout generated by the SQL statements of one of those programs; these are multiline entries in a specific format. An SQL event starts with a date and ends at the date of the next SQL event. We have a regex that captures all the SQL lines we are interested in, but we see no way to ignore the rest of the logfile: routing to nullQueue and SEDCMD both take place after timestamp recognition and event breaking. The other entries either break the event breaking or, if we specify a time config that matches only the SQL statements, get attached to the SQL events.

Basically, all lines not matching ^(\d+|\t+|\s\s+|CREATE|SELECT|DROP|UPDATE|INSERT|FROM|TBLPROPERTIES|\)).* need to be excluded before any timestamp recognition or event breaking is applied.
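The filtering step being asked for can be sketched outside Splunk in Python. This only illustrates the line filter, using the regex exactly as given above; the sample lines are made up for the example and are not taken from the real log.

```python
import re

# The filter regex from the question: keep only lines that look like part of an SQL entry.
SQL_LINE = re.compile(r"^(\d+|\t+|\s\s+|CREATE|SELECT|DROP|UPDATE|INSERT|FROM|TBLPROPERTIES|\)).*")


def keep_sql_lines(lines):
    """Yield only the lines that match SQL_LINE; everything else is dropped."""
    for line in lines:
        if SQL_LINE.match(line):
            yield line


# Hypothetical sample input mixing SQL lines with output from another program.
sample = [
    "2023-05-04 10:00:01 query start",
    "SELECT id",
    "  , name",
    "FROM users",
    "[other-tool] INFO something unrelated",
    "Exception in thread main",
]

print(list(keep_sql_lines(sample)))
```

Note that the `\d+` alternative also keeps any foreign line that happens to start with a digit, so entries from the other programs whose lines begin with a numeric timestamp would slip through this filter.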

To be clear: all events, including the ones we want to discard, are multiline events with different start and end patterns, and the dates for the different event types appear in different positions and formats. Hence the exclusion must happen before line merging takes place.
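Why the order matters can be shown with a small Python simulation. It assumes, hypothetically, that an SQL entry starts with an ISO-style date such as `2023-05-04` (the real format is not given in the thread); `merge_events` is an illustrative helper, not Splunk's line-merging code.

```python
import re

# Hypothetical start marker: assume each SQL entry begins with an ISO-style date.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}")
# The filter regex from the question.
SQL_LINE = re.compile(r"^(\d+|\t+|\s\s+|CREATE|SELECT|DROP|UPDATE|INSERT|FROM|TBLPROPERTIES|\)).*")


def merge_events(lines):
    """Group lines into events; a new event starts at every EVENT_START line."""
    events, current = [], []
    for line in lines:
        if EVENT_START.match(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


raw = [
    "2023-05-04 10:00:01 query 1",
    "SELECT 1",
    "other-tool: starting job",  # foreign line from another program
    "2023-05-04 10:00:02 query 2",
    "DROP TABLE t",
]

# Merging first glues the foreign line onto query 1 ...
print(merge_events(raw))
# ... while filtering first keeps it out of every event.
print(merge_events([line for line in raw if SQL_LINE.match(line)]))
```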

Is this possible? 

Regards


richgalloway
SplunkTrust

As you know, timestamp extraction and event breaking happen early in the processing pipeline and the order cannot be changed.

Is it possible to break events based on the end of the SQL rather than the beginning of the next SQL?
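One way to read this suggestion, simulated in Python: if events are broken both before the start of an SQL entry and after its end, any foreign lines between two SQL entries become events of their own instead of being glued onto an SQL event, and such events could then be discarded with the nullQueue routing mentioned in the question. The end marker (a line ending in `;`) and the date format are assumptions, since the thread does not state them. In Splunk itself this would be configured with `props.conf` line-merging settings such as `BREAK_ONLY_BEFORE` and `MUST_BREAK_AFTER` (with `SHOULD_LINEMERGE = true`), not with Python.

```python
import re

# Hypothetical markers: the thread does not say how an SQL entry starts or ends.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}")  # assumed ISO-style date at the start
EVENT_END = re.compile(r";\s*$")                  # assumed ";" at the end of the last line


def break_events(lines):
    """Split before every start line and after every end line."""
    events, current = [], []
    for line in lines:
        if EVENT_START.match(line) and current:
            events.append(current)
            current = []
        current.append(line)
        if EVENT_END.search(line):
            events.append(current)
            current = []
    if current:
        events.append(current)
    return events


def is_sql_event(event):
    """Keep an event only if it begins with a start line and ends with an end line."""
    return bool(EVENT_START.match(event[0])) and bool(EVENT_END.search(event[-1]))


raw = [
    "2023-05-04 10:00:01 query 1",
    "SELECT 1;",
    "other-tool: starting job",  # foreign lines between two SQL entries
    "other-tool: done",
    "2023-05-04 10:00:02 query 2",
    "DROP TABLE t;",
]

events = break_events(raw)
print([e for e in events if is_sql_event(e)])
```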

Consider using Cribl (cribl.io) to filter out unwanted events before they get to Splunk.

---
If this reply helps you, Karma would be appreciated.



PickleRick
SplunkTrust

You could use INGEST_EVAL and/or CLONE_SOURCETYPE.

https://conf.splunk.com/files/2020/slides/PLA1154C.pdf



sini
Explorer

Hi,

Thanks for confirming. I might as well just use different scripted inputs to get exactly what I need. The file isn't written constantly so it's sufficient to parse it once a day and then send the required contents to the Indexer.
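A minimal sketch of such a scripted input, assuming Python. Splunk runs a scripted input on a schedule and indexes whatever the script writes to stdout, so the script only has to print the wanted lines, and Splunk's timestamp recognition and event breaking then see only SQL lines. The log path, state file, and function names are hypothetical, and the byte-offset checkpoint is an added assumption so that a daily run does not re-send lines that were already indexed.

```python
import os
import re
import sys

# Hypothetical locations: the real application log and a writable checkpoint file.
LOG_PATH = "/var/log/myapp/app.log"
STATE_PATH = "/var/tmp/myapp_sql_input.offset"

# The filter regex from the question.
SQL_LINE = re.compile(r"^(\d+|\t+|\s\s+|CREATE|SELECT|DROP|UPDATE|INSERT|FROM|TBLPROPERTIES|\)).*")


def read_offset(state_path):
    """Return the byte offset saved by the previous run, or 0 on the first run."""
    try:
        with open(state_path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def emit_new_sql_lines(log_path, state_path, out=None):
    """Write every new line matching SQL_LINE to out (stdout by default)."""
    if out is None:
        out = sys.stdout
    offset = read_offset(state_path)
    if os.path.getsize(log_path) < offset:
        offset = 0  # the file shrank (truncated or rotated): start from the top
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # incomplete last line: leave it for the next run
            offset += len(raw)
            # Assumes the log is UTF-8; undecodable bytes are replaced, not fatal.
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if SQL_LINE.match(line):
                out.write(line + "\n")
    with open(state_path, "w") as f:
        f.write(str(offset))


if __name__ == "__main__":
    # Nothing to emit until the application has created its log.
    if os.path.exists(LOG_PATH):
        emit_new_sql_lines(LOG_PATH, STATE_PATH)
```

The `interval` setting of the script stanza in `inputs.conf` could then run it once a day.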

Regards
