Hi all,
We have an application that produces logfiles into which other logfiles are inserted (they are pulled from stdout when the other program is executed). We are only interested in the stdout generated by the SQL statements of that other program, which are themselves multiline entries in a specific format. Basically, an SQL event starts with a date and ends at the date of the next SQL event. We have a regex that captures all the SQL lines we are interested in, but we cannot see a way to ignore the rest of the logfile: all routing to nullQueue or SEDCMD takes place after timestamp recognition and event breaking, and the other entries either mess up the event breaking or get attached to the SQL events if we specify a time config that only matches the SQL statements.
Basically, all lines not matching ^(\d+|\t+|\s\s+|CREATE|SELECT|DROP|UPDATE|INSERT|FROM|TBLPROPERTIES|\)).* need to be excluded before any timestamp recognition or event breaking is applied.
To make it clear again: the problem is that all events, including those we want to get rid of, are multiline events with different starts and ends, and the dates for the different event types appear in different locations and formats. Hence the exclusion must occur before merging takes place.
Is this possible?
Regards
As you know, timestamp extraction and event breaking happen early in the processing pipeline and the order cannot be changed.
Is it possible to break events based on the end of the SQL rather than the beginning of the next SQL?
Consider using Cribl (cribl.io) to filter out unwanted events before they get to Splunk.
You could use INGEST_EVAL and/or CLONE_SOURCETYPE.
Hi,
Thanks for confirming. I might as well just use different scripted inputs to get exactly what I need. The file isn't written constantly so it's sufficient to parse it once a day and then send the required contents to the Indexer.
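For anyone finding this later, here is a minimal sketch of what such a pre-filter could look like, assuming the scripted input is written in Python and simply prints the kept lines to stdout. The pattern is the one from my original post; the names `KEEP`, `filter_sql_lines` and `run` are only illustrative and not part of any Splunk API.

```python
import re
import sys

# Pattern copied from the original question; lines that do not match are dropped.
KEEP = re.compile(
    r"^(\d+|\t+|\s\s+|CREATE|SELECT|DROP|UPDATE|INSERT|FROM|TBLPROPERTIES|\)).*"
)


def filter_sql_lines(lines):
    """Yield only the lines that belong to SQL events, in their original order."""
    for line in lines:
        if KEEP.match(line):
            yield line


def run(path, out=sys.stdout):
    """Read the logfile at `path` (hypothetical location) and write the kept lines to `out`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        out.writelines(filter_sql_lines(handle))
```

Calling `run()` with the logfile path once a day from the scripted input should leave only the SQL lines for timestamp recognition and event breaking, so the time config only has to match the SQL events.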
Regards