Tracking down Regex Errors

ipicbc
Explorer

I have what I think should be a simple question: how can I find out in Splunk why a regex extraction failed? I bring in a log file with events that look pretty similar, but some of the records parse correctly and others fail with the error message "Error in 'rex' command: Regex match error, please check log". Where is the log in question? Could I see this error in index=_internal? How can I find out what Splunk thinks happened with the records that failed?

If I paste the failed data and the regex into regex101.com they match fine. So maybe some records aren't using the correct field extractor? Who knows!

Thanks for your help

woodcock
Esteemed Legend

Do it like this:

Your Base Search Here NOT YourBrokenFieldNameHere="*"

This will return all events where the field that should have been extracted does not exist. Then test these events and your RegEx with a tool like http://www.RegEx101.com. Fix your RegEx, deploy, keep looping until your search returns 0 events.
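The same test-and-fix loop can be rehearsed offline before deploying. The following is only a minimal sketch in Python: the `unmatched` helper, the pattern, and the second sample event are hypothetical, and Python's `re` module is not the PCRE engine Splunk uses, so results can differ in edge cases.

```python
import re

def unmatched(pattern: str, events: list[str]) -> list[str]:
    """Return the events in which the pattern finds no match."""
    compiled = re.compile(pattern)
    return [e for e in events if compiled.search(e) is None]

# Hypothetical extraction and sample events, for illustration only.
PATTERN = r"^(?P<log_timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+(?P<log_level>[A-Z]+)\s"
EVENTS = [
    "2016-01-29 20:32:33.724 INFO [ 1] Finished Precious Statement report (GenerateReports:128)",
    "2016-01-29 20:32:34 info [ 1] hypothetical event with no milliseconds",
]

# Keep adjusting PATTERN until this list comes back empty.
failures = unmatched(PATTERN, EVENTS)
for event in failures:
    print("no match:", event)
```

An empty `failures` list plays the role of the search returning 0 events.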

ipicbc
Explorer

The plot thickens...

  • I don't get any events returned for the query you suggested, although different queries show many events exist that have failed to parse. It's like when the regex fails then the field isn't created at all?!?!
  • The regex I have always gets a match on regex101.com when presented with the same data that fails to parse in Splunk.
  • Messing about with importing events manually 1 at a time I can see that the events that fail all appear to have a CR/LF in the body of the text.
  • The raw events appear to be loading correctly because _raw has all of the text, so the multiline load rules do not appear to be the problem.

I wonder if there is a regex that strips out CR/LF characters, and whether I should be doing that in Event Breaks on load so that the characters never find their way into _raw?
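For reference, the stripping idea itself is a one-line substitution. This is a minimal sketch in Python, not a Splunk configuration: the `flatten_line_breaks` helper and the sample text are hypothetical, and whether to alter events at load time is a separate decision.

```python
import re

def flatten_line_breaks(text: str) -> str:
    """Replace each run of CR/LF characters with a single space."""
    return re.sub(r"[\r\n]+", " ", text)

# Hypothetical two-line message body with a CR/LF in the middle.
sample = "Login failed for user.\r\nat StatementHelper.cs:line 143"
flattened = flatten_line_breaks(sample)
print(flattened)
```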

Appreciate your help!!

ipicbc
Explorer

Got it... (?s) puts the regex into single-line mode, which means that the dot also matches line feed characters.

Works now. Thanks for your help!!
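The effect of (?s) can be seen outside Splunk as well. The sketch below uses Python's `re` module, which is an assumption for illustration (Splunk uses PCRE, and the two engines can differ in newline handling). It takes the pattern posted in this thread with the log_msg group reduced to a plain `.+`, and an abbreviated copy of the failing event.

```python
import re

# The thread's pattern, with the log_msg group reduced to a plain ".+".
BASE = (
    r"(?P<log_timestamp>\d+\-\d+\-\d+\s+\d+:\d+:\d+\.\d+)\s+"
    r"(?P<log_level>\w+)\s+"
    r"[\[](?:[$]|[a-zA-Z\-\_]*)(?P<log_thread>[0-9 ]+)[\]]\s+"
    r"(?P<log_msg>.+)"
)

# Abbreviated copy of the failing event: a CR/LF splits the message body.
EVENT = (
    "2016-01-30 00:59:49.468 ERROR [ 1] Precious Account Statement Raised exception\r\n"
    "at SBL.RB.CAM.ReportEngineEOD.PreciousStatement.GenerateReports() (PreciousStatement:30)"
)

def extract_msg(pattern: str, event: str) -> str:
    """Return the log_msg capture, or an empty string when nothing matches."""
    m = re.match(pattern, event)
    return m.group("log_msg") if m else ""

# Without (?s), "." stops at the line feed, so only the first line is captured
# (in Python the trailing CR is still included, since only "\n" is excluded).
first_line_only = extract_msg(BASE, EVENT)

# With (?s), "." also matches the line feed and the capture runs to the end.
whole_body = extract_msg("(?s)" + BASE, EVENT)

print(repr(first_line_only))
print(repr(whole_body))
```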

woodcock
Esteemed Legend

OK, then upvote any answers that were helpful and then click Accept on the best one to close the question.

asimagu
Builder

Are these settings meant to extract one value per event, or multiple values?

ipicbc
Explorer

Thanks so much for your help. To continue, the regex is this:

(?P<log_timestamp>\d+\-\d+\-\d+\s+\d+:\d+:\d+\.\d+)\s+(?P<log_level>\w+)\s+[\[](?:[$]|[a-zA-Z\-\_]*)(?P<log_thread>[0-9 ]+)[\]]\s+(?P<log_msg>(?:.|\n|\r)+)

A raw event that works looks like this:
2016-01-29 20:32:33.724 INFO [ 1] Finished Precious Statement report (GenerateReports:128)

A raw event that failed looks like this:
2016-01-30 00:59:49.468 ERROR [ 1] Precious Account Statement Raised exception System.Data.SqlClient.SqlException (0x80131904): Login failed for user 'CAMInterfaceUser'. Reason: The password of the account must be changed.
at SBL.RB.CAM.ReportEngine.StatementHelper.GetLedgersForReportGeneration(SqlConnection sqlconn, Int64 reportTypeId, DateTime currentBusinessDate, DateTime NextBusinessDate) in d:\Code\GMO\merges\Cortex\Enterprise\CAM\Services\Win\SBL.RB.CAM.ReportEngine\SBL.RB.CAM.ReportEngine\StatementHelper.cs:line 143
at SBL.RB.CAM.ReportEngineEOD.PreciousStatement.GenerateReports() in d:\Code\GMO\merges\Cortex\Enterprise\CAM\Services\Win\SBL.RB.CAM.ReportEngine\SBL.RB.CAM.ReportEngineEOD\ReportGenerators\PreciousStatement.cs:line 30 (PreciousStatement:30)

rjthibod
Champion

I am afraid the log (if any) is probably just going to tell you the same error, not the reason for the error. There is no log that outputs details or indications of failed extractions when the issue is simply something wrong with the extraction method/pattern.

It is always worth checking the output of btool to make sure there isn't some syntax error in the configuration that constitutes a bigger problem. Run btool like this from the command line:

$ <SPLUNK_HOME>/bin/splunk btool check

or like this for more debug output:

$ <SPLUNK_HOME>/bin/splunk btool check --debug

Also, make sure you are properly extracting fields inside of rex using the format (?<fieldName>SOME_PATTERN).

Post your rex command and some failing data if you want regex help.
