Rex vs field extraction with very large multiline events, latter not working properly?

Whistler
Engager

Hi,

I'm importing some very large multi-line events into Splunk and trying to extract fields from them. The events look something like this:

2017-05-11 08:42:44,3920 ERROR [231f97ad-36f7-46d1-9c11-4fb69e6d2cd9] [Shared.ErrorReports.ErrorReporterBase] - ErrorReport 361489: Exception Handling service TransactieOphalen, context:
<?xml version="1.0" encoding="utf-16"?>
<Profiel subset="userSet">
  [more xml]
</Profiel>
System.ServiceModel.FaultException: errormelding

Server stack trace: 
   at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

Exception rethrown at [0]: 
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at IBetalingen.TransactieStatus(TransactieStatusRequest request)
   at BetalingTransactieOphalenService.handle(IServiceContext context) in BetalingTransactieOphalen.cs:line 87
   at Factories.ServiceWrapper.handle(IServiceContext context) in ServiceFactory.cs:line 97

This is one of the smaller events; the context XML can easily run to more than 10,000 lines. There are also single-line events that follow the same format (the context part isn't normally logged, only when exceptions occur). For both the single-line events and the multiline event above, my regular-expression field extraction works fine. For the larger events, however, it doesn't extract anything.

When I use the same regular expression with the rex command, it does extract the fields for the large events.
The expression is:

    .{24}[ ]*(?<LogLevel>[A-Z]*)[ ]*(\[(?<KetenId>.*?)\])?[ ]*(\((?<BlueriqSessieId>.*)\))?[ ]*\[(?<Class>[A-Za-z0-9\.` \-]*?)\]([- ]*)?(?<LogInhoud>(?s).*)

Can anyone explain what's happening here? If I remove the (?s) before the last .* and put (?m-s) at the start, the expression works for all events (big or small), but only the first line gets extracted. That, together with the fact that it works properly in the rex command, suggests to me that the expression itself is OK and the problem lies somewhere else.

aakwah
Builder

Hello,

I think this is related to the PCRE recursion limit. Check splunkd.log for an error message like this one:

ERROR Regex - Failed in pcre_exec: Error PCRE_ERROR_RECURSIONLIMIT for regex: <Your Regex> 
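
One way to look for that message without opening the log file by hand is a search like the one below. It is only a sketch: it assumes your role is allowed to search the default _internal index, where splunkd.log is normally indexed, and the table at the end is just for readability.

```
index=_internal sourcetype=splunkd "PCRE_ERROR_RECURSIONLIMIT"
| table _time host _raw
```

If nothing comes back, try widening the time range so that it covers the moment the large events were searched.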

In a similar case, I fixed this by optimizing my regex.
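
As a rough sketch of what such an optimization could look like for the expression in the question (untested against the real data, so treat it as an assumption rather than a known fix): anchor the match at the start of the event, replace the .*? and .* inside the brackets with negated character classes so they cannot wander across the event, make the helper groups non-capturing, and move (?s) to the front. This assumes every event starts with the 24-character timestamp, as in the sample.

```
(?s)^.{24} *(?<LogLevel>[A-Z]*) *(?:\[(?<KetenId>[^\]\r\n]*)\])? *(?:\((?<BlueriqSessieId>[^)\r\n]*)\))? *\[(?<Class>[A-Za-z0-9.` -]*)\][- ]*(?<LogInhoud>.*)
```

Note one behaviour change: BlueriqSessieId now stops at the first closing parenthesis instead of the last one on the line, which only matters if a session id can itself contain ")".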

Regards,
Ahmed
