Splunk Search

How to parse events before indexing?

Thulasinathan_M
Contributor

Hi Splunk Experts,
I want to break every line into its own single-line event (breaking on [\r\n]). But if a log entry contains a stack trace, I want it treated as one multi-line event.
I've tested the regex below and it works as expected, but I'm not sure which properties I should apply it to for a sourcetype. This is for an application that logs millions of events per minute. Please help me with an optimized solution.

(.*[\n]((.*\)\])?(\s+at.*\)\n))+)

Sample logs:

[(2023-08-03 10:00:03)] INFO: Request completed successfully.
[(2023-08-03 10:00:03)] ERROR: Request got failed.
[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException
	at com.example.MyClass.method1(MyClass.java:12)
	at com.example.MyClass.method2(MyClass.java:34)
	at com.example.AnotherClass.someMethod(AnotherClass.java:56)
	at com.example.Main.main(Main.java:23)
[(2023-08-03 10:00:03)] INFO: Request Submitted successfully.
[(2023-08-03 10:00:03)] INFO: Request completed successfully.
[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully.
[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open0(Native Method)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
[(2023-08-03 10:00:02)]	at com.example.FileDemo.readFromFile(FileDemo.java:55)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:12)
[(2023-08-03 10:00:03)] INFO: Request completed successfully.
[(2023-08-03 10:00:04)] DEBUG: Processing request: /api/v1/data?id=67890
[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed
[(2023-08-03 10:00:02)]	at com.example.MathUtils.squareRoot(MathUtils.java:42)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:33)
[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.

Expected First Multi-Line Event:

[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException
	at com.example.MyClass.method1(MyClass.java:12)
	at com.example.MyClass.method2(MyClass.java:34)
	at com.example.AnotherClass.someMethod(AnotherClass.java:56)
	at com.example.Main.main(Main.java:23)

Expected Second Multi-Line Event:

[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open0(Native Method)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
[(2023-08-03 10:00:02)]	at com.example.FileDemo.readFromFile(FileDemo.java:55)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:12)

Expected Third Multi-Line Event:

[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed
[(2023-08-03 10:00:02)]	at com.example.MathUtils.squareRoot(MathUtils.java:42)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:33)
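To see what the regex from the question actually captures, it can be run over a trimmed copy of the sample logs. The sketch below uses Python's re module, which is close to, but not the same as, the PCRE engine Splunk uses, so treat the result as an approximation; the variable names are illustrative only.

```python
import re

# Regex from the question, unchanged.
STACK_RE = re.compile(r"(.*[\n]((.*\)\])?(\s+at.*\)\n))+)")

# Trimmed copy of the sample logs; "\t" is the tab before each "at" frame.
LINES = [
    '[(2023-08-03 10:00:03)] ERROR: Request got failed.',
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException',
    '\tat com.example.MyClass.method1(MyClass.java:12)',
    '\tat com.example.Main.main(Main.java:23)',
    '[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)',
    '[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)',
    '[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)',
    '[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed',
    '[(2023-08-03 10:00:02)]\tat com.example.MathUtils.squareRoot(MathUtils.java:42)',
    '[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.',
]
SAMPLE = "\n".join(LINES)

# Each match is a header line plus the run of "at ..." frames after it.
matches = [m.group(0).rstrip("\n") for m in STACK_RE.finditer(SAMPLE)]

for event in matches:
    print(event)
    print("---")
```

On this input it yields the three expected multi-line events. It describes whole events, though, and LINE_BREAKER works differently: Splunk treats the text matched by the first capturing group as the delimiter between events and discards it, so this pattern cannot be pasted into LINE_BREAKER as-is.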

ITWhisperer
SplunkTrust

Essentially, you want your events to break at a newline followed by a timestamp?

What do you currently have configured?

Thulasinathan_M
Contributor

Hi @ITWhisperer,

Currently, the configuration is as follows:

CHARSET=UTF-8
SHOULD_LINEMERGE=true
disabled=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
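With SHOULD_LINEMERGE=true, the stream is first split at every LINE_BREAKER match (here, every newline), and the resulting lines are then merged again; the default BREAK_ONLY_BEFORE_DATE=true starts a new event only at a line that contains a date. The Python sketch below is a simplified model of that behaviour. It assumes "contains a date" means "contains a [(YYYY-MM-DD hh:mm:ss)] stamp", whereas Splunk's real timestamp detection is more general, and linemerge_before_dates is a hypothetical name.

```python
import re

# Simplified stand-in for Splunk's timestamp detection (assumption).
DATE_RE = re.compile(r"\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\]")


def linemerge_before_dates(text: str) -> list[str]:
    """Split on every newline, then glue undated lines onto the previous event."""
    events: list[str] = []
    for line in re.split(r"[\r\n]+", text):  # models LINE_BREAKER=([\r\n]+)
        if not line:
            continue
        if not events or DATE_RE.search(line):  # dated line: start a new event
            events.append(line)
        else:  # undated line: merge into the current event
            events[-1] += "\n" + line
    return events


# Trimmed copy of the sample logs; "\t" is the tab before each "at" frame.
LINES = [
    '[(2023-08-03 10:00:03)] ERROR: Request got failed.',
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException',
    '\tat com.example.MyClass.method1(MyClass.java:12)',
    '\tat com.example.Main.main(Main.java:23)',
    '[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)',
    '[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)',
    '[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)',
    '[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed',
    '[(2023-08-03 10:00:02)]\tat com.example.MathUtils.squareRoot(MathUtils.java:42)',
    '[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.',
]

events = linemerge_before_dates("\n".join(LINES))
for event in events:
    print(event)
    print("---")
```

In this model the first stack trace stays together because its frames carry no timestamp, while every frame of the FileNotFoundException and IllegalArgumentException traces becomes its own event. Breaking at "a newline followed by a timestamp" runs into the same problem with the timestamped frames.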

inventsekar
SplunkTrust

You have a single file, but you want to apply multiple line-breaking rules within that one file... is that right?

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, please consider upvoting. Thanks for reading!

Thulasinathan_M
Contributor

Hi @inventsekar 
Yes, that's correct! I want to break every line into its own single-line event, but any logs that meet the condition described in my post should be wrapped into one multi-line event, and this should apply to all files under my sourcetype.

inventsekar
SplunkTrust

Please check this thread: https://community.splunk.com/t5/Getting-Data-In/Line-break-with-multiple-Linebreaker/m-p/400335

The values for BREAK_ONLY_BEFORE, MUST_NOT_BREAK_AFTER and MUST_BREAK_AFTER need to be adapted to your requirements.

A rough starting point:

BREAK_ONLY_BEFORE=^Exception | ^java.io.FileNotFoundException | ^WARN
MUST_NOT_BREAK_AFTER=something here
MUST_BREAK_AFTER=something here
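Two details of the rough BREAK_ONLY_BEFORE value are easy to miss: the spaces around each | are part of the regex, and every header line in the sample starts with the [( timestamp rather than with Exception or WARN. The Python sketch below checks this against a trimmed copy of the sample; the ANCHORED variant is a hypothetical illustration, not a tested Splunk setting, and Python's re only approximates Splunk's PCRE.

```python
import re

# BREAK_ONLY_BEFORE value suggested above, copied as-is (spaces included).
ROUGH = re.compile(r"^Exception | ^java.io.FileNotFoundException | ^WARN")

# Hypothetical variant: skip the [(timestamp)] first and drop the stray spaces.
ANCHORED = re.compile(
    r"^\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\]\s+"
    r"(?:Exception|java\.io\.FileNotFoundException|WARN)"
)

# Trimmed copy of the sample logs; "\t" is the tab before each "at" frame.
LINES = [
    '[(2023-08-03 10:00:03)] ERROR: Request got failed.',
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException',
    '\tat com.example.MyClass.method1(MyClass.java:12)',
    '\tat com.example.Main.main(Main.java:23)',
    '[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)',
    '[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)',
    '[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)',
    '[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed',
    '[(2023-08-03 10:00:02)]\tat com.example.MathUtils.squareRoot(MathUtils.java:42)',
    '[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.',
]

# Indexes of the lines each pattern would treat as "break before this line".
rough_hits = [i for i, line in enumerate(LINES) if ROUGH.search(line)]
anchored_hits = [i for i, line in enumerate(LINES) if ANCHORED.search(line)]
print(rough_hits, anchored_hits)  # prints: [] [1, 4, 7]
```

A keyword list like this only recognises the header lines it names, so a stack trace behind any other message would not be recognised.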

thanks and best regards,
Sekar

Thulasinathan_M
Contributor

Thanks for the pointers. I've come up with the config below, but it's still not working, and it looks a bit messy. What I've done is:
LINE_BREAKER: break events at the timestamp, at the line before the first matching line with 'at', or after the last matching line with 'at'
BREAK_ONLY_BEFORE: the timestamp, or the line before the first matching line with 'at'
MUST_NOT_BREAK_AFTER: the line before the first matching line with 'at'
MUST_BREAK_AFTER: the last matching line with 'at'
Any suggestions on what I've done wrong, please?

SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)(?:(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\])|(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)|(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)
NO_BINARY_CHECK=true
MUST_NOT_BREAK_AFTER=^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)
BREAK_ONLY_BEFORE=^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\])|^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)
MUST_BREAK_AFTER=^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)
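Two details may explain why this config does not behave as intended. First, according to the props.conf documentation, BREAK_ONLY_BEFORE, MUST_NOT_BREAK_AFTER and MUST_BREAK_AFTER only take effect when SHOULD_LINEMERGE=true, so with SHOULD_LINEMERGE=false only LINE_BREAKER is used. Second, the LINE_BREAKER as posted opens one more parenthesis than it closes: the outer (?: group is never closed. The Python sketch below checks the posted pattern and a hypothetical patched copy with one ) appended, on a trimmed copy of the sample; Python's re only approximates Splunk's PCRE engine.

```python
import re

# LINE_BREAKER exactly as posted above, split into pieces for readability.
POSTED = (
    r"([\r\n]+)(?:(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\])"
    r"|(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)"
    r"\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)"
    r"|(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)"
    r"\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)"
)

# Hypothetical fix: close the outer (?: ... ) group that is left open.
PATCHED = POSTED + ")"


def compiles(pattern: str) -> bool:
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Trimmed copy of the sample logs; "\t" is the tab before each "at" frame.
LINES = [
    '[(2023-08-03 10:00:03)] ERROR: Request got failed.',
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException',
    '\tat com.example.MyClass.method1(MyClass.java:12)',
    '\tat com.example.Main.main(Main.java:23)',
    '[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)',
    '[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)',
    '[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)',
    '[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed',
    '[(2023-08-03 10:00:02)]\tat com.example.MathUtils.squareRoot(MathUtils.java:42)',
    '[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.',
]
TEXT = "\n".join(LINES)

# For each break, record which alternative (capture group 2, 3 or 4) matched.
fired = [
    next(g for g in (2, 3, 4) if m.group(g) is not None)
    for m in re.finditer(PATCHED, TEXT)
]
print(compiles(POSTED), compiles(PATCHED), fired)
```

The patched pattern compiles, but its first alternative is a prefix of the other two, so it always wins: the effect is a break before every line that starts with a timestamp, including the timestamped "at" frames.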

isoutamo
SplunkTrust
Have you tried my example? Based on your example events, it works in my test environment.

Thulasinathan_M
Contributor

Thanks @isoutamo. It works as expected with the sample inputs, but I can't rely on the pattern below, because that part of the line could be anything. The only thing I can be sure of is 'at' followed by the path to the file (stack traces).

([\w\.]+:|Exception)
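Since the "at" frame is the only reliable marker, one option is to invert the logic: break at every newline except one that is followed by a stack frame (an optional [( timestamp, whitespace, then "at"). The LINE_BREAKER value below is a sketch under that assumption, meant for SHOULD_LINEMERGE=false, and has not been tested in Splunk. The Python model of LINE_BREAKER (capture group 1 is discarded, the next event starts right after it) is a simplification; apply_line_breaker and the "Something went wrong" line are hypothetical.

```python
import re

# Candidate LINE_BREAKER (assumption, untested in Splunk): break at newlines
# unless the next line is an "at ..." frame, with or without [(timestamp)].
LINE_BREAKER = (
    r"([\r\n]+)"
    r"(?!(?:\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\])?\s+at\s)"
)


def apply_line_breaker(text: str, pattern: str) -> list[str]:
    """Rough model of LINE_BREAKER with SHOULD_LINEMERGE=false: the text in
    capture group 1 is dropped; what precedes it ends one event and what
    follows it starts the next."""
    events, start = [], 0
    for m in re.finditer(pattern, text):
        events.append(text[start:m.start(1)])
        start = m.end(1)
    events.append(text[start:])
    return [e for e in events if e]


# Trimmed copy of the sample logs; "\t" is the tab before each "at" frame.
LINES = [
    '[(2023-08-03 10:00:03)] ERROR: Request got failed.',
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException',
    '\tat com.example.MyClass.method1(MyClass.java:12)',
    '\tat com.example.Main.main(Main.java:23)',
    '[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)',
    '[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)',
    '[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)',
    '[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed',
    '[(2023-08-03 10:00:02)]\tat com.example.MathUtils.squareRoot(MathUtils.java:42)',
    '[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.',
]
events = apply_line_breaker("\n".join(LINES), LINE_BREAKER)

# Hypothetical header with no known keyword: it still keeps its frame.
odd = apply_line_breaker(
    "[(2023-08-03 10:00:05)] INFO: ok\n"
    "[(2023-08-03 10:00:05)] Something went wrong\n"
    "\tat com.example.Main.main(Main.java:1)",
    LINE_BREAKER,
)
print(len(events), len(odd))  # prints: 5 2
```

The lookahead consumes nothing, so only the newline is discarded and no merging step is needed, in line with the advice in this thread to rely on LINE_BREAKER alone. The trade-off: an ordinary message whose text right after the timestamp begins with whitespace and "at " would be glued to the previous event.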

isoutamo
SplunkTrust

OK. I just noticed that your log events contain <Log level>:, <java class>:, or the Exception keyword, which define the start of a new individual event.

You could try asking for the correct regex to match this on the Splunk Slack #regex channel: https://splunk-usergroups.slack.com/archives/C3WFE5V5G

isoutamo
SplunkTrust

Hi

Based on your example, you could try something like this. If/when needed, add the keywords that start a new event here to the LINE_BREAKER.

[<your sourcetype>]
SHOULD_LINEMERGE=true
LINE_BREAKER=([\r\n]+)\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\]\s([\w\.]+:|Exception)
NO_BINARY_CHECK=true
TIME_FORMAT=%F %T
TIME_PREFIX=^\[\(
MAX_TIMESTAMP_LOOKAHEAD=20
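The LINE_BREAKER above breaks only where the next line starts with the timestamp, one whitespace character, and then either a word ending in a colon or Exception, so "at" frames, timestamped or not, never start a new event. The Python sketch below models just that LINE_BREAKER step on a trimmed copy of the sample (it ignores the additional merging that SHOULD_LINEMERGE=true adds) and illustrates the limitation raised in the thread with a hypothetical header line that has no such keyword. apply_line_breaker is a hypothetical helper, and Python's re only approximates Splunk's PCRE.

```python
import re

# LINE_BREAKER from the config above, unchanged.
LINE_BREAKER = (
    r"([\r\n]+)\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\]\s([\w\.]+:|Exception)"
)


def apply_line_breaker(text: str, pattern: str) -> list[str]:
    """Rough model of the LINE_BREAKER step: the text in capture group 1 is
    dropped; what precedes it ends one event and what follows it starts the
    next."""
    events, start = [], 0
    for m in re.finditer(pattern, text):
        events.append(text[start:m.start(1)])
        start = m.end(1)
    events.append(text[start:])
    return [e for e in events if e]


# Trimmed copy of the sample logs; "\t" is the tab before each "at" frame.
LINES = [
    '[(2023-08-03 10:00:03)] ERROR: Request got failed.',
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException',
    '\tat com.example.MyClass.method1(MyClass.java:12)',
    '\tat com.example.Main.main(Main.java:23)',
    '[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)',
    '[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)',
    '[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)',
    '[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed',
    '[(2023-08-03 10:00:02)]\tat com.example.MathUtils.squareRoot(MathUtils.java:42)',
    '[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.',
]
events = apply_line_breaker("\n".join(LINES), LINE_BREAKER)

# Hypothetical header without "word:" or "Exception": no break happens,
# so it and its frame are glued onto the previous event.
odd = apply_line_breaker(
    "[(2023-08-03 10:00:05)] INFO: ok\n"
    "[(2023-08-03 10:00:05)] Something went wrong\n"
    "\tat com.example.Main.main(Main.java:1)",
    LINE_BREAKER,
)
print(len(events), len(odd))  # prints: 5 1
```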

Usually it's better to avoid BREAK_ONLY_BEFORE, MUST_NOT_BREAK_AFTER and MUST_BREAK_AFTER if you can. They work, but they use more resources than a LINE_BREAKER definition alone.

r. Ismo
