Splunk Search

How to parse events before indexing?

Thulasinathan_M
Contributor

Hi Splunk Experts,
I want every line to be indexed as its own single-line event, breaking on [\r\n]. Log lines that belong to a stack trace, however, should be kept together as one multi-line event.
I've tested the regex below and it matches as expected, but I'm not sure which sourcetype properties I should set to apply it. This is for an application that logs millions of events per minute, so please help me with an optimized solution.

(.*[\n]((.*\)\])?(\s+at.*\)\n))+)

Sample logs:

[(2023-08-03 10:00:03)] INFO: Request completed successfully.
[(2023-08-03 10:00:03)] ERROR: Request got failed.
[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException
	at com.example.MyClass.method1(MyClass.java:12)
	at com.example.MyClass.method2(MyClass.java:34)
	at com.example.AnotherClass.someMethod(AnotherClass.java:56)
	at com.example.Main.main(Main.java:23)
[(2023-08-03 10:00:03)] INFO: Request Submitted successfully.
[(2023-08-03 10:00:03)] INFO: Request completed successfully.
[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully.
[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open0(Native Method)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
[(2023-08-03 10:00:02)]	at com.example.FileDemo.readFromFile(FileDemo.java:55)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:12)
[(2023-08-03 10:00:03)] INFO: Request completed successfully.
[(2023-08-03 10:00:04)] DEBUG: Processing request: /api/v1/data?id=67890
[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed
[(2023-08-03 10:00:02)]	at com.example.MathUtils.squareRoot(MathUtils.java:42)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:33)
[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.

Expected First Multi-Line Event:

[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException
	at com.example.MyClass.method1(MyClass.java:12)
	at com.example.MyClass.method2(MyClass.java:34)
	at com.example.AnotherClass.someMethod(AnotherClass.java:56)
	at com.example.Main.main(Main.java:23)

Expected Second Multi-Line Event:

[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open0(Native Method)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
[(2023-08-03 10:00:02)]	at com.example.FileDemo.readFromFile(FileDemo.java:55)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:12)

Expected Third Multi-Line Event:

[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed
[(2023-08-03 10:00:02)]	at com.example.MathUtils.squareRoot(MathUtils.java:42)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:33)


ITWhisperer
SplunkTrust

Essentially, you want your events to break at a newline followed by a timestamp?

What do you currently have configured?
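
An editorial sketch on the question above, not part of the original replies: breaking only at a newline followed by a timestamp is not enough for this log, because in the second and third stack traces every 'at' line carries its own timestamp. The Python below approximates that rule with re.split on a subset of the sample log. The helper name split_events is made up for illustration, and Python's re module only approximates Splunk's PCRE-based line breaking.

```python
import re

# Subset of the sample log from the question; "\t" is the tab before "at".
LOG = (
    "[(2023-08-03 10:00:03)] ERROR: Request got failed.\n"
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException\n'
    "\tat com.example.MyClass.method1(MyClass.java:12)\n"
    "\tat com.example.Main.main(Main.java:23)\n"
    "[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)\n"
    "[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)\n"
    "[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)\n"
    "[(2023-08-03 10:00:03)] INFO: Request completed successfully.\n"
)

# Break at newlines that are followed by a "[(YYYY-MM-DD HH:MM:SS)]" timestamp.
TIMESTAMP_BREAK = re.compile(
    r"[\r\n]+(?=\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\])"
)


def split_events(text, breaker):
    """Split text into events wherever the breaker pattern matches."""
    return [event for event in breaker.split(text.strip()) if event]


events = split_events(LOG, TIMESTAMP_BREAK)
for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event)
```

The first trace stays together because its frames have no timestamp, while the second trace is cut into one event per frame. A working rule therefore has to look at what follows the timestamp as well.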


Thulasinathan_M
Contributor

Hi @ITWhisperer,

The current configuration is below:

CHARSET=UTF-8
SHOULD_LINEMERGE=true
disabled=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true

 


inventsekar
SplunkTrust

You have a single file, but you want to apply more than one line-breaking rule within that file. Is that right?


Thulasinathan_M
Contributor

Hi @inventsekar,
Yes, that's correct! I want each line to be broken into its own single-line event, but any lines that meet the condition described in my original post (a stack trace) should be wrapped together as one multi-line event. This should apply to all files under my sourcetype.


inventsekar
SplunkTrust

Please check this thread: https://community.splunk.com/t5/Getting-Data-In/Line-break-with-multiple-Linebreaker/m-p/400335

The values for BREAK_ONLY_BEFORE, MUST_NOT_BREAK_AFTER and MUST_BREAK_AFTER need to be adjusted to fit your requirement.

A rough starting point:

BREAK_ONLY_BEFORE=^Exception|^java\.io\.FileNotFoundException|^WARN
MUST_NOT_BREAK_AFTER=something here
MUST_BREAK_AFTER=something here

 


Thulasinathan_M
Contributor

Thanks for the pointers. I've come up with the config below, but it still isn't working and it looks a bit messy. Here is what each setting is meant to do:
LINE_BREAKER: break events at the timestamp / at the line preceding the first 'at' line / at the last 'at' line
BREAK_ONLY_BEFORE: the timestamp / the line preceding the first 'at' line
MUST_NOT_BREAK_AFTER: the line preceding the first 'at' line
MUST_BREAK_AFTER: the last 'at' line
Any suggestions on what I've done wrong, please?

 

SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)(?:(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\])|(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)|(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)
NO_BINARY_CHECK=true
MUST_NOT_BREAK_AFTER=^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)
BREAK_ONLY_BEFORE=^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\])|^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)
MUST_BREAK_AFTER=^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)

 


isoutamo
SplunkTrust
Have you tried my example? With your sample events, it works in my test environment.

Thulasinathan_M
Contributor

Thanks @isoutamo. It works as expected with the sample inputs, but I can't rely on the pattern below, because that part of the line could be anything. The only thing I can be sure of is 'at' followed by the path to the file (the stack trace frames).

([\w\.]+:|Exception)
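
One way to rely only on the 'at' frames, sketched here as an assumption rather than a tested Splunk config: break at every newline unless the next line, after an optional timestamp, starts with whitespace and 'at'. As a candidate LINE_BREAKER (used with SHOULD_LINEMERGE=false) that would be ([\r\n]+)(?!(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\])?\s+at\s). The Python below only simulates the idea on a subset of the sample log. split_events is a hypothetical helper, and the behavior in Splunk itself should be verified on a test index.

```python
import re

# Subset of the sample log from the question; "\t" is the tab before "at".
LOG = (
    "[(2023-08-03 10:00:03)] ERROR: Request got failed.\n"
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException\n'
    "\tat com.example.MyClass.method1(MyClass.java:12)\n"
    "\tat com.example.Main.main(Main.java:23)\n"
    "[(2023-08-03 10:00:03)] INFO: Request Submitted successfully.\n"
    "[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. "
    "java.lang.IllegalArgumentException: Invalid input: negative value not allowed\n"
    "[(2023-08-03 10:00:02)]\tat com.example.MathUtils.squareRoot(MathUtils.java:42)\n"
    "[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:33)\n"
    "[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.\n"
)

TIMESTAMP = r"\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]"

# Candidate breaker (an assumption, not a tested Splunk setting): a run of
# newlines that is NOT followed by an optional timestamp, whitespace and "at".
LINE_BREAKER = re.compile(r"([\r\n]+)(?!(?:" + TIMESTAMP + r")?\s+at\s)")


def split_events(text, breaker):
    """Split on the breaker. re.split also returns the captured newlines,
    which sit at the odd indices, so only the even indices are kept."""
    return breaker.split(text.strip())[::2]


events = split_events(LOG, LINE_BREAKER)
for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event)
```

A message that genuinely begins with the word 'at' right after the timestamp would be merged into the previous event, so the lookahead may need tightening for real data.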

isoutamo
SplunkTrust

OK. I only noticed that your log events contain <Log level>:, <java class>: or the keyword Exception, each of which marks the start of a new individual event.

You could ask for help getting the right regex for this on the Splunk Slack #regex channel: https://splunk-usergroups.slack.com/archives/C3WFE5V5G

 


isoutamo
SplunkTrust

Hi

Based on your example, you could try something like this. If and when needed, add to LINE_BREAKER any further keywords that start a new event.

 

[<your sourcetype>]
SHOULD_LINEMERGE=true
LINE_BREAKER=([\r\n]+)\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\]\s([\w\.]+:|Exception)
NO_BINARY_CHECK=true
TIME_FORMAT=%F %T
TIME_PREFIX=^\[\(
MAX_TIMESTAMP_LOOKAHEAD=20

 

It's usually better to avoid BREAK_ONLY_BEFORE, MUST_NOT_BREAK_AFTER and MUST_BREAK_AFTER if you can. They work, but they use more resources than a LINE_BREAKER definition alone.

r. Ismo
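
To see what the LINE_BREAKER above does with the sample log, here is a rough Python simulation added as an editorial sketch. It assumes Splunk's documented behavior that the text matched by the first capturing group is discarded and the event boundary falls there, with the rest of the match starting the next event. split_events is a hypothetical helper, and Python's re module stands in for PCRE.

```python
import re

# Subset of the sample log from the question; "\t" is the tab before "at".
LOG = (
    "[(2023-08-03 10:00:03)] INFO: Request completed successfully.\n"
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException\n'
    "\tat com.example.MyClass.method1(MyClass.java:12)\n"
    "\tat com.example.Main.main(Main.java:23)\n"
    "[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)\n"
    "[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)\n"
    "[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)\n"
    "[(2023-08-03 10:00:04)] DEBUG: Processing request: /api/v1/data?id=67890\n"
)

# The LINE_BREAKER from the reply above, copied verbatim.
LINE_BREAKER = re.compile(
    r"([\r\n]+)\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\]\s([\w\.]+:|Exception)"
)


def split_events(text, breaker):
    """Cut the text at each match of group 1 (the newlines), which is dropped;
    everything after group 1 becomes the start of the next event."""
    events, start = [], 0
    for match in breaker.finditer(text):
        events.append(text[start:match.start(1)])
        start = match.end(1)
    events.append(text[start:])
    return events


events = split_events(LOG.strip(), LINE_BREAKER)
for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event)
```

On this subset both stack traces stay attached to their header lines, because an 'at' frame never matches the keyword part of the pattern. A header line that contains neither a "word:" token nor "Exception" right after the timestamp would not start a new event under this rule.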
