Splunk Search

How to parse events before indexing?

Thulasinathan_M
Contributor

Hi Splunk Experts,
I want every line (delimited by [\r\n]) to be indexed as a single-line event. However, log entries that contain a stack trace should be kept together as one multi-line event.
I've tested the regex below and it works as expected, but I'm not sure which properties I should set for the sourcetype. This is for an application that logs millions of events per minute, so please help me with an optimized solution.

(.*[\n]((.*\)\])?(\s+at.*\)\n))+)
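
A quick way to see what this regex captures is to run it offline against a few of the sample lines. The sketch below uses Python's `re` module as a stand-in for Splunk's regex engine (an assumption), and `SAMPLE` is a shortened copy of the logs in this post. Note that the pattern matches whole multi-line events, whereas LINE_BREAKER expects a pattern that matches the boundary between events, with the first capture group marking the text to discard.

```python
import re

# The regex from the post, unchanged.
STACKTRACE_RE = re.compile(r"(.*[\n]((.*\)\])?(\s+at.*\)\n))+)")

# Shortened copy of the sample logs (a tab precedes each "at").
SAMPLE = (
    "[(2023-08-03 10:00:03)] ERROR: Request got failed.\n"
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException\n'
    "\tat com.example.MyClass.method1(MyClass.java:12)\n"
    "\tat com.example.Main.main(Main.java:23)\n"
    "[(2023-08-03 10:00:03)] INFO: Request completed successfully.\n"
    "[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)\n"
    "[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)\n"
    "[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)\n"
    "[(2023-08-03 10:00:03)] INFO: Request completed successfully.\n"
)

# Each match is one header line plus the "at ..." lines that follow it.
matches = [m.group(0) for m in STACKTRACE_RE.finditer(SAMPLE)]
for event in matches:
    print(event, end="---\n")
```

On this sample the regex picks out only the two stack-trace blocks; the single-line events are not matched at all, which is why it cannot be dropped into LINE_BREAKER as-is.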

Sample logs:

 

 

[(2023-08-03 10:00:03)] INFO: Request completed successfully.
[(2023-08-03 10:00:03)] ERROR: Request got failed.
[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException
	at com.example.MyClass.method1(MyClass.java:12)
	at com.example.MyClass.method2(MyClass.java:34)
	at com.example.AnotherClass.someMethod(AnotherClass.java:56)
	at com.example.Main.main(Main.java:23)
[(2023-08-03 10:00:03)] INFO: Request Submitted successfully.
[(2023-08-03 10:00:03)] INFO: Request completed successfully.
[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully.
[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open0(Native Method)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
[(2023-08-03 10:00:02)]	at com.example.FileDemo.readFromFile(FileDemo.java:55)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:12)
[(2023-08-03 10:00:03)] INFO: Request completed successfully.
[(2023-08-03 10:00:04)] DEBUG: Processing request: /api/v1/data?id=67890
[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed
[(2023-08-03 10:00:02)]	at com.example.MathUtils.squareRoot(MathUtils.java:42)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:33)
[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.

 

 


Expected First Multi-Line Event:

 

 

[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException
	at com.example.MyClass.method1(MyClass.java:12)
	at com.example.MyClass.method2(MyClass.java:34)
	at com.example.AnotherClass.someMethod(AnotherClass.java:56)
	at com.example.Main.main(Main.java:23)

 

 


Expected Second Multi-Line Event:

 

 

[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open0(Native Method)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
[(2023-08-03 10:00:02)]	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
[(2023-08-03 10:00:02)]	at com.example.FileDemo.readFromFile(FileDemo.java:55)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:12)

 

 


Expected Third Multi-Line Event:

 

 

[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. java.lang.IllegalArgumentException: Invalid input: negative value not allowed
[(2023-08-03 10:00:02)]	at com.example.MathUtils.squareRoot(MathUtils.java:42)
[(2023-08-03 10:00:02)]	at com.example.Main.main(Main.java:33)

 

 


ITWhisperer
SplunkTrust

Essentially, you want your events to break at a newline followed by a timestamp?

What do you currently have configured?


Thulasinathan_M
Contributor

Hi @ITWhisperer,

The current configuration is below:

CHARSET=UTF-8
SHOULD_LINEMERGE=true
disabled=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true

 

inventsekar
SplunkTrust

You have a single file, but you want to apply more than one line-breaking rule within that file. Is that right?

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !

Thulasinathan_M
Contributor

Hi @inventsekar 
Yes, that's correct! I want to break every line into a single-line event, but if any logs meet the condition described in the post, I want to wrap them as one multi-line event. This should apply to all files under my sourcetype.

inventsekar
SplunkTrust

Please check this: https://community.splunk.com/t5/Getting-Data-In/Line-break-with-multiple-Linebreaker/m-p/400335

The values for BREAK_ONLY_BEFORE, MUST_NOT_BREAK_AFTER and MUST_BREAK_AFTER should be adjusted to fit your requirement.

A rough example:

BREAK_ONLY_BEFORE=^Exception|^java\.io\.FileNotFoundException|^WARN
MUST_NOT_BREAK_AFTER=something here
MUST_BREAK_AFTER=something here

 

thanks and best regards,
Sekar


Thulasinathan_M
Contributor

Thanks for the pointers. I've come up with the config below, but it's still not working and it looks a bit messy. What I've done is:
LINE_BREAKER: Break events on the timestamp / the line before the first line matching 'at' / the last line matching 'at'
BREAK_ONLY_BEFORE: Timestamp / the line before the first line matching 'at'
MUST_NOT_BREAK_AFTER: The line before the first line matching 'at'
MUST_BREAK_AFTER: The last line matching 'at'
Any suggestions on what I've done wrong, please?

 

SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)(?:(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\])|(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)|(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)
NO_BINARY_CHECK=true
MUST_NOT_BREAK_AFTER=^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)
BREAK_ONLY_BEFORE=^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\])|^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)
MUST_BREAK_AFTER=^(\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?=at).*)\n(?:\[\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)\]\s+(?!at).*)

 

isoutamo
SplunkTrust
Have you tried my example? Based on your sample events, it works in my test environment.

Thulasinathan_M
Contributor

Thanks @isoutamo. It works as expected with the sample inputs, but I can't rely on the pattern below, because that part of the line could be anything. The only thing I can be sure of is 'at' followed by the path to the file (stack traces).

([\w\.]+:|Exception)
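
One possible direction, given that only the 'at' continuation lines are predictable, is to invert the match: break before every timestamp unless it is followed by whitespace and 'at'. The sketch below emulates LINE_BREAKER semantics in Python (break at the first capture group and discard the text it matched) so a candidate regex can be tried offline. `split_events`, `CANDIDATE` and `SAMPLE` are hypothetical names, the regex has not been tested in Splunk, and a normal message that happens to begin with "at " would be glued to the previous event.

```python
import re

# Candidate LINE_BREAKER (an assumption, not from the thread): break before a
# timestamp unless the text after it starts with "at" plus whitespace.
CANDIDATE = r"([\r\n]+)\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\]\s+(?!at\s)"

def split_events(raw, line_breaker):
    """Split raw text the way LINE_BREAKER does: capture group 1 is the discarded delimiter."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:].rstrip("\r\n"))
    return [e for e in events if e]

# Shortened copy of the sample logs from the original post.
SAMPLE = (
    "[(2023-08-03 10:00:03)] INFO: Request completed successfully.\n"
    '[(2023-08-03 10:00:02)] Exception in thread "main" java.lang.NullPointerException\n'
    "\tat com.example.MyClass.method1(MyClass.java:12)\n"
    "\tat com.example.Main.main(Main.java:23)\n"
    "[(2023-08-03 10:00:03)] WARN: Request failed unsuccessfully. "
    "java.lang.IllegalArgumentException: Invalid input: negative value not allowed\n"
    "[(2023-08-03 10:00:02)]\tat com.example.MathUtils.squareRoot(MathUtils.java:42)\n"
    "[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:33)\n"
    "[(2023-08-03 10:00:02)] ERROR: Failed to fetch data from the database.\n"
)

events = split_events(SAMPLE, CANDIDATE)
for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event)
```

On this sample the split yields four events, with both stack traces (with and without the timestamp prefix on the 'at' lines) attached to their header lines.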

isoutamo
SplunkTrust

OK. I just noticed that your log events contain <Log level>: or <java class>: or the Exception keyword, and any of these marks the start of a new individual event.

You could try asking for the right regex for this in the #regex channel on Splunk Slack: https://splunk-usergroups.slack.com/archives/C3WFE5V5G

 

isoutamo
SplunkTrust

Hi

Based on your example, you could try something like this. If/when needed, add to LINE_BREAKER any further keywords that start a new event.

 

[<your sourcetype>]
SHOULD_LINEMERGE=true
LINE_BREAKER=([\r\n]+)\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\]\s([\w\.]+:|Exception)
NO_BINARY_CHECK=true
TIME_FORMAT=%F %T
TIME_PREFIX=^\[\(
MAX_TIMESTAMP_LOOKAHEAD=20

 

Usually it's better to avoid BREAK_ONLY_BEFORE, MUST_NOT_BREAK_AFTER and MUST_BREAK_AFTER if you can. They work, but they use more resources than a LINE_BREAKER definition alone.
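
To check a LINE_BREAKER like the one in the config above without indexing anything, a small offline emulation can help. The sketch below is Python rather than Splunk itself: `split_events` is a hypothetical helper that breaks at the first capture group and discards the text it matched, which is how LINE_BREAKER is documented to behave. `SAMPLE` is a shortened copy of the logs from the question, plus one made-up line (labelled in the code) to show what happens when a header has no "<word>:" or "Exception" prefix.

```python
import re

# LINE_BREAKER from the config above, unchanged.
LINE_BREAKER = r"([\r\n]+)\[\(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\)\]\s([\w\.]+:|Exception)"

def split_events(raw, line_breaker):
    """Emulate LINE_BREAKER: capture group 1 is the delimiter and is dropped."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:].rstrip("\r\n"))
    return [e for e in events if e]

SAMPLE = (
    "[(2023-08-03 10:00:03)] INFO: Request completed successfully.\n"
    "[(2023-08-03 10:00:02)] java.io.FileNotFoundException: file.txt (No such file or directory)\n"
    "[(2023-08-03 10:00:02)]\tat java.base/java.io.FileInputStream.open0(Native Method)\n"
    "[(2023-08-03 10:00:02)]\tat com.example.Main.main(Main.java:12)\n"
    "[(2023-08-03 10:00:04)] DEBUG: Processing request: /api/v1/data?id=67890\n"
    # Hypothetical line, not in the original logs: no "<word>:" or "Exception" prefix.
    "[(2023-08-03 10:00:05)] Shutdown complete\n"
)

events = split_events(SAMPLE, LINE_BREAKER)
for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event)
```

On this sample the stack trace stays attached to its header, but the made-up "Shutdown complete" line is merged into the DEBUG event before it, which is the case where extra keywords would need to be added to LINE_BREAKER.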

r. Ismo
