Dealing with OMG huge events

mikelanghorst
Motivator

I've got a few log4j application logs that can get extremely long when my developers decide to dump message payloads into the log. Similar to a large stack trace, each one is a multi-line event that can run to several hundred lines (at least that; I haven't counted exact numbers yet). I've had issues with these events being broken into multiple events when they shouldn't be. I've set BREAK_ONLY_BEFORE and MAX_EVENTS = 3000, but I'm still seeing the events broken up.

Realistically, at some point the extra data in this event is simply unnecessary and shouldn't even be included in INFO-level messages. Is there a way to simply drop the data after a certain length?

landen99
Motivator

Try this at search time:

| rex mode=sed "s/([\r\n].{1,100}).*/\1/g"

jspears
Communicator

For some kinds of events, you can readily determine what you want to throw away and do that with SEDCMD. Here's mine for Windows events that include egregious amounts of information after the actual event data:

SEDCMD-windows = s/This event is generated.+$//g

TRUNCATE sounds like an excellent option if you know a specific length where you want to start throwing out irrelevant data.
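For the log4j case in the question, the same SEDCMD idea could be sketched as follows. This is only an illustration under assumptions: the stanza name assumes the sourcetype is log4j, the class name `drop_payload` is arbitrary, and the literal marker `Payload:` is hypothetical and would need to be replaced with whatever text actually precedes the dumped message body.

```ini
# props.conf on the indexer or heavy forwarder (sketch, not a tested config)
[log4j]
# Hypothetical marker: replace "Payload:" with the text that really
# introduces the dumped payload in your logs.
# [\s\S]* matches across newlines, so everything from the marker to the
# end of the event is replaced with a short placeholder.
SEDCMD-drop_payload = s/Payload:[\s\S]*$/Payload: [removed]/
```

Note that this only trims what gets stored in each event. It does not change how the events are broken in the first place.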

kbains
Splunk Employee

If you really REALLY want it as one event, you can use TRUNCATE=0. Just be warned that pulling back lots (>100,000) of large events (>1MB) will cause your browser to use a lot of memory and may even cause it to crash.
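A minimal sketch of that setting, assuming the stanza is keyed on the log4j sourcetype from the question:

```ini
# props.conf (sketch; stanza name is an assumption)
[log4j]
# 0 disables truncation, so a line is never cut off at a byte limit.
# The default limit is 10000 bytes.
TRUNCATE = 0
```

If line merging is still enabled (SHOULD_LINEMERGE = true), MAX_EVENTS continues to cap how many lines are merged into one event (256 by default), so TRUNCATE alone may not be enough to keep a very long event together.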

mikelanghorst
Motivator

Hmm, I'd been concerned with the performance hit on the server, not even thinking about the client.

There isn't a good break point in the data; any break would just leave a fragment of the data orphaned.

For this data, default handling works great for sourcetype=log4j 99% of the time. It's just that, on error, they decided to log the data to the standard server.log file, when it should really be dumped elsewhere or not logged at all.

mikelanghorst
Motivator

Talking with dwaddle in #splunk, he's suggested using LINE_BREAKER, and increasing TRUNCATE to a large number rather than BREAK_ONLY_BEFORE and MAX_EVENTS. Gonna give that a shot instead, but still wondering if/how other users are dealing with really large event sets such as this.
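Roughly, that suggestion could look like the sketch below. The timestamp pattern is an assumption (it presumes every log4j event starts with something like `2010-06-01 12:34:56`), and the TRUNCATE value is just an example of "a large number", not a recommendation.

```ini
# props.conf (sketch under the assumptions stated above)
[log4j]
# Let LINE_BREAKER alone decide event boundaries; no line merging,
# so BREAK_ONLY_BEFORE and MAX_EVENTS no longer apply.
SHOULD_LINEMERGE = false
# Break only where newlines are followed by a timestamp. The text matched
# by the first capture group (the newlines) is discarded; the timestamp
# stays at the start of the next event.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
# Upper bound in bytes for each event produced by LINE_BREAKER.
TRUNCATE = 1000000
```

One thing to watch with this approach: any line inside the payload that happens to start with a timestamp in the same format would still be treated as the start of a new event.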

mikelanghorst
Motivator

Works better, but I'm still getting messages broken where they shouldn't be.

mikelanghorst
Motivator

Quick check of the current offender, event count is about 30-35k lines...

What type of impact would setting MAX_EVENTS to like 40000 have on the indexers?
