Splunk Enterprise

Splunk log ingest help, one event is being split up into multiple events

danroberts
Explorer

Hello, 

I have just started to ingest some log files that are separated by lines of dashes (e.g. --------), but for some reason Splunk is splitting each log file into multiple events. Can someone help me figure this out?

example log attached.

My input file is currently set as:

[monitor://C:\ProgramData\XXX\XXX\CaseManagement*.log]
disabled = 0
interval = 60
index = XXXXlogs
sourcetype = jlogs

Do I need a props file and if so what do I put in it?


deepakc
Builder

This one is a bit tricky, but the below should get you started.

Splunk is falling back to automatic line breaking to decide how to split the log into events, because this log isn't a typical one with a date followed by a single line of information (logs come in all shapes and sizes, and ideally you want well-formatted ones). So you have to create custom props.conf and transforms.conf entries.

Create the below props and transforms for the sourcetype; this should at least get you started, and you will have to make tweaks.

It looks like you have redacted some of the lines with XXX..., so you may need to tweak the regex in transforms to match those words, as they look like extra header-type information that you don't want.

The main thing with this kind of log is that it's multi-line, so we need to merge it.

props.conf

[jlogs]
TIME_PREFIX = Job\sCompleted\sat:
TIME_FORMAT = %d/%m/%Y %H:%M:%S
BREAK_ONLY_BEFORE = Job\sCompleted\sat:
MUST_BREAK_AFTER = local\stime([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 25
SHOULD_LINEMERGE = true
TRUNCATE = 5000
NO_BINARY_CHECK = 1
KV_MODE = auto
# Remove unwanted headers or data
TRANSFORMS-null = remove_unwanted_data_from_jlog

transforms.conf

[remove_unwanted_data_from_jlog]
# X+ rather than X*, so the alternation cannot match an empty string
REGEX = ^(?:X+|-+)\s*
DEST_KEY = queue
FORMAT = nullQueue
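If you want to sanity-check the null-queue regex before deploying it, a quick Python sketch shows which lines it would drop. The sample lines below are hypothetical stand-ins for the redacted log, and the pattern requires at least one X so it can't accidentally match ordinary lines:

```python
import re

# Null-queue candidate pattern: lines that are runs of dashes, or the
# redacted XXX placeholders. Adjust for your real, unredacted data.
pattern = re.compile(r"^(?:X+|-+)\s*")

samples = [
    "------------------------------------------------------------",
    "XXXXX redacted header",
    "Job Completed at: 10/05/2024 09:00:06",
]

for line in samples:
    verdict = "DROP" if pattern.match(line) else "KEEP"
    print(verdict, line)
```

Lines matching the pattern would be routed to nullQueue and never indexed, so always test against a representative sample first.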

There's a whole load of settings in the docs to help you understand this config:

https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/Configureeventlinebreaking

 


PickleRick
SplunkTrust

@danroberts Contrary to the popular saying, here a snippet of (properly formatted) text is often worth a thousand pictures. A data sample in text form is definitely easier to deal with than a screenshot.

@deepakc Your general idea is relatively OK, but it's best to avoid line merging whenever possible (it's relatively "heavy" performance-wise). So instead of enabling line merging, it would be better to find some static part that can always be matched as the event boundary.

Also the TRUNCATE setting might be too low.

So the question to @danroberts is where exactly the event starts/ends and how "flexible" the format is, especially regarding the timestamp position.

Also remember that any additional "clearing" (removing the lines of dashes - which may or may not be desirable; in some cases we want to preserve the event in its original form for compliance reasons, regardless of extra license usage) comes after line breaking and timestamp recognition.

Edit: oh, and KV_MODE should rather not be set to auto (even if the data were kv-parseable, it should be set statically to something instead of auto; as a rule of thumb, you should not make Splunk guess).

danroberts
Explorer

Apologies, I have pasted the log below and just changed the words; hopefully this is easier to work with?

The log file starts at "Software Version....." and always ends with the line "software Completed at 10/05/2024 09:00:06 local time" at the bottom.

Software Version 7.0.1890.0 on server.server.net
Entry 6828 starting at 10/05/2024 09:00:01
Starting via software on CustomerDomain
------------------------------------------------------------
Software Version 7.0.1890.0 on sql002
Entry 6828 starting at 10/05/2024 09:00:01
Submitted by software Autosubmit at 10/05/2024 08:00:04
Executing as company\account
Starting via software on CustomerDomain
Process ID XXXXX
------------------------------------------------------------
Activity: Preparing modules for first use. Current Operation: Status Description:

Name Used (GB) Free (GB) Provider Root CurrentLocation
---- --------- --------- -------- ---- ---------------
JD software company.company.net
2024-05-10T09:00:05.000Z | INFO | ba9992e7-1681-49b9-b984-711c34f89f4c | SQL002 | file| ICOMcheckfilearrival | Checking for arrival of new file

2024-05-10T09:00:06.000Z | INFO | ba9992e7-1681-49b9-b984-711c34f89f4c | SQL002 | file | ICOMcheckfilearrival | New File has been received.

2024-05-10T09:00:06.000Z | INFO | ba9992e7-1681-49b9-b984-711c34f89f4c | SQL002 | file | ICOMcheckfilearrival | Sync File has been received.

 

------------------------------------------------------------
Job Completed at: 10/05/2024 09:00:06
Elapsed Time: 00:00:04.2499362
Kernel mode CPU Time: 00:00:00.5468750
User mode CPU Time: 00:00:00.9531250
Read operation count: 2185
Write operation count: 73
Other operation count: 15510
Read byte count: 5156432
Write byte count: 1688
Other byte count: 205934
Total page faults: 36072
Total process count: 0
Peak process memory: 78073856
Peak job memory: 85004288
------------------------------------------------------------

------------------------------------------------------------
Final Status Code: 0, Severity: Success
Final Status: The operation completed successfully
------------------------------------------------------------
software Completed at 10/05/2024 09:00:06 local time


PickleRick
SplunkTrust

Hmm... This is just a single event?

You can't use the starting string to break the events, because it appears in the middle of the event as well. So you'd have to go for something like

[jlogs]

#This one assumes that this is _the_ timestamp for the event.
#Otherwise it needs to be changed to match appropriate part of the event
TIME_PREFIX = Entry\s+\d+\s+starting\sat

#Watch out, this might get messy since you don't have timezone info!
TIME_FORMAT = %d/%m/%Y %H:%M:%S

#This needs to be relatively big (might need tweaking) since the timestamp is
#relatively far down the event's contents
MAX_TIMESTAMP_LOOKAHEAD = 200

#Don't merge lines. It's a performance killer
SHOULD_LINEMERGE=false

#Might need increasing if your events get truncated
TRUNCATE = 10000

NO_BINARY_CHECK = 1

#It's not a well-formed known data format
KV_MODE = none

#We know that each event ends with a line saying "software Completed..."
LINE_BREAKER = (?:[\r\n]+)software\sCompleted\sat\s[^\r\n]+\slocal\stime([\r\n]+)

#The same pattern goes into the non-intuitively named EVENT_BREAKER, because
#you want the UFs to split your data into chunks in the proper places
EVENT_BREAKER = (?:[\r\n]+)software\sCompleted\sat\s[^\r\n]+\slocal\stime([\r\n]+)
EVENT_BREAKER_ENABLE=true
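Before pushing this out, you can approximate how the LINE_BREAKER would carve up the log with a short Python sketch. The sample text below is an abridged, hypothetical two-event version of the log above; in Splunk the text captured by the first group is consumed as the break, and everything before it stays with the preceding event:

```python
import re

# Same pattern as the LINE_BREAKER above: the completion line ends an event,
# and the captured trailing newline(s) form the break between events.
breaker = re.compile(
    r"(?:[\r\n]+)software\sCompleted\sat\s[^\r\n]+\slocal\stime([\r\n]+)"
)

sample = (
    "Software Version 7.0.1890.0 on server.server.net\n"
    "Entry 6828 starting at 10/05/2024 09:00:01\n"
    "software Completed at 10/05/2024 09:00:01 local time\n"
    "Software Version 7.0.1890.0 on server.server.net\n"
    "Entry 6829 starting at 10/05/2024 10:00:01\n"
    "software Completed at 10/05/2024 10:00:06 local time\n"
)

events, last = [], 0
for m in breaker.finditer(sample):
    # Event text runs up to the start of the captured newline group,
    # so the "software Completed" line stays with its own event.
    events.append(sample[last:m.start(1)].strip())
    last = m.end(1)

print(len(events))  # prints 2
for ev in events:
    print(ev.splitlines()[0])
```

If the sketch produces one event per "software Completed ... local time" block against a real (unredacted) sample, the pattern is a good candidate for both LINE_BREAKER and EVENT_BREAKER.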

You should put this in props.conf on both your receiving indexer(s)/HF(s) and on your UF ingesting the file.

deepakc
Builder

@PickleRick "Hear, hear!" :-) These options are better. @danroberts, go with PickleRick's config; same results, but his is more efficient.


deepakc
Builder

So try this - think of it as version 1.0; it's a bit of trial and error until you get it right. Tip - it's always good practice to place the data into a test index first to get it all working; once it's good, move to a production index (just change the index in inputs.conf after the testing/props dev work).

props.conf 

[jlogs]
TIME_PREFIX = Entry\s\d+\sstarting\sat
TIME_FORMAT = %d/%m/%Y %H:%M:%S
BREAK_ONLY_BEFORE = ([\r\n]+)\.net
MUST_BREAK_AFTER = local\stime([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 50
SHOULD_LINEMERGE = true
TRUNCATE = 10000
NO_BINARY_CHECK = 1
KV_MODE = auto

#Remove unwanted headers or data
#This config is no longer needed - left for reference
#TRANSFORMS-null = remove_unwanted_data_from_jlog

 

See what this looks like. 
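One more thing worth sanity-checking is TIME_FORMAT: "10/05/2024" is ambiguous between day-first and month-first, and the ISO-style lines in the sample log (2024-05-10T09:00:05) suggest day-first, which is what %d/%m/%Y assumes. A quick Python check of the format string against the sample timestamp:

```python
from datetime import datetime

# Parse the timestamp that follows "Entry 6828 starting at" in the sample,
# using the same format string as TIME_FORMAT in props.conf.
ts = "10/05/2024 09:00:01"
parsed = datetime.strptime(ts, "%d/%m/%Y %H:%M:%S")
print(parsed.isoformat())  # prints 2024-05-10T09:00:01
```

If strptime raises a ValueError or returns the wrong month, the format string needs adjusting before the events get mis-timestamped at index time.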
