Getting Data In

Inconsistent linebreaker behavior

danillopavan
Communicator

Hello all,

I have configured the props file to NOT break the event when it encounters a new line containing a date. However, sometimes the event is broken at the line containing the date and sometimes it is not. I don't understand the reason for the different behaviors.

Props file:
SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE_DATE=false
BREAK_ONLY_BEFORE=SOMEJUNK
MAX_TIMESTAMP_LOOKAHEAD=450
TIME_PREFIX==\s+\w{3}
TIME_FORMAT=%m/%d/%y %H:%M:%S %Z
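As a quick sanity check of the TIME_PREFIX/TIME_FORMAT pair against the date line in the log below, here is a rough Python sketch (an approximation only; note Python's strptime rejects zone names like BRST that Splunk's timestamp parser accepts, so %Z is left off):

```python
import re
from datetime import datetime

sample = "= Tue 01/02/18 15:50:05 BRST"

# TIME_PREFIX (=\s+\w{3}) skips "= Tue", so the timestamp parse
# starts at "01/02/18 ..."
prefix = re.match(r"=\s+\w{3}\s*", sample)
ts_text = sample[prefix.end():]          # "01/02/18 15:50:05 BRST"

# TIME_FORMAT without %Z ("BRST" is not portable in Python's strptime)
ts = datetime.strptime(ts_text[:17], "%m/%d/%y %H:%M:%S")
print(ts)  # 2018-01-02 15:50:05
```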

File that is being read:

= JOB : R3BRP#DECOUPLE_NFE[(0006 01/02/18),(0AAAAAAAAAAIO5BE)].CL_S09_IFIPD_DECOUPLE_NFE_R3BRP_01
= USER : tws 631/S/ATHOCO/IBM/AUTOMATION_COORD_HORTOLANDIA/
= JCLFILE : / -job IFIPD_DECOUPLE_NFE -user FF_PRO1 -i 23154800 -c a
= Job Number: 43977410
= Tue 01/02/18 15:50:05 BRST
*
*** WARNING 914 *** EEWO0914W An internal error has occurred. Either the joblog or the job protocol for the following job does not exist:
*** WARNING 904 *** EEWO0904W The program could not copy the joblog to stdout.
*** WARNING 914 *** EEWO0914W An internal error has occurred. Either the joblog or the job protocol for the following job does not exist:
= Exit Status : 0
= System Time (Seconds) : 0 Elapsed Time (Minutes) : 0
= User Time (Seconds) : 0
= Tue 01/02/18 15:50:39 BRST

Sometimes I get the multiline event containing all 12 lines, but sometimes the event is truncated like the sample below:

= JOB : R3BRP#DECOUPLE_NFE[(0006 01/02/18),(0AAAAAAAAAAIO5BE)].CL_S09_IFIPD_DECOUPLE_NFE_R3BRP_01
= USER : tws 631/S/*ATHOCO/IBM/AUTOMATION_COORD_HORTOLANDIA/
= JCLFILE : / -job IFIPD_DECOUPLE_NFE -user FF_PRO1 -i 23154800 -c a
= Job Number: 35391514
= Tue 01/02/18 15:51:10 BRST

I need the entire log text indexed (all 12 lines), not only the 5 lines above. I don't know why the event is sometimes broken at the date line.
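For what it's worth, whenever Splunk falls back to date-driven event breaking, a block like the one above would split before every line whose timestamp it recognises. A rough Python sketch of that effect (log lines abbreviated; this approximates the merge logic, it is not Splunk's actual aggregator):

```python
import re

# The 12-line sample, abbreviated to its structure
lines = [
    "= JOB : R3BRP#DECOUPLE_NFE[(0006 01/02/18),...]",
    "= USER : tws 631/...",
    "= JCLFILE : / -job IFIPD_DECOUPLE_NFE ...",
    "= Job Number: 43977410",
    "= Tue 01/02/18 15:50:05 BRST",
    "*** WARNING 904 *** EEWO0904W ...",
    "= Exit Status : 0",
    "= Tue 01/02/18 15:50:39 BRST",
]

# A new event starts before every line matching the date pattern
date_line = re.compile(r"=\s\w{3}\s\d{2}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}")
events, current = [], []
for line in lines:
    if date_line.search(line) and current:
        events.append(current)
        current = []
    current.append(line)
events.append(current)
print(len(events))  # 3 events instead of the single intended event
```

That would produce exactly the kind of short, date-terminated fragments shown in the truncated sample.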

Thanks and regards,
Danillo Pavan


danillopavan
Communicator

Hello alemarzu,

Actually the props file was edited in my last post. The correct SEDCMD entries are:
"SEDCMD-applychange01=s/[\r\n]\s*[A-z]+.+//g"
"SEDCMD-applychange02=s/(\*\*+.)//g"
"SEDCMD-applychange04=s/(\+\+\+.)//g"
"SEDCMD-applychange05=s/(==+[\r\n]*)//g"

The "\" was missing in the last post.

About the debug command, I executed it and got the result below:

ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE = (= User Time (Seconds) : \d+\n= \w{3} \d{2}\/\d{2}\/\d{2}\d{2}:\d{2}:\d{2} [A-Z]+)
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG =
EXTRACT-endTimeExecution = ^=\s+\w+\s+:\s+\w+\d+\w+#\w+\w+[(\d+\s+\d+/\d+/\d+),(\d+\w+)].\w+\w+\d+\w+\w+\w+\w+\d+\w+\d+\s+=\s+\w+\s+:\s+\w+\s+\d+/\w+/*\w+/\w+/\w+\w+\w+/\s+=\s+\w+\s+:\s+/\s+-\w+\s+\w+\w+\w+\s+-\w+\s+\w+\w+\d+\s+-\w+\s+\d+\s+-\w+\s+\w+\s+=\s+\w+\s+\w+:\s+\d+\s+=\s+\w+\s+\d+/\d+/\d+\s+\d+:\d+:\d+\s+\w+\s+=\s+\w+\s+\w+\s+:\s+\d+\s+=\s+\w+\s+\w+\s+(\w+)\s+:\s+\d+\s+\w+\s+\w+\s+(\w+)\s+:\s+\d+\s+=\s+\w+\s+\w+\s+(\w+)\s+:\s+\d+\s+=\s+\w+\s+\d+/\d+/\d+\s+(?P[^ ]+)
EXTRACT-jobName = ^=\s+\w+\s+:\s+\w+\d+\w+#\w+\w+[(\d+\s+\d+/\d+/\d+),(\d+\w+)].\w+\w+\d+\w+\w+\w+\w+\d+\w+\d+\s+=\s+\w+\s+:\s+\w+\s+\d+/\w+/*\w+/\w+/\w+\w+\w+/\s+=\s+\w+\s+:\s+/\s+-\w+\s+(?P[^ ]+)
EXTRACT-jobNumber = ^=\s+\w+\s+:\s+\w+\d+\w+#\w+\w+[(\d+\s+\d+/\d+/\d+),(\d+\w+)].\w+\w+\d+\w+\w+\w+\w+\d+\w+\d+\s+=\s+\w+\s+:\s+\w+\s+\d+/\w+/*\w+/\w+/\w+\w+\w+/\s+=\s+\w+\s+:\s+/\s+-\w+\s+\w+\w+\w+\s+-\w+\s+\w+_\w+\d+\s+-\w+\s+\d+\s+-\w+\s+\w+\s+=\s+\w+\s+\w+:\s+(?P\d+)
HEADER_MODE =
LEARN_SOURCETYPE = true
LINE_BREAKER = (= User Time (Seconds) : \d+\n= \w{3} \d{2}\/\d{2}\/\d{2}\d{2}:\d{2}:\d{2} [A-Z]+)
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
NO_BINARY_CHECK = true
SEDCMD-applychange01 = s/[\r\n]\s*[A-z]+.+//g
SEDCMD-applychange02 = s/(\*\*+.)//g
SEDCMD-applychange04 = s/(\+\+\+.)//g
SEDCMD-applychange05 = s/(==+[\r\n]*)//g
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = false
TIME_FORMAT = %m/%d/%y %H:%M:%S %Z
TIME_PREFIX = = [A-Z][a-z]+\s
TRANSFORMS =
TRANSFORMS-set = setNullJob,setParsingJob
TRUNCATE = 1000000
detect_trailing_nulls = false
disabled = false
maxDist = 100
priority =
pulldown_type = true
sourcetype =


gjanders
SplunkTrust

Perhaps you could try:

SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 40
TIME_PREFIX=\=\s+\w{3}
TIME_FORMAT=%m/%d/%y %H:%M:%S %Z
LINE_BREAKER = (\n)NOBREAKINGPLEASE

This assumes the file is always one 12-line event, because the above will never create a break.
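A rough Python sketch of why that LINE_BREAKER never breaks (an approximation; in Splunk the capture group marks the text discarded at each break):

```python
import re

# Three lines of the sample job log (abbreviated)
raw = ("= JOB : R3BRP#DECOUPLE_NFE[...]\n"
       "= Job Number: 43977410\n"
       "= Tue 01/02/18 15:50:39 BRST\n")

# The stream only splits where the whole regex matches. Since
# "NOBREAKINGPLEASE" never appears after a newline, no break point
# ever matches and the file stays a single event.
breaker = re.compile(r"(\n)NOBREAKINGPLEASE")
events = breaker.split(raw)
print(len(events))  # 1
```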

skoelpin
SplunkTrust

@garethatiag is 100% correct. Once these base configs are applied, it will work correctly.

I try to stay away from the UI onboarding option and just edit props.conf directly.


danillopavan
Communicator

I tried this config and, as I said before, it works; however, some events are still truncated at the date line. If you execute this configuration via Data preview it works, but for some unknown reason the event is truncated sometimes (~20% of the time).


gjanders
SplunkTrust

Can you update your question or post the output of splunk btool props list --debug?
Perhaps also include the transforms.conf, as everyone is just guessing your full configuration at this point...
In one post you mentioned a SEDCMD?


danillopavan
Communicator

Hello garethatiag, I posted the full log file, props file, and transforms file in some posts below yesterday.

How can I execute this debug command with btool props? I really don't know...


danillopavan
Communicator

Hello garethatiag,

I have included this one also. It seems to have decreased the number of times the event is truncated, however it is still happening. Some events (around 20% of the total) are still being truncated at the date line.


danillopavan
Communicator

Maybe this is important information. When I use the Data preview tool with the same log file and sourcetype, it shows only the correct lines (without truncating them), however it shows a WARN message:
"Could not use strptime to parse timestamp from ": R3BRP#DECOUPLE_NFE[(0006 01/02/".
"Failed to parse timestamp. Defaulting to file modtime."


gjanders
SplunkTrust

If you're using LINE_BREAKER, then the TRUNCATE setting applies based on the amount of data, so you could increase it to avoid truncation; the splunkd log file should have a WARN or ERROR around the time of the issue if this is the case.

If you're using BREAK_ONLY_BEFORE_DATE (the default), then the parsing of the date matters.
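As a rough illustration of the TRUNCATE effect (a simplified model only; Splunk counts bytes rather than characters and logs a WARN when it truncates):

```python
def truncate_event(event: str, truncate: int) -> str:
    """Cap an event at `truncate` characters; 0 means no limit
    (simplified model of the props.conf TRUNCATE setting)."""
    if truncate == 0:
        return event
    return event[:truncate]

print(truncate_event("= JOB : DECOUPLE_NFE", 5))  # "= JOB"
```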


lmaclean
Path Finder

Look within the _internal index for the answers. To get at the issue faster, use the searches below.

These entries are the ones related to TIME_FORMAT or LINE_BREAKER errors:

index=_internal source=*splunkd.log component=DataParserVerbose WARN OR ERROR

For some related to Line Breaking issues:

index=_internal source=*splunkd.log component=LineBreakingProcessor WARN OR ERROR

These will be related to MAX_EVENTS (256 lines by default) & TRUNCATE (10,000 bytes by default), which are two of the top causes, but there are many others...

A good 2016 Splunk .conf preso (also one in '13 & '14) is "Jiffy Lube quick tune-up for your Splunk environment":

https://conf.splunk.com/files/2016/slides/jiffy-lube-quick-tune-up-for-your-splunk-environment.pdf


danillopavan
Communicator

Hello lmaclean, I executed both queries (for the components DataParserVerbose and LineBreakingProcessor), but didn't find anything.

For the search: index=_internal source=*splunkd.log component=LineBreakingProcessor

I just found some ERROR entries related to the BREAK_ONLY_BEFORE property that I had configured to read the entire file, but that happened just a few days ago - now I don't have any entries for this search.

"LineBreakingProcessor - Line breaking regex has no capturing groups: somethingjunk"

Executing the below query didn't return any entries:

index=_internal source=*splunkd.log component=DataParserVerbose

The problem is that it is so intermittent: sometimes the entire file is indexed correctly and sometimes it is truncated at a specific line containing a date. I have already used SEDCMD to replace the date format with a string, but even with this replacement the file is sometimes truncated. So strange...


petercow
Path Finder

Check the _internal index for sourcetype "splunkd" where you're indexing. Look for 'ERROR' or 'WARN' for that sourcetype.


danillopavan
Communicator

I have included the property "TRUNCATE = 0" in the props file and it still doesn't work. Sometimes the file is truncated.


skoelpin
SplunkTrust

Don't do this... TRUNCATE = 0 removes the limit entirely, so a runaway event can grow without bound.


danillopavan
Communicator

Hello petercow, I have executed the below query:

index=_internal source=*splunkd.log component=LineBreakingProcessor

and just found some ERROR entries related to the BREAK_ONLY_BEFORE property that I had configured to read the entire file, but that happened just a few days ago - now I don't have any entries for this search.

"LineBreakingProcessor - Line breaking regex has no capturing groups: somethingjunk"

Executing the below query didn't return any entries:

index=_internal source=*splunkd.log component=DataParserVerbose

Please let me know if there is any other search command that I could run to try to find out the reasons...


skoelpin
SplunkTrust

You should use LINE_BREAKER rather than BREAK_ONLY_BEFORE. You should also set SHOULD_LINEMERGE = false


petercow
Path Finder

So are you saying you get the regex error when you get the bad line-breaking?


danillopavan
Communicator

Some days ago, I was getting the error I posted, saying that the word I configured for the "BREAK_ONLY_BEFORE" property was not found. Now I am not facing that issue anymore, but the event still sometimes gets truncated incorrectly, as I posted in the question.
