Getting Data In

How do I edit my props.conf for proper line breaking of my sample multiline event data?

michael_sleep
Communicator

Hello once again. Working with a distributed environment (Universal Forwarder > Heavy Forwarder > Indexer), I have a particular log file that writes a timestamp on every line, but the "event" itself is actually a group of these lines, delimited by a specific header line. Here is what a couple of events in the log file look like:

[2016-04-11T01:00:10][D00034][T00041][Info] ------------------[7.3.1.18]-----------------------
[2016-04-11T01:00:10][D00034][T00041][Info] Processing...
[2016-04-11T01:00:10][D00034][T00041][Info] Retries disabled
[2016-04-11T01:00:10][D00034][T00041][Info] Company: xxxxxxxxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] Filename: xxxxxxxxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] RenderFormat: CSV
[2016-04-11T01:00:10][D00034][T00041][Info] domain_id: 2
[2016-04-11T01:00:10][D00034][T00041][Info] user_id: 27090
[2016-04-11T01:00:10][D00034][T00041][Info] template_id: 2100
[2016-04-11T01:00:10][D00034][T00041][Info] rdl: /xxxxxx/xxxxxxxxxxxx/xxxxxxxxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] schedule_id: 1096
[2016-04-11T01:00:10][D00034][T00041][Info] lang_id: en_US
[2016-04-11T01:00:10][D00034][T00041][Info] result_name: xxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] result_description: xxxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] queued_result_id: 0
[2016-04-11T01:00:10][D00034][T00041][Info] metrics_smdb_key: 0
[2016-04-11T01:00:10][D00034][T00041][Info] schedule_type: 0
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded. Result ID=18226
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx is: disabled
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : Rendering Report (CSV)
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : Rendering Complete
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded (1 rows)
[2016-04-11T01:06:10][D00034][T00041][Info] Post processing CSV...
[2016-04-11T01:06:10][D00034][T00041][Info] templateName=xxxxxxxxxxxxr
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded (16 rows)
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded (63 rows)
[2016-04-11T01:06:10][D00034][T00041][Info] Metadata updated with user stat display names: 16
[2016-04-11T01:06:10][D00034][T00041][Info] Collating 'required' fields
[2016-04-11T01:06:10][D00034][T00041][Info] Collating 20 custom fields
[2016-04-11T01:06:10][D00034][T00041][Info] Creating processed csv stream (Unicode (UTF-8))
[2016-04-11T01:06:10][D00034][T00041][Info] CSV Processing Complete
[2016-04-11T01:06:11][D00034][T00041][Info] checkDirectoryExists: Full path found (\\xxxxxxxxxxxx\)
[2016-04-11T01:06:11][D00034][T00041][Info] createUniqueFile: Created file: xxxxxxxxxxxx.csv
[2016-04-11T01:06:11][D00034][T00041][Info] xxxxxxxxxxxx: Report created: \\xxxxxxxxxxxx\xxxxxxxxxxxx\xxxxxxxxxxxx.csv
[2016-04-11T01:06:11][D00034][T00041][Info] xxxxxxxxxxxx: File size: 317 bytes
[2016-04-11T01:06:11][D00034][T00041][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:06:11][D00034][T00041][Info] DeliverToEmail: Email sent
[2016-04-11T01:06:11][D00034][T00041][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:06:11][D00034][T00041][Info] Completed
[2016-04-11T01:06:11][D00034][T00041][Info] Exiting: true
[2016-04-11T01:46:08][D00034][T00038][Info] ------------------[7.3.1.18]-----------------------
[2016-04-11T01:46:08][D00034][T00038][Info] Processing...
[2016-04-11T01:46:08][D00034][T00038][Info] Retries disabled
[2016-04-11T01:46:08][D00034][T00038][Info] Company: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] Filename: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] RenderFormat: CSV
[2016-04-11T01:46:08][D00034][T00038][Info] domain_id: 2
[2016-04-11T01:46:08][D00034][T00038][Info] user_id: 3599
[2016-04-11T01:46:08][D00034][T00038][Info] template_id: 5051
[2016-04-11T01:46:08][D00034][T00038][Info] rdl: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] schedule_id: 4211
[2016-04-11T01:46:08][D00034][T00038][Info] lang_id: en_US
[2016-04-11T01:46:08][D00034][T00038][Info] result_name: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] result_description: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] queued_result_id: 5541
[2016-04-11T01:46:08][D00034][T00038][Info] metrics_smdb_key: 0
[2016-04-11T01:46:08][D00034][T00038][Info] schedule_type: 3
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx is: disabled
[2016-04-11T01:46:08][D00034][T00038][Info] Verifying Queued result exists: ResultID=5541
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx : Rendering Report (CSV)
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : Rendering Complete
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : succeeded (1 rows)
[2016-04-11T01:46:36][D00034][T00038][Info] Post processing CSV...
[2016-04-11T01:46:36][D00034][T00038][Info] templateName=xxxxxxxxxxxx
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : succeeded (0 rows)
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : succeeded (14 rows)
[2016-04-11T01:46:36][D00034][T00038][Info] User stat fields not required
[2016-04-11T01:46:36][D00034][T00038][Info] Collating 'required' fields
[2016-04-11T01:46:36][D00034][T00038][Info] Collating 9 custom fields
[2016-04-11T01:46:36][D00034][T00038][Info] Creating processed csv stream (Unicode (UTF-8))
[2016-04-11T01:46:36][D00034][T00038][Info] CSV Processing Complete
[2016-04-11T01:46:36][D00034][T00038][Info] checkDirectoryExists: Full path found (\\xxxxxxxxxxxx\xxxxxxxxxxxx\)
[2016-04-11T01:46:36][D00034][T00038][Info] createUniqueFile: Created file: xxxxxxxxxxxx.csv
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx: Report created: \\xxxxxxxxxxxx\xxxxxxxxxxxx.csv
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx: File size: 14,833,025 bytes
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:46:36][D00034][T00038][Info] Completed
[2016-04-11T01:46:36][D00034][T00038][Info] Exiting: true

The start of every event should be this line:

[2016-04-11T01:00:10][D00034][T00041][Info] ------------------[7.3.1.18]-----------------------

To accomplish this goal, I've set the following configuration for the sourcetype on the heavy forwarder in props.conf:

SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE=^(\[\d+\-\d+\-\d+\w+:\d+:\d+\]\[\w+\]\[\w+]\[\w+\].\-{18}\[.*)
BREAK_ONLY_BEFORE_DATE=false
pulldown_type=1
category=Custom
MAX_DIFF_SECS_HENCE=2147483646
MAX_DAYS_HENCE=7
MAX_EVENTS=1024
TRUNCATE=10000
MAX_TIMESTAMP_LOOKAHEAD=22
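
As a quick sanity check outside of Splunk, the BREAK_ONLY_BEFORE pattern can be exercised against the two kinds of lines with an ordinary regex engine. A minimal Python sketch (assuming Python's re behaves like Splunk's PCRE for this simple pattern):

import re

# BREAK_ONLY_BEFORE pattern copied verbatim from the props.conf above
pattern = re.compile(r'^(\[\d+\-\d+\-\d+\w+:\d+:\d+\]\[\w+\]\[\w+]\[\w+\].\-{18}\[.*)')

header = '[2016-04-11T01:00:10][D00034][T00041][Info] ------------------[7.3.1.18]-----------------------'
detail = '[2016-04-11T01:00:10][D00034][T00041][Info] Processing...'

print(bool(pattern.match(header)))  # True  - only the delimiter line matches
print(bool(pattern.match(detail)))  # False - ordinary log lines do not match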

This is only half working... it does in fact always break on this line:

[2016-04-11T01:00:10][D00034][T00041][Info] ------------------[7.3.1.18]-----------------------

But it is also breaking on -these- lines:

[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : Rendering Report (CSV)
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : Rendering Complete

If the timestamps are close enough together, it will not break. If they are far enough apart in time, it will break, regardless of whether the line matches the BREAK_ONLY_BEFORE regex. I ran a props list through btool on the heavy forwarder (a sample invocation is shown after the output below)... these are the settings being picked up:

ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE = ^(\[\d+\-\d+\-\d+\w+:\d+:\d+\]\[\w+\]\[\w+]\[\w+\].\-{18}\[.*)
BREAK_ONLY_BEFORE_DATE = false
CHARSET = AUTO
DATETIME_CONFIG = \etc\datetime.xml
HEADER_MODE = 
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 7
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 2147483646
MAX_EVENTS = 1024
MAX_TIMESTAMP_LOOKAHEAD = 22
MUST_BREAK_AFTER = 
MUST_NOT_BREAK_AFTER = 
MUST_NOT_BREAK_BEFORE = 
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = true
TRANSFORMS = 
TRUNCATE = 100000
category = Custom
detect_trailing_nulls = auto
maxDist = 100
priority = 
pulldown_type = 1
sourcetype = 
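
For reference, the btool invocation looks roughly like this, run from $SPLUNK_HOME/bin on the heavy forwarder (the sourcetype name is just a placeholder; --debug also prints which .conf file each setting comes from):

./splunk btool props list my_custom_sourcetype --debug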

Any idea why these events are still breaking in the middle?


michael_sleep
Communicator

I just had a thought that maybe someone would like to weigh in on: no matter how I configure props.conf, the events, for the most part, seem to always break the same way, and I can't find any rhyme or reason to it. I looked at my regex, thinking maybe it was too greedy, so I changed it to something more limited:

(\[\d+\-\d+\-\d+\w+:\d+:\d+\]\[\w+\]\[\w+]\[\w+\].\-{18})

I've even taken the regex out and changed several other settings... while this affects things in minor ways, the breaking works more or less the same. Thinking it was timestamp related, I set DATETIME_CONFIG to NONE... and it still doesn't work (though I can see that setting offsets the data from Splunk's index time by a few seconds).

How is this for a theory: it occurred to me that maybe it's not the timestamps themselves causing the issue; maybe the timestamps just indicate when the file is being opened and closed by the app writing the log. What if the app is effectively writing an end-of-file every time it appends a block of log data? How would the Splunk heavy forwarder handle that? How would it join these events if it keeps seeing an end of file on the log? Just a theory... maybe this is getting broken in some way that isn't related to the property settings at all.


somesoni2
Revered Legend

Give this a try (props.conf on the heavy forwarder):

[yoursourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)\[\d+\-\d+\-\d+\w+:\d+:\d+\]\[\w+\]\[\w+]\[\w+\].-{18}
TIME_PREFIX=^\[
TIME_FORMAT=%Y-%m-%dT%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD=19
pulldown_type=1
category=Custom
MAX_DIFF_SECS_HENCE=2147483646
MAX_DAYS_HENCE=7
MAX_EVENTS=1024
TRUNCATE=10000
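
To illustrate what the LINE_BREAKER line does: Splunk breaks the raw stream wherever the pattern matches and treats the text captured in the first group (the newlines) as the boundary, so each new event starts at a delimiter line. A minimal sketch using Python's re module rather than Splunk itself, assuming sample.log holds the sample data from the question:

import re

# LINE_BREAKER pattern from the stanza above; the first capture group
# (the newlines) marks where one event ends and the next begins.
breaker = re.compile(r'([\r\n]+)\[\d+\-\d+\-\d+\w+:\d+:\d+\]\[\w+\]\[\w+]\[\w+\].\-{18}')

raw = open('sample.log').read()   # hypothetical file containing the sample events

events, start = [], 0
for m in breaker.finditer(raw):
    events.append(raw[start:m.start(1)])  # event ends just before the boundary newlines
    start = m.end(1)                      # next event begins at the delimiter line
events.append(raw[start:])

print(len(events))  # expect 2 for the sample above, one per header line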

michael_sleep
Communicator

This didn't work, unfortunately... the events still break as shown in the attached screenshot.

Something that might be interesting, though: I don't have this problem if I manually upload the log data onto the indexer and test it with the same parameters I've been using. There it breaks the events correctly... so maybe this is some kind of bug?
