Hi to all,
I've got a log file in which there are many XML messages printed.
One single log message is split into many rows (as you can see from the example below), but I have to merge those rows into a single Splunk event.
I'm on a Splunk Enterprise 6.6.2 cluster environment, and these logs come from many Universal Forwarders, which send them to two Heavy Forwarders (HF, version 6.6.1), which in turn send the logs to the indexer cluster (IDX).
I've tried many props.conf configurations on the HF (BREAK_ONLY_BEFORE, MUST_NOT_BREAK_AFTER, DATETIME_CONFIG, etc.) and also on the IDX, but Splunk continues to split the event at the tag "<a:TimestampLastupdate>", because it finds a timestamp there.
== props.conf (on HF and IDX) ==
[my_sourcetype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{4}
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%4N
MAX_TIMESTAMP_LOOKAHEAD = 26
MUST_NOT_BREAK_AFTER = \s*(<a:Timestamp|<Timestamp)
MUST_NOT_BREAK_BEFORE = \s*(<a:Timestamp|<Timestamp)
== log ==
2018-03-22 13:57:23.0119 INFO - Output Message: {
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Header>
<Action a:mustUnderstand="1" xmlns="http://schemas.microsoft.com/ws/2005/05/addressing/none" xmlns:a="http://schemas.xmlsoap.org/soap/envelope/">http://tempuri.org/Service/tag_a</Action>
</s:Header>
<s:Body>
<tag_a xmlns="http://tempuri.org/">
<tag_b xmlns:a="http://schemas.daact.org/2004/07/IService" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:tag_c>false</a:tag_c>
<a:tag_d>true</a:tag_d>
<a:tag_e>
<a:tag_f></a:tag_f>
<a:tag_g>999999999</a:tag_g>
<a:tag_h>ffffffffff</a:tag_h>
<a:tag_i>99</a:tag_i>
<a:tag_l>ffffffffffffffff</a:tag_l>
<a:tag_m>999999999</a:tag_m>
<a:tag_n>fffffff</a:tag_n>
<a:tag_o>9,99</a:tag_o>
<a:tag_p>fffff</a:tag_p>
<a:tag_q>
<a:tag_r>
<a:tag_s>true</a:tag_s>
<a:tag_h>ffffffffff</a:tag_h>
<a:tag_l>ffffffffffffffff</a:tag_l>
<a:tag_t>22/03/2018</a:tag_t>
<a:tag_u>fffff</a:tag_u>
<a:tag_v>9999</a:tag_v>
<a:tag_z>9</a:tag_z>
</a:tag_r>
<a:TimestampLastupdate>2018-02-20T20:31:20.097</a:TimestampLastupdate>
<a:tag_j>ff</a:tag_j>
<a:tag_x>XML</a:tag_x>
</a:tag_q>
</a:tag_e>
<a:IsError>false</a:IsError>
<a:tag_k>
<a:tag_w></a:tag_w>
<a:ErrorDescription></a:ErrorDescription>
</a:tag_k>
</tag_b>
</tag_a>
</s:Body>
</s:Envelope>
}
Have you got any ideas how to fix this behavior?
Also, do I have to configure only HF props.conf or only IDX props.conf or both?
All the configurations defined here are correct.
In my case Splunk does not apply the props.conf settings correctly because the source type that I want to process overrides another one by means of a transform.
Given that Splunk reads props.conf only once, it does not process the settings of the source type that overrides the first one.
Check this case: https://answers.splunk.com/answers/636220/what-are-the-precedence-of-stanza-and-option-in-pr.htm
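A minimal sketch of what this implies for the configuration, assuming that line breaking and timestamp extraction run before the transform rewrites the sourcetype, so those settings have to sit under the original sourcetype ("others_sourcetype" in this thread). The transform name "set_my_sourcetype" and its REGEX are hypothetical placeholders, not taken from the real environment:

```ini
# props.conf on the Heavy Forwarders (the first full Splunk instances that parse the data)
# Event-breaking and timestamp settings go under the ORIGINAL sourcetype,
# on the assumption that they are applied before the sourcetype is rewritten.
[others_sourcetype]
SHOULD_LINEMERGE = false
# Break only in front of a line that starts with "YYYY-MM-DD HH:MM:SS.ffff".
# The ISO timestamp inside the XML uses "T" instead of a space, so it does not match.
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{4}\s)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%4N
MAX_TIMESTAMP_LOOKAHEAD = 26
TRUNCATE = 100000
# Hypothetical transform name; use the one that already exists in your environment.
TRANSFORMS-set_sourcetype = set_my_sourcetype

# transforms.conf on the Heavy Forwarders
[set_my_sourcetype]
# Hypothetical match condition, for illustration only.
REGEX = Output Message
FORMAT = sourcetype::my_sourcetype
DEST_KEY = MetaData:Sourcetype
```

Note that settings placed under the original sourcetype affect every event that arrives with it, not only those that are later renamed. If that is too broad, a [source::...] stanza scoped to the monitored file may be a better fit.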
Give this a try
[my_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d+\s+\w+)
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%4N
MAX_TIMESTAMP_LOOKAHEAD = 26
TRUNCATE = 100000
I added the configuration to the Heavy Forwarder and Indexer props.conf, but Splunk continues to split the event.
In the following screenshots you can see the source event as indexed by Splunk and the search result, which highlights the default field "_indextime".
Thanks
Thanks for the advice, but it does not work for me.
When Splunk monitors that log file, it continues to split the event.
Please check the next post, in which I added the screenshots.
Using the Splunk data input GUI, the props work perfectly, but when Splunk applies the configuration to the file it is monitoring, it does not work correctly.
I also tried this:
[my_sourcetype]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
SEDCMD-blfRemover=s/\x0A//g
SEDCMD-acrRemover=s/\x0D//g
TRUNCATE=100000
LINE_BREAKER=([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d+\s+\w+)
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%4N
MAX_TIMESTAMP_LOOKAHEAD=24
It works only in the data input Web GUI, but not in the runtime environment.
Hope you're putting this configuration on both your HFs and restarting the splunkd instance. Also add the following to props.conf, which I missed earlier.
TIME_PREFIX = ^
I've just found that "my_sourcetype" is generated dynamically by a transform: at first the data has "others_sourcetype" and, after the transform, Splunk overwrites "others_sourcetype" with "my_sourcetype".
I tried the following configurations, but it is still not working:
[my-sourcetype]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
SEDCMD-blfRemover=s/\x0A//g
SEDCMD-acrRemover=s/\x0D//g
TRUNCATE=100000
LINE_BREAKER=([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d+\s+\w+)
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%4N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD=24
[my-sourcetype]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
TRUNCATE=100000
LINE_BREAKER=([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d+\s+\w+)
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%4N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD=24
For further investigation, I'll add that the sourcetype name is composed of two words separated by the "-" character (example: "service-asource").
Where do you want it to break? What is one event in this file? Is the whole body a single event?
I want to break only at the start of the event, where I find the date "2018-03-22 13:57:23.0119".
Unfortunately, Splunk also splits when it finds the timestamp in the middle, "2018-02-20T20:31:20.097".