Getting Data In

How to merge XML rows in one event through props.conf?

robertosegantin
Path Finder

Hi to all,
I've got a log file in which there are many XML messages printed.
One single log message is split into many rows (as you can see from the example below), but I have to merge those rows into a single Splunk event.
I'm on a Splunk Enterprise 6.6.2 cluster environment. These logs are provided by many Universal Forwarders, which send them to two Heavy Forwarders (HF, version 6.6.1), which in turn send the logs to the indexer cluster (IDX).
I've tried many props.conf configurations on the HF (BREAK_ONLY_BEFORE, MUST_NOT_BREAK_AFTER, DATETIME_CONFIG, etc.), and also on the IDX, but Splunk continues to split the event at the tag "<a:TimestampLastupdate>" because it finds a timestamp there.

== props.conf (on HF and IDX) ==

[my_sourcetype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{4}
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%4N
MAX_TIMESTAMP_LOOKAHEAD = 26
MUST_NOT_BREAK_AFTER = \s*(<a:Timestamp|<Timestamp)
MUST_NOT_BREAK_BEFORE = \s*(<a:Timestamp|<Timestamp)

== log ==

2018-03-22 13:57:23.0119  INFO - Output Message: {
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Header>
    <Action a:mustUnderstand="1" xmlns="http://schemas.microsoft.com/ws/2005/05/addressing/none" xmlns:a="http://schemas.xmlsoap.org/soap/envelope/">http://tempuri.org/Service/tag_a</Action>
  </s:Header>
  <s:Body>
    <tag_a xmlns="http://tempuri.org/">
      <tag_b xmlns:a="http://schemas.daact.org/2004/07/IService" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
        <a:tag_c>false</a:tag_c>
        <a:tag_d>true</a:tag_d>
        <a:tag_e>
          <a:tag_f></a:tag_f>
          <a:tag_g>999999999</a:tag_g>
          <a:tag_h>ffffffffff</a:tag_h>
          <a:tag_i>99</a:tag_i>
          <a:tag_l>ffffffffffffffff</a:tag_l>
          <a:tag_m>999999999</a:tag_m>
          <a:tag_n>fffffff</a:tag_n>
          <a:tag_o>9,99</a:tag_o>
          <a:tag_p>fffff</a:tag_p>
          <a:tag_q>
            <a:tag_r>
              <a:tag_s>true</a:tag_s>
              <a:tag_h>ffffffffff</a:tag_h>
              <a:tag_l>ffffffffffffffff</a:tag_l>
              <a:tag_t>22/03/2018</a:tag_t>
              <a:tag_u>fffff</a:tag_u>
              <a:tag_v>9999</a:tag_v>
              <a:tag_z>9</a:tag_z>
            </a:tag_r>
          <a:TimestampLastupdate>2018-02-20T20:31:20.097</a:TimestampLastupdate>
          <a:tag_j>ff</a:tag_j>
          <a:tag_x>XML</a:tag_x>
      </a:tag_q>
        </a:tag_e>
        <a:IsError>false</a:IsError>
        <a:tag_k>
          <a:tag_w></a:tag_w>
          <a:ErrorDescription></a:ErrorDescription>
        </a:tag_k>
      </tag_b>
    </tag_a>
  </s:Body>
</s:Envelope>
}

Have you got any ideas how to fix this behavior?
Also, do I have to configure only HF props.conf or only IDX props.conf or both?

0 Karma
1 Solution

robertosegantin
Path Finder

All configurations defined here are correct.
In my case Splunk did not apply the props.conf stanza because the source type I wanted to process is assigned by a transform that overrides another source type.
Because Splunk reads props.conf only once for an event, using the original source type, it does not process the stanza of the source type that overrides the first.

Check this case: https://answers.splunk.com/answers/636220/what-are-the-precedence-of-stanza-and-option-in-pr.htm
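
A minimal sketch of what this implies, assuming the source type assigned at input time is "others_sourcetype" (the name mentioned elsewhere in this thread) and that a transform then rewrites it to "my_sourcetype". The transform name "set_my_sourcetype" and its REGEX are hypothetical placeholders. Because event breaking and timestamp extraction happen before the rewrite, those settings would likely need to sit in the stanza of the original source type, on the first instance that parses the data (here, the Heavy Forwarders):

```ini
# props.conf
[others_sourcetype]
# Event-breaking and timestamp settings go on the ORIGINAL source type
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+\w+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%4N
MAX_TIMESTAMP_LOOKAHEAD = 26
TRUNCATE = 100000
# Hypothetical transform that rewrites the source type afterwards
TRANSFORMS-set_sourcetype = set_my_sourcetype

# transforms.conf
[set_my_sourcetype]
# Hypothetical match condition; replace with the real one
REGEX = Output Message
FORMAT = sourcetype::my_sourcetype
DEST_KEY = MetaData:Sourcetype
```

Search-time settings such as field extractions would still belong under the final source type, and splunkd has to be restarted on the instance where the files were changed.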

0 Karma

somesoni2
Revered Legend

Give this a try

[my_sourcetype]
 SHOULD_LINEMERGE = false
 LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d+\s+\w+)
 TIME_FORMAT = %Y-%m-%d %H:%M:%S.%4N
 MAX_TIMESTAMP_LOOKAHEAD = 26
 TRUNCATE = 100000
0 Karma

robertosegantin
Path Finder

I added the configuration to the Heavy Forwarder and Indexer props.conf, but Splunk continues to split the event.
The following screenshots show the source event as indexed by Splunk and the search result highlighting the default field "_indextime".

[Screenshot: source event as indexed by Splunk]

[Screenshot: search result highlighting "_indextime"]

Thanks

0 Karma

robertosegantin
Path Finder

Thanks for the advice, but it does not work for me.
When Splunk monitors that log file, it continues to split the event.
Please check the next post, in which I added the screenshots.

0 Karma

robertosegantin
Path Finder

In the Splunk data input GUI the props work perfectly, but when Splunk applies the configuration to the file it is monitoring, it does not work correctly.

0 Karma

robertosegantin
Path Finder

I also tried this:

    [my_sourcetype]
    SHOULD_LINEMERGE=false
    NO_BINARY_CHECK=true
    CHARSET=UTF-8
    disabled=false
    SEDCMD-blfRemover=s/\x0A//g
    SEDCMD-acrRemover=s/\x0D//g
    TRUNCATE=100000
    LINE_BREAKER=([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d+\s+\w+)
    TIME_FORMAT=%Y-%m-%d %H:%M:%S.%4N
    MAX_TIMESTAMP_LOOKAHEAD=24

It works only in the data input Web GUI, but not in the runtime environment.

0 Karma

somesoni2
Revered Legend

I hope you're putting this configuration on both of your HFs and restarting the splunkd instance. Also add the following to props.conf, which I missed earlier.

TIME_PREFIX = ^
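
Put together with the stanza from the earlier reply, the result would look roughly like the sketch below. The stanza name is the placeholder used in this thread, and the only change to the pattern is an assumption on my side: the dot before the fractional seconds is escaped so that it matches a literal ".".

```ini
[my_sourcetype]
SHOULD_LINEMERGE = false
# Break only on newlines that are followed by a leading log timestamp
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+\w+)
# Anchor timestamp extraction to the start of the event
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%4N
MAX_TIMESTAMP_LOOKAHEAD = 26
TRUNCATE = 100000
```

The embedded value 2018-02-20T20:31:20.097 in the sample log does not directly follow a newline and uses a "T" separator, so this pattern should not treat it as an event boundary.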
0 Karma

robertosegantin
Path Finder

I've just found that "my_sourcetype" is generated dynamically by a transform: the events initially have "others_sourcetype" and, after the transform, Splunk overwrites "others_sourcetype" with "my_sourcetype".

0 Karma

robertosegantin
Path Finder

I tried the following configurations, but it is still not working:

[my-sourcetype]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
SEDCMD-blfRemover=s/\x0A//g
SEDCMD-acrRemover=s/\x0D//g
TRUNCATE=100000
LINE_BREAKER=([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d+\s+\w+)
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%4N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD=24

[my-sourcetype]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
CHARSET=UTF-8
disabled=false
TRUNCATE=100000
LINE_BREAKER=([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d+\s+\w+)
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%4N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD=24

For further investigation, I should add that the sourcetype name is composed of two words separated by the "-" character (example: "service-asource").

0 Karma

lguinn2
Legend

Where do you want it to break? What is one event in this file? Is the whole body a single event?

0 Karma

robertosegantin
Path Finder

I want to break only at the start of the event, where the date "2018-03-22 13:57:23.0119" appears.
Unfortunately, Splunk also splits the event when it finds the timestamp in the middle, "2018-02-20T20:31:20.097".

0 Karma