Hello everybody,
I am facing some challenges with a custom log file containing bits of XML surrounded by some sort of headers...
The file looks something like this:
[1][DATA]BEGIN --- - 06:03:09[012]
<xml>
<tag1>value</tag1>
<nestedTag>
<tag2>another value</tag2>
</nestedTag>
</xml>
[1][DATA]END --- - 06:03:09[012]
[1][DATA]BEGIN --- - 07:03:09[123]
<xml>
<tag1>some stuff</tag1>
<nestedTag>
<tag2>other stuff</tag2>
</nestedTag>
</xml>
[1][DATA]END --- - 07:03:09[123]
[1][DATA]BEGIN --- - 08:03:09[456]
<xml>
<tag1>some more data</tag1>
<nestedTag>
<tag2>fooband a bit more</tag2>
</nestedTag>
</xml>
[1][DATA]END --- - 08:03:09[456]
It is worth noting that the xml parts can be very large.
I would like to take advantage of Splunk's automatic XML parsing, as it is not realistic to do it manually in this case, but the square bracket lines around each XML block seem to prevent the XML parser from doing its job and I get no field extraction.
So, what I would like to do is:
What I have tried with props.conf and transforms.conf:
props.conf
[my_sourcetype]
BREAK_ONLY_BEFORE_DATE =
DATETIME_CONFIG =
KV_MODE = xml
LINE_BREAKER = \]([\r\n]+)\[1\]\[DATA\]BEGIN
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Custom
pulldown_type = true
# only with transforms.conf v1
TRANSFORMS-full=my_transform
# only with transforms.conf v2
TRANSFORMS-begin=begin
TRANSFORMS-end=end
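To see what the XML parser is actually handed with this setup, here is a rough Python emulation of how the LINE_BREAKER above splits the first two blocks of the sample. It is a simplified sketch, assuming Splunk's documented LINE_BREAKER rule (the text matched by the first capturing group is discarded and marks the boundary between two events) and ignoring every other props.conf setting. The helper name break_events is made up for illustration; Python's re and Splunk's PCRE agree on this particular pattern.

```python
import re

# First two blocks of the sample log from the question.
RAW = (
    "[1][DATA]BEGIN --- - 06:03:09[012]\n"
    "<xml>\n"
    "<tag1>value</tag1>\n"
    "<nestedTag>\n"
    "<tag2>another value</tag2>\n"
    "</nestedTag>\n"
    "</xml>\n"
    "[1][DATA]END --- - 06:03:09[012]\n"
    "[1][DATA]BEGIN --- - 07:03:09[123]\n"
    "<xml>\n"
    "<tag1>some stuff</tag1>\n"
    "<nestedTag>\n"
    "<tag2>other stuff</tag2>\n"
    "</nestedTag>\n"
    "</xml>\n"
    "[1][DATA]END --- - 07:03:09[123]\n"
)

# LINE_BREAKER copied from the props.conf above.
LINE_BREAKER = r"\]([\r\n]+)\[1\]\[DATA\]BEGIN"


def break_events(raw, pattern):
    """Hypothetical helper, a rough emulation of LINE_BREAKER: the text of
    capture group 1 is dropped; what precedes it ends one event and what
    follows it starts the next."""
    events, start = [], 0
    for match in re.finditer(pattern, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    # Skip empty or whitespace-only chunks.
    return [event for event in events if event.strip()]


# Show the first and last line of every resulting event.
for event in break_events(RAW, LINE_BREAKER):
    lines = event.splitlines()
    print(f"{lines[0]!r} ... {lines[-1]!r}")
```

In this emulation each event still begins with its [1][DATA]BEGIN line and ends with its [1][DATA]END line, so no event is XML as a whole.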
transforms.conf (version 1):
[my_transform]
REGEX = (?m)\[1\]\[DATA\]BEGIN --- - (\d{2}:\d{2}:\d{2}).*([\r\n]+)([^\[]*)\[1\]\[DATA\]END.*$[\r\n]*
FORMAT = <time>$1</time>$2$3
WRITE_META = true
DEST_KEY = _raw
transforms.conf (version 2):
[begin]
REGEX = (?m)^\[1\]\[DATA\]BEGIN --- - (\d{2}:\d{2}:\d{2}).*$
FORMAT = <time>$1</time>
WRITE_META = true
DEST_KEY = _raw
[end]
REGEX = (?m)^\[1\]\[DATA\]END.*$
DEST_KEY = queue
FORMAT = nullQueue
With the various combinations listed here, I got all sorts of results:
Could anybody help me out with this use case?
Many thanks,
Alex
Hi @scelikok
Thanks a lot for your reply, it was most helpful and it helped me find a solution.
However, I realised that the snippet I had provided differed subtly from the actual data, so I had to adapt your solution slightly. That being said, I was under the impression that your regex was not quite right either: I ran it through regex101 first and it only matched the first XML block (I stripped the beginning of the square bracket line to emulate the line breaker in props.conf).
So, to recap, here is a more accurate example of the log:
[1][DATA]BEGIN --- - 06:03:09[012]
<?xml version="1.0" encoding="UTF-8"?>
<root>
<tag1>value</tag1>
<nestedTag>
<tag2>another value</tag2>
</nestedTag>
</root>
[1][DATA]END --- - 06:03:09[012]
[1][DATA]BEGIN --- - 07:03:09[123]
<?xml version="1.0" encoding="UTF-8"?>
<root>
<tag1>some stuff</tag1>
<nestedTag>
<tag2>other stuff</tag2>
</nestedTag>
</root>
[1][DATA]END --- - 07:03:09[123]
[1][DATA]BEGIN --- - 08:03:09[456]
<?xml version="1.0" encoding="UTF-8"?>
<root>
<tag1>some more data</tag1>
<nestedTag>
<tag2>fooband a bit more</tag2>
</nestedTag>
</root>
[1][DATA]END --- - 08:03:09[456]
Here is the props.conf I ended up using (as per @scelikok's suggestion):
[my_sourcetype]
LINE_BREAKER = (\[1\]\[DATA\]BEGIN[-\s]+)
SHOULD_LINEMERGE = false
TRANSFORMS-transform2xml = transform2xml
KV_MODE = xml
And here is the corresponding transforms.conf, slightly tweaked (I ended up being a bit more explicit about the end of the event and removed some of the capturing groups):
[transform2xml]
REGEX = ^([^\[]+)\[\d+\][\r\n]+(<\?xml.*>[^\[]+)\[1\]\[DATA\]END --- - [\d:]+\[\d+\][\r\n]*
FORMAT = <time>$1</time>$2
DEST_KEY = _raw
It may not be perfectly valid XML, but it works as expected and the XML is now parsed automatically.
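Similar to the regex101 check, the final configuration can be sanity-checked outside Splunk. The Python sketch below is a rough emulation, not Splunk itself: it breaks the first two blocks of the more accurate sample with the LINE_BREAKER above and then, assuming that DEST_KEY = _raw replaces the whole event with the FORMAT string when REGEX matches, applies <time>$1</time>$2. The helper names break_events and transform2xml are illustrative only. The last lines show what "not perfectly valid XML" means here: the XML declaration is no longer at the start of the event, so a strict parser such as Python's xml.etree.ElementTree rejects it.

```python
import re
import xml.etree.ElementTree as ET

# First two blocks of the "more accurate" sample above.
RAW = (
    "[1][DATA]BEGIN --- - 06:03:09[012]\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<root>\n"
    "<tag1>value</tag1>\n"
    "<nestedTag>\n"
    "<tag2>another value</tag2>\n"
    "</nestedTag>\n"
    "</root>\n"
    "[1][DATA]END --- - 06:03:09[012]\n"
    "[1][DATA]BEGIN --- - 07:03:09[123]\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<root>\n"
    "<tag1>some stuff</tag1>\n"
    "<nestedTag>\n"
    "<tag2>other stuff</tag2>\n"
    "</nestedTag>\n"
    "</root>\n"
    "[1][DATA]END --- - 07:03:09[123]\n"
)

# LINE_BREAKER and REGEX copied from the props.conf / transforms.conf above.
LINE_BREAKER = r"(\[1\]\[DATA\]BEGIN[-\s]+)"
REGEX = (
    r"^([^\[]+)\[\d+\][\r\n]+(<\?xml.*>[^\[]+)"
    r"\[1\]\[DATA\]END --- - [\d:]+\[\d+\][\r\n]*"
)


def break_events(raw, pattern):
    """Rough emulation of LINE_BREAKER: the text of capture group 1 is
    dropped; what precedes it ends one event, what follows starts the next."""
    events, start = [], 0
    for match in re.finditer(pattern, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    # Skip empty chunks, e.g. the one before the very first BEGIN line.
    return [event for event in events if event.strip()]


def transform2xml(event):
    """Emulate DEST_KEY = _raw: on a match, the whole event becomes
    FORMAT = <time>$1</time>$2; otherwise it is left unchanged."""
    match = re.search(REGEX, event)
    if match is None:
        return event
    return f"<time>{match.group(1)}</time>{match.group(2)}"


events = [transform2xml(event) for event in break_events(RAW, LINE_BREAKER)]
for event in events:
    print(event)

# The XML declaration now follows the <time> element, so a strict XML
# parser refuses the event.
try:
    ET.fromstring(events[0])
except ET.ParseError as err:
    print("strict parser:", err)
```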
Thanks again for your help @scelikok !
Hi @Alex_LC,
You can try the following:
props.conf
[my_sourcetype]
LINE_BREAKER = (\[1\]\[DATA\]BEGIN[-\s]+)
SHOULD_LINEMERGE = false
TRANSFORMS-transform2xml = transform2xml
KV_MODE = xml
transforms.conf
[transform2xml]
REGEX = ([^\[]+)(\[\d+\][\r\n]+<xml>)([^\[]+)(<\/xml>[^$]+)
FORMAT = <xml><time>$1</time>$3</xml>
DEST_KEY = _raw
This should create a separate event for each block, with a time field, like this:
<xml><time>08:03:09</time>
<tag1>some more data</tag1>
<nestedTag>
<tag2>fooband a bit more</tag2>
</nestedTag>
</xml>
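The expected output can be checked offline with a small Python emulation: break the sample from the question with the LINE_BREAKER above, then, mimicking DEST_KEY = _raw, replace each event with the FORMAT string whenever REGEX matches. This is a sketch under the assumption that Splunk substitutes the whole _raw value with FORMAT on a match (the FORMAT above relies on that), and it ignores all other settings. $1 and $3 in FORMAT become match.group(1) and match.group(3) in Python; the helper names are hypothetical.

```python
import re

# Sample log from the question.
RAW = (
    "[1][DATA]BEGIN --- - 06:03:09[012]\n"
    "<xml>\n"
    "<tag1>value</tag1>\n"
    "<nestedTag>\n"
    "<tag2>another value</tag2>\n"
    "</nestedTag>\n"
    "</xml>\n"
    "[1][DATA]END --- - 06:03:09[012]\n"
    "[1][DATA]BEGIN --- - 07:03:09[123]\n"
    "<xml>\n"
    "<tag1>some stuff</tag1>\n"
    "<nestedTag>\n"
    "<tag2>other stuff</tag2>\n"
    "</nestedTag>\n"
    "</xml>\n"
    "[1][DATA]END --- - 07:03:09[123]\n"
    "[1][DATA]BEGIN --- - 08:03:09[456]\n"
    "<xml>\n"
    "<tag1>some more data</tag1>\n"
    "<nestedTag>\n"
    "<tag2>fooband a bit more</tag2>\n"
    "</nestedTag>\n"
    "</xml>\n"
    "[1][DATA]END --- - 08:03:09[456]\n"
)

# LINE_BREAKER and REGEX copied from the props.conf / transforms.conf above.
LINE_BREAKER = r"(\[1\]\[DATA\]BEGIN[-\s]+)"
REGEX = r"([^\[]+)(\[\d+\][\r\n]+<xml>)([^\[]+)(<\/xml>[^$]+)"


def break_events(raw, pattern):
    """Rough emulation of LINE_BREAKER: the text of capture group 1 is
    dropped; what precedes it ends one event, what follows starts the next."""
    events, start = [], 0
    for match in re.finditer(pattern, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    # Skip empty chunks, e.g. the one before the very first BEGIN line.
    return [event for event in events if event.strip()]


def transform2xml(event):
    """Emulate DEST_KEY = _raw: on a match, the whole event becomes
    FORMAT = <xml><time>$1</time>$3</xml>; otherwise it is left unchanged."""
    match = re.search(REGEX, event)
    if match is None:
        return event
    return f"<xml><time>{match.group(1)}</time>{match.group(3)}</xml>"


events = [transform2xml(event) for event in break_events(RAW, LINE_BREAKER)]
for event in events:
    print(event)
    print("-" * 20)
```

Run as is, the third event matches the output shown above. With the more accurate sample from @Alex_LC's reply (<?xml ...?> and <root> instead of <xml>), the literal <xml> in REGEX no longer matches in this emulation, so those events would be left unchanged.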