Getting Data In

Getting XML data into Splunk consistently?

Strangertinz
Path Finder

I am having trouble ingesting my data into Splunk consistently. I have an XML log file that is constantly being written to (about 100 entries per minute); however, when I search for the data in Splunk I only see sporadic results: data for 10 minutes, then nothing for the next 20, and so on.

I have my inputs and props config below. 


inputs.conf:


[monitor:///var/log/sample_xml_file.xml]
disabled = false
index = sample_xml_index
sourcetype = sample_xml_st

props.conf:

---------------------

[sample_xml_st]
CHARSET=UTF-8
KV_MODE=xml
LINE_BREAKER=(<log_entry>)
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=FALSE
TIME_FORMAT=%Y%m%d-%H:%M:%S
TIME_PREFIX=<log_time>
TRUNCATE=0
description=describing props config
disabled=false
pulldown_type=1
TZ=-05:00

---------------------



Sample XML log:

<?xml version="1.0" encoding="utf-8" ?>
<log>
  <log_entry>
    <log_time>20230724-05:42:00</log_time>
    <description>some random data 1</description>
  </log_entry>
   <log_entry>
    <log_time>20230724-05:43:00</log_time>
    <description>some random data 2</description>
  </log_entry>
   <log_entry>
    <log_time>20230724-05:43:20</log_time>
    <description>some random data 3</description>
  </log_entry>
</log>

And this XML log file is constantly being appended to with a new log_entry.


yeahnah
Motivator

Hi @Strangertinz 

To break the events correctly, the LINE_BREAKER value should be ([\r\n]+)<\?xml. The newlines matched by the regex capture group define the event boundary and are not ingested.

So, something like this should work on the heavy forwarder or parsing tier.

[sample_xml_st]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)<\?xml
TIME_PREFIX=<log_time>
TIME_FORMAT=%Y%m%d-%H:%M:%S
TRUNCATE=0

 
Hope this helps


Strangertinz
Path Finder

Hi @yeahnah,

 

I am able to parse the data correctly; my issue is that Splunk receives the data sporadically.


yeahnah
Motivator

Hi @Strangertinz 

Really!  It is being parsed correctly?  I have no idea how it could be, based on your sample data and the props.conf shown.  Using LINE_BREAKER=(<log_entry>) with SHOULD_LINEMERGE=FALSE rips the <log_entry> line out of the XML, which breaks the structured XML format.  This might also explain why the data appears clumped: timestamp extraction would only work on events that contain a log_time value, and events without a timestamp would have to fall back on other sources, such as the modification time of the source file.

If you use the hidden _indextime metadata field (you need to rename it to see it, e.g. "| rename _indextime AS indextime"), it will give you the time (in epoch seconds) the data was ingested by Splunk (written to the index), and you can check whether the event time and index time match or vary wildly.
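For example, a search along these lines (using the index and sourcetype names from your config above) should show the lag between event time and index time per event; large or clustered lag values would confirm the delayed-ingestion theory:

index=sample_xml_index sourcetype=sample_xml_st
| eval indextime=_indextime
| eval lag_seconds=indextime - _time
| convert ctime(indextime)
| table _time indextime lag_seconds
| sort - lag_seconds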

