Getting Data In

Getting XML data into Splunk consistently?

Strangertinz
Path Finder

I am having trouble ingesting my data into Splunk consistently. I have an XML log file that is constantly being written to (about 100 entries per minute); however, when I search for the data in Splunk I only see sporadic results: data for 10 minutes, then nothing for the next 20, and so on.

I have my inputs and props config below. 


inputs.conf:


[monitor:///var/log/sample_xml_file.xml]
disabled = false
index = sample_xml_index
sourcetype = sample_xml_st

props.conf:

---------------------

[sample_xml_st]
CHARSET=UTF-8
KV_MODE=xml
LINE_BREAKER=(<log_entry>)
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=FALSE
TIME_FORMAT=%Y%m%d-%H:%M:%S
TIME_PREFIX=<log_time>
TRUNCATE=0
description=describing props config
disabled=false
pulldown_type=1
TZ=-05:00

---------------------



Sample XML log:

<?xml version="1.0" encoding="utf-8" ?>
<log>
  <log_entry>
    <log_time>20230724-05:42:00</log_time>
    <description>some random data 1</description>
  </log_entry>
   <log_entry>
    <log_time>20230724-05:43:00</log_time>
    <description>some random data 2</description>
  </log_entry>
   <log_entry>
    <log_time>20230724-05:43:20</log_time>
    <description>some random data 3</description>
  </log_entry>
</log>

And this XML log file is constantly being appended to with a new log_entry.


yeahnah
Motivator

Hi @Strangertinz 

To break the events correctly, the LINE_BREAKER value should be ([\r\n]+)<\?xml. The newlines matched by the regex capture group define the event boundary and are not ingested.

So, something like this should work on the heavy forwarder or parsing tier.

[sample_xml_st]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)<\?xml
TIME_PREFIX=<log_time>
TIME_FORMAT=%Y%m%d-%H:%M:%S
TRUNCATE=0

 
Hope this helps


Strangertinz
Path Finder

Hi @yeahnah,

 

I am able to parse the data correctly; my issue is that Splunk receives the data sporadically.


yeahnah
Motivator

Hi @Strangertinz 

Really!  It is being parsed correctly?  I have no idea how it could be, based on your sample data and the props.conf shown.  Using LINE_BREAKER=(<log_entry>) with SHOULD_LINEMERGE=FALSE rips the <log_entry> line out of the XML, which breaks the structured XML format.  This might also explain why the data appears clumped: timestamp extraction would only work on events that contain a log_time value, and events without a timestamp would have to fall back on other sources, such as the modification time of the source file.

If you use the hidden _indextime metadata field (you need to rename it to see it, e.g. "| rename _indextime AS indextime"), it will give you the time (in epoch seconds) the data was ingested by Splunk (written to the index), and you can check whether the event time and index time match or vary wildly.
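For example, a search along these lines (using the index and sourcetype names from your config above) should show the lag between event time and index time per event; large or clustered lag values would confirm the delayed-ingestion theory:

index=sample_xml_index sourcetype=sample_xml_st
| eval indextime=_indextime
| eval lag_seconds=indextime - _time
| convert ctime(indextime)
| table _time indextime lag_seconds
| sort - lag_seconds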

