Getting Data In

Getting xml data into Splunk consistently?

Strangertinz
Path Finder

I am having trouble ingesting my data into Splunk consistently. I have an XML log file that is constantly being written to (about 100 entries per minute). However, when I search for the data in Splunk I only see sporadic results: data for 10 minutes, then nothing for the next 20, and so on.

My inputs.conf and props.conf are below.


inputs.conf:


[monitor:///var/log/sample_xml_file.xml]
disabled = false
index = sample_xml_index
sourcetype = sample_xml_st

props.conf:

---------------------

[sample_xml_st]
CHARSET=UTF-8
KV_MODE=xml
LINE_BREAKER=(<log_entry>)
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=FALSE
TIME_FORMAT=%Y%m%d-%H:%M:%S
TIME_PREFIX=<log_time>
TRUNCATE=0
description=describing props config
disabled=false
pulldown_type=1
TZ=-05:00

---------------------



Sample xml log:

<?xml version="1.0" encoding="utf-8" ?>
<log>
  <log_entry>
    <log_time>20230724-05:42:00</log_time>
    <description>some random data 1</description>
  </log_entry>
   <log_entry>
    <log_time>20230724-05:43:00</log_time>
    <description>some random data 2</description>
  </log_entry>
   <log_entry>
    <log_time>20230724-05:43:20</log_time>
    <description>some random data 3</description>
  </log_entry>
</log>

And this XML log file is constantly being written to with a new log_entry element.


yeahnah
Motivator

Hi @Strangertinz 

To break the events correctly, the LINE_BREAKER value would need to be ([\r\n]+)<\?xml. The newlines matched by the regex capture group define the line break and are not ingested.

So, something like this should work on the heavy forwarder or parsing tier.

[sample_xml_st]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)<\?xml
TIME_PREFIX=<log_time>
TIME_FORMAT=%Y%m%d-%H:%M:%S
TRUNCATE=0
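
If it helps, you can sanity-check which props settings actually take effect for the sourcetype on the parsing tier with btool (the path below assumes a default Splunk install location):

$SPLUNK_HOME/bin/splunk btool props list sample_xml_st --debug

The --debug flag shows which .conf file each setting comes from, which is handy when an app's defaults silently override your local settings.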

 
Hope this helps


Strangertinz
Path Finder

Hi @yeahnah,

 

I am able to parse the data correctly; my issue is that the data is being received by Splunk sporadically.


yeahnah
Motivator

Hi @Strangertinz 

Really!  It is being parsed correctly?  I have no idea how it could be, based on your sample data and the props.conf shown.  Using LINE_BREAKER=(<log_entry>) with SHOULD_LINEMERGE=false would rip the <log_entry> line out of the XML, which would break the XML structure.  This might explain why the data appears clumped: timestamp extraction would only work on events that contain a log_time value.  Events without timestamps would have to fall back on other sources, such as the modification time of the source file.
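
For illustration, with LINE_BREAKER=(<log_entry>) your sample file above would be split up roughly like this (the matched <log_entry> text is the delimiter, so it is discarded, and the first fragment has no <log_time> at all):

Event 1:
<?xml version="1.0" encoding="utf-8" ?>
<log>

Event 2:
    <log_time>20230724-05:42:00</log_time>
    <description>some random data 1</description>
  </log_entry>

Note the later events also lose their opening <log_entry> tag, so KV_MODE=xml extraction would be working with malformed XML.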

If you use the hidden _indextime metadata field (you need to rename the field to see it, e.g. "| rename _indextime AS indextime"), it will give you the time (in epoch seconds) the data was ingested by Splunk (written to the index), and you can check whether the event time and index time match or vary wildly.
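
A minimal search to sketch that check (index and sourcetype names taken from your configs above):

index=sample_xml_index sourcetype=sample_xml_st
| eval indextime=_indextime, lag_seconds=_indextime - _time
| convert ctime(indextime)
| table _time indextime lag_seconds

A large or erratic lag_seconds would point at a timestamping or ingestion delay rather than missing data.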
