Getting XML data into Splunk consistently?

Strangertinz
Path Finder

I am having trouble ingesting my data into Splunk consistently. I have an XML log file that is constantly being written to (about 100 entries per minute); however, when I search for the data in Splunk I only see sporadic results: data appears for 10 minutes, then nothing for the next 20, and so on.

I have my inputs and props config below. 


inputs.conf:


[monitor:///var/log/sample_xml_file.xml]
disabled = false
index = sample_xml_index
sourcetype = sample_xml_st

props.conf:

---------------------

[sample_xml_st]
CHARSET=UTF-8
KV_MODE=xml
LINE_BREAKER=(<log_entry>)
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=FALSE
TIME_FORMAT=%Y%m%d-%H:%M:%S
TIME_PREFIX=<log_time>
TRUNCATE=0
description=describing props config
disabled=false
pulldown_type=1
TZ=-05:00

---------------------



Sample XML log:

<?xml version="1.0" encoding="utf-8" ?>
<log>
  <log_entry>
    <log_time>20230724-05:42:00</log_time>
    <description>some random data 1</description>
  </log_entry>
  <log_entry>
    <log_time>20230724-05:43:00</log_time>
    <description>some random data 2</description>
  </log_entry>
  <log_entry>
    <log_time>20230724-05:43:20</log_time>
    <description>some random data 3</description>
  </log_entry>
</log>

And this XML log file is constantly appended to with new log_entry elements.


yeahnah
Motivator

Hi @Strangertinz 

To break the events correctly, the LINE_BREAKER value would be ([\r\n]+)<\?xml. The newlines matched by the regex capture group define the event break and are discarded, not ingested.

So, something like this should work on the heavy forwarder or parsing tier.

[sample_xml_st]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)<\?xml
TIME_PREFIX=<log_time>
TIME_FORMAT=%Y%m%d-%H:%M:%S
TRUNCATE=0
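Outside Splunk, the effect of that capture group can be sketched with an equivalent regex split (Python here purely as illustration; this assumes a complete XML document is appended on each write):

```python
import re

# Two complete XML documents appended to one file (assumed layout).
raw = (
    '<?xml version="1.0" encoding="utf-8" ?>\n'
    "<log>\n"
    "  <log_entry>\n"
    "    <log_time>20230724-05:42:00</log_time>\n"
    "  </log_entry>\n"
    "</log>\n"
    '<?xml version="1.0" encoding="utf-8" ?>\n'
    "<log>\n"
    "  <log_entry>\n"
    "    <log_time>20230724-05:43:00</log_time>\n"
    "  </log_entry>\n"
    "</log>"
)

# LINE_BREAKER=([\r\n]+)<\?xml breaks the stream before each <?xml
# declaration; the newlines matched by the capture group are thrown
# away, and <?xml starts the next event. A lookahead split mimics that:
events = re.split(r"[\r\n]+(?=<\?xml)", raw)
for event in events:
    print(event.splitlines()[0])  # each event keeps its XML declaration
```

Each resulting event is a complete, well-formed XML document, so KV_MODE=xml extraction and timestamp recognition both have something valid to work with.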

Hope this helps


Strangertinz
Path Finder

Hi @yeahnah,

I am able to parse the data correctly; my issue is that the data is being received by Splunk sporadically.


yeahnah
Motivator

Hi @Strangertinz 

Really!  It is being parsed correctly?  I have no idea how it could be, based on your sample data and the props.conf shown.  Using LINE_BREAKER=(<log_entry>) with SHOULD_LINEMERGE=FALSE would rip the <log_entry> tag out of each event, which would break the XML structure.  This might also explain why the data appears clumped: timestamp extraction would only work on events that contain a log_time value, and events without timestamps would have to fall back on other sources, such as the modification time of the source file.
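To see the mangling concretely, here is a small sketch (Python purely as illustration) of what a (<log_entry>) capture-group break does to the sample data:

```python
import re

raw = (
    "<log>\n"
    "  <log_entry>\n"
    "    <log_time>20230724-05:42:00</log_time>\n"
    "    <description>some random data 1</description>\n"
    "  </log_entry>\n"
    "  <log_entry>\n"
    "    <log_time>20230724-05:43:00</log_time>\n"
    "    <description>some random data 2</description>\n"
    "  </log_entry>\n"
    "</log>"
)

# LINE_BREAKER=(<log_entry>) makes the captured tag itself the break,
# so it is discarded: every resulting event loses its opening
# <log_entry> tag, and the first "event" is just the <log> header
# with no timestamp at all.
events = re.split(r"<log_entry>", raw)
print(repr(events[0]))               # '<log>\n  ' -- no log_time to extract
print("<log_entry>" in events[1])    # False: opening tag is gone
```

With the opening tags stripped out, KV_MODE=xml has no well-formed element to extract, and the timestamp-less first event falls back to another time source.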

If you use the hidden _indextime metadata field (you need to rename it to see it, e.g. "| rename _indextime AS indextime"), it will give you the time (in epoch seconds) at which the data was ingested by Splunk (written to the index), so you can check whether the event time and index time match or vary wildly.
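For example, a search along these lines (using the index and sourcetype names from your question) shows the lag between event time and index time per event:

```
index=sample_xml_index sourcetype=sample_xml_st
| eval indextime=_indextime, lag_seconds=_indextime-_time
| table _time indextime lag_seconds
| sort - lag_seconds
```

A large or wildly varying lag_seconds would confirm that the events are arriving steadily but being stamped with the wrong event times.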

