Splunk Search

Why is my data not parsing correctly?

mhouse3
Path Finder

I am trying to learn how to configure an environment to ingest weblogs so that they are parsed correctly, but I am running into trouble: I only get a single event. I built my configurations from feedback given on similar questions. Note that the original intent of this exercise was to compare the effect of two different props.conf files.

My weblog source is this on both forwarders:

  '<photo id="123" title="Birthday" format="jpg">
       <owner id="1111">Jason</owner>
    <CreationDate>2009-11-06T02:22:37.063</CreationDate>
       <comments>
           <comment ownerid="112">Good pic!</comment>
           <comment ownerif="223">Happy birthday</comment>
       <comments>
   </photo>


  <photo id="123" title="Birthday" format="jpg">
       <owner id="1111">Jason</owner>
    <CreationDate>2009-11-06T02:22:37.063</CreationDate>
       <comments>
           <comment ownerid="112">Good pic!</comment>
           <comment ownerif="223">Happy birthday</comment>
       <comments>
   </photo>


  <photo id="123" title="Birthday" format="jpg">
       <owner id="1111">Jason</owner>
    <CreationDate>2009-11-06T02:22:37.063</CreationDate>
       <comments>
           <comment ownerid="112">Good pic!</comment>
           <comment ownerif="223">Happy birthday</comment>
       <comments>
   </photo>'

My inputs.conf on FW1 is this:
'[monitor:///home/labuser/xmldata/]
index=web
sourcetype=xml
disabled=false'

My inputs.conf on FW2 is this so that I could figure out which props.conf works:
'[monitor:///home/labuser/xmldata/]
index=web2
sourcetype=xml2
disabled=false'

My props.conf on FW1 is this:
'KV_MODE = xml
LINE_BREAKER = ()
MUST_BREAK_AFTER = \
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TRUNCATE = 0
TIME_PREFIX = \
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N'

My props.conf on FW2 is this:
'KV_MODE = xml
LINE_BREAKER = ([\r\n]+)()
MUST_BREAK_AFTER = \
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TRUNCATE = 0
TIME_PREFIX = \
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N'

The data in the web index and the web2 index looks identical in Splunk: searching index=web or index=web2 returns the same result, a single event instead of multiple events. What am I doing wrong?


manjunathmeti
Champion

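For LINE_BREAKER, the first capturing group of the regex must match everything between the end of the previous event and the beginning of the next event. Splunk breaks the stream at that group and discards the text it matched. Try this: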

[xml]
KV_MODE = xml
LINE_BREAKER = \<\/photo\>([\r\n\s]+)\<photo
NO_BINARY_CHECK = 1
TRUNCATE = 0
TIME_PREFIX = \<CreationDate\>
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
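
To see what that LINE_BREAKER should do with the sample data, here is a rough Python sketch of the breaking rule described above. It is only an approximation: Python's `re` module stands in for Splunk's regex engine, `split_events` and `extract_timestamp` are hypothetical helper names made up for this illustration, the sample record is shortened, and `%f` is used as the nearest Python equivalent of Splunk's `%3N`.

```python
import re
from datetime import datetime

# Regexes copied from the suggested props.conf stanza.
LINE_BREAKER = r"\<\/photo\>([\r\n\s]+)\<photo"
TIME_PREFIX = r"\<CreationDate\>"


def split_events(raw, line_breaker):
    """Approximate LINE_BREAKER: an event ends where capture group 1 starts,
    the next event begins where group 1 ends, and the group text is dropped."""
    events = []
    start = 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])  # whatever follows the last break
    return events


def extract_timestamp(event):
    """Approximate TIME_PREFIX + TIME_FORMAT: parse the text after the prefix."""
    match = re.search(TIME_PREFIX + r"([^<]+)", event)
    # %f is Python's fractional-seconds code (assumed stand-in for %3N).
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")


# Shortened version of one record from the question, repeated three times
# with blank lines in between, as in the monitored file.
record = (
    '<photo id="123" title="Birthday" format="jpg">\n'
    '  <owner id="1111">Jason</owner>\n'
    '  <CreationDate>2009-11-06T02:22:37.063</CreationDate>\n'
    '</photo>'
)
raw = "\n\n\n".join([record] * 3)

events = split_events(raw, LINE_BREAKER)
print(len(events))  # 3
print(extract_timestamp(events[0]))
```

Each resulting event starts at `<photo` and ends at `</photo>`, because only the whitespace inside the capturing group is thrown away. This says nothing about where the stanza has to be deployed or how Splunk itself behaves at index time, so treat it as a way to sanity-check the regex only.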

mhouse3
Path Finder

@manjunathmeti I just realized that my question got cut off.

I was actually using:
LINE_BREAKER = ()

and later tried using this:
LINE_BREAKER = ([\r\n]+)()

In both cases, all the data came in as a single event. These should have worked, right?
