Ingest entire XML file

nicofantinato
Path Finder

Hello everybody,

We are using a Universal Forwarder to monitor several directories, each containing a large XML file (around 1000 lines). These files change every few seconds, and each change also updates the timestamp, which is written in the first 256 bytes of the file.

I need to ingest the entire file on every change, but Splunk instead ingests these files only once every few hours or even days. Do you have any suggestions on how I can fix this?

Here's the props.conf on my heavy forwarders (we have a distributed environment):
[xml_atm]
TRANSFORMS-routing=xmlatm-route
SHOULD_LINEMERGE=true
LINE_BREAKER=(?:restart)([\r\n]+)
CHARSET=ISO-8859-1
CHECK_METHOD = modtime
MAX_EVENTS=4000
TRUNCATE=0
disabled=false
TIME_PREFIX=restart-flag="
REPORT-xmlext=xml-extr

And this is the inputs.conf on the UF:
[monitor://D:\ABC\Monitor\Monitor\Inputs\*\*.xml]
disabled = 0
host_segment = 5
index = my_index
sourcetype = xml_atm

Thanks in advance.

1 Solution

nicofantinato
Path Finder

Hi, it turned out we also needed to add the crcSalt = <SOURCE> directive to inputs.conf on the UFs. After adding it, everything worked as expected.

inputs.conf became simply:

[monitor://D:\ABC\Monitor\Monitor\Inputs\*\*.xml]
disabled = 0
host_segment = 5
crcSalt = <SOURCE>
index = my_index
sourcetype = xml_atm
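
A note on the CHECK_METHOD = modtime line in the props.conf above: as far as the props.conf spec describes it, that setting is only honored in [source::...] stanzas and is applied at input time, so under a sourcetype stanza on a heavy forwarder it is probably ignored. A minimal sketch of where it could go instead, assuming a props.conf deployed on the UF itself (the escaped-backslash source pattern is an assumption for this path, not something confirmed in this thread):

```ini
# props.conf on the Universal Forwarder (hypothetical placement)
[source::D:\\ABC\\Monitor\\Monitor\\Inputs\\*\\*.xml]
# Re-read the whole file whenever its modification time changes
CHECK_METHOD = modtime
```

This is untested against the setup described here; crcSalt = <SOURCE> is the only change reported to have fixed the issue.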

ff9231
Loves-to-Learn

I have exactly the same config as you; the only difference is that I also want to set "source". If I define a source value, the host name falls back to the default server name. None of host, host_regex, or host_segment works when "source" is defined.

Do you have any suggestions for what I can try? I am also configuring this on the UF.

Sample:

[monitor://D:\ABC\Monitor\Monitor\Inputs\*\*.xml]
disabled = 0
host_segment = 5
crcSalt = <SOURCE>
index = my_index
sourcetype = xml_atm

source = B_wks

nicofantinato
Path Finder

Hi, do you have any suggestions? I'm still unable to ingest the entire XML files every time their modification time changes.

richgalloway
SplunkTrust

Please share the relevant inputs.conf stanza from the UFs.

---
If this reply helps you, Karma would be appreciated.

nicofantinato
Path Finder

Here is the inputs.conf on the UF:
[monitor://D:\ABC\Monitor\Monitor\Inputs\*\*.xml]
disabled = 0
host_segment = 5
index = my_index
sourcetype = xml_atm

richgalloway
SplunkTrust

Splunk tries its best to avoid re-indexing entire files that are being ingested via a monitor stanza. I'm not aware of any setting to override that behavior.

Consider using batch input, instead.  Splunk will read the entire file, but will delete it afterward.  That means your application must be prepared to re-create the file.  It also runs the risk of a race condition between Splunk and your app.  Can the application be configured to write a new file instead of overwriting existing files?
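
For reference, a batch input for the same path might look roughly like the sketch below. It reuses the index, sourcetype, and host_segment values from the monitor stanza in this thread. move_policy = sinkhole is the setting that makes Splunk delete each file after reading it, so this only makes sense if the application can re-create the files.

```ini
# inputs.conf on the UF: a sketch, not tested against this environment
[batch://D:\ABC\Monitor\Monitor\Inputs\*\*.xml]
# Required for batch inputs; the file is deleted once it has been indexed
move_policy = sinkhole
host_segment = 5
index = my_index
sourcetype = xml_atm
```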

---
If this reply helps you, Karma would be appreciated.