How to onboard a large XML file without breaking it up into multiple events?

mcbradford
Contributor

I have been asked to onboard large XML files. Each file contains about 105k lines, and there is only one date in the file. The file must not be broken into multiple events. I am having trouble getting props.conf right so that the file is indexed as a single event instead of being split into many. I tried setting MAX_EVENTS to 150000, but I do not think it is working properly. I also tried TRUNCATE=150000, but that did not work either.

BTW, these files only come in once a day, so it is not like they are arriving every minute or second.

lguinn2
Legend

If you want all the data in the file to be a single event, you can probably do that better with LINE_BREAKER.

Try this in props.conf:

[yoursourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=((thismustneverappearinyourfile))
TRUNCATE=0

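I don't think you need to set MAX_EVENTS at all with this method, since MAX_EVENTS only affects line merging (SHOULD_LINEMERGE=true). But feel free to add MAX_EVENTS as well.
This technique works by using LINE_BREAKER to define the split between events and then assigning an "impossible" character string as the line-breaking condition, so the break never happens. TRUNCATE=0 disables truncation, so the single large event is not cut off.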

ddrillic
Ultra Champion

@mcbradford - I like to upload the file manually and play with the config parameters interactively. It saves lots of time ;-)

mcbradfordwcb
Engager

Due to the sensitivity of the data, I cannot share it, and due to its size, sanitizing it would be a nightmare. I decided to open a case with Splunk, since MAX_EVENTS does not appear to be working as documented.

maciep
Champion

I think the suggestion was not to upload the file here, but to upload it manually in your Splunk environment (probably a test box): Settings -> Add Data -> Upload. From there you can interactively play with the props config to see how Splunk reacts. You may already be doing that, but if not, it's better than waiting a day each time to see how your latest change behaves.

Also, don't forget to post the answer out here if Splunk Support solves the problem.

mcbradford
Contributor

Additional information

If I set:

MAX_EVENTS=25000

I get 5 events: the first 4 have 25k lines each, and the last has 5k lines.

If I set:

MAX_EVENTS=100000

I get no events??????

cmerriman
Super Champion

What is the entire config stanza?

mcbradfordwcb
Engager

[baz_voice]
BREAK_ONLY_BEFORE =
DATETIME_CONFIG =
MAX_EVENTS = 100000
MAX_TIMESTAMP_LOOKAHEAD = 300
NO_BINARY_CHECK = true
TRUNCATE = 999999
category = Custom
disabled = false
pulldown_type = true
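
For reference, here is a minimal sketch of how lguinn2's suggestion above might be applied to this stanza. It is untested against the actual files. The LINE_BREAKER string is a placeholder that is assumed never to occur in the data, and BREAK_ONLY_BEFORE and MAX_EVENTS are left out on the assumption that they only matter when line merging is enabled.

```ini
[baz_voice]
# Disable line merging so that LINE_BREAKER alone decides event boundaries.
SHOULD_LINEMERGE = false
# Placeholder pattern assumed never to match, so the whole file stays one event.
LINE_BREAKER = ((thismustneverappearinyourfile))
# 0 disables truncation of long events.
TRUNCATE = 0
# Carried over unchanged from the stanza posted above.
MAX_TIMESTAMP_LOOKAHEAD = 300
NO_BINARY_CHECK = true
```

As suggested earlier in the thread, a stanza like this is quickest to check by uploading one sample file through Settings -> Add Data -> Upload on a test instance and confirming the preview shows a single event.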