How to onboard a large XML file without breaking it up into multiple events?

mcbradford
Contributor

I have been asked to onboard large XML files. Each file contains about 105k lines and only one date, and the file MUST NOT be broken into multiple events. I am having trouble getting props.conf right so that each file is indexed as a single event instead of being split into many. I tried setting MAX_EVENTS to 150000, but I do not think it is working properly. I also tried TRUNCATE=150000, but that did not work either.

BTW, these files only come in once a day, so it's not like they are arriving every minute or second.

lguinn2
Legend

If you want all the data in the file to be a single event, you can probably do that more reliably with LINE_BREAKER than with MAX_EVENTS.

Try this in props.conf:

[yoursourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=((thismustneverappearinyourfile))
TRUNCATE=0

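You shouldn't need MAX_EVENTS at all with this method: it caps how many lines are merged into one event, and line merging only happens when SHOULD_LINEMERGE is true. Adding it won't hurt, though.
This technique uses LINE_BREAKER to define the split between events and then sets it to an "impossible" string, so no split ever happens. TRUNCATE=0 turns off the line-length limit, which matters here because the whole file becomes a single line.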
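Since each file carries exactly one date, the same stanza can also tell Splunk where to find it. This is an untested sketch: it assumes a hypothetical `<reportDate>` element holding a date such as `2017-05-04`, so TIME_PREFIX and TIME_FORMAT have to be adjusted to whatever the real file contains. `yoursourcetype` is the same placeholder as in the stanza above.

```ini
[yoursourcetype]
# One event per file: the first capture group in LINE_BREAKER is a string
# assumed never to occur in the data, so no event boundary is ever found.
SHOULD_LINEMERGE = false
LINE_BREAKER = ((thismustneverappearinyourfile))
# 0 = never truncate; the whole file is treated as one line here.
TRUNCATE = 0
# Hypothetical: assumes the date appears as <reportDate>2017-05-04</reportDate>.
TIME_PREFIX = <reportDate>
TIME_FORMAT = %Y-%m-%d
# Lookahead starts after the TIME_PREFIX match; 20 characters covers the date.
MAX_TIMESTAMP_LOOKAHEAD = 20
```

Anchoring the timestamp with TIME_PREFIX avoids relying on Splunk finding the date near the start of a very large event.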

ddrillic
Ultra Champion

@mcbradford - I like to upload the file manually and play with the config parameters interactively. It saves lots of time ;-)

mcbradfordwcb
Engager

Due to the sensitivity of the data, I cannot share it, and given its size, sanitizing it would be a nightmare. I decided to open a case with Splunk, since MAX_EVENTS does not appear to work as documented.

maciep
Champion

I think the suggestion was not to upload the file here, but to upload it manually into your Splunk environment (probably a test box): Settings -> Add Data -> Upload. From there you can adjust the props settings interactively and see how Splunk reacts. You may already be doing that, but if not, it's much faster than waiting a day to see how each change works out.

Also, don't forget to post the answer out here if Splunk Support solves the problem.

mcbradford
Contributor

Additional information

If I set:

MAX_EVENTS=25000

I get 5 events: the first 4 have 25k lines each, and the last has 5k lines.

If I set:

MAX_EVENTS=100000

I get no events at all.

cmerriman
Super Champion

What is the entire config stanza?

mcbradfordwcb
Engager
[baz_voice]
BREAK_ONLY_BEFORE =
DATETIME_CONFIG =
MAX_EVENTS = 100000
MAX_TIMESTAMP_LOOKAHEAD = 300
NO_BINARY_CHECK = true
TRUNCATE = 999999
category = Custom
disabled = false
pulldown_type = true
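Applying the LINE_BREAKER suggestion to this stanza might look like the sketch below; it has not been tested against the actual files. Two points drive the changes. MAX_EVENTS is a line-merging setting, consulted only when SHOULD_LINEMERGE is true (the default), so it is dropped. TRUNCATE is a byte limit, and once the whole file is one line, 999999 (about 1 MB) would cut off any file larger than that, which a 105k-line XML file may well be, hence TRUNCATE = 0. The empty BREAK_ONLY_BEFORE is removed because line merging is turned off. These settings take effect where parsing happens (an indexer or heavy forwarder), not on a universal forwarder.

```ini
[baz_voice]
# Never break: the capture group matches a string assumed absent from the files.
SHOULD_LINEMERGE = false
LINE_BREAKER = ((thismustneverappearinyourfile))
# Disable truncation; the whole file is one line under this approach.
TRUNCATE = 0
# Kept unchanged from the original stanza.
DATETIME_CONFIG =
MAX_TIMESTAMP_LOOKAHEAD = 300
NO_BINARY_CHECK = true
category = Custom
disabled = false
pulldown_type = true
```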