
XML event breaking looks good in the sourcetype builder, but breaks differently for every event once the data starts coming in

Contributor

I've seen several XML break examples, but none yet which quite matched this issue.

I have an app that writes one file per log in XML format. Each file begins with the XML header (<?xml) and ends with </Job>, and these XML logs can be very large.

My props.conf looks like this:

[DumbApp:ActLogs]
KV_MODE = xml
BREAK_ONLY_BEFORE = (<?xml)
category = Application
description = Job Logs from DumbApp
disabled = false
MAX_EVENTS = 100000
pulldown_type = true
SHOULD_LINEMERGE = false

But for every log it breaks after <PercentDone>, and every XML stanza in the log after that is treated as a different event. Basically, I want it not to break at all and to treat each log file as one solid event. No combination of line-merge settings, or setting BREAK_ONLY_BEFORE to something that would encompass the whole file from beginning to end, has worked.

I'm running Splunk 6.2.3.


Motivator

Okay, so you've got a config mismatch. BREAK_ONLY_BEFORE only applies when SHOULD_LINEMERGE = true, which it isn't here. With SHOULD_LINEMERGE = false, you want LINE_BREAKER to tell Splunk how to break events. Also, both BREAK_ONLY_BEFORE and LINE_BREAKER take regular expressions, so what you have would not match what you intend. I would start with something like this:

[DumbApp:ActLogs]
KV_MODE = xml
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?:<\?xml)
MAX_EVENTS = 100000

If these XML segments are one per file, then another option is to set your LINE_BREAKER to something that will never appear.
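A minimal sketch of that approach (the sentinel string is arbitrary; anything guaranteed never to appear in your logs will do):

[DumbApp:ActLogs]
KV_MODE = xml
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)ZZZ_NEVER_APPEARS_ZZZ
MAX_EVENTS = 100000

Since the regex never matches, each file comes through as a single event, subject to the TRUNCATE byte limit.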

You will also want to tell Splunk where to find the timestamp in your event. I'm not sure which of the different time values you want to use, so I'll just say you should set the following: TIME_PREFIX, TIME_FORMAT, and MAX_TIMESTAMP_LOOKAHEAD.
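As a sketch, assuming you keyed on an element like <CollectionDate>2015-08-21 14:59</CollectionDate> (substitute whichever date element you actually pick):

TIME_PREFIX = <CollectionDate>
TIME_FORMAT = %Y-%m-%d %H:%M
MAX_TIMESTAMP_LOOKAHEAD = 16

TIME_PREFIX is a regex that positions the timestamp parser just past the opening tag, TIME_FORMAT is strptime-style, and MAX_TIMESTAMP_LOOKAHEAD caps how many characters past the prefix Splunk will scan (16 covers "2015-08-21 14:59").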

Finally, depending on how these logs are written, you may run into other issues:

  1. If the XML is not written in one shot, Splunk may get confused: it will see the modtime update, read the file, and grab the partially written log.
  2. If blocks are injected into the middle of the XML, it will wreak havoc. (Think of an XML file where events are inserted over time, so the closing tag keeps getting pushed down instead of new content being appended.) Splunk will keep re-reading the entire file when it encounters this.
  3. If you have one file per event and a lot of events are generated, Splunk will end up tracking a LOT of files unnecessarily. (They will never be appended to once written, so tracking them is pointless.) If this becomes a problem, you may want to investigate the [batch://...] inputs, sketched after this list. A batch input ingests the file and deletes it when done. Keep in mind it is destructive, so it may not fit your use case.
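A minimal inputs.conf sketch of a batch input (the path and index are placeholders for your own):

[batch:///archive/dumbapp/*.xml]
move_policy = sinkhole
sourcetype = DumbApp:ActLogs
index = your_index

move_policy = sinkhole is required for batch inputs and is what makes them destructive: each file is deleted once it has been indexed.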

Contributor

😞 No such luck. This is a very odd issue, especially since using the 6.2 add-data tool it breaks the event perfectly, even when the event I used to build the sourcetype interactively is the very same event that got indexed!


Motivator

Where is your props.conf file?


Contributor

It is in the application I am doing the searching from, in this case /etc/apps/HPCI/local/props.conf.


Motivator

A couple more questions, then:

  1. What is your environment? Is this a single instance serving as both indexer and search head? If not, is this props.conf on the indexer? (That is where event breaking happens.)
  2. When you tried different things, how did you 'purge' the index? If you're running ... | delete in a search, you're not actually deleting anything, just making it invisible. Actually cleaning an index happens on the command line while Splunk is stopped. ($SPLUNK_HOME/bin/splunk clean eventdata -index $INDEX)
  3. | extract reload=t is for search-time extractions and does not apply to index-time configurations. Did you restart Splunk after updating your props.conf?

In short, to test this you need to do the following (concrete commands are sketched after the list):

On your indexer:
- Update your props.conf
- stop splunk
- splunk clean eventdata from the command line. (This drops EVERYTHING in an index, mind)
- start splunk

On your forwarder:
- stop splunk
- clean fish bucket
- start splunk
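For example (assuming an index named dumbapp_idx; substitute your own):

On the indexer:
$SPLUNK_HOME/bin/splunk stop
$SPLUNK_HOME/bin/splunk clean eventdata -index dumbapp_idx
$SPLUNK_HOME/bin/splunk start

On the forwarder, one common way to clear the fishbucket is to remove its directory while the forwarder is stopped:
$SPLUNK_HOME/bin/splunk stop
rm -rf $SPLUNK_HOME/var/lib/splunk/fishbucket
$SPLUNK_HOME/bin/splunk start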


Contributor

I really appreciate all of your help!

Our environment is rather large. We have eight different indexers across two physical locations. I am setting the configuration entries on the search head but verifying that the bundle replicated to the indexer and that the stanza reflects what is on the search head.

I was using the delete command, so perhaps I was just un-hiding the events when they were re-indexed. I will try again soon, issuing the clean command on the index itself, and see if I get different results. Thanks again for everything; I will let you know.


Motivator

Not sure what you're referring to with "bundle replicated", as that can mean a couple of things, not all of which affect indexing.

On your indexers, run the following: $SPLUNK_HOME/bin/splunk btool props list $STANZA_NAME

Check the output and make sure it reflects the current state of your props.conf file. I suspect it doesn't.
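For example, on one of the indexers:

$SPLUNK_HOME/bin/splunk btool props list DumbApp:ActLogs --debug

The --debug flag also prints which file each setting was read from, which makes it easy to see whether your stanza actually reached the indexer or is being overridden by another app.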

The delete command simply makes items invisible to search (it doesn't free up disk space or anything), but it doesn't prevent the data from being re-indexed. So if you clean the fishbucket on your forwarder, the data will be re-ingested. My hunch is that your props changes are happening on the search head, which isn't involved in the indexing process at all. When you run searches, a 'search bundle' is pushed to the indexers, but that is just for searching; it isn't involved in the indexing pipeline.

What I would really recommend is that you stop testing this live; you're eating up license and disk space to do so. Spin up a local install of Splunk, put your props there, restart, and one-shot the file. If it doesn't work, clean the index and try again. Once you have a working stanza, put it on your real indexers and start ingesting data.
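A sketch of that loop on a test instance (the file path and index name are placeholders):

$SPLUNK_HOME/bin/splunk add oneshot /tmp/sample_job.xml -sourcetype DumbApp:ActLogs -index test

add oneshot indexes the file exactly once, which makes iterating on props much faster than waiting on a monitored input.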


Contributor

So I tried both approaches you suggested:

the first with LINE_BREAKER set to a string that would never show up in this data, and the second with the regex you described above. Rinsed and repeated the same effort:

1) purged the index
2) cleaned the forwarder event data/fishbucket
3) | extract reload=t and /debug/refresh
4) waited for the data to be resent

The events still come back broken at line 18, for every event, going back to logs with mod times over a year old which definitely aren't being written to any longer.

This almost feels like a bug, because I am definitely breaking other, more complex XML correctly and haven't experienced this issue.


Motivator

One more thing to try: add TRUNCATE=0 to your props.
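That is, in the same stanza as before (a sketch combining it with the LINE_BREAKER suggestion above):

[DumbApp:ActLogs]
KV_MODE = xml
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?:<\?xml)
MAX_EVENTS = 100000
TRUNCATE = 0

TRUNCATE defaults to 10000 bytes per event; 0 removes the limit entirely, which matters here since these XML logs can be very large.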


Contributor

Thanks for this. Trying it now.
With regard to timestamping, I am going on modtime, because the directory I'm pulling from is an archival directory: the logs will definitely have been completed and moved by the application before I read them. I opted to do this because I was worried about the exact issue you were talking about. 🙂


SplunkTrust

Try this for your sourcetype definition in props.conf:

[DumbApp:ActLogs]
KV_MODE = xml
category = Application
description = Job Logs from DumbApp
disabled = false
MAX_EVENTS = 100000
pulldown_type = true
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\<\?xml
MUST_BREAK_AFTER = \<\/Job\>
NO_BINARY_CHECK = true

Contributor

First, thank you for helping!

I put in the suggestions you made here, but unfortunately I get the exact same result after | extract reload=t and a /debug/refresh.


SplunkTrust

This is an index-time configuration (event breaking). It will apply to newly ingested data, not to data that has already been ingested. In fact, you can't update already-indexed data at all.


Contributor

Sorry, I forgot a step. 🙂 In my situation I was able to temporarily add the can_delete role to my ID, so I purged the index and the event data/fishbucket, and re-indexed it all.


SplunkTrust

Can you post one sample event?


Contributor

<Job>
<Tag>J8675309</Tag>
<AppJobID>269</AppJobID>
<Name>MEH</Name>
<ApplicationName>UNKNOWN</ApplicationName>
<ApplicationVersion>0.0</ApplicationVersion>
<Site>OUR_SITE</Site>
<Project>
<P-Survey>FEUD0101</P-Survey>
<P-Stage>PROJECT_T</P-Stage>
<P-Name>PROJECT001</P-Name>
</Project>
<UserID>USERID</UserID>
<Master>MASTERNODE</Master>
<PrintoutFilePath>
/PATH/</PrintoutFilePath>
<Status>Abend</Status>
<PercentDone>0.0</PercentDone>
<CollectionDate>2015-08-21 14:59</CollectionDate>
<StartDate>2015-08-21 14:57</StartDate>
<EndDate>2015-08-21 14:59</EndDate>
<ArchiveDate>2015-08-24 23:55</ArchiveDate>
<AbortDate>2015-08-21 14:59</AbortDate>
<SystemMessage>Job Failed - JOB_FAILED file found</SystemMessage>
<UserComments/>
<ProjectPriority>8</ProjectPriority>
<SequenceGroupName>NOT_USED</SequenceGroupName>
<SequenceGroupNumber>0</SequenceGroupNumber>
<JobPriority/>
<FailedNodeCount>0</FailedNodeCount>
<BaseLine>2014.1ext</BaseLine>
<Installation>2014.1ext</Installation>
<Parameters>
<JobServer>COMPUTED</JobServer>
<TargetDirectory>/PATH/</TargetDirectory>
<PoolOverQuota>NO</PoolOverQuota>
<ResourceLoad>light</ResourceLoad>
</Parameters>
</Job>

Contributor

Also, because of the log format, it happens to be line 18 every time, if that sheds any light.
