I've seen several XML break examples, but none yet that quite matches this issue.
I have an app that writes one file per log in XML format. Each file begins with the XML header, <?xml, and ends with </Job>, and these XML logs can be very large.
My props.conf looks like this:
[DumbApp:ActLogs]
KV_MODE = xml
BREAK_ONLY_BEFORE = (<?xml)
category = Application
description = Job Logs from DumbApp
disabled = false
MAX_EVENTS = 100000
pulldown_type = true
SHOULD_LINEMERGE = false
But for every log it breaks after <PercentComplete>, and each XML stanza in the log after that is treated as a separate event. Basically I want it not to break at all, and to treat each log file as one solid event. No combination of line-merge settings, or of setting BREAK_ONLY_BEFORE to something that would encompass the whole file from beginning to end, has worked.
I'm running Splunk 6.2.3.
Okay, so you've got a config mismatch. BREAK_ONLY_BEFORE is only applicable if SHOULD_LINEMERGE = true, which it isn't. When you have that set to false, you want to use LINE_BREAKER to tell it how to break events. Also, both BREAK_ONLY_BEFORE and LINE_BREAKER take regular expressions, so what you have would not match what you intend. I would start with something like this:
[DumbApp:ActLogs]
KV_MODE = xml
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?:<\?xml)
MAX_EVENTS = 100000
If these XML segments are one per file, then another option is to set your LINE_BREAKER to something that will never appear.
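For example, something along these lines (the literal string here is arbitrary; anything guaranteed never to occur in your logs will do, so the whole file stays together as one event):
[DumbApp:ActLogs]
KV_MODE = xml
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)ThisStringWillNeverAppearInTheseLogs
MAX_EVENTS = 100000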
You will also want to tell Splunk where to find the timestamp in your event. I'm not sure which of the different time values you want to use, so I'll just say you want to set the following: TIME_PREFIX, TIME_FORMAT, and MAX_TIMESTAMP_LOOKAHEAD.
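As a rough sketch only, assuming you'd key off a field like <StartDate> in a "YYYY-MM-DD HH:MM" format (adjust the prefix and format to whichever date field you actually care about), it might look like:
TIME_PREFIX = <StartDate>
TIME_FORMAT = %Y-%m-%d %H:%M
MAX_TIMESTAMP_LOOKAHEAD = 20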
Finally, depending on how these logs are written, you may run into other issues. One option is a [batch://...] input, which ingests the file and deletes it when done. Keep in mind it is destructive, so it may not fit your use case.
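A rough sketch of what that stanza could look like in inputs.conf (the path is just a placeholder for your archive directory):
[batch:///archive/dumbapp/joblogs]
# placeholder path; point this at the actual archive directory
move_policy = sinkhole
sourcetype = DumbApp:ActLogs
disabled = false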
😞 No such luck. This is a very odd issue, especially since, using the 6.2 add-data tool, it breaks the event perfectly, even when the event I used to build the sourcetype interactively is the very same event that got indexed!
Where is your props.conf file?
It is in the application from which I am doing the searching, in this case /etc/apps/HPCI/local/props.conf
A couple more questions, then:
If you run ...| delete in a search, then you're not actually deleting anything, just making it invisible. Actually cleaning an index happens on the command line while Splunk is stopped ($SPLUNK_HOME/bin/splunk clean eventdata -index $INDEX). | extract reload=t is for search-time extractions and does not apply to index-time configurations. Did you restart Splunk after updating your props.conf? In short, to test this you need to do the following:
On your indexer:
- Update your props.conf
- Stop Splunk
- Run splunk clean eventdata from the command line (this drops EVERYTHING in the index, mind); see the commands below
- Start Splunk
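On the command line that's roughly the following (assuming a default $SPLUNK_HOME and substituting your actual index name):
$SPLUNK_HOME/bin/splunk stop
$SPLUNK_HOME/bin/splunk clean eventdata -index your_index_name
$SPLUNK_HOME/bin/splunk start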
On your forwarder:
- Stop Splunk
- Clean the fishbucket (one way to do this from the CLI is shown below)
- Start Splunk
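A sketch of clearing the fishbucket from the CLI, if your forwarder is a full Splunk instance (this resets Splunk's record of what it has already read, so everything gets re-ingested; on a universal forwarder you may need to remove the fishbucket directory under $SPLUNK_HOME/var/lib/splunk instead):
$SPLUNK_HOME/bin/splunk stop
$SPLUNK_HOME/bin/splunk clean eventdata -index _thefishbucket
$SPLUNK_HOME/bin/splunk start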
I really appreciate all of your help!
Our environment is rather large. We have eight different indexers across two physical locations. I am setting the configuration entries on the search head but verifying that the bundle replicated to the indexer and that the stanza reflects what is on the search head.
I was using the delete command, so based on that, perhaps I was just un-hiding the events when they were re-indexed. I will try again soon, issuing the clean command on the index itself, and see if I get different results. Thanks again for everything, and I will let you know.
Not sure what you're referring to by saying the "bundle replicated" as that can refer to a couple of things, not all of which affect indexing.
On your indexers, run the following: $SPLUNK_HOME/bin/splunk btool props list $STANZA_NAME
Check the output and make sure it reflects the current state of your props.conf file. I suspect it doesn't.
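If it helps, you can also add btool's --debug flag to see exactly which .conf file each setting is coming from, for example:
$SPLUNK_HOME/bin/splunk btool props list DumbApp:ActLogs --debug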
The delete command simply makes items invisible to search (it doesn't free up disk space or anything). However, it doesn't prevent the data from being re-indexed. So if you clean your fishbucket on your forwarder, that'll re-ingest the data. My hunch is that your props changes are happening on the search head, which isn't involved in the indexing process at all. When you run searches, a 'search bundle' is pushed to the indexers, but that is just for searching; it isn't involved in the indexing pipeline.
What I would really recommend is to stop testing this live; you're eating up license and disk space to do so. Spin up a local install of Splunk, put your props there, restart, and one-shot the file. If it doesn't work, clean the index and try again. Once you have a working stanza, then you put it on your real indexers and start ingesting data.
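For the one-shot, something like this on your test instance should do it (the file path and index name here are just placeholders):
$SPLUNK_HOME/bin/splunk add oneshot /path/to/one/job/log.xml -sourcetype DumbApp:ActLogs -index main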
So I tried both approaches you suggested:
the first with LINE_BREAKER set to a string that would never, ever show up in this data, and the second with the regex you described above. Rinsed and repeated the same effort:
1) purged index
2) cleaned forwarder event data/fishbucket
3) | extract reload=t and /debug/refresh
4) waited for data to be resent
The events still come back broken at line 18, for every event, going back to logs with mod times over a year old which definitely aren't being written to any longer.
This almost feels like a bug, because I am definitely breaking other, more complex XML and haven't experienced this issue.
One more thing to try: add TRUNCATE = 0 to your props.
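That is, in the same sourcetype stanza (setting it to 0 removes the default 10000-byte cap per event, which can otherwise silently cut a large single-event file short):
[DumbApp:ActLogs]
TRUNCATE = 0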
Thanks for this. Trying it now.
With regard to timestamping, I am going on mod time because the directory I'm pulling from is an archival directory, so the logs will definitely have been completed and moved by the application before I read them. I opted to do this because I was worried about the exact issue you were talking about. 🙂
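(In props.conf terms, as I understand it, that amounts to turning timestamp extraction off so the input-layer time, i.e. the file's modification time for monitored files, is used; roughly the following, though treat it as a sketch rather than exactly what I have:
[DumbApp:ActLogs]
DATETIME_CONFIG = NONE
)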
Try this for your sourcetype definition in props.conf:
[DumbApp:ActLogs]
KV_MODE = xml
category = Application
description = Job Logs from DumbApp
disabled = false
MAX_EVENTS = 100000
pulldown_type = true
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\<\?xml
MUST_BREAK_AFTER = \<\/Job\>
NO_BINARY_CHECK = true
First, thank you for helping!
I put in the suggestions you made here, but unfortunately I get the exact same result after | extract reload=t and doing a /debug/refresh.
This is an index-time configuration (event breaking). It will take effect on newly ingested data, not on data that is already ingested. In fact, you can't update already-indexed data at all.
Sorry, I forgot a step. 🙂 In my situation, I was able to add the can_delete role temporarily to my ID, so I purged the index and the indexer event data/fishbucket, and re-indexed it all.
Can you post a sample event?
<Job>
<Tag>J8675309</Tag>
<AppJobID>269</AppJobID>
<Name>MEH</Name>
<ApplicationName>UNKNOWN</ApplicationName>
<ApplicationVersion>0.0</ApplicationVersion>
<Site>OUR_SITE</Site>
<Project>
<P-Survey>FEUD0101</P-Survey>
<P-Stage>PROJECT_T</P-Stage>
<P-Name>PROJECT001</P-Name>
</Project>
<UserID>USERID</UserID>
<Master>MASTERNODE</Master>
<PrintoutFilePath>
/PATH/</PrintoutFilePath>
<Status>Abend</Status>
<PercentDone>0.0</PercentDone>
<CollectionDate>2015-08-21 14:59</CollectionDate>
<StartDate>2015-08-21 14:57</StartDate>
<EndDate>2015-08-21 14:59</EndDate>
<ArchiveDate>2015-08-24 23:55</ArchiveDate>
<AbortDate>2015-08-21 14:59</AbortDate>
<SystemMessage>Job Failed - JOB_FAILED file found</SystemMessage>
<UserComments/>
<ProjectPriority>8</ProjectPriority>
<SequenceGroupName>NOT_USED</SequenceGroupName>
<SequenceGroupNumber>0</SequenceGroupNumber>
<JobPriority/>
<FailedNodeCount>0</FailedNodeCount>
<BaseLine>2014.1ext</BaseLine>
<Installation>2014.1ext</Installation>
<Parameters>
<JobServer>COMPUTED</JobServer>
<TargetDirectory>/PATH/</TargetDirectory>
<PoolOverQuota>NO</PoolOverQuota>
<ResourceLoad>light</ResourceLoad>
</Parameters>
</Job>
Also, because of the log format, it happens to be line 18 every time, if that sheds any light.