I have events in XML format. Some of the events include this header:
xml version="1.0" encoding="UTF-8" standalone="yes"?><event_change_list xmlns="http://www.hp.com/2009/software/opr/data_model" total_size="682475" page_size="20" start_index="1" type="urn:x-hp:2009:software:data_model:opr:type:event:event_change_list" version="1.0">
I want to get rid of this header from any events that contain it. I have tried including the following in a stanza in props.conf:
PREAMBLE_REGEX=xml version(.+)\>\<event_change_list(.+)\>
but doing so caused Splunk not to index any events of this sourcetype. Any help with this is appreciated.
Not knowing the rest of your config, or what your other events look like, I'd still guess that the regex is a bit too greedy. I have not experimented with the PREAMBLE_REGEX setting, but I guess that it might actually be applied before line merging(?), or is it perhaps a special-purpose TRANSFORM à la nullQueue? Somebody more knowledgeable may have that answer.
If you change your regex to the following, it should at least not be too greedy.
PREAMBLE_REGEX=xml version[^>]+><event_change_list[^>]+>
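To illustrate the greediness point, here is a minimal sketch that runs both patterns against the header from the question. It uses Python's re module rather than Splunk's own regex engine, so it only shows how the two patterns differ in what they consume. It does not show how PREAMBLE_REGEX itself is applied, and it does not explain why indexing stopped. The event body (`<event_change>...`) is made up for illustration, and the whole sample is assumed to sit on one line.

```python
import re

# Header copied from the question.
header = (
    'xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<event_change_list xmlns="http://www.hp.com/2009/software/opr/data_model" '
    'total_size="682475" page_size="20" start_index="1" '
    'type="urn:x-hp:2009:software:data_model:opr:type:event:event_change_list" '
    'version="1.0">'
)

# Hypothetical event body following the header (not from the original post).
body = '<event_change><id>1</id></event_change>'
data = header + body

# Pattern from the question: (.+) is greedy and runs to the LAST '>' it can find.
greedy = re.compile(r'xml version(.+)\>\<event_change_list(.+)\>')

# Suggested pattern: [^>]+ cannot cross a '>', so each part stops at the first one.
bounded = re.compile(r'xml version[^>]+><event_change_list[^>]+>')

greedy_match = greedy.match(data).group(0)
bounded_match = bounded.match(data).group(0)

print("greedy consumed the event body too:", greedy_match == data)
print("bounded consumed only the header:  ", bounded_match == header)
print("left over after bounded match:     ", data[len(bounded_match):])
```

In this sample the greedy pattern swallows the event body along with the header, while the bounded pattern stops at the end of the `<event_change_list ...>` tag.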
/K
I think < > are special characters. Try escaping them maybe? xml version[^\>]+\>\
Thanks a bunch. At first I thought the regex you provided didn't work at all (but I was just being stupid). It still doesn't work for PREAMBLE_REGEX, but I discovered I was being too greedy, so maybe I can use your answer in transforms.conf. Thanks again!
oops, typo... fixed it now.