I have an XML file that I've tried to index, but I've had a very difficult time with it. I just want a new event created after every </event_change> tag. In props.conf, I have the following:
[bsm_event_changes]
#KV_MODE=xml
TIME_PREFIX = <time_created>
MAX_TIMESTAMP_LOOKAHEAD = 1000
TRUNCATE = 0
MAX_EVENTS = 40
#BREAK_ONLY_BEFORE = (<event_change type)=([a-zA-Z0-9"-://#=_. ]*)>
MUST_BREAK_AFTER=</event_change>
BREAK_ONLY_BEFORE_DATE = False
SHOULD_LINEMERGE = True
Note that I have tried setting KV_MODE=xml, but that made everything worse for some reason. I've also tried both my BREAK_ONLY_BEFORE and MUST_BREAK_AFTER lines, but neither has done the trick for me. Anyway, with this in props.conf (and yes, I've restarted Splunk after editing props.conf), I'm still getting the following in Splunk (basically one big long event that contains many actual events):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The following is better. Note that < has special meaning in some regular-expression constructs, so it is safest to escape it; that is one possible reason why it might not have worked.
[bsm_event_changes]
TIME_PREFIX = \<time_created\>
MAX_TIMESTAMP_LOOKAHEAD = 100
TRUNCATE = 0
MAX_EVENTS = 40
MUST_BREAK_AFTER=\</event_change\>
SHOULD_LINEMERGE = True
However, you have another, more serious problem. When you use MUST_BREAK_AFTER or BREAK_ONLY_BEFORE, Splunk breaks on the line boundary, not in the middle of the line. It looks like your events should break in the middle of the line. So the following may work better for you (I haven't tried this, so I am not sure):
[bsm_event_changes]
TIME_PREFIX = \<time_created\>
MAX_TIMESTAMP_LOOKAHEAD = 100
TRUNCATE = 0
MAX_EVENTS = 40
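# Only the text matched by the first capture group is discarded. The opening
# tag carries attributes (type=...), so the pattern does not require its closing >.
LINE_BREAKER = \</event_change\>(\s*)\<event_change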
SHOULD_LINEMERGE = False
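To make the difference between the two settings concrete, here is a rough Python simulation of the two stages. It is only a sketch of the idea, not Splunk's actual code: the sample data, the function names, and the simplified merge logic are all made up for illustration. Stage one cuts the raw stream into lines using LINE_BREAKER and discards whatever the first capture group matched. Stage two runs only when SHOULD_LINEMERGE is true; it joins lines back into events and checks MUST_BREAK_AFTER once per line.

```python
import re

# Hypothetical sample: three events on ONE physical line, as in the question.
RAW = (
    '<event_change type="a"><time_created>1</time_created></event_change>'
    '<event_change type="b"><time_created>2</time_created></event_change>'
    '<event_change type="c"><time_created>3</time_created></event_change>'
)


def split_lines(raw, line_breaker):
    """Stage 1: cut the stream wherever line_breaker matches.

    Text matched before group 1 stays with the previous chunk, group 1 is
    discarded, and text matched after group 1 starts the next chunk.
    """
    chunks, start = [], 0
    for m in re.finditer(line_breaker, raw):
        chunks.append(raw[start:m.start(1)])
        start = m.end(1)
    chunks.append(raw[start:])
    return [c for c in chunks if c]


def merge_lines(lines, must_break_after):
    """Stage 2 (SHOULD_LINEMERGE = True): join lines into events.

    The break rule is tested once per line, so it can only end an event
    at a line boundary, never in the middle of a line.
    """
    events, current = [], []
    for line in lines:
        current.append(line)
        if re.search(must_break_after, line):
            events.append("\n".join(current))
            current = []
    if current:
        events.append("\n".join(current))
    return events


DEFAULT_BREAKER = r"([\r\n]+)"   # the default line breaker: newlines
CLOSE_TAG = r"</event_change>"

# Default line breaking + MUST_BREAK_AFTER: the whole file is one line,
# so there is only one place where an event can end.
merged = merge_lines(split_lines(RAW, DEFAULT_BREAKER), CLOSE_TAG)

# Custom LINE_BREAKER, no merging: the stream is cut between the tags.
broken = split_lines(RAW, r"</event_change>(\s*)<event_change")

print(len(merged))  # 1
print(len(broken))  # 3
```

In this simulation MUST_BREAK_AFTER never gets a chance to split anything, because the line-merging stage only ever sees one very long line. LINE_BREAKER acts earlier, on the raw stream, which is why it can cut between a closing tag and the next opening tag.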
Also, FWIW, MAX_TIMESTAMP_LOOKAHEAD counts from the TIME_PREFIX, not the beginning of the event, so I cut it back to a more reasonable size.
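A small Python sketch of what "counts from the TIME_PREFIX" means. This is an illustration under assumptions, not Splunk's implementation: the helper name timestamp_window, the sample event, and the timestamp format are all hypothetical.

```python
import re


def timestamp_window(event, time_prefix, lookahead):
    """Return the slice of the event that would be scanned for a timestamp.

    The window starts right after the TIME_PREFIX match and is at most
    `lookahead` characters long. In this sketch, a missing prefix means
    there is nothing to scan, so None is returned.
    """
    m = re.search(time_prefix, event)
    if m is None:
        return None
    return event[m.end():m.end() + lookahead]


# Hypothetical event: the timestamp sits more than 300 characters in.
event = (
    '<event_change type="x"><id>42</id>' + "<pad/>" * 50 +
    "<time_created>2012-03-01T10:15:30</time_created></event_change>"
)

window = timestamp_window(event, r"<time_created>", 100)
print(window[:19])  # 2012-03-01T10:15:30
```

Even though the timestamp is far from the start of the event, a lookahead of 100 is plenty here, because the 100 characters are measured from the end of the prefix match.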
Finally - you may already know this, but just a reminder: when you update props.conf, the new parsing rules will apply only to new data as it is received. Existing data will not be changed. So you might want to delete the old data or clean the index...
Actually, I think I figured out my error! thanks!
Thanks. I didn't know that < is a special character, so that helps. However, I'm still getting the same output. I have been cleaning out the index that is designated for these events every time I try something different in props.conf, so it's not that I'm just seeing "old" events. But can you please explain, maybe with examples, what the difference is between LINE_BREAKER and MUST_BREAK_AFTER? I'm just confused about why I couldn't use MUST_BREAK_AFTER=</event_change> to tell Splunk that I want a new event starting after the keyword "</event_change>".
Try this regex: ^\s+