I’m trying to create a new source type for the first time. I’ve been at it all morning and I’m pretty sure I must be missing something fundamental.
The data I’m importing is quite a messy log file. Some events contain XML spread across multiple lines, and where I’m having trouble is keeping the multiline events as one event.
To make life easier (I hope), I’ve added a new column (called “blah”) to the beginning of each event so it’s easier to tell when a new event starts. To help troubleshoot, I’ve created this very simple dummy log file:
blah data
blah a single line 1
blah multiple lines one
multiple lines two
multiple lines three
blah a single line 2
This should appear as a header with two columns called “blah” and “data”, followed by three events:
blah a single line 1
blah multiple lines one
multiple lines two
multiple lines three
blah a single line 2
Here’s what the source type looks like in props.conf:
[fsv]
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = blah
pulldown_type = true
INDEXED_EXTRACTIONS = tsv
FIELD_DELIMITER=tab
HEADER_FIELD_DELIMITER=tab
KV_MODE = none
category = Structured
description = Fancy seperated value format. Set header and other settings in "Delimited Settings"
I’ve also tried various configurations with LINE_BREAKER but everything I’ve tried still results in one event for each line.
The stanza below worked for me on your sample data set. I'm not sure, however, whether you'll be able to use indexed extractions and still get the data in the format you want. I think INDEXED_EXTRACTIONS is done at input time and may then actually skip parsing. It might be easier to forgo the header in your file, index the data the way you want, and then create extractions to pull the fields out of your data at search time.
[my:sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=blah)
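To see what that LINE_BREAKER pattern does to the dummy file, here is a rough Python sketch. It only imitates the documented behaviour (break wherever the regex matches and throw away the text in the first capture group); it is not how Splunk implements it. The sample text uses single spaces where the real file has tabs, and `break_events` is a made-up helper name.

```python
import re

# Dummy log from the question; single spaces stand in for the tab delimiters.
SAMPLE = (
    "blah data\n"
    "blah a single line 1\n"
    "blah multiple lines one\n"
    "multiple lines two\n"
    "multiple lines three\n"
    "blah a single line 2\n"
)

# Same pattern as the LINE_BREAKER value in the stanza above.
LINE_BREAKER = re.compile(r"([\r\n]+)(?=blah)")


def break_events(raw):
    """Split raw text where LINE_BREAKER matches, dropping capture group 1."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    # Trim the newline left on the final event and skip empty pieces.
    return [event.rstrip("\r\n") for event in events if event.strip()]


events = break_events(SAMPLE)
for number, event in enumerate(events, start=1):
    print(number, repr(event))
```

In this simulation the header line "blah data" comes out as an event of its own, since it also starts with "blah". The three continuation lines stay together because the lookahead only allows a break when the next line starts with "blah".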
Thanks. In the end I used:
BREAK_ONLY_BEFORE = blah
SHOULD_LINEMERGE=true
and removed INDEXED_EXTRACTIONS. I then used a field extraction to pull out the tab-delimited fields.
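The thread doesn't show the actual extraction, so the following is only a guess at its shape, written as a Python sketch rather than Splunk configuration. It assumes one tab between the "blah" marker and the rest of the event; the field names `marker` and `data` and the helper `extract_fields` are invented for illustration. Python writes named groups as `(?P<name>...)`.

```python
import re

# One merged event; the tab after "blah" is an assumption about the real file.
EVENT = "blah\tmultiple lines one\nmultiple lines two\nmultiple lines three"

# DOTALL lets "." cross newlines so the whole multiline body lands in "data".
FIELDS = re.compile(r"^(?P<marker>blah)\t(?P<data>.*)$", re.DOTALL)


def extract_fields(event):
    """Return the named fields for one event, or an empty dict if no match."""
    match = FIELDS.match(event)
    return match.groupdict() if match else {}


fields = extract_fields(EVENT)
print(fields["data"])
```

The main thing to watch with multiline events is that "." stops at a newline by default, so without a dot-matches-newline option the extracted field would contain only the first line.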