Hi,
I'm running index-time field extractions on a large TXT report. For this particular regex, I'm capturing 3 fields and then using REPEAT_MATCH = true to crawl the rest of the TXT file.
My goal is to extract the data while keeping each set of extracted fields together and separate from the next set of regex captures. For example:
repeat 0 regex: Title, CCI, FixText
repeat 1 regex: Title, CCI, FixText
repeat 2 regex: Title, CCI, FixText
But I need to keep the repeat 0 fields connected somehow, the repeat 1 fields connected somehow, and the repeat 2 fields connected somehow, while also keeping the repeat 0 set separate from the repeat 1 set. In this example, I want to ensure that the Title from repeat 0 doesn't end up attached to the CCI from repeat 1.
In my current rendition, I get all the data I need, but it ends up in giant fields, where "CCI" might contain 100 items and "FixText" might contain 100 items. I can't figure out how to divide or expand them so that each group keeps the correct correlated information. The "FixText" field could include 1 line or many lines, so I can't easily separate those values from one another after they get grouped.
I would like to note that I'm OK with expanding these at search time as opposed to index time, but I'm thinking it might be easier to reference the fields if they get separated at index time. Maybe I could add a pipe or something to the end of each capture, and then use a delimiter to expand the fields?
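To make the delimiter idea concrete, here is a minimal Python sketch (not Splunk configuration) of joining the three multivalue fields by position and then expanding them into one row per repeat. The sample values and the helper names zip_by_position and expand are made up for illustration. It assumes each repeat contributed exactly one value to every field; if a capture were empty and got dropped, the positions would shift and the sets would be mismatched. My understanding is that at search time the mvzip eval function followed by mvexpand is the rough analog.

```python
# Hypothetical values standing in for the three multivalue fields.
titles = ["V-1001", "V-1002", "V-1003"]
ccis = ["AC-2", "IA-5", "CM-6"]
fix_texts = ["Set option A.", "Edit the file.\nRestart the service.", "Remove package X."]

DELIM = "|"  # assumed not to occur inside any captured value


def zip_by_position(*fields, delim=DELIM):
    """Join the Nth value of every field into one delimited string."""
    lengths = {len(f) for f in fields}
    if len(lengths) != 1:
        # Unequal counts mean positions no longer line up across fields.
        raise ValueError(f"field lengths differ: {sorted(lengths)}")
    return [delim.join(values) for values in zip(*fields)]


def expand(zipped, names, delim=DELIM):
    """Split each delimited string back into a dict, one per repeat."""
    return [dict(zip(names, item.split(delim))) for item in zipped]


zipped = zip_by_position(titles, ccis, fix_texts)
rows = expand(zipped, ["scap_fail_title", "scap_cci", "scap_fix_text"])
for row in rows:
    print(row)
```

The length check is the important part: positional pairing is only trustworthy when every field has the same number of values.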
Any help is appreciated.
Thank you
Transforms:
[SCAP_FAIL_INFO]
REGEX = Title\s+\:\s(?<scap_fail_title>V.+)[\s\S]+?NIST\sSP\s800\-53\sRev\s4\:\s(?<scap_cci>.+);[\s\S]+?Fix\sText\s+(?<scap_fix_text>[\S\s]*?)?\nSeverity
LOOKAHEAD = 600000
REPEAT_MATCH = true
WRITE_META = true
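For reference, this is how I picture the repeats. It is a small Python sketch, not Splunk itself: the report text is a made-up stand-in for my file, and the named groups are rewritten as (?P<name>...) because Python's re module does not accept (?<name>...). Each match carries its own Title, CCI, and FixText, which is the grouping I am trying to preserve.

```python
import re

# Same pattern as the transform, with Python-style named groups.
PATTERN = re.compile(
    r"Title\s+\:\s(?P<scap_fail_title>V.+)[\s\S]+?"
    r"NIST\sSP\s800\-53\sRev\s4\:\s(?P<scap_cci>.+);[\s\S]+?"
    r"Fix\sText\s+(?P<scap_fix_text>[\S\s]*?)?\nSeverity"
)

# Made-up sample text; the real report is much larger and may differ.
sample = (
    "Title : V-1001 Example rule one\n"
    "NIST SP 800-53 Rev 4: AC-2;\n"
    "Fix Text Set option A.\n"
    "Severity: medium\n"
    "Title : V-1002 Example rule two\n"
    "NIST SP 800-53 Rev 4: IA-5;\n"
    "Fix Text Edit the file.\n"
    "Restart the service.\n"
    "Severity: high\n"
)

# One dict per repeat: the three captures of a match stay together.
matches = [m.groupdict() for m in PATTERN.finditer(sample)]
for repeat, fields in enumerate(matches):
    print(repeat, fields)
```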
Hi,
A better approach might be to index each result as a separate event. This allows Splunk to manage the data more efficiently while satisfying your requirement to keep result fields together.
Can you provide a sample of your SCAP report format?
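As a rough illustration of the separate-event idea, the Python sketch below splits a report into one chunk per finding. The sample text is invented, and the assumption that every finding begins at a line starting with "Title" is a guess based on the regex in the question, not the actual SCAP format. In Splunk itself this boundary would normally be expressed through event-breaking settings in props.conf rather than a script.

```python
import re

# Invented sample; the real SCAP report layout may differ.
report = (
    "Title : V-1001 Example rule one\n"
    "NIST SP 800-53 Rev 4: AC-2;\n"
    "Fix Text Set option A.\n"
    "Severity: medium\n"
    "Title : V-1002 Example rule two\n"
    "NIST SP 800-53 Rev 4: IA-5;\n"
    "Fix Text Edit the file.\n"
    "Restart the service.\n"
    "Severity: high\n"
)

# Assumption: each finding starts at a line beginning with "Title".
# The zero-width lookahead splits before that line without consuming it
# (splitting on empty matches needs Python 3.7 or later).
EVENT_BOUNDARY = re.compile(r"(?m)^(?=Title\s+:)")


def split_into_events(text):
    """Return one text chunk per finding, dropping empty pieces."""
    return [chunk.strip() for chunk in EVENT_BOUNDARY.split(text) if chunk.strip()]


events = split_into_events(report)
for number, event in enumerate(events):
    print(f"--- event {number} ---")
    print(event)
```

Once each finding is its own event, a single non-repeating extraction per event keeps Title, CCI, and FixText together without any positional pairing.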