I have a field called pluginText that looks like the following (italicized words are anonymized placeholders for the values they represent):
<plugin_output>
The following software are installed on the remote host:
Vendor Software [version versionnumber] [installed on date]
...
...
...
</plugin_output>
I want to extract Vendor, Software, and versionnumber into separate fields and need a rex to do so. I am unfamiliar with using rex on this type of list, so I was hoping someone could point me in the right direction.
I would highly recommend the website https://regex101.com/ as it allows you to see previews of your regex extractions as you write them.
This regex might work:
on the remote host:\n\n(?<Vendor>[^\[\s]*)\s(?<Software>[^\[\s]*)\s*\[version\s(?<Version>[^\]]*)\]\s\[installed on (?<Date>[^\]]*)\]
@marnall has better eyes than I do and spotted the mix of italics and non-italics in the bracketed text. The final regex will likely be a combination of our suggestions.
This regular expression works in regex101.com using the sample data.
| rex field=pluginText "host:\s+(?<vendorSoftware>.+?)\s+\[(?<version>[^\]]+)] \[(?<installedDate>[^\]]+)"
It looks for the "host:" introductory text and skips the whitespace that follows. The next run of text (terminated by whitespace before a left bracket) is the vendor and software name together. The text inside the two sets of brackets becomes the version and date, respectively, including the literal words "version" and "installed on".
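If you want to try the patterns outside Splunk, here is a minimal Python sketch. It is not Splunk code: Python's `re` module needs `(?P<name>...)` instead of `(?<name>...)`, and the sample text (vendors, products, versions, dates) is made up for illustration. `REX_PATTERN` is the rex pattern above translated as-is. `LINE_PATTERN` is one possible combination of the two suggestions, and it assumes the vendor is always the first whitespace-delimited word on the line, which may not hold for your data.

```python
import re

# Hypothetical sample; every vendor, product, version, and date here is made up.
SAMPLE = """<plugin_output>
The following software are installed on the remote host:
Acme Widget [version 1.2.3] [installed on 2020/01/15]
Initech Report Builder [version 4.5] [installed on 2019/11/02]
</plugin_output>"""

# The rex pattern from the answer above, with named groups rewritten for Python.
REX_PATTERN = re.compile(
    r"host:\s+(?P<vendorSoftware>.+?)\s+\[(?P<version>[^\]]+)] \[(?P<installedDate>[^\]]+)"
)

# Per-line variant combining both suggestions.
# Assumption: the vendor is a single word and everything up to the
# "[version ...]" bracket is the software name.
LINE_PATTERN = re.compile(
    r"^(?P<Vendor>\S+)\s+(?P<Software>.+?)\s+"
    r"\[version\s+(?P<Version>[^\]]+)\]\s+"
    r"\[installed on\s+(?P<Date>[^\]]+)\]",
    re.MULTILINE,
)


def extract_software(plugin_text):
    """Return one dict of named groups per software line found in the text."""
    return [m.groupdict() for m in LINE_PATTERN.finditer(plugin_text)]


# The translated rex pattern only captures the first entry after "host:".
first = REX_PATTERN.search(SAMPLE).groupdict()

# The per-line pattern yields every entry in the list.
rows = extract_software(SAMPLE)
for row in rows:
    print(row)
```

Note that a plain rex call only keeps the first match per event. To pull every line of the list into multivalue fields, rex's max_match option (for example max_match=0) is worth looking at.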