Splunk Search

Extracting fields from a large text field

psomeshwar
Path Finder

Currently, I have a field called pluginText which is the following (italicized words are anonymized to what they represent):

<plugin_output>
The following software are installed on the remote host:

Vendor Software  [version versionnumber] [installed on date]
...
...
...
</plugin_output>

I wish to extract out Vendor, Software and versionnumber to separate fields and require a rex to do so. I am unfamiliar with using rex on this type of list, so I was hoping someone could point me in the right direction

Labels (2)
0 Karma

marnall
Motivator

I would highly recommend the website https://regex101.com/ as it allows you to see previews of your regex extractions as you write them. 

This regex might work:

on the remote host:\n\n(?<Vendor>[^\[\s]*)\s(?<Software>[^\[\s]*)\s*\[version\s(?<Version>[^\]]*)\]\s\[installed on (?<Date>[^\]]*)\]

 

0 Karma

richgalloway
SplunkTrust
SplunkTrust

@marnallHas better eyes than me and spotted the mix of italics and non-italics in the bracketed text.  The final regex likely will be a combination of our suggestions.

---
If this reply helps you, Karma would be appreciated.
0 Karma

richgalloway
SplunkTrust
SplunkTrust

This regular expression works in regex101.com using the sample data.

| rex field=pluginText "host:\s+(?<vendorSoftware>.+?)\s+\[(?<version>[^\]]+)] \[(?<installedDate>[^\]]+)"

It looks for the "host" introductory text and skips the spaces which follow.  The next set of text (terminated by whitespace before a left bracket) is the software name.  The text in the two sets of brackets become the version and date, respectively.

---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

New Year, New Changes for Splunk Certifications

As we embrace a new year, we’re making a small but important update to the Splunk Certification ...

Stay Connected: Your Guide to January Tech Talks, Office Hours, and Webinars!

What are Community Office Hours? Community Office Hours is an interactive 60-minute Zoom series where ...

[Puzzles] Solve, Learn, Repeat: Reprocessing XML into Fixed-Length Events

This challenge was first posted on Slack #puzzles channelFor a previous puzzle, I needed a set of fixed-length ...