I want to extract a field that has multiple email addresses, each one followed by an IP address, all of which appear at the very end of an MS Windows event. My ultimate goal is to capture all of the email addresses in one field, up to the end of the event, and then remove the IP addresses so that I am left with just the email addresses. My regexes so far will not capture beyond the first "Authorized Recipient" email address, and there could be a hundred or more recipient addresses listed, depending on the size of a distro list.
Here's a sample of the event, with the exact formatting for how Splunk displays it in search results:
etc etc
Message=Message Validation Success
This action was requested by blah.blah@blah.com. (-no problem extracting these other values to fields)
Message Subject:
Other info:
blah blah
blah text blah.
Authorized Recipients:
blah1.blah@blah.com ( (- this is an IPv4 address)
blah2.blah@blah.com (
blah3.blah@blah.com (
etc, etc, etc.....
Here's the pertinent portion of my latest regex:
rex "(?m)Authorized\sRecipients:\s+(?P.*)"
...but it only captures the first email address and IP under recipients. I want to capture all of them, regardless of how many are listed.
I'm still a regex newbie, but I know the capture should be greedy, up to the end of the event.
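One likely cause, sketched here in Python rather than SPL: in PCRE-style regexes, (?m) only changes where ^ and $ match. It does not let "." cross a newline, so a greedy ".*" still stops at the end of the first recipient line. The (?s) (dot-all) flag is what lets it run to the end of the event. The sample event, the placeholder IP addresses, and the group name "recipients" are made up for illustration, and Python's re module is only a stand-in for rex, which is assumed to treat these two flags the same way.

```python
import re

# Hypothetical event text; the IP addresses are documentation placeholders.
event = (
    "Message=Message Validation Success\n"
    "Authorized Recipients:\n"
    "blah1.blah@blah.com (192.0.2.1)\n"
    "blah2.blah@blah.com (192.0.2.2)\n"
    "blah3.blah@blah.com (192.0.2.3)\n"
)

# (?m) alone: "." still refuses to match "\n", so the capture ends with the first line.
multiline_only = re.search(
    r"(?m)Authorized\sRecipients:\s+(?P<recipients>.*)", event
)

# (?s) lets "." match newlines, so the greedy ".*" runs to the end of the event.
dotall = re.search(
    r"(?s)Authorized\sRecipients:\s+(?P<recipients>.*)", event
)

print(multiline_only.group("recipients"))  # first recipient line only
print(dotall.group("recipients"))          # all recipient lines
```

If rex behaves the same way, swapping (?m) for (?s) in the original pattern would be the first thing to try.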
Try this
your base search | rex field=_raw "Recipients:\s+(?P<DataPortion>.*)" | rex field=DataPortion "(?P<EMAIL>\S+)\s+\((?P<IPADDRESS>\d+\.\d+\.\d+\.\d+)\)" max_match=0
This gives me the same result. I think I understand the suggestion: run a second regex against the "DataPortion" field once it has been extracted. But all this gave me was the same data: the first address and IP address only. I still think I need to make the first regex greedy enough to capture all of the email addresses, from the first one through any that follow, to the end of the event. Your second regex will help afterwards, once I have that capture, to strip the IP addresses from "DataPortion", since I don't actually need them.
Use max_match=0, which makes rex extract every match of the regex as a multivalue field instead of only the first match.
rex field=_raw "Recipients:\s+(?P<EMAIL>\S+)\s+\((?P<IPADDRESS>\d+\.\d+\.\d+\.\d+)\)" max_match=0
Could the issue be that the "Recipients: " portion is not repeated more than once? The boundary, after the first recipient, changes from "Recipients: email@blah.com (IPaddress)" to just "email@blah.com (IPaddress)", repeated. Should I capture the first instance, and then look for other instances afterward? This would be conditional, as there may or may not be any additional addressees to follow.
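That diagnosis can be checked outside Splunk. The sketch below uses Python's re module as a stand-in for rex (an assumption, along with the made-up event text and placeholder IPs). It first shows that a pattern which keeps the "Recipients:" label can only match once, then takes a two-step approach: isolate everything after the label, and collect every address inside that block while matching but not capturing the IP.

```python
import re

# Hypothetical event text; the IP addresses are documentation placeholders.
event = (
    "This action was requested by blah.blah@blah.com.\n"
    "Authorized Recipients:\n"
    "blah1.blah@blah.com (192.0.2.1)\n"
    "blah2.blah@blah.com (192.0.2.2)\n"
    "blah3.blah@blah.com (192.0.2.3)\n"
)

# The label occurs once, so a repeated match that includes it finds one address.
anchored = re.findall(r"Recipients:\s+(\S+@\S+)", event)

# Step 1: isolate the text after the one-off label ((?s) lets ".*" span lines).
block = re.search(r"(?s)Authorized\sRecipients:\s+(.*)", event).group(1)

# Step 2: find every "email (ip)" pair in the block, keeping only the email.
emails = re.findall(r"(\S+@\S+)\s+\(\d+\.\d+\.\d+\.\d+\)", block)

print(anchored)  # one address
print(emails)    # every recipient address, IPs dropped
```

Because the IP sits outside the capture group, no separate clean-up step is needed to strip it.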
Yes, that would be the reason. Try somesoni2's method; that should work. Or else try this:
rex field=_raw "\s+(?P<Email>\S+@\S+)\s+\((?P<IPADDRESS>\d+\.\d+\.\d+\.\d+)\)"
No joy. The regex is still only capturing the first email recipient as the EMAIL field, even though I'm sending to multiple addresses.
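One thing worth checking: the last regex drops the "Recipients:" label but also drops max_match=0, and rex returns only the first match unless max_match is raised. The difference is the same as first-match versus all-matches in any regex engine. Here is a rough Python illustration using the same pattern; the event text and IPs are placeholders, and re.search / re.finditer are only analogies for the default and max_match=0 behaviour.

```python
import re

# Hypothetical event text; the IP addresses are documentation placeholders.
event = (
    "Authorized Recipients:\n"
    "blah1.blah@blah.com (192.0.2.1)\n"
    "blah2.blah@blah.com (192.0.2.2)\n"
    "blah3.blah@blah.com (192.0.2.3)\n"
)

# Same pattern as the rex above, with no "Recipients:" prefix.
pattern = re.compile(
    r"\s+(?P<Email>\S+@\S+)\s+\((?P<IPADDRESS>\d+\.\d+\.\d+\.\d+)\)"
)

# First match only: comparable to rex with its default max_match.
first_only = pattern.search(event).group("Email")

# Every match: comparable to rex with max_match=0.
all_emails = [m.group("Email") for m in pattern.finditer(event)]

print(first_only)
print(all_emails)
```

So the pattern itself is probably fine; adding max_match=0 to that rex is the likely missing piece.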