Hello all,
I'm having some trouble formatting and dealing with multivalued fields.
My use case is as follows:
A sample log for sourcetype A looks like this (indicator is a multivalued field):

           Field       Values
Event 1    indicator   x.xxx.x.xx
                       hash
                       someDomain.com
                       http://DomainA.com
                       supermalicious.com
Event 2    indicator   someDomain.com
                       www.domainA.com
                       someEmailAddress@domain.com
                       http://helpmepls.com
When I use
| eval indicator=mvfilter(match(indicator, "\."))
| stats values(indicator)
I get close to the expected results (the hashes are gone and the values are deduplicated across all events), but I still need to exclude everything that isn't a domain or a URL.
I was thinking of using something like a URL parser app for Splunk to help with the formatting issues, but in that case I don't think I can get by with just | stats values(indicator)
Expected results:
someDomain.com
domainA.com
supermalicious.com
helpmepls.com
I'd appreciate it if someone could point me in the right direction or tell me whether this is even possible in Splunk.
Thanks!
Like this:
| makeresults
| eval raw="10.123.4.56,hash,someDomain.com,http://DomainA.com,supermalicious.com someDomain.com,www.domainA.com,someEmailAddress@domain.com,http://helpmepls.com"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex max_match=0 "(?<indicator>[^,]+)"
| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"
| rex field=indicator mode=sed "s%^[^:/]+://%% s/^www\.//"
| eval indicator=mvfilter(match(indicator, "\.") AND NOT match(indicator, "(^\d+\.\d+\.\d+\.\d+$)|@"))
| eval indicator=lower(indicator)
| stats values(indicator)
Brilliant! This works as expected! I'll need to tinker with the regex to also omit IP addresses with specified ports, such as 123.123.123.2:8080, but once I add that, the provided answer will do exactly what I'm looking for.
Thank you so much!
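For the IP-with-port case mentioned above, one possible tweak is to make a trailing :port optional in the IP part of the exclusion regex. This is only a sketch and has not been run against real data. The sample values in the split() call are made up for illustration, and it assumes the scheme and any leading www. have already been stripped, as in the answer.

```
| makeresults
| rename COMMENT AS "Hypothetical sample values, used only to exercise the filter"
| eval indicator=split("10.123.4.56,123.123.123.2:8080,somedomain.com,someemailaddress@domain.com,helpmepls.com", ",")
| rename COMMENT AS "Same filter as in the answer, with (:\d+)? added so an IP followed by a port is also dropped"
| eval indicator=mvfilter(match(indicator, "\.") AND NOT match(indicator, "(^\d+\.\d+\.\d+\.\d+(:\d+)?$)|@"))
| stats values(indicator)
```

With these sample values, the filter should drop the bare IP, the IP with a port, and the email address, leaving only the two domains. The only change from the answer is the (:\d+)? group before the $ anchor.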