I have raw data and would like to search for domains within it, extract each one to a field, and then run stats to show a count for each unique domain.
Example of raw data:
"This investigation is really great and we found the suspicious domain google.com"
I would like to:
1. Search for domains within the raw data and output each domain to a field that I can show in a table (let's call it "Domain").
2. Run stats to show the number of occurrences of each domain.
So ideally, my finished result would be:
Domain | count |
google.com | 50 |
yahoo.com | 30 |
Any assistance is greatly appreciated, thank you.
The key is how to recognise a domain. You can search online for regular expressions that extract domains and find plenty of examples, but this search shows how to get started:
| makeresults
| eval d=split("google.com,abc.net.au,bbc.co.uk,google.com,splunk.com,www.nytimes.com", ",")
| mvexpand d
| rex field=d "(?<domain>(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9])"
| stats count by domain
For your data, use rex field=_raw rather than field=d as in the search above.
If an event might contain more than one domain, add max_match=0 to the rex command; the extracted field then becomes multivalue, holding every match in the event.
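Putting those pieces together, a sketch of the full search against real events might look like the one below. The index and sourcetype names are placeholders for your own, and the capture group is renamed to Domain to match the table you asked for. The (?i) flag and the lower() call are my additions, on the assumption that you want Google.com and google.com counted together. Note that the pattern matches anything shaped like a hostname, so strings such as report.pdf will also be picked up and may need filtering out.

```
index=your_index sourcetype=your_sourcetype
| rex field=_raw max_match=0 "(?i)(?<Domain>(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9])"
| mvexpand Domain
| eval Domain=lower(Domain)
| stats count by Domain
| sort - count
```

The mvexpand step gives each extracted domain its own row, so every occurrence is counted, and sort - count puts the most frequent domains first.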