This may not be the best place to ask given my issue isn't technically Splunk related, but hopefully I can get some help from people smarter than me anyway. (?i)(?P<scheme>(?:http|ftp|hxxp)s?(?::...
See more...
This may not be the best place to ask given my issue isn't technically Splunk related, but hopefully I can get some help from people smarter than me anyway. (?i)(?P<scheme>(?:http|ftp|hxxp)s?(?:://|-3A__|%3A%2F%2F))?(?:%[\da-f][\da-f])?(?P<domain>(?:[\p{L}\d\-–]+(?:\.|\[\.\]))+[\p{L}]{2,})(@|%40)?(?:\b| |[[:punct:]]|$) The above regex is a template I'm working from(lol, I'm not nearly good enough to write this). While it's not too hard to read and see how it works, in a nut shell, it matches on the domain of a URL and nothing else. It does this by first looking for the optional beginning 'https://' and storing that in the 'scheme' group. Following that, it parses the following domain. For example, the URL 'https://community.splunk.com/t5/forums/postpage/board-id/splunk-search' would match 'community.splunk.com' My issue is that the way it looks for domains following the 'scheme' group requires it use a TLD(.com, .net, etc). Unfortunately, internal services used by my company don't use a TLD, and this causes the regex not to catch them. I need to change this so it can do this. I want to modify the regex expression above to detect on URLs like: 'https://mysite/resources/rules/123456' wherein the domain would be 'mysite'. I've attempted to do so, but with my limited understanding of how regex really works, my attempts lead to too many matches as shown below. (?i)(?P<scheme>(?:http|ftp|hxxp)s?(?::\/\/|-3A__|%3A%2F%2F))?(?:%[\da-f][\da-f])?(?P<domain>((?:[\p{L}\d\-–]+(?:\.|\[\.\]))+)?[\p{L}]{2,})(@|%40)?(?:\b| |[[:punct:]]|$) I tried to throw in an extra non-capturing group within the named 'domain' ground and make the entire first half of the 'domain' group optional, but it leads to matches beyond the domain. Thank you to whomever may be able to assist. This doesn't feel like it should be such a difficult thing, but it's been vexing me for hours.