I want to view all the HTTP GET Requests in the Proxy Logs to any website of the following format:
http://example.com/<format>/welcome.html
Here, <format> consists of exactly 8 characters, which may be digits (0-9) or letters in either case (a-z, A-Z).
A few examples:
/HXut2jHC/welcome.html
/mK151WbA/welcome.html
/gMsyk6kT/welcome.html
My Splunk search is as follows:
sourcetype="bcoat_proxysg" | rex field=uri_path "(?uri_path between angle brackets /^[a-zA-Z0-9]{8}/welcome.html$)"
Here, uri_path is the field in the proxy logs that contains the URI path to which the HTTP request was sent.
However, this does not seem to work. I think I need to include more conditions in the regex, such as:
The format string should appear between the first and second forward slash of the GET request, followed by welcome.html.
Note: Why am I not able to write text between angle brackets?
Thanks.
Try
sourcetype="bcoat_proxysg" |
regex uri_path="http://.*?/\w{8}/welcome\.html$"
Problems in your search
BTW, the \w character class includes alphanumeric characters, plus the underscore. If you prefer, you could use
sourcetype="bcoat_proxysg" |
regex uri_path="http://.*?/[A-Za-z0-9]{8}/welcome\.html$"
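If you want to see the difference between the two character classes outside Splunk, here is a rough Python sketch. It is only an approximation: Splunk uses PCRE rather than Python's re module, the literal dot in welcome.html is escaped, and the sample URLs (including the made-up ab_cd123 one) are assumptions for illustration.

```python
import re

# Variants of the two patterns above, with the literal dot escaped.
# re.ASCII keeps \w limited to [A-Za-z0-9_], as in the description above.
WORD_CLASS = re.compile(r"http://.*?/\w{8}/welcome\.html$", re.ASCII)
EXPLICIT_CLASS = re.compile(r"http://.*?/[A-Za-z0-9]{8}/welcome\.html$")


def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return pattern.search(url) is not None


# Hypothetical sample URLs, not taken from real proxy logs.
alnum_url = "http://example.com/HXut2jHC/welcome.html"
underscore_url = "http://example.com/ab_cd123/welcome.html"

print(matches(WORD_CLASS, alnum_url), matches(EXPLICIT_CLASS, alnum_url))
print(matches(WORD_CLASS, underscore_url), matches(EXPLICIT_CLASS, underscore_url))
```

Only the \w version accepts the URL containing an underscore, so the explicit class is the safer choice if underscores should be rejected.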
I don't think that the caret ^ is going to work if you actually have the http:// as part of the field. I suspect that you could get exactly what you want by using conditional look ahead and/or look behind in your regex. But those things make my head hurt - I'd rather write a custom Splunk command! (And that's not trivial.)
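The caret issue can be checked with a short Python sketch. Whether uri_path holds only the path or the full URL is an assumption you would need to verify against your own logs; both sample values below are made up, and the optional-host variant is just one possible workaround, not something tested in Splunk.

```python
import re

# Pattern anchored at the start of the field: only works on a bare path.
ANCHORED = re.compile(r"^/[a-zA-Z0-9]{8}/welcome\.html$")

# Variant that tolerates an optional scheme and host before the path.
# [^/]+ cannot cross a slash, so the 8-character token must still sit
# between the first and second slash of the path.
OPTIONAL_HOST = re.compile(
    r"^(?:https?://[^/]+)?/[a-zA-Z0-9]{8}/welcome\.html$"
)

# Hypothetical field values.
path_only = "/HXut2jHC/welcome.html"
full_url = "http://example.com/HXut2jHC/welcome.html"
nested_url = "http://example.com/a/HXut2jHC/welcome.html"

for value in (path_only, full_url, nested_url):
    print(value, bool(ANCHORED.search(value)), bool(OPTIONAL_HOST.search(value)))
```

The anchored pattern matches only the bare path, while the optional-host variant matches the bare path and the full URL but still rejects the nested one.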
You might take the regex problem to a forum that specializes in regexes or maybe Perl.
Thanks. An exact match would be:
"^/[a-zA-Z0-9]{8}/welcome\.html$"
However, this would also match paths such as /shopping/welcome.html and /politics/welcome.html.
Can the regex be modified further so that the 8-character string must contain at least one character from each character class, [a-z], [A-Z], and [0-9]?
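One way to require at least one character from each class is to put three lookaheads in front of the 8-character token. The Python sketch below shows the idea. It assumes uri_path contains only the path, and it has not been run inside Splunk, although PCRE supports the same lookahead syntax.

```python
import re

# Each lookahead scans the current path segment ([^/]* stops at the next
# slash) and demands one character of the given class somewhere in it.
MIXED_TOKEN = re.compile(
    r"^/"
    r"(?=[^/]*[a-z])"     # at least one lowercase letter
    r"(?=[^/]*[A-Z])"     # at least one uppercase letter
    r"(?=[^/]*[0-9])"     # at least one digit
    r"[a-zA-Z0-9]{8}"     # exactly 8 alphanumeric characters
    r"/welcome\.html$"
)

wanted = [
    "/HXut2jHC/welcome.html",
    "/mK151WbA/welcome.html",
    "/gMsyk6kT/welcome.html",
]
unwanted = [
    "/shopping/welcome.html",
    "/politics/welcome.html",
]

for path in wanted + unwanted:
    print(path, bool(MIXED_TOKEN.search(path)))
```

The three example paths from the question match, while /shopping/ and /politics/ do not. On one line the pattern would be ^/(?=[^/]*[a-z])(?=[^/]*[A-Z])(?=[^/]*[0-9])[a-zA-Z0-9]{8}/welcome\.html$, which could presumably be passed to the regex command. Keep in mind that a genuine token that happens to lack one of the classes (for example, one with no digits) would be missed.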