Splunk Search

Regex Question for Proxy Logs

NeonFlash
Explorer

I want to find all HTTP GET requests in the proxy logs to any website whose URL has the following format:

http://example.com/<format>/welcome.html

Here, <format> is as follows:

It consists of exactly 8 characters, which may include digits (0-9) and both lowercase and uppercase letters (a-z, A-Z).

A few examples:

/HXut2jHC/welcome.html
/mK151WbA/welcome.html
/gMsyk6kT/welcome.html

My Splunk search is as follows:

sourcetype="bcoat_proxysg" | rex field=uri_path "(?uri_path between angle brackets /^[a-zA-Z0-9]{8}/welcome.html$)"

Here, uri_path is the field in the proxy logs that contains the URI path to which the HTTP request was sent.

However, this does not seem to work. I think I need to include more conditions in the Regex like:

The format string should appear between the first and second forward slash of the GET Request followed by welcome.html.

Note: Why am I not able to write text between angle brackets?

Thanks.


lguinn2
Legend

Try

sourcetype="bcoat_proxysg" |
regex uri_path="http://.*?/\w{8}/welcome\.html$"

Problems in your search

  • The rex command extracts a new field at search time; it does not filter events. I think you want the regex command, which keeps events that match the pattern and discards events that don't.
  • Your regular expression does not seem to match the string you are searching for. In particular, the ^ anchor appears after a literal /, so it can never match at that position.
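The difference between extracting and filtering can be illustrated outside Splunk. The following is only a rough Python analogy, not Splunk's implementation: the `events` list, the `token` group name, and the `extract` and `keep` helpers are hypothetical names made up for this sketch.

```python
import re

# Hypothetical uri_path values standing in for proxy log events.
events = ["/HXut2jHC/welcome.html", "/index.html", "/mK151WbA/welcome.html"]
pattern = re.compile(r"^/(?P<token>[A-Za-z0-9]{8})/welcome\.html$")


def extract(path):
    """rex-like: keep the event and add a 'token' field when the pattern matches."""
    m = pattern.search(path)
    return {"uri_path": path, "token": m.group("token") if m else None}


def keep(path):
    """regex-like: decide whether the event survives the filter."""
    return pattern.search(path) is not None


extracted = [extract(e) for e in events]    # all three events remain
filtered = [e for e in events if keep(e)]   # only matching events remain
```

With extraction, non-matching events are still returned (just without the new field), which is why the original search did not narrow the results.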

BTW, the \w character class includes alphanumeric characters, plus the underscore. If you prefer, you could use

sourcetype="bcoat_proxysg" |
regex uri_path="http://.*?/[A-Za-z0-9]{8}/welcome\.html$"
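To see where the two character classes differ, here is a small Python sketch. It assumes Python's `re` module behaves like Splunk's PCRE for these simple constructs (the `re.ASCII` flag keeps `\w` limited to letters, digits, and underscore); the sample path is hypothetical.

```python
import re

# \w also accepts the underscore; the explicit class does not.
word_class = re.compile(r"^/\w{8}/welcome\.html$", re.ASCII)
alnum_class = re.compile(r"^/[A-Za-z0-9]{8}/welcome\.html$")

path = "/ab_cd_ef/welcome.html"  # hypothetical path with underscores

matches_word = word_class.search(path) is not None    # underscore allowed
matches_alnum = alnum_class.search(path) is not None  # underscore rejected
```

If underscores never occur in the real data, the two forms should return the same events.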

lguinn2
Legend

I don't think that the caret ^ is going to work if you actually have the http:// as part of the field. I suspect that you could get exactly what you want by using conditional look ahead and/or look behind in your regex. But those things make my head hurt - I'd rather write a custom Splunk command! (And that's not trivial.)
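As a sketch of the anchoring issue, the Python snippet below assumes the field may hold a full URL such as `http://example.com/...`; whether `uri_path` actually includes the scheme and host depends on the field extraction and is not confirmed here. Replacing the lazy `.*?` with `[^/]+` is one way to pin the 8-character string to the first path segment without lookarounds.

```python
import re

path_only = re.compile(r"^/[A-Za-z0-9]{8}/welcome\.html$")
lazy_host = re.compile(r"http://.*?/[A-Za-z0-9]{8}/welcome\.html$")
strict_host = re.compile(r"^http://[^/]+/[A-Za-z0-9]{8}/welcome\.html$")

# Hypothetical field values.
url = "http://example.com/HXut2jHC/welcome.html"
nested = "http://example.com/a/HXut2jHC/welcome.html"

caret_on_url = path_only.search(url) is not None         # ^/ fails on a full URL
lazy_on_nested = lazy_host.search(nested) is not None     # .*? can cross slashes
strict_on_url = strict_host.search(url) is not None       # first segment only
strict_on_nested = strict_host.search(nested) is not None
```

The `[^/]+` host part cannot consume a slash, so the token must sit directly after the host.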

You might take the regex problem to a forum that specializes in regexes or maybe Perl.


NeonFlash
Explorer

Thanks. An exact match would be,

"^/[a-zA-Z0-9]{8}/welcome.html$"

However, this would also match something like /shopping/welcome.html, /politics/welcome.html.

Can the regex be modified further so that the 8-character string must contain at least one character from each of the character classes [a-z], [A-Z], and [0-9]?
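One possible approach to this requirement is a set of lookaheads, one per character class, placed before the 8-character match. The sketch below tests the idea in Python; PCRE supports the same lookahead syntax, so the pattern string should carry over to Splunk's regex command, but that has not been verified against real proxy data. The `has_all_classes` helper is a hypothetical name.

```python
import re

# Each lookahead scans the first path segment ([^/]*) for one required class.
pattern = re.compile(
    r"^/(?=[^/]*[a-z])"      # at least one lowercase letter
    r"(?=[^/]*[A-Z])"        # at least one uppercase letter
    r"(?=[^/]*[0-9])"        # at least one digit
    r"[A-Za-z0-9]{8}/welcome\.html$"
)


def has_all_classes(path):
    """Return True when the path matches and the token mixes all three classes."""
    return pattern.search(path) is not None


samples = [
    "/HXut2jHC/welcome.html",
    "/mK151WbA/welcome.html",
    "/gMsyk6kT/welcome.html",
    "/shopping/welcome.html",
    "/politics/welcome.html",
]
results = {s: has_all_classes(s) for s in samples}
```

The three example paths from the question pass, while /shopping/ and /politics/ are rejected because they contain only lowercase letters.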
