Re: Regex Question for Proxy Logs

NeonFlash · 05-18-2012

I want to view all HTTP GET requests in the proxy logs to any website whose URL has the following format:

http://example.com/<format>/welcome.html

Here, <format> is defined as follows:

It consists of exactly 8 characters, which may include digits (0-9) and both lowercase and uppercase letters (a-z, A-Z).

A few examples:

/HXut2jHC/welcome.html
/mK151WbA/welcome.html
/gMsyk6kT/welcome.html
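To pin the format down before touching Splunk, here is a minimal Python sketch that checks the three example paths. It assumes the field holds only the path (no scheme or host); the name FORMAT_PATH is just illustrative.

```python
import re

# Path only: "/", exactly 8 ASCII letters or digits, then "/welcome.html".
# The dot is escaped so it matches a literal "." rather than any character.
FORMAT_PATH = re.compile(r"^/[A-Za-z0-9]{8}/welcome\.html$")

examples = [
    "/HXut2jHC/welcome.html",
    "/mK151WbA/welcome.html",
    "/gMsyk6kT/welcome.html",
]

for path in examples:
    print(path, bool(FORMAT_PATH.match(path)))
```

All three examples should print True; a 7-character segment, or "welcomeXhtml" in place of "welcome.html", would not match.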

My Splunk search is as follows:

sourcetype="bcoat_proxysg" | rex field=uri_path "(?<uri_path>/^[a-zA-Z0-9]{8}/welcome.html$)"

Here, uri_path is the field in the proxy logs that contains the URI path to which the HTTP request was sent.

However, this does not seem to work. I think I need to add more conditions to the regex, such as:

The format string should appear between the first and second forward slashes of the GET request path, followed by welcome.html.
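The condition above can be checked directly. This Python sketch (assuming the field holds just the path) shows why the pattern in the original search never matches: the caret comes after the leading "/", so it demands a start-of-string position in the middle of the text. Anchoring first and then matching the slash fixes that. Python spells named groups (?P<name>...); PCRE accepts that form as well as (?<name>...).

```python
import re

path = "/HXut2jHC/welcome.html"

# Pattern from the original search: "^" after "/" can never be satisfied
# without multiline mode, so search() always returns None.
broken = re.compile(r"/^[a-zA-Z0-9]{8}/welcome.html$")

# Anchor first, then match the slash; capture the 8-character segment.
fixed = re.compile(r"^/(?P<format>[a-zA-Z0-9]{8})/welcome\.html$")

print(broken.search(path))                 # None
print(fixed.search(path).group("format"))  # HXut2jHC
```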

Note: Why am I not able to write text between angle brackets?

Thanks.

lguinn2 · 05-18-2012

Try

sourcetype="bcoat_proxysg" |
regex uri_path="http://.*?/\w{8}/welcome\.html$"

Problems in your search

The rex command extracts a new, temporary field. I think you want the regex command, which keeps events that match the pattern and removes events that don't.
Your regular expression did not appear to match the string you were searching for (the caret ^ comes after the leading /, so it can never match).
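As a rough analogy for the difference between rex and regex, here is a Python sketch over a few hypothetical uri_path values. It is only an illustration of the behavior described above, not how Splunk implements either command; the function names rex_like and regex_like are made up.

```python
import re

# Hypothetical uri_path values standing in for proxy-log events.
events = [
    {"uri_path": "/HXut2jHC/welcome.html"},
    {"uri_path": "/images/logo.png"},
    {"uri_path": "/mK151WbA/welcome.html"},
]

pattern = re.compile(r"^/(?P<format>[A-Za-z0-9]{8})/welcome\.html$")

def rex_like(evts):
    """Keep every event; add a 'format' field only where the pattern matches."""
    out = []
    for e in evts:
        e = dict(e)  # copy so the input events are not modified
        m = pattern.search(e["uri_path"])
        if m:
            e["format"] = m.group("format")
        out.append(e)
    return out

def regex_like(evts):
    """Drop events that don't match; add no fields."""
    return [e for e in evts if pattern.search(e["uri_path"])]

print(len(rex_like(events)), len(regex_like(events)))  # 3 2
```

So if the goal is to see only the matching requests, a filter (regex) is the right shape; an extraction (rex) still returns every event.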

BTW, the \w character class matches letters, digits, and the underscore. If you want to exclude the underscore, you could use:

sourcetype="bcoat_proxysg" |
regex uri_path="http://.*?/[A-Za-z0-9]{8}/welcome\.html$"
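A quick Python check of the underscore point, using a hypothetical path whose segment contains underscores. Note that in Python 3, \w on str patterns also matches non-ASCII letters, which makes the explicit class even stricter by comparison there.

```python
import re

word_class = re.compile(r"^/\w{8}/welcome\.html$")
alnum_class = re.compile(r"^/[A-Za-z0-9]{8}/welcome\.html$")

path = "/ab_cd_ef/welcome.html"  # hypothetical 8-character segment with underscores
print(bool(word_class.match(path)), bool(alnum_class.match(path)))  # True False
```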

lguinn2 · 05-19-2012

I don't think the caret ^ is going to work if the field actually includes the http:// prefix. I suspect you could get exactly what you want by using lookahead and/or lookbehind assertions in your regex. But those things make my head hurt; I'd rather write a custom Splunk command! (And that's not trivial.)

You might take the regex problem to a forum that specializes in regular expressions, or maybe Perl.
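One way to keep the ^ anchor whether or not the field includes the scheme and host, sketched in Python: an optional non-capturing group absorbs "http://host" or "https://host". This swaps in an optional group rather than the lookarounds mentioned above, and it assumes hosts never contain "/". Because [^/]+ cannot cross a slash, the 8-character segment must still be the first path segment.

```python
import re

# Optional "http(s)://host" prefix, then the first path segment and welcome.html.
pattern = re.compile(r"^(?:https?://[^/]+)?/[A-Za-z0-9]{8}/welcome\.html$")

for value in [
    "/HXut2jHC/welcome.html",                      # path only
    "http://example.com/HXut2jHC/welcome.html",    # full URL
    "http://example.com/a/HXut2jHC/welcome.html",  # segment is not first
]:
    print(value, bool(pattern.match(value)))
```

The first two lines should print True and the third False.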

NeonFlash · 05-19-2012

Thanks. An exact match would be:

"^/[a-zA-Z0-9]{8}/welcome\.html$"

However, this would also match paths like /shopping/welcome.html and /politics/welcome.html.

Can the regex be refined further so that the format must contain at least one character from each character class: [a-z], [A-Z], and [0-9]?
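A minimal sketch of the lookahead approach in Python: three lookaheads placed right after the first "/" each require one character class somewhere in the segment. Since [A-Za-z0-9]* stops at the next "/", each lookahead only scans the 8-character segment. Splunk's regex command uses PCRE, which supports the same lookahead syntax, so the pattern string should carry over, but treat that as untested. The names MIXED and PLAIN are illustrative.

```python
import re

PLAIN = re.compile(r"^/[A-Za-z0-9]{8}/welcome\.html$")

# Each (?=...) requires at least one lowercase letter, one uppercase letter,
# and one digit within the segment, without consuming any characters.
MIXED = re.compile(
    r"^/"
    r"(?=[A-Za-z0-9]*[a-z])"
    r"(?=[A-Za-z0-9]*[A-Z])"
    r"(?=[A-Za-z0-9]*[0-9])"
    r"[A-Za-z0-9]{8}/welcome\.html$"
)

paths = [
    "/HXut2jHC/welcome.html",
    "/mK151WbA/welcome.html",
    "/gMsyk6kT/welcome.html",
    "/shopping/welcome.html",
    "/politics/welcome.html",
]

for p in paths:
    print(p, bool(PLAIN.match(p)), bool(MIXED.match(p)))
```

The trade-off: a genuine 8-character token that happens to lack, say, a digit will now be missed.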
