Splunk Search

Regex Question for Proxy Logs

NeonFlash
Explorer

I want to view all the HTTP GET Requests in the Proxy Logs to any website of the following format:

http://example.com/<format>/welcome.html

Here, <format> is as follows:

It consists of exactly 8 characters, each of which may be a digit (0-9) or a letter, lowercase or uppercase (a-z, A-Z).

A few examples:

/HXut2jHC/welcome.html
/mK151WbA/welcome.html
/gMsyk6kT/welcome.html

My Splunk search is as follows:

sourcetype="bcoat_proxysg" | rex field=uri_path "(?<uri_path>/^[a-zA-Z0-9]{8}/welcome.html$)"

Here, uri_path is the field in the proxy logs that contains the URI path to which the HTTP request was sent.

However, this does not seem to work. I think I need to include more conditions in the Regex like:

The format string should appear between the first and second forward slash of the GET Request followed by welcome.html.

Note: Why am I not able to write text between angle brackets?

Thanks.


lguinn2
Legend

Try

sourcetype="bcoat_proxysg" |
regex uri_path="http://.*?/\w{8}/welcome\.html$"

Problems in your search

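  • The rex command extracts a new field from each event at search time; it does not filter events. I think you want the regex command, which keeps events that match the pattern and discards events that don't.
  • Your regular expression does not appear to match the strings you are searching for. As posted, the leading / sits before the ^ anchor, and ^ only matches at the start of the field, so the pattern can never match.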

BTW, the \w character class includes alphanumeric characters, plus the underscore. If you prefer, you could use

sourcetype="bcoat_proxysg" |
regex uri_path="http://.*?/[A-Za-z0-9]{8}/welcome\.html$"
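To see the difference between the two character classes outside of Splunk, here is a minimal Python sketch. It assumes uri_path holds only the path (as described in the question), so the http:// prefix is left out. The underscore path is a made-up test value, and re.ASCII is used so that \w behaves as in the ASCII case.

```python
import re

# Two variants of the same check. re.ASCII keeps \w limited to [A-Za-z0-9_].
WORD_CLASS = re.compile(r"^/\w{8}/welcome\.html$", re.ASCII)
ALNUM_CLASS = re.compile(r"^/[A-Za-z0-9]{8}/welcome\.html$")


def matches(pattern: re.Pattern, path: str) -> bool:
    """Return True if the pattern matches the given path."""
    return pattern.search(path) is not None


paths = [
    "/HXut2jHC/welcome.html",  # example from the question
    "/ab_cd_ef/welcome.html",  # hypothetical path containing underscores
]

for p in paths:
    print(p, matches(WORD_CLASS, p), matches(ALNUM_CLASS, p))
```

The underscore path is accepted by the \w version and rejected by the explicit [A-Za-z0-9] version, which is why the second search is the stricter of the two.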

lguinn2
Legend

I don't think that the caret ^ is going to work if you actually have the http:// as part of the field. I suspect that you could get exactly what you want by using conditional look ahead and/or look behind in your regex. But those things make my head hurt - I'd rather write a custom Splunk command! (And that's not trivial.)
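One way to sidestep the caret issue without lookarounds is to make the scheme and host an optional prefix, so the same anchored pattern works whether or not the field includes http://. This is only a sketch in Python; the name URL_OR_PATH and the helper is_target are made up for illustration, and I have not checked what uri_path actually contains for this sourcetype.

```python
import re

# Optional "http://host" or "https://host" prefix, then exactly one
# 8-character alphanumeric path segment, then welcome.html.
URL_OR_PATH = re.compile(
    r"^(?:https?://[^/]+)?/[A-Za-z0-9]{8}/welcome\.html$"
)


def is_target(value: str) -> bool:
    """Return True for a bare path or a full URL in the wanted format."""
    return URL_OR_PATH.search(value) is not None


print(is_target("/HXut2jHC/welcome.html"))
print(is_target("http://example.com/mK151WbA/welcome.html"))
print(is_target("http://example.com/a/gMsyk6kT/welcome.html"))
```

Because [^/]+ cannot cross a slash, the 8-character segment has to be the first path segment, which is the "between the first and second forward slash" condition from the question.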

You might take the regex problem to a forum that specializes in regexes or maybe Perl.


NeonFlash
Explorer

Thanks. An exact match would be,

"^/[a-zA-Z0-9]{8}/welcome.html$"

However, this would also match something like /shopping/welcome.html, /politics/welcome.html.

Can the regex be modified further so that it only matches a format string containing at least one character from each character class: [a-z], [A-Z], and [0-9]?
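A possible approach is to put three lookahead assertions in front of the 8-character class, one per required character class. The Python sketch below shows the idea; the name MIXED and the helper has_mixed_segment are illustrative, and the pattern has only been checked against the sample paths in this thread.

```python
import re

# Each (?=[^/]*X) lookahead requires at least one character of class X
# somewhere in the first path segment, without consuming any input.
MIXED = re.compile(
    r"^/(?=[^/]*[a-z])(?=[^/]*[A-Z])(?=[^/]*[0-9])"
    r"[A-Za-z0-9]{8}/welcome\.html$"
)


def has_mixed_segment(path: str) -> bool:
    """True if the first segment is 8 alphanumerics with lower, upper and digit."""
    return MIXED.search(path) is not None


for path in [
    "/HXut2jHC/welcome.html",
    "/mK151WbA/welcome.html",
    "/gMsyk6kT/welcome.html",
    "/shopping/welcome.html",
    "/politics/welcome.html",
]:
    print(path, has_mixed_segment(path))
```

Splunk's regex command uses PCRE, which supports lookaheads, so the same pattern should carry over to regex uri_path="..." unchanged, though I have not run it there. Note that this narrows the matches rather than eliminating false positives: an ordinary 8-character segment such as Welcome1 would still pass.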
