Splunk Search

What do I need to change to make this regex work in Splunk?

rwmilligan
Explorer

I've been fighting with and researching Splunk regex for the past month, and I just cannot seem to get the PCREs produced by another source to work for searching proxy logs in Splunk. I'm assuming there are some syntactic differences, and possibly some missing features, but I haven't been able to find any solid documentation on what those might be.

Can anyone help me get the regex below working properly in a Splunk search? I've been trying variations on vendor = proxyname | regex = "<expression>" but it doesn't work.

^http:\/\/(?!www|forums?)(?:[^\.]+\.[^\.\x2f]+|[^\.]+\.[^\.]+\.(?:[^\.\x2f]+?|[^\.]+\.[^\.]+))\/[^\x3f]+\/(?:index\.php\?PHPSESSID=[^&]+?&action=(?!dlattach)[^&]+?&?|view(?:forum|topic)\.php\?[a-z]=[^&]{1,5}&[a-z]{1,3}=(?![0-9a-f]{32})[0-9a-z\._-]{13,})&?$ 
0 Karma

hortonew
Builder

In Splunk, the syntax for extracting a field with a regex in a search is:

<base search> | rex field=_raw_or_another_field "some regex here (?<extracted_field> regex here for match) some ending regex here" | table extracted_field

Verify that you're using the rex command in this fashion, then we can talk about what is or is not matching.

0 Karma

richgalloway
SplunkTrust

It would help if you could share some sample data.
regex101.com is a good site for testing regex strings. It is pretty compatible with Splunk regexes.

---
If this reply helps you, Karma would be appreciated.
0 Karma

rwmilligan
Explorer

I'll edit in a sample URL... I HAVE checked it at regex101.com, and it checks out there. But it fails in Splunk.

0 Karma

rwmilligan
Explorer

OK, apparently it won't let me revise the post, so here's the URL, with the disclaimer that it was a live Angler EK link a week or so back. I've defanged it for safety reasons, so you'll have to restore the http and .com parts to check it properly: hxxp://nosprivsliikeradan.pfgfoxriver-localguide2[.]com/boards/viewforum.php?f=5x827&sid=7q0as14.5i4x8

0 Karma

richgalloway
SplunkTrust

Your regex string matches the URL example, but nothing is extracted because the regex has no capturing groups. What are you attempting to do with the regex?
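
To illustrate the difference between matching and extracting, here is a minimal sketch in Python. Python's re module is not Splunk's regex engine, so treat this only as an approximation of the behavior described above. It re-fangs the sample URL in code, confirms the pattern matches, and shows that the match carries no captured groups. The wrapped CAPTURING pattern and the group name url are assumptions added for illustration; Python spells a named group (?P<url>...), whereas rex in Splunk uses (?<url>...).

```python
import re

# The pattern from the original question, split across lines but otherwise unchanged.
PATTERN = (
    r"^http:\/\/(?!www|forums?)(?:[^\.]+\.[^\.\x2f]+|[^\.]+\.[^\.]+\."
    r"(?:[^\.\x2f]+?|[^\.]+\.[^\.]+))\/[^\x3f]+\/(?:index\.php\?PHPSESSID=[^&]+?"
    r"&action=(?!dlattach)[^&]+?&?|view(?:forum|topic)\.php\?[a-z]=[^&]{1,5}"
    r"&[a-z]{1,3}=(?![0-9a-f]{32})[0-9a-z\._-]{13,})&?$"
)

# The defanged sample from the thread, re-fanged only in memory for testing.
DEFANGED = (
    "hxxp://nosprivsliikeradan.pfgfoxriver-localguide2[.]com"
    "/boards/viewforum.php?f=5x827&sid=7q0as14.5i4x8"
)
url = DEFANGED.replace("hxxp", "http", 1).replace("[.]", ".")

m = re.search(PATTERN, url)
print(m is not None)  # True: the pattern matches the URL
print(m.groups())     # (): only (?:...) and (?!...) groups, so nothing is captured

# Hypothetical variant: wrap the body (without ^ and $) in a named group
# so that the matched URL can be pulled out as a field.
CAPTURING = "^(?P<url>" + PATTERN[1:-1] + ")$"
m2 = re.search(CAPTURING, url)
print(m2.group("url") == url)  # True: the whole URL is captured
```

For plain filtering, as opposed to field extraction, the lack of capturing groups does not matter; it only matters when the goal is to pull a value out of the event.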

0 Karma

rwmilligan
Explorer

I want to be able to search the proxy logs for any and all instances of the regex. If there's a log with a URL matching that regex, I want to see it when I run the search.

0 Karma

richgalloway
SplunkTrust

So when you enter index=foo | regex "^http:\/\/(?!www|forums?)(?:[^\.]+\.[^\.\x2f]+|[^\.]+\.[^\.]+\.(?:[^\.\x2f]+?|[^\.]+\.[^\.]+))\/[^\x3f]+\/(?:index\.php\?PHPSESSID=[^&]+?&action=(?!dlattach)[^&]+?&?|view(?:forum|topic)\.php\?[a-z]=[^&]{1,5}&[a-z]{1,3}=(?![0-9a-f]{32})[0-9a-z\._-]{13,})&?$", what do you get?

0 Karma

rwmilligan
Explorer

If I do that I get no results returned, but I just figured out the problem: it's the way the proxy logs are stored in Splunk. Each event is a single line that is more or less a hash-style data structure, with metadata tags and values. So when I search with the regex like that, the ^ and $ anchors at the beginning and end of the regex, while fine for regex filtering on the proxy, break the Splunk search, because the URL shows up mid-line surrounded by other data.
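
A minimal sketch of the anchor problem, again in Python rather than Splunk, so it approximates the behavior and is not a statement about Splunk internals. The key=value event layout, the field names (src, url, status), and the helper matching_events are all hypothetical, since the thread does not show a real proxy event. The anchored pattern matches the bare URL but not the full event line; dropping ^ and $ lets it match the URL embedded in the line.

```python
import re

# The pattern from the original question, anchors included.
ANCHORED = (
    r"^http:\/\/(?!www|forums?)(?:[^\.]+\.[^\.\x2f]+|[^\.]+\.[^\.]+\."
    r"(?:[^\.\x2f]+?|[^\.]+\.[^\.]+))\/[^\x3f]+\/(?:index\.php\?PHPSESSID=[^&]+?"
    r"&action=(?!dlattach)[^&]+?&?|view(?:forum|topic)\.php\?[a-z]=[^&]{1,5}"
    r"&[a-z]{1,3}=(?![0-9a-f]{32})[0-9a-z\._-]{13,})&?$"
)
# Same pattern with the leading ^ and trailing $ removed.
UNANCHORED = ANCHORED[1:-1]

# Defanged sample from the thread, re-fanged only in memory.
url = (
    "hxxp://nosprivsliikeradan.pfgfoxriver-localguide2[.]com"
    "/boards/viewforum.php?f=5x827&sid=7q0as14.5i4x8"
).replace("hxxp", "http", 1).replace("[.]", ".")

# Hypothetical single-line events; the real proxy log layout is an assumption.
suspicious = f"src=10.0.0.5 url={url} status=200"
benign = "src=10.0.0.6 url=http://www.example.com/index.html status=200"
events = [suspicious, benign]


def matching_events(pattern, lines):
    """Return the lines in which the pattern is found anywhere."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


bare_url_matches = re.search(ANCHORED, url) is not None  # URL on its own
anchored_hits = matching_events(ANCHORED, events)        # whole event lines
unanchored_hits = matching_events(UNANCHORED, events)

print(bare_url_matches)      # True
print(len(anchored_hits))    # 0: ^ and $ demand that the URL be the entire line
print(len(unanchored_hits))  # 1: only the suspicious event matches
```

In Splunk terms, the regex command runs against _raw when no field is named, which is why the anchors fail on the full event. If the URL were available as its own extracted field (the name url is an assumption here), something like regex url="<expression>" could keep the anchors, since they would then apply to the field value alone.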

0 Karma

richgalloway
SplunkTrust

Removing the anchors was going to be my next suggestion. 😉

0 Karma