I am indexing web logs in Splunk, and one thing I am trying to do is match the URI against a list of regexes to categorize the type of request...
index=weblog | replace *wp-login.php* with "WordPress Login" in uri_path | replace *wp-content* with "WordPress Content", *wp-include* with "WordPress Include", *wp-comment* with "WordPress Comment" in uri_path | replace *wp-admin* with "WordPress Admin Access" in uri_path | replace *wpad.dat* with "WebProxy AutoDetection" in uri_path | ...
What I would like to do is add a request_type field to the events that contains that information. The problem is that not everything is a * wildcard. Some of the request_type information I want to capture is more of a regex. For example:
Is there a way to do this via a lookup table? I could do it with an external script, but I seem to run into issues when I have more than a couple hundred entries to look up: I see results while the list is small, but as the list grows, the lookup results start to disappear.
index=weblog | stats count by uri_path | lookup REQUEST_lookup uri_path OUTPUT request_type
If you are only using regex for case-(in)sensitivity, you can do this without regex by using the case_sensitive_match = false directive in transforms.conf for your automatic lookup.
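If you do need real regular expressions rather than only case-insensitive matching, one option that avoids the lookup entirely is eval with case() and match(). The following is only a sketch, not a tested search: it reuses the uri_path field, the request_type field, and a few of the patterns from the question, and the "Other" fallback label is an assumption added for illustration.

```spl
index=weblog
| eval request_type=case(
    match(uri_path, "(?i)wp-login\.php"), "WordPress Login",
    match(uri_path, "(?i)wp-admin"), "WordPress Admin Access",
    match(uri_path, "(?i)wp-content"), "WordPress Content",
    match(uri_path, "(?i)wpad\.dat"), "WebProxy AutoDetection",
    true(), "Other")
| stats count by request_type
```

case() returns the value paired with the first condition that is true, so put the more specific patterns first. This stays manageable for a few dozen patterns; with several hundred it becomes hard to maintain inline, so it may not fit the full list described in the question.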
Are the URIs and request types so unique that you actually have to look them up against a list? Can you give some idea of what the request types mean to you and how they are being determined? I'm not sure what that regex example is supposed to be, so you might need to clarify a bit.