I am indexing web logs in Splunk, and I want to match the URI against a list of regexes to categorize the type of request:
index=weblog | replace *wp-login.php* with "WordPress Login" in uri_path | replace *wp-content* with "WordPress Content", *wp-include* with "WordPress Include", *wp-comment* with "WordPress Comment" in uri_path | replace *wp-admin* with "WordPress Admin Access" in uri_path |replace *wpad.dat* with "WebProxy AutoDetection" in uri_path | ...
What I would like to do is add a request_type field to the events that contains that information. The problem is that not everything is a * wildcard. Some of the request_type information I want to capture is more of a regex. For example:
/[Mm][aA4][iIl1][1lL][eE3][rR].php
/[Mm][aA4][iIl1][eE3][1lL][rR].php
Is there a way to do this via a lookup table? I could do it with an external script, but I seem to run into issues when I have more than a couple hundred entries to look up (I see results while the list is small, but as the list grows, the lookup results start to disappear).
index=weblog | stats count by uri_path | lookup REQUEST_lookup uri_path OUTPUT request_type
If you are only using RegEx for case-(in)sensitivity, you can do this without RegEx by using the case_sensitive_match = false directive in transforms.conf for your automatic lookup.
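As a rough sketch of what that could look like, assuming the lookup definition is the REQUEST_lookup from your search and that it is backed by a CSV file (the filename below is hypothetical, so substitute your own):

```
# transforms.conf
# Stanza name matches the lookup used in the search above.
[REQUEST_lookup]
# Hypothetical CSV name; assumed to contain uri_path and request_type columns.
filename = request_lookup.csv
# Match uri_path values without regard to case.
case_sensitive_match = false
```

This only removes the need for the upper/lowercase alternatives; character substitutions such as 4 for a or 1 for l in your examples would still need separate entries or a different approach.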
Are the URIs and request types so unique that you actually have to look them up against a list? Can you give some idea of what the request types are and how they're being determined? I'm not sure what that regex example is supposed to match, so you might need to clarify a bit.