We are looking at Splunk as a way to log specific activities on our website.
I think in writing this, I see what I am missing, but cannot figure out where to place it.
From our webserver, we have a universal forwarder sending access_log to the Splunk index "newweb".
I have system/local/transforms.conf with 3 stanzas (shown below). You will see I have attempted two different versions of filtering; the string patterns in the REGEX settings are parts of URLs we are interested in.
Currently, no requests are being filtered. We want only those URLs matched by the two setparsing stanzas to make it into the index.
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing1]
REGEX = GET\s/about/people/cord_thomas
DEST_KEY = queue
FORMAT = indexQueue

[setparsing2]
REGEX = about/history/company_at_a_glance
DEST_KEY = queue
FORMAT = indexQueue
I have system/local/props.conf with a single stanza:
[apache_log]
TRANSFORMS-set=setnull,setparsing1, setparsing
I see that in no way am I telling Splunk to apply the transforms to my index, newweb. Where might I do that?
You should have transforms.conf and props.conf on the indexer instance (a restart/reload will be required after adding them). On the universal forwarder side, update the inputs.conf entry for this file to set the sourcetype to apache_log, matching the [apache_log] stanza in props.conf.
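A minimal sketch of the forwarder-side inputs.conf entry described above. The monitored path is an assumption (the thread never gives the real location of access_log); the index and sourcetype names come from the thread.

```ini
# inputs.conf on the universal forwarder
# NOTE: the path below is hypothetical; substitute the real access_log location.
[monitor:///var/log/httpd/access_log]
index = newweb
sourcetype = apache_log
```

props.conf stanzas are keyed on sourcetype, source, or host rather than on the index, which is why the sourcetype set here needs to match the [apache_log] stanza name.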
Thank you - you added a bit of insight and I feel I am closer.
Now, I am not seeing ANY data being added to the index.
Using tcpdump, I am certain the universal forwarder is still forwarding.
I did as you suggested and set sourcetype=apache_log to match what is in props.conf.
I wonder whether my regular expression is wrong. Any thoughts on ways to validate that? I don't see anything in the splunkd.log file indicating any problems with my conf files, though there may still be one.
As a concrete example, I have a log entry of:
... host.domain.org - [11/Mar/2015:17:22:54 -0400] "GET /pubs/perspectives/PE113.html HTTP/1.1" 200 13968 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 1095) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.76 Safari/537.36" "c4kaqS2x=1; ic=; __utmz=145273911.1421782255.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); lira-8001-PORTAL-PSJSESSION ...
And I have this as an entry in transforms.conf
REGEX = /pubs/perspectives/PE113
DEST_KEY = queue
FORMAT = indexQueue
I also tried the following (with escaping backslashes, which may not show up here):
REGEX = GET\s\/pubs\/perspectives\/PE113
Still no luck. Are there ways to debug the indexer?
And to be clear, somesoni2, I was not being sarcastic; I feel I am headed in the right direction...
I'm not positive, but it looks like you're applying the setnull transform first. Since the regex for that stanza is "." it will match everything, so everything may be getting sent to the nullQueue (i.e. getting trashed). Maybe try putting it last in the list, or in a separate TRANSFORMS setting below the first?
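A caveat on ordering, offered as a sketch rather than a confirmed fix for this thread: as far as I know, transforms named in one TRANSFORMS- setting run left to right, and a later match overwrites the queue chosen by an earlier one. Splunk's documented "keep some events, discard the rest" pattern therefore lists the nullQueue transform first and the indexQueue transforms after it. The stanza names are taken from the question; each name listed in props.conf has to match a stanza in transforms.conf exactly.

```ini
# props.conf on the indexer
[apache_log]
# setnull runs first and sends every event to nullQueue;
# setparsing1 and setparsing2 then route matching events back to indexQueue.
TRANSFORMS-set = setnull, setparsing1, setparsing2
```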
Also, you may still need to escape the slashes (/) in the regex for your other stanzas with a backslash.
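For illustration, the escaped form would look something like the sketch below, reusing a stanza name and URL from the question. In PCRE the forward slash is not a special character, so the backslashes are harmless but should not be required; the unescaped pattern matches the same text.

```ini
# transforms.conf on the indexer
[setparsing1]
# "\/" and "/" both match a literal slash in PCRE
REGEX = GET\s\/about\/people\/cord_thomas
DEST_KEY = queue
FORMAT = indexQueue
```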
I recommend going to a site like regex101 to verify that your regex matches your data first.
Hope that helps a little.
Thank you both - each of you contributed a part of the answer.
I got it working. It was a combination of needing to understand the role of the sourcetype in the inputs.conf and then getting the right regular expression. In the end, my regex looks like this:
REGEX = "GET\s/pubs/research_reports/RR604.html HTTP/1.1"