I introduced a new sourcetype, "access_combined_wperformance", but I can't get Splunk to use it because "access_combined_wcookie" always wins.
Here is my etc/system/local/props.conf:
########## WEBSERVERS ##########
[access_combined_wperformance]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[

########## RULE BASED CONDITIONS ##########
[rule::access_combined_wperformance]
sourcetype = access_combined_wperformance
MORE_THAN_50 = ^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+ \S+ "[^"]*" \d+$
priority = 100
I believe there's a conflict with the etc/system/default/props.conf rule for [rule::access_combined_wcookie], which (apparently) also matches the data and sets the sourcetype to access_combined_wcookie. However, I'm not sure how this is possible: the regexes seem different enough to me that if one rule matches, the other cannot also match, so I'm not sure that's what's happening.
See my comment above. The only default rule that sets a sourcetype to access_combined_wcookie is:
[rule::access_combined_wcookie]
sourcetype = access_combined_wcookie
MORE_THAN_75 = ^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+(?: \S+)? "[^"]*" "[^"]*"
While it's possible for both rules (yours and the default) to match, is that actually the case with your data? That is, does your data actually end with two double-quoted fields and one unquoted numeric field (as opposed to what your regex suggests, which is a single double-quoted field followed by an unquoted numeric field)? That is the only way both rule:: stanzas could match. Are you certain that your regex is actually matching the data?
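One way to answer that question is to test the regexes against a sample of real lines outside Splunk. The sketch below treats a MORE_THAN_<n> condition as "more than n percent of the sampled non-blank lines match", which is an assumption about the rule:: semantics rather than a model of Splunk's actual classifier (it ignores how many lines Splunk samples and how it breaks ties between rules). The helper names more_than and satisfied_rules are hypothetical, and Python's re module stands in for PCRE, which should behave the same for these particular constructs.

```python
import re
import sys

# Regexes and thresholds copied from the two rule:: stanzas in this thread.
RULES = {
    "access_combined_wperformance": (
        50, r'^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+ \S+ "[^"]*" \d+$'),
    "access_combined_wcookie": (
        75, r'^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+(?: \S+)? "[^"]*" "[^"]*"'),
}


def more_than(lines, pattern, percent):
    """Rough, hypothetical stand-in for MORE_THAN_<percent>: True if more
    than `percent` percent of the non-blank lines match `pattern`."""
    rx = re.compile(pattern)
    lines = [ln for ln in lines if ln.strip()]  # skip blank lines
    if not lines:
        return False
    hits = sum(1 for ln in lines if rx.search(ln))
    # Integer comparison avoids floating-point rounding at the boundary.
    return hits * 100 > percent * len(lines)


def satisfied_rules(lines):
    """Names of every rule whose (approximated) condition holds for these lines."""
    return [name for name, (pct, pattern) in RULES.items()
            if more_than(lines, pattern, pct)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_rules.py /path/to/access.log
    with open(sys.argv[1]) as f:
        sample = f.read().splitlines()[:1000]
    print(satisfied_rules(sample))
```

Running it as `python check_rules.py /var/log/apache/my_log_file` (the script name is hypothetical) prints every rule whose threshold the data clears; if both names come back, both stanzas are candidates for that file.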
Also, would it be easier to use source:: rather than rule:: stanzas to specify your sourcetypes?
This is a short example from the access log file:
10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318 "null" "null" 0
As you can see, a line always ends with a number. Therefore I thought that my regex for "access_combined_wperformance" was more precise than the one for "access_combined_wcookie".
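Running both regexes against that sample line shows why being "more precise" is not enough to exclude the default rule. Python's re module stands in for Splunk's PCRE here; for these constructs the two should behave the same. The default wcookie pattern has no trailing $, so it only needs to match a prefix of the line, and it does so by treating the two quoted "null" fields as its two quoted fields. The custom pattern also matches, because its third bare \S+ consumes the first "null".

```python
import re

SAMPLE = '10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318 "null" "null" 0'

# Regex from [rule::access_combined_wcookie] in etc/system/default/props.conf.
# There is no trailing $, so the pattern only has to match a prefix of the line.
WCOOKIE = re.compile(
    r'^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+(?: \S+)? "[^"]*" "[^"]*"')

# Regex from the custom [rule::access_combined_wperformance] stanza.
# Its third bare \S+ also accepts a quoted field such as "null".
WPERFORMANCE = re.compile(
    r'^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+ \S+ "[^"]*" \d+$')

m = WCOOKIE.match(SAMPLE)
# The cookie rule consumes everything except the trailing " 0".
print(repr(SAMPLE[m.end():]))
# The custom rule matches the whole line as well.
print(bool(WPERFORMANCE.match(SAMPLE)))
```

Both checks succeed on this line, so the data itself does not separate the two rules; which sourcetype wins then depends on how Splunk ranks competing rules, not on the regexes alone.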
Just some thoughts. This is not really an "answer", but it's more than will fit into a comment.
How are you testing this? Are you just letting Splunk pick up new content after a restart? If so, let me point out that you can test immediately with the following command:
splunk test sourcetype /var/log/apache/my_log_file
This will spit out all the props settings applied to this file. (Finding this utility has saved me countless hours of messing around; so I try to advertise it as much as possible.)
Also, there are times when I've had to go into $SPLUNK_HOME/etc/apps/learned/local/sourcetypes.conf and do some housekeeping. I believe this is because once Splunk identifies the sourcetype of a file, it prefers to stick with that sourcetype unless something changes, and changing a rule entry may not be a strong enough suggestion. Normally this is the behavior you want, but in this case you may find that deleting a few entries related to the file in question helps. Also note that the utility I mentioned may create or update an entry in the learned folder as well.
Sometimes it's helpful to come at this from a slightly different perspective. I had a similar issue with my own custom Apache logging sourcetype (vhost_access_combined), which was getting confused with access_combined, I think. The issue came down to the rule for access_combined being slightly too loose. So instead of just trying to write a better rule for my vhost_access_combined sourcetype, I found that I had to add an additional rule to the built-in access_combined to make it more restrictive in the first place. In my situation, I knew that my sourcetype would start with a named virtual host, whereas access_combined starts with the client IP address, so I could add a new rule to access_combined to make sure that it starts with an IP address, and therefore my own logs could then match against my sourcetype:
[rule::access_combined]
MORE_THAN_66 = ^\d+\.\d+\.\d+\.\d+[ ]
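A quick way to see what that extra condition does is to try the regex on two lines. Both lines below are hypothetical: the first follows the access_combined layout, and the second is an assumed vhost-style layout that starts with a virtual host name. Python's re module stands in for Splunk's regex engine.

```python
import re

# The extra condition added to [rule::access_combined] in this answer:
# the line must start with a dotted-quad client IP followed by a space.
IP_PREFIX = re.compile(r'^\d+\.\d+\.\d+\.\d+[ ]')

# Hypothetical sample lines: a plain access_combined line, and a
# vhost-style line that starts with the virtual host name instead.
combined = '10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / HTTP/1.1" 200 318 "-" "-"'
vhost = 'www.example.com 10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / HTTP/1.1" 200 318 "-" "-"'

for line in (combined, vhost):
    # Prints whether the condition holds, plus the first field for context.
    print(bool(IP_PREFIX.match(line)), line.split()[0])
```

Because \d+ accepts any run of digits, the check is loose (999.1.1.1 would pass), but it only has to tell a leading hostname apart from a leading IP address.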
Also note that I'm using MORE_THAN_66 and not MORE_THAN_75, which is used in the base config. We don't want to replace Splunk's built-in definition, only refine it. In my case, I simply added this to my own custom config file. As always, make sure you make changes like this in a local folder somewhere; I am not suggesting that you modify the entry in $SPLUNK_HOME/etc/system/default/props.conf. Again, I'm not proposing this as a solution to your problem, but as a different approach to consider.
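The reason for picking a different key is that Splunk merges configuration files setting by setting, so a local stanza with the same name overrides only the keys it redefines. The sketch below models that layering with plain dictionaries. The default regex is a placeholder, not the real value from $SPLUNK_HOME/etc/system/default/props.conf, and the merged function is a simplified, hypothetical model of Splunk's precedence rules.

```python
# Placeholder for the built-in stanza; the regex string is NOT Splunk's real value.
DEFAULT = {"sourcetype": "access_combined",
           "MORE_THAN_75": "<default access_combined regex>"}


def merged(default, local):
    """Hypothetical model of per-key layering: local keys win,
    and default keys that local does not mention survive."""
    result = dict(default)
    result.update(local)
    return result


# Reusing the same key replaces the built-in condition...
replaced = merged(DEFAULT, {"MORE_THAN_75": r"^\d+\.\d+\.\d+\.\d+[ ]"})
# ...whereas a different key keeps it and adds a second setting.
refined = merged(DEFAULT, {"MORE_THAN_66": r"^\d+\.\d+\.\d+\.\d+[ ]"})

print(sorted(replaced))
print(sorted(refined))
```

In the "replaced" case the built-in regex is gone from the effective stanza; in the "refined" case both the original MORE_THAN_75 and the new MORE_THAN_66 remain.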
I should also point out that while this approach worked for me, I ultimately don't rely on it. I ran into issues when my log files were rotated. The new log was too small for Splunk to identify (I think it needs 100 or more lines before it attempts rule-based recognition), and our Apache volume isn't very high, so we didn't have enough events and the sourcetype was wrong half the time anyway. Perhaps this has been improved or I was doing something wrong, but in any case, I went back to using a simple source:: stanza on the forwarder on the Apache server, and it solved all these issues quite nicely:
[source::/var/log/apache/vhost_access.log]
sourcetype = vhost_access_combined
Hope this gives you something to think about.
Awesome trick about "splunk test sourcetype", thank you! It does save time. It still gives me "access_combined_wcookie", even after I wiped out $SPLUNK_HOME/etc/apps/learned/local/sourcetypes.conf.