Greetings,
I introduced a new sourcetype "access_combined_wperformance", but I cannot get it to be used because "access_combined_wcookie" always wins.
Here is my etc/system/local/props.conf:
########## WEBSERVERS ##########
[access_combined_wperformance]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[
########## RULE BASED CONDITIONS ##########
[rule::access_combined_wperformance]
sourcetype = access_combined_wperformance
MORE_THAN_50 = ^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+ \S+ "[^"]*" \d+$
priority = 100
See my comment above. The only default rule that sets a sourcetype to access_combined_wcookie is:
[rule::access_combined_wcookie]
sourcetype = access_combined_wcookie
MORE_THAN_75 = ^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+(?: \S+)? "[^"]*" "[^"]*"
While it's possible for both rules (yours and the default) to match, is this actually the case with your data? That is, does your data actually end with two double-quoted fields and one unquoted numeric field (versus what your regex suggests, which is just a single double-quoted field and then an unquoted numeric field)? This is the only way that both rule:: stanzas could match. Are you certain that your regex is actually matching the data?
Also, would it be easier to use source:: rather than rule:: stanzas to specify your sourcetypes?
Just some thoughts. This is not really an "answer", but it is more than will fit into a comment.
How are you testing this? Are you just letting Splunk pick up new content after a restart? If so, let me point out that you can test immediately with the following command:
splunk test sourcetype /var/log/apache/my_log_file
This will spit out all the props settings applied to this file. (Finding this utility has saved me countless hours of messing around; so I try to advertise it as much as possible.)
Also, there are times when I've had to go into $SPLUNK_HOME/etc/apps/learned/local/sourcetypes.conf and do some housekeeping. I believe this is because once Splunk identifies the sourcetype of a file, it prefers to stick with that sourcetype unless something changes, and changing a rule entry may not be a strong enough suggestion. Normally this is the behavior you want, but in this case you may find that deleting a few entries related to the file in question helps. Also note that the utility I mentioned may create or update an entry in the learned folder as well.
Additional thought:
Sometimes it's helpful to come at this from a slightly different perspective. I had a similar issue with my own custom Apache logging sourcetype (vhost_access_combined) that was getting confused with access_combined, I think. The issue came down to the rule for access_combined being slightly too loose. So instead of just trying to write a better rule for my vhost_access_combined sourcetype, I found that I had to add an additional rule to the built-in access_combined to make it more restrictive in the first place. In my situation, I knew that my sourcetype would start with a named virtual host, whereas access_combined starts with the client IP address. So I could add a new rule to access_combined to make sure that it starts with an IP address, and my own logs could then match against my sourcetype:
[rule::access_combined]
MORE_THAN_66 = ^\d+\.\d+\.\d+\.\d+[ ]
Also note that I'm using MORE_THAN_66 and not MORE_THAN_75, which is used in the base config. We don't want to replace Splunk's built-in definition, only refine it. In my case, I simply added this to my own custom config file. As always, make sure you are making changes like this in a local folder somewhere; I am not suggesting that you modify the entry in $SPLUNK_HOME/etc/system/default/props.conf. Again, I'm not proposing this as a solution to your problem, but as a different approach to consider.
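To make the threshold idea concrete, here is a rough Python sketch of how I read MORE_THAN_<n>: the rule applies when more than n percent of the sampled lines match the regex. This only models that reading and is not Splunk's actual implementation; the helper names (match_percentage, more_than) and the sample log lines are made up for illustration.

```python
import re

# The refinement regex from the rule above: line must start with an IPv4-looking address.
IP_PREFIX = re.compile(r"^\d+\.\d+\.\d+\.\d+[ ]")


def match_percentage(lines, pattern):
    """Return the percentage of non-empty lines that the pattern matches."""
    lines = [line for line in lines if line.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if pattern.search(line))
    return 100.0 * hits / len(lines)


def more_than(lines, pattern, threshold):
    """Model of MORE_THAN_<threshold>: strictly more than threshold percent match."""
    return match_percentage(lines, pattern) > threshold


# Hypothetical sample data: a plain access log (one junk line) and a vhost-prefixed log.
plain_log = [
    '10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318',
    '10.93.192.8 - - [22/Apr/2010:00:00:51 -0700] "GET /a " 200 12',
    '10.93.192.9 - - [22/Apr/2010:00:00:52 -0700] "GET /b " 404 0',
    'some stray line that is not an access entry',
]
vhost_log = [
    'www.example.com 10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318',
    'www.example.com 10.93.192.8 - - [22/Apr/2010:00:00:51 -0700] "GET /a " 200 12',
]

print(more_than(plain_log, IP_PREFIX, 66))  # 3 of 4 lines match
print(more_than(vhost_log, IP_PREFIX, 66))  # no line starts with an IP
```

Under this reading, the plain log passes the extra condition and the vhost log does not, which is what leaves the vhost lines free to be claimed by a different rule.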
I should also point out that while this approach worked for me, I ultimately don't rely on it. I ran into issues when my log files were rotated. The new log was too small for Splunk to identify (I think it needs 100 or more lines before it attempts rule-based recognition), and our Apache volume isn't super high, so we didn't have enough events and the sourcetype was wrong half the time anyway. Perhaps this has been improved or I was doing something wrong, but in any case, I went back to using a simple [source::/var/log/apache/vhost_access.log] stanza with sourcetype = vhost_access_combined set up on the forwarder on the Apache server, and it solved all these issues quite nicely.
Hope this gives you something to think about.
I suspect they both match and the cookie rule wins on being sorted earlier lexically.
If it's not essential to have the match performed on content, you could create a path-based assignment (source stanza that assigns sourcetype) or an input layer assignment.
Thank you everyone for your answers! All of you were right: "access_combined_wcookie" rule was generic enough and that's why it was always picked. I tightened it up and now "access_combined_wperformance" is used where it should.
The rules for the 'default' sourcetypes are in etc/system/default/sourcetypes.conf. You could override one as disabled in local if you need to.
Awesome trick about "splunk test sourcetype", thank you! It does save time. It still gives me "access_combined_wcookie", even after I wiped out "$SPLUNK_HOME/etc/apps/learned/local/sourcetypes.conf".
By the way, I tested my regex and it did match my data.
Well, you get the point 🙂
Hm.. No line breaks. Let me try again. Here is an example:
10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318 "null" "null" 0
This is a short example from the access log file:
10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318 "null" "null" 0
As you can see, a line always ends with a number. Therefore I thought that my regex for "access_combined_wperformance" was more precise than the one for "access_combined_wcookie".
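For anyone who wants to check the overlap directly, here is a small Python sketch that runs both regexes from this thread against the sample line above. Python's re module is used as a stand-in for Splunk's regex engine, which is an assumption, although these patterns use nothing engine-specific. The function name matching_rules is just for illustration.

```python
import re

# Regex from the default [rule::access_combined_wcookie] stanza quoted in this thread.
WCOOKIE = re.compile(
    r'^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+(?: \S+)? "[^"]*" "[^"]*"'
)

# Regex from the custom [rule::access_combined_wperformance] stanza.
WPERFORMANCE = re.compile(
    r'^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+ \S+ "[^"]*" \d+$'
)

# Sample line posted in this thread.
SAMPLE = '10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318 "null" "null" 0'


def matching_rules(line):
    """Return the names of the rules whose regex matches the given line."""
    rules = {
        "access_combined_wcookie": WCOOKIE,
        "access_combined_wperformance": WPERFORMANCE,
    }
    return [name for name, regex in rules.items() if regex.search(line)]


print(matching_rules(SAMPLE))
```

Both names come back for the sample line. The wcookie pattern has no `$` anchor, so the trailing number does not stop it from matching, and in the wperformance pattern the third `\S+` happily consumes the first `"null"`. So the two rules do overlap on this data, and the default rule needs tightening (or a source:: stanza) for the custom sourcetype to win.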
I believe there's a conflict with the etc/system/default/props.conf rule for [rule::access_combined_wcookie], which (apparently) also matches the data and sets the sourcetype to access_combined_wcookie. However, I'm not sure how this is possible: it seems to me the regexes are different enough that if one rule matches, the other cannot, so I'm not sure that's what's happening.
What do you mean by wins?