sourcetype precedence

yzubarev
Explorer

Greetings,

I introduced a new sourcetype "access_combined_wperformance" but I cannot get it utilized as "access_combined_wcookie" always wins.

Here is my etc/system/local/props.conf:

########## WEBSERVERS ##########

[access_combined_wperformance]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[

########## RULE BASED CONDITIONS ##########

[rule::access_combined_wperformance]
sourcetype = access_combined_wperformance
MORE_THAN_50 = ^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+ \S+ "[^"]*" \d+$
priority = 100

gkanapathy
Splunk Employee

See my comment above. The only default rule that sets a sourcetype to access_combined_wcookie is:

[rule::access_combined_wcookie]
sourcetype = access_combined_wcookie
MORE_THAN_75   = ^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+(?: \S+)? "[^"]*" "[^"]*"

While it's possible for both rules (yours and the default) to match, is this actually the case with your data, i.e., does your data actually end with two double-quoted fields and one unquoted numeric field (vs. what your regex suggests, which is just a single double-quoted field and then an unquoted numeric field)? This is the only way that both rule:: stanzas could match. Are you certain that your regex is actually matching the data?

Also, would it be easier to use source:: rather than rule:: stanzas to specify your sourcetypes?


Lowell
Super Champion

Just some thoughts. This is not really an "answer", but it is more than will fit into a comment.

How are you testing this? Are you just letting Splunk pick up new content after a restart? If so, let me just point out that you can do immediate testing with the following command:

splunk test sourcetype /var/log/apache/my_log_file

This will spit out all the props settings applied to this file. (Finding this utility has saved me countless hours of messing around; so I try to advertise it as much as possible.)

Also, there are times when I've had to go into $SPLUNK_HOME/etc/apps/learned/local/sourcetypes.conf and do some housekeeping. I believe this is because once Splunk identifies the sourcetype of a file, it prefers to stick with that same sourcetype unless something changes, and changing a rule entry may not be a strong enough suggestion. Normally this is the behavior that you want, but in this case you may find that deleting a few entries related to the file in question is helpful. Also note that the utility I mentioned may create or update an entry in the learned folder as well.

Additional thought:

Sometimes it's helpful to come at this from a slightly different perspective. I had a similar issue with my own custom apache logging sourcetype (vhost_access_combined) that was getting confused with access_combined, I think. The issue came down to the rule for access_combined being slightly too loose. So instead of just trying to make a better rule for my vhost_access_combined sourcetype, I found that I had to add an additional rule to the builtin access_combined to make it more restrictive in the first place. In my situation, I knew that my sourcetype would start with a named virtual host whereas access_combined starts with the clientip address, so I could add a new rule to access_combined to make sure that it starts with an IP address, and my own logs could then match against my sourcetype:

[rule::access_combined]
MORE_THAN_66 = ^\d+\.\d+\.\d+\.\d+[ ]

Also note that I'm using MORE_THAN_66 and not MORE_THAN_75, which is used in the base config. We don't want to replace Splunk's builtin definition, only refine it. In my case, I simply added this to my own custom config file. As always, make sure you are making changes like this in a local folder somewhere; I am not suggesting that you modify the entry in $SPLUNK_HOME/etc/system/default/props.conf. Again, I'm not proposing this as a solution to your problem, but as a different approach to consider.
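To make the MORE_THAN_66 idea concrete, here is a minimal Python sketch of how such conditions can be reasoned about. It assumes, as I understand the props.conf spec, that MORE_THAN_N means "more than N percent of lines match the regex" and that every condition in a rule:: stanza must hold. The helper names (percent_matching, rule_applies) and the vhost-style sample line are hypothetical; this models the idea and is not Splunk's implementation.

```python
import re

def percent_matching(lines, pattern):
    """Percentage of lines that the regex matches (0.0 for no lines)."""
    if not lines:
        return 0.0
    rx = re.compile(pattern)
    hits = sum(1 for line in lines if rx.search(line))
    return 100.0 * hits / len(lines)

def rule_applies(lines, more_than):
    """True only if every (threshold, regex) condition is exceeded.

    Assumption: all MORE_THAN conditions in a stanza are required.
    """
    return all(percent_matching(lines, rx) > pct for pct, rx in more_than)

# The refinement from the stanza above: lines must start with an IP address.
IP_PREFIX = (66, r'^\d+\.\d+\.\d+\.\d+[ ]')

# Plain access log lines start with the client IP.
ip_lines = ['10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318'] * 4

# Hypothetical vhost-style lines start with a host name instead.
vhost_lines = [
    'www.example.com 10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318'
] * 4

ip_ok = rule_applies(ip_lines, [IP_PREFIX])
vhost_ok = rule_applies(vhost_lines, [IP_PREFIX])
print("IP-first lines pass the extra condition:", ip_ok)
print("vhost-first lines pass the extra condition:", vhost_ok)
```

Under these assumptions the IP-first lines satisfy the extra condition and the vhost-first lines do not, which is why adding a second condition under a different key name narrows the builtin rule rather than replacing it.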

I should also point out that while this approach worked for me, I ultimately don't rely on it. I found that I ran into issues when my log files were rotated. The new log was too small for Splunk to identify (I think it needs 100 or more lines before it attempts rule-based recognition) and our apache volume isn't super high, so we didn't have enough events and the result was that the sourcetype was wrong half the time anyway. Perhaps this has been improved or I was doing something wrong, but in any case, I went back to using a simple [source::/var/log/apache/vhost_access.log] stanza with sourcetype=vhost_access_combined set up on the forwarder on the apache server, and it solved all these issues quite nicely.

Hope this gives you something to think about.

jrodman
Splunk Employee

I suspect they both match and the cookie rule wins on being sorted earlier lexically.

If it's not essential to have the match performed on content, you could create a path-based assignment (source stanza that assigns sourcetype) or an input layer assignment.


yzubarev
Explorer

Thank you everyone for your answers! All of you were right: the "access_combined_wcookie" rule was generic enough to match my data, and that's why it was always picked. I tightened it up and now "access_combined_wperformance" is used where it should be.

jrodman
Splunk Employee

The rules for the 'default' sourcetypes are in etc/system/default/sourcetypes.conf. You could override one as disabled in local if you need to.


yzubarev
Explorer

Awesome trick about "splunk test sourcetype", thank you! It does save time. It still gives me "access_combined_wcookie" even after I wiped out "$SPLUNK_HOME/etc/apps/learned/local/sourcetypes.conf".

yzubarev
Explorer

By the way, I tested my regex and it did match my data.


yzubarev
Explorer

Well, you get the point 🙂


yzubarev
Explorer

Hm.. No line breaks. Let me try again. Here is an example:

10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318 "null" "null" 0


yzubarev
Explorer

This is a short example from the access log file:

10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318 "null" "null" 0

As you can see, a line always ends with a number. Therefore I thought that my regex for "access_combined_wperformance" was more precise than the one for "access_combined_wcookie".
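For anyone who wants to check the overlap directly, here is a minimal Python sketch that runs both rule regexes from this thread against the sample line above. It assumes Python's re module treats these particular patterns the same way Splunk's regex engine does, which should hold for the simple constructs used here. The variable and function names are illustrative only, and the final sort merely illustrates the suspected lexical ordering; it is not confirmed Splunk behavior.

```python
import re

# Sample line quoted in this thread.
SAMPLE = '10.93.192.7 - - [22/Apr/2010:00:00:50 -0700] "GET / " 200 318 "null" "null" 0'

# Patterns copied from the two rule:: stanzas in this thread.
WPERFORMANCE = r'^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+ \S+ "[^"]*" \d+$'
WCOOKIE = r'^\S+ \S+ \S+ \S* ?\[[^\]]+\] "[^"]*" \S+ \S+(?: \S+)? "[^"]*" "[^"]*"'

def matches(pattern, line):
    """True if the pattern matches somewhere in the line."""
    return re.search(pattern, line) is not None

perf_hit = matches(WPERFORMANCE, SAMPLE)
cookie_hit = matches(WCOOKIE, SAMPLE)
print("wperformance matches:", perf_hit)
print("wcookie matches:", cookie_hit)

# Which sourcetype name sorts first lexically (the suspected tie-break).
first = sorted(["access_combined_wperformance", "access_combined_wcookie"])[0]
print("sorts first:", first)
```

Both patterns match this line. In the wperformance pattern the third \S+ consumes the first "null" (quotes included), and the wcookie pattern has no trailing $ so the final 0 does not stop it from matching. Ending with a number therefore does not make the two rules mutually exclusive.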


gkanapathy
Splunk Employee

I believe there's a conflict with etc/system/default/props.conf rule for [rule::access_combined_wcookie] that (apparently) also matches the data and sets the sourcetype to access_combined_wcookie. However, I'm not sure how this is possible, since it seems to me the regexes are different enough that if one rule matches, the other can not also match, so I'm not sure if that's what's happening.


Dan
Splunk Employee

What do you mean by wins?
