I noticed there are two default sourcetypes for Apache logs. However, we are using a different format on our Apache web server (see the LogFormat below). I assume I need to use a regular expression in transforms.conf. Is that correct? If so, where can I see the default sample so I can create the right KV fields in transforms.conf? Thanks!
%t %h \"%{Proxy-Remote-User}i\" \"%{User-Agent}i\" %m %H \"%U\" \"%q\" %>s %b %T
I am not sure what you mean by "the default sample." So here is an example of the configuration files that define a customized sourcetype. You can do this for any sort of input that is in a format that Splunk does not already recognize. I tried to do it for your actual custom log, but I am sure I didn't get it exactly right.
I show the host override below, but you may not need it. If you don't, delete it and things will be more efficient. But - if your Apache log will contain information from a variety of web hosts, you must have the override to make sure that Splunk assigns the proper host name to each event in the data.
The apache_custom_fields stanza in transforms.conf is where the field extraction is actually set up. The fields are defined by a regular expression. I hope I got it right, but I might not have, depending on your actual data. I suggest that you take the regular expression below and put it in a regular expression testing tool. (Note that the regular expression may be line-wrapped below; there is not actually a newline in the regular expression.) Add a sample of your log file and see if the results make sense. You can try http://gskinner.com/RegExr/ but there are others.
Look in the manuals at the following locations for more details:
http://docs.splunk.com/Documentation/Splunk/latest/Data/Whysourcetypesmatter
http://docs.splunk.com/Documentation/Splunk/latest/Data/Createsourcetypes#Edit_props.conf
http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Addfieldsatsearchtime
inputs.conf
[monitor://pathtoyourlogfiles]
sourcetype=apache_custom
props.conf
[apache_custom]
TRANSFORMS-h1=hostoverride
REPORT-r1=apache_custom_fields
transforms.conf
[apache_custom_fields]
REGEX=\] (\S+) "[^"]*" "(.*?)" (OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT) (HTTP\S+) "(.*?)" "(.*?)" (\d{3}) (\d+|-) (\d+)
FORMAT=clientip::$1 useragent::$2 method::$3 protocol::$4 url::$5 uri_query::$6 status::$7 bytes::$8 timetaken::$9
[hostoverride]
DEST_KEY = MetaData:Host
REGEX = ] (\w+)
FORMAT = host::$1
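If you would rather check a REGEX/FORMAT pair on your own machine than in an online tester, here is a minimal Python sketch of the idea. It only imitates the way FORMAT maps name::$n onto capture groups; it is not how Splunk implements it, and Python's re module is not identical to the PCRE engine Splunk uses. The names apply_transform, TAIL_REGEX and TAIL_FORMAT are made up for this illustration, and the pattern is deliberately small (last three fields only). The sample line is the one posted elsewhere in this thread.

```python
import re

def apply_transform(regex, fmt, line):
    """Imitate a REGEX/FORMAT pair: return {field: value}, or None if no match."""
    match = re.search(regex, line)
    if match is None:
        return None
    fields = {}
    for pair in fmt.split():
        name, _, ref = pair.partition("::")
        # "$3" refers to capture group 3
        fields[name] = match.group(int(ref.lstrip("$")))
    return fields

# Sample event posted in this thread.
SAMPLE = ('[21/May/2012:11:50:16 -0400] 10.39.208.3 "my-user-id" '
          '"libwww-perl/5.77" GET HTTP/1.1 "http://www.amazon.com" '
          '"?search-alias%3Daps&field-keywords=ipad+3&sprefix=ipad%2Caps%2C210" '
          '200 495 0')

# Hypothetical, deliberately short pattern: status, bytes and time taken only.
TAIL_REGEX = r'(\d{3}) (\d+|-) (\d+)$'
TAIL_FORMAT = 'status::$1 bytes::$2 timetaken::$3'

fields = apply_transform(TAIL_REGEX, TAIL_FORMAT, SAMPLE)
print(fields)  # {'status': '200', 'bytes': '495', 'timetaken': '0'}
```

Swap in your full REGEX and FORMAT strings and a handful of real log lines. A result of None means the expression did not match that line at all.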
I think if you look earlier in transforms.conf, you will see these expressions. They aren't documented anywhere that I can find, and they aren't any official flavor of regex that I know of. They are a sort of "character class" shorthand that Splunk uses in the sourcetypes that are predefined within Splunk.
Funny you should ask, I just got an answer to this very question a few days ago. 🙂
But that's why I wrote out the regexes in my original example - I couldn't really tell you how to use this syntax correctly.
Cool. Thanks!
One last question: where can I find the definitions of the regex strings, e.g. nspaces, sbstring, etc.?
REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[nspaces:bytes]?[[all:other]]
You can find the default settings for access_combined and other sourcetypes in $SPLUNK_HOME/etc/system/default
You should look specifically at props.conf and transforms.conf
You will find the regular expressions in transforms.conf
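However, you should not make your changes in the default directory. Put them in $SPLUNK_HOME/etc/system/local (or in an app's local directory) instead, so that an upgrade does not overwrite them.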
Thank you! I probably didn't ask my question properly, so let me rephrase it. When I installed Splunk, I saw two sourcetypes for common Apache log files: access_common and access_combined_cookie.
Since my Apache log format is customized, I have to create the regular expression myself.
This part is time-consuming, and it would be great if I could reuse an existing transforms.conf.
My sample event data:
[21/May/2012:11:50:16 -0400] 10.39.208.3 "my-user-id" "libwww-perl/5.77" GET HTTP/1.1 "http://www.amazon.com" "?search-alias%3Daps&field-keywords=ipad+3&sprefix=ipad%2Caps%2C210" 200 495 0
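For what it's worth, here is one way to build a regular expression for that sample line, one piece per LogFormat directive, and check it in Python. Treat it as a sketch under assumptions rather than a finished extraction: the field names timestamp and user are my own picks, the other names follow the FORMAT line in the answer, and it has only been checked against the single sample event above. Python's re module needs (?P<name>...) for named groups, while Splunk's documentation writes them as (?<name>...).

```python
import re

# One sub-pattern per directive, in the order of the LogFormat:
# %t %h "%{Proxy-Remote-User}i" "%{User-Agent}i" %m %H "%U" "%q" %>s %b %T
PARTS = [
    r'\[(?P<timestamp>[^\]]+)\]',  # %t   request time, in square brackets
    r'(?P<clientip>\S+)',          # %h   remote host
    r'"(?P<user>[^"]*)"',          # %{Proxy-Remote-User}i
    r'"(?P<useragent>[^"]*)"',     # %{User-Agent}i
    r'(?P<method>[A-Z]+)',         # %m   request method
    r'(?P<protocol>HTTP/\S+)',     # %H   request protocol
    r'"(?P<url>[^"]*)"',           # %U   requested URL
    r'"(?P<uri_query>[^"]*)"',     # %q   query string
    r'(?P<status>\d{3})',          # %>s  final status
    r'(?P<bytes>\d+|-)',           # %b   "-" when no bytes were sent
    r'(?P<timetaken>\d+)',         # %T   time taken, in seconds
]
LINE_RE = re.compile(r'^' + r'\s+'.join(PARTS) + r'$')

SAMPLE = ('[21/May/2012:11:50:16 -0400] 10.39.208.3 "my-user-id" '
          '"libwww-perl/5.77" GET HTTP/1.1 "http://www.amazon.com" '
          '"?search-alias%3Daps&field-keywords=ipad+3&sprefix=ipad%2Caps%2C210" '
          '200 495 0')

match = LINE_RE.match(SAMPLE)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name} = {value}")
```

Run it over a larger slice of the real log before relying on it. A user agent or URL that contains an escaped double quote, for example, would need a more careful sub-pattern than [^"]*.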