Getting Data In

sourcetype best practices

a212830
Champion

Hi,

I'm looking for some help on sourcetype naming. I have a bunch of logfiles - some apache error logs, some apache access logs, some custom application error logs. I want to give my customers an easy way to search on these logs (there will be dozens of them). Should I use a pretrained source type? Wouldn't that make it more difficult to search on the logs? If I use a custom sourcetype (say "appname"), will Splunk recognize the logfile formats?

hexx
Splunk Employee

The sourcetype is a metadata field attached to every event that describes the nature of the data. It typically tells us something about the structure of the data rather than its precise origin. "Where is this data coming from?" is a question best answered with the 'host' and 'source' metadata fields. The sourcetype, by contrast, is there to answer "What kind of data is this?".

For that reason, I would not recommend assigning the same sourcetype to access logs and application logs, for example. You are probably better off using a pre-trained sourcetype whenever one is available, such as 'access_common' or 'access_combined' for HTTPD access logs. This brings the benefit of pre-packaged field extractions, among other things.

Note that most pre-trained sourcetypes are defined in $SPLUNK_HOME/etc/system/default/props.conf.
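
To make that concrete, here is a minimal sketch of an inputs.conf that gives each kind of log its own sourcetype. The monitored paths and the 'myapp_error' name are made up for illustration; 'access_combined' and 'apache_error' are pre-trained sourcetypes, but check that they match your actual log formats before relying on them.

```ini
# inputs.conf (sketch; all paths below are hypothetical)

# Apache access logs: reuse a pre-trained sourcetype for its field extractions
[monitor:///var/log/httpd/access_log]
sourcetype = access_combined

# Apache error logs: a different kind of data, so a different sourcetype
[monitor:///var/log/httpd/error_log]
sourcetype = apache_error

# Custom application error logs: no pre-trained match, so use a custom name
# ("myapp_error" is an assumed name, not something Splunk ships with)
[monitor:///opt/myapp/logs/error.log]
sourcetype = myapp_error
```

With a custom name like 'myapp_error', Splunk will not apply any pre-packaged field extractions, so you would define those yourself in a local props.conf.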

ChrisG
Splunk Employee

I also recommend reading http://docs.splunk.com/Documentation/Splunk/latest/Data/Whysourcetypesmatter and the topics that follow it.

hexx
Splunk Employee

There are many ways to do this, and it really depends on what defines the set of events that you want your search to return. You can use:

- Wildcards in your search terms
- Eventtypes
- Tags
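
As a rough sketch of the eventtype and tag options, the stanzas below group several sourcetypes under one name so nobody has to type each host or logfile. The eventtype name 'myapp_logs', the tag 'myapp', and the 'myapp_error' sourcetype are assumptions for illustration only; substitute whatever sourcetypes your inputs actually use.

```ini
# eventtypes.conf (sketch; "myapp_logs" is a hypothetical eventtype name)
[myapp_logs]
search = sourcetype=access_combined OR sourcetype=apache_error OR sourcetype=myapp_error

# tags.conf (sketch; "myapp" is a hypothetical tag)
# Tags every event matching the eventtype above
[eventtype=myapp_logs]
myapp = enabled
```

Your customers could then search with eventtype=myapp_logs or tag=myapp. For the wildcard option, something like sourcetype=myapp_* would match every sourcetype sharing that prefix, provided you name your custom sourcetypes consistently.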

a212830
Champion

Thanks, this helps me understand the usage better. Still, if I have dozens of logfiles, across multiple hosts, and I want to search them, how would I easily do that? I don't want to type in each host or logfile - that's a lot of work. Is there an alias, or something like that?
