What are the default sourcetypes and how are they determined?

Yancy
Path Finder

Sometimes Splunk sets the sourcetype on an incoming file as breakable_text or too_small. What determines these sourcetypes? Are there other common sourcetypes that Splunk sets?

1 Solution

hulahoop
Splunk Employee

Hi Yancy,

You have several options for setting the sourcetype when you configure a data input.

  1. Leave the sourcetype unset. Splunk will then attempt to auto-recognize the data format and assign a sourcetype itself. This is why you sometimes get breakable_text or too_small as the sourcetype.
  2. Set a sourcetype manually. You can name it anything you like.
  3. Choose from the list of sourcetypes already known to Splunk (e.g. syslog, weblogic_stdout, access_combined). This just means you get some configuration out of the box for these sourcetypes, such as field extractions, timestamp recognition, and host identification.
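
As a rough illustration of option 2, a manual sourcetype can also be set in inputs.conf rather than through the UI. This is only a sketch: the monitored path and the sourcetype name db2_diag are made up for the example, so substitute your own.

```ini
# inputs.conf (illustrative sketch; the path and sourcetype name are hypothetical)
# Monitor a single log file and assign its sourcetype explicitly,
# so Splunk does not fall back to automatic recognition for it.
[monitor:///var/log/db2/db2diag.log]
sourcetype = db2_diag
```

To use one of the known sourcetypes from option 3 instead, the same setting would simply name it, for example sourcetype = syslog.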

The options above are available when configuring a data input from the Manager UI. But what if you want to do something more advanced? For example, what if you have a directory full of logs in several different data formats? Or what if your syslog server is collecting data from multiple sources with different formats?

More advanced sourcetype configuration is detailed here: http://www.splunk.com/base/Documentation/4.0.11/Knowledge/Aboutsourcetypes (The link refers to version 4.0, but the concepts and configuration also apply to 3.x and 4.1.)
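
For the "directory full of mixed logs" case, one approach is a source-based rule in props.conf that assigns a sourcetype according to the file path. The sketch below assumes a hypothetical /var/log/myapp directory, and the file name patterns and the db2_error sourcetype name are invented for illustration. Check the documentation for your version before relying on it.

```ini
# props.conf (illustrative sketch; paths, patterns and db2_error are hypothetical)
# Files in the same monitored directory receive different sourcetypes
# depending on which source pattern their path matches.
[source::/var/log/myapp/*access*.log]
sourcetype = access_combined

[source::/var/log/myapp/*db2*.log]
sourcetype = db2_error
```

As far as I know, a sourcetype set explicitly on the input itself takes precedence over a source-based rule like this, so leave it unset on the input if you want the rules above to apply.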

Why is it important to get the sourcetyping correct? Organizing your data into sensible sourcetypes makes it easier to apply other configuration, such as field extractions and lookups, and may also simplify access control rules. It also makes for a more powerful and succinct search experience. For example, if you have a repository of web access logs, db2 error logs, and syslog, wouldn't it be nice if you could search only the db2 error logs, or only syslog? Sourcetyping allows you to do so.
