What are the default sourcetypes and how are they determined?

Yancy
Path Finder

Sometimes Splunk sets the sourcetype of an incoming file to breakable_text or too_small. What determines these sourcetypes? Are there other common sourcetypes that Splunk sets?


hulahoop
Splunk Employee

Hi Yancy,

You have several options for setting the sourcetype when you configure a data input.

  1. If you don't set a sourcetype, Splunk will attempt to auto-recognize the data format and assign one. This is why you sometimes get breakable_text or too_small as the sourcetype.
  2. Set a sourcetype manually. Name it anything your heart desires.
  3. Choose from a list of sourcetypes already known to Splunk (e.g. syslog, weblogic_stdout, access_combined). This means you get some configuration out of the box for these sourcetypes, such as field extractions, timestamp recognition, and host identification.
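
The same choices can also be made in configuration files instead of the Manager UI. Here is a minimal sketch of options 2 and 3 in inputs.conf. It assumes hypothetical monitored paths and a hypothetical custom sourcetype name (db2_error), so adjust both to your environment:

```ini
# inputs.conf (paths and the db2_error name are hypothetical examples)

# Option 2: a custom sourcetype name of your choosing
[monitor:///var/log/db2/db2diag.log]
sourcetype = db2_error

# Option 3: a sourcetype Splunk already knows, so its bundled
# settings (e.g. timestamp recognition) apply to this file
[monitor:///var/log/messages]
sourcetype = syslog
```

If you leave the sourcetype setting out of a stanza, the choice falls back to automatic recognition, as in option 1.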

The options above are available when configuring a data input from the Manager UI. But what if you want to do something more advanced? For example, what if you have a directory full of logs in several different data formats? Or what if your syslog server is collecting data from multiple sources with different formats?

More advanced sourcetype configuration is detailed here: http://www.splunk.com/base/Documentation/4.0.11/Knowledge/Aboutsourcetypes (the link refers to version 4.0, but the concepts and configuration also apply to 3.x and 4.1).
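
Here is a rough sketch of what the two advanced cases can look like. It is not taken from the linked page, and all paths, sourcetype names, the transform name and the regex are hypothetical. Stanzas of the form [source::...] in props.conf can assign different sourcetypes to files that share a directory. An index-time transform that writes to MetaData:Sourcetype can re-sourcetype individual events arriving on a shared syslog input:

```ini
# props.conf
# Per-file sourcetypes for a directory holding mixed formats
# (hypothetical file naming convention)
[source::/var/log/mixed/*_access.log]
sourcetype = access_combined

[source::/var/log/mixed/*_db2.log]
sourcetype = db2_error

# Per-event override: run a transform on events typed as syslog
[syslog]
TRANSFORMS-set_st = force_cisco_sourcetype

# transforms.conf
[force_cisco_sourcetype]
# Hypothetical pattern: events containing "%ASA-<n>-<id>" get a new sourcetype
REGEX = %ASA-\d-\d+
FORMAT = sourcetype::cisco_asa
DEST_KEY = MetaData:Sourcetype
```

The transform is applied when the data is parsed, so it only affects events indexed after the change.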

Why is it important to get the sourcetyping right? Organizing your data into sensible sourcetypes makes it easier to apply other configuration, such as field extractions and lookups, and may also simplify access control rules. It also makes for a more powerful and succinct search experience. For example, if you have a repository of web access logs, db2 error logs and syslog, wouldn't it be nice to search just the db2 error logs, or just syslog? Sourcetyping lets you do that.
