Getting Data In

What are the default sourcetypes and how are they determined?

Yancy
Path Finder

Sometimes Splunk sets the sourcetype on an incoming file as breakable_text or too_small. What determines these sourcetypes? Are there other common sourcetypes that Splunk sets?

Tags (2)
1 Solution

hulahoop
Splunk Employee
Splunk Employee

Hi Yancy,

You have several options for configuring sourcetype when configuring a data input.

  1. If a sourcetype is not set, Splunk will attempt to auto-recognize the data format and assign one. This is why you sometimes get breakable_text or too_small as the sourcetype.
  2. Set a manual sourcetype. Name it anything your heart desires.
  3. Choose from a list of sourcetypes already known to Splunk (e.g. syslog, weblogic_stdout, access_combined). This just means you get some configuration out of the box for these sourcetypes, such as field extractions, timestamp recognition, host identification).

The options above are available when configuring a data input from the Manager UI. But what if you want to do something more advanced? For example, if you have a directory full of logs and the logs have several different data formats? Or what if your syslog server is collecting data from multiple sources with different formats?

More advanced sourcetype configuration is detailed here: http://www.splunk.com/base/Documentation/4.0.11/Knowledge/Aboutsourcetypes (The link refers to version 4.0 but concept and configuration are applicable to 3.x and 4.1.)

Why is it important to get the sourcetyping correct? Organizing your data into sensible sourcetypes makes it easier to apply other configuration such as field extractions and lookups, and may also simplify rules for access controls. It will also make for a more powerful and succinct search experience. For example, if you have a repository of web access logs, db2 error logs and syslog, wouldn't it be nice if you could simply search on just db2 error logs, or just syslog? Sourcetyping will allow you to do so.

View solution in original post

hulahoop
Splunk Employee
Splunk Employee

Hi Yancy,

You have several options for configuring sourcetype when configuring a data input.

  1. If a sourcetype is not set, Splunk will attempt to auto-recognize the data format and assign one. This is why you sometimes get breakable_text or too_small as the sourcetype.
  2. Set a manual sourcetype. Name it anything your heart desires.
  3. Choose from a list of sourcetypes already known to Splunk (e.g. syslog, weblogic_stdout, access_combined). This just means you get some configuration out of the box for these sourcetypes, such as field extractions, timestamp recognition, host identification).

The options above are available when configuring a data input from the Manager UI. But what if you want to do something more advanced? For example, if you have a directory full of logs and the logs have several different data formats? Or what if your syslog server is collecting data from multiple sources with different formats?

More advanced sourcetype configuration is detailed here: http://www.splunk.com/base/Documentation/4.0.11/Knowledge/Aboutsourcetypes (The link refers to version 4.0 but concept and configuration are applicable to 3.x and 4.1.)

Why is it important to get the sourcetyping correct? Organizing your data into sensible sourcetypes makes it easier to apply other configuration such as field extractions and lookups, and may also simplify rules for access controls. It will also make for a more powerful and succinct search experience. For example, if you have a repository of web access logs, db2 error logs and syslog, wouldn't it be nice if you could simply search on just db2 error logs, or just syslog? Sourcetyping will allow you to do so.

Get Updates on the Splunk Community!

.conf23 Registration is Now Open!

Time to toss the .conf-etti 🎉 —  .conf23 registration is open!   Join us in Las Vegas July 17-20 for ...

Don't wait! Accept the Mission Possible: Splunk Adoption Challenge Now and Win ...

Attention everyone! We have exciting news to share! We are recruiting new members for the Mission Possible: ...

Unify Your SecOps with Splunk Mission Control

In today’s post, I'm excited to share some recent Splunk Mission Control innovations. With Splunk Mission ...