How to define the source type by the prefix of the filename?

I have the following log structure. Splunk is configured to monitor the /var/logs directory, and the host is defined by path segment #3. How do I define the source type by the filename prefix (aaa and bbb)?





Regards, Eric

Re: How to define the source type by the prefix of the filename?


If you already know the various prefixes you have, you can accomplish this via props.conf entries.

Edit splunk_install_dir/etc/system/local/props.conf, adding sections like the following:

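# The source patterns below assume the files live somewhere under /var/logs; adjust them to your layout.
[source::/var/logs/.../aaa*]
sourcetype = aaa

[source::/var/logs/.../bbb*]
sourcetype = bbb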

Not sure if there is a way to automatically extract from the source via a regex to then populate the sourcetype (but maybe there is).

Hope that helps,



Re: How to define the source type by the prefix of the filename?

This can be done automatically using a transformer. For example, you could use a props.conf entry like this:

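# The source pattern below assumes everything under /var/logs; adjust it to your path.
[source::/var/logs/...]
sourcetype = my_dynamic_sourcetype

[my_dynamic_sourcetype]
# Place your props.conf settings here:
TRANSFORMS-sourcetype = dynamic_sourcetype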

Then in your transforms.conf, put an entry like this:

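[dynamic_sourcetype]
# Use the non-digits portion of the filename as the sourcetype:  "bbb20100809.log" becomes  "bbb"
# The capture is non-greedy (\w+?) so that the trailing digits are left for \d* to consume.
SOURCE_KEY = MetaData:Source
DEST_KEY   = MetaData:Sourcetype
REGEX      = (?i)source::.*[/\\](\w+?)\d*(\.log)?$
FORMAT     = sourcetype::$1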

So basically, this approach sets the sourcetype of all your incoming logs (within the given path) to a temporary sourcetype of "my_dynamic_sourcetype". This is the sourcetype that is used to determine all your index-time settings. Before your event is actually indexed, it is passed to the "dynamic_sourcetype" transformer, which updates the "sourcetype::" key to a modified value, which in this case is extracted from the path of your source file: either "aaa" or "bbb" in your example. So you should never see any events matching sourcetype=my_dynamic_sourcetype (unless your transformer is broken); you should simply search for sourcetype=aaa or sourcetype=bbb within Splunk.
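If you want to sanity-check the regex before touching your indexer, here is a rough Python sketch. It is not Splunk; it just uses Python's re module as a stand-in for Splunk's regex engine, and the helper name derive_sourcetype and the sample paths are made up for illustration. It assumes a non-greedy (\w+?) capture so that trailing digits are left for \d*; with a greedy \w+ the date would end up inside the captured sourcetype.

```python
import re

# Stand-in for the transforms.conf REGEX (assumption: non-greedy \w+? capture).
SOURCETYPE_RE = re.compile(r"(?i)source::.*[/\\](\w+?)\d*(\.log)?$")


def derive_sourcetype(source_key, default="my_dynamic_sourcetype"):
    """Return the sourcetype the transform would assign (hypothetical helper).

    source_key mimics the MetaData:Source value, i.e. "source::<path>".
    If the regex does not match, the temporary sourcetype is kept.
    """
    match = SOURCETYPE_RE.search(source_key)
    if match is None:
        return default
    return match.group(1)


if __name__ == "__main__":
    samples = [
        "source::/var/logs/host1/aaa20100809.log",
        "source::/var/logs/host2/bbb20100809.log",
        "source::/var/logs/host1/messages.1.gz",  # no match, keeps the default
    ]
    for src in samples:
        print(src, "->", derive_sourcetype(src))
```

Anything that falls through to my_dynamic_sourcetype in a test like this is a file name the transform would not rename, so it is worth running your real file names through it first.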

Keep in mind that only the settings in your my_dynamic_sourcetype stanza will be used for index-time settings. This means that you cannot make a stanza named [aaa] and give that sourcetype a different timestamp format, for example. It is possible to create an [aaa] stanza for field extractions (or any other search-time properties), and that will work fine, but you should really understand the differences between index-time and search-time properties if you're going to use this approach.

My basic rule of thumb is to only use this type of sourcetype-renaming transformer trick if (1) all of the log files can use the exact same index-time props settings (in other words, they all have the same timestamp format, same line-breaking logic, same timezone, same character set, etc.), so the files all look very similar in format but may differ in content and meaning; and (2) it's not possible to know all of the possible sourcetype names ahead of time, but you want each different log name to be stored with a different sourcetype name. Unless both of these conditions are true, I would recommend building individual sourcetypes for each new file type you encounter (which is the normal case), but there are times when this transformer approach is the best option.

Side note:

It looks like your source also contains a timestamp/date within the file names. I've found it very helpful to have Splunk rip out that date whenever it's indexing log files. (I don't want to see tons of similarly named log files, all with different dates; I'd rather just see the base name of the log file within the Splunk world.) The solution to this is very similar to the one shown above: you can add a transformer to strip out the dates with a transforms.conf entry like this:

[source_clean-YYYYMMDD]
# Remove a 'YYYYMMDD' style date from the filename
#   jakarta_service_20091007.log --> jakarta_service.log
#   jakarta_service.log_20091007 --> jakarta_service.log
#   jakarta_service.log.20091007 --> jakarta_service.log
#   jakarta_service.log-20091007 --> jakarta_service.log
SOURCE_KEY = MetaData:Source
DEST_KEY   = MetaData:Source
REGEX      = source::(.*)([-._ ]\d{8})(|\D.*)$
FORMAT     = source::$1$3

You would simply add the line TRANSFORMS-source-drop-date = source_clean-YYYYMMDD to your [my_dynamic_sourcetype] props.conf entry, or to whatever other sourcetype you would like to apply this to.
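For what it's worth, the date-stripping regex can be tried out the same way. The sketch below uses Python's re module as a stand-in for Splunk's regex engine; the helper name strip_date and the /var/logs prefix on the sample paths are assumptions for illustration only. It rebuilds the source value from groups 1 and 3, which is what FORMAT = source::$1$3 does.

```python
import re

# Same pattern as the transforms.conf REGEX for dropping a YYYYMMDD date.
DATE_RE = re.compile(r"source::(.*)([-._ ]\d{8})(|\D.*)$")


def strip_date(source_key):
    """Mimic FORMAT = source::$1$3 (hypothetical helper).

    Returns the source key unchanged when no 8-digit date is found.
    """
    match = DATE_RE.search(source_key)
    if match is None:
        return source_key
    return "source::" + match.group(1) + match.group(3)


if __name__ == "__main__":
    samples = [
        "source::/var/logs/jakarta_service_20091007.log",
        "source::/var/logs/jakarta_service.log_20091007",
        "source::/var/logs/jakarta_service.log.20091007",
        "source::/var/logs/jakarta_service.log-20091007",
    ]
    for src in samples:
        print(src, "->", strip_date(src))
```

All four sample names should collapse to the same jakarta_service.log source, which is the whole point of the transform.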