Splunk Search

How can I get rid of thousands of automatically created sourcetypes

markgo
Engager

I've had the misfortune of feeding 30K input files of Amazon S3 CloudFront logs into my live Splunk instance without specifying a sourcetype.

This has created a serious problem: the bizarre headers that Amazon likes to use have produced thousands of automatically created variants of sourcetype-too-small (note that the REAL data does not cause this issue).

As a result, performance has slowed to a crawl.

I've deleted the "bad" events, but is there something I can do about the bad automatically created sourcetypes?

As to why I didn't notice this: it didn't become a problem until the number of sourcetypes grew to a prodigious value. And since my searches excluded the bad events, I never noticed the sourcetypes.

MuS
SplunkTrust

Hi markgo

I recently fixed that by adding this to my props.conf & transforms.conf:

**props.conf**
[default]
TRANSFORMS-meta = fix_auto_source

**transforms.conf**
[fix_auto_source]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Source
REGEX = ^(/.*|.:.*)
FORMAT = source::splunktcp://25000

This rewrites every automatically created source that looks like a file path (starting with / or with a drive letter such as C:) to splunktcp://25000.

Hope this helps a bit, and don't forget to change the regex to match your pattern.
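Since the question is about sourcetypes rather than sources, the same index-time mechanism can be pointed at the sourcetype key instead. The following is only a sketch, not a tested config: it assumes the CloudFront files arrive with a source path containing "cloudfront", and both the transform name force_cloudfront_sourcetype and the sourcetype cloudfront_access are made-up placeholders to replace with your own.

```ini
# props.conf
# Assumption: the source of the S3 CloudFront inputs contains "cloudfront".
[source::...cloudfront...]
TRANSFORMS-force_st = force_cloudfront_sourcetype

# transforms.conf
# Hypothetical transform name. REGEX = . matches every event from that source,
# so all of them get the one fixed sourcetype instead of an auto-generated one.
[force_cloudfront_sourcetype]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::cloudfront_access
```

Like any index-time transform, this has to live on the instance that parses the data (indexer or heavy forwarder) and only affects events indexed after the change. Events already in the index keep the sourcetype they were given.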

regards
