Splunk Search

How can I get rid of thousands of automatically created sourcetypes?

markgo
Engager

I've had the misfortune of feeding 30K input files from Amazon S3 CloudFront logs into my live Splunk instance without specifying a sourcetype.

This has caused a serious problem: Splunk automatically created thousands of variants of sourcetype-too-small from the bizarre headers that Amazon likes to use (note that the REAL data does not cause this issue).

As a result, performance has slowed to a crawl.

I've deleted the "bad" events, but is there something I can do about the bad automatically created sourcetypes?

As to why I didn't notice this: it didn't become a problem until the number of sourcetypes grew to a prodigious value. And since my searches excluded the bad events, I never noticed the sourcetypes.

MuS
Legend

Hi markgo

I recently fixed that by adding this to my props.conf & transforms.conf:

**props.conf**
[default]
TRANSFORMS-meta = fix_auto_source

**transforms.conf**
[fix_auto_source]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Source
REGEX = ^(/.*|.:.*)
FORMAT = source::splunktcp://25000

This changes all those automatically created sources to splunktcp://25000.

Hope this helps a bit, and don't forget to change the regex to match your pattern.
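
One way to check an adjusted pattern before deploying it is to run it against a few sample values outside Splunk. The following is only a minimal sketch in Python, not part of the configuration above: the sample values and the `matches` helper are hypothetical, and it assumes Python's `re` module behaves like Splunk's regex engine for a pattern this simple. It tests bare paths; whether Splunk hands the `MetaData:Source` value to the regex with a `source::` prefix is worth verifying in the transforms.conf documentation for your version, so a prefixed sample is included for comparison.

```python
import re

# Pattern copied from the [fix_auto_source] stanza above.
PATTERN = re.compile(r"^(/.*|.:.*)")


def matches(source_value: str) -> bool:
    """Return True if the pattern matches at the start of the value."""
    return PATTERN.search(source_value) is not None


# Hypothetical sample values; replace them with sources from your own data.
samples = [
    "/var/log/cloudfront/example.log",  # Unix-style absolute path
    "C:\\logs\\cloudfront.log",         # Windows drive-letter path
    "splunktcp://25000",                # already rewritten value
    "source::/var/log/example.log",     # value carrying a source:: prefix
]

for value in samples:
    print(f"{value!r}: {matches(value)}")
```

With the pattern as posted, the two bare paths match, while the already rewritten value and the prefixed value do not. If the prefixed form is what your instance actually sees, the pattern would need to account for it.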

regards
