I noticed that we have more than 2200 sources listed (and growing). From what I've read, I can use a transformer-based approach to stop the list from growing further. But this raises a couple of questions:
1: What is the downside to having "zillions" of source files listed? Is it a performance consideration, or just a convenience issue from the standpoint of the user being able to select a specific source file?
2: Even though one can correct the issue going forward with a transformer based approach so that the source files won't continue to proliferate, what can be done to collapse the existing 2200 displayed source files, or to remove that clutter from the Sources window on the Search screen?
In general I don't see a performance problem with having thousands of distinct sources. If things get excessive, say one distinct source per event times eleventy gazillion, things may get messy of course.
Concerning user convenience, you may be better off identifying a set of sources by their sourcetype and time range rather than by the particular source, though that depends on your specific use cases of course.
For example, a customer of ours runs large WAS clusters where the source field of most logs contains cluster name, node name, and so on - but a large set of sources share the same sourcetype. If you want them all you just specify that.
You could rewrite the source field of course, but whether that makes sense again depends on your environment. If rewriting means you end up with a 1:1 relationship between source and sourcetype, then it'd be fairly pointless, because the source would no longer tell you anything the sourcetype doesn't.
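Before committing to a rewrite, it can help to prototype the pattern outside of Splunk and see what the result would look like. The following is only a rough sketch in Python, not Splunk configuration: the paths, the sourcetype names, and the regex are all made up for illustration, assuming a layout where cluster and node names are embedded in the source path. It counts distinct sources per sourcetype before and after collapsing those path segments.

```python
import re
from collections import defaultdict

# Hypothetical (source, sourcetype) pairs; real paths will differ.
SAMPLE_EVENTS = [
    ("/opt/was/clusterA/node01/logs/SystemOut.log", "was_systemout"),
    ("/opt/was/clusterA/node02/logs/SystemOut.log", "was_systemout"),
    ("/opt/was/clusterB/node01/logs/SystemOut.log", "was_systemout"),
    ("/opt/was/clusterA/node01/logs/SystemErr.log", "was_systemerr"),
    ("/opt/was/clusterB/node07/logs/SystemErr.log", "was_systemerr"),
]

# Assumed pattern: the two path segments after /opt/was are cluster and node.
NODE_SEGMENTS = re.compile(r"^(/opt/was)/[^/]+/[^/]+(/logs/.+)$")


def collapse_source(source):
    """Replace the cluster and node segments with literal '*' placeholders.

    Sources that do not match the pattern are returned unchanged.
    """
    return NODE_SEGMENTS.sub(r"\1/*/*\2", source)


def sources_per_sourcetype(events, rewrite=None):
    """Count distinct source values per sourcetype, optionally after a rewrite."""
    grouped = defaultdict(set)
    for source, sourcetype in events:
        grouped[sourcetype].add(rewrite(source) if rewrite else source)
    return {sourcetype: len(sources) for sourcetype, sources in grouped.items()}


before = sources_per_sourcetype(SAMPLE_EVENTS)
after = sources_per_sourcetype(SAMPLE_EVENTS, collapse_source)

print("distinct sources per sourcetype, before:", before)
print("distinct sources per sourcetype, after: ", after)
```

With this sample data every sourcetype ends up with exactly one source after the rewrite, which is the 1:1 case described above: the collapsed source adds nothing over the sourcetype, and the cluster and node information is gone unless it is also available in another field.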
By "clutter from the Sources window on the Search screen", are you referring to the sources panel in the Summary (dashboard_live)? Given a large enough environment, that list is pretty useless for starting your searches most of the time. I'd recommend typing ahead straight away instead.
One entirely different thing you could consider to clean up the Summary page is to split things into different indexes and set the default index(es) for each user to what they use most of the time. Then the Summary will only list sources from those indexes.
Thanks much. Guess I won't worry about all the source files then. I was worried that it was reading all the source file names to create that Summary sources panel and that that would cause unnecessary delay.
I suppose we need to delete the index to clean out the existing data. When forwarding the data, we have to explicitly define source/sourcetype to avoid the confusion. The source name is important, though, so we can use blacklist/whitelist to forward only the required files.