We have a very large environment, and with Splunk charging by the GB/day, we obviously have an interest in controlling what data goes into Splunk and what doesn't.
For the most part, if someone's host is "spamming" Splunk (sending way too much data, bad sourcetypes, etc.), I'll just look up who owns that host and work with them from there.
However, I ran into a problem where a host's Universal Forwarder is configured to identify itself to Splunk as something that is not its proper hostname. It is currently sending 15+ GB/day of garbage syslog data into Splunk.
Aside from blasting out an email to the entire group, how can I find where the data is truly originating? Is it possible to get the IP address the data is actually coming from?
Additionally, on this topic: is it possible to control who can connect to my indexers at all? It seems like anyone can set up a forwarder and point it at our (publicly known) indexer hostname.
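What I'm hoping exists is something like an IP allow list on the receiving port. A rough sketch of the kind of thing I mean, assuming a Splunk version whose inputs.conf supports the acceptFrom ACL on the splunktcp stanza (the subnet below is made up; substitute your own forwarder networks):

# inputs.conf on each indexer: only accept forwarder connections from our server subnet
[splunktcp://9997]
acceptFrom = 10.0.0.0/8

Going further, a splunktcp-ssl input with requireClientCert set would limit connections to forwarders carrying certificates you issued, but that's a much bigger rollout.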
Alright, I kind of answered part of my own question. There has to be a better way of doing this... but it worked for now.
I was able to find the culprit using tcpdump.
tcpdump -A -vv -i eth0 port 9997 > /tmp/splunkdump
My indexers are listening on port 9997 for incoming data. This host in particular was sending a lot of data, so I didn't have to wait that long to kill the dump.
After that I scanned the file for the bogus hostname it was reporting, and saw where the data was really originating from.
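For anyone repeating this, here is a slightly more targeted version of the same idea. It is only a sketch: eth0, port 9997, and bogus-hostname are placeholders for your own values, and it only works because this forwarder wasn't using SSL or compression, so the reported hostname shows up in the clear in the payload.

# Grab a short burst of traffic on the receiving port, with no name resolution.
tcpdump -nn -i eth0 -c 10000 -w /tmp/splunkdump.pcap port 9997

# Replay it with ASCII payloads, find packets that mention the bogus hostname,
# and count the source IP.port pairs they came from. The -B window may pull in
# a few neighboring packets, but the culprit's IP should dominate the counts.
tcpdump -nn -A -r /tmp/splunkdump.pcap \
  | grep -i -B 40 'bogus-hostname' \
  | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}\.[0-9]+ >' \
  | sort | uniq -c | sort -rn

From there, reverse DNS or your asset inventory tells you who actually owns that IP.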
Still-open questions:
Can you elaborate on what you define as spam?
Thanks for the SoS app suggestion. Quite helpful!
I still don't have an answer for the controlling-spam-into-Splunk side of things, though.
Why are you not using SOS, aka the Splunk on Splunk app? It was built for exactly this purpose.