Splunk allows you to assign host, source, and sourcetype (metadata) to all indexed events. These can be set up statically or dynamically in inputs.conf, and they can be changed by transforms at index time. What techniques can you use to verify the accuracy of these metadata fields? More specifically, if an event comes in and the metadata seems wrong, what techniques can be used to identify and resolve this kind of problem?
Every deployment is different, so I made up a more specific scenario to illustrate the problem. This one focuses on the accuracy of the host field.
Let's say a Splunk system consists of 1000 servers, each with a Universal Forwarder. All events are forwarded over SSL to one of 4 indexers using automatic load balancing. One of the servers (running a UF) has been compromised, and the security team suspects an issue because of an increase in bogus anomalies being reported by Splunk. The Splunk admin noticed that data has recently started coming in for new hosts which don't yet have a UF deployed. Weird. Upon further investigation, the only events for these hosts are "bogus" and frequently trigger false positives. The Splunk admin team suspects that a UF is lying about its "host" and is intentionally sending bogus data, presumably to cover up its true intentions by causing distractions. You're a Splunk admin: how do you track down which of the 1000 UFs has been compromised? (You have Splunk admin and root access to the Splunk indexers, but not to the other servers on the network.)
As a complement to other ideas: use the acceptFrom attribute in inputs.conf on the indexers. Each indexer will have a [splunktcp://<port>] stanza. Add acceptFrom to this stanza to limit the forwarder connections. I am not sure which protocol layer is used here, but I think it is TCP, not Splunk settings, that determines this.
acceptFrom = <network_acl> ...
* Lists a set of networks or addresses to accept data from. These rules are separated by commas or spaces.
* Each rule can be in the following forms:
  1. A single IPv4 or IPv6 address (examples: "10.1.2.3", "fe80::4a3")
  2. A CIDR block of addresses (examples: "10/8", "fe80:1234/32")
  3. A DNS name, possibly with a '*' used as a wildcard (examples: "myhost.example.com", "*.splunk.com")
  4. A single '*' which matches anything
* Entries can also be prefixed with '!' to cause the rule to reject the connection. Rules are applied in order, and the first one to match is used. For example, "!10.1/16, *" will allow connections from everywhere except the 10.1.*.* network.
* Defaults to "*" (accept from anywhere)
Of course, you need to be careful that this list of accepted connections corresponds to the list of forwarders in your forwarder management (or deployment server) configuration. Otherwise, you could configure forwarders to send data to indexers, while the indexers are not configured to accept connections from the forwarders - yikes!
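As a sketch, an indexer-side stanza might look like this (the port and networks are examples; substitute your own splunktcp port and the subnets your forwarders actually live on):

```
# inputs.conf on each indexer (example values)
[splunktcp://9997]
# Reject the suspect subnet, accept everything else,
# following the rule-ordering shown in the spec above.
acceptFrom = !10.1.0.0/16, *
```

Since rules are evaluated in order and the first match wins, a tighter policy would list only your known forwarder networks instead of ending with "*".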
Yeah, this would have to be at the TCP layer, controlling the inbound connections to the splunktcp port. Fundamentally, I don't think this would stop a forwarder from lying about itself, would it? And the same is true with requireClientCert. Once a forwarder is connected (after coming from an approved network source and/or presenting the right cryptographic signature), it can say whatever it wants about who it is. (Much like email or snail mail can carry a bogus "from" address.)
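One way to start tracking down the liar from the indexer side is to cross-reference the hostname each forwarder reports against the TCP source address it actually connected from, using the indexers' own _internal index. A sketch (field names as they appear in metrics.log tcpin_connections events; verify against your version):

```
index=_internal source=*metrics.log* group=tcpin_connections
| stats latest(sourceIp) AS sourceIp latest(version) AS version BY hostname
```

A reported hostname resolving to an unexpected sourceIp, or one sourceIp appearing under many hostnames, points at the machine sending the bogus events.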
Deploy an app with a scripted input that outputs the true hostname, then look for mismatches between the host field and the hostname reported in the scripted input's events.
Or deploy an app with a scripted input that retroactively searches for the log entry in question (message, time, type, etc.) and then reports "host found!" along with the actual computer name.
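A minimal sketch of the first idea, as a scripted input the deployment server could push to every UF (the script name and event format are my own invention, not a standard):

```python
#!/usr/bin/env python3
# Hypothetical scripted input: emit one event carrying the OS-reported
# hostname, so it can be compared to Splunk's "host" metadata at search time.
import socket
import time

def report_true_hostname():
    # Each line written to stdout becomes one indexed event on the UF.
    event = "%s true_hostname=%s" % (
        time.strftime("%Y-%m-%dT%H:%M:%S"),
        socket.gethostname(),
    )
    print(event)
    return event

if __name__ == "__main__":
    report_true_hostname()
```

At search time you can then flag any event from this input whose host field does not match its true_hostname value; a compromised UF spoofing host would stand out (unless, as noted below, it simply stops running the app).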
Nice idea. Be sure to create a serverclass of "everyone" to make sure that every client gets the app.
Although it would be possible for a rogue forwarder to avoid this app simply by turning off the deployment client configuration.