I have three kinds of devices: device1, device2, and device3.
All of them send their log data via syslog to a Splunk Heavy Forwarder (HF) that acts as a syslog collector for all the devices that cannot run a Universal Forwarder.
The HF processes the data (sets the sourcetype and index) and forwards it to the Splunk Indexers. The HF processes and forwards only the data received from the device1, device2, or device3 IPs. If anyone else sends anything to the HF syslog input, that data is dropped (license limits, and I also want control over what gets sent to the indexers).
Currently I'm filtering stuff based on the device IP, but please do not focus on that. The REGEX filtering could (and most likely will) be done based on something else.
I have the following configuration:
# inputs.conf
[udp://514]
index = nullIndex
connection_host = ip
disabled = 0
# props.conf
[source::udp:514]
TRANSFORMS-010-device1 = device1_sourcetype, device1_index
TRANSFORMS-020-device2 = device2_sourcetype, device2_index
TRANSFORMS-030-device3 = device3_sourcetype, device3_index
TRANSFORMS-999-drop_everything = drop_null_index
# transforms.conf
[device1_index]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.10\.(1|2)$
DEST_KEY = _MetaData:Index
FORMAT = device1_i

[device1_sourcetype]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.10\.(1|2)$
DEST_KEY = MetaData:Sourcetype
FORMAT = device1_st

[device2_index]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.20\.(1|2)$
DEST_KEY = _MetaData:Index
FORMAT = device2_i

[device2_sourcetype]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.20\.(1|2)$
DEST_KEY = MetaData:Sourcetype
FORMAT = device2_st

[device3_index]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.30\.(1|2)$
DEST_KEY = _MetaData:Index
FORMAT = device3_i

[device3_sourcetype]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.30\.(1|2)$
DEST_KEY = MetaData:Sourcetype
FORMAT = device3_st

[drop_null_index]
REGEX = ^nullIndex$
SOURCE_KEY = _MetaData:Index
DEST_KEY = queue
FORMAT = nullQueue
# outputs.conf
[tcpout]
defaultGroup = splunk_indexers
disabled = 0

[tcpout:splunk_indexers]
server = splunk-indexers.local:9997
maxQueueSize = 500MB
useACK = true
disabled = 0
Pretty much all incoming data is marked as nullIndex right away in inputs.conf; then, in props.conf and transforms.conf, events from the desired devices are re-marked with their respective sourcetypes and indexes, and everything else is dropped.
I am not sure if this is the BEST configuration (I've spent some time trying to get my head around this), but it certainly WORKS for me quite well.
However, the problem is that device1 creates about 50 GB of log data per day. That means 50 GB of data goes through seven REGEX filters, even though only the first two apply to device1 data. From a performance perspective, I would like to avoid the other five REGEX filters.
So here's my question: is there any way to tell Splunk, right after finishing the TRANSFORMS-010-device1 line in props.conf, to skip all remaining transformations and send the device1 data straight to the Splunk Indexers?
Something similar to the discard action (the tilde, ~) in RSYSLOG configuration:
:fromhost-ip, isequal, "192.168.10.12" @@(o)syslog.local:6514
& ~
:fromhost-ip, isequal, "192.168.10.13" @@(o)syslog.local:6514
& ~
:fromhost-ip, isequal, "192.168.10.27" @@(o)syslog.local:6514
& ~
:fromhost-ip, isequal, "192.168.10.28" @@(o)syslog.local:6514
& ~
however, not to discard the data, but to remove it from any further processing other than sending it to the Splunk Indexers.
So my props.conf would look something like this (?) :
[source::udp:514]
TRANSFORMS-010-device1 = device1_sourcetype, device1_index, device1_fast_exit
TRANSFORMS-020-device2 = device2_sourcetype, device2_index, device2_fast_exit
TRANSFORMS-030-device3 = device3_sourcetype, device3_index, device3_fast_exit
TRANSFORMS-999-drop_everything = drop_null_index
where device1_fast_exit would be (pseudo-code) "send the device1 log data immediately to the Splunk Indexers and do not process that data with the TRANSFORMS-020-device2 and TRANSFORMS-030-device3 lines."
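Spelled out as pseudo-config, the hypothetical transform would look something like this (to be clear: as far as I know there is no such FORMAT value — the queue DEST_KEY only accepts nullQueue and indexQueue — this is just the behavior I wish existed):

```
# transforms.conf -- PSEUDO-CONFIG, not valid Splunk syntax:
# "sendQueue" does not exist; queue only accepts nullQueue or indexQueue
[device1_fast_exit]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.10\.(1|2)$
DEST_KEY = queue
FORMAT = sendQueue
```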
Not sure if there is a discard option like you described. Seems like it would be useful.
I haven't tested the following solution, but based on the docs I think it might work.
What if you created a separate UDP input for each host you want to accept data from? You can then set the index and sourcetype directly in the input stanza. You also get the bonus of no longer needing to send data to the nullQueue:
# inputs.conf -- one stanza per allowed host (headers reconstructed;
# I'm assuming the device IPs and index names from your post)
[udp://192.168.10.1:514]
index = device1_i
sourcetype = device1_st
connection_host = ip
disabled = 0

[udp://192.168.20.1:514]
index = device2_i
sourcetype = device2_st
connection_host = ip
disabled = 0

[udp://192.168.30.1:514]
index = device3_i
sourcetype = device3_st
connection_host = ip
disabled = 0
Unfortunately, this would not solve my problem, even though it would definitely work in an environment with a small number of remote clients. The solution you suggest is actually the one I started with, back when I had just a couple of network nodes.
The configuration example in my post is only a small portion of my actual configuration. My environment now has hundreds of client nodes scattered across multiple subnets. And I believe the UDP input stanza does not allow regular expressions, though I could be wrong.
But as I mentioned in my post, I'd rather not focus on filtering by IP address (I was expecting someone would suggest the solution you mentioned above). The filtering can and will also be based on a text string in the log message.
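For example, a content-based variant of my device1_index transform could look something like this (just a sketch; "device1-app-tag" is a placeholder for whatever string actually identifies device1 events, and when SOURCE_KEY is omitted, transforms match REGEX against _raw, i.e. the event text):

```
# transforms.conf -- hypothetical content-based routing (sketch);
# SOURCE_KEY omitted, so REGEX runs against _raw (the raw event text);
# "device1-app-tag" is a placeholder string, not my real pattern
[device1_index]
REGEX = device1-app-tag
DEST_KEY = _MetaData:Index
FORMAT = device1_i
```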
But I really appreciate your help. Thank you.
I can't think of a way to do it, but if you are concerned about performance (and you are right to be), you can switch to Heavy Forwarders and pull the performance hit away from the Indexer cluster, redistributing it across your (much larger) Forwarder pool.
Thank you for your response, woodcock. That is actually what I have in my Splunk environment. I had it in my post before I published it here, but I thought it would be redundant information, so I removed that part. I guess it might be good to share it as well. So this is what my Splunk environment looks like:
syslog clients -> HF cluster -> Indexer cluster <- Search Head
- 2 HF nodes with identical configuration managed by Puppet
- HA mode with one vIP managed by keepalived
- 2 Indexer nodes managed by Cluster Master
- Splunk cluster with DNS RR
The HF cluster is pretty much a central syslog collector, and HA mode guarantees that it's always present on the network. All the configuration stanzas in my example above are from my HF.
I was hoping there would be something like a sendQueue (in addition to the existing nullQueue and indexQueue) that would remove a log event from any further processing and send it straight to tcpout. Or a workaround that would help me achieve the same or a similar result.
I think you have it pretty much the only/best way right now, but I definitely like your idea of Splunk adding a dispatchQueue that removes events from all further processing and dumps them directly into the indexQueue. You should definitely request this feature.