Is there a configuration to have Splunk exit from further PROPS / TRANSFORMS processing?

ontkanin
Path Finder

Hi there,

I have 3 kinds of devices:

  • device1 (IP: 192.168.10.12, 192.168.10.13, 192.168.10.27, 192.168.10.28)
  • device2 (IP: 192.168.20.12, 192.168.20.13, 192.168.20.27, 192.168.20.28)
  • device3 (IP: 192.168.30.12, 192.168.30.13, 192.168.30.27, 192.168.30.28)

All of them send their log data via syslog to a Splunk Heavy Forwarder (HF) that acts as a syslog collector for all the devices that cannot run a Universal Forwarder.

The HF processes the data (sets the sourcetype and index) and forwards it to the Splunk Indexers. It processes and forwards only data received from device1, device2, or device3 IPs. Anything else sent to the HF syslog port is dropped, both because of license limits and because I want control over what is sent to the indexers.

Currently I'm filtering stuff based on the device IP, but please do not focus on that. The REGEX filtering could (and most likely will) be done based on something else.

I have the following configuration:

inputs.conf

[udp://514]
index = nullIndex
connection_host = ip
disabled = 0

props.conf

[source::udp:514]
TRANSFORMS-010-device1 = device1_sourcetype, device1_index
TRANSFORMS-020-device2 = device2_sourcetype, device2_index
TRANSFORMS-030-device3 = device3_sourcetype, device3_index
TRANSFORMS-999-drop_everything = drop_null_index

transforms.conf

[device1_index]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.10\.(1[23]|2[78])$
DEST_KEY = _MetaData:Index
FORMAT = device1_i

[device1_sourcetype]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.10\.(1[23]|2[78])$
DEST_KEY = MetaData:Sourcetype
FORMAT = device1_st

[device2_index]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.20\.(1[23]|2[78])$
DEST_KEY = _MetaData:Index
FORMAT = device2_i

[device2_sourcetype]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.20\.(1[23]|2[78])$
DEST_KEY = MetaData:Sourcetype
FORMAT = device2_st

[device3_index]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.30\.(1[23]|2[78])$
DEST_KEY = _MetaData:Index
FORMAT = device3_i

[device3_sourcetype]
SOURCE_KEY = MetaData:Host
REGEX = ^host::192\.168\.30\.(1[23]|2[78])$
DEST_KEY = MetaData:Sourcetype
FORMAT = device3_st

[drop_null_index]
REGEX = ^nullIndex$
SOURCE_KEY = _MetaData:Index
DEST_KEY = queue
FORMAT = nullQueue

outputs.conf

[tcpout]
defaultGroup = splunk_indexers
disabled = 0

[tcpout:splunk_indexers]
server = splunk-indexers.local:9997
maxQueueSize = 500MB
useACK = true
disabled = 0

Pretty much all input data is marked as nullIndex right away in inputs, and then in props and transforms the desired devices are re-marked with their respective sourcetypes and indexes, and everything else is dropped.
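The flow described above can be modeled outside Splunk as a rough sanity check. The following Python sketch is a simplified simulation, not Splunk code: the `route` function, the dictionary keys, and the `"unset"` and `"indexQueue"` starting values are assumptions made for illustration. Only the regexes, index names, and sourcetype names come from the configuration above.

```python
import re

# Transforms in the order the [source::udp:514] stanza applies them.
# Each entry: (stanza name, key to read, compiled REGEX, key to write, FORMAT value).
TRANSFORMS = []
for n, subnet in ((1, 10), (2, 20), (3, 30)):
    host_re = re.compile(rf"^host::192\.168\.{subnet}\.(1[23]|2[78])$")
    TRANSFORMS.append((f"device{n}_sourcetype", "host", host_re, "sourcetype", f"device{n}_st"))
    TRANSFORMS.append((f"device{n}_index", "host", host_re, "index", f"device{n}_i"))
TRANSFORMS.append(("drop_null_index", "index", re.compile(r"^nullIndex$"), "queue", "nullQueue"))


def route(ip):
    """Return the simplified metadata an event from `ip` ends up with."""
    # Mirrors inputs.conf: every event starts in nullIndex, and
    # connection_host = ip makes the sender's IP the host value.
    # "unset" and "indexQueue" are placeholder starting values (assumptions).
    event = {"host": f"host::{ip}", "index": "nullIndex",
             "sourcetype": "unset", "queue": "indexQueue"}
    # Every transform is tried in order; a match overwrites one metadata key.
    for _name, src, regex, dest, value in TRANSFORMS:
        if regex.search(event[src]):
            event[dest] = value
    return event


print(route("192.168.10.12"))  # re-marked as device1_i / device1_st
print(route("10.0.0.1"))       # still in nullIndex, so queue becomes nullQueue
```

In this model, an event from a listed IP leaves nullIndex before drop_null_index runs, so only events from unlisted senders reach the nullQueue.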

I am not sure if this is the BEST configuration (I've spent some time trying to get my head around this), but it certainly WORKS for me quite well.

However, the problem is that device1 generates about 50 GB of log data per day. That means 50 GB of data goes through seven REGEX filters, even though only the first two apply to device1 data. From a performance perspective, I would like to avoid the other five REGEX filters.

So here's my question: is there any way to tell Splunk, right after it finishes the TRANSFORMS-010-device1 line in props.conf, to skip all other transformations and send the device1 data immediately to the Splunk Indexers?

Something similar to the "discard" action (the tilde) in an RSYSLOG configuration:

:fromhost-ip, isequal, "192.168.10.12" @@(o)syslog.local:6514
& ~
:fromhost-ip, isequal, "192.168.10.13" @@(o)syslog.local:6514
& ~
:fromhost-ip, isequal, "192.168.10.27" @@(o)syslog.local:6514
& ~
:fromhost-ip, isequal, "192.168.10.28" @@(o)syslog.local:6514
& ~

except that, instead of discarding the data, it would remove it from any further processing other than sending it to the Splunk Indexers.

So my props.conf would look something like this (?) :

[source::udp:514]
TRANSFORMS-010-device1 = device1_sourcetype, device1_index, device1_fast_exit
TRANSFORMS-020-device2 = device2_sourcetype, device2_index, device2_fast_exit
TRANSFORMS-030-device3 = device3_sourcetype, device3_index, device3_fast_exit
TRANSFORMS-999-drop_everything = drop_null_index

where device1_fast_exit would be (pseudo-code) "send the device1 log data immediately to Splunk Indexers and do not process that data with TRANSFORMS-020-device2 and TRANSFORMS-030-device3 lines."
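To make the intended saving concrete, here is a small Python model of the hypothetical fast exit. It is only a sketch of the wished-for behavior, not an existing Splunk feature: `regex_checks`, `CLASSES`, and the `fast_exit` flag are names invented for this illustration, and the model assumes every transform in a TRANSFORMS class is evaluated before the exit takes effect.

```python
import re


def host_re(subnet):
    """Compile the host regex used by the device transforms for one subnet."""
    return re.compile(rf"^host::192\.168\.{subnet}\.(1[23]|2[78])$")


# One inner list per TRANSFORMS-<class> line in props.conf.
# Entries are (key to read, regex, key to write, value to write).
CLASSES = [
    [("host", host_re(s), "sourcetype", f"device{n}_st"),
     ("host", host_re(s), "index", f"device{n}_i")]
    for n, s in ((1, 10), (2, 20), (3, 30))
]
CLASSES.append([("index", re.compile(r"^nullIndex$"), "queue", "nullQueue")])


def regex_checks(ip, fast_exit):
    """Count regex evaluations for one event, with or without the hypothetical exit."""
    event = {"host": f"host::{ip}", "index": "nullIndex", "queue": "indexQueue"}
    checks = 0
    for transforms in CLASSES:
        matched = False
        for src, regex, dest, value in transforms:
            checks += 1
            if regex.search(event[src]):
                event[dest] = value
                matched = True
        if fast_exit and matched:
            break  # hypothetical: skip all remaining TRANSFORMS-* lines
    return checks


for ip in ("192.168.10.12", "192.168.30.12", "10.0.0.1"):
    # Columns: sender IP, checks today, checks with the hypothetical fast exit
    print(ip, regex_checks(ip, fast_exit=False), regex_checks(ip, fast_exit=True))
```

Under this model a device1 event drops from seven regex checks to two, while a device3 event only drops from seven to six, so the highest-volume source would have to be listed first for the exit to pay off.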

woodcock
Esteemed Legend

I can't think of a way to do it but if you are concerned about performance (and you are right to be), you can switch to Heavy Forwarders and pull the performance hit away from the Indexer cluster and redistribute it across your (much larger) Forwarder pool.

ontkanin
Path Finder

Thank you for your response, woodcock. That is actually what I have in my Splunk environment. I had described it in my draft but removed that part before posting because I thought it was redundant. I guess it is worth sharing after all, so this is what my Splunk environment looks like:

syslog clients -> HF cluster -> Indexers cluster <- SearchHead

HF cluster:
- 2 HF nodes with identical configuration managed by Puppet
- HA mode with one vIP managed by keepalived

Indexers cluster:
- 2 Indexer nodes managed by Cluster Master
- Splunk cluster with DNS RR

HF cluster is pretty much a central syslog collector, and HA mode guarantees that it's always present in the network. All the configuration stanzas in my example above are from my HF.

I was hoping there would be something like a sendQueue (in addition to the existing nullQueue and indexQueue) that would remove a log event from any further processing and send it straight to tcpout, or a workaround that would help me achieve the same or a similar result.

woodcock
Esteemed Legend

I think you have it pretty much the only/best way right now but I definitely like your idea of Splunk adding a dispatchQueue that removes from all further processing and dumps directly into indexQueue. You should definitely request this feature.

cramasta
Builder

Not sure if there is a discard option like you described. Seems like it would be useful.

I haven't tested the following solution, but based on the docs I think it might work.

What if you created a UDP input stanza for each host you want to accept data from? You can then set the index and sourcetype directly in the input stanza. You also get the bonus of no longer needing to send data to the nullQueue.

[udp://192.168.10.12:514]
index = device1_i
sourcetype = device1_st
connection_host = ip
disabled = 0

[udp://192.168.20.12:514]
index = device2_i
sourcetype = device2_st
connection_host = ip
disabled = 0

[udp://192.168.30.12:514]
index = device3_i
sourcetype = device3_st
connection_host = ip
disabled = 0

ontkanin
Path Finder

Unfortunately, this would not solve my problem, even though it would definitely work in an environment with a small number of remote clients. The solution you suggest was actually a solution I started with at the beginning, when I had just a couple of network nodes.

The configuration example I used in my post is a small portion of my actual configuration. My environment now has hundreds of client nodes scattered across multiple subnets. I also believe a UDP input stanza does not allow regular expressions, but I could be wrong.

But as I mentioned in my post, I wouldn't like you to focus on filtering based on IP address (I was expecting someone would suggest the solution you mentioned above). The filtering can and will also be based on a text string in the log message.

But I really appreciate your help. Thank you.
