We are receiving syslog data from a bunch of device types. The syslog server has a universal forwarder (UF) that sends the data to an intermediate heavy forwarder (iHF), which routes it to indexclusterA or indexclusterB.
Currently we have a configuration like this for a few hosts:
    [routtootherindexercluster]
    REGEX=.+
    DEST_KEY=_TCP_ROUTING
    FORMAT=otherindexercluster
The question is whether this scales to something like 10,000+ devices sending 30 MB/s in total. What might the performance impact be?
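A quick back-of-envelope calculation helps frame the question, since every event passes through the routing transform on the iHF. The 300-byte average event size is an assumption, not a measured value; replace it with the average line length of your own data.

```python
TOTAL_BYTES_PER_SEC = 30_000_000  # 30 MB/s, decimal megabytes
DEVICES = 10_000
AVG_EVENT_BYTES = 300  # assumption: typical syslog line length, measure your own data

per_device_bytes_per_sec = TOTAL_BYTES_PER_SEC / DEVICES
events_per_sec = TOTAL_BYTES_PER_SEC / AVG_EVENT_BYTES

print(f"{per_device_bytes_per_sec:.0f} bytes/s per device")  # 3000 bytes/s per device
print(f"{events_per_sec:.0f} events/s through the routing transforms")  # 100000 events/s
```

Under that assumption the routing regex is evaluated on the order of 100,000 times per second, spread across however many iHF instances and pipelines you run.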
Would it be more performant to use one large stanza with a single OR-ed regex (are there any character limits?) or multiple stanzas?
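A minimal sketch of the single-stanza variant, assuming the transform is attached through props.conf (for example a TRANSFORMS- setting on the syslog sourcetypes) and that the host field is already correct when the event reaches the iHF. Instead of REGEX=.+ on (presumably) per-host props stanzas, one transform can match the host metadata directly via SOURCE_KEY = MetaData:Host, whose value carries a host:: prefix. The max_hosts chunking is a hypothetical knob for comparing one large stanza against several smaller ones; I am not aware of a documented character limit, so the default of 1000 is arbitrary. Stanza and group names are placeholders.

```python
import re


def build_host_transforms(stanza, hosts, target_group, max_hosts=1000):
    """Render transforms.conf stanzas routing events from `hosts` to `target_group`.

    Hosts are split into chunks of at most `max_hosts` (a hypothetical limit),
    one stanza per chunk, so one large stanza can be compared with several small ones.
    """
    unique = sorted(set(hosts))
    stanzas = []
    for i in range(0, len(unique), max_hosts):
        chunk = unique[i:i + max_hosts]
        # re.escape makes dots and dashes in hostnames match literally.
        alternation = "|".join(re.escape(h) for h in chunk)
        stanzas.append(
            f"[{stanza}_{i // max_hosts}]\n"
            "SOURCE_KEY = MetaData:Host\n"
            f"REGEX = ^host::(?:{alternation})$\n"
            "DEST_KEY = _TCP_ROUTING\n"
            f"FORMAT = {target_group}\n"
        )
    return stanzas
```

For hosts that do not match, an anchored alternation is still largely tried alternative by alternative, so it is worth measuring both variants with the real host list rather than assuming either one is faster.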
Do you have any ideas for testing this? What would be a good way to simulate ~30 MB/s of syslog data from 10,000+ devices?
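One way to approach the simulation is a simple UDP load generator. This is a minimal sketch, not a benchmarking tool: the dev00000-style hostnames, the 300-byte message size, and the local0.info priority (<134>) are assumptions chosen for illustration.

```python
import socket
import time

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def build_message(device_index, msg_len=300):
    """Build one RFC 3164-style syslog line, padded to msg_len bytes."""
    now = time.localtime()
    ts = (f"{MONTHS[now.tm_mon - 1]} {now.tm_mday:2d} "
          f"{now.tm_hour:02d}:{now.tm_min:02d}:{now.tm_sec:02d}")
    # Hypothetical naming scheme: dev00000 .. dev09999 stand in for real devices.
    header = f"<134>{ts} dev{device_index:05d} loadtest: "
    return (header + "x" * max(0, msg_len - len(header))).encode("ascii")


def send_load(target, devices=10_000, bytes_per_sec=30_000_000,
              msg_len=300, duration=10.0):
    """Send UDP syslog to target=(host, port) at roughly bytes_per_sec for duration seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    msgs_per_sec = bytes_per_sec / msg_len
    start = time.monotonic()
    sent = 0
    try:
        while (elapsed := time.monotonic() - start) < duration:
            # Catch up to the number of messages due by now,
            # cycling through the simulated devices round-robin.
            due = int(elapsed * msgs_per_sec)
            while sent < due:
                sock.sendto(build_message(sent % devices, msg_len), target)
                sent += 1
            time.sleep(0.001)
    finally:
        sock.close()
    return sent
```

A single Python process will very likely not reach 100,000 messages per second, so you would run several instances in parallel (or replay captured real syslog for more realistic content). All datagrams come from the generator's IP, so the syslog server must take the hostname from the message header (for syslog-ng, keep-hostname(yes)) for the simulated devices to show up as distinct hosts.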
What exactly are you trying to accomplish? Is it really a split, or is there also data that needs to go to both?
If it is really a split: why not set up 2 syslog servers, each with a UF forwarding to the respective index cluster, and have the hosts send to the correct syslog server? If managing the config on the hosts that way is difficult, you could even consider a network load balancer that does the split for you, routing based on source host.
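To illustrate the load-balancer idea, here is a minimal sketch of a UDP relay that picks the downstream syslog server by the sender's IP address. It is only a model of the routing decision, not a replacement for a real load balancer; the server names, the ROUTES table, and port 514 are hypothetical.

```python
import socket

# Hypothetical mapping: source device IP -> (syslog server, port).
ROUTES = {
    "10.0.1.15": ("syslog-b.example.internal", 514),
}
DEFAULT_DEST = ("syslog-a.example.internal", 514)


def pick_destination(src_ip, routes=ROUTES, default=DEFAULT_DEST):
    """Return the syslog server a datagram from src_ip should be relayed to."""
    return routes.get(src_ip, default)


def run_relay(listen_host="0.0.0.0", listen_port=514):
    """Receive UDP syslog and re-send each datagram unchanged to the chosen server."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind((listen_host, listen_port))
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        data, (src_ip, _src_port) = rx.recvfrom(65535)
        tx.sendto(data, pick_destination(src_ip))
```

Binding port 514 usually needs elevated privileges. The relayed datagrams carry the relay's own source address, so the downstream syslog server has to take the hostname from the message header; real load balancers can often preserve the original source IP instead.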
Or: use your current syslog server to split the data into 2 folders based on hostname (recent syslog-ng versions in particular have a very nice filtering method where you can point to a file containing a list of hosts), and then process it more easily in Splunk.
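For the syslog-ng variant, a minimal sketch that writes the host list file and renders the filters, assuming syslog-ng's in-list() filter function and an existing network source named s_network. The list file path, filter and destination names, and the /logs/<cluster>/<host>/ folder layout are hypothetical; ${HOST} is the hostname syslog-ng associates with each message.

```python
from pathlib import Path


def write_host_list(list_file, hosts):
    """Write one hostname per line, the format in-list() reads."""
    Path(list_file).write_text("".join(f"{h}\n" for h in sorted(set(hosts))))


def render_syslog_ng_split(list_file, source_name="s_network", base_dir="/logs"):
    """Return a syslog-ng snippet sending listed hosts to clusterB/, everything else to clusterA/."""
    lines = [
        f'filter f_clusterB {{ in-list("{list_file}", value("HOST")); }};',
        f'filter f_clusterA {{ not in-list("{list_file}", value("HOST")); }};',
    ]
    for cluster in ("clusterA", "clusterB"):
        # One directory per host below each cluster folder; create-dirs(yes) creates them on demand.
        lines.append(
            f'destination d_{cluster} {{ file("{base_dir}/{cluster}/${{HOST}}/messages.log" create-dirs(yes)); }};'
        )
        lines.append(
            f"log {{ source({source_name}); filter(f_{cluster}); destination(d_{cluster}); }};"
        )
    return "\n".join(lines) + "\n"
```

Each cluster folder then needs only one monitor input on the UF, with the target cluster implied by the folder.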
Actually, for hosts with a UF we set a datarouting meta field on the inputs, which the iHF checks in order to route the event to one or multiple index clusters. So you can decide whether data from a certain input should go to idxcluster0 and/or idxcluster1 and/or the staging and/or dev environment.
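The decision the iHF makes in that scheme can be modeled as turning the datarouting value into a _TCP_ROUTING value; _TCP_ROUTING accepts a comma-separated list of tcpout groups, which is what lets one event go to several clusters. This sketch only models the logic in Python; the comma-separated encoding of the field, the group names, and the default are assumptions, not the poster's actual configuration.

```python
KNOWN_GROUPS = {"idxcluster0", "idxcluster1", "staging", "dev"}  # hypothetical tcpout group names


def tcp_routing_for(datarouting, known_groups=KNOWN_GROUPS, default="idxcluster0"):
    """Turn a datarouting value like 'idxcluster0,staging' into a _TCP_ROUTING value."""
    requested = [token.strip() for token in datarouting.split(",") if token.strip()]
    # Keep first-seen order, drop duplicates and unknown group names.
    groups = [g for g in dict.fromkeys(requested) if g in known_groups]
    return ",".join(groups) if groups else default
```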
For syslog it's another story, as you typically have one input app for n hosts, because you are doing something like a [monitor://logs/cisco_ios//.log] input.
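Even with one monitor input covering n hosts, each event still gets its own host value, commonly taken from a directory in the file path via host_segment in inputs.conf, which is what makes host-based routing on the iHF possible. A small sketch of that mapping as I understand it (segments counted from 1 after the root), assuming a hypothetical /logs/cisco_ios/<host>/<file>.log layout:

```python
from pathlib import PurePosixPath


def host_from_path(path, host_segment):
    """Pick the host from an absolute monitored file path, host_segment counted from 1."""
    parts = PurePosixPath(path).parts  # first element is "/" for absolute paths
    if not parts or parts[0] != "/":
        raise ValueError("expected an absolute path")
    return parts[host_segment]
```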
For us it would be easier to filter at the central iHF instances than on 30+ syslog servers that each have their individual config, or to rely on a specific (port) configuration on the syslog clients. At this central instance we can automate quite well by just generating the app. We just haven't done this at scale, and we'd like to hear from anyone out there who can share some experience.
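A minimal sketch of generating such an app, assuming a CSV with host and group columns and that each group name matches a tcpout group in outputs.conf on the iHF. It writes one transform per target group (matching MetaData:Host) and attaches them to the listed sourcetypes in props.conf; the file layout, transform names, and sourcetype names are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict
from pathlib import Path


def render_app(mapping_csv, sourcetypes):
    """Return {relative_path: file_text} for a routing app built from a host,group CSV."""
    by_group = defaultdict(set)
    for row in csv.DictReader(io.StringIO(mapping_csv)):
        by_group[row["group"].strip()].add(row["host"].strip())

    transforms, names = [], []
    for group in sorted(by_group):
        name = f"route_host_to_{group}"
        names.append(name)
        # Escape hostnames so dots match literally; anchor on the host:: prefix.
        alternation = "|".join(re.escape(h) for h in sorted(by_group[group]))
        transforms.append(
            f"[{name}]\nSOURCE_KEY = MetaData:Host\n"
            f"REGEX = ^host::(?:{alternation})$\n"
            f"DEST_KEY = _TCP_ROUTING\nFORMAT = {group}\n"
        )

    props = [f"[{st}]\nTRANSFORMS-route_by_host = {', '.join(names)}\n" for st in sourcetypes]
    return {
        "default/transforms.conf": "\n".join(transforms),
        "default/props.conf": "\n".join(props),
    }


def write_app(app_dir, files):
    """Write the rendered files below app_dir, creating default/ if needed."""
    for rel, text in files.items():
        path = Path(app_dir) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
```

Because every transform sets _TCP_ROUTING, a host listed under two groups would most likely end up with whichever transform runs last, so the mapping should keep one group per host (or encode multiple groups as a comma-separated FORMAT value).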
Got it, you didn't mention the 30 syslog servers; I can imagine that changes the game a bit 🙂
I have no experience with extensive routing rules on HFs, so I'll leave that to someone else to answer then.
I mean, the number doesn't matter if they are well automated. But today each of these hosts has multiple ports for multiple device types. I would feel more comfortable filtering at a central place than filtering in every syslog port configuration, writing the output to a different directory, and creating a new input app.