What might be the performance impact of doing host-based _TCP_ROUTING on a large scale?

schose
Builder

Hi all,

We are receiving syslog data from many device types. The syslog server has a universal forwarder (UF) that sends the data to an intermediate heavy forwarder (iHF), which routes it to indexclusterA or indexclusterB.

Currently we have a configuration like this for a few hosts:
props.conf:

[host::(host0|host1|host2|host3)]
TRANSFORMS-routing=routtootherindexercluster

transforms.conf:

[routtootherindexercluster]
REGEX=.+
DEST_KEY=_TCP_ROUTING
FORMAT=otherindexercluster

The question is whether this scales to something like 10,000+ devices sending about 30 MB/s in total. What might the performance impact be?

Would it be more efficient to use a single large OR stanza (is there a character limit?) or multiple stanzas?
Do you have any ideas on how to test this? What would be a good way to simulate ~30 MB/s of syslog data from 10,000+ devices?
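One way to generate that test load is sketched below in Python, under several assumptions: the syslog server under test accepts plain BSD-style syslog lines over UDP, the device names dev00000 to dev09999 are synthetic, and "30 MB/s" is taken as 30 MiB/s of raw message bytes. The target host, port, message size, and all function names are hypothetical. Spreading messages across 10,000 fake hostnames makes sure host-based stanzas are actually exercised.

```python
import random
import socket
import time

DEVICE_COUNT = 10_000                    # synthetic devices dev00000..dev09999
TARGET_BYTES_PER_SEC = 30 * 1024 * 1024  # ~30 MiB/s in total (assumption)


def make_message(device_id, seq, stamp, size=300):
    """Build one BSD-style syslog line of `size` bytes for a fake device."""
    host = f"dev{device_id:05d}"
    header = f"<134>{stamp} {host} loadtest[{seq}]: "
    body = "x" * max(0, size - len(header))  # pad to a fixed message size
    return (header + body).encode("ascii")


def messages_per_second(bytes_per_sec, size):
    """How many fixed-size messages are needed per second for the byte rate."""
    return bytes_per_sec // size


def run(target_host, target_port, seconds=10, size=300,
        bytes_per_sec=TARGET_BYTES_PER_SEC, devices=DEVICE_COUNT):
    """Send roughly `bytes_per_sec` of syslog over UDP for `seconds` seconds."""
    rate = messages_per_second(bytes_per_sec, size)
    seq = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(seconds):
            start = time.monotonic()
            stamp = time.strftime("%b %d %H:%M:%S")  # one timestamp per second
            for _ in range(rate):
                dev = random.randrange(devices)      # random fake source host
                sock.sendto(make_message(dev, seq, stamp, size),
                            (target_host, target_port))
                seq += 1
            elapsed = time.monotonic() - start
            if elapsed < 1.0:
                time.sleep(1.0 - elapsed)            # hold the per-second rate
            else:
                print(f"warning: could not keep up ({elapsed:.2f}s for {rate} messages)")
    return seq
```

For example, `run("syslog-test.example", 514, seconds=60)` would send about a minute of traffic to a hypothetical test host. A single Python process may not sustain the roughly 100,000 messages per second this implies; if the warning appears, run several copies in parallel, each with a fraction of `bytes_per_sec`. Putting some of the `devNNNNN` names into the host stanzas lets you compare iHF CPU usage with and without the routing transforms.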

Best regards,

Andreas

FrankVl
Ultra Champion

What exactly are you trying to accomplish? Is it really a split, or is there also data that needs to go to both?

If it is really a split, why not set up two syslog servers, each with a UF forwarding to its respective index cluster, and have the hosts send to the correct syslog server? If managing that config on the hosts is difficult, you could even consider a network load balancer that does the split for you, routing based on source host.

Or use your current syslog server to split the data into two folders based on hostname (recent syslog-ng versions in particular have a very nice filtering method where you can point to a file containing a list of hosts); then you can process it more easily in Splunk.
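To illustrate the host-list split idea, here is a small Python sketch of the decision the syslog tier would make for each message. In practice the syslog daemon's own filter would do this; the folder layout (`/logs/clusterA`, `/logs/clusterB`), the list format (one hostname per line, `#` comments), case-insensitive matching, and the function names are all hypothetical. A set lookup keeps the per-message cost constant no matter how many hosts are on the list.

```python
def parse_host_list(text):
    """Return the set of hostnames in `text`: one per line, '#' starts a comment."""
    hosts = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if name:
            hosts.add(name.lower())
    return hosts


def destination_dir(host, cluster_b_hosts, base="/logs"):
    """Pick the per-cluster folder a message from `host` should be written to."""
    # Hostnames are compared case-insensitively (an assumption).
    cluster = "clusterB" if host.lower() in cluster_b_hosts else "clusterA"
    return f"{base}/{cluster}/{host}"
```

A UF on the syslog server could then monitor each folder with a separate input that already targets the right index cluster, so no host-based transforms would be needed on the iHF.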

schose
Builder

Hi,

For UF hosts, we actually set a data-routing meta field on the inputs; the iHF checks it and routes the event to one or more index clusters. That way we can decide whether data from a given input goes to idxcluster0 and/or idxcluster1 and/or the staging and/or dev environments.

For syslog it's a different story, because you typically have one input app for n hosts, e.g. a [monitor://logs/cisco_ios//.log]-style input.

For us it would be easier to filter on the central iHF instances than on 30+ syslog servers that each have their own configuration, or to require a specific (port) configuration on the syslog clients. On the central instances we can automate this quite well by simply generating the app. We just haven't done this at scale yet and would like to hear from anyone out there who can share their experience.
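Since the plan is to generate the routing app centrally, here is a minimal Python sketch of how the props.conf/transforms.conf pair from the first post could be generated from a host list. The transform name `route_to_<group>`, the `chunk_size` parameter, and the character check are assumptions for illustration. `chunk_size=None` produces one large OR stanza, while a number splits the hosts into several smaller stanzas, so both layouts from the original question can be generated and benchmarked. It assumes plain hostnames, because `*`, `|`, `(`, `)` and `\` have special meaning in props.conf stanza patterns.

```python
# Characters with special meaning in props.conf stanza patterns (assumption:
# real hostnames never contain them, so they are rejected rather than escaped).
UNSAFE_CHARS = set("*|()[]\\")


def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_routing_conf(hosts, tcpout_group, chunk_size=None):
    """Return (props_conf, transforms_conf) text routing `hosts` to `tcpout_group`."""
    names = sorted({h.strip() for h in hosts if h.strip()})
    for name in names:
        if UNSAFE_CHARS & set(name):
            raise ValueError(f"unsupported character in host name: {name!r}")
    transform = f"route_to_{tcpout_group}"  # hypothetical transform name
    size = chunk_size or max(len(names), 1)  # None -> one stanza for all hosts

    props = []
    for group in chunks(names, size):
        props.append(f"[host::({'|'.join(group)})]")
        props.append(f"TRANSFORMS-routing = {transform}")
        props.append("")

    transforms = [
        f"[{transform}]",
        "REGEX = .+",
        "DEST_KEY = _TCP_ROUTING",
        f"FORMAT = {tcpout_group}",  # must match a tcpout group in outputs.conf
        "",
    ]
    return "\n".join(props), "\n".join(transforms)
```

The two returned strings would then be written into the `default/` directory of the app deployed to the iHFs. Which layout performs better is not established here, so it is worth measuring both under load.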

Best Regards,

Andreas

FrankVl
Ultra Champion

Got it. You didn't mention the 30 syslog servers; I can imagine that changes the game a bit 🙂

I have no experience with extensive routing rules on HFs, so I'll leave that to someone else to answer then.

schose
Builder

The number itself doesn't matter much if the servers are well automated. But today each of these hosts has multiple ports for multiple device types. I would feel more comfortable filtering in one central place than filtering in every syslog port configuration, writing the output to a different directory, and creating a new input app.
