What is the best architecture for a huge amount of syslog data?

sunrise · ‎02-18-2016

Hi splunkers,

I'm thinking about the best architecture for a huge amount of syslog data.
At first, I used rsyslog on RHEL with a single Splunk server. But rsyslog writes the syslog data very slowly when the UDP syslog volume is about 2GB per day in total, even with enough CPU cores and RAM, and sometimes Splunk indexed an event incorrectly because rsyslog paused in the middle of writing it. As a workaround, I set time_before_close = 300 in inputs.conf.
It works for now, but I'm concerned the problem will happen again as the amount of syslog data sent to this Splunk server increases.

So now I'm thinking about the best architecture for it. We have several options.

Using heavy forwarder instead of rsyslog
Tuning rsyslog parameters

I don't know how the performance of rsyslog compares with that of a Splunk TCP input. Also, in this case, which rsyslog parameters would increase performance?
If you know anything about it, please let me know. Thank you very much.

s2_splunk · ‎02-18-2016

Best practice for ingesting syslog data is to send syslog to a syslog-ng infrastructure, which can easily break out the various syslog event streams into files/directories, which makes proper sourcetyping for later searching in Splunk a heck of a lot easier. It also allows you to restart your Splunk environment when needed (upgrades, config changes that require restart, outages) without losing any of your syslog data. A UF installed on a syslog-ng server (or two, for redundancy) can then be easily configured to monitor the various files and directories, assigning meaningful sourcetypes to the syslog data, as Jeremiah points out in his comments. It also provides you with load balancing, restart capabilities, compression, SSL encryption and throttling capabilities.
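A minimal sketch of that setup, to make it concrete. The subnet, paths, filter/destination names and the sourcetype below are hypothetical placeholders, not values from this thread. On the syslog-ng side, each class of device gets its own directory, with one subdirectory per sending host:

```
# syslog-ng.conf (fragment; names, subnet and paths are assumptions)
source s_network {
    udp(port(514));
    tcp(port(514));
};

# Hypothetical: all firewalls live in this subnet
filter f_firewalls { netmask("10.0.1.0/24"); };

destination d_firewalls {
    file("/var/log/remote/firewall/${HOST}/messages.log" create_dirs(yes));
};

log { source(s_network); filter(f_firewalls); destination(d_firewalls); };
```

The UF on the same server then monitors that directory and assigns the sourcetype, taking the host name from the path:

```
# inputs.conf on the universal forwarder (sourcetype name is an assumption)
[monitor:///var/log/remote/firewall/*/messages.log]
sourcetype = firewall_syslog
# 5th path segment (/var/log/remote/firewall/<host>/...) becomes the host field
host_segment = 5
```

You would repeat the filter/destination/monitor trio for each kind of device you want to sourcetype separately.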

If you decide to receive it directly into Splunk, you will have to provide props/transforms to break the syslog stream into the various sourcetypes, because a syslog listener (TCP/UDP port) in Splunk has a 1:1 relationship to a sourcetype. If you don't properly sourcetype your data, everything will end up under a single sourcetype (often inappropriately named "syslog"), which makes searching for specific data more complex than it needs to be. This is assuming that you have various kinds of sources emitting syslog data, e.g. switches, proxies, firewalls, etc.
You also expose yourself to data loss during Splunk restarts as there is no architectural component that can buffer the data while your Splunk system is restarting.
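For reference, the props/transforms approach usually looks something like the sketch below. It assumes a UDP input on port 514; the stanza name, the regex and the target sourcetype are illustrative only and would have to match your own devices. These settings need to live on the instance that parses the data (the indexer or a heavy forwarder).

```
# props.conf (assumes the data arrives on a UDP 514 input)
[source::udp:514]
TRANSFORMS-set_sourcetype = set_sourcetype_asa

# transforms.conf (regex and sourcetype are examples, adjust to your data)
[set_sourcetype_asa]
REGEX = %ASA-\d-\d+
FORMAT = sourcetype::cisco:asa
DEST_KEY = MetaData:Sourcetype
```

You need one such transform per kind of device, which is part of why the file-based approach tends to be easier to maintain.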

BTW, I wouldn't consider 2GB/day a huge amount of syslog data by any stretch, unless I misunderstood what you were saying.

Jeremiah · ‎02-18-2016

I think you have the right architecture. We do something similar, but we use syslog-ng. We write syslog streams to files, and then read those files using a universal forwarder. We use a mix of physical servers and VMs, but even our VMs are capable of processing many GB of logs per day.

2 GB/day of syslog data is not very much. That is less than 25 KB/sec, which means you are processing maybe 50 messages of 500 bytes each per second? What kind of systems are you using? Are you using VMs? What's your storage system? Is this system performing other tasks as well?
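For anyone who wants to check the arithmetic behind that estimate, here is a quick sketch in Python. The 500-byte average message size is only my guess from above, and I'm using decimal GB (10**9 bytes); the function names are just for illustration.

```python
# Back-of-the-envelope syslog throughput estimate.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def bytes_per_second(gb_per_day: float) -> float:
    """Average ingest rate in bytes/sec, using decimal GB (10**9 bytes)."""
    return gb_per_day * 10**9 / SECONDS_PER_DAY


def messages_per_second(gb_per_day: float, avg_message_bytes: int = 500) -> float:
    """Average message rate, assuming a fixed average message size."""
    return bytes_per_second(gb_per_day) / avg_message_bytes


if __name__ == "__main__":
    rate = bytes_per_second(2)
    print(f"{rate / 1000:.1f} KB/sec")                   # 23.1 KB/sec
    print(f"{messages_per_second(2):.0f} messages/sec")  # 46 messages/sec
```

These are daily averages; real syslog traffic is bursty, so peak rates can be well above this.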

sunrise · ‎02-18-2016

Thank you for your reply, Jeremiah.
I'm also running a deployment server, which manages about 50 UFs.
The total log volume, including these UFs, is 8GB per day.
My Splunk server runs on a VM and has SCSI-based local storage.
It is a single Splunk server that works as search head, indexer, and deployment server.
Search performance in Splunk is not bad, so I don't think disk I/O is the root cause.

Jeremiah · ‎02-18-2016

Ah ok, so you have those services all running on the same host? Still, your volume is pretty low. Are there any metrics you can look at from the hypervisor to see VM performance? What are the specs for the VM?
