Splunk Dev

Split data from different hosts coming in via syslog into various indexes

lakromani
Builder

We currently have a 250GB/day Splunk Enterprise license and are growing.

Every system that uses UF agents is easy to handle and gets its data into the various indexes.
The problem is syslog data. Today it's a big mess, and I need help sorting it into different indexes instead of one big syslog index.

As an example, we have four types of systems (there are many more):
Cisco switches
Cisco routers
VMware ESXi servers
Cisco ISE servers

All of them send their syslog data to one central rsyslog server.
Splunk reads the files on this server and puts the data into the syslog index.

I see three ways of sorting these:

  1. Use a tag (on Cisco switches/routers I can use logging origin-id to set a unique tag)
  2. Use the subnet, if all devices in that net are the same type
  3. Use IP addresses (a whitelist)

I will use one app for each type of system.

How do I configure these apps so that they move the data to the correct index (cisco switch, cisco router, esxi, ISE, etc.)?
What are the correct settings in props.conf, transforms.conf (or other files)?
I know how to set different sourcetypes, but I need the events to go into different indexes.
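To make it concrete, this is roughly the kind of props.conf/transforms.conf routing I have in mind, assuming the incoming data is monitored with sourcetype syslog (the stanza names, tag, IP range and index names below are only examples, not our real values):

props.conf

[syslog]
TRANSFORMS-route_syslog = route_cisco_switch, route_esxi

transforms.conf

[route_cisco_switch]
# option 1: match the unique tag set with "logging origin-id" on the switch
REGEX = MY-SWITCH-TAG
DEST_KEY = _MetaData:Index
FORMAT = cisco_switch

[route_esxi]
# option 2/3: match on the sending host/IP instead of the raw event
SOURCE_KEY = MetaData:Host
REGEX = ^host::10\.10\.20\.
DEST_KEY = _MetaData:Index
FORMAT = vmware_esxi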

Can it be done?


FrankVl
Ultra Champion

If you already have it coming in to an rsyslog daemon, I would strongly suggest using that to do the structuring.

Create separate dynamic file templates and filtered output actions for the different types of sources (based on whatever syslog property or message regex can help distinguish them), using a structure like this (replacing <system-type> with a specific value for each template you define):

/opt/rsyslog/logs-%$myhostname%/<system-type>/%hostname%/%$YEAR%-%$MONTH%-%$DAY% %$HOUR%:00.log

e.g.

/opt/rsyslog/logs-%$myhostname%/cisco-ise/%hostname%/%$YEAR%-%$MONTH%-%$DAY% %$HOUR%:00.log

%$myhostname% puts the hostname of the syslog server in the path, so this can be found in the source value later in Splunk. It can be useful if you have similar data flowing through multiple syslog servers (e.g. regionally, or a few behind a load balancer) and you need to know which specific server the data came through for debugging.

%hostname% puts logs for each source device into a separate folder (which you can then use with the host_segment setting in inputs.conf to populate the host field). Depending on whether the events contain a proper hostname, you might need to use %fromhost% or %fromhost-ip% instead. Be careful with fromhost as it relies on reverse DNS. With older (v5) rsyslog versions, this results in one DNS request for each incoming UDP packet.

%$YEAR%-%$MONTH%-%$DAY% %$HOUR%:00.log creates a new file each hour. This removes the need to run separate log rotation jobs and again makes debugging easier (you still want to run some script that removes / compresses older files periodically).
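As a rough sketch of what such a template and filtered output could look like (RainerScript syntax; the template name and the ISE addresses are made-up examples, and the filter could just as well be a regex on the message):

# dynamic file name template for Cisco ISE events
template(name="t_cisco_ise" type="string"
         string="/opt/rsyslog/logs-%$myhostname%/cisco-ise/%hostname%/%$YEAR%-%$MONTH%-%$DAY% %$HOUR%:00.log")

# write everything from the ISE nodes to that template and stop further processing
if $fromhost-ip == '10.0.0.11' or $fromhost-ip == '10.0.0.12' then {
    action(type="omfile" dynaFile="t_cisco_ise")
    stop
}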

This way, you can simply write separate inputs.conf stanzas with specific sourcetype and index settings for each data feed's own specific sub-folder.
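A minimal inputs.conf sketch for the cisco-ise example above (the index and sourcetype values are placeholders, use whatever fits your environment):

[monitor:///opt/rsyslog/logs-*/cisco-ise]
# host is the 5th path segment: /opt/rsyslog/logs-<syslogserver>/cisco-ise/<host>/...
host_segment = 5
index = cisco_ise
sourcetype = cisco:ise:syslog
disabled = 0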

Personally, I prefer to separate the incoming data types at the network level already, by having different data source types send to different ports (or different virtual IPs if the port is not configurable). That way it becomes really easy to let rsyslog write different types to different folders, by defining separate UDP/TCP listeners, each bound to its own specific ruleset.
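In that setup, the rsyslog side could look roughly like this (ports, ruleset and template names are arbitrary examples; each t_* template would be defined like the one shown above):

module(load="imudp")

# one listener per data source type, each bound to its own ruleset
input(type="imudp" port="5140" ruleset="rs_cisco_switch")
input(type="imudp" port="5141" ruleset="rs_esxi")

ruleset(name="rs_cisco_switch") {
    action(type="omfile" dynaFile="t_cisco_switch")
}

ruleset(name="rs_esxi") {
    action(type="omfile" dynaFile="t_esxi")
}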
