
Events getting tagged with forwarder host name

SplunkTrust

Hi,

I have several forwarders on which rsyslog listens on port 514. I set it up so that certain logs go to separate files, which lets me set the sourcetype in inputs.conf for each specific file.

Example:

:fromhost, startswith, "arc-"   /var/log/wireless    
:fromhost, startswith, "arc-"   ~    
*.emerg;auth.*;*.info;*.debug;mail.crit;lpr.crit;mark.none;audit.notice /var/log/remote

So all messages from hosts whose names start with arc- should be logged only to /var/log/wireless and dropped from the global logs.

Then I put this in inputs.conf:

[monitor:///var/log/wireless]
followTail = 0
index = main
sourcetype = wireless

What happened is that all these logs got tagged with the hostname of the forwarder instead of the hostname that appears in the log itself. This does not happen with the rest of the logs, which are tagged correctly.

My assumption is that the rsyslog filtering somehow makes the logs lose their hostnames.

How do I re-tag the logs so that the hostname is extracted from the actual text of each event?

Here are some sample logs:

2013-10-10T17:38:16-04:00 2013 arc-1-bny fpapps[1599]: <208008> <INFO> <arc-1-bny 10.200.8.11>  No change in the Vlan Interface 2150 state UP Vlan Interface has tunnels configured

2013-10-10T17:38:16-04:00 2013 arc-1-440 mdns[1808]: <527000> <DBUG> <arc-1-440 10.100.8.11>  mdns_token_timer_handler 121 mac 00:14:38:d4:81:9d record expiry: generate query packet; state=MDNS_CACHE_EXPIRY2; percent=90; name=HP\032LaserJet\0323390\032\040D4819D\041._printer._tcp.local, type=SRV/NBSTAT

2013-10-10T17:38:16-04:00 2013 arc-1-440 mdns[1808]: <527000> <DBUG> <arc-1-440 10.100.8.11>  mdns_token_timer_init 47 expiry time: sec=54.74, msec=54735

2013-10-10T17:38:16-04:00 2013 arc-1-bny fpapps[1599]: <208007> <INFO> <arc-1-bny 10.200.8.11>  Vlan interface 3066 state is DOWN

2013-10-10T17:13:16-04:00 2013 arc-1-440 mdns[1808]: <527000> <DBUG> <arc-1-440 10.100.8.11>  mdns_parse_packet 2216 mdns response packet received; mac=78:e7:d1:a0:e0:96, ip=10.100.97.17, origin=1

2013-10-10T17:13:16-04:00 2013 arc-1-440 mdns[1808]: <527000> <DBUG> <arc-1-440 10.100.8.11>  mdns_parse_packet 2252 Un-anticipated mdns response packet received; mac=78:e7:d1:a0:e0:96, ip=10.100.97.17, origin=1

2013-10-10T17:13:16-04:00 2013 arc-1-440 authmgr[1729]: <124006> <WARN> <arc-1-440 10.100.8.11>  {4674133} TCP srcip=10.100.151.147 srcport=4193 dstip=10.50.1.53 dstport=5494, action=permit, role=scanner_auth_role, policy=scanner_auth

2013-10-10T17:13:17-04:00 2013 arc-1-440 cfgm[1553]: <307026> <DBUG> <arc-1-440 10.100.8.11>  master: Refreshing the lms list

Ultra Champion

Hm, the hostname is automatically extracted at index time for events that have the 'syslog' sourcetype. By setting the sourcetype to 'wireless' for your arc- logs, you prevent this extraction from taking place.
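Another way to get the same effect, sketched here rather than taken from the thread, is to give the 'wireless' sourcetype its own index-time transform that overwrites the host. Splunk's default configuration does this for 'syslog' through a transform named 'syslog-host'. The transform name 'wireless_sethost' below is hypothetical, and the regex assumes every event starts with an ISO timestamp, a four-digit year and then the hostname, as in the sample logs in the question. Index-time transforms run where parsing happens (indexers or heavy forwarders), so this would not take effect if placed only on a universal forwarder.

```ini
# props.conf (on the indexers or a heavy forwarder)
[wireless]
# Hypothetical transform name; applied at index time
TRANSFORMS-sethost = wireless_sethost

# transforms.conf (same instance)
[wireless_sethost]
# Assumed layout: <ISO timestamp> <year> <hostname> <process>: ...
# e.g. "2013-10-10T17:38:16-04:00 2013 arc-1-bny fpapps[1599]: ..."
REGEX = ^\S+\s+\d{4}\s+(\S+)
# Write the first capture group into the host metadata field
FORMAT = host::$1
DEST_KEY = MetaData:Host
```

This only affects events indexed after the change; events already in the index keep the forwarder's hostname.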

I would suggest that you configure rsyslog to write the files under /var/log/wireless with %HOSTNAME% or $fromhost-ip in the filename or as part of the path, e.g. /var/log/wireless/10.11.12.13.log or /var/log/wireless/serverA/error.log.

NB: I'm not 100% sure whether it's %HOSTNAME% or $fromhost-ip, or of the exact syntax for setting that up, but this is probably a good place to start:

http://www.rsyslog.com/doc/manual.html

Then you can use either host_segment or host_regex in inputs.conf to set the correct source host for these messages.
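A minimal inputs.conf sketch of both options, assuming rsyslog has already been set up (not shown here) to write one file or one directory per sending host. The layouts /var/log/wireless/arc-1-bny.log and /var/log/wireless/arc-1-bny/wireless.log are hypothetical examples. host_regex is matched against the source path and its first capture group becomes the host; host_segment picks the Nth '/'-separated segment of the path.

```ini
# inputs.conf
# Option A: one file per host, e.g. /var/log/wireless/arc-1-bny.log
[monitor:///var/log/wireless]
index = main
sourcetype = wireless
# Only pick up files ending in .log
whitelist = \.log$
# Capture group 1 of the source path becomes the host
host_regex = ^/var/log/wireless/([^/]+)\.log$

# Option B (use instead of A): one directory per host,
# e.g. /var/log/wireless/arc-1-bny/wireless.log
# Segments: var(1) / log(2) / wireless(3) / arc-1-bny(4)
# [monitor:///var/log/wireless]
# index = main
# sourcetype = wireless
# host_segment = 4
```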

For events that have already been indexed there's not much to do (i.e. you can't change the host value in the index), but you can write search-time extractions to retrieve the 'real' hostname from each event.
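A sketch of such a search-time extraction for the already-indexed events. The field name 'orig_host' is hypothetical, and the regex assumes the same event layout as the sample logs in the question (timestamp, year, hostname).

```ini
# props.conf on the search head (search time only; indexed data is untouched)
[wireless]
# The named capture group becomes a field called orig_host
EXTRACT-orig_host = ^\S+\s+\d{4}\s+(?<orig_host>\S+)
```

Searches can then group on the extracted field, e.g. sourcetype=wireless | stats count by orig_host.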

Hope this helps,

K

Ultra Champion

Yeah, but you'll have to keep an eye on which indexes are filling up faster than you expected, and which are not. And if you've already 'booked' all available disk space, you'll have to reduce the max size of some indexes in order to increase it for others, etc. That's the micromanaging part.

Setting the permissions is more easily fixed, as your users tend to complain if they can't access the data they need 🙂
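Per-index permissions are set on roles in authorize.conf. A minimal sketch, where the role name 'wireless_team' and the index name 'wireless' are hypothetical:

```ini
# authorize.conf: role stanzas are prefixed with "role_"
[role_wireless_team]
# Inherit the capabilities of the built-in "user" role
importRoles = user
# Indexes this role is allowed to search
srchIndexesAllowed = wireless
# Indexes searched when no index= is given in the search
srchIndexesDefault = wireless
```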

/k


SplunkTrust

'Lots' as in, say, 25.

Storage is not a problem: we sized the storage so we can keep a replication factor of 3 for at least 3 years, so I'm not really concerned about retention times and storage.

So I guess it will boil down to permissions.

Thanks,
David


Ultra Champion

Define 'lots' 🙂

Normally it's not too hard to keep track of a dozen or so indexes.

Splunk can handle many indexes; it's more about you having to micromanage disk space, retention times, access rights, etc.
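Disk space and retention are set per index in indexes.conf. A minimal sketch for one hypothetical index named 'wireless'; the size cap is an arbitrary example value, and the retention is the 3 years mentioned in this thread (3 × 365 × 86400 = 94,608,000 seconds):

```ini
# indexes.conf
[wireless]
homePath   = $SPLUNK_DB/wireless/db
coldPath   = $SPLUNK_DB/wireless/colddb
thawedPath = $SPLUNK_DB/wireless/thaweddb
# Maximum total size of the index; oldest buckets are frozen when exceeded
maxTotalDataSizeMB = 500000
# Freeze buckets whose newest event is older than ~3 years
# (frozen data is deleted unless an archive dir/script is configured)
frozenTimePeriodInSecs = 94608000
# Only relevant in an indexer cluster: replicate this index across peers
repFactor = auto
```

Every extra index repeats this stanza, which is where the micromanaging comes from when the total has to fit into a fixed amount of disk.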

/K


SplunkTrust

Thanks for the reply.

I have actually considered that, but it would become very complex to handle so many log files and to make sure that log rotation works for all of them so the filesystem does not fill up. I would also have to put a wildcard in the monitor stanza, which would require more CPU time from Splunk, and I'm trying to avoid that.

One of the other options I was considering was to put those applications in separate indexes and leave the sourcetype as syslog, which would probably solve this problem. The question is whether there is any drawback to having lots of indexes.
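A sketch of that option, reusing the monitor stanza from the question. The index name 'wireless' is hypothetical and the index would have to exist first. Keeping sourcetype = syslog means the automatic syslog host extraction described in the first answer applies to these events again.

```ini
# inputs.conf: keep the built-in syslog sourcetype, separate the data by index
[monitor:///var/log/wireless]
followTail = 0
index = wireless
sourcetype = syslog
```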

Thanks
