Getting Data In

Source types in /var/log

jackjack
Path Finder

I would like to retrieve the data in /var/log as correctly as possible.

Currently I am simply monitoring the entire /var/log folder with no pre-selected source type.

On the List of pretrained source types I see a few callouts for log files such as syslog but the majority of log files are not present in this list. Perhaps some of these types can be used elsewhere though? For example, I see the linux_messages_syslog pretrained type refers to logs in /var/log/messages and since syslog != messages I presume this type may be useful on other files as well? 

So I can use the few pretrained source types and then do I need to make my own source types for all the other log files? 

Is there any repository with user created source types? I have to imagine most log file types have had source types created for them by now? Or do people just not apply source types and simply search on the unstructured data?

isoutamo
SplunkTrust

Hi

As @PickleRick said, the best way is probably to start by looking on Splunkbase for apps / TAs that someone has already built.

You could start e.g. with this https://splunkbase.splunk.com/app/833/

When you think about what a sourcetype actually is, you realise that it just defines the format of an individual log event. Basically nothing else. How you then use it to help your queries etc. is a completely different story. But just remember that a sourcetype (ST) is just the lexical format of a log file/stream/something. If/when that format changes, you should change the name of the ST, e.g. by adding an incremented number after it (my:own:sourcetype:0 vs. my:own:sourcetype:1). Of course you should use more descriptive names than those. There are several docs where you can find naming standards for sourcetypes.

Naming a sourcetype just means adding its name to inputs.conf, nothing else. Then, when you want to use / tokenise / extract fields from it, you need additional definitions in props.conf and/or transforms.conf.
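
To make the split between those two steps concrete, here is a minimal sketch for /var/log/dpkg.log. The sourcetype name (linux:dpkg:log) and the field names (dpkg_action, dpkg_package) are made up for illustration, and the sketch assumes the usual dpkg.log line shape of "YYYY-MM-DD HH:MM:SS action package:arch ...", so check it against your own file before relying on it.

```ini
# --- inputs.conf (on the forwarder) ---
# Naming the sourcetype is all that happens here.
[monitor:///var/log/dpkg.log]
disabled = false
index = main
sourcetype = linux:dpkg:log

# --- props.conf ---
# Hypothetical definition of what "linux:dpkg:log" means.
[linux:dpkg:log]
# One event per line; do not try to merge lines into multi-line events.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# The timestamp sits at the start of each line, e.g. 2024-03-01 12:34:56
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
# Search-time extraction of the action and the package name.
EXTRACT-dpkg_action = ^\S+\s+\S+\s+(?<dpkg_action>install|upgrade|remove|purge)\s+(?<dpkg_package>[^:\s]+)
```

Keep in mind that the line-breaking and timestamp settings take effect where the data is parsed (an indexer or heavy forwarder, not a universal forwarder), while EXTRACT- settings are applied at search time.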

r. Ismo

PickleRick
SplunkTrust

Well, it's a tricky subject 😉

In the case of "pre-defined" appliances or similar solutions (Pi-hole, for example) you usually have an app, and its documentation often specifies what to do on both Splunk's input side and in the solution's logging settings in order to achieve interoperability (although sometimes the app might have been prepared with logging to files in mind while you want to get events via syslog, or you might encounter other issues).

Often, apps define several separate sourcetypes for the various types of logs coming from a single application and dynamically rewrite the sourcetype on input when the ingest pipeline is able to match the event to a particular kind of event (that's a completely different thing from an eventtype at search time!).
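
As a rough illustration of that kind of index-time sourcetype rewrite, a sketch could look like the following. The stanza name (set_st_sshd) and the target sourcetype (my:sshd) are hypothetical, and the regex assumes classic syslog lines such as "host sshd[1234]: ...". Real apps ship their own, usually more careful, rules.

```ini
# --- props.conf ---
# Apply the transform to everything arriving with the generic sourcetype.
[linux_messages_syslog]
TRANSFORMS-set_sourcetype = set_st_sshd

# --- transforms.conf ---
# Hypothetical rule: events that look like sshd messages get their own sourcetype.
[set_st_sshd]
REGEX = \ssshd\[\d+\]:
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::my:sshd
```

This runs in the parsing pipeline, so it belongs on an indexer or heavy forwarder; a universal forwarder does not apply it.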

 

In the case of "general logs", well, it's up to you. As a general rule, from my experience: think about what you need the events for. For example, in one of my production environments I restrict the syslog messages forwarded into Splunk by process name and only get messages from a very strict set of programs.
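
That kind of restriction may well be done in the syslog daemon itself. If you wanted something similar on the Splunk side, one option is the documented "discard everything, then keep what matches" pattern using nullQueue. The sketch below is only an assumption about how it could look: the stanza names are made up, and the process list (sshd, sudo) is just an example.

```ini
# --- props.conf ---
# Order matters: drop everything first, then re-route the events to keep.
[linux_messages_syslog]
TRANSFORMS-filter = syslog_drop_all, syslog_keep_selected

# --- transforms.conf ---
# Send every event to the null queue (i.e. discard it)...
[syslog_drop_all]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

# ...then send events from the selected programs back to the index queue.
[syslog_keep_selected]
REGEX = \s(?:sshd|sudo)(?:\[\d+\])?:
DEST_KEY = queue
FORMAT = indexQueue
```

Like other index-time transforms, this has to live on an indexer or heavy forwarder.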

I'd start by looking on Splunkbase for an app for the particular application's output.

jackjack
Path Finder

Thank you both for your assistance! I am working to set up the app now. This seems like a much better solution.

PickleRick
SplunkTrust

It depends greatly on the source of the log entries. In /var/log you can have:

  • files created directly by particular software (for example /var/log/httpd or /var/log/apache, depending on the distro)
  • files filtered by your system's configuration into specific files (for example /var/log/maillog in some typical cases)
  • files created as a default "sink" by syslog where you get all the events generated by any program
  • and maybe more

So just because something is in /var/log, it doesn't tell you for sure what it is and what kind of events it contains. You have to know what type of data you're ingesting (and it's usually best to split it by source - meaning specific program - into separate files).

Then it gets easier. You look for a TA, or at least raw parsing rules, for the specific application and create an appropriate input reading from the file containing that sourcetype.

jackjack
Path Finder

We meet again PickleRick,

Thank you for your response. So here is my current inputs.conf file. I've gone through everything in my /var/log folder and attempted to classify it. As you can see I have a number of logs with an unknown source type (or a guess that I'm unsure if it's correct or not) - how can I figure out what to put there? 

## inputs.conf for splunk universal forwarders
## /var/log

# update-alternatives, symbolic links
# Ok to ignore?
[monitor:///var/log/alternatives.log]
disabled = false
index = main
# sourcetype = ???

# auth log (sudo, ssh, etc.)
[monitor:///var/log/auth.log]
disabled = false
index = main
sourcetype = linux_secure

# bootstrap (this may not actually get updated on boot)
# Ok to ignore?
[monitor:///var/log/bootstrap.log]
disabled = false
index = main
# sourcetype = linux_bootlog

# btmp log (failed login attempts)
# Splunk cannot index this data type
# TODO figure out how to get this in splunk
# [monitor:///var/log/btmp]
# disabled = false
# index = main
# sourcetype = ??

# dpkg log (dpkg and apt installs)
[monitor:///var/log/dpkg.log]
disabled = false
index = main
# sourcetype = ??

# faillog (failed user logins)
# Splunk cannot index this data type
# TODO figure out how to get this in splunk
# [monitor:///var/log/faillog]
# disabled = false
# index = main
# sourcetype = ??

# kern log (kernel logs)
[monitor:///var/log/kern.log]
disabled = false
index = main
# sourcetype = linux_messages_syslog

# lastlog (last login by user)
# Splunk cannot index this data type
# TODO figure out how to get this in splunk
# [monitor:///var/log/lastlog]
# disabled = false
# index = main
# sourcetype = ??

# syslog (system logs)
[monitor:///var/log/syslog]
disabled = false
index = main
sourcetype = linux_messages_syslog

# tallylog (count of attempted logins/fails)
# Ok to ignore?
# [monitor:///var/log/tallylog]
# disabled = false
# index = main
# sourcetype = ??

# ufw log (firewall)
[monitor:///var/log/ufw.log]
disabled = false
index = main
# sourcetype = linux_messages_syslog

# wtmp (login records)
# Splunk cannot index this data type
# TODO figure out how to get this in splunk
# [monitor:///var/log/wtmp]
# disabled = false
# index = main
# sourcetype = ??

## /var/log/subdirs

# apache access log
[monitor:///var/log/apache/access.log]
disabled = false
index = main
sourcetype = access_combined

# apache error log
[monitor:///var/log/apache/error.log]
disabled = false
index = main
sourcetype = apache_error

# apache other vhosts access log
[monitor:///var/log/apache/other_vhosts_access.log]
disabled = false
index = main
sourcetype = access_combined

# apt history log
[monitor:///var/log/apt/history.log]
disabled = false
index = main
# sourcetype = ??

# apt term log
[monitor:///var/log/apt/term.log]
disabled = false
index = main
# sourcetype = ??

# Ignoring /chrony (empty)
# Ignoring /installer (tons of files)
# Ignoring /journal (binaries)

# letsencrypt log
[monitor:///var/log/letsencrypt/letsencrypt.log]
disabled = false
index = main
# sourcetype = ??

# mysql error log
[monitor:///var/log/mysql/error.log]
disabled = false
index = main
sourcetype = mysqld_error

# unattended upgrades dpkg log
[monitor:///var/log/unattended-upgrades/unattended-upgrades-dpkg.log]
disabled = false
index = main
# sourcetype = ??

# unattended upgrades shutdown log
[monitor:///var/log/unattended-upgrades/unattended-upgrades-shutdown.log]
disabled = false
index = main
# sourcetype = ??