
Consolidate similarly named log files into a single source

jones4bob
Explorer

Our Splunk environment takes input from log files dropped off by an IronPort web security appliance. The files are named in a format where a long string representing the date is appended to a common file name. An input is set up to monitor this folder for files as they are dropped off. Since each file name is different, each one represents a different "source" in Splunk. How can I cause them to be considered a single source?


Genti
Splunk Employee

Very easy: set up a monitor like this in the inputs.conf file:

[monitor:///directory/where/your/files/get/dropped/]
disabled = false
followTail = 0
# Every file picked up by this stanza gets the same source value
source = mysource

This way, every file dropped into that directory will have its source set to "mysource" when it is indexed.

Cheers,
.gz


murthychitturi
New Member

Hi,

[source_clean-YYYY-MM-DD]
# Remove 'YYYY-MM-DD' or 'YYYY_MM_DD' style date from the filename
#   catalina.2009-09-17.log     --> catalina.log
#   quartz-xyz.log.2009-07-08   --> quartz-xyz.log
#   xyz.log.2009-07-15          --> xyz.log
#   xyz_2010-01-01.log          --> xyz.log
#   info_012345_2010_08_17.txt  --> info_012345.txt
DEST_KEY = MetaData:Source
SOURCE_KEY = MetaData:Source
REGEX = source::(.*)([-._ ]\d{4}([-_])\d\d\3\d\d)(|\D.*)$
FORMAT = source::$1$4

In the regex, what does \3 represent? Isn't it supposed to be a character class like [-_] again, to match YYYY-MM-DD?
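For reference on the \3 question: in the full regex from Lowell's transforms.conf answer, group 3 is the ([-_]) that follows the year, and \3 is a backreference. It means "the exact character that group 3 matched", not a new character class. So 2009-09-17 and 2010_08_17 match, but a mixed date such as 2009-09_17 does not. The minimal Python sketch below shows this; it assumes Python's re module treats this pattern the same way Splunk's PCRE engine does, and date_parts is a hypothetical helper name.

```python
import re

# REGEX from the source_clean-YYYY-MM-DD stanza.
# Assumption: Python's re handles this pattern the same way Splunk's PCRE does.
PATTERN = re.compile(r"source::(.*)([-._ ]\d{4}([-_])\d\d\3\d\d)(|\D.*)$")


def date_parts(source):
    r"""Return (separator, date_with_leading_sep), or None if there is no match.

    Group 3 is the ([-_]) right after the year; the backreference \3 then
    requires that *same* character between the month and the day.
    """
    m = PATTERN.search(source)
    if m is None:
        return None
    return m.group(3), m.group(2)


if __name__ == "__main__":
    for src in ("source::catalina.2009-09-17.log",
                "source::info_012345_2010_08_17.txt",
                "source::app.2009-09_17.log"):  # mixed separators
        print(src, "->", date_parts(src))
```

The first two sources yield ('-', '.2009-09-17') and ('_', '_2010_08_17'); the mixed-separator one yields None because \3 demands a second "-".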


blee_i365
Explorer

Thank you for the transforms.conf method. It works great!!


Lowell
Super Champion

Another approach is to use a transformer to rename the source. I run into this quite frequently, and it's a pain to write a new transformer every time I add a new source, so I've built a collection of common renaming scenarios, with one transformer for each.

I have also used the explicit "source=" setting in the monitor input stanza, but I believe I ran into problems with it (though there have been many Splunk releases since then, so perhaps that was a bug fixed long ago). Also, using a transformer to rename the source still works when a single monitor input picks up several types of files (for example, files with different sourcetypes).

Here are a couple of example transformers that I use frequently.

transforms.conf:

[source_clean-YYYY-MM-DD]
# Remove 'YYYY-MM-DD' or 'YYYY_MM_DD' style date from the filename
#   catalina.2009-09-17.log     --> catalina.log
#   quartz-xyz.log.2009-07-08   --> quartz-xyz.log
#   xyz.log.2009-07-15          --> xyz.log
#   xyz_2010-01-01.log          --> xyz.log
#   info_012345_2010_08_17.txt  --> info_012345.txt
DEST_KEY   = MetaData:Source
SOURCE_KEY = MetaData:Source
REGEX    = source::(.*)([-._ ]\d{4}([-_])\d\d\3\d\d)(|\D.*)$
FORMAT   = source::$1$4


[source_drop-YYYYMMDD]
# Remove a 'YYYYMMDD' style date from the filename
#   jakarta_service_20091007.log --> jakarta_service.log
#   jakarta_service.log_20091007 --> jakarta_service.log
#   jakarta_service.log.20091007 --> jakarta_service.log
#   jakarta_service.log-20091007 --> jakarta_service.log
DEST_KEY   = MetaData:Source
SOURCE_KEY = MetaData:Source
REGEX    = source::(.*)([-._ ]\d{8})(|\D.*)$
FORMAT   = source::$1$3


[source_clean-trailing-digits]
# Remove any trailing digits from a filename
#   syslog.3 --> syslog
#   server.log.12345 --> server.log
#   access_log.1262174400 --> access_log
DEST_KEY   = MetaData:Source
SOURCE_KEY = MetaData:Source
REGEX    = source::(.*)[-._]\d+$
FORMAT   = source::$1


[source_clean-digits-before-ext]
# Remove digits that appear just before the file extension
#   server.12345.log      --> server.log
#   access-1262174400_log --> access_log
DEST_KEY   = MetaData:Source
SOURCE_KEY = MetaData:Source
REGEX    = source::(.*)[-._]\d+([._][a-zA-Z]+)$
FORMAT   = source::$1$2
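To see what each of the four stanzas does to a filename, they can be emulated outside Splunk. This is a minimal Python sketch, not Splunk's own code: it assumes Python's re behaves like PCRE for these patterns, prepends the "source::" prefix that MetaData:Source values carry, and translates FORMAT's $1-style references into Python's \g<1> syntax. TRANSFORMS and rename_source are hypothetical names.

```python
import re

# Hypothetical table mirroring the four transforms.conf stanzas as (REGEX, FORMAT).
TRANSFORMS = {
    "source_clean-YYYY-MM-DD": (
        r"source::(.*)([-._ ]\d{4}([-_])\d\d\3\d\d)(|\D.*)$", "source::$1$4"),
    "source_drop-YYYYMMDD": (
        r"source::(.*)([-._ ]\d{8})(|\D.*)$", "source::$1$3"),
    "source_clean-trailing-digits": (
        r"source::(.*)[-._]\d+$", "source::$1"),
    "source_clean-digits-before-ext": (
        r"source::(.*)[-._]\d+([._][a-zA-Z]+)$", "source::$1$2"),
}


def rename_source(transform, filename):
    """Emulate one source-renaming transform on a bare filename."""
    regex, fmt = TRANSFORMS[transform]
    m = re.search(regex, "source::" + filename)
    if m is None:
        return filename  # no match: the source is left as it was
    # Translate Splunk's $1-style references into Python's \g<1> syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", fmt)
    return m.expand(template)[len("source::"):]


if __name__ == "__main__":
    print(rename_source("source_clean-YYYY-MM-DD", "catalina.2009-09-17.log"))
    print(rename_source("source_drop-YYYYMMDD", "jakarta_service_20091007.log"))
    print(rename_source("source_clean-trailing-digits", "server.log.12345"))
    print(rename_source("source_clean-digits-before-ext", "access-1262174400_log"))
```

Running it should print catalina.log, jakarta_service.log, server.log and access_log, matching the examples in the stanza comments.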

So to use one of these transformers, you simply add an entry to your props.conf file, keyed on source or sourcetype. For example:

props.conf:

[my_source_type]
TRANSFORMS-fix_source = source_drop-YYYYMMDD

Note about using comments:

I recommend using comments in this situation to document example inputs and outputs. First, comments like these make it easier to find the right transformer than relying on the stanza name or trying to interpret the REGEX/FORMAT lines. Second, and more importantly, the examples double as test cases. So if (or when) I find a slightly different file pattern I want to handle, I can update the REGEX/FORMAT and then quickly use a regex testing tool to verify that I didn't break the previously supported examples. This matters especially for the source field, since it needs to be reliable. In my case, these settings get deployed to all of my Splunk installs, so a careless mistake could mess up a large number of sources all at once.
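The "examples double as test cases" idea can be automated so the comments are checked on every change. This is a minimal Python sketch under a few assumptions: it relies on the "#   input --> expected" comment convention used above, approximates Splunk's PCRE with Python's re, and its parser only understands the simple key = value lines shown here, not the full .conf syntax. parse_stanzas, rename and check are hypothetical helpers.

```python
import re

# Hypothetical excerpt of transforms.conf, copied from two of the stanzas above.
CONF = r"""
[source_drop-YYYYMMDD]
# Remove a 'YYYYMMDD' style date from the filename
#   jakarta_service_20091007.log --> jakarta_service.log
#   jakarta_service.log_20091007 --> jakarta_service.log
#   jakarta_service.log.20091007 --> jakarta_service.log
#   jakarta_service.log-20091007 --> jakarta_service.log
DEST_KEY   = MetaData:Source
SOURCE_KEY = MetaData:Source
REGEX    = source::(.*)([-._ ]\d{8})(|\D.*)$
FORMAT   = source::$1$3

[source_clean-trailing-digits]
# Remove any trailing digits from a filename
#   syslog.3 --> syslog
#   server.log.12345 --> server.log
#   access_log.1262174400 --> access_log
DEST_KEY   = MetaData:Source
SOURCE_KEY = MetaData:Source
REGEX    = source::(.*)[-._]\d+$
FORMAT   = source::$1
"""

# Comment lines of the form "#   input --> expected" are treated as test cases.
EXAMPLE = re.compile(r"#\s+(\S+)\s+-->\s+(\S+)\s*$")


def parse_stanzas(text):
    """Map stanza name -> {"REGEX": ..., "FORMAT": ..., "examples": [(in, out)]}."""
    stanzas, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = stanzas.setdefault(line[1:-1], {"examples": []})
        elif current is None or not line:
            continue
        elif line.startswith("#"):
            m = EXAMPLE.match(line)
            if m:
                current["examples"].append(m.groups())
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return stanzas


def rename(regex, fmt, filename):
    """Apply one REGEX/FORMAT pair; $n in FORMAT becomes Python's \\g<n>."""
    m = re.search(regex, "source::" + filename)
    if m is None:
        return filename
    return m.expand(re.sub(r"\$(\d+)", r"\\g<\1>", fmt))[len("source::"):]


def check(text):
    """Return (stanza, input, expected, got) for every failing example."""
    failures = []
    for name, stanza in parse_stanzas(text).items():
        if "REGEX" not in stanza or "FORMAT" not in stanza:
            continue
        for src, expected in stanza["examples"]:
            got = rename(stanza["REGEX"], stanza["FORMAT"], src)
            if got != expected:
                failures.append((name, src, expected, got))
    return failures


if __name__ == "__main__":
    problems = check(CONF)
    for name, src, expected, got in problems:
        print(f"FAIL [{name}] {src}: got {got!r}, expected {expected!r}")
    print("all examples pass" if not problems else f"{len(problems)} failure(s)")
```

With the two stanzas above it should report that all examples pass; feeding it the text of a real transforms.conf would check every stanza that carries examples, which is a quick way to catch a regression before the settings are deployed.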


Lowell
Super Champion

Setting "source =" explicitly in the monitor stanza should normally work fine. However, there could be issues if more than one of these files is being written to at once. (This can also happen if you're indexing many historical log files, in which case you'd have to omit the "followTail=0".) I believe this is related to this question: http://answers.splunk.com/questions/451/why-do-my-scripted-inputs-data-get-intermingled (I know it's about scripted inputs, but I think the same principle applies here; I've seen this issue on my system with both scripted inputs and regular files.)
