
extract a field from event source filename

Explorer

How can I configure Splunk to extract some fields from the source filename?

I already specify a host_regex and that works great. I also understand that if there is a date in the filename, Splunk will find it automatically. The field can be extracted at index time if necessary.

I have Splunk watch a lot of files and directories. For some source types, there are fields in the filename that aren't the 'host', or a 'date' field. Furthermore these fields aren't repeated in the event data themselves (i.e. not in the file content, only in the filename).

Here's an example from a host collecting Oracle alert logs:

<logdir>/<host>.<sid>.log

/tmp/splunk_alert_logs/db01.TOOL.log

This might have been hit already, but I'm having some difficulty finding an answer that doesn't involve an automatically located field.


Hi, as a rule of thumb, avoid having Splunk index new fields unless it is really necessary (it puts a higher burden on the indexer, among other costs). What you most likely need is a search-time field extraction, which you can configure like this.

Suppose your oracle alert logs have the sourcetype "oracle_alert", then in local/props.conf:

[oracle_alert]
EXTRACT-sourcefields = (?<logdir>[\w\W/]+)/(?<host_2>[^\.]+)\.(?<sid>[^\.]+)\.log in source
# Double-check the regex. The trailing "in source" tells Splunk to apply it to the source field.

That would instruct Splunk to extract three fields: logdir (everything before the last /), host_2 (renamed so that it does not override the original "host" field), and sid.

You don't need to modify fields.conf for this.

Another method would be to use transforms.conf as well.

For further information on the alternative methods, you can write a comment here or refer to the props.conf documentation and search for the keyword "EXTRACT".

If you want to test the regex before applying the configuration, you can use the rex command on the search bar; in this case, you could run a search like:

sourcetype=oracle_alert | rex field=source max_match=10 "(?<logdir>[\w\W/]+)/(?<host_2>[^\.]+)\.(?<sid>[^\.]+)\.log"

and check that the three fields appear on the left field-picker menu.
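
To try the same pattern without waiting for matching events, a minimal sketch using makeresults (assuming your Splunk version provides that command) fabricates a single event with the example path from the question. The table at the end is only there to show the result:

```
| makeresults
| eval source="/tmp/splunk_alert_logs/db01.TOOL.log"
| rex field=source "(?<logdir>[\w\W/]+)/(?<host_2>[^\.]+)\.(?<sid>[^\.]+)\.log"
| table source logdir host_2 sid
```

For that path the regex should yield logdir=/tmp/splunk_alert_logs, host_2=db01 and sid=TOOL.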

Hope that helped a bit, Paolo


Path Finder

I was looking to do a similar thing, and ran into this thread.
Thought I'd post what I did with the "rex field=source", if it helps anyone who wants to do something similar:

If the "hidden" source field is something like:


source=/home/MyName/logs/Area310LosAngeles/2012/07/log.txt

You could use:


(splunk search expression here) | rex field=source "Area(?<areacode>.{3})"

to extract the 3 character Area Code out of the source path name, etc.
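
If you want that extraction to persist instead of typing rex each time, the same pattern can presumably go into props.conf as a search-time extraction, following the EXTRACT ... in source form from the accepted answer. The stanza name my_sourcetype is a placeholder, not something from this thread:

```
# props.conf (search-time; my_sourcetype is a hypothetical sourcetype name)
[my_sourcetype]
EXTRACT-areacode = Area(?<areacode>.{3}) in source
```

If the code is always numeric, \d{3} in place of .{3} would be a stricter match.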


Explorer

Thanks. I know to avoid indexing fields; I just knew that 'host' was indexed, so I wasn't sure how fields from the filename were going to work out. I understand now, though. Thanks to both of you.


New Member

Paolo,

I was trying to create field extractions from source for multiple sourcetypes.

Is there a way to create a single extraction that applies to multiple sourcetypes?

Example sourcetypes (the source contains the server name):
soa:access:log
soa:server:log

I am trying to create the field extraction as type=inline:
Name: EXTRACT-SOA-ServerName
sourcetype: soa:.*:log
Extraction / Transformation: (?SOA[0-9]+) in source

However, the extraction above is not working when I search: index="soa" sourcetype="soa:server:log"

What did I miss?

Thanks in advance


Path Finder

I had a very similar situation, where pertinent information was in the filenames - this solution worked perfectly for me as well. Thank You!


Splunk Employee

The props.conf instructions worked perfectly, thanks.
The only thing I had to add was quotes around the regex.


Splunk Employee

You should be able to just define a stanza in transforms.conf with SOURCE_KEY set to "source" and a REGEX that names your field. Something like:

[a_transform]  
SOURCE_KEY = source  
REGEX = (?i)[\/A-Za-z]+\/(?<give_it_a_fieldname>\w+)(?=\.\w+)  

In your props.conf, you reference "a_transform" like this:

[a_sourcetype]  
REPORT-transform = a_transform  

You'll probably also have to declare the field in fields.conf, since the field value does not appear in the indexed event text; for example:

[give_it_a_fieldname]  
INDEXED_VALUE = false

Splunk Employee

Note that if you search for this field alone, because it's marked as a non-indexed value, Splunk will perform a full table scan to find matches. To get around this performance issue, you could extract the field at index time, set up a lookup table that maps all sources to your fields, or set up a set of eventtypes.
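
As a rough sketch of the index-time option for the oracle_alert example from the question (the transform name sid_from_source and the regex are my assumptions, so verify them against your own paths), the configuration on the indexer or heavy forwarder could look like:

```
# transforms.conf
[sid_from_source]
# MetaData:Source holds the source path, prefixed with "source::"
SOURCE_KEY = MetaData:Source
REGEX = /[^/.]+\.([^/.]+)\.log$
FORMAT = sid::$1
WRITE_META = true

# props.conf
[oracle_alert]
TRANSFORMS-sid = sid_from_source

# fields.conf
[sid]
INDEXED = true
```

This only affects events indexed after the change, and the fields.conf entry with INDEXED = true needs to be visible to the search head so that searches treat sid as an indexed field.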
