How can I configure Splunk to extract some fields from the source filename?
I already specify a host_regex and that works great. I also understand that if there is a date in the filename, Splunk will find it automatically. The field can be extracted at index time if it must be.
I have Splunk watch a lot of files and directories. For some source types, there are fields in the filename that aren't the 'host', or a 'date' field. Furthermore these fields aren't repeated in the event data themselves (i.e. not in the file content, only in the filename).
Here's an example from a host collecting Oracle alert logs:
<logdir>/<host>.<sid>.log
/tmp/splunk_alert_logs/db01.TOOL.log
This might have been hit already, but I'm having some difficulty finding an answer that doesn't involve an automatically located field.
Hi, as a rule of thumb, it is bad to have splunk index new fields if not really necessary (higher burden on the indexer and so on). What you might need most is a search-time field extraction that you can configure like this.
Suppose your oracle alert logs have the sourcetype "oracle_alert", then in local/props.conf:
[oracle_alert]
EXTRACT-sourcefields = (?<logdir>[\w\W/]+)/(?<host_2>[^\.]+)\.(?<sid>[^\.]+)\.log in source
# (double check the regex) (edit: the "in source" is what tells splunk to look into the source field)
That would instruct splunk to extract 3 fields: logdir (anything before the last /), host_2 (which I renamed to not override the original "host" field), and sid.
You don't need to modify fields.conf for this.
Another method would be to use transforms.conf as well.
For further info on the alternative methods, you can write a comment here or refer to: Props.conf documentation and search for the keyword "EXTRACT".
If you want to test the regex before applying the configuration, you can use the rex command on the search bar; in this case, you could run a search like:
sourcetype=oracle_alert | rex field=source max_match=10 "(?<logdir>[\w\W/]+)/(?<host_2>[^\.]+)\.(?<sid>[^\.]+)\.log"
and check that the three fields appear on the left field-picker menu.
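If you'd rather sanity-check the pattern outside Splunk first, here is a minimal Python sketch that runs the same regex against the example path from the question. Two assumptions to note: Python spells named groups as (?P<name>...) where the Splunk regex uses (?<name>...), and the helper name extract_source_fields is made up for illustration. Python's regex engine is not identical to the one Splunk uses, so treat this as a rough check, not a guarantee.

```python
import re

# Same pattern as the EXTRACT line, with Python-style (?P<name>...) groups.
SOURCE_PATTERN = re.compile(
    r"(?P<logdir>[\w\W/]+)/(?P<host_2>[^\.]+)\.(?P<sid>[^\.]+)\.log"
)


def extract_source_fields(source):
    """Return the captured logdir, host_2 and sid as a dict, or None if no match."""
    match = SOURCE_PATTERN.search(source)
    return match.groupdict() if match else None


# Example path taken from the question.
fields = extract_source_fields("/tmp/splunk_alert_logs/db01.TOOL.log")
print(fields)
```

For the example path this should give logdir=/tmp/splunk_alert_logs, host_2=db01 and sid=TOOL, which is what the props.conf extraction is meant to produce.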
Hope that helped a bit, Paolo
I was looking to do a similar thing, and ran into this thread.
Thought I'd post what I did with the "rex field=source", if it helps anyone who wants to do something similar:
If the "hidden" source field is something like:
source=/home/MyName/logs/Area_310_LosAngeles/2012/07/log.txt
You could use:
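(splunk search expression here) | rex field=source "Area_(?<area_code>.{3})"
(area_code is just a placeholder name for the extracted field; pick whatever name suits you.)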
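For anyone who wants to try this idea outside Splunk, a small Python sketch of the same extraction follows. The group name area_code and the helper extract_area are assumptions for illustration, and Python needs the (?P<name>...) spelling for named groups.

```python
import re

# Capture the three characters that follow "Area_" in the source path.
AREA_PATTERN = re.compile(r"Area_(?P<area_code>.{3})")


def extract_area(source):
    """Return the three-character area code from a source path, or None."""
    match = AREA_PATTERN.search(source)
    return match.group("area_code") if match else None


# Source path taken from the post above.
area = extract_area("/home/MyName/logs/Area_310_LosAngeles/2012/07/log.txt")
print(area)  # prints 310
```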
Thanks, I know to avoid indexing fields. I just knew that 'host' was indexed, so I wasn't sure how fields from the filename were going to work out. I understand now though. Thanks both of you.
Paolo,
I was trying to create field extractions from source for multiple sourcetypes.
Is there a way to create a single extraction for multiple sourcetypes?
Ex: sourcetypes (the source has the serverName):
soa:access:log
soa:server:log
Trying to create Field extraction as type=inline
Name:EXTRACT-SOA-ServerName
sourcetype: soa:.*:log
Extraction / Transformation: (?<ServerName>SOA[0-9]+) in source
However, the above is not working when I search: index="soa" sourcetype="soa:server:log"
Can you tell me what I missed?
Thanks in advance
I had a very similar situation, where pertinent information was in the filenames - this solution worked perfectly for me as well. Thank You!
props.conf instructions worked perfectly, thanks.
The only thing I had to add was quotes around the regex.
You should be able to just define a transform in transforms.conf with SOURCE_KEY set to "source" and a REGEX defining your fieldname. Something like:
[a_transform]
SOURCE_KEY = source
REGEX = (?i)[\/A-Za-z]+\/(?<give_it_a_fieldname>\w+)(?=\.\w+)
In your props.conf you reference "a_transform" like this:
[a_sourcetype]
REPORT-transform = a_transform
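It is worth checking which part of the path this REGEX actually captures before relying on it. The Python sketch below is an assumption-laden check, not Splunk itself: it uses Python's (?P<name>...) group syntax, a made-up helper name, and the example path from the original question.

```python
import re

# Same pattern as the transform's REGEX, with a Python-style named group.
TRANSFORM_PATTERN = re.compile(
    r"(?i)[\/A-Za-z]+\/(?P<give_it_a_fieldname>\w+)(?=\.\w+)"
)


def extract_from_source(source):
    """Return the value captured by the transform-style regex, or None."""
    match = TRANSFORM_PATTERN.search(source)
    return match.group("give_it_a_fieldname") if match else None


value = extract_from_source("/tmp/splunk_alert_logs/db01.TOOL.log")
print(value)  # prints db01
```

For that path the capture is the first dotted component of the filename (db01), not the sid, so the pattern would need adjusting if a later component is what you are after.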
You'll probably also have to define the fieldname in fields.conf, since the field value would not have been indexed. For example:
[give_it_a_fieldname]
INDEXED_VALUE = false
Note that if you search for this field alone, because it's marked as a non-indexed value, Splunk will perform a full table scan to find matches. To get around this performance issue, you could extract the field at index time, set up a lookup table that maps all sources to your fields, or set up a set of eventtypes.
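As a rough illustration of the lookup-table option, here is a Python sketch that builds CSV text mapping each source path to the fields in its filename. Everything specific in it is an assumption: the column names, the helper build_lookup_csv, the second sample path (db02.CRM.log is invented), and the <host>.<sid>.log layout from the original question. The resulting file would still have to be registered as a lookup in Splunk, which is not shown here.

```python
import csv
import io
import re

# Assumed filename layout: <logdir>/<host>.<sid>.log
FILENAME_PATTERN = re.compile(r"/(?P<host_2>[^/.]+)\.(?P<sid>[^/.]+)\.log$")


def build_lookup_csv(sources):
    """Return CSV text with one row per source whose filename matches the layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "host_2", "sid"])
    for source in sources:
        match = FILENAME_PATTERN.search(source)
        if match:
            writer.writerow([source, match.group("host_2"), match.group("sid")])
    return buffer.getvalue()


lookup_csv = build_lookup_csv([
    "/tmp/splunk_alert_logs/db01.TOOL.log",  # path from the original question
    "/tmp/splunk_alert_logs/db02.CRM.log",   # hypothetical second source
])
print(lookup_csv)
```

Sources that do not match the layout are simply skipped, so the table only contains rows where both fields could be read from the filename.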