I have a field called File_Name that I've generated by trimming the file path off of my source from a local data input.
The files are either XML or txt files, but the names all follow the same format.
They contain the protocol, device IP, a three-part transaction sequence number, and a message type.
Example:
TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml
I want to extract the protocol, Device_IP, the first two parts of the transaction sequence number (for event correlation) and the message type.
Here's what I've written so far. Forgive me if it's inelegant; I'm still learning!
| rex File_Name="(?<Proto>\w+)_(?<Device_IP>\d+\.\d+\.\d+\.\d+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)"
Try this run-anywhere search:
| makeresults
| eval File_Name="TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml"
| rex field=File_Name "(?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)"
Perfect! Thank you! So three things:
1. I don't know the syntax for rex, it seems.
2. Is it better to define a character class with a negated match than to try to extract digits or words?
3. Would you know the syntax if I were to bake this regex into my props.conf file for my local data inputs?
If the format is straightforward and you know it is present in every event, then you can match digits or words; your regex is also correct. If you are unsure, though, it is usually safer to follow the common delimiters in the raw data and then check whether every field is extracted with the values you expect.
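The two styles can also be mixed. Here is a rough run-anywhere sketch, assuming every file name follows the format in your example: delimiter-based classes where the content is free-form, digit classes where the shape is known, and anchors at both ends so a malformed name fails to match instead of matching partially. The Extension field is a hypothetical addition of mine and was not in the original search. It also shows the rex syntax from point 1: field=<field> followed by the quoted regex.

```
| makeresults
| eval File_Name="TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml"
| rex field=File_Name "^(?<Proto>[^_]+)_(?<Device_IP>\d{1,3}(?:\.\d{1,3}){3})_(?<Seq_ID>\d+-\d+)-\d+-(?<Message_Type>[^.]+)\.(?<Extension>\w+)$"
| table File_Name Proto Device_IP Seq_ID Message_Type Extension
```

Note that Message_Type here stops at the dot, so it no longer includes the file extension as it did in the earlier regex.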
Yes, you would need to make the change in props.conf. Refer to:
https://docs.splunk.com/Documentation/Splunk/8.0.0/Admin/Propsconf
[my_sourcetype]
EXTRACT-extract_ip = (?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
Okay, so by that passage, my line in Props.conf should be:
EVAL-File_Name = ltrim(source,"C:\\Program Files (x86)\\Folder1\Folder2\\SavedCopies\\")
EXTRACT-File_Name = (?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)
Included my Eval for the File_Name here in case that may be causing issues. It doesn't appear to be working. Does this EXTRACT perhaps not belong in props.conf? I see some talk of a transforms.conf but I don't have that file in my Splunk\etc\apps\search\local dir by default.
Have a look at this. Your syntax is incorrect: by default, EXTRACT matches against _raw. There are two ways to do this.
The first is to use the SOURCE_KEY option in transforms.conf. So,
in props.conf
[mysourcetype]
REPORT-file_name = file_name
Then in transforms.conf:
[file_name]
SOURCE_KEY = File_Name
REGEX = (?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)
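One caveat on the above: if File_Name is created by an EVAL- line in props.conf, it is a calculated field, and calculated fields are evaluated after REPORT transforms in the search-time sequence, so SOURCE_KEY = File_Name may find nothing to match. A possible variant, as a sketch only, reads from the built-in source field instead. This assumes source is available to the transform at search time, and the [^\\_] class for Proto is my own adjustment so that the Windows path is not swallowed into the protocol.

```
# props.conf
[mysourcetype]
REPORT-file_name = file_name

# transforms.conf
[file_name]
# Read the full path from the built-in source field rather than File_Name.
SOURCE_KEY = source
# Proto excludes backslashes, so it cannot span a path separator.
REGEX = (?<Proto>[^\\_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)
```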
The second way, if you want to do this using only props.conf, is described here:
EXTRACT-<class> = [<regex>|<regex> in <src_field>]
* Used to create extracted fields (search-time field extractions) that do
not reference transforms.conf stanzas.
* Performs a regex-based field extraction from the value of the source
field.
* <class> is a unique literal string that identifies the namespace of the
field you're extracting.
NOTE: <class> values do not have to follow field name syntax
restrictions. You can use characters other than a-z, A-Z, and 0-9, and
spaces are allowed. <class> values are not subject to key cleaning.
* The <regex> is required to have named capturing groups. When the <regex>
matches, the named capturing groups and their values are added to the
event.
* dotall (?s) and multi-line (?m) modifiers are added in front of the regex.
So internally, the regex becomes (?ms)<regex>.
* Use '<regex> in <src_field>' to match the regex against the values of a
specific field. Otherwise it just matches against _raw (all raw event
data).
* NOTE: <src_field> has the following restrictions:
* It can only contain alphanumeric characters and underscore
(a-z, A-Z, 0-9, and _).
* It must already exist as a field that has either been extracted at
index time or has been derived from an EXTRACT-<class> configuration
whose <class> ASCII value is *higher* than the configuration in which
you are attempting to extract the field. For example, if you
have an EXTRACT-ZZZ configuration that extracts <src_field>, then
you can only use 'in <src_field>' in an EXTRACT configuration with
a <class> of 'aaa' or higher, as 'aaa' is higher in ASCII value
than 'ZZZ'.
* It cannot be a field that has been derived from a transform field
extraction (REPORT-<class>), an automatic key-value field extraction
(in which you configure the KV_MODE setting to be something other
than 'none'), a field alias, a calculated field, or a lookup,
as these operations occur after inline field extractions (EXTRACT-
<class>) in the search-time operations sequence.
Sorry, I had to put this on hold for a few days. I was unable to get the transforms.conf method to work at first, so I took a step back and revisited the props.conf route. I suspected that running EXTRACT against the File_Name field created by my EVAL (which trims the file path off my source) was the culprit, and I think I'm correct. I wrote the following regex into props.conf to extract values from the source field instead.
[mysourcetype]
category = Custom
description = Saved_Copies
pulldown_type = 1
EXTRACT-file = (?<File_Path>[^_]+)_(?<Source_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+) in source
where an example source value is:
C:\Program Files (x86)\Folder1\Folder2\SavedCopies\TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml
It seems to be working but I need to develop the regex further as I'm losing the Protocol at the moment.
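For anyone who lands here with the same problem, the protocol is lost because the first group, [^_]+, matches everything up to the first underscore, which includes the whole directory path plus "TCP". A possible refinement, offered as a sketch and assuming the protocol never contains a backslash or underscore, is to exclude backslashes from the first group and to require a dotted IP right after it. The regex is unanchored, so it should then begin matching at the file name rather than at "C:". The field names are the ones used earlier in this thread.

```
[mysourcetype]
# Proto may not contain a backslash or underscore, so it cannot include
# any part of the directory path. Requiring a dotted IP after the first
# underscore keeps folder names that contain underscores from matching.
EXTRACT-file = (?<Proto>[^\\_]+)_(?<Source_IP>\d{1,3}(?:\.\d{1,3}){3})_(?<Seq_ID>\d+-\d+)-\d+-(?<Message_Type>\w+\.\w+) in source
```

Against the example source value above, this should yield Proto=TCP, Source_IP=10.101.100.111, Seq_ID=1478-1573570987 and Message_Type=DeviceToNCE.xml.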