Solved: How to extract the protocol, Device_IP, transaction sequence number and the message type with regex

cb046891 · ‎11-12-2019

I have a field called File_Name that I've generated by trimming the file path off of my source from a local data input.
The files are either XML or txt files, but the names all follow the same format.
They contain the protocol, the device IP, a three-part transaction sequence number, and a message type.

Example:

TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml
I want to extract the protocol, Device_IP, the first two parts of the transaction sequence number (for event correlation) and the message type.

Here's what I've written so far, forgive me if it's inelegant, I'm still learning!

| rex File_Name="(?<Proto>\w+)_(?<Device_IP>\d+\.\d+\.\d+\.\d+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)"

mayurr98 · ‎11-12-2019

Try this run-anywhere search:

| makeresults 
| eval File_Name="TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml" 
| rex field=File_Name "(?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)"

cb046891 · ‎11-12-2019

Perfect! Thank you! So three things:
1. I don't know the syntax for rex, it seems.
2. Is it better to define a negated character class than to try to match digits or words explicitly?
3. Would you know the syntax if I were to bake this regex into my props.conf file for my local data inputs?

mayurr98 · ‎11-12-2019

1. See the rex syntax documentation.
2. If the format is straightforward and you know the value is present in every event, you can match digits or words. Your regex is also correct. If you are unsure, follow the common delimiters in the raw data instead, then check that every field is extracted with the values you expect.
3. Yes, you would need to make changes in props.conf. Refer to
https://docs.splunk.com/Documentation/Splunk/8.0.0/Admin/Propsconf

[my_sourcetype]
EXTRACT-extract_ip = (?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
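
On the second question above, a quick way to compare the two styles is to run both patterns against the sample file name. The sketch below uses Python only as a regex test bench; it is not Splunk, and the helper names `to_python` and `extract` are made up for illustration. Python's `re` module writes named groups as `(?P<name>...)`, so the Splunk patterns are converted with a plain string replace, which is safe only because neither pattern uses lookbehind.

```python
import re

def to_python(splunk_regex: str) -> str:
    # Splunk (PCRE) writes named groups as (?<name>...); Python needs (?P<name>...).
    # Assumption: the pattern contains no lookbehind, so a plain replace is safe.
    return splunk_regex.replace("(?<", "(?P<")

def extract(splunk_regex: str, value: str):
    # Return the named groups as a dict, or None if the pattern does not match.
    match = re.search(to_python(splunk_regex), value)
    return match.groupdict() if match else None

FILE_NAME = "TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml"

# Original attempt: explicit word and digit classes.
EXPLICIT = r"(?<Proto>\w+)_(?<Device_IP>\d+\.\d+\.\d+\.\d+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)"
# Accepted answer: negated classes that stop at the "_" delimiter.
NEGATED = r"(?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)"

explicit_fields = extract(EXPLICIT, FILE_NAME)
negated_fields = extract(NEGATED, FILE_NAME)

print(explicit_fields)
print("same result:", explicit_fields == negated_fields)
```

On this sample both patterns yield the same four fields. They differ on other input: `[^_]+` would also accept a hostname in the Device_IP position, while `\d+\.\d+\.\d+\.\d+` would reject it, so the choice depends on how strict the extraction should be.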

cb046891 · ‎11-12-2019

Okay, so going by that passage, my lines in props.conf should be:

EVAL-File_Name = ltrim(source,"C:\\Program Files (x86)\\Folder1\Folder2\\SavedCopies\\")
EXTRACT-File_Name = (?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)

Included my Eval for the File_Name here in case that may be causing issues. It doesn't appear to be working. Does this EXTRACT perhaps not belong in props.conf? I see some talk of a transforms.conf but I don't have that file in my Splunk\etc\apps\search\local dir by default.

mayurr98 · ‎11-12-2019

Your syntax is incorrect: by default, EXTRACT matches against _raw. There are two ways to extract from another field. Have a look at this:

https://answers.splunk.com/answers/47982/extracting-field-from-a-field-other-than-raw-in-props-conf....

This can be done with the SOURCE_KEY option in transforms.conf.
In props.conf:

[mysourcetype]
REPORT-file_name = file_name

Then in transforms.conf:

[file_name]
SOURCE_KEY = File_Name
REGEX = (?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)

Alternatively, if you want to do this with props.conf alone, read this excerpt from the props.conf documentation:

EXTRACT-<class> = [<regex>|<regex> in <src_field>]
* Used to create extracted fields (search-time field extractions) that do
  not reference transforms.conf stanzas.
* Performs a regex-based field extraction from the value of the source
  field.
* <class> is a unique literal string that identifies the namespace of the
  field you're extracting.
  NOTE: <class> values do not have to follow field name syntax
  restrictions. You can use characters other than a-z, A-Z, and 0-9, and
  spaces are allowed. <class> values are not subject to key cleaning.
* The <regex> is required to have named capturing groups. When the <regex>
  matches, the named capturing groups and their values are added to the
  event.
* dotall (?s) and multi-line (?m) modifiers are added in front of the regex.
  So internally, the regex becomes (?ms)<regex>.
* Use '<regex> in <src_field>' to match the regex against the values of a
  specific field.  Otherwise it just matches against _raw (all raw event
  data).
* NOTE: <src_field> has the following restrictions:
  * It can only contain alphanumeric characters and underscore
    (a-z, A-Z, 0-9, and _).
  * It must already exist as a field that has either been extracted at
    index time or has been derived from an EXTRACT-<class> configuration
    whose <class> ASCII value is *higher* than the configuration in which
    you are attempting to extract the field. For example, if you
    have an EXTRACT-ZZZ configuration that extracts <src_field>, then
    you can only use 'in <src_field>' in an EXTRACT configuration with
    a <class> of 'aaa' or lower, as 'aaa' is lower in ASCII value
    than 'ZZZ'.
  * It cannot be a field that has been derived from a transform field
    extraction (REPORT-<class>), an automatic key-value field extraction
    (in which you configure the KV_MODE setting to be something other
    than 'none'), a field alias, a calculated field, or a lookup,
    as these operations occur after inline field extractions (EXTRACT-
    <class>) in the search-time operations sequence.

cb046891 · ‎11-15-2019

Sorry, I had to put this on hold for a few days. I was unable to get the transforms.conf method to work at first, so I took a step back and revisited the props.conf route. I suspected that trying to EXTRACT from the field created by the EVAL that trims the file path off my source was the culprit, and I think I'm correct. I wrote the following regex into props.conf to extract values from the source field instead.

[mysourcetype]
category = Custom
description = Saved_Copies
pulldown_type = 1
EXTRACT-file = (?<File_Path>[^_]+)_(?<Source_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+) in source

where an example source value is:

C:\Program Files (x86)\Folder1\Folder2\SavedCopies\TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml

It seems to be working but I need to develop the regex further as I'm losing the Protocol at the moment.
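
One possible reason the protocol is lost: the first group, `[^_]+`, matches everything up to the first underscore, and in the full source value that includes the whole directory path. A minimal sketch of that behavior follows, again using Python purely as a regex test bench (so named groups are written `(?P<name>...)`). The `CANDIDATE` pattern is a hypothetical adjustment, not something from the thread: it also excludes the backslash from the first group so the match can only begin after the last path separator.

```python
import re

# Example source value from the post (raw string, so backslashes are literal).
SOURCE = r"C:\Program Files (x86)\Folder1\Folder2\SavedCopies\TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml"

# Pattern as posted, rewritten with Python's (?P<name>...) group syntax.
POSTED = r"(?P<File_Path>[^_]+)_(?P<Source_IP>[^_]+)_(?P<Seq_ID>\d+\-\d+)-\d+-(?P<Message_Type>\w+\.\w+)"

# Hypothetical fix: the first group may contain neither "\" nor "_",
# so it cannot span a directory separator and ends up holding only the protocol.
CANDIDATE = r"(?P<Proto>[^\\_]+)_(?P<Source_IP>[^_]+)_(?P<Seq_ID>\d+\-\d+)-\d+-(?P<Message_Type>\w+\.\w+)"

posted = re.search(POSTED, SOURCE).groupdict()
candidate = re.search(CANDIDATE, SOURCE).groupdict()

print("posted File_Path:", posted["File_Path"])
print("candidate Proto: ", candidate["Proto"])
```

With the posted pattern, File_Path holds the directory path with the protocol stuck on the end. With the candidate pattern, Proto holds only the protocol. If Splunk's regex engine behaves the same way here, the props.conf version would presumably use the same negated class with the `(?<Proto>...)` group syntax, but that has not been tested in Splunk.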
