Splunk Search

How to extract the protocol, Device_IP, transaction sequence number and the message type with regex

cb046891
New Member

I have a field called File_Name that I've generated by trimming the file path off my source from a local data input.
The files are either XML or txt files, but the names all follow the same format.
They contain the protocol, device IP, a three-part transaction sequence number, and a message type.

Example:

TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml
I want to extract the protocol, Device_IP, the first two parts of the transaction sequence number (for event correlation) and the message type.

Here's what I've written so far. Forgive me if it's inelegant; I'm still learning!

| rex File_Name="(?<Proto>\w+)_(?<Device_IP>\d+\.\d+\.\d+\.\d+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)"

mayurr98
Super Champion

Try this run-anywhere search:

| makeresults 
| eval File_Name="TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml" 
| rex field=File_Name "(?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)"
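
Since the stated goal is event correlation, here is a rough sketch of how the extracted Seq_ID could be used once the rex works. The second file name (ending in NCEToDevice.txt) is a made-up sample that shares the first two sequence parts; it is not from the original post. The output field names Message_Types and Files are placeholders too.

```spl
| makeresults count=2
| streamstats count as n
| eval File_Name=if(n=1, "TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml", "TCP_10.101.100.111_1478-1573570987-8724-NCEToDevice.txt")
| rex field=File_Name "(?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)"
| stats values(Message_Type) as Message_Types count as Files by Device_IP Seq_ID
```

If the extraction behaves as intended, both rows should land in a single Device_IP / Seq_ID group. For real data, replace the first three lines with your own base search.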

cb046891
New Member

Perfect! Thank you! So three things:
1. I don't know the syntax for rex, it seems.
2. Is it better to define a negated character class (match anything except the delimiter) than to match digits or word characters explicitly?
3. Would you know the syntax if I were to bake this regex into my props.conf file for my local data inputs?


mayurr98
Super Champion
  1. rex syntax

  2. If the format is straightforward and you know it is present in every event, you can match digits or word characters; your regex is also correct. When you are unsure, it is usually safer to match on the common delimiters in the raw data and then check that every field is extracted as expected.

  3. Yes, you would need to make changes in props.conf. Refer to:
    https://docs.splunk.com/Documentation/Splunk/8.0.0/Admin/Propsconf

    [my_sourcetype]
    EXTRACT-extract_ip = (?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
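
To make the trade-off in point 2 concrete, here is a small run-anywhere sketch that applies both styles to the same values. The second file name (with unknown-host in place of an IP) is an invented malformed sample, and the field names Proto_delim, IP_delim, Proto_strict and IP_strict are placeholders chosen only for this comparison.

```spl
| makeresults count=2
| streamstats count as n
| eval File_Name=if(n=1, "TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml", "TCP_unknown-host_1478-1573570987-8723-DeviceToNCE.xml")
| rex field=File_Name "^(?<Proto_delim>[^_]+)_(?<IP_delim>[^_]+)_"
| rex field=File_Name "^(?<Proto_strict>[A-Za-z]+)_(?<IP_strict>\d{1,3}(?:\.\d{1,3}){3})_"
| table File_Name Proto_delim IP_delim Proto_strict IP_strict
```

On the well-formed name both styles should return the same values. On the malformed one, the delimiter-based pattern still fills IP_delim with whatever sits between the underscores, while the strict pattern should leave its fields empty. Which behavior is preferable depends on whether you would rather tolerate odd values or notice them.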


cb046891
New Member

Okay, so by that passage, my line in Props.conf should be:

EVAL-File_Name = ltrim(source,"C:\\Program Files (x86)\\Folder1\Folder2\\SavedCopies\\")
EXTRACT-File_Name = (?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)

Included my Eval for the File_Name here in case that may be causing issues. It doesn't appear to be working. Does this EXTRACT perhaps not belong in props.conf? I see some talk of a transforms.conf but I don't have that file in my Splunk\etc\apps\search\local dir by default.


mayurr98
Super Champion

Have a look at this. Your syntax is incorrect: by default, EXTRACT matches against _raw. There are two ways to extract from a different field:

https://answers.splunk.com/answers/47982/extracting-field-from-a-field-other-than-raw-in-props-conf....

This can be done by using the SOURCE_KEY option in the transforms.conf. So,
in props.conf

[mysourcetype]
REPORT-file_name = file_name

Then in transforms.conf:

[file_name]
SOURCE_KEY = File_Name
REGEX = (?<Proto>[^_]+)_(?<Device_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+)

Also, if you want to do this using only props.conf, read this excerpt from the props.conf spec:

EXTRACT-<class> = [<regex>|<regex> in <src_field>]
* Used to create extracted fields (search-time field extractions) that do
  not reference transforms.conf stanzas.
* Performs a regex-based field extraction from the value of the source
  field.
* <class> is a unique literal string that identifies the namespace of the
  field you're extracting.
  NOTE: <class> values do not have to follow field name syntax
  restrictions. You can use characters other than a-z, A-Z, and 0-9, and
  spaces are allowed. <class> values are not subject to key cleaning.
* The <regex> is required to have named capturing groups. When the <regex>
  matches, the named capturing groups and their values are added to the
  event.
* dotall (?s) and multi-line (?m) modifiers are added in front of the regex.
  So internally, the regex becomes (?ms)<regex>.
* Use '<regex> in <src_field>' to match the regex against the values of a
  specific field.  Otherwise it just matches against _raw (all raw event
  data).
* NOTE: <src_field> has the following restrictions:
  * It can only contain alphanumeric characters and underscore
    (a-z, A-Z, 0-9, and _).
  * It must already exist as a field that has either been extracted at
    index time or has been derived from an EXTRACT-<class> configuration
    whose <class> ASCII value is *higher* than the configuration in which
    you are attempting to extract the field. For example, if you
    have an EXTRACT-ZZZ configuration that extracts <src_field>, then
    you can only use 'in <src_field>' in an EXTRACT configuration with
    a <class> of 'aaa' or lower, as 'aaa' is lower in ASCII value
    than 'ZZZ'.
  * It cannot be a field that has been derived from a transform field
    extraction (REPORT-<class>), an automatic key-value field extraction
    (in which you configure the KV_MODE setting to be something other
    than 'none'), a field alias, a calculated field, or a lookup,
    as these operations occur after inline field extractions (EXTRACT-
    <class>) in the search-time operations sequence.

cb046891
New Member

Sorry, I had to put this on hold for a few days. I was unable to get the transforms.conf method to work at first, so I took a step back and revisited the props.conf route. I suspected that running the EXTRACT against the File_Name field produced by my EVAL (the one that trims the file path off my source) was the culprit, and I think I'm correct. I wrote the following regex into props.conf to extract values from the source field instead.

[mysourcetype]
category = Custom
description = Saved_Copies
pulldown_type = 1
EXTRACT-file = (?<File_Path>[^_]+)_(?<Source_IP>[^_]+)_(?<Seq_ID>\d+\-\d+)-\d+-(?<Message_Type>\w+\.\w+) in source

where an example source value is:

C:\Program Files (x86)\Folder1\Folder2\SavedCopies\TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml

It seems to be working but I need to develop the regex further as I'm losing the Protocol at the moment.
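
The protocol is probably lost because the first group, [^_]+, runs from the start of source up to the first underscore, so File_Path ends up holding the whole directory path plus "TCP". One possible way around it, sketched here as a run-anywhere search, is to drop the path group and let a letters-only group pick up the protocol. That assumes the protocol is purely alphabetic and that the IP is a dotted quad, neither of which is confirmed beyond the one example above.

```spl
| makeresults
| eval source="C:\\Program Files (x86)\\Folder1\\Folder2\\SavedCopies\\TCP_10.101.100.111_1478-1573570987-8723-DeviceToNCE.xml"
| rex field=source "(?<Proto>[A-Za-z]+)_(?<Source_IP>\d{1,3}(?:\.\d{1,3}){3})_(?<Seq_ID>\d+-\d+)-\d+-(?<Message_Type>\w+\.\w+)$"
| table Proto Source_IP Seq_ID Message_Type
```

Because the pattern is not anchored at the start, the directory names are skipped: none of them is followed by an underscore and an IP address. If this gives the expected fields, the same regex could go after EXTRACT-file = in props.conf with " in source" appended, following the spec excerpt quoted earlier in the thread.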
