How to ignore minor breakers when searching for a term?

dtaylor
Explorer

I've been trying to find out whether it's possible to search for a term while ignoring any minor breakers it may contain. In my case, I'm searching for the MAC address 12:EA:5F:72:11:AB, but I'd also like to find all instances of 12-EA-5F-72-11-AB, 12EA.5F72.11AB, or even just 12EA5F7211AB without having to specify each of these variations explicitly. I thought I could do it using TERM(), but so far I haven't had any luck, and after reading the docs, I can see I may have misunderstood how it works. Is there any simple way to do this?

ITWhisperer
SplunkTrust

You don't need to list all the variations; just allow an optional non-hex character (the separator) after each pair of hex digits:

| rex "(?<mac>([0-9A-F]{2}[^0-9A-F]?){5}[0-9A-F]{2})"

PickleRick
SplunkTrust

Be aware, though, that this is not _searching_ for a particular MAC address; it's extraction. To find a specific MAC, you first have to extract it with rex _from every event_ and then compare the extracted value with the one you're looking for. That's not very efficient performance-wise.
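
A minimal sketch of that extract-then-compare approach, reusing the rex from the accepted answer. The index=main and sourcetype=mac_addr base search is a placeholder borrowed from later in this thread, and mac_norm is just an illustrative field name. Note that the regex as written only matches uppercase hex digits.

```spl
index=main sourcetype=mac_addr
| rex "(?<mac>([0-9A-F]{2}[^0-9A-F]?){5}[0-9A-F]{2})"
| eval mac_norm=replace(mac, "[^0-9A-F]", "")
| where mac_norm="12EA5F7211AB"
```

Because the base search has no MAC-related term in it, every event in the time range is retrieved and run through rex before the where clause discards the non-matches, which is where the cost comes from.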

dtaylor
Explorer

Thank you! While not the solution I was hoping for, this'll get the job done easily enough. I'd actually already considered using the rex command, but wasn't able to get my regex to look neat enough for me to be happy with it.

tscroggins
Influencer

You can use the regex command to filter by a regular expression, but it's slower and more cumbersome than just combining TERM() functions in a search predicate.
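
For comparison, a sketch of the combined TERM() predicate for the four variations from the question (the index and source type are placeholders). Keep in mind that TERM() only matches when the whole value is bounded by major breakers, such as spaces, in the raw event:

```spl
index=main sourcetype=mac_addr
    (TERM(12:EA:5F:72:11:AB) OR TERM(12-EA-5F-72-11-AB) OR TERM(12EA.5F72.11AB) OR TERM(12EA5F7211AB))
```

This still lists every variation, but each one is resolved against the index instead of being matched against raw event text.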

As alternatives, you can extract and normalize a mac field at index time with a combination of transforms, or you can create a single-field data model that acts as a secondary time series index.

For the latter, create a search-time field extraction using a transform with MV_ADD = true to capture strings that look like MAC addresses matching your 48-bit patterns (xx-xx-xx-xx-xx-xx, xx:xx:xx:xx:xx:xx, and xxxx.xxxx.xxxx). For example, using source type mac_addr:

# props.conf

[mac_addr]
REPORT-raw_mac = raw_mac

# transforms.conf

[raw_mac]
CLEAN_KEYS = 0
MV_ADD = 1
REGEX = (?<raw_mac>(?<![-.:])\b(?:[0-9A-Fa-f]{2}(?:(?(2)(?:\2)|([-:]?))[0-9A-Fa-f]{2}){5}|[0-9A-Fa-f]{4}(?:(\.)[0-9A-Fa-f]{4}){2})\b(?!\2|\3))

Create a subsequent calculated (eval) field that removes separators:

# props.conf

[mac_addr]
REPORT-raw_mac = raw_mac
EVAL-mac = mvdedup(mvmap(raw_mac, replace(raw_mac, "[-.:]", "")))
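
To sanity-check the eval expression before saving it as a calculated field, something like the following can be run ad hoc. This is only a sketch: the three sample values are taken from the question, and mvmap requires a Splunk version that provides it.

```spl
| makeresults
| eval raw_mac=split("12:EA:5F:72:11:AB,12-EA-5F-72-11-AB,12EA.5F72.11AB", ",")
| eval mac=mvdedup(mvmap(raw_mac, replace(raw_mac, "[-.:]", "")))
| table raw_mac mac
```

All three inputs should normalize to the same string, so mvdedup should leave a single mac value.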

Then, define and accelerate a data model with a single dataset and field:

# datamodels.conf

[my_mac_datamodel]
acceleration = true
# 1 month, for example
acceleration.earliest_time = -1mon
acceleration.hunk.dfs_block_size = 0

# data/models/my_mac_datamodel.json

{
    "modelName": "my_mac_datamodel",
    "displayName": "my_mac_datamodel",
    "description": "",
    "objectSummary": {
        "Event-Based": 0,
        "Transaction-Based": 0,
        "Search-Based": 1
    },
    "objects": [
        {
            "objectName": "my_mac_dataset",
            "displayName": "my_mac_dataset",
            "parentName": "BaseSearch",
            "comment": "",
            "fields": [
                {
                    "fieldName": "mac",
                    "owner": "my_mac_dataset",
                    "type": "string",
                    "fieldSearch": "mac=*",
                    "required": true,
                    "multivalue": false,
                    "hidden": false,
                    "editable": true,
                    "displayName": "mac",
                    "comment": ""
                }
            ],
            "calculations": [],
            "constraints": [],
            "lineage": "my_mac_dataset",
            "baseSearch": "index=main sourcetype=mac_addr"
        }
    ],
    "objectNameList": [
        "my_mac_dataset"
    ]
}

All of the above can be added to a search head using Splunk Web settings in the following order:

  1. Define shared field transformation.
  2. Define shared field extraction.
  3. Define shared calculated field.
  4. Define shared data model.

Finally, use the datamodel command to optimize the search:

| datamodel summariesonly=t my_mac_datamodel my_mac_dataset flat
| search mac=12EA5F7211AB

Note that some undocumented conditions (source type renaming?) may force Splunk to disable the optimizations used by the datamodel command when distributing the search, in which case it will be no faster than a regular search of the extracted mac field.

If it's working correctly, the search log should include an optimized search with a READ_SUMMARY directive as well as various ReadSummaryDirective log entries. The datamodel command with the flat argument will return the raw events and the undecorated mac field values, but no other extractions will be performed.

PickleRick
SplunkTrust

Since the addresses are indexed as terms split at major and minor breakers, the best you can do is search for all the "minor terms" and then use regex to match the particular sequence. Unfortunately, that won't work if the original value was not split at all (12EA5F7211AB) or was split into larger chunks (12EA.5F72.11AB).
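
Roughly, that could look like the sketch below. The index and source type are placeholders, and the [-:.]? separator class is an assumption about which separators appear in the data:

```spl
index=main sourcetype=mac_addr 12 EA 5F 72 11 AB
| regex _raw="(?i)12[-:.]?EA[-:.]?5F[-:.]?72[-:.]?11[-:.]?AB"
```

The six bare terms narrow the candidate events using the index, and regex then confirms that the pairs actually appear in sequence. The regex itself would accept the dotted and unseparated forms too, but events containing only those forms never reach it, because they are not indexed under the two-character terms.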
