Solved: How to ignore minor breakers when searching for a ...

dtaylor

I've been trying to find out whether it's possible to search for a term while ignoring any minor breakers that may or may not be in it. For example, I'm trying to search for the MAC address 12:EA:5F:72:11:AB, but I'd also like to find all instances of 12-EA-5F-72-11-AB, 12EA.5F72.11AB, or even just 12EA5F7211AB without having to specify each of these variations explicitly. I thought I could do it using TERM(), but so far I haven't had any luck, and after reading the docs, I can see I may have misunderstood that command. Is there any way to do this simply?

ITWhisperer

You don't need to list all the variations; just allow an optional non-hex character between each pair of hex digits:

| rex "(?<mac>([0-9A-F]{2}[^0-9A-F]?){5}[0-9A-F]{2})"
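
As a quick sanity check, the extraction can be tried against synthetic events before running it on real data. This is only a sketch: the raw and mac_norm field names are made up for illustration, and the comparison step is an assumption about how the extracted value would be used. Note that the pattern as posted matches uppercase hex digits only.

```spl
| makeresults
| eval raw=split("12:EA:5F:72:11:AB,12-EA-5F-72-11-AB,12EA.5F72.11AB,12EA5F7211AB", ",")
| mvexpand raw
| rex field=raw "(?<mac>([0-9A-F]{2}[^0-9A-F]?){5}[0-9A-F]{2})"
| eval mac_norm=replace(mac, "[^0-9A-F]", "")
| where mac_norm="12EA5F7211AB"
| table raw mac mac_norm
```

The split and mvexpand steps produce one row per variation, rex extracts mac from each, and replace strips whatever separator was captured so that a single comparison covers all four formats.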

PickleRick

Be aware, though, that this isn't _searching_ for a particular MAC address; it's extraction. To find a specific MAC, you first have to extract it with rex _from every event_ and then compare the extracted value with the one you're looking for. That's not very efficient performance-wise.

dtaylor

Thank you! While not the solution I was hoping for, this'll get the job done easily enough. I'd actually already considered using the rex command, but wasn't able to get my regex to look neat enough for me to be happy with it.

tscroggins

You can use the regex command to filter by a regular expression, but it's slower and more cumbersome than just combining TERM() functions in a search predicate.
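
For comparison, combining TERM() directives for the four formats from the original question might look like the sketch below. The index and source type are placeholders borrowed from the example further down, and the search assumes each address is bounded by major breakers (spaces, for instance) in the raw event, since TERM() only matches a whole indexed token.

```spl
index=main sourcetype=mac_addr
    (TERM(12:EA:5F:72:11:AB) OR TERM(12-EA-5F-72-11-AB) OR TERM(12EA.5F72.11AB) OR TERM(12EA5F7211AB))
```

This still lists every variation by hand, which is what the question hoped to avoid, but it filters against the index rather than running a regular expression over every event.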

As alternatives, you can extract and normalize a mac field at index time with a combination of transforms or you can create a single-field data model that acts as a secondary time series index.

For the latter, create a search-time field extraction using a transform with MV_ADD = true to capture strings that look like MAC addresses matching your 48-bit patterns (xx-xx-xx-xx-xx-xx, xx:xx:xx:xx:xx:xx, and xxxx.xxxx.xxxx). For example, using source type mac_addr:

# props.conf

[mac_addr]
REPORT-raw_mac = raw_mac

# transforms.conf

[raw_mac]
CLEAN_KEYS = 0
MV_ADD = 1
REGEX = (?<raw_mac>(?<![-.:])\b(?:[0-9A-Fa-f]{2}(?:(?(2)(?:\2)|([-:]?))[0-9A-Fa-f]{2}){5}|[0-9A-Fa-f]{4}(?:(\.)[0-9A-Fa-f]{4}){2})\b(?!\2|\3))

Create a subsequent calculated (eval) field that removes separators:

# props.conf

[mac_addr]
REPORT-raw_mac = raw_mac
EVAL-mac = mvdedup(mvmap(raw_mac, replace(raw_mac, "[-.:]", "")))
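
The calculated field's expression can be checked in isolation with a synthetic multivalue field. This is a minimal sketch; the sample values are just the variations from the original question.

```spl
| makeresults
| eval raw_mac=split("12:EA:5F:72:11:AB,12-EA-5F-72-11-AB,12EA.5F72.11AB", ",")
| eval mac=mvdedup(mvmap(raw_mac, replace(raw_mac, "[-.:]", "")))
| table raw_mac mac
```

All three inputs collapse to the single value 12EA5F7211AB. The expression only removes separators and does not change case, so 12ea5f7211ab and 12EA5F7211AB would remain distinct values.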

Then, define and accelerate a data model with a single dataset and field:

# datamodels.conf

[my_mac_datamodel]
acceleration = true
# 1 month, for example
acceleration.earliest_time = -1mon
acceleration.hunk.dfs_block_size = 0

# data/models/my_mac_datamodel.json

{
    "modelName": "my_mac_datamodel",
    "displayName": "my_mac_datamodel",
    "description": "",
    "objectSummary": {
        "Event-Based": 0,
        "Transaction-Based": 0,
        "Search-Based": 1
    },
    "objects": [
        {
            "objectName": "my_mac_dataset",
            "displayName": "my_mac_dataset",
            "parentName": "BaseSearch",
            "comment": "",
            "fields": [
                {
                    "fieldName": "mac",
                    "owner": "my_mac_dataset",
                    "type": "string",
                    "fieldSearch": "mac=*",
                    "required": true,
                    "multivalue": false,
                    "hidden": false,
                    "editable": true,
                    "displayName": "mac",
                    "comment": ""
                }
            ],
            "calculations": [],
            "constraints": [],
            "lineage": "my_mac_dataset",
            "baseSearch": "index=main sourcetype=mac_addr"
        }
    ],
    "objectNameList": [
        "my_mac_dataset"
    ]
}

All of the above can be added to a search head using SplunkWeb settings in the following order:

1. Define shared field transformation.
2. Define shared field extraction.
3. Define shared calculated field.
4. Define shared data model.

Finally, use the datamodel command to optimize the search:

| datamodel summariesonly=t my_mac_datamodel my_mac_dataset flat
| search mac=12EA5F7211AB

Note that some undocumented conditions (source type renaming?) may force Splunk to disable the optimizations used by the datamodel command when distributing the search, in which case it will be no faster than a regular search of the extracted mac field.

If it's working correctly, the search log should include an optimized search with a READ_SUMMARY directive as well as various ReadSummaryDirective log entries. The datamodel command with the flat argument will return the raw events and the undecorated mac field values, but no other extractions will be performed.
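
If only counts or statistics are needed rather than raw events, the same accelerated summaries can usually be queried with tstats. This is a sketch that has not been tested against this data model; it assumes the field is exposed under the dataset prefix as my_mac_dataset.mac, which is the usual naming for data model fields, and the one-hour span is arbitrary.

```spl
| tstats summariesonly=t count
    from datamodel=my_mac_datamodel.my_mac_dataset
    where my_mac_dataset.mac="12EA5F7211AB"
    by _time span=1h
```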

PickleRick

Since the addresses are indexed as terms split by major and minor breakers, the best you can do is search for all of the "minor terms" and then use regex to match the particular sequence. Unfortunately, that won't work if the original sequence was not split at all or was split into larger chunks.
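
A sketch of that approach for the address in the question: the bare tokens narrow the search using the index, and regex then checks that they occur in the right order. The index and source type are placeholders taken from the earlier example.

```spl
index=main sourcetype=mac_addr 12 EA 5F 72 11 AB
| regex _raw="(?i)12[-:]EA[-:]5F[-:]72[-:]11[-:]AB"
```

This finds only the colon- and hyphen-separated forms, because 12EA.5F72.11AB and 12EA5F7211AB are not indexed as two-character terms.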
