Splunk Search

What is the most efficient way to do partial matches on a field?

frbuser
Path Finder

I want to check if a field contains a specific value and the field is multivalue.

What is the most efficient way to check this? I understand that using wildcards is only efficient when matching at the end of a string. So a match like field=*somevalue* is very inefficient. Would using the regex command be more efficient or would mvfind be better?

Example
As an example, let's say I am searching proxy logs which contain a host field and a category field. The host field will only contain one value, whereas category is a multivalue field.

If I want to match all records which contain a specific category "email" but don't contain the string ".domain.com" in the host field, where the beginning of the domain is variable, what is the most efficient way?
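
For concreteness, the wildcard version of this scenario could be written roughly as below. This is only a sketch: the index name `proxy` is an assumption for illustration, while the `category` and `host` field names come from the description above.

```spl
index=proxy category="*email*" NOT host="*.domain.com"
```

Because category is multivalue, the `category="*email*"` comparison is satisfied when any one of its values matches.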

0 Karma

jabezds
Path Finder

Hi frbuser, uhaq,

Were you able to find an optimized approach? I'm also looking for a way to optimize this rather than searching with *.

E.g. category=Blogs,Software/Technology,Malware,Block List,Computer Security

I want to match events only where "malware" appears as a substring.

I'm using this SPL: index=abc category="*malware*" | stats count by category

Can I make it more optimized?

Thanks in advance.

Jbz

0 Karma

DavidHourani
Super Champion

Hi @frbuser,

Generally speaking, if what you're looking for is a part of a field that could be anywhere (beginning, middle, end), then what you suggested is the only solution.

More specifically, in some cases you can optimize by replacing the * at the beginning or end with a regex, but only if you already know what the possible values of that part are.

Finally, regarding mvfind: it applies to multivalue fields and behaves the same way a regex match would.
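
To illustrate the comparison, here is a minimal sketch of the mvfind form next to the regex form. The index name `proxy` and the `email` pattern are assumptions borrowed from the question's example. `mvfind` returns the index of the first value that matches the regular expression, or null when nothing matches, so it is wrapped in `isnotnull` to act as a filter.

```spl
index=proxy
| where isnotnull(mvfind(category, "(?i)email"))
```

```spl
index=proxy
| regex category="(?i)email"
```

Both forms are applied to events after the base search has returned them, so neither one narrows what is read from the index in the first place.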

So I would say that to optimize a partial match, you need to know the possible values of the unknown part. If it's all unknown, then that's a * for you 😄

Cheers,
David

gcusello
SplunkTrust

Hi frbuser,
As you said, the regex command is the most efficient option you can use, but it depends on the string you are searching for.
You only need to write a correct regex. If you share some examples and the results you want, I can help you more.

bye.
Giuseppe

0 Karma

frbuser
Path Finder

@gcusello My question is more of a general one. The scenario is any time you want to match a value that is a substring of a field. So the value you are matching may appear anywhere in the field. It could be at the beginning, middle, or end, or it may be the entire field itself.

Also, the field may be a multivalue field, and the value you are trying to match may be a substring of any of the values.

0 Karma

uhaq
Explorer

If the question is about how to make the search efficient, follow best practices on adding filters as early as possible (e.g. define your index, sourcetype, source, etc.). Using the fields command and then stats to reduce the number of unnecessary fields also helps.

If you know that the value of your FQDN host will be hostname.domain.com you can try incorporating NOT TERM(hostname.domain.com) in your search as well. This does not always increase your search efficiency and really depends on the scenario.
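
As a rough sketch of that suggestion, the exclusion could look like the search below. The index name `proxy` is an assumption, and `hostname.domain.com` is the placeholder FQDN from above. `TERM()` looks for the whole string as a single indexed term, so this only helps when the full host name is known and appears in the raw event bounded by major breakers such as spaces.

```spl
index=proxy category="*email*" NOT TERM(hostname.domain.com)
```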

0 Karma

frbuser
Path Finder

@uhaq The question is: what is the most efficient way to do a partial match on a field? E.g., is field=*somevalue* more efficient than regex field=somevalue?
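
Written out as full searches, the two candidates would look roughly like this. The index name `abc` is reused from the example earlier in this thread, and `field` and `somevalue` are placeholders.

```spl
index=abc field="*somevalue*"
```

```spl
index=abc
| regex field="somevalue"
```

One way to compare them on a given dataset is to run both over the same time range and look at the run times reported in the Job Inspector.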

0 Karma