I want to check whether a multivalue field contains a specific value.
What is the most efficient way to check this? I understand that wildcards are only efficient when matching at the end of a string, so a match like field=*somevalue* is very inefficient. Would the regex command be more efficient, or would mvfind be better?
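For reference, here are the three approaches from the question written out as separate searches, to be run one at a time; `index=main`, `field` and `somevalue` are placeholders. In the first form the condition is part of the base search, so Splunk can try to apply it while retrieving events, although the leading wildcard makes that lookup expensive. The regex command and the mvfind eval function only run after the base search has already returned events, so on their own they do not reduce how much data is read from the index.

```spl
index=main field="*somevalue*"

index=main | regex field="somevalue"

index=main | where isnotnull(mvfind(field, "somevalue"))
```

Both regex patterns are unanchored, so they match somevalue anywhere in the value; mvfind checks each value of a multivalue field separately.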
Example
As an example, let's say I am searching proxy logs that contain a host field and a category field. The host field contains only one value, whereas category is a multivalue field.
If I want to match all records whose category contains "email" but whose host field does not contain ".domain.com" (the beginning of the domain is variable), what is the most efficient way?
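A minimal sketch of this scenario, assuming the data lives in a hypothetical index called `proxy`. Putting `category=email` in the base search matches events where any value of the multivalue category field equals email, and typically lets Splunk use the indexed term to narrow the events it reads. The host exclusion is then applied as an unanchored regex after retrieval.

```spl
index=proxy category=email
| regex host!="\.domain\.com"
```

Adding `$` at the end of the pattern restricts the exclusion to hosts that end in .domain.com. The base-search alternative, `NOT host="*.domain.com"`, is the leading-wildcard form the question is trying to avoid.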
Hi frbuser, uhaq, gcusello, DavidHourani,
Were you able to find an optimized approach? I'm also looking for a way to optimize this rather than searching with *.
e.g. category=Blogs,Software/Technology,Malware,Block List,Computer Security
I only want to check whether malware is present as a substring.
I'm using this SPL: index=abc category="*malware*" | stats count by category
Can I optimize it further?
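One possible optimization, sketched under an assumption: if "malware" appears in the raw events as a separate token (delimited by commas, spaces, slashes and similar breakers, as in the example value above), adding it as a bare term to the base search lets Splunk discard events that do not contain the word before any field is extracted. The exact per-value check is then done with mvfind; `(?i)` makes the regex case-insensitive, since mvfind regexes are case-sensitive by default while bare search terms are not.

```spl
index=abc malware
| where isnotnull(mvfind(category, "(?i)malware"))
| stats count by category
```

If the word can also be embedded inside a longer token (for example a hypothetical value such as antimalware), the bare term will not match those events, so this only helps when the token assumption holds.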
Thanks in advance.
Jbz
Hi @frbuser,
Generally speaking, if what you're looking for is part of a field value that could appear anywhere (beginning, middle, end), then what you suggested is the only solution.
More specifically, in some cases you can optimize by replacing the * at the beginning or end with a regex if you already have an idea of what that part is, but that assumes you already know what the possible values would be.
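For example, assuming the only possible host prefixes are www, mail and vpn (hypothetical values, as is the index name), the leading wildcard in host="*.domain.com" can be replaced by an explicit pattern. The two lines below are alternative searches: the first uses the regex command after retrieval, while the second lists the full values in the base search with the IN operator, so the filter is applied while events are being retrieved.

```spl
index=proxy | regex host="^(www|mail|vpn)\.domain\.com$"

index=proxy host IN ("www.domain.com", "mail.domain.com", "vpn.domain.com")
```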
Finally, regarding mvfind: it applies to multivalue fields and behaves the same way as writing a regex would.
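To see how mvfind treats a multivalue field, the self-contained search below builds one from the category values quoted elsewhere in this thread, using makeresults and split. mvfind applies the regex to each value and returns the position of the first matching value, or null if none match; the field names malware_pos and has_malware are illustrative.

```spl
| makeresults
| eval category=split("Blogs,Software/Technology,Malware,Block List,Computer Security", ",")
| eval malware_pos=mvfind(category, "(?i)malware")
| eval has_malware=if(isnotnull(malware_pos), "yes", "no")
```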
So I would say that to optimize a partial match, you need to know the possible values of the unknown part. If it's all unknown, then that's a * for you 😄
Cheers,
David
Hi frbuser,
As you said, the regex command is the most efficient option you can use, but it depends on the string you are searching for.
You only have to write a correct regex; if you share some examples and the results you want, I can help you more.
bye.
Giuseppe
@gcusello My question is more of a general one. The scenario is any time you want to match a value that is a substring of a field. The value you are matching may appear anywhere in the field: at the beginning, in the middle, at the end, or it may be the entire field itself.
Also, the field may be a multivalue field, and the value you are trying to match may be a substring of any of the values.
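Because mvfind tests each value of a multivalue field on its own, regex anchors also apply per value rather than to the field as a whole. A sketch using a few of the category values from this thread, covering the four positions described above (anywhere, beginning, end, entire value); the output field names are illustrative.

```spl
| makeresults
| eval category=split("Blogs,Software/Technology,Malware", ",")
| eval anywhere=mvfind(category, "ware")
| eval at_start=mvfind(category, "^Soft")
| eval at_end=mvfind(category, "ware$")
| eval whole_value=mvfind(category, "^Malware$")
```

Here the unanchored pattern ware matches inside Software/Technology as well as Malware, whereas ware$ only matches values that end with it.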
If the question is about how to make the search efficient, follow the best practice of adding filters as early as possible (e.g. define your index, sourcetype, source, etc.). Using fields and then stats to drop unnecessary fields also helps.
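A sketch of that advice applied to the proxy example; the index and sourcetype names are hypothetical. The filters that Splunk can use while retrieving events go in the base search, and fields keeps only what stats needs.

```spl
index=proxy sourcetype=proxy_logs category=email
| fields host, category
| stats count by host, category
```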
If you know that your FQDN host value will be hostname.domain.com, you can also try adding NOT TERM(hostname.domain.com) to your search. This does not always improve search efficiency; it really depends on the scenario.
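The search below sketches this, using hostname.domain.com as a stand-in for the real FQDN and a hypothetical index name. TERM() tells Splunk to look up the whole string, periods included, as a single indexed term; this only works when the string is surrounded by major breakers such as spaces, commas or quotes in the raw event.

```spl
index=proxy category=email NOT TERM(hostname.domain.com)
```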
@uhaq The question is what the most efficient way is to do a partial match on a field, e.g. is field=*somevalue* more efficient than regex field=somevalue?