Splunk Search

Algorithm to search multiple related events based on set of words

VatsalJagani
SplunkTrust

Hello Splunkers,

I’m looking for the best algorithm to search for events that meet the criteria below.

I have a lookup with a single multi-valued field and about 10,000 rows, for example:
“vatsal, jagani”
“10.0.0.1, 10.0.0.2”

I want to search index=abc over the last 2 hours (about 50 events) and check whether at least two events (possibly more) contain words from the same set.

For example:
event-1 - “hello, I’m Vatsal.”
event-2 - “hello, I’m jagani too.”

Here, two events contain words from the same lookup row.

Another example:

event-3 - “hi, vatsal”
event-4 - “hello, vatsal”

This also counts as a match.

I want to run this as an alert every hour.

Solution-1 - I could use the map command as below, but I don't think that's very efficient.

| inputlookup words_lookup.py
| eval or_field = <convert words to an OR list like "vatsal" OR "jagani">
| map maxsearches=1000000 search="search index=abc $or_field$"

Solution-2 - I could write a Python script, but I'm not sure which algorithm to use.

I'm looking for a more efficient query or Python algorithm to do this.


PickleRick
SplunkTrust

That's a tough problem. It's not a Splunk-tough problem but a generally tough problem.

In order to find the matches, you need to do the comparisons, and that's the biggest problem here. Since you don't have a fixed field to look up but want to use the lookup as a list of patterns to match against the whole raw event (at least that's how I interpret your requirement), you have to do m*n "searches" against your data, where m is the number of events and n is the number of distinct values in your lookup.
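As a point of reference, the m*n comparison described above could be sketched in Python roughly like this. It is only an illustration under assumptions: the function name `naive_matching_sets`, the sample data, and the case-insensitive substring test are all hypothetical, and the events and lookup rows are assumed to be already loaded into plain lists.

```python
def naive_matching_sets(events, word_sets):
    """Return {set_index: [event indices]} for sets hit by at least two events.

    Brute force: every lookup row is tested against every raw event,
    so the work grows with m (events) times n (lookup terms).
    """
    hits = {}
    for set_idx, words in enumerate(word_sets):           # n side: lookup rows
        lowered = [w.strip().lower() for w in words]
        matched = [
            ev_idx
            for ev_idx, raw in enumerate(events)           # m side: events
            if any(w in raw.lower() for w in lowered)      # substring test on raw text
        ]
        if len(matched) >= 2:                              # "at least two events"
            hits[set_idx] = matched
    return hits


# Hypothetical sample data modelled on the examples in the question.
sample_events = ["hello, I'm Vatsal.", "hello, I'm jagani too.", "nothing here"]
sample_sets = [["vatsal", "jagani"], ["10.0.0.1", "10.0.0.2"]]
print(naive_matching_sets(sample_events, sample_sets))
```

With about 50 events and 10,000 lookup rows this is on the order of a million substring checks per run, which may well be acceptable for an hourly alert, but it scales poorly as either side grows.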

If you know you can split the events into separate words, that might make it a bit easier: instead of matching the raw event against every term from the lookup, you can look up each word from the event (which could be marginally faster, since you're more likely to match something before reaching the end of the lookup).
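One way to picture the word-splitting idea is an inverted index: map each lookup term to the rows (sets) it belongs to, then tokenize each event and look its words up in that map. The sketch below is a hedged illustration, not a tested Splunk custom command. It assumes every lookup term is a single token, and the tokenizer regex, function names, and sample data are hypothetical.

```python
import re
from collections import defaultdict

# Assumed tokenizer: runs of word characters and dots, so IP addresses stay whole.
TOKEN_RE = re.compile(r"[\w.]+")


def build_word_index(word_sets):
    """Map each lowercased lookup term to the indices of the sets containing it."""
    index = defaultdict(set)
    for set_idx, words in enumerate(word_sets):
        for word in words:
            index[word.strip().lower()].add(set_idx)
    return index


def sets_with_two_events(events, word_sets):
    """Return {set_index: sorted event indices} for sets hit by at least two events."""
    index = build_word_index(word_sets)
    events_per_set = defaultdict(set)
    for ev_idx, raw in enumerate(events):
        for token in TOKEN_RE.findall(raw.lower()):
            token = token.strip(".")                  # drop sentence-ending dots
            for set_idx in index.get(token, ()):      # one dict lookup per token
                events_per_set[set_idx].add(ev_idx)
    return {s: sorted(e) for s, e in events_per_set.items() if len(e) >= 2}


# Hypothetical sample data modelled on the examples in the question.
sample_events = [
    "hello, I'm Vatsal.",
    "hello, I'm jagani too.",
    "hi, vatsal",
    "src=10.0.0.1 ok",
]
sample_sets = [["vatsal", "jagani"], ["10.0.0.1", "10.0.0.2"]]
print(sets_with_two_events(sample_events, sample_sets))
```

Building the index costs one pass over the lookup, and each event then costs roughly one dictionary lookup per word instead of one comparison per lookup term. The trade-off is that multi-word or partial-string terms would not match under this assumption.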

There are several possible approaches here, but I'm not sure which one would be fastest given the size of your data. The more events you have to match, the more tempting it is to build something that matches cleverly over a sorted list of the terms from the lookup.

(To make things a bit more complicated, remember that each "comparison" is not an atomic operation either: its cost depends on the length of the strings and the match ratio.)


VatsalJagani
SplunkTrust

You are right, @PickleRick.

I'm guessing I'm left with the 2nd option: building a Python script inside a custom command. I need to spend some time on an algorithm that performs best for this. I'll experiment.
