Splunk Search

inputlookup with fuzzy matching

warren
Explorer

Hello,

I'm building a query which matches entries in an inputlookup table against a set of log data. The original working query (thanks to @ITWhisperer ) is:

dataFeedTypeId=AS
[ | inputlookup approvedsenders | fields Value
| rename Value as sender]
| stats count as cnt_sender by sender
| append
[inputlookup approvedsenders | fields Value | rename Value as sender]
| fillnull cnt_sender
| stats sum(cnt_sender) as count BY sender

This is correctly providing a list of all of the emails address entries in the lookup file with the number of times they occur in the email address field (sender) of the dataset.

However, the "Value" field from the lookup file may contain an email address, an IP address, or a domain name:

"Value,description,spoof,spam,heuristic,newsletter
training.cloud@bob,Email Tests,Y,Y,Y,Y
mk.mimecast.com,mimecast emails,Y,Y,Y,Y
blah@yahoo.com,more belgrat,Y,Y,Y,Y
bbc.co.uk,BBC sends me lots of useful information,N,Y,Y,Y
81.96.24.195,test IP,Y,Y,Y,Y
yahoo.com, Yahoo domain,Y,Y,Y,Y"

What I need to do next is to widen the search, at the moment it is doing an exact match on the sender field and I need the entries in the lookup file to be used against a number of fields in the log data.

Examples: This query seems to provide the correct results but it does not show the null values as above(n.b. the return 1000 isn't optimal code, I'd prefer it to simply return all of the results):

dataFeedTypeId=AS
[| inputlookup approvedsenders
| fields Value
| return 1000 $Value]
| stats count as cnt_sender by sender

This query also provides the correct results but again does not show the null values (there is no domain field in the log data, I am using an app to extract it from the sender field)

dataFeedTypeId=AS
| rex field=sender "\@(?<domain_detected>.*)"
| eval list="mozilla"
| `ut_parse_extended(domain_detected, list)`
| lookup approvedsenders Value AS domain_detected OUTPUTNEW description Value
| lookup approvedsenders Value AS sender OUTPUTNEW description Value
| lookup approvedsenders Value AS senderIp OUTPUTNEW description Value
| search description="*"
| table subject sender domain_detected senderIP description Value
| lookup approvedsenders description OUTPUTNEW Value as Matched
| stats count by Matched

 

My apologies for the long question, thank you for taking the time to read this far.

Labels (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

You need to clarify the question.

Examples: This query seems to provide the correct results but it does not show the null values as above(n.b. the return 1000 isn't optimal code, I'd prefer it to simply return all of the results):


What do you mean by "null values" here?  Do you mean the 0s that fillnull command supplied?  You can apply the same trick as before, and it will have the same effect. (BTW, do not do fillnull outside of append subsearch.  That adds to compute.)

dataFeedTypeId=AS
[| inputlookup approvedsenders
| fields Value
| return 9999 $Value]
| stats count as cnt_sender by sender
| append
  [inputlookup approvedsenders
  | fields Value
  | rename Value AS sender]
| fillnull cnt_sender
| stats sum(cnt_sender) as cnt_sender by sender

(I changed 1000 to 9999 just as a suggestion simply give it a large enough number.)  Is this what you are looking for?

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Network to App: Observability Unlocked [May & June Series]

In today’s digital landscape, your environment is no longer confined to the data center. It spans complex ...

SPL2 Deep Dives, AppDynamics Integrations, SAML Made Simple and Much More on Splunk ...

Splunk Lantern is Splunk’s customer success center that provides practical guidance from Splunk experts on key ...

[Puzzles] Solve, Learn, Repeat: Matching cron expressions

This puzzle (first published here) is based on matching timestamps to cron expressions.All the timestamps ...