Splunk Search

How to look up the best-matching ingress URL for the url field in a log?

tgravvold
Engager

Dear Splunk community,

I'm new to Splunk, so excuse my incompetence...

What I'm trying to do is enrich my web access log with the app name and team name from a CSV lookup file.

The CSV file "ingress_map.csv" looks like this:

 

ingress,app,team
https://mycompany.com/abc,foo-bar,a-team
https://app.mycompany.com,good-app,b-team
https://app.mycompany.com/abc,better-app,c-team
https://app.mycompany.com/abc/xyz,best-app,d-team

 

 

The url field of my web access log will seldom match one of the ingresses exactly. Is it possible to have a lookup that finds the best-matching ingress and adds the app and team fields to the log line? Or is there a better way of solving this problem?

 

Regards

Terje Gravvold

0 Karma

tgravvold
Engager

Thanks @bowesmana ! I'll give your rex uri depth match a try. It will be a fairly simple solution if it fits the requirements. 

0 Karma

tgravvold
Engager

I found this post that points out ways to compare strings the way I want:

https://community.splunk.com/t5/Splunk-Search/Can-splunk-compare-two-strings-and-return-likeness-sim...

But I don't know if this kind of comparison is possible with lookup tables. Would it be better to feed the CSV data into an index?
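For illustration, the kind of likeness comparison discussed in that post can be sketched outside Splunk with Python's standard difflib module. This is only a rough sketch under assumptions: the rows are the sample from ingress_map.csv above, the function name most_similar and the test URL are made up, and a similarity score is not the same thing as a prefix match, so it can prefer an ingress that the url does not actually start with.

```python
from difflib import SequenceMatcher

# Rows copied from the ingress_map.csv sample in the question.
INGRESS_MAP = [
    ("https://mycompany.com/abc", "foo-bar", "a-team"),
    ("https://app.mycompany.com", "good-app", "b-team"),
    ("https://app.mycompany.com/abc", "better-app", "c-team"),
    ("https://app.mycompany.com/abc/xyz", "best-app", "d-team"),
]


def most_similar(url, rows=INGRESS_MAP):
    """Return the (ingress, app, team) row whose ingress is most similar to url."""
    # ratio() is 2*M/T, where M is the number of matched characters
    # and T is the combined length of the two strings.
    return max(rows, key=lambda row: SequenceMatcher(None, row[0], url).ratio())


if __name__ == "__main__":
    # Hypothetical URL, not taken from a real access log.
    print(most_similar("https://app.mycompany.com/abc/xyz/page"))
```

Scoring every event against every ingress like this costs compute per event, which is worth keeping in mind for a busy access log.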

0 Karma

yuanliu
SplunkTrust

I agree that a Python/external script is better suited to what you need. Ingesting the CSV into an index has lots of drawbacks, however.

One possibility, and maybe an easy solution, is to just read the CSV in the script. You can return all the needed values from the script directly, with no need to pull the CSV into the search at all. If you really need the content in SPL after running the script, there are several approaches. For example,

  • Make the script return the best-matched string from the CSV, then use this string to perform the lookup. This will return all the field values of the best match.
  • Make a "dummy" field in the CSV and use that dummy field to return every entry in the lookup file into every event. That technique is only useful in very limited use cases.
  • Append | inputlookup to the search, then use stats or some other technique to utilize it.
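As a rough sketch of the "read the CSV in the script" idea, the following Python reads the ingress,app,team rows and returns app and team for the longest ingress that the url starts with. The function names load_ingress_map and enrich are made up for illustration, the CSV content is the sample from the question held in a string (a real script would open ingress_map.csv), and how the script is wired into Splunk is deliberately left out.

```python
import csv
import io

# Sample content copied from the question; a real script would open ingress_map.csv.
SAMPLE_CSV = """ingress,app,team
https://mycompany.com/abc,foo-bar,a-team
https://app.mycompany.com,good-app,b-team
https://app.mycompany.com/abc,better-app,c-team
https://app.mycompany.com/abc/xyz,best-app,d-team
"""


def load_ingress_map(fileobj):
    """Read the ingress,app,team rows into a list of dicts."""
    return list(csv.DictReader(fileobj))


def enrich(url, rows):
    """Return app and team for the longest ingress that url starts with."""
    candidates = [row for row in rows if url.startswith(row["ingress"])]
    if not candidates:
        return {"app": None, "team": None}
    best = max(candidates, key=lambda row: len(row["ingress"]))
    return {"app": best["app"], "team": best["team"]}


if __name__ == "__main__":
    rows = load_ingress_map(io.StringIO(SAMPLE_CSV))
    # Hypothetical URL, not taken from a real access log.
    print(enrich("https://app.mycompany.com/abc/123", rows))
```

Note that a plain startswith test would also accept https://app.mycompany.com/abcdef for the /abc ingress; whether that is acceptable depends on how "best match" is defined.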
0 Karma

yuanliu
SplunkTrust

You need to first define what a "best match" is in terms of data.  Will a domain match suffice?  Domain and protocol?  Domain, protocol, plus a fixed number of paths?

Another question is how much flexibility is in the content of that lookup file.
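To make the candidate definitions of "best match" concrete, here is a small Python sketch that splits an ingress and a url into protocol, domain and path segments and reports how much they share. The helper names url_parts and match_level and the example URLs are assumptions for illustration only.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into (scheme, host, list of non-empty path segments)."""
    parts = urlsplit(url)
    segments = [seg for seg in parts.path.split("/") if seg]
    return parts.scheme, parts.netloc, segments


def match_level(ingress, url):
    """Report whether domain and protocol agree and how many leading path segments match."""
    i_scheme, i_host, i_path = url_parts(ingress)
    u_scheme, u_host, u_path = url_parts(url)
    shared = 0
    for a, b in zip(i_path, u_path):
        if a != b:
            break
        shared += 1
    return {
        "domain": i_host == u_host,
        "protocol": i_scheme == u_scheme,
        "shared_path_segments": shared,
    }


if __name__ == "__main__":
    print(match_level("https://app.mycompany.com/abc/xyz",
                      "https://app.mycompany.com/abc/123"))
```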

0 Karma

tgravvold
Engager

Thanks for the answer. Unfortunately I cannot assume matching on only the hostname or a fixed path depth. If it helps, I can manipulate the ingress in the CSV. The CSV is exported via script. For example, I can sort the CSV by index if it helps.

What I'm seeking is a function that matches each ingress from the CSV against the url in the log entry and picks the ingress that matches best (that is, matches the most characters from the start of the field).
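Read literally, that rule counts shared leading characters. A minimal Python sketch of it follows; the names shared_prefix_len and best_ingress, the require_full switch and the test URL are assumptions added for illustration. The switch exists because the literal rule can pick an ingress that the url only partly matches.

```python
import os.path

# Ingress values copied from the ingress_map.csv sample.
INGRESSES = [
    "https://mycompany.com/abc",
    "https://app.mycompany.com",
    "https://app.mycompany.com/abc",
    "https://app.mycompany.com/abc/xyz",
]


def shared_prefix_len(ingress, url):
    """Number of characters the two strings share from the start."""
    # os.path.commonprefix compares character by character, not by path segment.
    return len(os.path.commonprefix([ingress, url]))


def best_ingress(url, ingresses, require_full=True):
    """Pick the ingress sharing the most leading characters with url.

    With require_full=True, an ingress only counts if all of it matches.
    """
    scored = []
    for ingress in ingresses:
        n = shared_prefix_len(ingress, url)
        if require_full and n < len(ingress):
            continue
        scored.append((n, ingress))
    return max(scored)[1] if scored else None


if __name__ == "__main__":
    url = "https://app.mycompany.com/abc/123"  # hypothetical log value
    print(best_ingress(url, INGRESSES))
    print(best_ingress(url, INGRESSES, require_full=False))
```

For the hypothetical url above, the literal character count favours the /abc/xyz ingress because it shares the trailing slash after /abc, even though the url is not under /abc/xyz. Requiring the whole ingress to match selects /abc instead.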

0 Karma

bowesmana
SplunkTrust

What about using wildcard lookups, which will get you part of the way?

ingress,app,team,depth
https://mycompany.com/*,foo-bar,a-team,1
https://app.mycompany.com,good-app,b-team,0
https://app.mycompany.com/*,better-app,c-team,1
https://app.mycompany.com/*/*,best-app,d-team,2
https://app.mycompany.com/*/*/*,gold-star-app,z-team,3

 

Made with this search

 

| makeresults
| eval _raw="ingress,app,team
https://mycompany.com/*,foo-bar,a-team
https://app.mycompany.com,good-app,b-team
https://app.mycompany.com/*,better-app,c-team
https://app.mycompany.com/*/*,best-app,d-team
https://app.mycompany.com/*/*/*,gold-star-app,z-team"
| multikv forceheader=1
| table ingress,app,team
| rex field=ingress max_match=0 "(?<p>/\*)"
| eval depth=mvcount(p)
| fillnull depth
| fields - p
| outputlookup ingress.csv

 

 

where the last column is the path depth (based on the number of /* elements in the path). Then, having made a wildcard lookup definition with match type WILDCARD(ingress), this example will

 

| makeresults
| eval _raw="ingress,app,team
https://mycompany.com/abc,foo-bar,a-team
https://app.mycompany.com,good-app,b-team
https://app.mycompany.com/abc,better-app,c-team
https://app.mycompany.com/abc/xyz,best-app,d-team
https://app.mycompany.com/zyx/123/abc,gold-star-app,z-team"
| multikv forceheader=1
| table ingress,app,team
| lookup ingress ingress OUTPUT team as f_team depth as f_depth
| eval actual_team=mvindex(f_team, max(f_depth, 1) - 1)

 

then return the match where it is exact, or otherwise the match with the greatest path depth.
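The selection rule can be checked outside Splunk with a short Python sketch. It uses the standard fnmatch module as a stand-in for the WILDCARD match type, which is an assumption: the two may not behave identically in every case. The names depth and deepest_match are made up; the rows and test URLs are the ones from the two searches above.

```python
from fnmatch import fnmatchcase

# Wildcard rows copied from the lookup built above.
ROWS = [
    ("https://mycompany.com/*", "foo-bar", "a-team"),
    ("https://app.mycompany.com", "good-app", "b-team"),
    ("https://app.mycompany.com/*", "better-app", "c-team"),
    ("https://app.mycompany.com/*/*", "best-app", "d-team"),
    ("https://app.mycompany.com/*/*/*", "gold-star-app", "z-team"),
]


def depth(pattern):
    """Count the /* elements in a pattern, like the rex + mvcount step."""
    return pattern.count("/*")


def deepest_match(url, rows=ROWS):
    """Return the matching row with the greatest depth, or None if nothing matches."""
    # fnmatchcase lets * match any characters, including further slashes,
    # so one url can match several patterns of different depth.
    matches = [row for row in rows if fnmatchcase(url, row[0])]
    return max(matches, key=lambda row: depth(row[0]), default=None)


if __name__ == "__main__":
    for url in (
        "https://mycompany.com/abc",
        "https://app.mycompany.com",
        "https://app.mycompany.com/abc",
        "https://app.mycompany.com/abc/xyz",
        "https://app.mycompany.com/zyx/123/abc",
    ):
        print(url, deepest_match(url))
```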

Not sure if this would work in all cases and your requirements may be more specific than this...

BTW, I have used the fuzzy lookup app, which works reasonably well - I had to make a tweak to the underlying Python to get it to work in my context

https://splunkbase.splunk.com/app/5237

but naturally it can use quite a bit of compute
