Re: How to lookup the best matching ingress url fo...

tgravvold · ‎10-01-2022

Dear Splunk community,

I'm new to Splunk, so excuse my incompetence...

What I'm trying to do is enrich my web access log with the app name and team name from a CSV lookup file.

The CSV file "ingress_map.csv" looks like this:

ingress,app,team
https://mycompany.com/abc,foo-bar,a-team
https://app.mycompany.com,good-app,b-team
https://app.mycompany.com/abc,better-app,c-team
https://app.mycompany.com/abc/xyz,best-app,d-team

The url field of my web access log will seldom match one of the ingresses exactly. Is it possible to have a lookup that finds the best matching ingress and adds the fields app and team to the log line? Or is there a better way of solving this problem?

Regards

Terje Gravvold

tgravvold · ‎10-10-2022

Thanks @bowesmana ! I'll give your rex uri depth match a try. It will be a fairly simple solution if it fits the requirements.

tgravvold · ‎10-02-2022

I found this post that points out ways to compare strings the way I want:

https://community.splunk.com/t5/Splunk-Search/Can-splunk-compare-two-strings-and-return-likeness-sim...

But I don't know if this kind of comparison is possible with lookup tables. Would it be better to feed the CSV data into an index?

yuanliu · ‎10-02-2022

I agree that a Python/external script is better suited for what you need. Ingesting the CSV into an index has a lot of drawbacks, however.

One possibility, maybe an easy solution, is to just read the CSV in the script. You can return all needed values from the script directly. No need to pull the CSV into the search at all. If you really need the content in SPL after running the script, there are several approaches. For example:

Make the script return the best-matched string from the CSV, then use this string to perform the lookup. This will return all the best-matched field values again.
Make a "dummy" field in the CSV and use that dummy field to return every entry in the lookup file into every event. That technique is only useful in very limited use cases.
Append | inputlookup to the search, then use stats or some other technique to utilize it.
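To make the "read the CSV in the script" idea concrete, here is a minimal Python sketch. It assumes the script receives CSV rows with a url column and empty app and team columns, and writes the same rows back with those columns filled in; as far as I know that is roughly how Splunk external lookups exchange data over stdin/stdout, but treat that as an assumption. The function names load_ingresses and enrich are made up for illustration, and "best match" is taken to mean the longest ingress that is a prefix of the url.

```python
import csv
import io

def load_ingresses(csv_file):
    """Read ingress,app,team rows from an open CSV file object."""
    return list(csv.DictReader(csv_file))

def enrich(infile, outfile, ingresses):
    """Copy CSV rows from infile to outfile, filling app and team per url."""
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    for name in ("app", "team"):
        if name not in fieldnames:
            fieldnames.append(name)
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        url = row.get("url") or ""
        # Keep every ingress that is a prefix of the url, then take the longest.
        matches = [i for i in ingresses if url.startswith(i["ingress"])]
        if matches:
            best = max(matches, key=lambda i: len(i["ingress"]))
            row["app"], row["team"] = best["app"], best["team"]
        writer.writerow(row)

# In-memory demo; a real script would pass sys.stdin and sys.stdout instead
# and open ingress_map.csv from wherever it is stored (path not shown here).
table = load_ingresses(io.StringIO(
    "ingress,app,team\n"
    "https://mycompany.com/abc,foo-bar,a-team\n"
    "https://app.mycompany.com,good-app,b-team\n"
    "https://app.mycompany.com/abc,better-app,c-team\n"
    "https://app.mycompany.com/abc/xyz,best-app,d-team\n"
))
request = io.StringIO("url,app,team\nhttps://app.mycompany.com/abc/page,,\n")
response = io.StringIO()
enrich(request, response, table)
print(response.getvalue())
```

Rows with no matching ingress are passed through with app and team left empty, so the search can still tell which events were not enriched.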

yuanliu · ‎10-01-2022

You need to first define what is a "best match" in terms of data. Will domain match suffice? Domain and protocol? Domain, protocol, plus a fixed number of paths?

Another question is how much flexibility is in the content of that lookup file.

tgravvold · ‎10-02-2022

Thanks for the answer. Unfortunately I cannot assume matching only the hostname or a fixed path depth. If it helps, I can manipulate the ingress in the CSV. The CSV is exported via script. For example I can sort the CSV by index if it helps.

What I'm seeking is a function that matches each ingress from the CSV to the url in the log entry and picks the ingress that best matches (that is, matches the most characters from the start of the field).
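Outside of Splunk, the matching rule described here can be sketched in a few lines of Python. This is only an illustration of the rule, not an SPL solution: it assumes "best match" means the longest ingress that is a complete prefix of the url, and best_match and load_ingress_map are hypothetical helper names.

```python
import csv
import io

INGRESS_CSV = """ingress,app,team
https://mycompany.com/abc,foo-bar,a-team
https://app.mycompany.com,good-app,b-team
https://app.mycompany.com/abc,better-app,c-team
https://app.mycompany.com/abc/xyz,best-app,d-team
"""

def load_ingress_map(text):
    """Parse the ingress CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def best_match(url, rows):
    """Return the row whose ingress is the longest prefix of url, or None."""
    candidates = [row for row in rows if url.startswith(row["ingress"])]
    if not candidates:
        return None
    return max(candidates, key=lambda row: len(row["ingress"]))

rows = load_ingress_map(INGRESS_CSV)
hit = best_match("https://app.mycompany.com/abc/xyz/page?id=1", rows)
print(hit["app"], hit["team"])  # best-app d-team
```

One caveat of a plain prefix test: a url such as https://app.mycompany.com/abcdef would also match the /abc ingress. If that is not wanted, the comparison would have to respect path segment boundaries.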

bowesmana · ‎10-02-2022

What about using wildcard lookups, which will get you part of the way?

ingress	app	team	depth
https://mycompany.com/*	foo-bar	a-team	1
https://app.mycompany.com	good-app	b-team	0
https://app.mycompany.com/*	better-app	c-team	1
https://app.mycompany.com/*/*	best-app	d-team	2
https://app.mycompany.com/*/*/*	gold-star-app	z-team	3

Made with this search

| makeresults
| eval _raw="ingress,app,team
https://mycompany.com/*,foo-bar,a-team
https://app.mycompany.com,good-app,b-team
https://app.mycompany.com/*,better-app,c-team
https://app.mycompany.com/*/*,best-app,d-team
https://app.mycompany.com/*/*/*,gold-star-app,z-team"
| multikv forceheader=1
| table ingress,app,team
| rex field=ingress max_match=0 "(?<p>/\*)"
| eval depth=mvcount(p)
| fillnull depth
| fields - p
| outputlookup ingress.csv

where the last column is the path depth (based on the number of /* elements in the path). After making a wildcard lookup definition with WILDCARD(ingress), this example will

| makeresults
| eval _raw="ingress,app,team
https://mycompany.com/abc,foo-bar,a-team
https://app.mycompany.com,good-app,b-team
https://app.mycompany.com/abc,better-app,c-team
https://app.mycompany.com/abc/xyz,best-app,d-team
https://app.mycompany.com/zyx/123/abc,gold-star-app,z-team"
| multikv forceheader=1
| table ingress,app,team
| lookup ingress ingress OUTPUT team as f_team depth as f_depth
| eval actual_team=mvindex(f_team, max(f_depth, 1) - 1)

return a match where it is exact, or otherwise the match with the greatest path depth.
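To check the "greatest path depth wins" logic outside of Splunk, here is a small Python sketch that emulates it. It uses fnmatchcase from the standard library as a stand-in for Splunk's wildcard matching, which is an approximation and not a statement about how Splunk implements WILDCARD lookups; deepest_match is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# (ingress pattern, app, team, depth) rows, mirroring the wildcard lookup above.
WILDCARD_ROWS = [
    ("https://mycompany.com/*", "foo-bar", "a-team", 1),
    ("https://app.mycompany.com", "good-app", "b-team", 0),
    ("https://app.mycompany.com/*", "better-app", "c-team", 1),
    ("https://app.mycompany.com/*/*", "best-app", "d-team", 2),
    ("https://app.mycompany.com/*/*/*", "gold-star-app", "z-team", 3),
]

def deepest_match(url, rows=WILDCARD_ROWS):
    """Return the matching row with the greatest depth, or None if no row matches."""
    hits = [row for row in rows if fnmatchcase(url, row[0])]
    return max(hits, key=lambda row: row[3]) if hits else None

for url in (
    "https://mycompany.com/abc",
    "https://app.mycompany.com",
    "https://app.mycompany.com/abc/xyz",
    "https://app.mycompany.com/zyx/123/abc",
):
    row = deepest_match(url)
    print(url, "->", row[1], row[2])
```

Because * also matches slashes, a deep url matches every shallower pattern as well, which is why the depth column is needed to pick one row.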

Not sure if this would work in all cases and your requirements may be more specific than this...

BTW, I have used the fuzzy lookup app, which works reasonably well - I had to make a tweak to the underlying Python to get it to work in my context

https://splunkbase.splunk.com/app/5237

but naturally it can use quite a bit of compute

How to lookup the best matching ingress url for url field in log?
