Solved: How to count or match fields?

David_Arnold · ‎09-18-2023

I have a CSV of URLs I need to search against my proxy index (the url field), I want to be able to do a count or match of the URLs.
my csv looks like this (with the header of the column called kurl)
kurl
splunk.com
youtube.com
google.com

So far, I have this SPL but it's only counting the matches, i need the URLs that don't exist to count 0

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| stats count by url

yuanliu · ‎09-20-2023

Actually, I made a mistake when I emulate data. (I had a spurious "match" field in emulation which your index subsearch will not give.) Here is the correct solution:

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| append
    [inputlookup URLs.csv
    | fields kurl]
| eval match = if(isnull(url), 0, 1)
| eval url = coalesce(url, kurl)
| stats sum(match) as count by url

Here is the corrected emulation

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| eval index = "web_index"
| mvexpand url
| search * 
    [ inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```

This gives

_time	index	url
2023-09-20 16:20:05	web_index	google.com
2023-09-20 16:20:05	web_index	google.com

(If your results do not meet the requirement, it would be useful to compare real data with emulation.) Combined simulation is then

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
| lookup URLs.csv kurl as url output kurl as match
| where isnotnull(match)
| fields - match
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```
| append
    [inputlookup URLs.csv
    | fields kurl]
| eval match = if(isnull(url), 0, 1)
| eval url = coalesce(url, kurl)
| stats sum(match) as count by url

I get

url	count
google.com	2
splunk.com	0
youtube.com	0

However, I still don't understand why the first option (stats before append) would not give you the correct output. Here's my full emulation:

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
| lookup URLs.csv kurl as url output kurl as match
| where isnotnull(match)
| fields - match
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```
| stats count by url
| append
    [inputlookup URLs.csv
    | fields kurl
    | rename kurl AS url]
| stats sum(count) as count by url
| fillnull count ``` or you can omit this and leave nonexistent to show null ```

Again It would help if you can run data part of emulation and let me know the difference with real data.

View solution in original post

yuanliu · ‎09-19-2023

You selected lookup as label, but are using inputlookup😃. You would have the answer if you stick to lookup.

index="web_index" 
| lookup URLs.csv kurl as url output kurl as match
| eval match = if(isnull(match), 0, 1)
| stats sum(match) as count by url

David_Arnold · ‎09-19-2023

That also returned every url not on my lookup. I guess I could just take the hits and do a duplicate compare in excel but it'd be nice to see it all in splunk.

Yeah, sorry about the tag I wasn't paying attention.

yuanliu · ‎09-19-2023

Maybe you can explain the requirement further? You said of your original SPL "it's only counting the matches, i need the URLs that don't exist to count 0." So, I thought you would want every url not with a match. Could you mock up some data in web_index and explain why the output doesn't meet the requirements?

Here is my emulation

| makeresults
| fields - _time
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
``` the above emulates index="web_index" ```

url

google.com

foo.com

bar.com

google.com

Using the exact mock lookup you give, my search will give

url	count
bar.com	0
foo.com	0
google.com	2

Is this not what you want, bar.com and foo.com count to 0?

Below is the full example

| makeresults
| fields - _time
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
``` the above emulates index="web_index" ```
| lookup URLs.csv kurl as url output kurl as match
| eval match = if(isnull(match), 0, 1)
| stats sum(match) as count by url

David_Arnold · ‎09-19-2023

So my first SPL, it gets me the URLs I'm looking for but doesn't list the URLs (in the lookup) that don't get any results.
So to break it down, say my csv has 120 URLs and I would like to know what my users are hitting and not hitting, when i run my first example search I'm getting 43 results back, but I also want to know what are the URLs (out of the 120) not hitting? Or should I create a search only telling me the URLs that don't match in the index?

Whilst that example does make sense, in the stats table I get all these other URLs that aren't even in my lookup list that I care about. So I go from expecting 120 results to (18,000).

Does that make more sense?

yuanliu · ‎09-20-2023

So my first SPL, it gets me the URLs I'm looking for but doesn't list the URLs (in the lookup) that don't get any results.

The parenthesis "(in the lookup)" is key here. My first reading of "only counting the matches, i need the URLs that don't exist to count 0" took it to mean URLs in the index that didn't have a match. This new description makes perfect sense. And your search is more efficient.

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| stats count by url
| append
    [inputlookup URLs.csv
    | fields kurl
    | rename kurl AS url]
| stats sum(count) as count by url
| fillnull count ``` or you can omit this and leave nonexistent to show null ```

Alternatively, you can do

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| append
    [inputlookup URLs.csv
    | fields kurl
    | rename kurl AS url]
| eval match = if(isnull(match), 0, 1)
| stats sum(match) as count by url

David_Arnold · ‎09-20-2023

Yeah it still doesn't mark the URLs in my list that don't exist in the index as 0.

I'll just accept your solution though, thanks for your help 😃

yuanliu · ‎09-20-2023

Actually, I made a mistake when I emulate data. (I had a spurious "match" field in emulation which your index subsearch will not give.) Here is the correct solution:

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| append
    [inputlookup URLs.csv
    | fields kurl]
| eval match = if(isnull(url), 0, 1)
| eval url = coalesce(url, kurl)
| stats sum(match) as count by url

Here is the corrected emulation

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| eval index = "web_index"
| mvexpand url
| search * 
    [ inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```

This gives

_time	index	url
2023-09-20 16:20:05	web_index	google.com
2023-09-20 16:20:05	web_index	google.com

(If your results do not meet the requirement, it would be useful to compare real data with emulation.) Combined simulation is then

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
| lookup URLs.csv kurl as url output kurl as match
| where isnotnull(match)
| fields - match
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```
| append
    [inputlookup URLs.csv
    | fields kurl]
| eval match = if(isnull(url), 0, 1)
| eval url = coalesce(url, kurl)
| stats sum(match) as count by url

I get

url	count
google.com	2
splunk.com	0
youtube.com	0

However, I still don't understand why the first option (stats before append) would not give you the correct output. Here's my full emulation:

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
| lookup URLs.csv kurl as url output kurl as match
| where isnotnull(match)
| fields - match
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```
| stats count by url
| append
    [inputlookup URLs.csv
    | fields kurl
    | rename kurl AS url]
| stats sum(count) as count by url
| fillnull count ``` or you can omit this and leave nonexistent to show null ```

Again It would help if you can run data part of emulation and let me know the difference with real data.

David_Arnold · ‎09-20-2023

THIS WORKS!!!!!!!!!!
Thanks so much!!!!

How to count or match fields?

lookup

Enter the Splunk Community Dashboard Challenge for Your Chance to Win!

.conf24 | Session Scheduler is Live!!

Introducing the Splunk Community Dashboard Challenge!