Splunk Search

How to count or match fields?

David_Arnold
Explorer

I have a CSV of URLs I need to search against my proxy index (the url field), I want to be able to do a count or match of the URLs.
my csv looks like this (with the header of the column called kurl)
kurl
splunk.com
youtube.com
google.com


So far, I have this SPL but it's only counting the matches, i need the URLs that don't exist to count 0

 

 

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| stats count by url

 

 

 

Labels (1)
Tags (2)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

Actually, I made a mistake when I emulate data. (I had a spurious "match" field in emulation which your index subsearch will not give.)   Here is the correct solution:

 

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| append
    [inputlookup URLs.csv
    | fields kurl]
| eval match = if(isnull(url), 0, 1)
| eval url = coalesce(url, kurl)
| stats sum(match) as count by url

 

Here is the corrected emulation

 

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| eval index = "web_index"
| mvexpand url
| search * 
    [ inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```

 

This gives

_timeindexurl
2023-09-20 16:20:05web_indexgoogle.com
2023-09-20 16:20:05web_indexgoogle.com

(If your results do not meet the requirement, it would be useful to compare real data with emulation.)  Combined simulation is then

 

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
| lookup URLs.csv kurl as url output kurl as match
| where isnotnull(match)
| fields - match
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```
| append
    [inputlookup URLs.csv
    | fields kurl]
| eval match = if(isnull(url), 0, 1)
| eval url = coalesce(url, kurl)
| stats sum(match) as count by url

 

I get

urlcount
google.com2
splunk.com0
youtube.com0

 

However, I still don't understand why the first option (stats before append) would not give you the correct output.  Here's my full emulation:

 

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
| lookup URLs.csv kurl as url output kurl as match
| where isnotnull(match)
| fields - match
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```
| stats count by url
| append
    [inputlookup URLs.csv
    | fields kurl
    | rename kurl AS url]
| stats sum(count) as count by url
| fillnull count ``` or you can omit this and leave nonexistent to show null ```

 

Again It would help if you can run data part of emulation and let me know the difference with real data.

View solution in original post

0 Karma

yuanliu
SplunkTrust
SplunkTrust

You selected lookup as label, but are using inputlookup😃.  You would have the answer if you stick to lookup.

index="web_index" 
| lookup URLs.csv kurl as url output kurl as match
| eval match = if(isnull(match), 0, 1)
| stats sum(match) as count by url

 

0 Karma

David_Arnold
Explorer

That also returned every url not on my lookup.  I guess I could just take the hits and do a duplicate compare in excel but it'd be nice to see it all in splunk.

Yeah, sorry about the tag I wasn't paying attention.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Maybe you can explain the requirement further?  You said of your original SPL "it's only counting the matches, i need the URLs that don't exist to count 0."  So, I thought you would want every url not with a match.  Could you mock up some data in web_index and explain why the output doesn't meet the requirements?

Here is my emulation

 

| makeresults
| fields - _time
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
``` the above emulates index="web_index" ```

 

url
google.com
foo.com
bar.com
google.com

Using the exact mock lookup you give, my search will give

urlcount
bar.com0
foo.com0
google.com2

Is this not what you want, bar.com and foo.com count to 0?

Below is the full example

 

| makeresults
| fields - _time
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
``` the above emulates index="web_index" ```
| lookup URLs.csv kurl as url output kurl as match
| eval match = if(isnull(match), 0, 1)
| stats sum(match) as count by url

 

 

0 Karma

David_Arnold
Explorer

So my first SPL, it gets me the URLs I'm looking for but doesn't list the URLs (in the lookup) that don't get any results.
So to break it down, say my csv has 120 URLs and I would like to know what my users are hitting and not hitting, when i run my first example search I'm getting 43 results back, but I also want to know what are the URLs (out of the 120) not hitting? Or should I create a search only telling me the URLs that don't match in the index?

Whilst that example does make sense, in the stats table I get all these other URLs that aren't even in my lookup list that I care about.  So I go from expecting 120 results to (18,000).

Does that make more sense?


0 Karma

yuanliu
SplunkTrust
SplunkTrust

So my first SPL, it gets me the URLs I'm looking for but doesn't list the URLs (in the lookup) that don't get any results.

The parenthesis "(in the lookup)" is key here.  My first reading of "only counting the matches, i need the URLs that don't exist to count 0" took it to mean URLs in the index that didn't have a match.  This new description makes perfect sense.  And your search is more efficient.

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| stats count by url
| append
    [inputlookup URLs.csv
    | fields kurl
    | rename kurl AS url]
| stats sum(count) as count by url
| fillnull count ``` or you can omit this and leave nonexistent to show null ```​

Alternatively, you can do

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| append
    [inputlookup URLs.csv
    | fields kurl
    | rename kurl AS url]
| eval match = if(isnull(match), 0, 1)
| stats sum(match) as count by url

 

Tags (1)
0 Karma

David_Arnold
Explorer

Yeah it still doesn't mark the URLs in my list that don't exist in the index as 0.

I'll just accept your solution though, thanks for your help 😃

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Actually, I made a mistake when I emulate data. (I had a spurious "match" field in emulation which your index subsearch will not give.)   Here is the correct solution:

 

index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
| append
    [inputlookup URLs.csv
    | fields kurl]
| eval match = if(isnull(url), 0, 1)
| eval url = coalesce(url, kurl)
| stats sum(match) as count by url

 

Here is the corrected emulation

 

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| eval index = "web_index"
| mvexpand url
| search * 
    [ inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```

 

This gives

_timeindexurl
2023-09-20 16:20:05web_indexgoogle.com
2023-09-20 16:20:05web_indexgoogle.com

(If your results do not meet the requirement, it would be useful to compare real data with emulation.)  Combined simulation is then

 

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
| lookup URLs.csv kurl as url output kurl as match
| where isnotnull(match)
| fields - match
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```
| append
    [inputlookup URLs.csv
    | fields kurl]
| eval match = if(isnull(url), 0, 1)
| eval url = coalesce(url, kurl)
| stats sum(match) as count by url

 

I get

urlcount
google.com2
splunk.com0
youtube.com0

 

However, I still don't understand why the first option (stats before append) would not give you the correct output.  Here's my full emulation:

 

| makeresults
| eval url = mvappend("google.com", "foo.com", "bar.com", "google.com")
| mvexpand url
| lookup URLs.csv kurl as url output kurl as match
| where isnotnull(match)
| fields - match
``` the above emulates
index="web_index" 
    [| inputlookup URLs.csv 
    |  fields kurl 
    |  rename kurl as url]
```
| stats count by url
| append
    [inputlookup URLs.csv
    | fields kurl
    | rename kurl AS url]
| stats sum(count) as count by url
| fillnull count ``` or you can omit this and leave nonexistent to show null ```

 

Again It would help if you can run data part of emulation and let me know the difference with real data.

0 Karma

David_Arnold
Explorer

THIS WORKS!!!!!!!!!!
Thanks so much!!!!

0 Karma
Get Updates on the Splunk Community!

Enter the Splunk Community Dashboard Challenge for Your Chance to Win!

The Splunk Community Dashboard Challenge is underway! This is your chance to showcase your skills in creating ...

.conf24 | Session Scheduler is Live!!

.conf24 is happening June 11 - 14 in Las Vegas, and we are thrilled to announce that the conference catalog ...

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...