Splunk Search

How to feed a query?

jacques
Loves-to-Learn

I am trying to run a simple query, but with a catch.  I want to run something like this:

index=weblogs somedomain.com | stats count

I don't want to see all of the the events. I don't want detail. I just want to know how many events make reference to *somedomain.com*.  For this specific domain, I'm good, and the above query returns almost instantaneously what I want.

The catch - I now need to perform this simple query several thousand more times for each of the domains on a list, providing the number of events make reference to each domain in the list.

I thought if I spun my list into a CSV, import it and the reference/"feed" the query using something like this:

index=weblogs | inputcsv domainlist.csv | stats count

Or perhaps:

index=weblogs | inputlookup domainlist.csv | stats count

I'd have what I needed, but to no avail.  I've tinkered with LOOKUP, MAP, FOREACH and several others.  In the end, I feel like I've missed the obvious.  I feel like it's the iterative nature of the query that is defeating me.

Thank you in advance for your assistance.

Labels (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Do you have a field that contains the name of the domain or are you searching for that domain anywhere in the raw event data - currently you are just looking for somedomain.com in the raw event.

The obvious solution if there is a field containing the domain is 

| stats count by domainField

but you would need to have domainField containing your event field.

Depending on your data, you may be able to extract 'domains' from the data, for example you could rex out what looks like a domain based on the standard domain elements, e.g. something basic as this captures various domains from raw

| rex "(?<domainField>\w+\.(com|org|co)\.?(au|uk|fr)?)"

 and then you could run your stats following the rex and then finally limit the search results to the ones you want.

 

0 Karma

jacques
Loves-to-Learn

Thank you for your quick response!

I do have a couple of fields in the weblogs that may contain the domain or variations of it.  As I am creating the CSV I can name that single column whatever I like so that it matches.  There are a couple of "unknowns" here that I believe are making this complicated.  In my simple query:

index=weblogs somedomain.com | stats count

I am searching my results for *somedomain.com* regardless of the field.  I certainly can be more specific and designate a single field to search, but I do need to be able to still retain the wildcard aspect so that "somedomain.com" from the list of domains provided by the CSV also finds "somedomain.com", "www.somedomain.com ", "video.somedomain.com", etc. from the weblogs.  In my past experience, adding an asterisk such as domain="*somedomain.com" significantly impacted the search and processing.  "Significantly" = a search similar to what I'm currently using but with an asterisk was 40% complete after two days, but when I ran the same search without the asterisk completed in a couple of minutes.

I am also following up with my Admin to see if there is a means to confirm _which_ field in weblogs is triggering "found" when I execute:

index=weblogs somedomain.com | stats count

I hadn't thought about it but the initial query's speed may likely be due to the request domain being in an indexed field and it might help my response time if I ensure that I use that field to search as opposed to a non-indexed one.

In my attempts to use inputlookup, Splunk refuses to produce a count of hits, 0 found.

Though I do not _know_, I _suspect_ that the inputlookup as a subsearch is essentially responding with the entire results/contents, rather than one at a time, iteratively.  I spent some time trying to get map or foreach to spit each domain from the CSV to me, one at a time but thus far have been unsuccessful in producing results.

Thanks again for your suggestion.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

So, assuming you can identify a domain field in your raw data you could do something like this

<search> 
| stats count by domain
| lookup domain_lookup domain output domain as foundDomain
| where !isnull(foundDomain)
| stats sum(count) as count by foundDomain

where you have a CSV lookup containing a domain field you are interested in, which includes wildcards, e.g.

*somedomain.com
*otherdomain.com
*xyz.domain.com
*.abc.domain.co.uk

and then you create a lookup definition using the CSV file, with an advanced lookup option Match type set to WILDCARD(domain)

Then if your domain in your data looks like

www.somedomain.com 
video.somedomain.com
blablabla.otherdomain.com
xyz.domain.com
123.abc.domain.co.uk
456.abc.domain.co.uk
789.abc.domain.co.uk

you will end up with 

*somedomain.com 2
*otherdomain.com 1
*xyz.domain.com 1
*.abc.domain.co.uk 3

The final stats in the query will sum the results of all the independent found domains into a total for the domain template in the lookup

You could do that several ways - either no need to do the final stats - just show the true found domains, or do this to get the individual counts grouped into the master domain from the lookup.

<search> 
| stats count by domain
| lookup domain_lookup domain output domain as foundDomain
| where !isnull(foundDomain)
| stats list(domain) as domains list(count) as counts sum(count) as total by foundDomain

 

 

0 Karma

jacques
Loves-to-Learn

Thank you.  I'm not initially able to get this to work, but feel like it is on the right path.  I've a series of errors just using a minimal amount of data (e.g. a couple of domains in the CSV over a 15 minute period).  Thank you for bearing with me while I get this sorted.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

If you domain is in a field called domain and your csv has a single column called domain you could try something like this

index=weblogs [| inputlookup domainlist.csv] | stats count by domain

If not, you could extract the domain from your events and search for matches

index=weblogs | <extract domain from events> | search [| inputlookup domainlist.csv] | stats count by domain
0 Karma

jacques
Loves-to-Learn

Thank you for your quick response!

I do have a couple of fields in the weblogs that may contain the domain or variations of it.  As I am creating the CSV I can name that single column whatever I like so that it matches.  There are a couple of "unknowns" here that I believe are making this complicated.  In my simple query:

index=weblogs somedomain.com | stats count

I am results for *somedomain.com* regardless of the field.  I certainly can be more specific and designate a single field to search, but I do need to be able to still retain the wildcard aspect so that "somedomain.com" from the list of domains provided by the CSV also finds "somedomain.com", "www.somedomain.com", "video.somedomain.com", etc. from the weblogs.  In my past experience, adding an asterisk such as domain="*somedomain.com" significantly impacted the search and processing.  "Significantly" = a search similar to what I'm currently using but with an asterisk was 40% complete after two days, but when I ran the same search without the asterisk completed in a couple of minutes.

I really like the simplicity of your suggestion:

index=weblogs [| inputlookup domainlist.csv] | stats count by domain

This is the line of thought I've been working along, but Splunk refuses to produce a count, 0 found.

Though I do not _know_, I _suspect_ that the inputlookup as a subsearch is essentially responding with the entire results/contents, rather than one at a time, iteratively.  I spent some time trying to get map or foreach to spit each domain from the CSV to me, one at a time but thus far have been unsuccessful.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
index=weblogs [| inputlookup domainlist.csv] | stats count by domain

effectively expands to 

index=weblogs (domain=somedomain.com OR domain=someotherdomain.com) | stats count by domain

so you are right, this would not pick up on www.somedomain.com (unless that was also in domainlist.csv).

0 Karma
Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...