Splunk Search

How to find count of events by host from inputlookup?

whar_garbl
Path Finder

I have a lookup table with only one field, named host. The table contains a list of hostnames. 

I'm trying to find a way to get a count of events by host using this lookup table as the input (i.e. the hosts I want a count for). 

I've tried a variety of approaches. For example: 

 

 

|inputlookup file.csv | stats count by host

 

 

Every host returns a count of 1. 

 

 

|inputlookup file.csv | join type=left host [|tstats count by host] 

 

 

About a dozen hosts return counts; the rest return null values. 

Complicating this problem seems to be case. If I crunch all the hosts to upper or lowercase, I get different results, but neither returns a complete result set. That seems super odd given that field values aren't case sensitive. I've tried crunching case with eval as well as in the lookup table itself, to no avail. 

We're stumped. What is the best approach to use a lookup table of hostnames to get an event count by host?

Labels (3)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

In order to show the hosts where there is no data, your search should look like

| tstats count where [ | inputlookup file.csv ],index=* by host
| append [
  | inputlookup file.csv
  | eval count=0
]
| stats max(count) as count by host

so, you are first finding the data for the hosts that exist (tstats) and then adding all the hosts with count=0 to the end of the list, then finally aggregating the max count for all hosts, so that those hosts who have no data will then have 0 as their count.

 

View solution in original post

bowesmana
SplunkTrust
SplunkTrust

You could do this, just using the inputlookup as a subsearch

| tstats count where [ | inputlookup file.csv ] by host

whar_garbl
Path Finder

This was close to what I want to do:

 

| tstats count where [ | inputlookup file.csv ],index=* by host

 

But it doesn't include hosts from the lookup that have no events, which I want to include. Adding a fillnull afterward doesn't do it. 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

In order to show the hosts where there is no data, your search should look like

| tstats count where [ | inputlookup file.csv ],index=* by host
| append [
  | inputlookup file.csv
  | eval count=0
]
| stats max(count) as count by host

so, you are first finding the data for the hosts that exist (tstats) and then adding all the hosts with count=0 to the end of the list, then finally aggregating the max count for all hosts, so that those hosts who have no data will then have 0 as their count.

 

whar_garbl
Path Finder

Thanks. I've tried a very similar permutation today, but this brings back the problem I had at another point: I get about a third more results than I should, and they're duplicates (one uppercase, one lowercase).

If I cast the host field to upper or lowercase before the stats max command, I get the correct number of hosts but around 10% then erroneously return 0 events. E.g. 

 

| tstats count where [ | inputlookup file.csv ],index=* by host
| append [
  | inputlookup file.csv
  | eval count=0
]
| eval host=upper(host)
| dedup host
| stats max(count) as count by host

 

 

Expected result: 

 

host      count
HOST1     12345
HOST2     67890
HOST3     24680

 

 

Actual result:

 

host      count
HOST1     0
HOST2     67890
HOST3     24680

 

 

But if I search for HOST1 or host1 manually, there are many thousands of events in the same time period. Since field values are supposed to be case-insensitive (IIRC), I'm stumped as to why case is relevant here. It's trivial to do the case-casting in the actual lookup file if appropriate, but I think there's one more piece missing somewhere.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

You should not do dedup - that's wrong. The whole point of stats is to do the aggregation of hosts, so you NEED both hosts (data value and 0 value from lookup) to exist, so the max(count) can work.

whar_garbl
Path Finder

Ah, I see what you mean. Righto. So, then I end up getting the same results as explained, with ~10% of hosts showing up with different-case duplicates - one with appropriate event counts, and one with zero. Any thoughts on what's going on there?

0 Karma

bowesmana
SplunkTrust
SplunkTrust

No idea without seeing the data and not sure what you mean by different case duplicates - narrow it down to a single host and show what the data is and what should be expected.

0 Karma

whar_garbl
Path Finder

Expected: 

host       count
HOST1      11111
HOST2      22222
HOST3      33333

Actual: 

host       count
HOST1      11111
HOST2      22222
HOST3      0
host1      0
host2      0
host3      33333

Does that make sense? I get HOSTx and hostx (the lookups are all uppercased), and on one or the other there are zero events counted - but they're the same host.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

That's odd if you are doing

| eval host=upper(host)

before the stats command as there cannot be any lower case hosts in there any more

whar_garbl
Path Finder

Huh. It must be an issue of WHEN I crunch the case, because this does the trick and gets the results I expect. The only zero hosts now are true zeros. Success!

You're the best! Thanks so much for sticking with this and holding my hand through it. I'm going to accept the post with the main query as the answer for the next person. Cheers!

0 Karma

richgalloway
SplunkTrust
SplunkTrust

The first query 

|inputlookup file.csv | stats count by host

is counting how many times each host name appears in the lookup file.  That's why the results are only '1'.

The second query look for all hosts in the default indexes and joins those results with the lookup file.  Hosts not in an index will have a null count, but that can be fixed with the fillnull command.  To help ensure more hosts are found, specify index=* in the tstats command.

|inputlookup file.csv 
| join type=left host [|tstats count where index=* by host] 
| fillnull value=0 host

I've seen host names fail to match if the lookup contains a host name, but the events contain a FQDN or vice versa.

---
If this reply helps you, Karma would be appreciated.

whar_garbl
Path Finder

Thanks. I did start out using index=* as well, but I forgot about it. I still get null values for way too many hosts, but I can search for them directly using the same syntax (i.e. index=* and short hostname as found in the inputlookup) and get many thousands of events. It's making me nuts because it seems like it should be super straightforward!

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...