I need to report hosts that are configured to receive app.log details and also report the ones that are missing. For this, I use the query "index=application sourcetype="application:server:log" | stats values(host) as hostnames by customer_name". This retrieves the hostnames for each customer_name from the sourcetype.
I get a result as:
customer_name | host |
customer1 | server1 |
customer2 | server2 server3 |
Then, I join the result by customer_name field from the second part of the query "[| inputlookup server_info.csv | rename customer as customer_name | stats values(host) as hostnames_lookup by customer_name] which retrieves the hostnames for each customer_name from the server_info.csv lookup table."
Here I get result as:
customer_name | host |
customer1 | server1 server100 |
customer2 | server2 server3 server101 |
Later, I expand both the multivalue fields and perform evaluation on both the fields to retrieve result as configured or not configured. The evaluation looks like this
| mvexpand hostnames
| mvexpand hostnames_lookup
| eval not_configured = if(hostnames == hostnames_lookup, hostnames, null())
| eval configured = if(hostnames != hostnames_lookup, hostnames, null())
| fields customer_name, hostnames, hostnames_lookup, configured, not_configured
My final query looks like this:
(index=application sourcetype="application:server:log)
| stats values(host) as hostnames by customer_name
| join customer_name
[| inputlookup server_info.csv
| rename customer as customer_name
| stats values(host) as hostnames_lookup by customer_name]
| mvexpand hostnames
| mvexpand hostnames_lookup
| eval not_configured = if(hostnames == hostnames_lookup, hostnames, null())
| eval configured = if(hostnames != hostnames_lookup, hostnames, null())
| fields customer_name, hostnames, hostnames_lookup, configured, not_configured
However, in the result when the evaluation is completed the results are not as expected, the matching logic doesn't work and the resultant output is incorrect. There are no values evaluated in the not_configured column and the configured column only returns the values in hostnames. However, I'd expect the configured field to show results of all the servers configured to receive app.log and not configured to have hostnames that are present in lookup but are still not configured to receive logs.
Expected Output:
customer_name | hostnames | hostnames_lookup | configured | not_configured |
customer1 | server1 | server1 server100 | server1 | server100 |
customer2 | server2 server3 | server2 server3 server101 | server2 server3 | server101 |
Current Output:
customer_name | hostnames | hostnames_lookup | configured | not_configured |
customer1 | server1 | server1 server100 | server1 | |
customer2 | server2 server3 | server2 server3 server101 | server2 server3 |
Essentially customer1 and customer2 should display server1 as configured and server100 not_configured and likewise for customer2 as mentioned in expected output table. Which will mean that server100 and 101 are part of the lookup but are not configured to receive app.log
How can I evaluate this differently, so that the comparison works as expected. Is it possible to compare the values in this fashion? Is there anything wrong with the current comparison logic? Should I not use mvexpand on the extracted fields so that they are compared expectedly?
So as I understand,
| append [
| inputlookup server_info.csv
| rename customer as customer_name
| stats values(host) as hostnames by customer_name ]
is going to append the hostnames from lookup into the results received from the first query.
Finally,
| stats count by customer_name hostnames
is going to count the value as 1 if it's present only in lookup, otherwise, if it's present in the first part of search and lookup then the count is going to be evaluated as 2? Is that correct?
However, in the result there are no values with count as 2. Which is unlikely as there are few hosts which are present in events and lookup as well.
Here, we try to fetch the events that contains hostnames(configured to receive application logs) and then compare them with the list of servers present in lookup(if found in lookup only then count=1).
However it seems that the query still isn't performing the required search as there are no values with count as 2. Here I observer count is returned as 1 for the hosts that are received from the events in below
index=application sourcetype="application:server:log"
| stats values(host) as hostnames by customer_name
There is almost no trace of values in the lookup, so I'm not sure if they are even being compared. And this is what the issue earlier was.
How can these two lists be compared and listed?
Close. The count will depend on how many of those hostname-customer_name pairs you have. If you have just one (supposedly only from the lookup), it will be 1. If you have two (from the lookup and from the data), it will be two.
At least it should work this way. Of course I don't know either your events nor your lookup contents so I'm only deducing the data format from your searches.
See this short run-anywhere example.
| makeresults
| eval hosts=split("a,b,c,d:a,b,c",":")
| mvexpand hosts
| eval customers="aaa"
| eval hosts=split(hosts,",")
This prepares some mockup data. You have one row supposedly from your summarized values from events and one from the summarized lookup.
After you add
| stats count by customers hosts
You get three rows with count of 2 and one row with count of 1
Firstly, join is not a very friendly command, it has its quirks. In this case I'd rather use either append or inputlookup append=t
Another thing - if you do mvexpand on multiple multivalued fields you'll get a product of both sets of values. It can escalate quickly for bigger data sets. See the run-anywhere example
| makeresults
| eval a=split("1,2,3,4,5",",")
| eval b=split("1,2,3,4,5",",")
| mvexpand a
| mvexpand b
Of course you can do some fancy set arithmetics on those multivalued fields but it's usually easier done another way - count and filter.
This part is OK, it lists which customers use which hosts
index=application sourcetype="application:server:log"
| stats values(host) as hostnames by customer_name
You're gonna get a list of hosts per customer. What is important here is that each host will be listed only once per customer.
So we expand our list by the servers defined for each customer
| append [
| inputlookup server_info.csv
| rename customer as customer_name
| stats values(host) as hostnames by customer_name ]
So now for each customer you have a single value of hostnames field per customer per each host.
Nothing easier now than counting what we have
| stats count by customer_name hostnames
So for each pair of customer_name and hostnames values you will have a count value indicating whether it was only present in the lookup (value of 1) or both in the lookup and the indexed events (value of 2).
Now you can easily manipulate the data - filter, make a table, whatever.
All this assumes that you don't have any hosts in the event data which are not in the lookup. If you can have such situation, the search is getting a bit more complex because you need to add additional field with two different numerical values, depending on whether the data came from the events or from the lookup and do a sum() instead of count in your final stats so you can see where the data came from.