Splunk Search

Creating Server update from events and csv lookup

MattLingwood
Engager

I am hitting a mental block in creating this query and wish to monitor our server performance so we have visibility on our service.

Originally we were looking at individuals servers' events and so were able to tell the uptime and availability of that server but to optimise our queries we had to move all of our servers together.

In doing this if there is a server that isn't sending events to Splunk then it is completely ignored which didn't happen before when specifically looking for it.

Here is my current query but still doesn't seem to create a 0 for missing servers:

Like this:

index=identity sourcetype="identity_tcpsocket" (LogSource=Proxy OR LogSource=Api)
| eval ServerName=if(isnotnull(ProxyHostName),ProxyHostName,upper(substr(host,0,15)))
| search [| inputlookup ProdServerHostName.csv | fields ServerHostName | rename ServerHostName as ServerName]
| stats count AS total_requests count(eval('Message.Response.Status'="500")) AS server_errors BY ServerName
| eval error_percentage=((server_errors/total_requests) * 100),
       Status=case((error_percentage >=15 OR total_requests <=20), "DOWN",
                  ((error_percentage >=10 AND error_percentage < 15) OR total_requests <20), "WARNING - CHECK SERVICE",
                    error_percentage <10, "OK")
| table ServerName Status
| where Status!="OK"

Ideally my output would be a list of Servers that are not fitting the OK Criteria as seen above.
If a server is not sending data to Splunk then I want that to be listed as DOWN
I am currently trying to populate a full list of servers from a Lookup but that doesn't seem to be working.
Another point to mention is that the Server Names have to be joined together as the "host" field is sometimes an ip or a full dns name

0 Karma
1 Solution

woodcock
Esteemed Legend

Given your clarification of needing to count events that aren't even there, do it like this:

index=identity sourcetype="identity_tcpsocket" (LogSource=Proxy OR LogSource=Api)
| eval ServerName=if(isnotnull(ProxyHostName),ProxyHostName,upper(substr(host,0,15)))

| append [|inputlookup ProdServerHostName.csv | fields ServerHostName | rename ServerHostName as ServerName]

| stats count AS total_requests count(eval('Message.Response.Status'="500")) AS server_errors BY ServerName

| eval count = count - 1 | rename COMMENT AS "get rid of the extra 'fake' one from the append of inputlookup"
| lookup ProdServerHostName ServerHostName AS ServerName OUTPUT ServerHostName AS KEEPME
| search KEEPME="*" | fields - KEEPME

| eval error_percentage=((server_errors/total_requests) * 100),
       Status=case((error_percentage >=15 OR total_requests <=20), "DOWN",
                  ((error_percentage >=10 AND error_percentage < 15) OR total_requests <20), "WARNING - CHECK SERVICE",
                    error_percentage <10, "OK")
| table ServerName Status
| where Status!="OK"

View solution in original post

0 Karma

woodcock
Esteemed Legend

Given your clarification of needing to count events that aren't even there, do it like this:

index=identity sourcetype="identity_tcpsocket" (LogSource=Proxy OR LogSource=Api)
| eval ServerName=if(isnotnull(ProxyHostName),ProxyHostName,upper(substr(host,0,15)))

| append [|inputlookup ProdServerHostName.csv | fields ServerHostName | rename ServerHostName as ServerName]

| stats count AS total_requests count(eval('Message.Response.Status'="500")) AS server_errors BY ServerName

| eval count = count - 1 | rename COMMENT AS "get rid of the extra 'fake' one from the append of inputlookup"
| lookup ProdServerHostName ServerHostName AS ServerName OUTPUT ServerHostName AS KEEPME
| search KEEPME="*" | fields - KEEPME

| eval error_percentage=((server_errors/total_requests) * 100),
       Status=case((error_percentage >=15 OR total_requests <=20), "DOWN",
                  ((error_percentage >=10 AND error_percentage < 15) OR total_requests <20), "WARNING - CHECK SERVICE",
                    error_percentage <10, "OK")
| table ServerName Status
| where Status!="OK"
0 Karma

MattLingwood
Engager

Hi Woodcock, Thanks for this answer it's exactly what we needed! Had to make a small adjustment and remove the "| fields = KEEPME" as it was removing all previous fields and so the next eval was unable to do any calculations.

0 Karma

woodcock
Esteemed Legend

The = should have been a -. I will fix it.

0 Karma

woodcock
Esteemed Legend

If the problem is that some of your servers are not listing and you have more than 10.5K servers, then your problem is subsearch limits. Try this:

Like this:

index=identity sourcetype="identity_tcpsocket" (LogSource=Proxy OR LogSource=Api)
| eval ServerName=if(isnotnull(ProxyHostName),ProxyHostName,upper(substr(host,0,15)))
| stats count AS total_requests count(eval('Message.Response.Status'="500")) AS server_errors BY ServerName

| lookup ProdServerHostName ServerHostName AS ServerName OUTPUT ServerHostName AS KEEPME
| search KEEPME="*"

| eval error_percentage=((server_errors/total_requests) * 100),
       Status=case((error_percentage >=15 OR total_requests <=20), "DOWN",
                  ((error_percentage >=10 AND error_percentage < 15) OR total_requests <20), "WARNING - CHECK SERVICE",
                    error_percentage <10, "OK")
| table ServerName Status
| where Status!="OK"
0 Karma

woodcock
Esteemed Legend

NOTE: Do the lookup after the stats so that you will only have to traverse the list of raw events once.

0 Karma

MattLingwood
Engager

Hi Woodcock, Unfortunately I still have servers that aren't showing.

In total I have 24 servers and 3 aren't sending data. In my output table I can only see 21 servers if I remove line 13.

Kind Regards
Matt

0 Karma

woodcock
Esteemed Legend

If they are not sending data AT ALL (0 events) then of course they will not show up! Is that the situation (if so, there is a method to show all but it is very different)?

0 Karma

MattLingwood
Engager

Yes. Say for example our service is switched off, then it isn't sending data to Splunk but we would like to know that that is happening and display the server as DOWN.

0 Karma

woodcock
Esteemed Legend

Looks good to me.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...