Splunk Search

Dashboard to show count of unique alerts based on host

harry_123
Loves-to-Learn Lots

Hello, 

I have alerts that look like the ones below:

May 13 17:15:30 11.2.3.22 0000017768: NOXXXXXX10A: May 13 2021 17:15:30.467 -0400: %XYZ_11_6_INFRASTRUCTURE-4-SNMP_CONNECTION_FAILURE: Connection to the SNMP Subagent failed. Retrying next port in specified minutes. [id:9909]

host = 11.2.3.22 | source = XYZ | sourcetype = ABCD_syslog

May 7 21:29:20 11.2.3.22 0000043782: NOXXXXXX10A: May 07 2021 21:29:20.259 -0400: %XYZ_11_6____________IVR-3-API_INFO: VXML connection RESET RemoteAddress=11.2.3.24,RemotePort=40517,LocalAddress=11.2.3.22,LocalPort=8002 [id:3205]

host = 11.2.3.22 | source = XYZ | sourcetype = ABCD_syslog

 

Basically, I am trying to report a count of unique alerts, such as "Connection to the SNMP Subagent failed" and "VXML connection RESET", for host 11.2.3.22. When I select host 11.2.3.22 in the dashboard, it should give me the count of each unique alert for the past 24 hours. I also want to create another dashboard with a dropdown of all these unique alerts (as substrings such as "VXML connection RESET" or "Connection to the SNMP Subagent failed") for source XYZ in the past 24 hours.


ITWhisperer
SplunkTrust

Have you extracted any fields? Assuming the _raw data starts with the host IP address, all events have a similar structure (e.g. the alert message starts after the 4th ": "), and it is enough to use the beginning of the alert message up to the first punctuation, you could try this:

--- your search ---
| rex "(?<host>[^\s]+)\s(.+\:\s){4}(?<msg>[\w\s]+)"
| stats count by host msg

harry_123
Loves-to-Learn Lots

@ITWhisperer That's a great idea, and it's what I was looking for. However, all my raw data starts with a date and time stamp followed by the host IP address (as shown below). What should the expression look like to extract the IP address? Also, the alert message starts after the 4th ":", and I would like to use the alert message up to the " [id:" field.

May 7 21:38:06 20.0.0.00

 

VXML connection RESET RemoteAddress=20.0.0.0,RemotePort=58737,LocalAddress=30.0.0.0,LocalPort=8002 [id:3205]

ITWhisperer
SplunkTrust
--- your search ---
| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s(.+\:\s){4}(?<msg>.+?)\s\[id:"
| stats count by host msg
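
One way to check this expression without touching the index is to run it against a single synthetic event. The sketch below pastes the second sample alert from the original question into _raw with makeresults and applies the same rex. Nothing here is specific to your environment.

```spl
| makeresults
``` build one fake event from the sample alert posted in the question ```
| eval _raw = "May 7 21:29:20 11.2.3.22 0000043782: NOXXXXXX10A: May 07 2021 21:29:20.259 -0400: %XYZ_11_6____________IVR-3-API_INFO: VXML connection RESET RemoteAddress=11.2.3.24,RemotePort=40517,LocalAddress=11.2.3.22,LocalPort=8002 [id:3205]"
``` same extraction as above: skip the leading timestamp, take the host, skip four ": " separators, capture up to " [id:" ```
| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s(.+\:\s){4}(?<msg>.+?)\s\[id:"
| table host msg
```

If the extraction behaves as intended, host should come back as 11.2.3.22 and msg as the text from "VXML connection RESET" up to "LocalPort=8002". Note that the triple-backtick inline comments require a reasonably recent Splunk version; remove them if your version rejects them.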

harry_123
Loves-to-Learn Lots

@ITWhisperer I tried this, but something is missing: it completely misses messages like the ones below and only catches certain others, maybe due to inconsistent spacing.

 

May 7 21:22:09 20.0.0.00 0000000549: NOXXXXXXXXX: May 07 2021 21:22:09.133 -0400: %XYZ_11_6____________ICM-3-LOGMSG_ICM_SS_GENERAL_INFO:   new VRU PIM connection SYN RemoteAddress=30.000.00.00,RemotePort=53249,LocalAddress=20.0.0.00,LocalPort=5000 [id:2007]

host = 20.0.0.00 | source = XYZ | sourcetype = XYZ_syslog

May 7 21:22:00 20.0.0.00 0000000545: NOQCJACC50A: May 07 2021 21:22:00.367 -0400: %XYZ_11_6____________ICM-6-LOGMSG_ICM_SS_GENERAL_INFO:  :  Registering Handshake Timer 30000 millisecs before terminating. [id:2007]

host = 20.0.0.00 | source = XYZ | sourcetype = XYZ_syslog


ITWhisperer
SplunkTrust

What were you expecting to be extracted from these events, and what, if anything, were you getting?
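
One possible cause worth ruling out, and this is only an assumption since the raw events are not visible here: syslog often pads single-digit days with two spaces ("May  7"), which a web page would collapse to one. The single \s in the timestamp part of the earlier rex would then fail on those events. The sketch below is a whitespace-tolerant variant, tested against one of the posted events with a padded day added by hand. It also skips leading spaces before the message.

```spl
| makeresults
``` sample event from the post above, with the day padded to two spaces (an assumption about the real raw data) ```
| eval _raw = "May  7 21:22:00 20.0.0.00 0000000545: NOQCJACC50A: May 07 2021 21:22:00.367 -0400: %XYZ_11_6____________ICM-6-LOGMSG_ICM_SS_GENERAL_INFO:  :  Registering Handshake Timer 30000 millisecs before terminating. [id:2007]"
``` \s+ instead of \s in the timestamp, and \s* to drop spaces in front of the message ```
| rex "\w+\s+\d+\s+\d+:\d+:\d+\s+(?<host>[^\s]+)\s(.+\:\s){4}\s*(?<msg>.+?)\s\[id:"
| table host msg
```

With this input, msg should start at "Registering Handshake Timer". If your real events still fail, the cause is something other than spacing in the timestamp.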


harry_123
Loves-to-Learn Lots

@ITWhisperer For example, from the error below, I would like to extract the hostname, which comes after the second ":"; then comp, which comes after the fourth ":" (the first value in square brackets); then the iid value; and then the error message, which comes after the fifth ":" and runs until the punctuation. I am trying to build a dashboard that lets me select these values, starting with the instance value (i.e. iid), then hostname, then component, and that displays all the corresponding error messages based on the first three selections.

May 20 10:25:49 200.0.0.43 77042: noxxxxaa01a: May 20 2021 14:25:46.549 +0000: %ICM_Logger_NodeManager-4-102C10A: %[comp=Logger-B][pname=nm][iid=abcde][mid=102C10A][sev=warning]: Node: ICM\abcde\LoggerB, restarting process: clgr, after having delayed restart for 10 seconds.

host = 200.0.0.0 source = xyz sourcetype = cisco_syslog

I need to extract the instance, which is the value of iid (i.e. abcde); then the hostname, which is noxxxxaa01a; then the value of comp (i.e. Logger-B). If all three match, all the corresponding error events should show up for each process (the value of pname). Here that is "Node: ICM\abcde\LoggerB, restarting process: clgr, after having delayed restart for 10 seconds".

May 20 10:25:42 200.0.0.43 77039: noxxxxaa02a: May 20 2021 14:25:40.899 +0000: %ICM_PG_DeviceManagement-3-10F801F: %[comp=PG2-B][pname=pgag][iid=abcde][mid=10F801F][sev=error]: Connection to central controller side: A failed (high priority).

host = 200.0.0.43 source = xyz sourcetype = cisco_syslog

As another example, for the event above the instance would be abcde; hostname should then give me a dropdown with the values found in errors for that instance; comp should then give me a dropdown of the values found for the selected instance and hostname; and the dashboard should then show the corresponding pname value and alert.

 


ITWhisperer
SplunkTrust

How does this work for you?

| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s.*?:\s(?<hostname>[^:]+):\s.*?:\s+(.*?)\:\s+((?<msg>.+)\s\[id|.+comp=(?<component>[^\]]+).*?iid=(?<instanceid>[^\]]+).*?:\s(?<mess>.+))"
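
For the cascading dropdowns described in the question, each dropdown could be populated by a search that reuses this extraction and filters on the earlier selections. The sketch below is one possible populating search for the component dropdown. The token names $iid_tok$ and $hostname_tok$ are hypothetical and must match whatever tokens your instance and hostname inputs actually set.

```spl
--- your search ---
| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s.*?:\s(?<hostname>[^:]+):\s.*?:\s+(.*?)\:\s+((?<msg>.+)\s\[id|.+comp=(?<component>[^\]]+).*?iid=(?<instanceid>[^\]]+).*?:\s(?<mess>.+))"
``` keep only events matching the instance and hostname already chosen in the dashboard (hypothetical tokens) ```
| search instanceid="$iid_tok$" hostname="$hostname_tok$"
``` one row per distinct component, to be used as the dropdown choices ```
| stats count by component
| fields component
```

The instance dropdown would use the same pattern with no filter and "stats count by instanceid", and the hostname dropdown would filter on the instance token only.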

harry_123
Loves-to-Learn Lots

@ITWhisperer Thank you so much! The only other thing I need is to extract the date and time in the query as well, so that I can display them in two separate columns along with these events.


ITWhisperer
SplunkTrust
| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s.*?:\s(?<hostname>[^:]+):\s(?<datetime>.*?):\s+(.*?)\:\s+((?<msg>.+)\s\[id|.+comp=(?<component>[^\]]+).*?iid=(?<instanceid>[^\]]+).*?:\s(?<mess>.+))"
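
This puts the embedded timestamp into a single datetime field. To get two separate columns, one option is a second rex on that field. The sketch below tries it on one of the sample events posted earlier. The field names Date and Time are just illustrative choices.

```spl
| makeresults
``` one synthetic event built from a sample posted in the thread ```
| eval _raw = "May 20 10:25:42 200.0.0.43 77039: noxxxxaa02a: May 20 2021 14:25:40.899 +0000: %ICM_PG_DeviceManagement-3-10F801F: %[comp=PG2-B][pname=pgag][iid=abcde][mid=10F801F][sev=error]: Connection to central controller side: A failed (high priority)."
| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s.*?:\s(?<hostname>[^:]+):\s(?<datetime>.*?):\s+(.*?)\:\s+((?<msg>.+)\s\[id|.+comp=(?<component>[^\]]+).*?iid=(?<instanceid>[^\]]+).*?:\s(?<mess>.+))"
``` split "May 20 2021 14:25:40.899 +0000" into a date part and a time part (fraction and offset are dropped) ```
| rex field=datetime "^(?<Date>\w+\s+\d+\s+\d+)\s+(?<Time>\d+:\d+:\d+)"
| table Date Time hostname component instanceid mess
```

For this event, Date should be "May 20 2021" and Time "14:25:40". These are plain strings, so sorting on them is alphabetical rather than chronological; sort on _time if you need time order.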

harry_123
Loves-to-Learn Lots

@ITWhisperer I was able to expand your query to get what I want. However, the alerts I am showing are in no particular order. How do I show the count of each Error with the most recently seen errors first? I tried adding sort Date,Time at the end of the query below, but it didn't work. I don't necessarily want the Date and Time columns to appear; I just want the count of errors for each of the components and ProcessNames listed below, in descending order of date and time (i.e. check the most recent occurrence of each event, and whichever is latest should display on top). My dashboard will have time ranges such as last 24 hours or last 15 minutes, so it should show the counts for the most recent error occurrences first, then go back in time. Also, there is a particular error message that keeps repeating every 5 minutes (listed below); I only want the latest occurrence of this event and nothing prior. How do I incorporate that as well?

 

rex "^(?P<Date>\w+\s+\d+)\s+(?P<Time>\d+:\d+:\d+)\s(?<host>[^\s]+)\s.*?:\s(?<hostname>[^:]+):\s.*?:\s+(.*?)\:\s+((?<msg>.+)\s\[id|.+comp=(?<component>[^\]]+).+pname=(?<ProcessName>[^\]]+).*?iid=(?<instanceid>[^\]]+).*?:\s(?<Error>.+))" | stats count by ProcessName,component,hostname,Error

 

The network communications between ICM router and Peripheral Gateway or NIC: PGx2 has been down for: 10 minutes.
The network communications between ICM router and Peripheral Gateway or NIC: PGx2 has been down for: 15 minutes.
The network communications between ICM router and Peripheral Gateway or NIC: PGx2 has been down for: 20 minutes.


yuanliu
SplunkTrust

As there is little illustration of the search output and the desired output, I constructed this simulator to generate the simplest mix of fields:

 

 

| makeresults count=80
| streamstats count
| eval _time = _time - count * 3600, hostname = "hostname" . (random() % 2 + 1), Error = "error" . (random() % 2 + 1), ProcessName = "process" . (random() % 2 + 1), component = "component" . (random() % 2 + 1)
| fields - count

 

 

It renders sample data like the following:

Error   ProcessName  _time                component   hostname
error1  process2     2021-05-25 17:14:04  component2  hostname1
error1  process2     2021-05-25 16:14:04  component2  hostname1
error1  process2     2021-05-25 15:14:04  component1  hostname1
error2  process2     2021-05-25 14:14:04  component1  hostname1
error1  process2     2021-05-25 13:14:04  component1  hostname2
error1  process1     2021-05-25 12:14:04  component1  hostname2
error1  process1     2021-05-25 11:14:04  component1  hostname1
error2  process2     2021-05-25 10:14:04  component1  hostname1
error2  process2     2021-05-25 09:14:04  component2  hostname1
error1  process1     2021-05-25 08:14:04  component1  hostname2
error1  process1     2021-05-25 07:14:04  component1  hostname1
...
 

You can use eventstats to find the latest occurrence of each error per host, component, and process combination: "| eventstats max(_time) as last_occurance by ProcessName, component, hostname, Error". After that, you can count such error combinations by day, then sort by last_occurance:

 

 

| eventstats max(_time) as last_occurance by ProcessName,component,hostname,Error
| sort - last_occurance
| bin span=1d _time
| stats count by ProcessName, component, hostname, Error, last_occurance, _time
| sort - _time, last_occurance

 

 

Sample output looks like the following:

ProcessName  component   hostname   Error   last_occurance  _time       count
process1     component1  hostname2  error1  1621963201      2021-05-25  3
process1     component2  hostname1  error1  1621959601      2021-05-25  1
process1     component2  hostname2  error1  1621956001      2021-05-25  1
process2     component2  hostname1  error1  1621948801      2021-05-25  3
process1     component1  hostname1  error2  1621945201      2021-05-25  2
process2     component1  hostname1  error2  1621938001      2021-05-25  2
process2     component2  hostname1  error2  1621930801      2021-05-25  2
process1     component1  hostname2  error2  1621916401      2021-05-25  1
process2     component2  hostname2  error1  1621912801      2021-05-25  1
process1     component2  hostname2  error2  1621909201      2021-05-25  1
process2     component1  hostname1  error1  1621902001      2021-05-25  1
process1     component2  hostname2  error1  1621956001      2021-05-24  3
process2     component2  hostname1  error1  1621948801      2021-05-24  2
process1     component1  hostname1  error2  1621945201      2021-05-24  2
...

Hope this helps


harry_123
Loves-to-Learn Lots

Here is what my sample output looks like. I have the date and time split into separate fields. I want to sort by Date and Time and get a count in descending order, starting with the most recent error. So if I select 7 days, I need the number of times each error occurred in the past 7 days, arranged with the most recent error first. Similarly, if I select the past 24 hours, I need a count of errors for the past 24 hours, starting with the most recent error first.

Sample Output:

harry_123_0-1622845265406.png

Desired output

harry_123_2-1622845901407.png

 

 


ITWhisperer
SplunkTrust
| stats max(_time) as last_occurance count by ProcessName,component,hostname,Error
| sort - last_occurance
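
The earlier question about the "has been down for: N minutes" message that repeats every 5 minutes is not covered by this stats on its own, because each repeat has a different Error string and so gets its own row. One possible approach, sketched below on simulated data, is to group on a normalized copy of the message and keep only the latest text. The field name ErrorKey and the values "rtr", "Router-A" and "host1" are made up for the simulation.

```spl
| makeresults count=3
| streamstats count as n
``` three repeats, 5 minutes apart, reporting 10, 15 and 20 minutes of downtime ```
| eval _time = _time - (3 - n) * 300
| eval Error = "The network communications between ICM router and Peripheral Gateway or NIC: PGx2 has been down for: " . (5 + n * 5) . " minutes."
| eval ProcessName = "rtr", component = "Router-A", hostname = "host1"
``` replace the changing number so that all repeats share one grouping key ```
| eval ErrorKey = replace(Error, "has been down for: \d+ minutes", "has been down for: N minutes")
``` one row per key: latest(Error) keeps the most recent message text ```
| stats max(_time) as last_occurance latest(Error) as Error count by ProcessName,component,hostname,ErrorKey
| sort - last_occurance
| fields - ErrorKey
```

On the simulated data this should collapse the three repeats into a single row showing the "20 minutes" message with a count of 3. In a real search, only the eval ErrorKey line and the changed stats line would be added after your rex.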