Hello,
I have alerts that look like the ones below:
May 13 17:15:30 11.2.3.22 0000017768: NOXXXXXX10A: May 13 2021 17:15:30.467 -0400: %XYZ_11_6_INFRASTRUCTURE-4-SNMP_CONNECTION_FAILURE: Connection to the SNMP Subagent failed. Retrying next port in specified minutes. [id:9909]
host = 11.2.3.22 | source = XYZ | sourcetype = ABCD_syslog
May 7 21:29:20 11.2.3.22 0000043782: NOXXXXXX10A: May 07 2021 21:29:20.259 -0400: %XYZ_11_6____________IVR-3-API_INFO: VXML connection RESET RemoteAddress=11.2.3.24,RemotePort=40517,LocalAddress=11.2.3.22,LocalPort=8002 [id:3205]
host = 11.2.3.22 | source = XYZ | sourcetype = ABCD_syslog
Basically, I am trying to report a count of unique alerts, such as "Connection to the SNMP Subagent failed" and "VXML connection RESET", for host 11.2.3.22. When I select host 11.2.3.22 in the dashboard, it should give me the count of unique alerts for the past 24 hours. I also want to create another dashboard with a dropdown of all these unique alerts (as substrings such as "VXML connection RESET" and "Connection to the SNMP Subagent failed.") for source XYZ in the past 24 hours.
Have you extracted any fields? Assuming the _raw data starts with the host IP address, all events have a similar structure (e.g. the alert message starts after the 4th ": "), and it is enough to use the beginning of the alert message up to the first punctuation, you could try this:
--- your search ---
| rex "(?<host>[^\s]+)\s(.+\:\s){4}(?<msg>[\w\s]+)"
| stats count by host msg
@ITWhisperer That's a great idea, that's what I was looking for. All my raw data starts with a date and time stamp followed by the host IP address (as shown below). What should the expression look like to extract the IP address? Also, the alert message starts after the 4th ":" and I would like to use the alert message up to the " [id:" field.
May 7 21:38:06 20.0.0.00 | VXML connection RESET RemoteAddress=20.0.0.0,RemotePort=58737,LocalAddress=30.0.0.0,LocalPort=8002 [id:3205]
--- your search ---
| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s(.+\:\s){4}(?<msg>.+?)\s\[id:"
| stats count by host msg
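For the dropdown of unique alerts, one possible populating search is sketched below. It is only a sketch under assumptions: the two events are generated with makeresults from the samples posted earlier in this thread, the rex is the one just above, and the field name alert plus the replace() step that trims everything from the first key=value pair onward are illustrative additions, not something confirmed against the real data.

```spl
| makeresults count=2
| streamstats count
| eval _raw = case(count=1, "May 13 17:15:30 11.2.3.22 0000017768: NOXXXXXX10A: May 13 2021 17:15:30.467 -0400: %XYZ_11_6_INFRASTRUCTURE-4-SNMP_CONNECTION_FAILURE: Connection to the SNMP Subagent failed. Retrying next port in specified minutes. [id:9909]", count=2, "May 7 21:29:20 11.2.3.22 0000043782: NOXXXXXX10A: May 07 2021 21:29:20.259 -0400: %XYZ_11_6____________IVR-3-API_INFO: VXML connection RESET RemoteAddress=11.2.3.24,RemotePort=40517,LocalAddress=11.2.3.22,LocalPort=8002 [id:3205]")
| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s(.+\:\s){4}(?<msg>.+?)\s\[id:"
| eval alert = replace(msg, "\s+\w+=.*$", "")
| stats count by alert
| sort - count
```

With real data, the first three lines would be replaced by the base search (for example, source=XYZ over the last 24 hours), and the alert field would be used as both the label and the value of the dropdown.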
@ITWhisperer I tried this and I am missing something here. It completely misses messages like the ones below and only catches certain ones, maybe due to inconsistent spacing.
May 7 21:22:09 20.0.0.00 0000000549: NOXXXXXXXXX: May 07 2021 21:22:09.133 -0400: %XYZ_11_6____________ICM-3-LOGMSG_ICM_SS_GENERAL_INFO: new VRU PIM connection SYN RemoteAddress=30.000.00.00,RemotePort=53249,LocalAddress=20.0.0.00,LocalPort=5000 [id:2007]
host = 20.0.0.00 | source = XYZ | sourcetype = XYZ_syslog
May 7 21:22:00 20.0.0.00 0000000545: NOQCJACC50A: May 07 2021 21:22:00.367 -0400: %XYZ_11_6____________ICM-6-LOGMSG_ICM_SS_GENERAL_INFO: : Registering Handshake Timer 30000 millisecs before terminating. [id:2007]
host = 20.0.0.00 | source = XYZ | sourcetype = XYZ_syslog
What were you expecting to have been extracted from these events and what if anything were you getting?
@ITWhisperer For example, from the error below, I would like to extract the hostname, which comes after the second ":", then comp, which comes after the fourth ":" (the first value in square brackets), then the iid value, and then the error message, which comes after the fifth ":" and runs until punctuation. I am trying to build a dashboard that lets me select these values, starting with the instance value (i.e. iid), then hostname, then component, and displays all the corresponding error messages based on the first three selections.
May 20 10:25:49 200.0.0.43 77042: noxxxxaa01a: May 20 2021 14:25:46.549 +0000: %ICM_Logger_NodeManager-4-102C10A: %[comp=Logger-B][pname=nm][iid=abcde][mid=102C10A][sev=warning]: Node: ICM\abcde\LoggerB, restarting process: clgr, after having delayed restart for 10 seconds.
host = 200.0.0.0 source = xyz sourcetype = cisco_syslog
I need to extract the instance, which is the value of iid (i.e. abcde), then the hostname, which is noxxxxaa01a, then the value of comp (i.e. Logger-B). If all three match, all the corresponding error events should show up for each process (the value of pname), e.g. "Node: ICM\abcde\LoggerB, restarting process: clgr, after having delayed restart for 10 seconds".
May 20 10:25:42 200.0.0.43 77039: noxxxxaa02a: May 20 2021 14:25:40.899 +0000: %ICM_PG_DeviceManagement-3-10F801F: %[comp=PG2-B][pname=pgag][iid=abcde][mid=10F801F][sev=error]: Connection to central controller side: A failed (high priority).
host = 200.0.0.43 source = xyz sourcetype = cisco_syslog
As a further example, for the events above the instance would be abcde. The hostname dropdown should then list the hostnames found in errors for that instance, the comp dropdown should list the values found for the selected instance and hostname, and the result should show the corresponding pname value and alert.
How does this work for you?
| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s.*?:\s(?<hostname>[^:]+):\s.*?:\s+(.*?)\:\s+((?<msg>.+)\s\[id|.+comp=(?<component>[^\]]+).*?iid=(?<instanceid>[^\]]+).*?:\s(?<mess>.+))"
@ITWhisperer Thank you so much! The only other thing I need is to extract the date and time in the query below, so that I can display them in two separate columns alongside these events.
| rex "\w+\s\d+\s\d+:\d+:\d+\s(?<host>[^\s]+)\s.*?:\s(?<hostname>[^:]+):\s(?<datetime>.*?):\s+(.*?)\:\s+((?<msg>.+)\s\[id|.+comp=(?<component>[^\]]+).*?iid=(?<instanceid>[^\]]+).*?:\s(?<mess>.+))"
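To get the two separate columns, the extracted datetime string can be parsed and re-rendered with eval. This is a minimal sketch, assuming datetime always has the form shown in the sample events (e.g. "May 20 2021 14:25:46.549 +0000"); the field names event_epoch, Date, and Time are illustrative.

```spl
| makeresults
| eval datetime = "May 20 2021 14:25:46.549 +0000"
| eval event_epoch = strptime(datetime, "%b %d %Y %H:%M:%S.%3N %z")
| eval Date = strftime(event_epoch, "%Y-%m-%d"), Time = strftime(event_epoch, "%H:%M:%S")
| table datetime, Date, Time
```

With real events, only the two eval lines that follow the rex are needed. Note that strftime renders in the timezone of the user running the search, so the displayed Time may differ from the raw string. If _time is already parsed correctly from the event, applying strftime to _time directly may be simpler.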
@ITWhisperer I was able to expand your query to get what I want. However, the alerts I am showing are in no particular order. How do I show the count of each Error ordered by its most recent event? I tried adding sort Date,Time at the end of the query below, but it didn't work. I don't necessarily want the Date and Time columns to appear. I just want the count of errors for each of the components and ProcessNames listed below, in descending order of date and time (i.e. check the most recent occurrence of each event, and the latest should display on top). My dashboard will have a time range such as last 24 hours or last 15 minutes, so it should show the count for the most recent error occurrences first, then go back in time. Also, there is a particular error message that keeps repeating every 5 minutes (listed below). I only want the latest occurrence of this event and nothing prior. How do I incorporate that as well?
| rex "^(?P<Date>\w+\s+\d+)\s+(?P<Time>\d+:\d+:\d+)\s(?<host>[^\s]+)\s.*?:\s(?<hostname>[^:]+):\s.*?:\s+(.*?)\:\s+((?<msg>.+)\s\[id|.+comp=(?<component>[^\]]+).+pname=(?<ProcessName>[^\]]+).*?iid=(?<instanceid>[^\]]+).*?:\s(?<Error>.+))"
| stats count by ProcessName,component,hostname,Error
The network communications between ICM router and Peripheral Gateway or NIC: PGx2 has been down for: 10 minutes.
The network communications between ICM router and Peripheral Gateway or NIC: PGx2 has been down for: 15 minutes.
The network communications between ICM router and Peripheral Gateway or NIC: PGx2 has been down for: 20 minutes.
As there is little illustration of the search output and the desired output, I constructed this simulator to make the simplest mix of fields:
| makeresults count=80
| streamstats count
| eval _time = _time - count * 3600, hostname = "hostname" . (random() % 2 + 1), Error = "error" . (random() % 2 + 1), ProcessName = "process" . (random() % 2 + 1), component = "component" . (random() % 2 + 1)
| fields - count
It renders sample data like the following:
Error | ProcessName | _time | component | hostname |
error1 | process2 | 2021-05-25 17:14:04 | component2 | hostname1 |
error1 | process2 | 2021-05-25 16:14:04 | component2 | hostname1 |
error1 | process2 | 2021-05-25 15:14:04 | component1 | hostname1 |
error2 | process2 | 2021-05-25 14:14:04 | component1 | hostname1 |
error1 | process2 | 2021-05-25 13:14:04 | component1 | hostname2 |
error1 | process1 | 2021-05-25 12:14:04 | component1 | hostname2 |
error1 | process1 | 2021-05-25 11:14:04 | component1 | hostname1 |
error2 | process2 | 2021-05-25 10:14:04 | component1 | hostname1 |
error2 | process2 | 2021-05-25 09:14:04 | component2 | hostname1 |
error1 | process1 | 2021-05-25 08:14:04 | component1 | hostname2 |
error1 | process1 | 2021-05-25 07:14:04 | component1 | hostname1 |
... |
You can use eventstats to find the latest occurrence of each error per process, component, and host combination: "| eventstats max(_time) as last_occurance by ProcessName, component, hostname, Error". After that, you can count such combinations by day, then sort by last_occurance:
| eventstats max(_time) as last_occurance by ProcessName,component,hostname,Error
| sort - last_occurance
| bin span=1d _time
| stats count by ProcessName, component, hostname, Error, last_occurance, _time
| sort - _time, last_occurance
Sample output looks like the following:
ProcessName | component | hostname | Error | last_occurance | _time | count |
process1 | component1 | hostname2 | error1 | 1621963201 | 2021-05-25 | 3 |
process1 | component2 | hostname1 | error1 | 1621959601 | 2021-05-25 | 1 |
process1 | component2 | hostname2 | error1 | 1621956001 | 2021-05-25 | 1 |
process2 | component2 | hostname1 | error1 | 1621948801 | 2021-05-25 | 3 |
process1 | component1 | hostname1 | error2 | 1621945201 | 2021-05-25 | 2 |
process2 | component1 | hostname1 | error2 | 1621938001 | 2021-05-25 | 2 |
process2 | component2 | hostname1 | error2 | 1621930801 | 2021-05-25 | 2 |
process1 | component1 | hostname2 | error2 | 1621916401 | 2021-05-25 | 1 |
process2 | component2 | hostname2 | error1 | 1621912801 | 2021-05-25 | 1 |
process1 | component2 | hostname2 | error2 | 1621909201 | 2021-05-25 | 1 |
process2 | component1 | hostname1 | error1 | 1621902001 | 2021-05-25 | 1 |
process1 | component2 | hostname2 | error1 | 1621956001 | 2021-05-24 | 3 |
process2 | component2 | hostname1 | error1 | 1621948801 | 2021-05-24 | 2 |
process1 | component1 | hostname1 | error2 | 1621945201 | 2021-05-24 | 2 |
... |
Hope this helps
Here is what my sample output looks like. I have Date and Time split into separate fields, so I want to sort by Date and Time and get a count in descending order, starting with the most recent error. If I select 7 days, I need the number of times each error occurred in the past 7 days, arranged with the most recent error first. Similarly, if I select the past 24 hours, I need a count of errors for the past 24 hours, starting with the most recent error.
Sample Output:
Desired output
| stats max(_time) as last_occurance count by ProcessName,component,hostname,Error
| sort - last_occurance
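For the message that repeats every 5 minutes with a different minute count, one option is to group on a normalized copy of the message and keep only the latest text. The sketch below simulates three such events with makeresults. The field name ErrorKey, the placeholder "N", and the values process1, component1, and hostname1 are illustrative assumptions, and the replace() pattern assumes the wording stays "down for: <number> minutes".

```spl
| makeresults count=3
| streamstats count
| eval _time = _time - (3 - count) * 300, ProcessName = "process1", component = "component1", hostname = "hostname1"
| eval Error = "The network communications between ICM router and Peripheral Gateway or NIC: PGx2 has been down for: " . tostring(5 + count * 5) . " minutes."
| eval ErrorKey = replace(Error, "down for: \d+ minutes", "down for: N minutes")
| stats max(_time) as last_occurance, latest(Error) as Error, count by ProcessName, component, hostname, ErrorKey
| sort - last_occurance
| fields - ErrorKey
```

The three simulated events collapse into a single row that carries the most recent message text, its last_occurance, and a count of how many times it repeated. With real data, the first four lines would be replaced by the base search and the rex; errors that do not match the pattern keep their own text as ErrorKey and are counted as before.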