Hi
I need to find the 5 "Errors" peak points per server, sorted by date.
Here is my SPL:
index="myindex" err* | rex field=source "\/data\/(?<product>\w+)\/(?<date>\d+)\/(?<servername>\w+)"
| eventstats count as Errors by servername
expected output:
servername | Time peak points        | Errors count
server1    | 2021-11-19 02:00:00,000 | 500
           | 2021-11-19 10:00:00,000 | 450
           | 2021-11-19 18:00:00,000 | 300
           | 2021-11-19 20:00:00,000 | 800
           | 2021-11-19 23:00:00,000 | 9000
server2    | 2021-11-19 01:00:00,000 | 250
           | 2021-11-19 03:00:00,000 | 480
           | 2021-11-19 08:00:00,000 | 30000
           | 2021-11-19 09:00:00,000 | 463
           | 2021-11-19 10:00:00,000 | 100
I still don't understand what you want.
You parse the "product" field from the event yet don't use it.
What is this error peak? Is it a count of error occurrences per host, or is it some value from the event? How does it correspond to the timestamp?
Here is the main goal:
I have a couple of servers. When I run the SPL below, it returns a green bar chart showing how many errors occurred in each hour of the last 24 hours:
index="myindex" err* earliest=-1d@d
In some hours of the day we have a high number of errors, e.g. 01:00, 18:00, 20:00.
I just need to show them in a table by server: the top 5 error peaks, that's it.
OK. So for each server you want top 5 hours with highest error count? Is that it?
Exactly
<your search for errors> | bin _time span=1h
| stats count by server _time
| sort server - count
| streamstats count as tempcount by server
| where tempcount <= 5
| table server _time count
If you want to group them by server, instead of the table at the end you can do
| stats list(_time) as _time list(count) as count by server
It isn't clear over what time period you are measuring peaks, but since your example has different hours and the times are close to the end of the hour, I am going to assume you are counting error events by hour. Here is a run-anywhere example using generated data:
| gentimes start=-1 increment=1m
| rename starttime as _time
| eval server="server".mvindex(split("ABCD",""),random()%4)
| eval count=random()%10
| table _time server count
| bin _time as time span=1h
| eventstats sum(count) as total latest(_time) as peaktime by time server
| where _time=peaktime
| sort 0 server -total
| streamstats count as rank by server
| where rank < 6
| fieldformat peaktime=strftime(peaktime,"%F %T")
| fieldformat time=strftime(time,"%F %T")
The time scope is daily and belongs to yesterday, and you're right: we're counting error events by hour.
I tried the SPL you mentioned, but I need to produce this:
servername | Time peak points        | Errors count
server1    | 2021-11-19 02:00:00,000 | 500
           | 2021-11-19 10:00:00,000 | 450
           | 2021-11-19 18:00:00,000 | 300
           | 2021-11-19 20:00:00,000 | 800
           | 2021-11-19 23:00:00,000 | 9000
server2    | 2021-11-19 01:00:00,000 | 250
           | 2021-11-19 03:00:00,000 | 480
           | 2021-11-19 08:00:00,000 | 30000
           | 2021-11-19 09:00:00,000 | 463
           | 2021-11-19 10:00:00,000 | 100
So what does your search look like when you apply the ideas from my solution, what results do you get, and how do they differ from what you are after?
index="myindex" err* source="/data/product/*/server*/*"
| rex field=source "\/data\/(?<product>\w+)\/(?<date>\d+)\/(?<server>\w+)"
| eventstats count as counter
| table _time server counter
| bin _time as time span=1h
| eventstats sum(counter) as total latest(_time) as peaktime by time server
| where _time=peaktime
| sort 0 server -total
| streamstats count as rank by server
| where rank < 6
| fieldformat peaktime=strftime(peaktime,"%F %T")
| fieldformat time=strftime(time,"%F %T")
_time | server | counter | peaktime | rank | time | total
2021-11-19 23:29:34.658 | Server | 9827 | 2021-11-19 23:29:34 | 1 | 2021-11-19 22:30:00 | 2191421
2021-11-19 20:28:27.490 | Server | 9827 | 2021-11-19 20:28:27 | 2 | 2021-11-19 19:30:00 | 2053843
2021-11-19 20:28:27.490 | Server | 9827 | 2021-11-19 20:28:27 | 3 | 2021-11-19 19:30:00 | 2053843
2021-11-19 04:29:52.897 | Server | 9827 | 2021-11-19 04:29:52 | 4 | 2021-11-19 03:30:00 | 2014535
2021-11-19 21:29:38.376 | Server | 9827 | 2021-11-19 21:29:38 | 5 | 2021-11-19 20:30:00 | 1975227
2021-11-19 18:29:58.330 | Server2 | 9827 | 2021-11-19 18:29:58 | 1 | 2021-11-19 17:30:00 | 2368307
2021-11-19 18:29:58.330 | Server2 | 9827 | 2021-11-19 18:29:58 | 2 | 2021-11-19 17:30:00 | 2368307
2021-11-19 11:29:47.954 | Server2 | 9827 | 2021-11-19 11:29:47 | 3 | 2021-11-19 10:30:00 | 2289691
2021-11-19 20:29:58.899 | Server2 | 9827 | 2021-11-19 20:29:58 | 4 | 2021-11-19 19:30:00 | 2171767
2021-11-19 23:29:54.958 | Server2 | 9827 | 2021-11-19 23:29:54 | 5 | 2021-11-19 22:30:00 | 2083324
2021-11-19 23:55:41.719 | Server3 | 9827 | 2021-11-19 23:55:41 | 1 | 2021-11-19 23:30:00 | 452042
2021-11-19 02:29:20.484 | Server3 | 9827 | 2021-11-19 02:29:20 | 2 | 2021-11-19 01:30:00 | 383253
2021-11-19 18:29:39.514 | Server3 | 9827 | 2021-11-19 18:29:39 | 3 | 2021-11-19 17:30:00 | 324291
2021-11-19 03:19:41.949 | Server3 | 9827 | 2021-11-19 03:19:41 | 4 | 2021-11-19 02:30:00 | 265329
2021-11-19 19:21:27.495 | Server3 | 9827 | 2021-11-19 19:21:27 | 5 | 2021-11-19 18:30:00 | 265329
2021-11-19 02:28:42.524 | Server4 | 9827 | 2021-11-19 02:28:42 | 1 | 2021-11-19 01:30:00 | 452042
What is it that you think eventstats count as counter is doing in this situation?
It counts the number of errors, and it seems it's incorrect.
Yes, it is incorrect. With no by clause, eventstats count counts every event in the search, which is why counter is 9827 on every row. Try this:
index="myindex" err* source="/data/product/*/server*/*"
| rex field=source "\/data\/(?<product>\w+)\/(?<date>\d+)\/(?<server>\w+)"
| bin _time as time span=1h
| stats count as total latest(_time) as _time by time server
| sort 0 server -total
| streamstats count as rank by server
| where rank < 6
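A side note on the last two commands: since the results are already sorted by server and descending total, keeping the top five rows per server can also be done with a single dedup, which retains the first N results for each value of the listed field. An equivalent ending (same assumed search as above) would be:

```
index="myindex" err* source="/data/product/*/server*/*"
| rex field=source "\/data\/(?<product>\w+)\/(?<date>\d+)\/(?<server>\w+)"
| bin _time as time span=1h
| stats count as total latest(_time) as _time by time server
| sort 0 server -total
| dedup 5 server
```

Both forms should return the same rows; streamstats just makes the per-server rank available as a field if you want to display it.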
Fixed, but I think span has not worked correctly:
1637352000 | Server1 | 131 | 2021-11-20 00:29:49.360 | 1 |
1637355600 | Server1 | 323 | 2021-11-20 01:28:59.736 | 2 |
1637359200 | Server1 | 136 | 2021-11-20 02:29:10.834 | 3 |
1637362800 | Server1 | 140 | 2021-11-20 03:29:58.342 | 4 |
Expected:
1637352000 | Server1 | 131 | 2021-11-20 00:00:00.000 | 1 |
1637355600 | Server1 | 323 | 2021-11-20 01:00:00.000 | 2 |
1637359200 | Server1 | 136 | 2021-11-20 02:00:00.000 | 3 |
1637362800 | Server1 | 140 | 2021-11-20 03:00:00.000 | 4 |
Your original expected output had times at different points during the hour, which is what you got. If you wanted the times to be the beginning of the hour, that value was in the time field. Please try to be more precise about what you are expecting the output to be.
Sorry about that; I've fixed the post.
That makes it simpler
index="myindex" err* source="/data/product/*/server*/*"
| rex field=source "\/data\/(?<product>\w+)\/(?<date>\d+)\/(?<server>\w+)"
| bin _time span=1h
| stats count by _time server
| sort 0 server -count
| streamstats count as rank by server
| where rank < 6
| table server _time count
current output:
server _time count
Server1 2021-11-21 11:30:00 28
Server1 2021-11-21 05:30:00 25
Server1 2021-11-21 07:30:00 25
Server1 2021-11-21 10:30:00 25
Server2 2021-11-21 18:30:00 2061
Server2 2021-11-21 13:30:00 668
Server2 2021-11-21 12:30:00 562
Server2 2021-11-21 11:30:00 481
Server3 2021-11-21 17:30:00 110
Server3 2021-11-21 12:30:00 73
Server3 2021-11-21 07:30:00 61
Server3 2021-11-21 18:30:00 60
expected output:
server _time count
Server1 2021-11-21 11:00:00 28
2021-11-21 05:00:00 25
2021-11-21 07:00:00 25
2021-11-21 10:00:00 25
Server2 2021-11-21 18:00:00 2061
2021-11-21 13:00:00 668
2021-11-21 12:00:00 562
2021-11-21 11:00:00 481
Server3 2021-11-21 17:00:00 110
2021-11-21 12:00:00 73
2021-11-21 07:00:00 61
2021-11-21 18:00:00 60
index="myindex" err* source="/data/product/*/server*/*"
| rex field=source "\/data\/(?<product>\w+)\/(?<date>\d+)\/(?<server>\w+)"
| bin _time span=1h
| stats count by _time server
| sort 0 server -count
| streamstats count as rank by server
| where rank < 6
| stats list(_time) as _time list(count) as count by server
| table server _time count
This returns:
server _time count
Server1 1637611200,1637568000,1637571600,1637575200,1637557200
Server2 1637611200,1637571600,1637568000,1637575200,1637560800
Can you format these times to be human readable?
| eval time=strftime(_time,"%F %T")
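Putting the whole thread together, a sketch of the complete search (using the index, source pattern, and field names assumed earlier in the thread) would be:

```
index="myindex" err* source="/data/product/*/server*/*"
| rex field=source "\/data\/(?<product>\w+)\/(?<date>\d+)\/(?<server>\w+)"
| bin _time span=1h
| stats count by _time server
| sort 0 server -count
| streamstats count as rank by server
| where rank < 6
| eval _time=strftime(_time,"%F %T")
| stats list(_time) as _time list(count) as count by server
```

Doing the strftime conversion before the final stats means each value in the resulting multivalue list is already formatted, rather than applying strftime to a multivalue field afterwards.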