This is my query:
earliest=-15m latest=now index=** host="*" LOG_LEVEL=ERROR OR LOG_LEVEL=FATAL OR logLevel=ERROR OR level=error
| rex field=MESSAGE "(?<message>.{35})"
| search NOT [ search earliest=-3d@d latest=-d@d index=wiweb host="*" LOG_LEVEL=ERROR OR LOG_LEVEL=FATAL OR logLevel=ERROR OR level=error | rex field=MESSAGE "(?<message>.{35})" | dedup message | fields message ]
| stats count by message appname
| search count>50
| sort appname, -count
Almost all of the recurring 'message' values are being excluded, but a few of them still show up in the results even though they appeared in the last 2 days (they should have been excluded, which is what the subsearch is for).
Is there anything else I can do to make this query work 100% reliably?
Is there another way to apply the same exclusion logic with 100% success?
Try something like this
(earliest=-15m latest=now index=**) OR (earliest=-3d@d latest=-d@d index=wiweb) host="*" LOG_LEVEL=ERROR OR LOG_LEVEL=FATAL OR logLevel=ERROR OR level=error
| rex field=MESSAGE "(?<message>.{35})"
| bin _time span=1d
| stats count by _time message appname
| stats count as days count(eval(_time==relative_time(now(),"@d"))) as today values(count) as count by message appname
| where days=1 AND today=1 AND count>50
| sort appname, -count
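To see why the two stats commands and the where clause can replace the subsearch, here is a small Python model of the pipeline. It is not SPL and not how Splunk evaluates the search; it assumes single-line MESSAGE values, ignores time zones, and the events, app name and date are made up for illustration. Each (message, appname) pair is counted per day, and a pair survives only if its single day bucket is today and that count is above 50.

```python
from collections import Counter
from datetime import datetime, timedelta


def day_bucket(ts: datetime) -> datetime:
    """Rough stand-in for `bin _time span=1d`: snap to midnight (time zones ignored)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def new_errors(events, now, threshold=50):
    """Return {(message, appname): count} for pairs seen only in today's bucket."""
    today = day_bucket(now)

    # rex field=MESSAGE "(?<message>.{35})" + stats count by _time message appname.
    # Single-line text assumed; MESSAGE shorter than 35 chars gets no `message`
    # field, so `stats ... by message` drops the event.
    per_day = Counter(
        (day_bucket(ts), text[:35], app)
        for ts, text, app in events
        if len(text) >= 35
    )

    # stats count as days, count(eval(_time==today)) as today,
    #       values(count) as count by message appname
    days, seen_today, counts = Counter(), Counter(), {}
    for (day, msg, app), n in per_day.items():
        key = (msg, app)
        days[key] += 1
        seen_today[key] += day == today
        counts.setdefault(key, set()).add(n)

    # where days=1 AND today=1 AND count>50 (with days=1 there is one count value)
    return {
        key: min(counts[key])
        for key in days
        if days[key] == 1 and seen_today[key] == 1 and min(counts[key]) > threshold
    }


# Hypothetical events, invented for illustration only.
now = datetime(2024, 5, 6, 10, 0)
old = "Connection refused while calling the payment service"
new = "NullPointerException in OrderController.submit at line 42"
events = (
    [(now - timedelta(days=2, hours=1), old, "shop")] * 3   # earlier window
    + [(now - timedelta(minutes=5), old, "shop")] * 80      # recurring today
    + [(now - timedelta(minutes=5), new, "shop")] * 60      # first seen today
)
result = new_errors(events, now)
print(result)
# → {('NullPointerException in OrderContro', 'shop'): 60}
```

A side effect the model makes visible: because the rex needs 35 characters to match, any MESSAGE shorter than that never gets a message field and is silently dropped by stats, in this query and in the original one.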
There was one typo in my original query (index=** should have been index=wiweb):
earliest=-15m latest=now index=wiweb host="*" LOG_LEVEL=ERROR OR LOG_LEVEL=FATAL OR logLevel=ERROR OR level=error
| rex field=MESSAGE "(?<message>.{35})"
| search NOT [ search earliest=-3d@d latest=-d@d index=wiweb host="*" LOG_LEVEL=ERROR OR LOG_LEVEL=FATAL OR logLevel=ERROR OR level=error | rex field=MESSAGE "(?<message>.{35})" | dedup message | fields message ]
| stats count by message appname
| search count>50
| sort appname, -count
Your query still holds, right?
I thought there might have been, but you never know! 😀
(earliest=-15m latest=now) OR (earliest=-3d@d latest=-d@d) index=wiweb host="*" LOG_LEVEL=ERROR OR LOG_LEVEL=FATAL OR logLevel=ERROR OR level=error
| rex field=MESSAGE "(?<message>.{35})"
| bin _time span=1d
| stats count by _time message appname
| stats count as days count(eval(_time==relative_time(now(),"@d"))) as today values(count) as count by message appname
| where days=1 AND today=1 AND count>50
| sort appname, -count
The key line is the where command, which keeps only the messages that occur in today's bucket and on no other day in the search window.
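The comparison in the where clause works because bin _time span=1d and relative_time(now(),"@d") both snap times to midnight. The Python sketch below models that snapping to list which day buckets the combined search can actually fill for a hypothetical run time. It ignores time zones and DST, and treating latest as exclusive is an assumption of the sketch.

```python
from datetime import datetime, timedelta


def snap(ts: datetime, days_back: int = 0) -> datetime:
    """Rough model of a '-Nd@d' time modifier: go back N days, then snap to midnight.

    Time zones and DST are ignored in this sketch.
    """
    return (ts - timedelta(days=days_back)).replace(hour=0, minute=0, second=0, microsecond=0)


def day_buckets(now: datetime):
    """Day buckets the combined search can fill, assuming `latest` is exclusive."""
    start, end = snap(now, 3), snap(now, 1)   # earliest=-3d@d latest=-d@d
    history = []
    day = start
    while day < end:
        history.append(day)
        day += timedelta(days=1)
    # relative_time(now(), "@d"): the -15m window falls in this bucket
    # (unless the search runs within 15 minutes after midnight).
    today = snap(now)
    return history, today


def passes_where(buckets_with_message, today) -> bool:
    """where days=1 AND today=1: the message occupies exactly one bucket, today's."""
    return len(buckets_with_message) == 1 and today in buckets_with_message


now = datetime(2024, 5, 6, 10, 0)  # hypothetical run time
history, today = day_buckets(now)
print([d.date().isoformat() for d in history], today.date().isoformat())
# → ['2024-05-03', '2024-05-04'] 2024-05-06
```

One thing this shows: earliest=-3d@d latest=-d@d covers the day before yesterday and the day before that, not yesterday, and today's bucket only holds the last 15 minutes. If "the last 2 days" was meant to include yesterday, a message that only appeared yesterday (or earlier today) is not in the history and can still pass days=1 AND today=1.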
Another issue: it checks message and appname together. What if the same message has already appeared in another app? It still raises an alert, even though the message is not relevant because it has already come from another app and can be ignored.
I am not sure I understand the requirement here. Are you saying that if the message has been logged in the last two days, regardless of appname, you want to ignore it, even if this is the first time it has been logged for this appname?
Exactly!
Try this
(earliest=-15m latest=now) OR (earliest=-3d@d latest=-d@d) index=wiweb host="*" LOG_LEVEL=ERROR OR LOG_LEVEL=FATAL OR logLevel=ERROR OR level=error
| rex field=MESSAGE "(?<message>.{35})"
| bin _time span=1d
| stats count by _time message appname
| stats count as days count(eval(_time==relative_time(now(),"@d"))) as today values(count) as count values(appname) as appname by message
| where days=1 AND today=1 AND count>50
| sort appname, -count
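A caveat worth checking with this version: the first stats still groups by appname, so in the second stats "count as days" counts (day, appname) rows rather than distinct days. A message that is new today but appears in two apps gets days=2 and today=2 and is filtered out, which is one possible explanation for a new message being missed. The Python model below (not SPL; the data, date and function names are hypothetical) compares the posted logic with a variant that collapses the apps per day first.

```python
from collections import defaultdict
from datetime import date

TODAY = date(2024, 5, 6)  # hypothetical "today"


def posted_logic(rows):
    """Second stats of the message-only query, as posted.

    rows: {(day, message, appname): count}, i.e. `stats count by _time message appname`.
    `count as days` counts rows, so one message in two apps on the same day gives days=2.
    """
    out = {}
    for (day, msg, app), n in rows.items():
        r = out.setdefault(msg, {"days": 0, "today": 0, "count": set(), "appname": set()})
        r["days"] += 1
        r["today"] += day == TODAY
        r["count"].add(n)
        r["appname"].add(app)
    return out


def per_day_logic(rows):
    """Hypothetical variant: collapse apps first, so each day gives one row per message."""
    per_day = defaultdict(int)
    apps = defaultdict(set)
    for (day, msg, app), n in rows.items():
        per_day[(day, msg)] += n   # events per (day, message), summed over apps
        apps[msg].add(app)
    out = {}
    for (day, msg), n in per_day.items():
        r = out.setdefault(msg, {"days": 0, "today": 0, "count": set(), "appname": apps[msg]})
        r["days"] += 1
        r["today"] += day == TODAY
        r["count"].add(n)
    return out


msg = "Timeout waiting for response from inventory"
rows = {(TODAY, msg, "appA"): 60, (TODAY, msg, "appB"): 10}  # first seen today, two apps
print(posted_logic(rows)[msg]["days"], per_day_logic(rows)[msg]["days"])
# → 2 1
```

In SPL, one way to get the second shape (untested here) is to run the first stats by _time and message only, carrying values(appname) along, so each day contributes one row per message. Note that count then becomes the total across apps for today rather than a per-app figure.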
I have updated the query and will let it run for one day, then let you know if all is good. Thanks a LOT 🙂 @ITWhisperer
@ITWhisperer please help me.
Hello. I let the new query run every 15 minutes over the weekend. It looks like my original query gives different results: the updated query does not return the same 'message' values.
When checked manually, the original query's results seem to be genuine,
so I'm not sure why the updated query didn't capture the new error 'message'.
Awesome, looks to be working 🙂
How can I remove 'days' and 'today' from the result but still get the filtered output?
Ah, a simple table worked. Thanks a lot @ITWhisperer
Subsearches are limited to (usually) 50,000 events so you may not be excluding all the messages you think should be excluded. Does the job inspector give you any messages indicating that this has happened?
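To illustrate the point about subsearch limits, here is a toy Python model (not SPL). max_results stands in for whatever limit applies; the real limits are configurable in Splunk, and the model does not reproduce exactly how they are applied. Once the deduplicated history is larger than the cap, messages beyond it never reach the NOT [...] list and leak into the results, while the single-search stats approach compares against the whole history.

```python
def exclude_with_subsearch(current, history, max_results):
    """Model of `search NOT [ search ... | dedup message | fields message ]`.

    `max_results` is a stand-in for whatever subsearch limit applies; the real
    limits are configurable in Splunk and are not modelled exactly here.
    """
    unique_history = list(dict.fromkeys(history))   # dedup message, keeping first seen
    excluded = set(unique_history[:max_results])    # only this part reaches NOT [...]
    return [m for m in current if m not in excluded]


def exclude_with_stats(current, history):
    """Single-search approach: all history goes through the main search, no subsearch cap."""
    past = set(history)
    return [m for m in current if m not in past]


history = [f"recurring error #{i:05d}" for i in range(120)]  # toy data
current = ["recurring error #00100", "brand new error"]
print(exclude_with_subsearch(current, history, max_results=100))
# → ['recurring error #00100', 'brand new error']
print(exclude_with_stats(current, history))
# → ['brand new error']
```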