I am looking for the best method to highlight hosts with errors by comparing them to previous days.
For example, I run this search every day:
index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofsdefunct
| `autofsversion`
| table host, index, numstucksbatchd, numautofsdefunct, autofs_version
| sort index, host
I tried this, but using earliest=-48h latest=-24h returned an empty result.
| set diff [ search index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofsdefunct
earliest=-48h latest=-24h
| fields + host | fields - _time _raw ]
[ search index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofsdefunct
earliest=-24h latest=now
| fields + host | fields - _time _raw ]
How can I compare against the previous day, or the last month?
An easy approach is to compare the count per day. For example, to see the number of errors per host over a week and generate a nice graph:
error earliest=-7d@d | timechart span=1d count by host useother=0
You can also use summary indexing to save your results every day instead of recalculating them every time: http://www.splunk.com/base/Documentation/4.1.7/Knowledge/Usesummaryindexing. Use a scheduled search that runs every day at midnight (plus 15 minutes, to make sure that all your data is available). For example, my saved search "summary_error_daily" uses the "si" version of timechart with finer detail (per hour):
error earliest=-1d@d latest=@d | sitimechart span=1h count by host useother=0
Then retrieve the results with:
index=summary name=summary_error_daily | timechart span=2d count by host
Another method is to use alerting: run a stats count by host search, then use the condition "if number of hosts rises by 1". Let it run for one day (to store the first values), after which it will fire email alerts. See http://www.splunk.com/base/Documentation/latest/Admin/HowdoesalertingworkinSplunk
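Coming back to the set diff attempt in the question, one alternative that may be worth trying is a single search spanning both days that labels each event with its day and keeps only the hosts seen on the most recent day. This is a rough sketch rather than something tested against your data: the day field and its two labels are names invented here for illustration, and it assumes the relative_time() eval function is available in your Splunk version.

```
index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofsdefunct earliest=-2d@d latest=@d
| eval day=if(_time >= relative_time(now(), "-1d@d"), "yesterday", "day_before")
| stats values(day) as days, dc(day) as day_count by host
| where day_count=1 AND days="yesterday"
```

The result should be the hosts that reported the condition yesterday but not the day before. Swapping the label in the where clause to "day_before" would instead list hosts that have stopped reporting it.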
I tried the raw log search but got this error: Error in 'timechart' command: When you specify a split-by field, only single functions applied to a non-wildcarded data field are allowed.
I would like to get the list of hosts for each day. How would I do that?
Can you provide the search you used?
The way I would recommend doing this is to set up a summary index that looks at the number of events over the last day (-1d@d) and then compare the last 24 hours to the recent days. That will likely perform better than searching the raw logs, and it solves the problem itself.
However, to do it based on the raw logs, you can run:
index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofsdefunct
| `autofsversion`
| table _time, host, index, numstucksbatchd, numautofsdefunct, autofs_version
| timechart span=1d sum(numstucksbatchd) as sumnumstucksbatchd, sum(numautofsdefunct) as sumnumautofsdefunct by host
| delta sumnumstucksbatchd as diffsumnumstucksbatchd
| delta sumnumautofsdefunct as diffsumnumautofsdefunct
Timechart should summarize the events per day (you might need to experiment with sum, avg or first, depending on the contents of the logs), and delta will then show you the change in values since the previous day. I left a couple of the fields out of the timechart because the output can get overwhelming and these seem to be the ones you want, but you can add the others back in as well.
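The timechart error reported in this thread comes from combining several aggregation functions with a split-by field. The delta command also has no by clause, so it cannot compute a separate difference per host. One possible workaround, sketched here under the assumption that your Splunk version provides streamstats, is to bucket events by day, aggregate with stats, and carry each host's previous-day value forward. The prev* field names are made up for illustration.

```
index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofsdefunct
| `autofsversion`
| bucket _time span=1d
| stats sum(numstucksbatchd) as sumnumstucksbatchd, sum(numautofsdefunct) as sumnumautofsdefunct by _time, host
| sort 0 host, _time
| streamstats current=f last(sumnumstucksbatchd) as prevnumstucksbatchd, last(sumnumautofsdefunct) as prevnumautofsdefunct by host
| eval diffsumnumstucksbatchd = sumnumstucksbatchd - prevnumstucksbatchd
| eval diffsumnumautofsdefunct = sumnumautofsdefunct - prevnumautofsdefunct
```

Each row is one host on one day. The diff fields are empty for a host's first day in the time range, since there is no earlier value to compare against. If all you need is the list of hosts per day, replacing everything after the bucket line with `| stats values(host) as hosts by _time` should give that.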
Let me know if that all makes sense.