In props.conf, I have a time-based auto-lookup: "LOOKUP-jobstart = jobstart host OUTPUT jobid, user", against a periodically-updated csv file with columns "time,host,jobid,user". This is a killer feature for us, but here is something undesired:
Say I search for "eventtype=OOM", and I get events with various jobids via the above. Now I want to omit events for jobid=8290453, so I alt-click on the jobid value. The search becomes "eventtype=OOM NOT jobid=8290453", which looks right, but the output of "stats count by jobid" differs from before. Yes, jobid=8290453 is gone, but some of the counts for other jobids are now lower. Where did the events go?
Looking at job inspector, I see that the search has internally become:
DEBUG: base lispy: [ AND [ OR oom [ AND killed memory of out process ] [ AND allocation failure page ] [ AND allocate failed to ] [ AND enough memory not ] [ AND memory of out process ran ] [ AND acquire enough huge memory to unable ] ] [ OR 8290453 [ NOT sourcetype::syslog ] [ AND [ NOT host::rs2767 ] [ NOT host::rs2768 ] [ NOT host::rs2769 ] [ NOT host::rs2770 ] [ NOT host::rs2771 ] [ NOT host::rs2772 ] [ NOT host::rs2773 ] [ NOT host::rs2774 ] [ NOT host::rs2775 ] [ NOT host::rs2776 ] [ NOT host::rs2777 ] ] ] ]
This reveals the details of eventtype=OOM and that the lookup is for sourcetype=syslog, but I think it also shows that the NOT condition has expanded to all the hosts associated with jobid=8290453 (yes, those hosts are associated with that jobid in jobstart.csv.gz, but only for a limited time range). So, if those hosts have OOMs during other jobids, I won't see those either? I'm not 100% sure that is what is happening because I'm struggling with "| set diff", but I think so.
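One way to check this without "| set diff" might be to flag the events whose host falls in the expanded list and count them per jobid. This is only a minimal sketch, assuming the automatic lookup has already populated jobid; the field name in_expanded_hosts is made up for illustration, and the regex is just shorthand for hosts rs2767 through rs2777 from the lispy above:

```spl
eventtype=OOM
| eval in_expanded_hosts=if(match(host, "^rs27(6[7-9]|7[0-7])$"), "yes", "no")
| stats count by jobid, in_expanded_hosts
```

If my reading of the lispy is right, any row with in_expanded_hosts=yes and a jobid other than 8290453 would be events that the expanded NOT clause drops.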
For comparison, a search for "eventtype=OOM | lookup jobstart host | search NOT jobid=8290453" does the desired thing, e.g., "stats count by jobid" is the same, except that jobid=8290453 is missing. This is a manual workaround, but if I then alt-click another jobid, I'm back in unhappy land. Basically, alt-click doesn't work the way I'd like (or expect) for time-based auto-looked-up fields.
If I am understanding the issue (please correct me if I'm wrong), this seems like a bug - the lookup is inherently time-based, so technically the auto-search-writer should do something like
"... (NOT (_time>jobstarttime AND _time<jobendtime AND (host=rs2770 OR host=rs2771 OR ...)))". If it did, I'd be able to get rid of one of my hairy macros which does this 🙂
So, 1) am I understanding the issue? And 2) is there a way to get the above desired behavior from alt-click on a time-based auto-looked-up field? E.g., via a different method, or a revision of the internal query interpreter?