We've got log events that read like the following:
Mar 14 12:26:38 mailsrv.example.com MM: [Jilter Processor 21 - Async Jilter Worker 30 - 127.0.0.1:47850-o29CQWO2002696] INFO user.log - mtaqid=o29CQWO2002696, engine=virscan-olympus, from=<email@example.com>, recipients=<firstname.lastname@example.org>, relay=[188.8.131.52], size=40236, virus_name=W32/Mydoom.o@MM, virus_state=infected, filename=text.pif
The goal is to write a search that produces a time-bucketed set of results (suitable for passing to a time chart, or for use in a summary index search) counting distinct mtaqid values by logging host and virus_name.
The following search returns zero results. Further experimentation shows that after the bucket command, virus_name is null.
eventtype=smimm_virus virus_state!="clean" | bin _time span=5m | stats distinct_count(qid) by _time, virus_name, host
Is there a description somewhere of what bucket keeps or throws away when grouping log lines? When preparing such a search for sistats or other summary index use, what is the best way to perform this bucketing, without losing this detail?
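For reference, here is a minimal Python sketch of the behaviour I would expect, assuming that bucket only rounds _time down to the start of its span and leaves every other field alone, and that stats drops rows whose "by" fields are null. The function names and the sample events are made up for illustration; this is not Splunk code.

```python
from collections import defaultdict

SPAN = 300  # 5 minutes, in seconds


def bucket_time(event, span=SPAN):
    """Copy the event with _time floored to the span; other fields are untouched."""
    bucketed = dict(event)
    bucketed["_time"] = event["_time"] - event["_time"] % span
    return bucketed


def distinct_count(events, field, by):
    """Count distinct values of `field` per combination of the `by` fields."""
    groups = defaultdict(set)
    for ev in events:
        # Assumption: a row with a null group-by field is dropped entirely.
        if any(ev.get(k) is None for k in by):
            continue
        if ev.get(field) is not None:
            groups[tuple(ev[k] for k in by)].add(ev[field])
    return {key: len(values) for key, values in groups.items()}


# Hypothetical events; _time is epoch seconds.
HOST = "mailsrv.example.com"
VIRUS = "W32/Mydoom.o@MM"
events = [
    {"_time": 1000, "host": HOST, "qid": "a1", "virus_name": VIRUS},
    {"_time": 1100, "host": HOST, "qid": "a1", "virus_name": VIRUS},
    {"_time": 1100, "host": HOST, "qid": "b2", "virus_name": VIRUS},
    {"_time": 1300, "host": HOST, "qid": "c3", "virus_name": VIRUS},
]

by_fields = ["_time", "virus_name", "host"]
result = distinct_count([bucket_time(e) for e in events], "qid", by_fields)

# The symptom described above: virus_name is null after bucketing.
broken = [dict(bucket_time(e), virus_name=None) for e in events]
empty = distinct_count(broken, "qid", by_fields)
```

Under these assumptions, a null virus_name after bucketing would be enough on its own to make the stats step return nothing, which matches what I am seeing.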
If you were relying on the Field Picker to tell you whether a field was extracted: starting with 4.1, it is not reliable for that, because it automatically suppresses some fields unless they are explicitly used in some other part of the search.
Using sistats will include the correct field and summary information when inserting into a summary index. Your search might not be the ideal one, though. I would try removing each of the "by" fields in turn to verify that values actually exist for virus_name.
When I indicated that I had performed "further experimentation", I meant that I had discarded various group bys, removed bucket, etc.
Compare the results of:
eventtype=smimm_virus virus_state!=clean | table _time, qid, virus_state, virus_name
eventtype=smimm_virus virus_state!=clean | bucket _time span=5m | table _time, qid, virus_state, virus_name
On 4.1.0 (Linux), the bucketed search has non-null virus names (for every row).
On 4.1.2 (Mac), the non-bucketed search has non-null virus names, but the bucketed search has a blank column there....