I'm not that bad in searching 😉 but this case is a little over my head and I need some clever idea.
I have postfix logs. They have three types of events. All events have queue_id field which identifies the message. The events have either from, to or (to and orig_to) fields set.
I want to do
stats values(from) by to orig_to
The problem is that the fields are in separate events. And both methods of joining them together are faulted in one way or another.
If I do
eventstats values(from) by queue_id
I get the desired result but if I search over a longer timespan I'm hitting a memory limit. Sure, I can raise the limit in the config but it would be a better solution to find a more reasonable search for it 🙂
If I try to do
transaction queue_id
Of course everthing works but the transaction joins the to and orig_to fields into multivalued fields and there is no reliable way to "unjoin" them (over one transaction you can have more to values than orig_to and you can't simply do mapping between the values).
So I'm a little stuck how to transform my data to get from, to, orig_to tuples so I can later pass it to stats.
Any hints?
Of course, if nothing works I'll simply raise the limits and do eventstats but it's not a pretty solution.
Not a very elegant idea but it seems to work quite OK (and it doesn't exceed the limits).
index=postfix (some_additional_limiting_conditions) (to=* OR from=* OR orig_to=*)
| fields queue_id to from orig_to _time
| sort 0 _time
| streamstats current=t window=100 values(from) as from by queue_id
| stats values(from) as from latest(_time) as _time by to orig_to
That's pretty much what I needed all along 🙂
I must have missed something here. The standard way to join these field is just values(*) as * by unique ID. Does it apply?
| fields from to orig_to queue_id
| stats values(*) as * by queue_id
| stats values(from) by to orig_to
(BTW, I am curious about the semantic significance of the number. Appreciate an elaboration.)
The problem is that if I aggregate by values(*), I'll get all to and orig_to values but I lose the connection between them.
Let's say I have it like (different events, same queue_id):
from: a@b.c
to: c@d.e
to: e@f.g, orig_to: g@h.i
After values(to) I wouldn't know which of them relates to the orig_to that we know.
But it seems sorting by time and streamstatsing does the job.
OK. It seems I can do I thought about streamstats but for that I think I'd have to re-sort "backwards" time-wise - another not very best idea.
Not a very elegant idea but it seems to work quite OK (and it doesn't exceed the limits).
index=postfix (some_additional_limiting_conditions) (to=* OR from=* OR orig_to=*)
| fields queue_id to from orig_to _time
| sort 0 _time
| streamstats current=t window=100 values(from) as from by queue_id
| stats values(from) as from latest(_time) as _time by to orig_to
That's pretty much what I needed all along 🙂