The following two logical events are supposed to be one as they share the same
threadId - but instead they are split into two by my transaction command below:
Jan 22 10:33:56 smtp1 postfix/smtpd: 3AB561005181: client=[REDACTED] host = smtp1 source = /var/log/maillog sourcetype = postfix_syslog
Jan 22 10:33:56 smtp1 postfix/cleanup: 3AB561005181: message-id=<521185171.8189.**********.JavaMail.levels.[REDACTED] Jan 22 10:33:56 smtp1 postfix/qmgr: 3AB561005181: from=[REDACTED], size=2600, nrcpt=1 (queue active) Jan 22 10:33:56 smtp1 postfix/smtp: 3AB561005181: to=[REDACTED], relay=[REDACTED], delay=0.03, delays=0/0/0.01/0.02, dsn=2.0.0, status=sent (250 ok: Message 58706800 accepted) Jan 22 10:33:56 smtp1 postfix/qmgr: 3AB561005181: removed host = smtp1 source = /var/log/maillog sourcetype = postfix_syslog
(Most other events are correctly grouped but a number aren't.)
How would you modify the following command to correctly group events?
sourcetype="postfix_syslog" threadId=* | transaction "threadId", "host", "client" maxspan=24h maxpause=4h keepevicted=true startswith="client\=" | search threadId=3AB561005181
Note: if I add
message-id to the list of fields in
transaction, the events are grouped properly - but some others break. Why?
Generally, please share the logic of how transactions should be crafted for correct behavior.
(Splunk Enterprise 7.1.2)
Its not totally clear from your example, but the second group of logs above do not appear to contain any fields for 'client'
transaction "threadId", "host", "client" will group all events which share ALL 3 of those common values.
In your example, I can see that thread and host would match in all cases, but client may not.
If only some of your events contained a client id and some did not, you would naturually expect transaction to group them into two seperate transactions.
Nick, I'd be the first to admit that
transaction's behavior is not something I fully - or even remotely - understand, nor that I fully understand its documentation - i.e. how it's supposed to behave. That said, please take a look at "Specifying multiple fields" section in Transaction Search Reference to see that the logical event defined by
transaction's output is not supposed to comprise of events that each has all of the fields. It's not the case in my testing, and it's not how it's supposed to behave, per that article.
On a side note, when I run the same exact
transaction command on the same group of events I did when I posted my question, it now correctly groups those events.
And no, I have no clue what's happening but one of the possibilities is that
transaction is in "alpha" version - i.e. unpredictable behavior that is only partially documented. That in a way makes sense given how many people are suggesting not to use it, and use
And yes, you're right,
client= happens only once per group - that's the way Postfix logs events.
Yes, its not immediately obvious from the docs - specifying 3 fields does not mean all 3 fields must exist, but IF they do, then the transaction will group based on them: (emphasis added)
"IF there is a transitive relationship between the fields in the fields list AND if the related events appear in the CORRECT sequence..."
In your case, host and trace appear in each event, but client is sporadic. maxspan could therefore become critical in satisfying a match.
Do you have satisfactory results by ommiting client?
I can't even reproduce the unexpected behavior in my original search - so all bets are off... 🙂
Like I said simply re-running my original search produces expected (correct) results today - I'll have to find another anomaly where the results are unexpected and can be reliably reproduced. But even before that I'll likely have to upgrade our instance to the latest version (from 7.1.2 to 8.0.1) just in case my issue is the result of a bug that's been squashed in subsequent releases.
Oh and the kicker is that some of the Postfix threads can run for weeks... Non-deliverable emails are deferred, then retried, then deferred for longer and so on... So
maxpause are of limited use here despite relative simplicity of the events.
If you have lots of long running open transactions (ie your search window is large), you could be hitting memory limits causing evictions - your example above is all the same timeframe, but it could be contributing.
sourcetype="postfix_syslog" threadId=* | stats values(_time) as _time values(host) as host values(size) as size values(client) as client values(from) as from values(to) as to values(status) as status range(_time) as duration count as eventcount values(relay) as relay values(message_id) as message_id by threadId | table _time host threadId size client from to status duration eventcount relay message_id
This may be same results. try this.
Not the same results, with some errors and generally I'd like to learn to use
transaction correctly vs. using
stats. Is there a reason I consistently get advice to use
stats when I ask questions about
transaction (performance aside)?
deferred email threads can last several days and have something like this in the
startswith="client\="will not work.
Any idea why? Seems to be working in most cases and it seems all postfix smtp threads start that way...
The goal is to extract a set of additional fields that are only available in
transaction output. E.g.:
| table _time host threadId size client from to status duration eventcount relay message_id
(One could probably do something similar - and faster - via
stats - yet my goal is to group events related to a single Postfix transaction together for subsequent analysis - which only
transaction can do.)