Splunk Search

How to correlate 2 sets of logs together in one transaction?

bkirk
Path Finder

I have 3 mail servers like so, 2 postfix servers and the last one not important Exchange, like so:
Postfix1 -> Postfix2 -> Exchange

Both Postfix server logs are in splunk. I see that when postfix1 hands email to postfix2 it correlates mail with a queued_queue_id

How can I correlate the 2 sets of logs together in one transaction?

Sep  5 11:04:27 nagaviper postfix/smtpd[21218]: B570F448E65: client=unknown[192.168.1.65]
Sep  5 11:04:27 nagaviper postfix/cleanup[21216]: B570F448E65: message-id=<20140905150427.B570F448E65@nagaviper.123.com>
Sep  5 11:04:27 nagaviper postfix/qmgr[20283]: B570F448E65: from=<autonotify@test.com>, size=1771, nrcpt=1 (queue active)
Sep  5 11:04:27 nagaviper postfix/smtp[20729]: B570F448E65: to=<user@123.com>, relay=localhost[127.0.0.1]:10025, delay=0.18, delays=0.04/0/0.05/0.09, dsn=2.0.0, status=sent (250 OK: data received)
Sep  5 11:04:27 nagaviper postfix/qmgr[20283]: B570F448E65: removed

Sep  5 11:04:27 habanero postfix/smtpd[46622]: 8966F601B4: client=businessclass.comcast.net[11.22.33.44]
Sep  5 11:04:27 habanero postfix/cleanup[52613]: 8966F601B4: warning: header Subject: Speed Alert!!! from businessclass.comcast.net[11.22.33.44]; from=<autonotify@test.com> to=<user@123.com> proto=ESMTP helo=<businessclass.comcast.net>
Sep  5 11:04:27 habanero postfix/cleanup[52613]: 8966F601B4: message-id=<>
Sep  5 11:04:27 habanero postfix/qmgr[56439]: 8966F601B4: from=<autonotify@test.com>, size=1418, nrcpt=1 (queue active)
Sep  5 11:04:27 habanero postfix/smtp[52618]: 8966F601B4: to=<user@123.com>, relay=nagaviper.123.com[192.168.1.67]:25, delay=0.32, delays=0.27/0/0/0.04, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as B570F448E65)
Sep  5 11:04:27 habanero postfix/qmgr[56439]: 8966F601B4: removed

I initially had:

| transaction queue_id keepevicted=true maxspan=60s| transaction message_id keepevicted=true maxspan=120s 

Until I found not all mail servers give a message_id and when it is blank as it was in this example it failed to correlate the entry.

Thank you,
Brian

1 Solution

chris
Motivator

In the events of the habanero instance there is a line with the queue id of the nagaviper instance:

Sep  5 11:04:27 habanero postfix/smtp[52618]: 8966F601B4: to=<user@123.com>, relay=nagaviper.123.com[192.168.1.67]:25, delay=0.32, delays=0.27/0/0/0.04, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as B570F448E65)

You could try the following:

| transaction queue_id keepevicted=true maxspan=60s| rex "queued\sas\s(?<matching_id>[^\)]+)" | eval matching_id=coalesce(matching_id,queue_id) | transaction matching_id keepevicted=true
  • The rex command will extract the id in the habanero events and will fail in the nagaviper events
  • The coalesce command will fill matching_id with the first value it will find. Which is matching_id for the habanero events and queue_id for the nagaviper events
  • Then you can use the final transaction cmd

View solution in original post

chris
Motivator

In the events of the habanero instance there is a line with the queue id of the nagaviper instance:

Sep  5 11:04:27 habanero postfix/smtp[52618]: 8966F601B4: to=<user@123.com>, relay=nagaviper.123.com[192.168.1.67]:25, delay=0.32, delays=0.27/0/0/0.04, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as B570F448E65)

You could try the following:

| transaction queue_id keepevicted=true maxspan=60s| rex "queued\sas\s(?<matching_id>[^\)]+)" | eval matching_id=coalesce(matching_id,queue_id) | transaction matching_id keepevicted=true
  • The rex command will extract the id in the habanero events and will fail in the nagaviper events
  • The coalesce command will fill matching_id with the first value it will find. Which is matching_id for the habanero events and queue_id for the nagaviper events
  • Then you can use the final transaction cmd

felipetesta
Path Finder

coalesce is the way to go. If you have the proper sourcetype, you may have queued_as field, so "| eval matching_id=coalesce(queued_as,queueid) |" will do and you can drop the previous "rex". Be aware that you may get the same queue_id value on both servers within the maxspan timeframe, so consider differentiating your transactions adding the host field too: "| transaction queue_id, host ..."

Splunk can tame the "Postfix logs" beast! 🙂

chris
Motivator

You're welcome

0 Karma

bkirk
Path Finder

Thank you!

It worked and thank you for the awesome explanation!

Get Updates on the Splunk Community!

Observability | How to Think About Instrumentation Overhead (White Paper)

Novice observability practitioners are often overly obsessed with performance. They might approach ...

Cloud Platform | Get Resiliency in the Cloud Event (Register Now!)

IDC Report: Enterprises Gain Higher Efficiency and Resiliency With Migration to Cloud  Today many enterprises ...

The Great Resilience Quest: 10th Leaderboard Update

The tenth leaderboard update (11.23-12.05) for The Great Resilience Quest is out &gt;&gt; As our brave ...