Splunk Search

Search query to display some sendmail details

Explorer

I'd like to come up with a search/report that can display the number of emails sent "from" a particular to unique "to" addresses over a period of time. The output come look something like:

user@domain1.com   15
somebody@another.com  35
test@someplace.com  50
Total Sent 100

So here is what I'm trying to figure out:

Find all results for "mysender@mydomain.com". Take the qid (example: p2IIboft020468) and do a subsearch to find the "to" address Format the results from the "to" search into the above format.

Tags (3)
0 Karma
2 Solutions

Splunk Employee
Splunk Employee

I've been using Ayn's method, myself, for some time, but transaction is painfully slow over a large log. In looking at this particular question, I actually found a vastly better method, courtesy of gkanapathy ( http://answers.splunk.com/questions/1478/sendmail-transactions ).

If you're looking for data about a particular from address, it's much faster to filter on those qids first and then look specifically at the To addresses, if the below assumptions are correct:

  • one email has multiple events
  • all events include the qid
  • the recipient address is only extracted once per email
  • you have extracted the qid, sender and recipient fields

Using:

sourcetype=mail 
   [search sourcetype=mail sender=myuser@mycompany.com 
       | dedup qid 
       | fields qid
   ] 
   | stats count by recipient

I'm not sure how to do a totals row in the same table.. If you're going to put this in a dashboard, though, you could do the above (sans the stats command) as a hidden search, have one post process for the main table, and then another post process for a single value field with the total number of emails sent.

I went ahead and did a test in my environment, comparing

tag=mail 
   [search tag=ab_mail SenderAddress=me@me.com 
         | dedup ExchangeMSGID 
         | fields ExchangeMSGID
   ] 
   | stats count by RecipientAddress 

to

tag=mail | transaction ExchangeMSGID maxspan=30s 
            | search SenderAddress=me@me.com 
            | stats count by RecipientAddress

over the last 30 days. The first one completed in 38 seconds; I killed the second one 4.8% in, after 227 seconds. Now I need to go re-write some of my own reports to use this better method...

View solution in original post

Splunk Employee
Splunk Employee

I would do this:

sourcetype=sendmail | stats values(from) as from,values(to) as tolist by qid | mvexpand tolist | rename tolist as to| stats count by from,to

It's basically the same as transaction, but if you're running with more than one indexer, it will scale up much better. That's because transaction does a lot of work that we don't need here, that prevents the job to be more effectively distributed by map-reduce (e.g., keeping timestamps in order, calculating durations). The stats command above doesn't do this as much, so should run much faster than transaction if you have multiple indexers.

View solution in original post

0 Karma

Splunk Employee
Splunk Employee

I would do this:

sourcetype=sendmail | stats values(from) as from,values(to) as tolist by qid | mvexpand tolist | rename tolist as to| stats count by from,to

It's basically the same as transaction, but if you're running with more than one indexer, it will scale up much better. That's because transaction does a lot of work that we don't need here, that prevents the job to be more effectively distributed by map-reduce (e.g., keeping timestamps in order, calculating durations). The stats command above doesn't do this as much, so should run much faster than transaction if you have multiple indexers.

View solution in original post

0 Karma

Explorer

Thanks. I did notice that using transaction would really chock the system pretty good. I had come up with something that was somewhat useful, but after seeing the responses in this thread, I still have a lot to learn!

0 Karma

Splunk Employee
Splunk Employee

I've been using Ayn's method, myself, for some time, but transaction is painfully slow over a large log. In looking at this particular question, I actually found a vastly better method, courtesy of gkanapathy ( http://answers.splunk.com/questions/1478/sendmail-transactions ).

If you're looking for data about a particular from address, it's much faster to filter on those qids first and then look specifically at the To addresses, if the below assumptions are correct:

  • one email has multiple events
  • all events include the qid
  • the recipient address is only extracted once per email
  • you have extracted the qid, sender and recipient fields

Using:

sourcetype=mail 
   [search sourcetype=mail sender=myuser@mycompany.com 
       | dedup qid 
       | fields qid
   ] 
   | stats count by recipient

I'm not sure how to do a totals row in the same table.. If you're going to put this in a dashboard, though, you could do the above (sans the stats command) as a hidden search, have one post process for the main table, and then another post process for a single value field with the total number of emails sent.

I went ahead and did a test in my environment, comparing

tag=mail 
   [search tag=ab_mail SenderAddress=me@me.com 
         | dedup ExchangeMSGID 
         | fields ExchangeMSGID
   ] 
   | stats count by RecipientAddress 

to

tag=mail | transaction ExchangeMSGID maxspan=30s 
            | search SenderAddress=me@me.com 
            | stats count by RecipientAddress

over the last 30 days. The first one completed in 38 seconds; I killed the second one 4.8% in, after 227 seconds. Now I need to go re-write some of my own reports to use this better method...

View solution in original post

Legend

My take on this: first, create the needed field extractions for the logs. Let's call the sourcetype "sendmail" and the fields of primary interest "qid", "from" and "to". Now with these extractions defined, group events together with a transaction based on qid. Then simply make a toplist using top, say for 100 entries.

sourcetype="sendmail" | transaction qid | top 100 from,to

That should give you something similar to what you seem to be looking for. The only thing missing would be the total count at the end, which you'll have to handle separately.