I have a requirement for a unique dashboard on our mail log: I need to get the top ten receivers by email size. The catch is that the search needs to stitch together two distinct events (the first contains the email size, the second the receiver address) that share a qid (queue ID).
Below is my logic for this requirement.
| stats list(to) as Receivers by qid
| join qid [search index=emaildata2008 sourcetype=allemaillog | stats list(size) as "Email Size" by qid ]
| dedup Receivers | bucket _time span=1h | top Receivers, "Email Size" | sort - "Email Size"
Can anyone tell me whether this logic is correct or not? I just want to be sure.
Well, without seeing your raw logs I cannot be completely sure, but the chance of that logic working as written is basically zero.
Let me take a crack at it.
index=emaildata2008 sourcetype=all_email_log
| stats list(to) AS Receivers first(size) AS size BY qid
| mvexpand Receivers
| stats sum(size) AS "Total Email Size" count AS "Number of Emails" BY Receivers
| sort 10 - "Total Email Size"
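One assumption baked into this: first(size) takes the size from a single event per qid, which is fine if your mail log writes exactly one size event per message. If size could ever appear on more than one event for the same qid, a safer variant (a sketch, using the same field names) is to take the maximum instead:

```spl
index=emaildata2008 sourcetype=all_email_log
| stats list(to) AS Receivers max(size) AS size BY qid
| mvexpand Receivers
| stats sum(size) AS "Total Email Size" count AS "Number of Emails" BY Receivers
| sort 10 - "Total Email Size"
```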
Your logic worked! But could you please explain what the issue with my logic was? I just want to avoid the same problem next time.
You have not shown the raw data, so my solution may be slightly overcomplicated too. First of all, avoid transaction (and the join you used) if at all possible: both have inescapable limits (join runs a subsearch, which is capped in results and runtime), which means they do not scale well at all. Most of the time stats can be used instead, and because stats map/reduces, it is FAST. If you check the Job Inspector, you will see that my search is MUCH faster, and it probably also returns more (and more accurate) results. My solution makes a single pass through the data and then a second pass to refine the intermediate results. The best way to learn is to strip off the last pipeline stage and see what each step does, then repeat until you understand what the remainder is doing.
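To make that strip-off exercise concrete for the search above: the first stats collapses the events into one row per qid, with Receivers as a multivalue list of recipients and size taken from the event that carried it; mvexpand then fans each row out into one row per (qid, receiver) pair, copying size onto each; the final stats regroups by receiver to total the sizes. Built back up one pipe at a time (names carried over from the earlier search), the stages are:

```spl
index=emaildata2008 sourcetype=all_email_log
| stats list(to) AS Receivers first(size) AS size BY qid

index=emaildata2008 sourcetype=all_email_log
| stats list(to) AS Receivers first(size) AS size BY qid
| mvexpand Receivers

index=emaildata2008 sourcetype=all_email_log
| stats list(to) AS Receivers first(size) AS size BY qid
| mvexpand Receivers
| stats sum(size) AS "Total Email Size" count AS "Number of Emails" BY Receivers
```

Run each stage on its own and compare the result tables; the jump from the multivalue Receivers field to one row per recipient is where the size attribution happens.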