I have mail processing log lines I need to combine and report on.
One type of log line contains strings like "cloned from Aggressive", "cloned from Blocklist", etc.
Another type of log line contains a "classification=" field, with values like "Zero-Hour", "Spam-Clean", "Spam-Confirmed", "Passed", etc.
The various needed log lines do not share a common field name.
I need a report that combines all these disparate data into a stacked column chart of all email by time interval, with each column colored by classification and "cloned from" counts.
I can get a report on classifications, but it drops the other two types of data. I can get a report on the other types of data (separately), but they drop the classification type, and so on.
How do I formulate the search/report to combine all these into a single chart?
Thank you to all. I believe I have a working solution to this:
index=myindex AND classification=* | timechart count by classification
| append [search index=myindex AND "cloned from" | timechart count AS Reputation]
| append [search index=myindex AND "User unknown" | timechart count AS "User Unknown"]
| append [search index=myindex AND "stat=Sent" | timechart count AS "Sent"]
| sort _time
Creating a stacked column (combined) chart from this gets me essentially what I need. Each subsearch is a different column, but I can live with that, I think.
I thought @martin_mueller's answer was better generally, as it avoids subsearches, which is why I erased mine. But I'm glad this works too.
If you want one combined column per time period, replace "| sort _time" with:
| transaction _time | fields - linecount _raw closed_txn duration eventcount field_match_sum | sort _time
So... an event in one log file doesn't have anything to do with an event from the other log file?
Extract the reasons from the first file into a field called classification and run this:
sourcetype=st1 OR sourcetype=st2 | timechart count by classification
You can still extract the fields. For example the cloned-from-logfile:
\[(?<classification>cloned from \w+)\]
That way you get a classification field in each source and hence can do a count by classification.
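As a rough sketch of the same idea done inline at search time, assuming everything lives in index=myindex as in the question, that classification is already extracted automatically from the classification= pairs, and that the "cloned from" text always appears in square brackets as in the sample line. cloned_from is just a helper field name I made up for this example:

```spl
index=myindex (classification=* OR "cloned from")
| rex "\[(?<cloned_from>cloned from \w+)\]"
| eval classification=coalesce(classification, cloned_from)
| timechart count by classification
```

The rex only sets cloned_from on events that match, and coalesce keeps the existing classification where there is one, so both kinds of events end up in the same split-by field.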
Same log file. Different email filtering components all log to the same log facility/file, and they each use different syntax in doing so. I tried the route of eval and renaming fields, etc., but some components log in such a way that there are essentially no field names.
If the field names don't match, you can define field aliases, choose matching field names in your extractions, or use rename or eval in the search to make the names match.
All the data are the same sourcetype (mail log), but only one type of line has a classification field, only one has "User unknown", and only one has "cloned from". Even though they are the same sourcetype, they have no intersecting field names.
I need to tally all of the above (count(reject), count("cloned from"), count(classification) by classification) all on the same chart, so we have something like:
classification-1, classification-2, classification-3, classification-4, user-unknown, Bad-Reputation
with their individual tallies, charted as a stacked column over time.
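One way this could be sketched without subsearches, assuming index=myindex from the earlier search, that classification is extracted automatically, and that the literal strings "cloned from" and "User unknown" only occur in their respective line types. The field name category is hypothetical, and the labels are the ones listed above:

```spl
index=myindex (classification=* OR "cloned from" OR "User unknown")
| eval category=case(
    isnotnull(classification), classification,
    match(_raw, "cloned from"), "Bad-Reputation",
    match(_raw, "User unknown"), "user-unknown")
| timechart count by category
```

case returns the value for the first condition that is true, so every event lands in exactly one series, and a stacked column chart on the result should give one combined column per time bucket. If there are many distinct classification values, timechart may fold the smaller ones into OTHER unless a limit is set (for example limit=0).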
Apr 22 03:03:31 host.com MM: [Jilter Processor 3 - Async Jilter Worker 37 - 127.0.0.1:40909-s3M33SSv011875] INFO user.log - AntiSpam.Log.Header.Debug: classification=Cloudmark, cloudmark_spam_score=100.00, cloudmark_content_score=100.00, cloudmark_ip_score=0.00, cloudmark_sender_score=0.00, cloudmark_analysis="v=2.1 cv=XMMJF2RE c=1 sm=1 tr=0 p=pKOSPnCJtLv9pbStFNYA:9 p=WthgjtGrYmcLPO50j_8A:9 a=XWQSJyLHRzquKgEqAPxMQA==:117 a=XWQSJyLHRzquKgEqAPxMQA==:17 a=aoWKRLlwSNoA:10 a=-N4dak_cAAAA:8 a=KGjhK52YXX0A:10 a=awlg0vDVAAAA:8 a=3fMtmCSMTM1j8r91:21 a=
Apr 22 03:00:41 host.com sm-mta[10912]: s3M30dPD010912: Milter: to=user@domain.com, reject=550 5.1.1 User unknown
Apr 22 03:01:19 host.com flow-control[16526]: something.com: selected class something.com [cloned from Moderate]
(more coming)
Do post some actual (anonymized) data from both sources.
While there is no common field name, there must be some bit of common information across the lines of data that identifies a single piece of email. Otherwise you'll have no way to do this.
Or do you just want counts of two different things in one graph?