Re: How to combine disparate log data into a singl...

dgillam · ‎06-20-2014

I have mail processing log lines I need to combine and report on.

One type of log line contains strings like "cloned from Aggressive", "cloned from Blocklist", etc.

Another type of log line contains a field "classification=". This field has values like "Zero-Hour", "Spam-Clean", "Spam-Confirmed", "Passed", etc.

The various needed log lines do not share a common field name.

I need a report that combines all these disparate data, to show a stacked column of all email, colored as to its classification and "cloned from" counts by time interval.

I can get a report on classifications, but it drops the other two types of data. I can get a report on the other types of data (separately), but they drop the classification type, and so on.

How do I formulate the search/report to combine all these into a single chart?

dgillam · ‎06-20-2014

Thank you to all. I believe I have a working solution to this:

Creating a stacked column (combined) chart from this gets me essentially what I need. Each subsearch is a different column, but I can live with that, I think.

aweitzman · ‎06-20-2014

I thought @martin_mueller's answer was better generally, as it avoids subsearches, which is why I erased mine. But I'm glad this works too.

If you want one combined column per time period, replace "| sort _time" with:

| transaction _time | fields - linecount _raw closed_txn duration eventcount field_match_sum | sort _time

martin_mueller · ‎06-20-2014

So... an event in one log file doesn't have anything to do with an event from the other log file?

Extract the reasons from the first file into a field called classification and run this:

sourcetype=st1 OR sourcetype=st2 | timechart count by classification

martin_mueller · ‎06-20-2014

You can still extract the fields. For example, for the cloned-from log lines:

\[(?<classification>cloned from \w+)\]

That way you get a classification field in each source and hence can do a count by classification.
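As a rough sketch of how that extraction might be applied inline at search time, the regex above can be passed to the rex command before charting. The sourcetype name mail_log is a placeholder, not taken from the thread, and this assumes the classification= lines already get their field from automatic key=value extraction:

```spl
sourcetype=mail_log ("classification=" OR "cloned from")
| rex "\[(?<classification>cloned from \w+)\]"
| timechart count by classification
```

Events matching neither pattern carry no classification value, so they should simply not contribute to any series.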

dgillam · ‎06-20-2014

Same log file. Different email filtering components all log to the same log facility/file, and they each use different syntax in doing so. I tried the route of eval and renaming fields, etc., but some components log in such a way that there are essentially no field names.

martin_mueller · ‎06-20-2014

If the field names don't match you can define field aliases or choose matching field names in your extractions or use rename or eval in the search to make the names match.
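One possible way to use eval for this, sketched under assumptions: build a single category field from whichever piece of information each event carries, then chart on it. The sourcetype mail_log, the field names cloned_from and category, the label user-unknown, and the one-hour span are all illustrative choices, not from the thread:

```spl
sourcetype=mail_log ("classification=" OR "cloned from" OR "User unknown")
| rex "\[(?<cloned_from>cloned from \w+)\]"
| eval category=case(isnotnull(classification), classification, isnotnull(cloned_from), cloned_from, like(_raw, "%User unknown%"), "user-unknown")
| timechart span=1h count by category
```

With the visualization set to a column chart in stacked mode, this should give one column per time bucket, split by category, without needing subsearches.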

dgillam · ‎06-20-2014

All the data are the same sourcetype (mail log), but only one has a classification field, only one has "User unknown", only one has "cloned from". Even though they are the same sourcetype, they have no intersecting field names.

dgillam · ‎06-20-2014

I need to tally all of the above (count(reject), count("cloned from"), count(classification) by classification) all on the same chart, so we have something like:

classification-1, classification-2, classification-3, classification-4, user-unknown, Bad-Reputation

with their individual tallies, charted as a stacked column over time.

dgillam · ‎06-20-2014

Apr 22 03:03:31 host.com MM: [Jilter Processor 3 - Async Jilter Worker 37 - 127.0.0.1:40909-s3M33SSv011875] INFO user.log - AntiSpam.Log.Header.Debug: classification=Cloudmark, cloudmark_spam_score=100.00, cloudmark_content_score=100.00, cloudmark_ip_score=0.00, cloudmark_sender_score=0.00, cloudmark_analysis="v=2.1 cv=XMMJF2RE c=1 sm=1 tr=0 p=pKOSPnCJtLv9pbStFNYA:9 p=WthgjtGrYmcLPO50j_8A:9 a=XWQSJyLHRzquKgEqAPxMQA==:117 a=XWQSJyLHRzquKgEqAPxMQA==:17 a=aoWKRLlwSNoA:10 a=-N4dak_cAAAA:8 a=KGjhK52YXX0A:10 a=awlg0vDVAAAA:8 a=3fMtmCSMTM1j8r91:21 a=

dgillam · ‎06-20-2014

Apr 22 03:00:41 host.com sm-mta[10912]: s3M30dPD010912: Milter: to=user@domain.com, reject=550 5.1.1 User unknown

Apr 22 03:01:19 host.com flow-control[16526]: something.com: selected class something.com [cloned from Moderate]

(more coming)

martin_mueller · ‎06-20-2014

Do post some actual (anonymized) data from both sources.

aweitzman · ‎06-20-2014

While there is no common field name, there must be some bit of common information across the lines of data that identify a single piece of email. Otherwise you'll have no way to do this.

Or do you just want counts of two different things in one graph?

How to combine disparate log data into a single time chart?
