Populating a summary index with transactions while preserving fields.

cudgel
Path Finder

I have the following search, which I would like to use to populate a summary index for reporting (run every 30 minutes or so to keep the summary index reasonably up to date). Right now I run this transaction in saved searches to report on fields that the cisco_esa_addon extracts (e.g. a search for a sender domain piped to "chart count(mailto) over mailfrom"), but the searches take a long time to execute (up to 2 hours for a 24-hour period) because they operate on a very large set of data (~20 million events per day from our IronPort cluster). The fillnull command is there because of a CSV lookup, applied to the sourcetype cisco_esa, that marks known high-volume senders as "status=safe"; messages from senders not in the lookup get status=NULL instead of having no status field at all.

eventtype="cisco_esa" | transaction mid maxspan=180s | fillnull value=NULL status

A sample transaction is below:

<22>Sep 23 09:25:21 mail_logs: Info: Start MID 106908193 ICID 166122699
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 ICID 166122699 From: <jae-language+punEKLvrDf_UaCRaq1vSTfHzVbjrd@rswrte1xthvxlc.org>
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 ICID 166122699 RID 0 To: <towanda.ollie@cindy.edu>
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 ICID 166122699 RID 1 To: <carri.west@cindy.edu>
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 ICID 166122699 RID 2 To: <mark.susann@cindy.edu>
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 ICID 166122699 RID 3 To: <felisa@cindy.edu>
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 ICID 166122699 RID 4 To: <tabetha.bowman@cindy.edu>
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 Message-ID '<465C0721A14CC844B81CB20CA2E66025A269C1D97D@mchex2k7>'
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 Subject 'Question'
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 ready 4617 bytes from <jae-language+punEKLvrDf_UaCRaq1vSTfHzVbjrd@rswrte1xthvxlc.org>
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 matched all recipients for per-recipient policy Silas 2 in the inbound table
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 interim verdict using engine: CASE spam negative
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 using engine: CASE spam negative
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 interim AV verdict using Sophos CLEAN
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 antivirus negative 
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 queued for delivery
<22>Sep 23 09:25:21 mail_logs: Info: Delivery start DCID 51145731 MID 106908193 to RID [0, 1, 2, 3, 4]
<22>Sep 23 09:25:21 mail_logs: Info: Message done DCID 51145731 MID 106908193 to RID [0, 1, 2, 3, 4] 
<22>Sep 23 09:25:21 mail_logs: Info: MID 106908193 RID [0, 1, 2, 3, 4] Response 'Ok'
<22>Sep 23 09:25:21 mail_logs: Info: Message finished MID 106908193 done

I have tried following several of the suggestions posted in forum questions about populating a summary index from a transaction, but they either don't populate the summary index or they don't capture the original fields in such a fashion that I can report upon the data. Is there a way to preserve this as a multiline event with fields in a summary index?

Lowell
Super Champion

Keeping your _raw event is not supported out of the box by the current summary indexing services. So you either have to extract all the desired fields in your summary-index-generating search, or create a custom summary indexing mechanism tailored to your specific needs. (This isn't as daunting as it sounds. The summary index process basically just writes out a new log file that Splunk then indexes, sticking the new events into the "summary" index.)


eventtype="cisco_esa" | transaction mid maxspan=180s mvlist=mailto | fillnull value=NULL status  | eval mailto=mvjoin(mailto, ";")| fields - _raw | fields mid, status, icid, mailfrom, mailto, subject, ...
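As a sketch of what the generating search could look like when scheduled every 30 minutes, as the question describes: the `earliest`/`latest` bounds are an assumption (a 30-minute window lagged by 5 minutes so late-arriving events have been indexed), `collect index=summary` writes the results into the summary index explicitly instead of relying on the saved search's summary-indexing action, and the field list is cut down to the fields named in this thread. Messages whose log lines straddle two windows will still be summarized as two partial transactions; this sketch does not address that.

```spl
eventtype="cisco_esa" earliest=-35m@m latest=-5m@m
| transaction mid maxspan=180s mvlist=mailto
| fillnull value=NULL status
| eval mailto=mvjoin(mailto, ";")
| fields - _raw
| fields mid, status, icid, mailfrom, mailto, subject
| collect index=summary
```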

Then, when you search your summary index, you'll need to split the "mailto" field back into multiple values on the ";" delimiter.
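A minimal sketch of the reporting side, assuming the summary events carry the `mailfrom` and `mailto` fields written by the search above with recipients joined by ";". `makemv` turns the joined string back into a multivalue field and `mvcount` counts the recipients of each message. The `mid=* mailto=*` filter is only a placeholder; in practice you would also restrict the search to the source your generating search writes, so other summaries stored in the same index are excluded.

```spl
index=summary mid=* mailto=*
| makemv delim=";" mailto
| eval recipients=mvcount(mailto)
| chart sum(recipients) over mailfrom
```

If you need one row per recipient instead (for example to report on individual mailto addresses), adding `| mvexpand mailto` after the `makemv` step would do that.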

cudgel
Path Finder

This worked for what I need. Cut the searches from 2 hours to 2 minutes when run against the summary index.

Lowell
Super Champion

I'm not 100% sure how multivalue fields are handled. I've updated the answer above with some ideas that may help...

cudgel
Path Finder

Can you point me to where I should start looking for how to do this? I am not really interested in the _raw events in the transaction per se, but in the fields this transaction would normally generate, which I could then run reporting commands on. Given those fields, I can customize the drilldown to use live data as needed.

mid="106908193"
icid="166122699"
mailfrom="jae-language+punEKLvrDf_UaCRaq1vSTfHzVbjrd@rswrte1xthvxlc.org"
mailto="towanda.ollie@cindy.edu"
mailto="carri.west@cindy.edu"
mailto="mark.susann@cindy.edu"
mailto="felisa@cindy.edu"
mailto="tabetha.bowman@cindy.edu"
dcid="51145731"
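Since only the fields are needed, one commonly used alternative to `transaction` is a `stats` grouping on the message ID, which is usually much cheaper on large data volumes. This is a swapped-in technique, not what the accepted answer proposes, and it is a sketch under stated assumptions: it does not enforce `maxspan=180s`, and grouping by `host` as well as `mid` assumes MIDs are only unique per appliance in the IronPort cluster. `values()` de-duplicates and sorts the recipients, which is fine for counting; `list()` would keep them in log order.

```spl
eventtype="cisco_esa"
| stats values(mailto) AS mailto, values(mailfrom) AS mailfrom, values(icid) AS icid, values(dcid) AS dcid, values(status) AS status, min(_time) AS _time BY host, mid
| fillnull value=NULL status
| eval mailto=mvjoin(mailto, ";")
```

For the sample message above, this would yield one row for MID 106908193 with the five recipients joined into a single mailto value, ready for the same summary indexing step.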
