Using Transaction with One to Many Relationships

Splunk Employee

I have something of a challenge: how best to generate a single summary-index event based on a transaction across four different fields when the relationship among those fields is not 1:1?

I'm trying to build a summary index of mail flow information across Exchange and Cisco Ironports. The Ironports have a MID field that is specific to each message, and ICID and DCID fields that are specific to each Incoming Connection and Destination Connection, respectively. I initially ran transaction MSGID MID ICID DCID, but then discovered that a single ICID or DCID can be used by multiple MIDs.

I know I can solve this by generating an enormous table of all ICIDs, DCIDs and the information I want from them, then tossing that into a lookup table via |outputlookup, but having to run that as a scheduled search that will always complete before the primary summary indexing search seems very ugly.
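To make the lookup idea concrete, here is a minimal Python sketch of the join it implies: connection-level details are keyed by ICID once, then attached to every message that used that connection. The sample records and the remote_ip field are hypothetical and only illustrate the one-to-many shape; this is not how Splunk implements lookups.

```python
# Hypothetical connection-level records (what the lookup table would hold).
connections = [
    {"icid": "7", "remote_ip": "192.0.2.10"},
    {"icid": "8", "remote_ip": "192.0.2.20"},
]

# Hypothetical per-message records; two messages share ICID 7.
messages = [
    {"mid": "100", "icid": "7"},
    {"mid": "101", "icid": "7"},
    {"mid": "102", "icid": "99"},  # connection not present in the lookup
]

# Build the lookup once, keyed by ICID.
icid_lookup = {conn["icid"]: conn for conn in connections}

# Enrich each message with its connection's details (None if no match).
enriched = [
    {**msg, "remote_ip": icid_lookup.get(msg["icid"], {}).get("remote_ip")}
    for msg in messages
]

for row in enriched:
    print(row)
```

Because each message is enriched independently, a shared ICID never causes two MIDs to be merged into one record.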

Is there a better way?

Splunk Employee

As an aside from the main question -- suppose I'm trying to simplify 200 events down to 1 event to speed searching. It's not statistical work, not using the si commands, but speed is no less essential: is there a better way than using a summary index?

Splunk Employee

You may want to look at the searchtxn command. It may be better suited for what you're looking for.

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Searchtxn

Splunk Employee

I could replicate its results by searching for:
host=MyExchangeServer OR host=MyIronport MSGID=[A Test Message ID Here] IronportMID=* | transaction MSGID

I would research it more myself, but there appears to be a total dearth of examples on Answers or the web beyond the official manpage. Is there anything I can change to give searchtxn a better shot?

Splunk Employee

I can't seem to get searchtxn to work. I've defined a transaction that I have verified works with the normal transaction command:

[NewMailFlowTransaction]
maxspan=24h
maxpause=24h
maxopentxn=42000
maxopenevents=400000
connected=t
fields=IronportMID,MSGID
search=host=MyExchangeServer OR host=MyIronport  (MSGID=* OR IronportMID=*)

I run the search | searchtxn NewMailFlowTransaction MSGID=[A Test Message ID Here] but it only ever returns the one line that contains both the MSGID and the IronportMID.

Splunk Employee

I've never looked at searchtxn before -- let me check that out.

Splunk Employee

Good point; searchtxn is a better fit for this use case than transaction.

Splunk Employee

In general, transaction is best reserved for situations where you want to group "events" together for viewing by an analyst. Stats, timechart, and other reporting/transforming commands are usually a better choice for dealing with "results", population of the summary index, and similar use cases.

Splunk Employee

I am confused about why you are using the summary index to track individual message transactions. The summary index is designed to help aggregate statistical trends.

That aside, you can do this:

 ... | stats values(icid) as icid values(dcid) as dcid by mid | mvexpand icid | mvexpand dcid | ...
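For readers who want to see what that pipeline produces, the following Python sketch approximates it on a handful of hypothetical events: first collect the distinct icid and dcid values per mid, then expand them into one row per (mid, icid, dcid) combination. It is a rough model of the search above, not Splunk's implementation, and the sample events are invented for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample events; field names follow the search above.
events = [
    {"mid": "100", "icid": "7"},
    {"mid": "100", "dcid": "9"},
    {"mid": "101", "icid": "7"},   # same incoming connection, different message
    {"mid": "101", "dcid": "9"},
    {"mid": "101", "dcid": "12"},
    {"icid": "7"},                 # no mid, so "by mid" leaves it out
]

def stats_values_by_mid(events):
    """Approximates: stats values(icid) as icid values(dcid) as dcid by mid."""
    grouped = defaultdict(lambda: {"icid": set(), "dcid": set()})
    for event in events:
        mid = event.get("mid")
        if mid is None:
            continue
        for field in ("icid", "dcid"):
            if field in event:
                grouped[mid][field].add(event[field])
    # Distinct values, sorted as strings.
    return {
        mid: {field: sorted(vals) for field, vals in fields.items()}
        for mid, fields in grouped.items()
    }

def mvexpand_both(table):
    """Approximates: mvexpand icid | mvexpand dcid (one row per combination)."""
    rows = []
    for mid in sorted(table):
        icids = table[mid]["icid"] or [None]  # keep the row if a field is empty
        dcids = table[mid]["dcid"] or [None]
        for icid, dcid in product(icids, dcids):
            rows.append({"mid": mid, "icid": icid, "dcid": dcid})
    return rows

rows = mvexpand_both(stats_values_by_mid(events))
for row in rows:
    print(row)
```

Grouping by mid is what keeps messages that share a connection apart: ICID 7 appears in the rows for both messages, but the two messages are never combined.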

Splunk Employee

The rationale behind this is that I want to be able to quickly search through my email logs. The raw logs contain between 20 and 50 events for the average message, with some messages containing 200 events. All I actually want, though, is 1 event. Thus, a summary index speeds search considerably, and even more importantly, avoids the need to use transaction to get critical details. This is all before knowing about searchtxn, though, so that may be my silver bullet.

Splunk Employee

It's also worth asking what results you are ultimately trying to get, i.e., why are you creating this table? Sometimes it's easier in Splunk to skip the large intermediate result and go straight to where you want to end up.

Splunk Employee

Interestingly enough, the transaction examples on the manual page ( http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transaction ) will suffer the same problem. In my environment, there are about 36% more dc(MIDs) than dc(DCIDs), when doing a search for just the single log message where they co-exist.
