I have something of a challenge: how best to generate a single event in a summary index, based on a transaction across four different fields, when there is not a 1:1 relationship across all the fields?
I'm trying to build a summary index of mail flow information across Exchange and Cisco Ironports. The Ironports have a MID field that is specific to each message, plus an ICID and a DCID that are specific to each Incoming Connection and Destination Connection, respectively. I'd initially run transaction MSGID MID ICID DCID, but then discovered that a single ICID or DCID can be used by multiple MIDs.
I know I can solve this by generating an enormous table of all ICIDs, DCIDs and the information I want from them, then tossing that into a lookup table via | outputlookup, but having to run that as a scheduled search that must always complete before the primary summary indexing search seems very ugly.
Is there a better way?
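For readers who want to see the lookup-table idea from the question spelled out, here is a minimal Python sketch (not Splunk code). It assumes a simplified event shape: the field names MID, ICID and MSGID come from the question, while remote_ip, the sample values and both function names are hypothetical. It builds a table of connection details keyed by ICID, then produces one record per MID enriched from that table, so two messages sharing a connection each get their own summary.

```python
from collections import defaultdict

# Hypothetical, simplified events; real Ironport logs carry many more fields.
events = [
    {"ICID": "100", "remote_ip": "192.0.2.10"},   # connection-only event
    {"MID": "1", "ICID": "100"},                  # message 1 arrives on ICID 100
    {"MID": "2", "ICID": "100"},                  # message 2 reuses ICID 100
    {"MID": "1", "MSGID": "<a@example.com>"},
    {"MID": "2", "MSGID": "<b@example.com>"},
]

def build_connection_lookup(events):
    """Collect connection-level details keyed by ICID (the 'lookup table')."""
    lookup = defaultdict(dict)
    for ev in events:
        # Assumption: connection-only events have an ICID but no MID.
        if "ICID" in ev and "MID" not in ev:
            lookup[ev["ICID"]].update(
                {k: v for k, v in ev.items() if k != "ICID"}
            )
    return dict(lookup)

def summarize_messages(events, lookup):
    """Build one summary record per MID, enriched from the ICID lookup."""
    summary = defaultdict(dict)
    for ev in events:
        if "MID" in ev:
            summary[ev["MID"]].update(ev)
    for rec in summary.values():
        # A shared ICID is fine here: each MID copies the same details.
        rec.update(lookup.get(rec.get("ICID"), {}))
    return dict(summary)

lookup = build_connection_lookup(events)
messages = summarize_messages(events, lookup)
```

The sketch only shows the shape of the data flow. In Splunk the two steps would be separate searches, which is exactly the scheduling dependency the question is trying to avoid.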
As an aside from the main question -- suppose I'm trying to simplify 200 events down to 1 event to speed searching. It's not statistical work, not using the si commands, but speed is no less essential: is there a better way than using a summary index?
You may want to look at the searchtxn command. It may be better suited for what you're looking for.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Searchtxn
I could replicate its results by searching for:
host=MyExchangeServer OR host=MyIronport MSGID=[A Test Message ID Here] IronportMID=* | transaction MSGID
I would try to research it more myself, but there appears to be a total dearth of examples on answers or the web, beyond the official manpage. Anything I can change to give searchtxn a better shot?
I can't seem to get searchtxn to work. I've defined a transaction that I have verified works with the normal transaction command:
[NewMailFlowTransaction]
maxspan=24h
maxpause=24h
maxopentxn=42000
maxopenevents=400000
connected=t
fields=IronportMID,MSGID
search=host=MyExchangeServer OR host=MyIronport (MSGID=* OR IronportMID=*)
I run the search
| searchtxn NewMailFlowTransaction MSGID=[A Test Message ID Here]
but it only ever returns the one line that contains both the MSGID and the IronportMID.
I've never looked at searchtxn before -- let me check that out.
Good point, searchtxn is a better fit for this use case than transaction.
In general, transaction is best reserved for situations where you want to group "events" together for viewing by an analyst. Stats, timechart, and other reporting/transforming commands are usually a better choice for dealing with "results", population of the summary index, and similar use cases.
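To illustrate the distinction, here is a rough Python sketch (not Splunk code) of a stats-style aggregation: a single pass that keeps a small running summary per key instead of holding whole groups of events together. The event shape, the sample values and the summarize_by name are hypothetical; only MSGID and the host names come from this thread.

```python
# Hypothetical events; _time is a plain number here for simplicity.
events = [
    {"_time": 100, "MSGID": "<a@example.com>", "host": "MyIronport"},
    {"_time": 103, "MSGID": "<a@example.com>", "host": "MyExchangeServer"},
    {"_time": 101, "MSGID": "<b@example.com>", "host": "MyIronport"},
    {"_time": 109, "MSGID": "<a@example.com>", "host": "MyExchangeServer"},
]

def summarize_by(events, key):
    """One pass over the events, keeping a running summary per key value."""
    summary = {}
    for ev in events:
        if key not in ev:
            continue  # events without the key cannot be attributed
        rec = summary.setdefault(
            ev[key],
            {"first": ev["_time"], "last": ev["_time"],
             "count": 0, "hosts": set()},
        )
        rec["first"] = min(rec["first"], ev["_time"])
        rec["last"] = max(rec["last"], ev["_time"])
        rec["count"] += 1
        rec["hosts"].add(ev["host"])
    return summary

per_message = summarize_by(events, "MSGID")
```

Each key ends up with one compact record, which is the kind of result a summary index is meant to hold.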
I am confused as to why you are using the summary index to track individual message transactions. The summary index is designed to help aggregate statistical trends.
That aside, you can do this:
... | stats values(icid) as icid values(dcid) as dcid by mid | mvexpand icid | mvexpand dcid | ...
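As a rough illustration of what that pipeline produces, here is a Python sketch (not Splunk code) that mimics the two stages: collect the distinct icid and dcid values per mid, then expand them into one row per combination. The sample rows and function names are hypothetical, and keeping a message that lacks one of the fields (with None in its place) is a choice made for this sketch.

```python
from collections import defaultdict
from itertools import product

# Hypothetical rows, each already carrying a mid plus an icid or a dcid.
rows = [
    {"mid": "1", "icid": "100"},
    {"mid": "1", "dcid": "900"},
    {"mid": "1", "dcid": "901"},
    {"mid": "2", "icid": "100"},   # same incoming connection as mid 1
    {"mid": "2", "dcid": "900"},   # same destination connection as mid 1
]

def values_by_mid(rows):
    """Stage 1: distinct icid and dcid values per mid."""
    grouped = defaultdict(lambda: {"icid": set(), "dcid": set()})
    for row in rows:
        for field in ("icid", "dcid"):
            if field in row:
                grouped[row["mid"]][field].add(row[field])
    return grouped

def expand(grouped):
    """Stage 2: one output row per (icid, dcid) combination for each mid."""
    out = []
    for mid in sorted(grouped):
        # Sketch choice: keep the mid even if one field has no values.
        icids = sorted(grouped[mid]["icid"]) or [None]
        dcids = sorted(grouped[mid]["dcid"]) or [None]
        for icid, dcid in product(icids, dcids):
            out.append({"mid": mid, "icid": icid, "dcid": dcid})
    return out

expanded = expand(values_by_mid(rows))
```

Because the grouping key is mid, a connection shared by several messages simply appears in each message's rows instead of pulling those messages together.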
The rationale behind this is that I want to be able to quickly search through my email logs. The raw logs contain between 20 and 50 events for the average message, with some messages containing 200 events. All I actually want, though, is 1 event. Thus, a summary index speeds search considerably, and even more importantly, avoids the need to use transaction to get critical details. This is all before knowing about searchtxn, though, so that may be my silver bullet.
It's also worth asking, what exactly are the results you are trying to get out at the end? i.e., why are you creating this table? Sometimes it's easier in Splunk to not create a large intermediate result to get to where you want to go.
Interestingly enough, the transaction examples on the manual page (http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transaction) suffer from the same problem. In my environment, there are about 36% more dc(MIDs) than dc(DCIDs) when searching for just the single log message where they co-exist.