Splunk Search

Is the Transaction command suitable for large volumes of data and what is the benefit of using this command?

alexandermunce
Communicator

After reading various questions and answers on the topic, along with the relevant Splunk documentation, I am still unsure whether the transaction command is suitable for large volumes of data (say, for example, 500,000+ events across two sources).

When I have tested this command on my data sources, I notice that it seemingly processes only a portion of the search, with the remaining time period returning zero results.

Would this be due to the high demand the command places on server resources?

Related question:

Does the transaction command actually concatenate the raw data and fields from various events, combining them into a single event in your search results?

What is the benefit of using this command, and how does it assist in aggregating related events in terms of the data returned?

Is it not just as feasible to use the stats command to display your related event data (combined with eval functions such as coalesce)?

1 Solution

niketnilay
Legend

Transaction is just one of several event correlation mechanisms that SPL offers; however, its application is use-case specific. You can choose between stats, join, append, appendcols, lookup, transaction, and subsearches. Refer to the following documentation on choosing between the various correlation mechanisms.

http://docs.splunk.com/Documentation/Splunk/6.5.1/Search/Abouteventcorrelation

In your case, transaction is dropping events because there are too many events to correlate. You can add keepevicted=true; however, query performance will degrade, as Splunk will not mark any event as orphaned until the search completes.
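As a hedged sketch of what that option might look like (the shared field session_id and the two sourcetypes are placeholders for your own data):

    sourcetype=sourceA OR sourcetype=sourceB
    | transaction session_id keepevicted=true maxspan=10m

With keepevicted=true, evicted transactions are kept in the results and can be identified by closed_txn=0, so you can at least see which correlations were cut short rather than having them silently dropped.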

In your case, stats might work out better. Try to specify both sourcetypes in the base search so that events from the two sources are brought together, and then aggregate on your common fields. If you can provide a sample or mocked-up data example from both sources, we will be able to assist you further.
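A stats-based correlation along those lines might look something like this (a sketch only; sourceA, sourceB, and the id field names are placeholders, with coalesce used to normalise differently named key fields into one):

    sourcetype=sourceA OR sourcetype=sourceB
    | eval common_id=coalesce(id_from_sourceA, id_from_sourceB)
    | stats min(_time) as start_time, max(_time) as end_time, values(sourcetype) as sources by common_id

Because stats is distributable, the indexers can pre-aggregate their own events, which is why this pattern tends to scale to event volumes where transaction struggles.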

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"


woodcock
Esteemed Legend

The main problem with transaction is that the work must be done on the search head, which means you have two very important downsides. First, there is no map-reduce: the indexers, instead of sharing the processing load and reducing the size of the partial results, are merely file servers for the data. Second, and even more important, nothing can be discarded along the way, which means that as you build aggregate events out of the raw events, you get an enormous explosion of RAM consumption on the search head. Exhausting RAM is what causes transaction to give up, and the worst part is that it does so silently, with no direct indication. The good news is that you can almost always use stats instead. Check out Nick Mealy's Virtual .conf session (March 2016) here:
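For instance, a typical duration-style transaction can usually be re-expressed with stats (a sketch; the sourcetype and session_id field are placeholders for your own data):

    sourcetype=web_access
    | stats earliest(_time) as start, latest(_time) as end, count by session_id
    | eval duration=end-start

Because stats is distributable, the indexers do the heavy lifting and only small partial aggregates travel to the search head, avoiding the RAM blow-up described above.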

http://wiki.splunk.com/Virtual_.conf


snoobzilla
Builder

Nick from Sideview gave a great presentation on the use of stats...

http://conf.splunk.com/sessions/2016-sessions.html#search=Let%20Stats%20Sort%20Them%20Out&

nabeel652
Builder

The transaction command is very heavy on resources, so the first consideration is how much processing power you have. Yes, it can be used on large amounts of data, but you need to carefully examine your data and its behaviour, and then select the appropriate transaction definition options, including startswith, endswith, maxspan, maxevents, maxpause, keeporphans, unifyends, etc.:
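As a hedged illustration of pinning the transaction definition down (the sourcetype, session_id field, and login/logout markers here are placeholders for your own data):

    sourcetype=app_logs
    | transaction session_id startswith="action=login" endswith="action=logout" maxspan=30m maxevents=500

Constraining startswith/endswith together with the max* limits lets Splunk close transactions early instead of holding everything open in memory, which is what makes the command viable on larger data sets.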

https://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Transaction
