transaction vs stats commands

cfrln
Explorer

When should I use the transaction command and when should I use stats?

I could use a recap...

1 Solution

Stephen_Sorkin
Splunk Employee

The transaction command is most useful in two specific cases:

  1. A unique ID (from one or more fields) alone is not sufficient to discriminate between two transactions. This is the case when the identifier is reused, for example web sessions identified by cookie or client IP. Here, a time span or pauses are also used to segment the data into transactions. In other cases where an identifier is reused, say in DHCP logs, a particular message may identify the beginning or end of a transaction.

  2. You want to see the raw text of the events combined, rather than an analysis of the constituent fields of the events.

In other cases, it's usually better to use stats, as its performance is higher, especially in a distributed search environment. Often there is a unique ID, and stats can be used.

For example, to compute statistics on the duration of trades identified by the unique ID "trade_id", the following searches yield the same answer:

... | transaction trade_id | chart count by duration span=log2

and

... | stats range(_time) as duration by trade_id | chart count by duration span=log2

The second search is more efficient.

If, however, trade_ids are reused but each trade ends with some text "END", the only viable solution is:

... | transaction trade_id endswith=END | chart count by duration span=log2

If, instead, trade_ids are not reused within 10 minutes, the solution is:

... | transaction trade_id maxpause=10m | chart count by duration span=log2
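
As a rough sketch of how the duration statistics could be summarized directly instead of charted: transaction adds duration and eventcount fields to each grouped event, so a stats command can follow it. The 10-minute pause is carried over from the search above; the output field names (trades, avg_duration, and so on) are only illustrative.

```spl
... | transaction trade_id maxpause=10m
    | stats count AS trades avg(duration) AS avg_duration max(duration) AS max_duration avg(eventcount) AS avg_events
```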

woodcock
Esteemed Legend

One other surprising and wonderful thing about the transaction command is that it recognizes transitive relationships. If some events have userID & src_IP and others have sessionID & src_IP and still others have sessionID & userID, the transaction command will be able to recognize the transitive relationships and bundle them all together with a single command; this is not the case for stats.
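
A minimal sketch of what that might look like, assuming the field names from the description above (userID, src_IP, sessionID) and a hypothetical 30-minute pause limit; the base search is elided:

```spl
... | transaction userID src_IP sessionID maxpause=30m
    | table _time duration eventcount userID src_IP sessionID
```

Events that share any of the listed fields in a chain can end up in the same transaction, provided they arrive in an order that lets the command link them. With stats ... by userID src_IP sessionID, events missing any one of those fields would typically be dropped from the results.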

gkanapathy
Splunk Employee

Both are similar in that they allow you to aggregate individual events/lines together.

However, stats is meant to calculate statistical values on events grouped by the value of fields, and discards the events.

transaction can also group events based on the same field values, but it does not compute statistics over the grouped events (other than the duration between the oldest and newest), and it retains the raw events and other field values from the original events. transaction can also group events using much more complex criteria, such as limiting the grouping by time span or delays, or requiring terms that define the start or the end of a group.

There is a small set of use cases that can be solved with either one, primarily through clever use of stats. Mostly these use some variation of stats max(_time),min(_time) by grouping_field to compute the duration of a group, in lieu of using transaction.
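
One way such a variation might be written out in full, assuming grouping_field stands in for whatever field identifies a group; the first_seen, last_seen, events, and duration names are illustrative:

```spl
... | stats min(_time) AS first_seen max(_time) AS last_seen count AS events by grouping_field
    | eval duration = last_seen - first_seen
```

This yields one row per grouping_field value, with a duration in seconds comparable to the one transaction computes, but without the raw events.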

In some cases stats may be less resource-intensive than transaction, though in those cases where either command can be used, any difference is likely to be small.

cervelli
Splunk Employee

Transaction marks a series of events as interrelated, based on a shared piece of common information, e.g. the flow of a packet based on clientIP address, or a purchase based on user_ID.

Stats produces statistical information by looking at a group of events. It is primarily used when the field(s) in question have a numeric value and you want to do a statistical calculation, e.g. computing the average time to complete a transaction based on the averaged sum of all latencies, or finding retry attempts that exceed the session timeout by more than 2 standard deviations.

Both combine events. However, transaction creates relationships based on metadata you provide, while stats calculates statistical relationships based on values or relationships already defined (by you, or by Splunk).
