Knowledge Management

How to make transaction count correct in Summary Index

Engager

I have run into a situation that I cannot figure out how to solve.
My task is to write accurate transaction count totals into a summary index. Each transaction consists of two events (say, one begin event and one end event). Because of the scheduled search's time range, there are always a few transactions broken at the beginning and end of the range: some transactions show only their end event, others only their begin event. In the logs those transactions are actually complete if I search over a broader time range. In interactive searching this is not much trouble, since I can filter out all eventcount=1 results to keep only complete transactions. But when I try to build a summary index with exact counts, it gets confusing.
If I filter out eventcount=1, then transactions whose events cross two time ranges are lost forever. If I keep all the eventcount=1 results, the transaction total comes out larger than the actual count, because each eventcount=1 transaction is counted twice, once by the earlier scheduled search and once by the later one. And I cannot simply count the begin events, because I also need the return code from the end event and have to calculate the duration into the summary index as well.
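For context, the interactive-mode filtering I mention works along these lines; session_id and the BEGIN/END event markers are placeholders for whatever fields the actual data uses:

index=myapp "search criteria"
| transaction session_id startswith="event=BEGIN" endswith="event=END"
| where eventcount > 1

This keeps only complete transactions, but as described it silently drops any transaction that straddles the boundary of a scheduled search's time range.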

I also considered closed_txn, but it does not seem to work in this scenario either.

Any suggestions for how to conquer this?

1 Solution

Motivator

So the fundamental problem is that you want to limit the scope of the search for the start of the transaction to the timeframe of the summary search, while using a wider time range for the subsequent events, so you can get complete transactions even when they cross over.

I think this is something set union can help with.

First of all, you need to know the maximum expected duration of a transaction, and make sure you run your summary search with enough room after the end of the timeframe for transactions to complete. So if a transaction can last 30 seconds at most, and you're capturing 15 minutes of transactions, run the search from -16m@m through now. Then, use something like the following:

| set union
    [search earliest=-16m@m latest=-1m@m "criteria to get only events that start transactions"]
    [search "criteria that gets all relevant events EXCEPT the ones that start the transaction"]
| transaction startswith="starting qualifier" endswith="ending qualifier"

This takes the results of the two subsearches, which have different time ranges, and unions them into a single result set. THEN you run transaction on that result set. This makes sure you only get transactions that start within your specific 15-minute timeframe, though they can extend past it. With this, a transaction that crosses over is counted once, in the span it starts in, but not in the subsequent span.
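To make that concrete, here is a sketch with placeholder values filled in. The index, sourcetype, session_id field, and BEGIN/END markers are assumptions about how the data might be structured, not anything from the original question. Note that set is a generating command, so the search starts directly with it; the second subsearch has no explicit time range and simply inherits the scheduled search's -16m@m-to-now window:

| set union
    [search index=myapp sourcetype=app_txn event=BEGIN earliest=-16m@m latest=-1m@m]
    [search index=myapp sourcetype=app_txn event=END]
| transaction session_id startswith="event=BEGIN" endswith="event=END"
| where eventcount > 1

The where clause drops any transaction still missing its end event, which can only happen if the transaction exceeded the assumed maximum duration.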

Not sure if there is an easier way to do this, but this does work.

EDIT: It's worth noting that the set command doesn't return events; it returns fields (which include _raw by default). It would be cleaner to have the two subsearches return only the fields you care about for your final transaction and aggregation.
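Building on that note, the transaction output (which includes an automatically computed duration field alongside eventcount) can then be aggregated and written to the summary index. This is only a sketch; the field names, index names, and the use of collect rather than a summary-indexing alert action are all assumptions:

| set union
    [search index=myapp event=BEGIN earliest=-16m@m latest=-1m@m | fields _time _raw session_id]
    [search index=myapp event=END | fields _time _raw session_id return_code]
| transaction session_id startswith="event=BEGIN" endswith="event=END"
| where eventcount > 1
| stats count AS txn_count avg(duration) AS avg_duration BY return_code
| collect index=my_txn_summary

Restricting each subsearch with fields keeps the unioned result set small, and the stats step gives you the per-return-code counts and durations the original question asked for.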



Engager

It is a very smart approach, thanks.
