Splunk Search

Issue with missing data when using transaction

ellstream44
Explorer

I use the following query

source="/opt/apps/spring-boot/abc/log/communication.log"
| rex "\"correlation\" : \"(?P<correlation>.*)\""
| transaction correlation

to pair up request/response entries, so that I can then show failure-rate statistics for requests to specific integrations.

However, it seems that the transaction command only shows hits for the first 52 minutes in this case when I search over the last 4 hours:

[Screenshot: 4-hour search showing 4,999 events, with results only for the first 52 minutes]


So I thought that perhaps there was a logging issue resulting in no entries, and searched the last 60 minutes (which logically should have returned 0 hits)

[Screenshot: 60-minute search showing 4,999 events, with results for the first 57 minutes]

But as you can see, I now got results for the first 57 minutes...
I thought that must mean the correlation at some point found a duplicate, so I searched for a correlation that occurred 15 minutes ago, but it was the only one, so I do not understand why it did not show up in the original 4-hour search.

Any help is greatly appreciated! 🙂 

//Jonathan






bowesmana
SplunkTrust

@ellstream44 

My first piece of advice is that transaction should never be the first choice for a Splunk query. It, along with some other commands, has restrictive memory constraints, and you can see seemingly random result sets depending on when you hit those limits. It is far more efficient to use stats.

Given the 4999 entries in both your screenshots, it may be that you are hitting 'maxopentxn', which is 5000 by default.
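If you do need to stay with transaction, those caps are set in limits.conf under the [transactions] stanza. A sketch of what that looks like (the setting names come from the stock limits.conf; the values shown are the documented defaults, so verify against your version before changing anything):

[transactions]
maxopentxn = 5000
maxopenevents = 100000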

However... I would suggest that instead of using transaction, you use stats. A very simple option to demonstrate the behaviour, once you've extracted the correlation id:

| stats count earliest(_time) as first latest(_time) as last values(*) as * by correlation
| eval duration=last-first

but now it's all about your data and what you want to do with it. From the above, you will have the duration from first to last event, and a count showing how many events there are for each correlation.
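If you want the first/last epoch times readable while keeping them numeric underneath, fieldformat does that; a minimal sketch (the stats line just repeats the one above):

| stats count earliest(_time) as first latest(_time) as last by correlation
| eval duration=last-first
| fieldformat first=strftime(first, "%F %T")
| fieldformat last=strftime(last, "%F %T")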

Hope this helps.


ellstream44
Explorer

Thank you! Indeed, the issue was the 4999 limit as you mentioned.
I took your advice and tried a stats query instead. I was able to find how many failed and how many completed requests there were, but the problem with stats is that I cannot seem to get it visualized with timechart in a way that works well.

My goal is basically this: to see the failed-percentage rate in a timechart format.
The only thing holding me back now is the 4999 limit. I have tried to put a where statement before the transaction command, but I cannot think of any way to reduce the 4999 entries besides reducing the search time span.

@bowesmana Do you have any advice on how to use stats in a similar way, or how to optimize this transaction query? 

[Screenshot: timechart produced by the query below]



Query: 
source="/opt/apps/spring-boot/abcde/log/communication.log"
| rex "\"status\" : (?<status>\d+)"
| rex "\"uri\" : \"https:\/\/abc\.company\.com:8243\/(?<integrationUrl>.*)"
| rex "\"correlation\" : \"(?P<correlation>.*)\""
| transaction correlation
| where like(integrationUrl, "createcustomerorder%")
| timechart span=180m count(eval(status>=0)) AS countTotal, count(eval(status>=300)) AS countFailed
| eval failedPercentage=((countFailed/countTotal)*100)
| fields _time failedPercentage

bowesmana
SplunkTrust

@ellstream44 

I posted a reply - but it seems it's lost in some 'unexpected error' when I try to view it.

I posted this solution

| makeresults 
| eval n=mvrange(1,501)
| mvexpand n
| eval t=(n*120)-(random()%120)
| eval _time=_time-t
| eval id=floor((n-1)/2)+1
| eval correlation="correlation_id".id
| eval integrationUrl=if(n%2=0, null(), if(random()%100<10,"randomurl", "createcustomerorder"))
| filldown integrationUrl
| eval status=if(random()%100<10,400,200)
| fields - n t
| eval COMMENT="----- ABOVE THIS IS SETTING UP EXAMPLE DATA ----"
| eval COMMENT="----- YOU CAN DO THIS BELOW TO ACHIEVE THE RESULT ----"
| bin _time span=30m
| stats values(*) as * by correlation _time
| where !isnull(mvfind(integrationUrl, "createcustomerorder"))
| eval f=if(!isnull(mvfind(status,"400")),1,0)
| stats count as countTotal count(eval(f=1)) as countFailed by  _time
| eval failedPercentage=((countFailed/countTotal)*100)
| fields _time failedPercentage

which creates some simulated data like your example, up to the COMMENT lines. After that, you can use bin and stats by _time to achieve what you're trying to do.
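Here is an untested sketch of the same pipeline adapted to your actual source, reusing your rex extractions in place of the simulated data (adjust the span to taste - the "^[3-9]\d\d$" regex is my assumption for "status >= 300" on three-digit codes):

source="/opt/apps/spring-boot/abcde/log/communication.log"
| rex "\"status\" : (?<status>\d+)"
| rex "\"uri\" : \"https:\/\/abc\.company\.com:8243\/(?<integrationUrl>.*)"
| rex "\"correlation\" : \"(?P<correlation>.*)\""
| bin _time span=180m
| stats values(*) as * by correlation _time
| where !isnull(mvfind(integrationUrl, "createcustomerorder"))
| eval f=if(!isnull(mvfind(status, "^[3-9]\d\d$")),1,0)
| stats count as countTotal count(eval(f=1)) as countFailed by _time
| eval failedPercentage=((countFailed/countTotal)*100)
| fields _time failedPercentage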

 

ellstream44
Explorer

Perfect, exactly what I needed! 🙂 
Thanks for your help!


bowesmana
SplunkTrust

See this run-anywhere example - paste this into the search window and you can then visualise the result as a timechart.

| makeresults 
| eval n=mvrange(1,501)
| mvexpand n
| eval t=(n*120)-(random()%120)
| eval _time=_time-t
| eval id=floor((n-1)/2)+1
| eval correlation="correlation_id".id
| eval integrationUrl=if(n%2=0, null(), if(random()%100<10,"randomurl", "createcustomerorder"))
| filldown integrationUrl
| eval status=if(random()%100<10,400,200)
| fields - n t
| eval COMMENT="----- ABOVE THIS IS SETTING UP EXAMPLE DATA ----"
| eval COMMENT="----- YOU CAN DO THIS BELOW TO ACHIEVE THE RESULT ----"
| bin _time span=30m
| stats values(*) as * by correlation _time
| where !isnull(mvfind(integrationUrl, "createcustomerorder"))
| eval f=if(!isnull(mvfind(status,"400")),1,0)
| stats count as countTotal count(eval(f=1)) as countFailed by  _time
| eval failedPercentage=((countFailed/countTotal)*100)
| fields _time failedPercentage

It's doing what you're trying to do, but with stats.

HOWEVER, you should be aware that the transaction command, and also stats above, will create multivalue fields for things like status and integrationUrl, so you cannot just do if(x="y"), as that will not work if the field is MV - hence the use of mvfind above.
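A quick run-anywhere illustration of that mvfind pattern on a multivalue field (mvappend/mvfind are standard eval functions; the field name and values are just made up for the demo):

| makeresults
| eval status=mvappend("200", "400")
| eval failed=if(!isnull(mvfind(status, "400")), 1, 0)

mvfind returns the index of the first value matching the regex, or null if nothing matches, which is why the isnull test works here.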

Question: what are you trying to achieve by collecting all events with the same correlation? In your query, what if there is a failure in a request _followed by_ a successful one?

Anyway, hopefully this can help you ...
