Solved: Re: Calculation of availability

wcastillocruz · 12-04-2020

Hello dear community.
I'm a beginner with Splunk.
I would like your help on a project I am working on.
I have to calculate the availability of application services.
My input comes from a database via Splunk DB Connect.
In this data I receive all the events recorded in the database of a monitoring tool.
Using the timestamp, I would like to calculate the duration of an incident, from the moment the status becomes failed until it returns to normal.
This is difficult because an event can occur several times a day, so I need some kind of foreach that reads a line with severity 2 ("critical") and finds its return to normal, a line with severity 0 ("OK"),
using the timestamp, because the return to normal of course occurs afterwards.
I don't know if I managed to explain the problem. Thank you for your precious help.

abhijeet01 · 12-04-2020

Hi wcastillocruz,

I think the field called node_ref would help you find the duration between when the incident was triggered and when it was closed. This is possible with the transaction command.

You need to merge the events that share the same "node_ref" value, that is, the events belonging to the same incident; the resulting transaction then gives you the exact duration.

index=xyz node_ref=*
| transaction node_ref
| stats values(field_1) AS "field_1" values(duration) AS "Duration" by node_ref eventcount
| fields - eventcount

The Splunk documentation linked below should help you.

https://docs.splunk.com/Documentation/Splunk/8.1.0/SearchReference/Transaction

gcusello · 12-04-2020

Hi @wcastillocruz,

is there an ID in your data that identifies each transaction?

If yes, you can run something like this:

| stats earliest(_time) AS earliest latest(_time) AS latest BY ID

This way you have the start and the end of each transaction, and you can derive the uptime.
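As a minimal sketch of how the two timestamps could be turned into a duration: the index name xyz and the field name ID are placeholders (assumptions, not names from the original data), and start_time, end_time, duration_sec, and duration_readable are illustrative names chosen here.

```
index=xyz
| stats earliest(_time) AS start_time latest(_time) AS end_time BY ID
| eval duration_sec = end_time - start_time
| eval duration_readable = tostring(duration_sec, "duration")
```

Since _time is in epoch seconds, the subtraction gives the duration in seconds; tostring with the "duration" format only renders it as HH:MM:SS for display.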

If instead the only way to delimit a transaction is a starting and an ending string, you can use the transaction command:

| transaction startswith="start_string" endswith="end_string"

This way you get the duration of each transaction.

Ciao.

Giuseppe

wcastillocruz · 12-04-2020

Hello @gcusello,
thank you for responding quickly.
This is another difficulty, because I do not have a unique identifier that links the "Critical" alert to the "OK" alert.
I just have a unique reference for each row.
I think I should use the timestamp together with several columns as a composite key: after a "critical" alert, take the next row in time that matches Env + App + varname
with severity 0, and that is where I calculate the duration of the incident.
Am I clear?

gcusello · 12-04-2020

Hi @wcastillocruz,

yes, you can correlate events using both the common fields (Env + App + varname) and the start and end conditions of the transaction, something like this:

| transaction Env App varname startswith="start_string" endswith="end_string"

This way you get the duration of each transaction.
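If the start and end of an incident are better described by a field value than by a string, startswith and endswith also accept an eval expression. The following is only a sketch: the index name xyz is a placeholder, and it assumes the severity values from the question (2 for critical, 0 for OK) are stored in a field named severity, which may be named differently in the real data.

```
index=xyz
| transaction Env App varname startswith=eval(severity=2) endswith=eval(severity=0)
| table _time Env App varname duration eventcount
```

Each resulting row should then correspond to one incident for a given Env + App + varname combination, with duration in seconds.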

Ciao.

Giuseppe

wcastillocruz · 12-07-2020

Hello @gcusello, sorry for the late reply. I tried your suggestion. It seems to be correct, but I cannot work out what values to use for "startswith" and "endswith". Should they be values from the Env, App, and varname columns? I added another column, description, and used the value "FAILED" for "startswith" and "OK" for "endswith", but it doesn't return anything.

gcusello · 12-07-2020

Hi @wcastillocruz,

first, do not use the table command before the transaction command; use it after transaction.

Then try putting the startswith and endswith values in quotes.

Finally, to debug the transaction, try it without startswith and endswith and/or without one of the other transaction keys; maybe you have too many conditions and no events satisfy all of them.
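A stripped-down debugging search along those lines might look like the sketch below. The index name xyz is a placeholder, and Env, App, and varname are the grouping fields mentioned earlier in the thread; table is placed after transaction, as advised above.

```
index=xyz
| transaction Env App varname
| table _time Env App varname eventcount duration
```

If this returns grouped events with a plausible eventcount, the startswith and endswith conditions can be added back one at a time to see which one filters everything out. The keepevicted=true option of transaction may also help here, since it keeps transactions that never matched the end condition instead of discarding them.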

Ciao.

Giuseppe

wcastillocruz · 12-09-2020

Hi @gcusello, thanks for your help. I managed to display the start of an event and its return to normal on the same line. To close my question, I need one last boost: is it possible to subtract the two timestamps contained in the same field? The goal is to know the exact duration of the incident.

gcusello · 12-09-2020

Hi @wcastillocruz,

the transaction command already generates a field called "duration" (the time in seconds between the first and the last event of the transaction), so it's ready for you.
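To go from duration to the availability asked about at the start of the thread, one possible sketch is shown below. It assumes the index name xyz and the field name severity (both placeholders), a bounded search time range (otherwise info_max_time is not a usable number), and that availability is wanted per Env and App; downtime_sec, window_sec, and availability_pct are illustrative names.

```
index=xyz
| transaction Env App varname startswith=eval(severity=2) endswith=eval(severity=0)
| stats sum(duration) AS downtime_sec BY Env App
| addinfo
| eval window_sec = info_max_time - info_min_time
| eval availability_pct = round(100 * (window_sec - downtime_sec) / window_sec, 2)
```

Here addinfo supplies the boundaries of the search time range, so availability is computed as the share of that window not covered by incidents. Overlapping incidents on different varname values of the same App would be counted twice with this approach.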

Ciao.

Giuseppe
