The use case I am working on:
Here is a historic search which does this easily using append over the past 15 minutes, and which I am trying to convert to a real-time search.
index=i1 sourcetype=s1 uri_path="api1" | eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}") | eval SLA=1000 | stats count as count, count(eval(responsetime>SLA)) as violations,first(SLA) as SLA by uri_path | append [ search index=i1 sourcetype=s1 uri_path="api2" | eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}") | eval SLA=750 | stats count as count, count(eval(responsetime>SLA)) as violations,first(SLA) as SLA by uri_path ]
But I need to do this in real time, and append doesn't work there. I also suspect there may be issues with combining multiple real-time queries the way I combined them for historic searches, though I am not fully sure.
Here is one approach I tried.
index=i1 sourcetype=s1 uri_path="api1" OR uri_path="api2" | eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}") | eval SLA=750 | stats count as count, count(eval(responsetime>SLA)) as violations,first(SLA) as SLA by uri_path
The issue is that the SLA value is not the same for all APIs; it is different for each one.
Perhaps there are limitations on using multiple queries in real time, and perhaps a single query should handle this use case when converting it from a historic to a real-time search. Since append doesn't work, I am not sure whether map, join, etc. can work either, because they also involve two queries in conjunction.
I need help writing a Splunk query for this use case as a real-time search.
If there are limitations here, an alternative is a scheduled historic search that runs every minute. The query would then need to run fast and finish within the minute, so query performance acceleration could be a consideration.
Any suggestions for this as well?
Any help will be appreciated.
The best approach (and the one I am using as well) is to create a lookup table file for the SLA values and then reference it in your query using the lookup command.
Lookup table: api_sla.csv
Lookup Fields: api_name, sla
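For reference, a minimal api_sla.csv could look like the following (the SLA values here are illustrative, taken from the 1000/750 thresholds in the question; adjust them to your own APIs):

```
api_name,sla
api1,1000
api2,750
```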
Updated sample query (assuming the API name is uri_path; if not, use whichever field contains the API name in the lookup command):
index=i1 sourcetype=s1 uri_path="api1" OR uri_path="api2" | eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}") | lookup api_sla.csv api_name as uri_path OUTPUT sla as SLA | stats count as count, count(eval(responsetime>SLA)) as violations, first(SLA) as SLA by uri_path
http://docs.splunk.com/Documentation/Splunk/6.0.2/SearchReference/Streamstats
From a streaming perspective, streamstats looks more appropriate. How would you explain this to an unfamiliar audience? And what about the choice of stats over streamstats? It appears streamstats suits a moving average over the last 5 events, computed per event, whereas stats works on the entire collection of events in the real-time window. Are we losing anything by not using streamstats when real-time streaming is used?
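To make the stats-vs-streamstats distinction concrete for an unfamiliar audience, here is a small Python sketch (not SPL) of the two behaviors: stats produces one aggregate over every event in the window, while streamstats with window=5 produces a running value per event over the last 5 events. The function names and sample response times are illustrative, not from the thread.

```python
from collections import deque

def stats_avg(events):
    # Like "| stats avg(responsetime)": one aggregate over all events.
    return sum(events) / len(events)

def streamstats_moving_avg(events, window=5):
    # Like "| streamstats window=5 avg(responsetime)":
    # a running average, emitted once per event, over the last `window` events.
    buf = deque(maxlen=window)
    out = []
    for e in events:
        buf.append(e)
        out.append(sum(buf) / len(buf))
    return out

responsetimes = [100, 200, 300, 400, 500, 600]
print(stats_avg(responsetimes))               # 350.0 (single value)
print(streamstats_moving_avg(responsetimes))  # one value per event
```

So if you only need totals and violation counts per uri_path over the real-time window, stats is sufficient; streamstats adds value when you want a per-event rolling figure such as a moving average.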
Great. Please accept the answer if there are no follow-up questions.
Thanks. Works as expected!