Splunk Search

How to get results of two separate queries to calculate the percentage of the first result values present in the second?

freefall
Observer

Hi,

I have 2 queries , let's call them query_a & query_b.

query_a - gives me a table containing all the userAgent's that call a specific endpoint of my service, basically this gives me a list of all the clients who are on outdated versions of my app, as those are the only ones calling this deprecated endpoint. So this query gives me a list of all the outdated versions of my app that are still being used.

query_b - gives me a table containing all the userAgent's for every endpoint of my service. So this query gives me a list of every versions of my app that is being used.

here userAgent is a string like this  "app_name/app_version (device_name; device_OS_version)"

I am only concerned with the app_version out of this

I need to calculate, what percentage of userAgent's given by query_a (clients on outdated app_version) that are part of the results given by query_b (all clients) . I need to do this to figure out how many clients we have using outdated app version.

How do I achieve this.

 

query_a is like this:

 

index::apps source="/data/log/company/service/SERVICE-PUBLIC-API-access.log" "GET /service/diners/*/orders"
| rex "diners/(?<dinerId>.*)/orders HTTP/1.1\" (?<responseStatus>\d\d\d) .. ... \"(?<userAgent>.*?)\""
| dedup userAgent
| table userAgent

 

 

query_b is like this:

 

index::apps source="/data/log/company/service/SERVICE-PUBLIC-API-access.log"
| rex "HTTP/1.1\" (?<responseStatus>\d\d\d) .. ... \"(?<userAgent>.*?)\""
| dedup userAgent
| table userAgent

 

 

Labels (3)
0 Karma

woodcock
Esteemed Legend

index="apps" AND source="/data/log/company/service/SERVICE-PUBLIC-API-access.log"
| rex "diners/(?<dinerId>.*)/orders HTTP/1.1\" (?<responseStatus>\d\d\d) .. ... \"(?<userAgent>.*?)\""
| rex "HTTP/1.1\" (?<responseStatus>\d\d\d) .. ... \"(?<userAgent>.*?)\""
| stats count(dinerID) AS events_with_diners count AS total_events BY userAgent
| eval pct = 100 * (events_with_diners / events_with_diners)

0 Karma

freefall
Observer

I think I didn't explain my doubt clearly, what I want is to figure out, is out of all our clients for this API, what percentage are stuck on old versions of my app. I've added some more explanation to the original post. This query has one mistake where "events_with_diners" is used for both numerator and denominator, also when I ran it after some fixing like this:

 

index="apps" AND source="/data/log/company/service/SERVICE-PUBLIC-API-access.log"
| rex "diners/(?<dinerId>.*)/orders HTTP/1.1\" (?<responseStatus>\d\d\d) .. ... \"(?<userAgent>.*?)\""
| rex "HTTP/1.1\" (?<responseStatus>\d\d\d) .. ... \"(?<userAgent>.*?)\""
| stats count(dinerID) AS events_with_diners count AS total_events BY userAgent
| eval pct=round(100*(events_with_diners/total_events), 2)

 

 All the pct column values were 0.0

0 Karma

woodcock
Esteemed Legend

You have moved the goalposts completely, which is unreasonable.  The questions that you asked was answered and you should have accepted it and created a new question (think about those who google after you and get misdirected or miss your Q&A).  In any case, this should do what you are now asking:

|makeresults | eval raw=mvappend(
"/service/ditchers/foobarbat/1234/orders HTTP/1.1\" 200 XX YYY \"app_name1/app_version (device_name; device_OS_version)\"",
"/service/diners/diners/1234/orders HTTP/1.1\" 200 XX YYY \"app_name1/app_version (device_name; device_OS_version)\"",
"/service/diners/diners/9876/orders HTTP/1.1\" 200 XX YYY \"app_name2/app_version (device_name; device_OS_version)\"")
| mvexpand raw
| rename raw AS _raw
| rex "(?:\/service\/diners\/(?<dinerId>.*)\/orders )?HTTP\/1.1\" (?<responseStatus>\d\d\d) .. ... \"(?<userAgent>(?<appname>[^\/]+).*?)\""
| stats count(dinerId) AS events_with_diners count AS total_events values(appname) As appnames BY userAgent
| eval pct=round(100*(events_with_diners/total_events), 2)

0 Karma

yeahnah
Motivator

Hi @freefall 

You should be able to get that result with just one query with something like this...

index::apps source="/data/log/company/sedrvice/SERVICE-PUBLIC-API-access.log"
| rex "HTTP/1.1\" (?<responseStatus>\d\d\d) .. ... \"(?<userAgent>.*?)\""
| eval isdiners=if(searchmatch("GET /service/diners/*/orders"), "yes", "no")
| stats count AS total
        count(isdiners="yes") AS diners_count
  BY userAgent
| eval perc=round((diners_count/total)*100, 2)

The base search returns all results, including diners.  The eval statement checks if the diners string is matched.  The stats command counts the results by userAgent and then the eval works out the percentage.

Hope it helps 

0 Karma

freefall
Observer

I think I didn't explain my doubt clearly, what I want is to figure out, is out of all our clients for this API, what percentage are stuck on old versions of my app. I've added some more explanation to the original post.

this query gives me a table where all the perc column values are 0.00

0 Karma
Get Updates on the Splunk Community!

Tech Talk Recap | Mastering Threat Hunting

Mastering Threat HuntingDive into the world of threat hunting, exploring the key differences between ...

Observability for AI Applications: Troubleshooting Latency

If you’re working with proprietary company data, you’re probably going to have a locally hosted LLM or many ...

Splunk AI Assistant for SPL vs. ChatGPT: Which One is Better?

In the age of AI, every tool promises to make our lives easier. From summarizing content to writing code, ...