Splunk Search

Filtering data based on results of another sub-query

skribble5
Explorer

Hi team,

I have a query about sub-queries. I've searched this forum for a while and tried a few different things but nothing seems to give me the desired result.

I am doing analysis on data within a csv file. Eventually, I need to calculate the average of a column in this csv but only on SOME records. I have a sub-query which tells me all the users (field name is "user_only") whose records I want to exclude from the average calculation. The sub-query is also on the same csv file.

So, what I need is something that does: where user_only is NOT IN (...a list of alphanumeric identifiers)....

Here is a screenshot showing my current code, shows where in the code my sub-query is and also shows separately that the sub-query does give some results:

alt text

Here is the current search query. Would appreciate any help you can provide!

| inputlookup uao0nqok.csv 
| where read_seconds > 0
| eval campaign_delivery = substr(custom_type,len(custom_type)-7,8)
| eval user_only = substr(custom_type,0,17)  
| where user_only NOT in 
       [| inputlookup uao0nqok.csv 
        | where read_seconds > 0
        | eval user_only = substr(custom_type,0,17)  
        | eval first_char = substr(custom_type,0,1)
        | where first_char = "X" or first_char = "Y"  
        | eval goodTiming = if(read_seconds != "20", 1, 0)
        | stats max(goodTiming) as goodTiming sum(read_seconds) as read_seconds count as count by user_only
        | search goodTiming=0
        | fields user_only ]

| lookup "ERT_Campaign_mapping.csv" "Campaign Subgroup ID" as campaign_delivery OUTPUT "Campaign Subgroup Name", "Email Sent Date"
| stats sum(read_seconds) as read_seconds sum(count) as count_records by "Campaign Subgroup Name", campaign_delivery, "Email Sent Date"
| eval avg = read_seconds / count_records
0 Karma
1 Solution

knielsen
Contributor

Did you try replacing

| where user_only NOT in 

with

| search NOT

?

View solution in original post

0 Karma

lakshman239
Influencer

In your original search | inputlookup uao0nqok.csv
| where read_seconds > 0 | table read_seconds, count_records
do you have value expected values? . The search NOT is only excluding the users in exclude.csv to that of original file. So, all the columns in your original search/CSV should be available for you.

0 Karma

knielsen
Contributor

Did you try replacing

| where user_only NOT in 

with

| search NOT

?

0 Karma

skribble5
Explorer

Hi @knielsen. Thanks for responding. I now tried "search NOT" and got some interesting result. First of all, I got a result, which is progress (unfortunately I can't attach a screenshot here but will explain it).

I get result with these fields:
-Campaign Subgroup Name = looks OK
-campaign_delivery = looks OK
-Email sent date = looks OK
-read_seconds = looks OK
-count_records = this column is visible but with all blank values. No good
-avg = this column is expected but is not visible.

I believe as "count_records" is blank, my calculation of avg (which is read_seconds divided by count_records) is not coming through.

Any idea why that is the case? THis is my query now - i have moved sub-search to a different csv file. Note - the new csv file only has IDs (field name = user_only) but uao0nqok.csv has all other fields.

| inputlookup uao0nqok.csv 
| where read_seconds > 0
| eval campaign_delivery = substr(custom_type,len(custom_type)-7,8)
| eval user_only = substr(custom_type,0,17)  
| search NOT [inputlookup uao0nqok_exclude.csv ]

| lookup "ERT_Campaign_mapping.csv" "Campaign Subgroup ID" as campaign_delivery OUTPUT "Campaign Subgroup Name", "Email Sent Date"
| stats sum(read_seconds) as read_seconds sum(count) as count_records by "Campaign Subgroup Name", campaign_delivery, "Email Sent Date"
| eval avg = read_seconds / count_records
0 Karma

skribble5
Explorer

I have since simplified the average calculation and now I'm happy with the exclusion and results. Thanks!

0 Karma

lakshman239
Influencer

One option would be to use 2 lookups
- | inputlookup uao0nqok.csv | put your search here to exclude your users | outputlookup uao0nqok_updated.csv . Then you can run your first search against this new lookup

would this be ok?

skribble5
Explorer

Thanks @lakshman239 , I can try it. I now have the exclusion ids in a new file (just ids not full records) but I still don't know the best way to say exclude the records for these user ids. Don't think my NOT in syntax is correct:

| inputlookup uao0nqok.csv 
| where read_seconds > 0
| eval campaign_delivery = substr(custom_type,len(custom_type)-7,8)
| eval user_only = substr(custom_type,0,17)  
| where user_only NOT in [inputlookup uao0nqok_exclude.csv ]

| lookup "ERT_Campaign_mapping.csv" "Campaign Subgroup ID" as campaign_delivery OUTPUT "Campaign Subgroup Name", "Email Sent Date"
| stats sum(read_seconds) as read_seconds sum(count) as count_records by "Campaign Subgroup Name", campaign_delivery, "Email Sent Date"
| eval avg = read_seconds / count_records
0 Karma

lakshman239
Influencer

Pls change where to search
| search NOT [|inputlookup uao0nqok_exclude.csv |fields user_only ]

skribble5
Explorer

Ok, my initial average query was a bit complicated when I was trying with sub-queries. Now, that I have moved that calculation outside (via outputlookup), I have simplified the main query and now I'm happy with the results. Thank you very much!

0 Karma

skribble5
Explorer

Ok just tried that and I think we are getting close. I am now getting a result but now a couple of columns are missing.

To calculate average, I need 2 fields: read_seconds and count_records. I see value in read_seconds and that looks right. I see a column count_records but it has blank values. Because of this, my "eval avg = read_seconds / count_records" isn't appearing at all in result.

Can you see why my count_records is blank?

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...