Hi team,
I have a query about sub-queries. I've searched this forum for a while and tried a few different things but nothing seems to give me the desired result.
I am doing analysis on data within a csv file. Eventually, I need to calculate the average of a column in this csv but only on SOME records. I have a sub-query which tells me all the users (field name is "user_only") whose records I want to exclude from the average calculation. The sub-query is also on the same csv file.
So, what I need is something that does: where user_only is NOT IN (...a list of alphanumeric identifiers)....
Here is a screenshot showing my current code, shows where in the code my sub-query is and also shows separately that the sub-query does give some results:
Here is the current search query. Would appreciate any help you can provide!
| inputlookup uao0nqok.csv
| where read_seconds > 0
| eval campaign_delivery = substr(custom_type,len(custom_type)-7,8)
| eval user_only = substr(custom_type,0,17)
| where user_only NOT in
[| inputlookup uao0nqok.csv
| where read_seconds > 0
| eval user_only = substr(custom_type,0,17)
| eval first_char = substr(custom_type,0,1)
| where first_char = "X" or first_char = "Y"
| eval goodTiming = if(read_seconds != "20", 1, 0)
| stats max(goodTiming) as goodTiming sum(read_seconds) as read_seconds count as count by user_only
| search goodTiming=0
| fields user_only ]
| lookup "ERT_Campaign_mapping.csv" "Campaign Subgroup ID" as campaign_delivery OUTPUT "Campaign Subgroup Name", "Email Sent Date"
| stats sum(read_seconds) as read_seconds sum(count) as count_records by "Campaign Subgroup Name", campaign_delivery, "Email Sent Date"
| eval avg = read_seconds / count_records
In your original search | inputlookup uao0nqok.csv
do you have value expected values? . The search NOT is only excluding the users in exclude.csv to that of original file. So, all the columns in your original search/CSV should be available for you.
| where read_seconds > 0 | table read_seconds, count_records
Did you try replacing
| where user_only NOT in
with
| search NOT
?
Hi @knielsen. Thanks for responding. I now tried "search NOT" and got some interesting result. First of all, I got a result, which is progress (unfortunately I can't attach a screenshot here but will explain it).
I get result with these fields:
-Campaign Subgroup Name = looks OK
-campaign_delivery = looks OK
-Email sent date = looks OK
-read_seconds = looks OK
-count_records = this column is visible but with all blank values. No good
-avg = this column is expected but is not visible.
I believe as "count_records" is blank, my calculation of avg (which is read_seconds divided by count_records) is not coming through.
Any idea why that is the case? THis is my query now - i have moved sub-search to a different csv file. Note - the new csv file only has IDs (field name = user_only) but uao0nqok.csv has all other fields.
| inputlookup uao0nqok.csv
| where read_seconds > 0
| eval campaign_delivery = substr(custom_type,len(custom_type)-7,8)
| eval user_only = substr(custom_type,0,17)
| search NOT [inputlookup uao0nqok_exclude.csv ]
| lookup "ERT_Campaign_mapping.csv" "Campaign Subgroup ID" as campaign_delivery OUTPUT "Campaign Subgroup Name", "Email Sent Date"
| stats sum(read_seconds) as read_seconds sum(count) as count_records by "Campaign Subgroup Name", campaign_delivery, "Email Sent Date"
| eval avg = read_seconds / count_records
I have since simplified the average calculation and now I'm happy with the exclusion and results. Thanks!
One option would be to use 2 lookups
- | inputlookup uao0nqok.csv | put your search here to exclude your users | outputlookup uao0nqok_updated.csv . Then you can run your first search against this new lookup
would this be ok?
Thanks @lakshman239 , I can try it. I now have the exclusion ids in a new file (just ids not full records) but I still don't know the best way to say exclude the records for these user ids. Don't think my NOT in syntax is correct:
| inputlookup uao0nqok.csv
| where read_seconds > 0
| eval campaign_delivery = substr(custom_type,len(custom_type)-7,8)
| eval user_only = substr(custom_type,0,17)
| where user_only NOT in [inputlookup uao0nqok_exclude.csv ]
| lookup "ERT_Campaign_mapping.csv" "Campaign Subgroup ID" as campaign_delivery OUTPUT "Campaign Subgroup Name", "Email Sent Date"
| stats sum(read_seconds) as read_seconds sum(count) as count_records by "Campaign Subgroup Name", campaign_delivery, "Email Sent Date"
| eval avg = read_seconds / count_records
Pls change where to search
| search NOT [|inputlookup uao0nqok_exclude.csv |fields user_only ]
Ok, my initial average query was a bit complicated when I was trying with sub-queries. Now, that I have moved that calculation outside (via outputlookup), I have simplified the main query and now I'm happy with the results. Thank you very much!
Ok just tried that and I think we are getting close. I am now getting a result but now a couple of columns are missing.
To calculate average, I need 2 fields: read_seconds and count_records. I see value in read_seconds and that looks right. I see a column count_records but it has blank values. Because of this, my "eval avg = read_seconds / count_records" isn't appearing at all in result.
Can you see why my count_records is blank?