hello, everyone
I have a question about how to write a subquery in Splunk.
for example
I would like to get a list of productId that was returned, but later was not purchased again.
NOT IN Subquery part.
How can I accomplish this?
index=main sourcetype=access_combined_wcookie action=returned
NOT IN [search index=main sourcetype=access_combined_wcookie action=purchase
|table clientip]
| stats count, dc(productId) by clientip
Thank you in advance
There can be probably more than one approach to your problem (one was already presented) but the subquery will not work this way.
As subquery is executed and the results are returned, they are "pasted" into the original query as a condition using field names and values returned from the subquery.
So the IN operator will not with them. With it after subquery expansion you'd have (hypoteticaly - it's not a valid syntax) something like
index=main sourcetype=access_combined_wcookie action=returned
NOT IN (clientip=value1 OR clientip=value2 OR ...)
The last() approach that @bowesmana showed is a neat trick but relies on the time succession.
It might be ok, but you might as well need something a bit different. Depends on your actual data.
Technically it is possible to get the subsearch to return a search string that will work with NOT IN, the syntax would be
<search> NOT your_field IN [
search <search>
| stats count by your_field
| fields your_field
| rename your_field as search
| format "(" "" "" "" "" ")" ]
but there is no value in this for the OP's problem, as this will not handle the basic problem of how to determine which event came after.
Nice to know. I always thought you could only return results as key/value pairs which get interpreted as additional conditions. Can you return any arbitrary string?
Take a look at the format command, the 6 components give you quite a lot of flexibility in returning different ways
https://docs.splunk.com/Documentation/Splunk/8.2.2/SearchReference/Format
Good to know. That's what I needed.
Thanks!
In order to find an event that did not occur later than one that did you are going to have to use a different method. I don't believe it will be possible to use the subquery to filter the parent query as you are trying.
Instead I would suggest you need to use stats, e.g.
index=main sourcetype=access_combined_wcookie (action=returned OR action=purchase)
| stats latest(action) as lastAction by productId clientIp
| where lastAction="returned"
This will give you the last action for the product/clientIp and then you just want to see if the last action is returned.
Naturally within your search results, the clientIp may have had any number of 'transactions' for that productId, so you will need to consider how you want to address that.