Hello,
I have a search that is joining two searches (one for cart details and one for items that have been brought to the carts). I am trying to count the number of items that have been moved to each cart, but my query is not keeping result that have an item count of 0.
How can I keep the results with an item count of 0? My search is below.
index=INDEX host=HOSTNAME sourcetype=SOURCETYPE
| rex field=_raw "Cart:\s(?<cart_id>\d+)\sbegin"
| rex field=_raw "Cart:\s(?<cart_id>\d+)\sselect\snew\ssection:\s(?<section>\d+)"
| rex field=_raw "Cart:\s(?<cart_id>\d+)\sset\scurrent\slocation\ssuccess:\s(?<section_name>\w+-\d+)"
| rex field=_raw "Cart:\s(?<cart_id>\d+)\snext\slocation\sis:(?<section_name>\w+-\d+)"
| rex field=_raw "Cart:\s(?<cart_id>\d+)\sset\scurrent\slocation\ssuccess:\s\w+-(?<station_number>\d+)"
| rex field=_raw "Cart:\s(?<cart_id>\d+)\snext\slocation\sis:\w+-(?<station_number>\d+)"
| eval select_time=if(like(_raw, "%set current location%"), _time, null)
| eval selected_at=strftime(select_time, "%Y-%m-%d %H:%M:%S")
| where cart_id>200 and ((like(_raw, "%next location%") and like(section_name, "AA%")) or (like(_raw, "%set current location%") and like(section_name, "BB%")) or like(_raw, "%begin%") or (like(_raw, "%select new section%")))
| stats latest(section) as section_id latest(section_name) as section_name latest(station_number) as station_number latest(selected_at) as selected_at by cart_id
| where isnotnull(section_name) and isnotnull(cart_id)
| join station_number [search index=INDEX host=HOSTNAME sourcetype=SOURCETYPE
| rex field=_raw "Move\sItem-(?<item>\w+\-\d+\-\d+\-\d+)\sfrom\sBin-\d+-\d+\sto\sLocation:(?<station_number>\d+)"
| where isnotnull(station_number)
| eventstats count(item) as item_count latest(_time) as last_item_time by station_number
| eval last_item_time=strftime(last_item_time, "%Y-%m-%d %H:%M:%S")
| table station_number, item_count, last_item_time]
| where like(section_name, "%AA%") or like(section_name, "%BB%")
| table cart_id, section_id, section_name, station_number, selected_at, last_item_time, item_count
If you want to keep the cart details (outer search) where the inner search (items) does not have any items for the station number, then you should use an outer/left join, i.e.
| join type=left station_number [...]
and then you need to do what you want to do with last_item_time and item_count to either leave those as null or put in some default value, e.g.
| fillnull value=0 item_count
by the way, your inner search is doing this
search index=INDEX host=HOSTNAME sourcetype=SOURCETYPE
| rex field=_raw "Move\sItem-(?<item>\w+\-\d+\-\d+\-\d+)\sfrom\sBin-\d+-\d+\sto\sLocation:(?<station_number>\d+)"
| where isnotnull(station_number)
| eventstats count(item) as item_count latest(_time) as last_item_time by station_number
| eval last_item_time=strftime(last_item_time, "%Y-%m-%d %H:%M:%S")
| table station_number, item_count, last_item_time
I don't believe you should be using eventstats - you are only interested in 3 fields, station_number, item_count and last_item_time, so use stats instead. It will make your inner search faster.
Note: Use of join is very dependent on data sizes - if you are joining against an inner data set that is large (typically >50,000) you will get unreliable outputs.
Note2: It is almost always possible to use a single search with a stats command to aggregate results like these.
If you want to keep the cart details (outer search) where the inner search (items) does not have any items for the station number, then you should use an outer/left join, i.e.
| join type=left station_number [...]
and then you need to do what you want to do with last_item_time and item_count to either leave those as null or put in some default value, e.g.
| fillnull value=0 item_count
by the way, your inner search is doing this
search index=INDEX host=HOSTNAME sourcetype=SOURCETYPE
| rex field=_raw "Move\sItem-(?<item>\w+\-\d+\-\d+\-\d+)\sfrom\sBin-\d+-\d+\sto\sLocation:(?<station_number>\d+)"
| where isnotnull(station_number)
| eventstats count(item) as item_count latest(_time) as last_item_time by station_number
| eval last_item_time=strftime(last_item_time, "%Y-%m-%d %H:%M:%S")
| table station_number, item_count, last_item_time
I don't believe you should be using eventstats - you are only interested in 3 fields, station_number, item_count and last_item_time, so use stats instead. It will make your inner search faster.
Note: Use of join is very dependent on data sizes - if you are joining against an inner data set that is large (typically >50,000) you will get unreliable outputs.
Note2: It is almost always possible to use a single search with a stats command to aggregate results like these.
Thank you! This worked perfectly. I'm also making the changes you recommended. I'm not too familiar with what can optimize these searches, so this was helpful!