I think I must be misunderstanding how dedup works. It seems to me that adding fields to the dedup field list should never return fewer events.

| dedup fieldA

should get rid of all extra events with the same value of fieldA.

| dedup fieldA fieldB

should only get rid of events where BOTH fieldA and fieldB have duplicate values. Set theory suggests that this result must be at least as large as the result of deduplicating on fieldA alone.

But I'm getting far more results for:

| dedup _time

than I do for:

| dedup _time wma_set wma_filename

Any idea what's going on? For reference, here's the query:

index="main" host="designsafe01.tacc.utexas.edu" "designsafe.storage.community" "SimCenter/Datasets" (op=download OR op=preview OR op=copy OR op=agave_file_download OR op=agave_file_preview OR op=data_depot_copy)
| rex mode=sed "s/%20/ /g"
| rex mode=sed field=info "s/\'/\"/g"
| rex mode=sed field=info "s/\: u\"/: \"/g"
| eval thepath=case(in(op,"download","preview","agave_file_download","agave_file_preview"),json_extract(info,"filePath"),op="copy", json_extract(info,"path"), op="data_depot_copy", json_extract(info,"fromFilePath"))
| rex field=thepath "\/?SimCenter\/Datasets\/(?<wma_set>\w+)(?<wma_path>\/(.*\/)*)(?<wma_filename>[-\w\s\.]+)"
| rex field=wma_filename ".+\.(?<wma_extension>\w*)"
| dedup _time wma_set wma_filename
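A minimal Python sketch for testing the set-theory argument on toy data. It models each event as a dict and treats dedup as "keep the first event for each combination of field values". The dedup function, sample_events, and all field values are hypothetical stand-ins, not Splunk code. The sketch also models one documented dedup option, keepempty (default false), under which events missing any of the listed fields are dropped. It assumes, for illustration, that some events have no wma_set or wma_filename because the rex did not match their path.

```python
from typing import Any

Event = dict[str, Any]


def dedup(events: list[Event], fields: list[str], keepempty: bool = False) -> list[Event]:
    """Hypothetical model of Splunk's dedup, not the real implementation.

    Keeps the first event for each combination of values in `fields`.
    Events missing any listed field are dropped unless keepempty is True,
    in which case every such event is kept without deduplication.
    """
    seen: set[tuple[Any, ...]] = set()
    kept: list[Event] = []
    for event in events:
        # An event lacking any of the dedup fields never forms a key.
        if any(event.get(f) is None for f in fields):
            if keepempty:
                kept.append(event)
            continue
        key = tuple(event[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


# Hypothetical sample: the last two events stand in for paths the rex did not match,
# so they have _time but no wma_set or wma_filename.
sample_events: list[Event] = [
    {"_time": 1, "wma_set": "A", "wma_filename": "x.csv"},
    {"_time": 1, "wma_set": "A", "wma_filename": "y.csv"},
    {"_time": 2},
    {"_time": 3},
]

if __name__ == "__main__":
    by_time = dedup(sample_events, ["_time"])
    by_all = dedup(sample_events, ["_time", "wma_set", "wma_filename"])
    by_all_keep = dedup(sample_events, ["_time", "wma_set", "wma_filename"], keepempty=True)
    print(len(by_time), len(by_all), len(by_all_keep))
```

With these sample events, deduplicating on _time alone keeps 3 events, while deduplicating on all three fields keeps only 2, because the two events without wma_set and wma_filename are discarded before any duplicate check. Setting keepempty=True keeps all 4. So the "more fields, never fewer results" argument only holds if every event actually has every listed field.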