Assuming you want a list of all distinct values of a field in an index, either of these searches will give you that:
index=a | stats count by field | fields - count
index=a | dedup field | table field
Fundamentally, both searches have to do the same work:
- load all events matching the search
- run whatever extractions, aliases, calculated fields, lookups, etc. are needed to produce the field
- produce a deduplicated list on each indexer (visible as prestats / prededup in the remoteSearch in the job inspector) and return it to the search head
- merge those lists into one on the search head
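The steps above can be sketched as a small Python model of the work split between indexers and search head. This is only an illustration under simplifying assumptions, not Splunk's actual implementation: events are plain dicts that already carry the extracted field, and the two indexers' contents and the field name `host` are hypothetical.

```python
from typing import Iterable, List, Set

def indexer_phase(events: Iterable[dict], field: str) -> Set[str]:
    # Per-indexer work (after extraction): keep only the distinct values of `field`.
    # In this model, events lacking the field contribute no value.
    return {e[field] for e in events if field in e}

def search_head_phase(partials: Iterable[Set[str]]) -> List[str]:
    # Search-head work: merge the per-indexer lists into one deduplicated list.
    merged: Set[str] = set()
    for partial in partials:
        merged |= partial
    return sorted(merged)  # sorted only to get a stable result in this model

# Hypothetical events on two indexers.
indexer1 = [{"host": "web01"}, {"host": "web02"}, {"host": "web01"}]
indexer2 = [{"host": "web02"}, {"host": "db01"}, {"source": "no host field"}]

partials = [indexer_phase(events, "host") for events in (indexer1, indexer2)]
values = search_head_phase(partials)

# Only the small per-indexer sets travel to the search head, not the six events.
print([len(p) for p in partials])  # [2, 2]
print(values)                      # ['db01', 'web01', 'web02']
```

Because each indexer already deduplicates, the amount sent to the search head grows with the number of distinct values, not with the number of events, which is why both searches should show small transfers in the job inspector.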
Assuming both commands are implemented well, there should not be a large difference in performance. You can verify this by looking at the big numbers to the right of dispatch.stream.remote.indexernamehere (one entry per indexer) in the job inspector; both searches should show similar, small amounts of data returned to the search head. When comparing run times, run each search several times and average the results, to smooth out the effect of other activity on the system.
There can, however, be subtle differences:
- dedup should not allow batch mode searches because it requires event ordering, and may therefore also prevent parallel search pipelines (I didn't verify this)
- careless use of dedup may cause more data than necessary to be carried around, e.g. the full _raw event instead of just the field value
- large stats results will cause an on-disk mergesort, significantly slowing down the search-head phase of the search
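To make the second point concrete, here is a rough Python model comparing how much each approach carries per indexer. It is a sketch under stated assumptions, not how Splunk serializes results: events are dicts, the length of the JSON encoding stands in for bytes returned to the search head, and the sample events and field name are hypothetical. It does not model batch mode, parallel pipelines, or the on-disk mergesort.

```python
import json
from typing import Dict, List

def dedup_full_events(events: List[dict], field: str) -> List[dict]:
    # Keeps the first complete event per value, so _raw and every other field travel along.
    first: Dict[str, dict] = {}
    for e in events:
        if field in e and e[field] not in first:
            first[e[field]] = e
    return list(first.values())

def dedup_field_only(events: List[dict], field: str) -> List[dict]:
    # Same deduplication, but each event is reduced to just the field beforehand.
    return dedup_full_events([{field: e[field]} for e in events if field in e], field)

def stats_count_by(events: List[dict], field: str) -> Dict[str, int]:
    # Only value -> count pairs are kept.
    counts: Dict[str, int] = {}
    for e in events:
        if field in e:
            counts[e[field]] = counts.get(e[field], 0) + 1
    return counts

def payload_size(obj) -> int:
    # Rough proxy for "data returned to the search head" (assumption of this model).
    return len(json.dumps(obj))

# Hypothetical events, each with a 500-character _raw.
events = [
    {"host": "web01", "_raw": "a" * 500},
    {"host": "web02", "_raw": "b" * 500},
    {"host": "web01", "_raw": "c" * 500},
]

print(payload_size(dedup_full_events(events, "host")))  # 1062
print(payload_size(dedup_field_only(events, "host")))   # 38
print(payload_size(stats_count_by(events, "host")))     # 24
```

The absolute numbers mean nothing; the point is that a dedup that drags whole events along scales with event size, while a dedup on just the field, or stats, scales only with the number of distinct values.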