Splunk Search

Retrieving unique values of an indexed field

Is there a quick way to retrieve the list of all unique values of an indexed field?

I know I could search for the field and pipe to uniq, but hoping there might be something faster.

Splunk Employee
| tstats values(<indexed_field_name>) where index=<index_name>

will avoid scanning events entirely. It gets its answer from the metadata in the .tsidx files, so there is no performance hit for reading raw events. It is orders of magnitude faster than piping a search to stats.
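If you also want to know how often each value occurs, the same approach works with a by clause. This is only a sketch: <index_name> and <indexed_field_name> are placeholders, and it assumes the field really is indexed (an index-time field), since tstats cannot see search-time extractions.

```
| tstats count where index=<index_name> by <indexed_field_name>
| sort - count
```

Each row is one unique value along with its event count, still read from the .tsidx files rather than from the raw events.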

0 Karma

Splunk Employee

Actually, we were hoping that, because it is an indexed field, there is some kind of metadata or list that is persisted that we could access quickly, without running a search over all our events. I guess the simplest case would be source, sourcetype, or host - is there any quick way to find the list of all indexed hosts without going through stats or some other search? It seems like there must be, because the summary view displays those. We'd like to pull that type of summary information for any indexed field to get a list of all possible field values.

0 Karma

Splunk Employee

For host, source, and sourcetype specifically, you can use the |metadata search command.
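As a rough example, listing all hosts for one index might look like the following. <index_name> is a placeholder, and type can also be sources or sourcetypes.

```
| metadata type=hosts index=<index_name>
| table host totalCount firstTime lastTime
```

The firstTime and lastTime columns come back as epoch timestamps, so you may want to format them before displaying the results.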

0 Karma

Splunk Employee

For some reason, I don't see an "add comment" field on Nick's answer. Is there some other way to do that?

0 Karma

Splunk Employee

Can you add this as a comment to Nick's answer rather than as a new answer?

0 Karma

SplunkTrust

Absolutely. There are several ways to do this. Let's assume your field is called 'foo'.

The most straightforward way is to use the stats command:

<your search> | stats count by foo

Using stats opens up the door to collect other statistics by those unique values. For example:

<your search> | stats count avg(duration) dc(username) by foo

which will take the average of a field called duration and the distinct count of values of username, with each statistic computed separately for each value of foo.

http://www.splunk.com/base/Documentation/latest/SearchReference/Stats

Another way worth mentioning is to just use top:

<your search> | top limit=10000 foo

Splunk Employee

For host, source, and sourcetype specifically, you can use the | metadata search command, which can certainly be much faster. If you need this a lot, run a scheduled search over recent data that updates a lookup table (... | append [ | inputlookup mytable ] | dedup myfield1, myfield2 | outputlookup mytable). In other words, you generate and maintain the metadata yourself on a schedule.
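Spelled out, the scheduled search could look something like this. It is a sketch under a few assumptions: <index_name> and the 24-hour window are placeholders, mytable is a lookup that already exists (the subsearch will complain on the very first run if it does not), and myfield1 and myfield2 are the fields whose values you want to track.

```
index=<index_name> earliest=-24h
| stats count by myfield1, myfield2
| fields myfield1, myfield2
| append [ | inputlookup mytable ]
| dedup myfield1, myfield2
| outputlookup mytable
```

Once that is running on a schedule, getting the list of known values is just a read of the lookup, which does not touch the index at all:

```
| inputlookup mytable
```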

0 Karma