Splunk Search

How to find common packages installed on many hosts?

Simeon
Splunk Employee

I am indexing rpm -qa outputs and want to find all of the packages that are common throughout my infrastructure. The dataset from each host looks as follows:

8:51:06.000 AM

Wed Feb 15 08:51:06 PST 2012
dhcpcd-1.3.22pl4-223.2
ntfsprogs-1.11.2-15.2
yast2-schema-2.13.2-13.2
libnscd-1.1-16.2
file-4.16-15.5

marklaw2
Explorer

A different type of output shows hosts and their installed software:

sourcetype=package index=os
| multikv noheader=t
| rex field=_raw "^(?P<package>[^ ]+)\s+(?P<version>[^ ]+)\s+(?P<release>[^ ]+)\s+(?P<arch>\w+)"
| search NOT NAME NOT VERSION NOT RELEASE NOT ARCH
| dedup host package version release arch
| table host package version release arch

Simeon
Splunk Employee

The solution is similar to finding the most common values of any field across all hosts.

There are several steps to getting this to work. First, use multikv to split each multi-line event into one event per line:

sourcetype=rpm | multikv noheader=t

Next, add a field extraction that captures each line's package string:

sourcetype=rpm | multikv noheader=t | rex "(?<package>\S+)"

Next, count the distinct hosts on which each package appears:

sourcetype=rpm | multikv noheader=t | rex "(?<package>\S+)" | stats dc(host) as dc by package

Find the largest host count across all packages:

sourcetype=rpm | multikv noheader=t | rex "(?<package>\S+)" | stats dc(host) as dc by package | eventstats max(dc) as max

Keep only the packages whose host count equals that maximum:

sourcetype=rpm | multikv noheader=t | rex "(?<package>\S+)" | stats dc(host) as dc by package | eventstats max(dc) as max | where dc = max

Finally, clean up your output to be a simple list:

sourcetype=rpm | multikv noheader=t | rex "(?<package>\S+)" | stats dc(host) as dc by package | eventstats max(dc) as max | where dc = max | table package
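
To make the logic of the pipeline easier to follow, here is a minimal Python sketch of the same three steps (distinct host count per package, maximum of those counts, filter on equality). The sample events and the function name most_common_packages are hypothetical and only for illustration; this is not how Splunk evaluates the search.

```python
from collections import defaultdict

# Hypothetical sample data: one (host, package) pair per extracted event.
events = [
    ("host1", "dhcpcd-1.3.22pl4-223.2"),
    ("host1", "file-4.16-15.5"),
    ("host1", "libnscd-1.1-16.2"),
    ("host2", "dhcpcd-1.3.22pl4-223.2"),
    ("host2", "file-4.16-15.5"),
    ("host3", "dhcpcd-1.3.22pl4-223.2"),
    ("host3", "file-4.16-15.5"),
    ("host3", "ntfsprogs-1.11.2-15.2"),
]


def most_common_packages(events):
    """Return the packages seen on the largest number of distinct hosts."""
    hosts_by_package = defaultdict(set)
    for host, package in events:
        hosts_by_package[package].add(host)

    # Mirrors: stats dc(host) as dc by package
    dc = {package: len(hosts) for package, hosts in hosts_by_package.items()}
    if not dc:
        return []

    # Mirrors: eventstats max(dc) as max
    max_dc = max(dc.values())

    # Mirrors: where dc = max | table package
    return sorted(package for package, n in dc.items() if n == max_dc)


print(most_common_packages(events))
```

Note that the maximum here is the largest host count observed for any package, not the total number of hosts. If no package is installed on every host, the result is the most widely installed set of packages rather than a truly universal one.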

Here is another way to do this, which also lays out the per-host event counts as a package-by-host table:

sourcetype=rpm | multikv noheader=t | rex "(?<package>\S+)" | stats count by package host | eventstats dc(host) as dc by package | eventstats max(dc) as max | where dc == max | xyseries package host count
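
As a rough illustration of what this variant produces, the Python sketch below follows the same steps and builds a package-by-host table of counts. The sample events and the function name package_host_matrix are hypothetical, and the nested dictionary only approximates the table that xyseries renders.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: one (host, package) pair per extracted event.
events = [
    ("host1", "dhcpcd-1.3.22pl4-223.2"),
    ("host1", "file-4.16-15.5"),
    ("host1", "libnscd-1.1-16.2"),
    ("host2", "dhcpcd-1.3.22pl4-223.2"),
    ("host2", "file-4.16-15.5"),
    ("host3", "dhcpcd-1.3.22pl4-223.2"),
    ("host3", "file-4.16-15.5"),
]


def package_host_matrix(events):
    """Return {package: {host: count}} for the most widely installed packages."""
    # Mirrors: stats count by package host
    counts = Counter((package, host) for host, package in events)

    # Mirrors: eventstats dc(host) as dc by package
    hosts_by_package = defaultdict(set)
    for package, host in counts:
        hosts_by_package[package].add(host)
    if not hosts_by_package:
        return {}

    # Mirrors: eventstats max(dc) as max
    max_dc = max(len(hosts) for hosts in hosts_by_package.values())

    # Mirrors: where dc == max | xyseries package host count
    matrix = {}
    for (package, host), n in counts.items():
        if len(hosts_by_package[package]) == max_dc:
            matrix.setdefault(package, {})[host] = n
    return matrix


for package, row in sorted(package_host_matrix(events).items()):
    print(package, row)
```

Each row is a package and each column a host, so a gap in the table would indicate a host where the package was not seen.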

Credit to Dr. Zhang for showing us how to do this.
