Count unique values per field

topdeck
Explorer

Hello, imagine you have two fields: IP, ACCOUNT

An IP can access any number of ACCOUNT, an ACCOUNT can be accessed by any number of IP.

The simple cases are:
For each IP, the number of ACCOUNTs it accesses.
For each ACCOUNT, the number of IPs accessing it.

Potentially easy.

The harder case: show the number of ACCOUNTs accessed by each IP, where those ACCOUNTs are also accessed by more than one IP, and where the other IPs accessing those ACCOUNTs differ from one ACCOUNT to the next.

Confused? I'd like to find IPs accessing a lot of accounts where those accounts are also being accessed by more than one IP, and the other IPs accessing those accounts are not all the same.

sideview
SplunkTrust

To start simple -

For each IP, the number of ACCOUNT it accesses.

<search terms> | stats dc(ACCOUNT) by IP

likewise,

<search terms> | stats dc(IP) by ACCOUNT

Obviously, those are much simpler than what you're asking for.

Here's the best approach I can think of. Breaking the following search down in English: we take the unique combinations of ACCOUNT and IP (using stats). We then pipe these rows through eventstats so that each row gets a 'distinctIPs' field. The distinctIPs value is the number of distinct IP values that accessed that row's ACCOUNT. Then we treat this as a rough weighting and add up the values for each IP. It's kind of a ridiculous field name, but for clarity I've called it "totalDistinctIPsAccessedByAccountsTheyAccessed". The plain count in the final stats is the number of distinct ACCOUNTs each IP accessed.

<search terms> | stats count by ACCOUNT IP | eventstats dc(IP) as distinctIPs by ACCOUNT | stats count sum(distinctIPs) as totalDistinctIPsAccessedByAccountsTheyAccessed by IP | sort - totalDistinctIPsAccessedByAccountsTheyAccessed

In the end you get a list of the top IP addresses that accessed LOTS of accounts, weighted heavily towards those where the accessed accounts were themselves accessed by a LOT of IPs.
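To make the weighting concrete, here is a minimal Python sketch (not SPL) that mirrors the three stages of the search above on a tiny, made-up data set. The sample events, the function name `weighted_ip_scores`, and the tie-breaking by IP are all assumptions for illustration only; the real search runs over whatever `<search terms>` returns.

```python
from collections import defaultdict

# Hypothetical sample events as (IP, ACCOUNT) pairs; the duplicate is deliberate.
events = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.1", "carol"),
    ("10.0.0.2", "alice"),
    ("10.0.0.2", "bob"),
    ("10.0.0.3", "alice"),
    ("10.0.0.4", "dave"),
]


def weighted_ip_scores(events):
    """Return (IP, accounts accessed, weighted score) rows, highest score first."""
    # Stage 1, like `stats count by ACCOUNT IP`: keep unique (IP, ACCOUNT) pairs.
    pairs = {(ip, account) for ip, account in events}

    # Stage 2, like `eventstats dc(IP) as distinctIPs by ACCOUNT`:
    # collect the distinct IPs seen for each account.
    ips_per_account = defaultdict(set)
    for ip, account in pairs:
        ips_per_account[account].add(ip)

    # Stage 3, like `stats count sum(distinctIPs) ... by IP`:
    # per IP, count its accounts and sum each account's distinct-IP count.
    account_count = defaultdict(int)
    score = defaultdict(int)
    for ip, account in pairs:
        account_count[ip] += 1
        score[ip] += len(ips_per_account[account])

    rows = [(ip, account_count[ip], score[ip]) for ip in score]
    # Like `sort - total...`; ties are broken by IP here (an assumption).
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


for row in weighted_ip_scores(events):
    print(row)
```

In this toy data, 10.0.0.1 touches three accounts whose distinct-IP counts are 3, 2 and 1, so it scores 6 and ranks first. Note that the score only rewards accounts shared with many IPs; it does not check whether the other IPs differ from account to account.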

phew. Hopefully I'm close. 😃

sideview
SplunkTrust

(Note: it's best to click 'comment on this answer' under my answer rather than posting a new answer as a comment. Things get very confusing when the order of the answers changes later.)

topdeck
Explorer

Thanks Nick, I'll take a stab using your suggestions. I really wish I could do this in something like Perl or Python, but the data set is too large.
