I have a unique query that I think I have a general logical approach to solving, but the syntax and most efficient route is TBD>> Case to solve is this:
Users are assigned positions in an application, each position is unique. Positions are assigned security groups that are mapped to roles.
We are versioning this mapping into splunk for two reasons. 1 to be able to rewind and show who was in what groups so that we can do whatif scenarios 9 months back without trying to figure out what has changed etc. and 2. We want to analyze overlap in positions to roles to help simplify where necessary. The latter is the basis of my question.
I have a table created off a makemv/mvexpand that creates a cube of data that has
Position, GroupName
There are say 99 unique positions and 70 unique security groups.
Expanded I have just north of 1200 permutations of them
Position1, SecGroup1
Position1, SecGroup2
Position2, SecGroup2
Position2, Secgroup5
Position3, SecGroup1
Position4, SecGroup2
Etc
What I need to do is create stats on the relationship of overlap where positions are in similar groups>
I know for instance that in my current data set that ALL positions are in SecGroup1
and 68/99 are in SecGroup2
This is easily calculated for one group, but how do I extend this out at scale for all group?
I am thinking of creating a deduplicated list of security groups, and creating a full list of all combinations of
(SecGroup1 AND SecGroup2) OR (SecGroup1 AND SecGroup3) until that goes in reverse and deduplicating that list and using that list as a subsearch to my raw data and then running stats on it that I would think in theory would show where two PD's overlap because of the same two groups?
Is there a more succinct way of doing this? Can one create this list with | foreach to a foreach to create this list? How in splunk can one calculate a list of permutations and force an AND between them as a part of a subquery?
If you want any 2 groups with a common position:
| inputlookup position_group_lookup.csv
| rename group AS group_1
| lookup position_group_lookup.csv position OUTPUT group AS group_2
| mvexpand group_2
| table group_1 group_2 position
If you want any 2 groups and the list of common positions and % intersect:
| inputlookup position_group_lookup.csv
| rename group AS group_1
| eventstats dc(position) AS p1_position_ct BY group_1
| lookup position_group_lookup.csv position OUTPUT group AS group_2
| table group_1 position group_2 p1_position_ct
| mvexpand group_2
| stats values(position) AS common_positions dc(position) AS common_ct max(p1_position_ct) AS p1_position_ct BY group_1 group_2
| eval percent_common=ROUND((common_ct/p1_position_ct*100), 2)." %"
| sort 0 -percent_common
Thank you both for your feedback. Perhaps my request needs to be a bit more specific. The examples above are essentially another way of doing a selfjoin on groups; which returns back a disassociated set of which groups overlap with which positions; somewhere.
What I really want is to show which groups show up in tandem with multiple positions. Ie; which groups can I likely get rid of because they are redundant.
I believe what I need to do is create a subsearch that creates the permutations by starting with a deduplicated list of groups:
| makeresults | eval position=mvrange(1,99)
| mvexpand position | eval position="Position".position
| eval rng=(random() % 30) | eval group=mvrange(1,rng)
| mvexpand group | eval rng=(random() % 50)
| eval group="SecGroup".rng | dedup position group
| table position group
| dedup group | fields group | sort group
and then creating a foil operation somehow to return each combination of those deduplicated
ie:
first AND last
first AND secondtolast
first AND Lsub2 etc
I could take this and run it inside my initial query to return back sets where both are true and then run stats against it.
I just can't figure out a way to have a splunk command do this first, inside, outside operation to return back combinations of values from a search/lookup.
If I understand correctly, you want to find the groups which have the same positions in?
| stats values(position) as positions by group
| eval positions=mvjoin(positions,",")
| stats values(group) as groups by positions
More specifically, pairs unique pairs groups.
Which group have the same list of positions?
what two groups in combination show up in various positions.
If you want any 2 groups with a common position:
| inputlookup position_group_lookup.csv
| rename group AS group_1
| lookup position_group_lookup.csv position OUTPUT group AS group_2
| mvexpand group_2
| table group_1 group_2 position
If you want any 2 groups and the list of common positions and % intersect:
| inputlookup position_group_lookup.csv
| rename group AS group_1
| eventstats dc(position) AS p1_position_ct BY group_1
| lookup position_group_lookup.csv position OUTPUT group AS group_2
| table group_1 position group_2 p1_position_ct
| mvexpand group_2
| stats values(position) AS common_positions dc(position) AS common_ct max(p1_position_ct) AS p1_position_ct BY group_1 group_2
| eval percent_common=ROUND((common_ct/p1_position_ct*100), 2)." %"
| sort 0 -percent_common
This totally works... I have to go back and figure out exactly WHY! But it works. I envy the way some of your brains work with these things! Thank you!!
If you're trying to figure out the common groups and % intersect between any 2 positions, here's a possible way of doing it:
1. Let's generate test data and save it into a lookup:
| makeresults | eval position=mvrange(1,99)
| mvexpand position | eval position="Position".position
| eval rng=(random() % 30) | eval group=mvrange(1,rng)
| mvexpand group | eval rng=(random() % 50)
| eval group="SecGroup".rng | dedup position group
| table position group
| outputlookup position_group_lookup.csv
2. Query:
| inputlookup position_group_lookup.csv
| rename position AS position_1
| eventstats dc(group) AS p1_group_ct BY position_1
| lookup position_group_lookup.csv group OUTPUT position AS position_2
| table position_1 group position_2 p1_group_ct
| mvexpand position_2
| stats values(group) AS common_groups dc(group) AS common_ct max(p1_group_ct) AS p1_group_ct BY position_1 position_2
| eval percent_common=ROUND((common_ct/p1_group_ct*100), 2)." %"
Would it be possible to do this as a subsearch and a join as well instead of a lookup?
Does this help?
| stats values(group) as groups by position
| eval groups=mvjoin(groups,",")
| stats values(position) as positions by groups