Solved: Grouping data by multiple attribute values

alphadog00 · ‎04-01-2021

I have basic web logs with username and jsessionid. I want to group (assume a single index, with one set of data). So thousands of events.

I want to group by jsessionid and username - creating supergroups. Example:

username:jsessionid

tom:1234

frank:1234

bob:1234

bob:5467

sally:5467

sally:9012

amy:9012

harry:4709

tony:4709

I would wind up with 2 groups - a small group with just harry and tony, and a larger group with tom, frank, bob, sally, and amy due to shared jsessionid.

I would like my output to contain some kind of group ID or Group Name. I would have no knowledge of username or jsessionid - I just want to be able to loop through the data and assign users/jsessionids to groups where they exist.

My first thought is to sort by jsessionid, but I can't figure out how to loop through the data and create dynamic group names.

Thanks for any ideas, not an SPL expert.

ITWhisperer · ‎04-01-2021

This sort of works - the issue is knowing how many iterations to do to resolve all the indirections

| makeresults | eval _raw="tom:1234

frank:1234

bob:1234

bob:5467

sally:5467

sally:9012

amy:9012

harry:4709

tony:4709"
| multikv noheader=t 
| rex "(?<username>[^\:]+)\:(?<jsessionid>.+)"
| fields - _raw _time
| fields username jsessionid
| eventstats values(jsessionid) as jsessionids by username
| eventstats values(username) as usernames by jsessionid
| eventstats values(jsessionids) as jsessionidss by usernames
| eventstats values(usernames) as usernamess by jsessionids
| eventstats values(jsessionidss) as jsessionidsss by usernamess
| eventstats values(usernamess) as usernamesss by jsessionidss
| eventstats values(jsessionidsss) as jsessionidssss by usernamesss
| eventstats values(usernamesss) as usernamessss by jsessionidsss
| eventstats values(jsessionidssss) as jsessionidsssss by usernamessss
| eventstats values(usernamessss) as usernamesssss by jsessionidssss
| eval supergroup=mvjoin(usernamesssss,",")
| dedup supergroup
| table supergroup

View solution in original post

ITWhisperer · ‎04-01-2021

This sort of works - the issue is knowing how many iterations to do to resolve all the indirections

| makeresults | eval _raw="tom:1234

frank:1234

bob:1234

bob:5467

sally:5467

sally:9012

amy:9012

harry:4709

tony:4709"
| multikv noheader=t 
| rex "(?<username>[^\:]+)\:(?<jsessionid>.+)"
| fields - _raw _time
| fields username jsessionid
| eventstats values(jsessionid) as jsessionids by username
| eventstats values(username) as usernames by jsessionid
| eventstats values(jsessionids) as jsessionidss by usernames
| eventstats values(usernames) as usernamess by jsessionids
| eventstats values(jsessionidss) as jsessionidsss by usernamess
| eventstats values(usernamess) as usernamesss by jsessionidss
| eventstats values(jsessionidsss) as jsessionidssss by usernamesss
| eventstats values(usernamesss) as usernamessss by jsessionidsss
| eventstats values(jsessionidssss) as jsessionidsssss by usernamessss
| eventstats values(usernamessss) as usernamesssss by jsessionidssss
| eval supergroup=mvjoin(usernamesssss,",")
| dedup supergroup
| table supergroup

alphadog00 · ‎04-01-2021

thanks, the data is actually a little cleaner - i did username:sessionid for this post, really they are already separate fields of data in the index. But i will try what you posted. I had looked at eventstats, but not gone to this level

alphadog00 · ‎04-01-2021

To clarify, I could have thousands of sessionIDs and since Tom shares different jsessionids with multiple different people, and they share jsessionids with others, this would be one big group.

alphadog00 · ‎04-01-2021

thanks, but if I have no idea of username or group names - i need to create groups there could "n" groups, and the case statement would have to be built with unknown usernames - that is part of my sticking point.

I also have a "X" usernames and "Y" jsessionids

isoutamo · ‎04-01-2021

How you could group those without knowing which names belongs to which group?

alphadog00 · ‎04-01-2021

If you look at this data:

tom:1234

frank:1234

bob:1234

bob:5467

sally:5467

sally:9012

amy:9012

- tom, frank, bob are connected via one JSESSIONID, bob and sally by another, so sally has an indirect or transitive relationship to tom and frank. Amy is connected to Sally - so in the end, all of these above are part of one big group due some direct and indirect relationships

isoutamo · ‎04-01-2021

Hi

you could try with next

....
| eval group = case (username == "harry" OR username == "tom", "grp1", true(), "grp2")
| stats values(*) as * by group, jsessiond
| ....

please check the syntax as I haven’t splunk in my hands to check it.

r. Ismo

Grouping data by multiple attribute values

Other

table

Now Available: Cisco Talos Threat Intelligence Integrations for Splunk Security Cloud ...

Preparing your Splunk Environment for OpenSSL3

Easily Improve Agent Saturation with the Splunk Add-on for OpenTelemetry Collector