I have a big query that produces output like this.
Each row is a GUID, a count of occurrences, and then the IP addresses (they're stored comma-separated like that in the raw data). What I'm trying to do is combine the rows that share a GUID, sum their occurrence counts, and produce one column holding a single comma-separated list of ALL IP addresses for that GUID. I've tried many things, but haven't had any luck.
Split your IP addresses into a multi-value field, gather them up and sum your counts by guid, then join the IP addresses back into a single string:
| eval ip=split(ip,", ")
| stats values(ip) as ip sum(count) as count by guid
| eval ip=mvjoin(ip,", ")
That said, I suspect the "big query" itself uses stats to produce that "ip1, ip2, ip3" pattern. If so, you should examine that "big query" and do the aggregation properly there instead.
Unfortunately the IPAddresses are logged in that manner (2 addresses with a comma) in the applications themselves. My query didn't combine them like that.
That said, I ended up figuring it out. Used this.
prequery
| stats count(Customer) as CustomerRequests values(IPAddresses) as IPAddresses by Customer
| eval IPAddresses = mvjoin(IPAddresses, ",")
| table Customer, CustomerRequests, IPAddresses
| sort -CustomerRequests
This produced the desired output of
Customer | Requests | IPAddresses |
<guid here> | 1000 | 192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4,...etc |
If an IP address appears in more than one of the logged lists, it gets duplicated in the result unless you do the split first, as I suggested.
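To make the duplication concrete, here is a rough sketch in Python (not SPL) that mimics what `stats values(...) sum(...) by guid` followed by `mvjoin` does, with and without the split. The sample rows, the `combine` function, and the `split_first` flag are all made up for illustration; the real behavior is whatever Splunk does with your data.

```python
from collections import defaultdict

# Hypothetical sample rows: (guid, count, IP string as logged by the application)
rows = [
    ("guid-a", 5, "10.0.0.1, 10.0.0.2"),
    ("guid-a", 3, "10.0.0.2, 10.0.0.3"),
    ("guid-b", 2, "10.0.0.9"),
]

def combine(rows, split_first):
    """Roughly emulate: stats values(ip) as ip sum(count) as count by guid | mvjoin."""
    ips = defaultdict(set)
    totals = defaultdict(int)
    for guid, count, ip in rows:
        # With the split, each address is its own value; without it,
        # the whole logged string "a, b" is treated as a single value.
        values = ip.split(", ") if split_first else [ip]
        ips[guid].update(values)
        totals[guid] += count
    # values() keeps unique values in lexicographical order; sorted(set) mimics that
    return {g: (totals[g], ", ".join(sorted(ips[g]))) for g in totals}

without_split = combine(rows, split_first=False)
with_split = combine(rows, split_first=True)

print(without_split["guid-a"])  # 10.0.0.2 shows up twice
print(with_split["guid-a"])     # each address appears once
```

Without the split, the two logged strings are distinct values, so the address they share survives in both and ends up repeated after the join. Splitting first lets the de-duplication work on individual addresses.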
oh good point! I hadn't noticed that possibility. Thanks!