I am trying to find the distance between two or more IP geolocations without the use of an external script (not an admin). Here is my base search:
tag=login | geoip src_ip | stats distinct_count(src_ip_country_name) AS count_country, values(src_ip_country_name) AS country by username | where count_country > 1
I know I can find the difference in the latitude and longitude fields. Something like the following:
But how do I incorporate that into my base search? Would I be able to build a table with the geolocations and the distance grouped by username?
Fast forward into the future: eval now supports trig functions, so we can do the great circle formula in Splunk.
This example will provide the expected result:
| makeresults | eval lat1=1, lon1=1, lat2=2, lon2=2 | eval rlat1 = pi()*lat1/180, rlat2=pi()*lat2/180, rlat = pi()*(lat2-lat1)/180, rlon= pi()*(lon2-lon1)/180 | eval a = sin(rlat/2) * sin(rlat/2) + cos(rlat1) * cos(rlat2) * sin(rlon/2) * sin(rlon/2) | eval c = 2 * atan2(sqrt(a), sqrt(1-a)) | eval distance = 6371 * c | table lat1 lon1 lat2 lon2 distance
distance will be the distance in kilometers, since 6371 is the Earth's mean radius in km.
Hope this helps ...
The three macros below calculate the haversine formula that @MuS provided.
[haversine(5)]
# Calculate the great circle distance for a sphere with an arbitrary radius
args = input_lat1, input_lon1, input_lat2, input_lon2, hav_radius
definition = "eval hav_lat1_radians = pi()*$input_lat1$/180, hav_lat2_radians=pi()*$input_lat2$/180, hav_delta_lat_radians = pi()*($input_lat2$-$input_lat1$)/180, hav_delta_lon_radians= pi()*($input_lon2$-$input_lon1$)/180 | eval hav_intermediate = pow(sin(hav_delta_lat_radians/2), 2) + cos(hav_lat1_radians) * cos(hav_lat2_radians) * pow(sin(hav_delta_lon_radians/2), 2) | eval hav_distance = 2 * $hav_radius$ * atan2(sqrt(hav_intermediate), sqrt(1-hav_intermediate)) | fields - hav_*_radians, hav_intermediate"

[haversine(4)]
# Calculate the great circle distance for the earth (in kilometers)
args = input_lat1, input_lon1, input_lat2, input_lon2
definition = "`haversine($input_lat1$, $input_lon1$, $input_lat2$, $input_lon2$, 6371)`"

[haversine(2)]
# Calculate the great circle distance between two IPs (in kilometers)
args = input_ip1, input_ip2
definition = "iplocation $input_ip1$ prefix=$input_ip1$_ | iplocation $input_ip2$ prefix=$input_ip2$_ | `haversine($input_ip1$_lat, $input_ip1$_lon, $input_ip2$_lat, $input_ip2$_lon)`"
Using streamstats, you can calculate IP location distances between consecutive events. With eventstats, you can calculate IP location distances between a common IP location and an event's IP location.
The calculated value is returned as hav_distance, to decrease the chances of a field name collision.
The haversine formula is not as accurate as Vincenty's formulae, but is much more accurate than a simple chord length calculation.
| makeresults | eval usual_src_ip="220.127.116.11", src_ip="18.104.22.168" | `haversine(usual_src_ip, src_ip)` | where hav_distance > 500
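As a rough sketch of the streamstats approach mentioned above, the search below compares each login with the same user's previous login. It assumes the haversine macros are installed as shown, that the events carry the src_ip and username fields from the original question, and that iplocation can resolve the addresses to lat and lon. The names prev_lat and prev_lon are made up for illustration, and the 500 km threshold is simply reused from the example above.

```spl
tag=login
| iplocation src_ip
| where isnotnull(lat) AND isnotnull(lon)
| sort 0 _time
| streamstats current=f window=1 global=f last(lat) AS prev_lat, last(lon) AS prev_lon BY username
| where isnotnull(prev_lat)
| `haversine(prev_lat, prev_lon, lat, lon)`
| where hav_distance > 500
| table _time username src_ip prev_lat prev_lon lat lon hav_distance
```

Because every event is compared with the one before it, this is not limited to users with exactly two locations. The first login of each user has no predecessor and is dropped by the isnotnull(prev_lat) filter.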
I'm working on a similar query and I much appreciate what you've both done here. I've worked up this:
| lookup geoip clientip |dedup userID, client_city| eval location=clientip."- ".client_city.", ".client_region.", ".client_country| stats last(client_lat) as Lat1, last(client_lon) as Lon1, first(client_lat) as Lat2, first(client_lon) as Lon2, values(location) dc(client_city) as distinctCount by userID| where distinctCount = 2 | eval distance=sqrt(pow(Lat1-Lat2,2)+pow(Lon1-Lon2,2))|sort distance desc
I've gotten it to work when a user has had 2 different IPs. Using first & last precludes handling more than two, though. Still trying to work on that.
The Pythagorean theorem is a good approximation only for shorter distances. If you're actually dealing with pretty big distances you have to break out some trig functions and calculate great circle distance. http://en.wikipedia.org/wiki/Great-circle_distance
And since eval can't do trig functions ( see http://splunk-base.splunk.com/answers/26399/can-eval-evaluate-cosines ) that would lead you back to a custom search command again.
However, if your distances are all short enough, then what you propose just needs to be plugged into eval.
| eval distance=sqrt(pow(src_ip_latitude1-src_ip_latitude2,2)+pow(src_ip_longitude1-src_ip_longitude2,2))
Once that eval clause gives you that field called distance on your rows, you can do whatever you want with it.
No, I don't see why you'd need to do the distance calculation within the stats clause. That would be a little crazy. Do it before and use some form of "last(distance) as distance by username", or "by username distance" in your stats, and then filter afterwards. Or use some form of "last(src_ip_latitude) as src_ip_latitude last(src_ip_longitude) as src_ip_longitude" in stats and then do the distance calculation after.
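A minimal sketch of that second option, carrying the coordinates through stats and computing the distance afterwards, might look like the following. It assumes the geoip command from the base search produces src_ip_latitude and src_ip_longitude, as described in this thread. The names lat1, lon1, lat2 and lon2 are hypothetical, and the threshold of 1 is only a placeholder: with this flat approximation the distance is in degrees, not kilometers.

```spl
tag=login
| geoip src_ip
| stats first(src_ip_latitude) AS lat1, first(src_ip_longitude) AS lon1, last(src_ip_latitude) AS lat2, last(src_ip_longitude) AS lon2, distinct_count(src_ip_country_name) AS count_country, values(src_ip_country_name) AS country BY username
| eval distance=sqrt(pow(lat1-lat2,2)+pow(lon1-lon2,2))
| where count_country > 1 AND distance > 1
```

Note that first and last only compare two events per user, so a user with three or more locations is judged on just one pair.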
I think my question is a little more complex than I initially thought. My current base search only has the src_ip_latitude and src_ip_longitude fields. I want to break them up (e.g. latitude1, latitude2, etc.) grouped by the username. I'm thinking I would need to alter the end of my search to something like "where (count_country > 1) AND (distance > 100)". That means I likely need to do the distance calculation within my stats clause, because after my stats clause I no longer have access to the latitude and longitude fields.
Assuming you have those other four fields in your events, just tack the "| eval" onto the end of the search. That eval by itself will add an additional field called "distance" to all rows. Again, you have to have all four of those fields, by those exact case-sensitive names, on all events. More generally, they have to be on all incoming rows, whether they're events or rows that have already been transformed or altered by other search language commands.
I completely forgot about the fact that the Earth is round. 🙂 Too bad I can't use the great-circle formula.
How can I pull out the latitude and longitude fields by username and plug them into the eval? In other words, how can I incorporate the eval into the base search?