Hi, we are using Splunk to look at indexed logs, and the Google Maps add-on is enabled so we can see where queries originate.
The problem is that when we run the search sourcetype="*" | geoip, it should supposedly return information for all events (nearly 5 billion of them!), but it only shows 19,000 events.
This is a disaster, as Splunk's geoip is really important to us.
Any clues about what might be causing this and how to fix it?
Using Splunk version 5.0
If I do this:
source="/home/IP_Addresses.txt" | geoip ip | top ip_country_name limit=100
source="/home/IP_Addresses.txt" | geoip ip | search ip_country_name="Denmark"
source="/home/IP_Addresses.txt" | geoip ip | search ip_country_name="Denmark" | stats count
The observed behavior is caused by a post-processing limitation in Splunk. If you look at the default maps view, you will notice that results are being post-processed. If you search Splunkbase, you'll find multiple discussions about the 10k post-process limitation.
The results are summarized behind the scenes. The module automatically applies the following post-process search to the base search:
eval _geo_count=coalesce(_geo_count,1) | stats sum(_geo_count) as _geo_count by _geo
So the results are aggregated into event counts per unique (distinct) location. In most cases, the resulting number of records is lower by an order of magnitude.
For example, when dealing with results based on a geo-IP database, there will not be a huge number of unique locations, since the GeoLite City database does not contain that many records. A lot of IP addresses share the same location.
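To make the post-process search above concrete, here is a minimal Python sketch of what it computes. It assumes each event is a dict and that the _geo field holds a combined "latitude,longitude" string; the sample coordinates are made up for illustration. It is a model of the logic, not Splunk's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable


def postprocess(events: Iterable[dict]) -> Dict[str, int]:
    """Model of: eval _geo_count=coalesce(_geo_count,1)
                 | stats sum(_geo_count) as _geo_count by _geo"""
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        geo = event.get("_geo")
        if geo is None:
            # stats ... by _geo skips rows where the by-field is missing
            continue
        # coalesce(_geo_count, 1): reuse an existing count, else count the event once
        count = event.get("_geo_count")
        totals[geo] += count if count is not None else 1
    return dict(totals)


if __name__ == "__main__":
    sample = [
        {"_geo": "55.6761,12.5683"},
        {"_geo": "55.6761,12.5683"},
        {"_geo": "40.7128,-74.0060", "_geo_count": 5},  # already summarized row
        {"src_ip": "10.0.0.1"},  # no geo information
    ]
    print(postprocess(sample))  # {'55.6761,12.5683': 2, '40.7128,-74.0060': 5}
```

Four input rows collapse into two output rows, one per distinct location, which is why the map usually receives far fewer records than the raw event count.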
The GoogleMaps module fetches at most 100,000 results from the search endpoint. This is currently a hard-coded limit, since the browser cannot handle more records at once.
A better approach is to summarize the results in the base search, with something like:
sourcetype=something src_ip=* | stats count as _geo_count by src_ip | geoip src_ip | search _geo=* | stats sum(_geo_count) as _geo_count by _geo
Here's a short explanation of what this search does:
sourcetype=something src_ip=*
Reduce the results in the base search to those events that contain the relevant IP field
| stats count as _geo_count by src_ip
Aggregate by distinct IP address
| geoip src_ip
Do the geo-ip lookup
| search _geo=*
Filter out those results that do not contain geo-information
| stats sum(_geo_count) as _geo_count by _geo
Aggregate again to get the summarized count of events per distinct location (i.e. distinct combination of latitude and longitude).
If you're really dealing with an even bigger number of distinct locations (more than 100k), which I doubt, then you will need to perform some kind of server-side clustering. There will be support for accurate geo-clustering in a future version of the Google Maps app. In the meantime you can use the kmeans command or craft a custom search command.
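The steps above can be modeled in a few lines of Python. This is a sketch only: geo_lookup is a hypothetical dict standing in for the geoip command, and the IPs (documentation ranges) and coordinates are invented. The point it illustrates is that the lookup runs once per distinct IP rather than once per event, and the final row count equals the number of distinct locations.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def geo_summary(events: Iterable[dict], geo_lookup: Mapping[str, str]) -> Dict[str, int]:
    # sourcetype=something src_ip=* : keep only events that carry the IP field
    ips = (e["src_ip"] for e in events if e.get("src_ip"))
    # | stats count as _geo_count by src_ip : one row per distinct IP
    per_ip = Counter(ips)
    totals: Dict[str, int] = defaultdict(int)
    for ip, geo_count in per_ip.items():
        # | geoip src_ip : one lookup per distinct IP (hypothetical dict lookup here)
        geo = geo_lookup.get(ip)
        # | search _geo=* : drop IPs without geo information
        if geo is None:
            continue
        # | stats sum(_geo_count) as _geo_count by _geo
        totals[geo] += geo_count
    return dict(totals)


if __name__ == "__main__":
    lookup = {
        "192.0.2.1": "55.6761,12.5683",
        "192.0.2.2": "55.6761,12.5683",  # different IP, same location
        "198.51.100.7": "40.7128,-74.0060",
    }
    events = (
        [{"src_ip": "192.0.2.1"}] * 3
        + [{"src_ip": "192.0.2.2"}, {"src_ip": "198.51.100.7"}]
        + [{"src_ip": "203.0.113.9"}]  # not in the lookup, filtered out
        + [{"msg": "no ip field"}]
    )
    print(geo_summary(events, lookup))  # {'55.6761,12.5683': 4, '40.7128,-74.0060': 1}
```

Seven events shrink to four distinct IPs and then to two locations, so only the two-row result has to fit under the 100,000-record limit.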
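As an illustration of the server-side clustering idea, here is a small weighted k-means (Lloyd's algorithm) sketch in Python. It is not Splunk's kmeans command. Assumptions: locations are (latitude, longitude) pairs treated as flat planar coordinates (Earth's curvature is ignored), each location is weighted by its event count, and the initial centroids are simply the first k points (deterministic, not k-means++).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _assign(points: Sequence[Point], weights: Sequence[int], centroids: List[Point]) -> List[List[float]]:
    """Return per-cluster [weighted lat sum, weighted lon sum, total weight]."""
    sums = [[0.0, 0.0, 0.0] for _ in centroids]
    for (lat, lon), w in zip(points, weights):
        nearest = min(
            range(len(centroids)),
            key=lambda c: (lat - centroids[c][0]) ** 2 + (lon - centroids[c][1]) ** 2,
        )
        sums[nearest][0] += lat * w
        sums[nearest][1] += lon * w
        sums[nearest][2] += w
    return sums


def weighted_kmeans(points: Sequence[Point], weights: Sequence[int], k: int,
                    iterations: int = 20) -> List[Tuple[Point, int]]:
    """Cluster locations and return (centroid, summed event count) per cluster."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids: List[Point] = list(points[:k])
    for _ in range(iterations):
        sums = _assign(points, weights, centroids)
        # Move each centroid to the weighted mean of its members (keep it if empty)
        new = [(s[0] / s[2], s[1] / s[2]) if s[2] > 0 else centroids[i]
               for i, s in enumerate(sums)]
        if new == centroids:
            break
        centroids = new
    final = _assign(points, weights, centroids)
    return [(centroids[i], int(s[2])) for i, s in enumerate(final)]


if __name__ == "__main__":
    locations = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    counts = [3, 1, 1, 1]
    print(weighted_kmeans(locations, counts, k=2))  # [((0.0, 0.25), 4), ((10.0, 10.5), 2)]
```

With 1.5 million distinct locations, choosing k below 100,000 would keep the map input under the GoogleMaps module limit while preserving total event counts per region; a real implementation would also need a distance measure suited to geographic coordinates.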
OK, from what I understand, your comments concern the Google Maps limit of 100,000:
1) Why does Splunk only go up to 10k instead of 100k? If possible, how can I change that?
2) Why does running the geoip command in Splunk's main search (the flashtimeline view) have the same issue, even though the geoip command has nothing to do with Google Maps?
Thanks a lot, ziegfried, for the comprehensive, detailed answer.
Everyone here kindly tried to help me with this problem, and finally there is an answer.
Two thoughts, though:
1: Running your suggested search does not return anything for me. It appears to be searching, but the number of fetched results stays at 0 and the progress sticks at 46% for a very long time (almost a day).
2: Your doubt is actually misplaced: I do have nearly 1.5 million distinct locations.
I noticed dmaislin_splunk asked you to open a support ticket about this issue. If you haven't already done so, you should. If it's a bug, that will help get it addressed sooner.
Also, thank you for taking the time to report all of your efforts on this. The information you've reported will be helpful to others in the future.
I think I found the source of the problem.
I think it's neither Google Maps nor geoip, but actually any view or special search other than the normal search!
To confirm this, try a simple search "*", which retrieves everything in the normal search. Then try it in the "Advanced chart view" and you'll see that, again, fewer than 15-20 thousand results are shown, while the normal search goes up to a few billion!
I think there is a bug in Splunk's views, or in any kind of advanced searching for that matter, so I've started a new thread here: Bug? Splunk advanced searching/views does not display correctly
Answering my own question from my previous post: I thought the "maxout" parameter could solve the problem, since it sits under the [subsearch] stanza and geoip, as I understood it, runs as a subsearch rather than a whole search.
However, changing that parameter's value did not help at all; it had no effect.
One other thing I cannot understand: the limits.conf file should supposedly control all parts of Splunk (since it is located under system/default), so how come I can easily get billions of results from a search that doesn't use geoip, while the limit problem only occurs when geoip is used in the search?
Thanks for your answers. Yes, I figured that, and I've been looking into it for a few days too, but I still could not find the right parameter.
In limits.conf, the last parameter (max_count) seemed to be the one, but changing it has no effect.
In case you want to take a look at the limits, they are defined in $SPLUNK_HOME/etc/system/default/limits.conf. Find the one you'd like to change, create a new limits.conf and place it under:
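The point of creating a new limits.conf instead of editing the default one is that the new file is layered over the default: keys it sets take precedence, and everything else falls back to the default. The Python sketch below models that key-by-key merge with the standard configparser module, which only approximates Splunk's .conf syntax. The numeric values are placeholders, not Splunk's actual defaults, and the source labels are hypothetical.

```python
import configparser

# Placeholder values for illustration only; not Splunk's real defaults.
DEFAULT_TEXT = """
[subsearch]
maxout = 100
maxtime = 60
"""

# A new file that overrides a single key in the same stanza.
OVERRIDE_TEXT = """
[subsearch]
maxout = 50000
"""


def effective_limits(default_text: str, override_text: str) -> configparser.ConfigParser:
    """Merge two .conf-style texts; keys read later win, per stanza and key."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(default_text, source="default limits.conf (hypothetical label)")
    parser.read_string(override_text, source="new limits.conf (hypothetical label)")
    return parser


if __name__ == "__main__":
    limits = effective_limits(DEFAULT_TEXT, OVERRIDE_TEXT)
    for key, value in limits.items("subsearch"):
        print(f"[subsearch] {key} = {value}")
```

Only maxout changes; maxtime keeps the value from the default file. This models how a setting is merged, not which setting governs the geoip result count, which the thread has not established.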