geoip search results not correct

Communicator

Hi, we are using Splunk to look at indexed logs, and the Google Maps add-on is enabled so we can view each query's origin.
The problem is that when we run the search sourcetype="*" | geoip , it should return information for all events (nearly 5 billion of them), but it only shows 19,000 events.
This is a serious problem for us, as Splunk's geoip is really important to us.
Any clues as to what might be causing the problem and how to fix it?

Path Finder

Using Splunk version 5.0




If I do this:


source="/home/IP_Addresses.txt" | geoip ip | top ip_country_name limit=100

I see Denmark has a count of 4,032.




If I do this:

source="/home/IP_Addresses.txt" | geoip ip | search ip_country_name="Denmark"

I see "1,026 matching events".




But this:

source="/home/IP_Addresses.txt" | geoip ip | search ip_country_name="Denmark" | stats count

returns 4,032.




Is "max_matches" the right parameter to change in limits.conf to get the full list of 4,032 events?

Influencer

The observed behavior is a post-process limitation of Splunk. If you take a look at the default maps view, you will notice that the results are post-processed. If you search through Splunkbase, you'll find multiple discussions regarding the 10k post-process limit.

The results are summarized behind the scenes for the user. The module will automatically apply the following postprocess-search to the base-search:

eval _geo_count=coalesce(_geo_count,1) | stats sum(_geo_count) as _geo_count by _geo

So the results are aggregated into one count per distinct location. The resulting number of records is usually lower by an order of magnitude.

For example, when dealing with results based on a geo-IP database, there will not be a huge number of unique locations, since the number of records in the GeoLite City database is not that big. A lot of IP addresses share the same location.
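To make the aggregation concrete, here is a minimal Python sketch of what the post-process search above does to a result set. It is only an illustration, not the module's actual code: the function name and the sample rows are made up, and it assumes each result is a dict with an optional `_geo` string and an optional `_geo_count`.

```python
from collections import defaultdict

def summarize_by_location(rows):
    """Rough equivalent of:
    eval _geo_count=coalesce(_geo_count,1) | stats sum(_geo_count) as _geo_count by _geo
    """
    totals = defaultdict(int)
    for row in rows:
        geo = row.get("_geo")
        if geo is None:
            # Rows without the by-field do not appear in the stats output.
            continue
        count = row.get("_geo_count")
        # coalesce(_geo_count, 1): a raw event counts as one.
        totals[geo] += 1 if count is None else int(count)
    return dict(totals)

# Hypothetical sample rows: three share a location, one has no geo data.
rows = [
    {"_geo": "55.67,12.57"},
    {"_geo": "55.67,12.57"},
    {"_geo": "55.67,12.57", "_geo_count": 10},
    {"_geo": "48.85,2.35"},
    {"src_ip": "10.0.0.1"},
]
summary = summarize_by_location(rows)
print(summary)
```

Five input rows collapse to two output rows, one per distinct location, which is why the number of records handed to the map is normally far smaller than the number of events.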

The GoogleMaps module will only fetch 100,000 results from the search endpoint. This is a hard-coded limitation at the moment, since the browser won't be able to handle more records at a time.

A better approach is to summarize the result in the base-search, by searching for something like:

sourcetype=something src_ip=* | stats count as _geo_count by src_ip | geoip src_ip | search _geo=* | stats sum(_geo_count) as _geo_count by _geo

Here's a short explanation of what this search does:

sourcetype=something src_ip=*

Reduce the result in the base search to those events that contain the relevant IP field

| stats count as _geo_count by src_ip

Aggregate by distinct IP address

| geoip src_ip

Do the geo-ip lookup

| search _geo=*

Filter out those results that do not contain geo-information

| stats sum(_geo_count) as _geo_count by _geo

Aggregate again to get the summarized count of events by distinct location (i.e. distinct combination of latitude and longitude).
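The point of aggregating by IP before the lookup is that the geo-IP lookup then runs once per distinct address instead of once per event. The steps above can be sketched in Python roughly as follows. This is an assumption-laden illustration: `FAKE_GEO_DB` is a hypothetical stand-in for the real geo-IP database, and `count_by_location` is a made-up helper, not anything shipped with Splunk or the Google Maps app.

```python
from collections import Counter

# Hypothetical stand-in for the geo-IP database (IP -> "lat,lon").
FAKE_GEO_DB = {
    "192.0.2.1": "55.67,12.57",
    "192.0.2.2": "55.67,12.57",
    "198.51.100.7": "48.85,2.35",
}

def count_by_location(events, geo_db):
    # sourcetype=something src_ip=*  |  stats count as _geo_count by src_ip
    per_ip = Counter(e["src_ip"] for e in events if e.get("src_ip"))

    per_location = Counter()
    lookups = 0
    for ip, n in per_ip.items():
        lookups += 1                 # | geoip src_ip  (one lookup per distinct IP)
        geo = geo_db.get(ip)
        if geo is None:              # | search _geo=*
            continue
        per_location[geo] += n       # | stats sum(_geo_count) as _geo_count by _geo
    return dict(per_location), lookups

events = (
    [{"src_ip": "192.0.2.1"}] * 4
    + [{"src_ip": "192.0.2.2"}] * 3
    + [{"src_ip": "198.51.100.7"}] * 2
    + [{"src_ip": "203.0.113.9"}] * 5   # not in the fake database
    + [{"msg": "event without an IP field"}]
)
totals, lookups = count_by_location(events, FAKE_GEO_DB)
print(totals, lookups)
```

In this toy data set, 15 events need only 4 lookups and produce 2 rows for the map.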

If you're really dealing with an even bigger number of distinct locations (more than 100k), which I doubt, then you will need to perform some kind of server-side clustering. There will be support for accurate geo-clustering in a future version of the Google Maps app. In the meantime you can use the kmeans command or craft a custom search command.
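As a rough idea of what server-side clustering could look like inside a custom search command, here is a small Python sketch. Note that it uses simple grid binning, not k-means, because that is easier to show in a few lines. Everything in it is hypothetical: the function name, the `cell_deg` parameter and the sample points are made up for illustration.

```python
from collections import defaultdict

def grid_cluster(points, cell_deg=1.0):
    """Merge (lat, lon, count) points that fall into the same grid cell.

    Returns one (lat, lon, count) tuple per non-empty cell, positioned at
    the count-weighted mean of the points in that cell.
    """
    cells = defaultdict(lambda: [0.0, 0.0, 0])
    for lat, lon, count in points:
        if count <= 0:
            continue  # nothing to plot for this point
        key = (int(lat // cell_deg), int(lon // cell_deg))
        acc = cells[key]
        acc[0] += lat * count
        acc[1] += lon * count
        acc[2] += count
    return [(lat_sum / n, lon_sum / n, n) for lat_sum, lon_sum, n in cells.values()]

# Hypothetical sample: two nearby points and one far away.
points = [(55.1, 12.2, 1), (55.9, 12.8, 3), (48.5, 2.5, 2)]
clusters = sorted(grid_cluster(points, cell_deg=1.0))
print(clusters)
```

A larger `cell_deg` gives fewer, coarser markers, so the cell size can be picked to keep the result under whatever limit the map can handle. The total event count is preserved across the merge.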

Communicator

I'm still waiting for an answer... Did I miss any answers here?

Communicator

I'm trying sourcetype="*" | geoip SourceIP | kmeans k=100 SourceIP_country_name , but it's not giving me anything. Any suggestions on the search command?

Communicator

OK, from what I understand, your comments are about the Google Maps limit of 100,000:
1) Why does Splunk only go up to 10k instead of 100k? If possible, how can that be modified?
2) Why does running the geoip command in Splunk's main search (the flashtime runner) have the same issue, although the geoip command doesn't have anything to do with Google Maps?

Influencer

I forgot the geoip command in the search.

You should take a closer look at the kmeans command to do server-side clustering of the results.

Communicator

Thanks a lot, ziegfried, for the comprehensive, detailed answer.
Everyone here kindly tried to help me with this problem, and finally there is an answer.
Two thoughts, though:

1: Searching with your suggested search query does not fetch anything for me. It seems to be searching, but the number of fetched results remains 0 and the search progress stays at 46% for a very long time (almost a day).

2: Your doubt is actually wrong. I do have nearly 1.5 million "distinct locations".

Communicator

I'm not getting any answers or opinions here or in the other post, "Bug? Splunk advanced searching/views does not display correctly". Really disappointed...

Communicator

My pleasure, no problem. :)
And yes, I just saw his answer there. Yesterday, when I posted this, I had been left with no answers.
Thank you for replying to me as well.

Engager

I noticed dmaislin_splunk asked you to open a support ticket about this issue. If you haven't already done that, you should. If it's a bug, that will help get it addressed sooner.

Also, thank you for taking the time to report all of your efforts on this. The information you've reported will be helpful to others in the future.

Communicator

I think I found the source of the problem.
I think it's neither Google Maps nor geoip, but actually any view or special search other than the normal search.
To check this, you can try a simple search for "*", which retrieves everything in the normal search, and then try it in the "Advanced chart view": again, fewer than 15-20 thousand results will be shown (while the normal search goes up to a few billion).
I think there is a bug in Splunk's views, or in any kind of advanced searching for that matter, so I've started a new thread here: Bug? Splunk advanced searching/views does not display correctly

Communicator

Could this problem come from geoip's free license rather than from the limits.conf parameters?

Communicator

Answering my own question from my previous post: I thought the parameter "maxout" could solve the problem, as it's under the [subsearch] stanza, which makes sense if geoip is a subsearch rather than a whole search.
However, changing that parameter's value did not help at all. No effect!

Communicator

Noted, and thanks.
I'm still waiting to find out the source of the problem...

Splunk Employee

In this context geoip is not a subsearch, which is why the limits.conf parameter you mention has no influence on it.

Communicator

One other thing I cannot understand: the limits.conf file is supposed to control all parts of Splunk (since it's located under system/default), so how come I can easily get billions of results when my search query does not use geoip, and the limit problem only occurs when geoip is used in the search?

Communicator

Thanks for your answers. Yes, I figured that out, and I've been looking into it for a few days, but I still could not find the right parameter.
In limits.conf, the last parameter (max_count) seemed to be the one, but changing it had no effect.

Path Finder

I've been unable to find the right parameter so far. geoip still stops working after around 20k events and 12 seconds.

Splunk Employee

In case you want to take a look at the limits, they are defined in $SPLUNK_HOME/etc/system/default/limits.conf. Find the one you'd like to change, create a new limits.conf with that setting, and place it at:

$SPLUNK_HOME/etc/system/local/limits.conf
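To illustrate why you create a separate file instead of editing the default one: settings in the local file take precedence over the same settings in the default file, and everything you don't override keeps its default. Here is a minimal Python sketch of that layering using the standard configparser module. The stanza and setting names (`example_stanza`, `example_limit`, `other_limit`) are placeholders, not real limits.conf settings, and Splunk's actual merge logic involves more layers (app directories, for instance) than this two-file toy.

```python
import configparser

# Placeholder contents standing in for the default and local files.
DEFAULT_CONF = """
[example_stanza]
example_limit = 100
other_limit = 50
"""

LOCAL_CONF = """
[example_stanza]
example_limit = 500
"""

def effective_settings(default_text, local_text):
    parser = configparser.ConfigParser()
    parser.read_string(default_text)  # lowest precedence, read first
    parser.read_string(local_text)    # read last, so its values win
    return {section: dict(parser[section]) for section in parser.sections()}

settings = effective_settings(DEFAULT_CONF, LOCAL_CONF)
print(settings)
```

Only the overridden setting changes. The local file needs to contain just the stanza header and the settings you actually want to change, not a full copy of the default file.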

Path Finder

I played around with some larger firewall logs and observed the same behavior: geoip always stops after approximately 20,000 events.

I will do some further testing later.
