Solved: IP correlation and mapping

splunkingsplun1 · 03-03-2014

I have two indexes: index=intrusion and index=threat_list. index=intrusion contains IPS event logs that track dropped traffic on our perimeter IPS; index=threat_list contains the IPs of known threat sources. We would like to correlate the IP addresses hitting our perimeter IPS in index=intrusion with the IP addresses in index=threat_list, and then map those results every month.

The threat list has more than 2 million IPs per day, and the IPS can receive about 3 million "attacks" per month.

This is what we have come up with so far. The problem is that when we report on these indexes monthly, the event counts are so high that the searches time out, even though we have raised their limits in limits.conf.

We use the Google Maps app to plot the search results:

index=intrusion | join src [search index=threat_list] | stats count as _geo_count by src | geoip src | search _geo=* | stats sum(_geo_count) as _geo_count by _geo

Can anyone suggest a way to make the search more efficient so we avoid these timeouts, or a different method (perhaps using lookup, dedup, stats, etc.) whose results we can then map similar to the image below?

martin_mueller · 03-03-2014

Taking a look at the first part of your search:

index=intrusion | join src [search index=threat_list] | stats count as _geo_count by src

You're essentially looking for source addresses that appear in both indexes, right?
Consider this alternative:

index=intrusion OR index=threat_list | stats count as _geo_count dc(index) as index_count by src | where index_count==2

That should achieve the same thing without a subsearch.

Edit: The count may differ from your join: my example counts a src that appears once in each index as two events. If that's a concern, replace count with count(eval(index=="intrusion")) so that only intrusion events are counted.
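Combining this with the mapping steps from the original question gives something like the following untested sketch. It uses the intrusion-only count from the edit, so _geo_count is the number of IPS events per threat-listed source. It assumes, as the original join already does, that both indexes use the field name src, and that the geoip command from the Google Maps app is installed.

```spl
index=intrusion OR index=threat_list
| stats count(eval(index=="intrusion")) as _geo_count dc(index) as index_count by src
| where index_count==2
| geoip src
| search _geo=*
| stats sum(_geo_count) as _geo_count by _geo
```

Because both indexes are read in one search and combined with stats, there is no subsearch, and therefore no subsearch result or time limit for the monthly report to run into.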

splunkingsplun1 · 03-06-2014

Thank you for all your help

martin_mueller · 03-05-2014

You can do the exact same thing:

(index=intrusion host="192.168.1.20" action=drop NOT src="10.1.*") OR index=threat_list | ...

You could apply the NOT src=... filter to both indexes as well, but it won't do much to speed things up.

rabitoblanco · 04-28-2016

I'm trying to do something very similar and to optimize my search.

Over a 24-hour timeframe, I hit the subsearch limits.

Using a smaller 2-hour timeframe, I get about 5 IPs after the subsearch filter, versus about 1,130,000 IPs with the dc(index) model. I then split all of those by about 10 different parameters, and calculating all of them takes forever.

Any suggestions on how to improve this?

My current search format is:

`(index=a sourcetype=b) OR (index=c sourcetype=d action=e)
| eval ipv4=coalesce(ipv4, pattern)
| eval DISTINCTLOCKOUT=if(statement)
| eval DISTINCTELOCKOUT=if(statement)
| eval impacted_username=if(statement)
| eval whitelisted_username=if(statement)
| eval Date=strftime(_time, "%Y/%m/%d")
| eval failed_name=if(activity_status=="FAILED" AND activity_error!="7577", username, null())
| eval success_name=if(statement)
| eval blocked_name=if(statement)
| stats count as TOTAL_COUNT, count(this) as UNBLOCKED_TOTAL, count(this) as BLOCKED_COUNT, count(this) as WHITELISTED_COUNT, dc(this) as UNIQUE_WHITELISTED,
    count(eval(activity_status=="SUCCESS")) as SUCCESS_COUNT, count(eval(activity_status=="FAILED" AND activity_error!="7577")) as FAILED_COUNT,
    count(eval(this AND isnull(that))) as IMPACTED_COUNT, dc(this) as UNIQUE_IMPACTED,
    count(eval(this OR that)) as LOCKOUT_COUNT, count(eval(this OR that OR that)) as ELOCKOUT_COUNT, count(eval(this OR that OR that)) as RANDOM_USER_LOGINS,
    dc(username) as UNIQUE_USER_COUNT, dc(failed_name) as FAILED_UNIQUE, dc(success_name) as SUCCESS_UNIQUE, dc(blocked_name) as BLOCKED_UNIQUE,
    min(_time) as FIRST_TIME, max(_time) as LAST_TIME, min(eval(if(statement))) as BLOCKED_TIME, dc(DISTINCTLOCKOUT) as DISTINCTLOCKOUT_COUNT
    by ipv4
| some prettyprint stuff`

Thanks in advance for any pointers.

splunkingsplun1 · 03-05-2014

OK, I adjusted my view settings and can now see the field you mentioned. But what if I want to do some additional filtering? For instance, when we use the join command, our search usually looks like this:

index=intrusion host="192.168.1.20" action=drop NOT src="10.1.*" | join src [search index=threat_list] | stats count as _geo_count by src

How can I accomplish the same modifying the search you sent?

martin_mueller · 03-05-2014

The underscore-prefixed field name (_geo_count) may be hidden, depending on your view.

splunkingsplun1 · 03-05-2014

OK, that is perhaps my misunderstanding. I hadn't seen _geo_count, only index_count. Thank you for the clarification.

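martin_mueller · 03-05-2014

It'd give you a row for that IP with the fields src, _geo_count and index_count, where index_count would indeed be 2 and _geo_count would be 30, provided you count only the intrusion events as in the edit to my answer. Plain count would also add that IP's threat_list events.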
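To check the counting on made-up data, the following sketch fakes 30 intrusion events and one threat_list event for a single hypothetical address (203.0.113.5, a documentation IP) and runs them through the same stats. It assumes a Splunk version that includes the makeresults command, which is newer than this thread. The final rename only serves to make the underscore field visible in the results table.

```spl
| makeresults count=31
| streamstats count as n
| eval src="203.0.113.5", index=if(n<=30, "intrusion", "threat_list")
| stats count(eval(index=="intrusion")) as _geo_count dc(index) as index_count by src
| rename _geo_count as attack_count
```

This should return one row with attack_count=30 and index_count=2; with plain count in place of count(eval(...)), attack_count would be 31.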

splunkingsplun1 · 03-05-2014

What happens if one IP attacks 30 times in one month? Wouldn't the search only tell me that the IP exists in both indexes (index_count==2)?

martin_mueller · 03-03-2014

What's the difference?

splunkingsplun1 · 03-03-2014

This seems to tell me only whether the IP occurred in both indexes, not the attack count from each IP on the threat list against my IPS throughout the entire month.

gauldridge · 03-03-2014

Can you explain a little more about the threat_list index? Does it only contain IPs? Do you need to join the entire threat_list event to the intrusion event when the source IP matches? Honestly, depending on your setup, this sounds like a good candidate situation for using a database lookup instead of joining two indexes. We use a database lookup to accomplish a similar task.

splunkingsplun1 · 03-03-2014

Yes, the threat list index currently contains only the IP and threat type, for instance src=192.168.x.x threat_type=scanning host. I want to know whether an IP on the threat list attacked our IPSs at any time during the month, so I think the answer is yes: we need to see the entire threat list.
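As a sketch of the lookup idea (using a CSV lookup file instead of the database lookup gauldridge mentions), you could first run a scheduled search over the month such as `index=threat_list | stats latest(threat_type) as threat_type by src | outputlookup threat_ips.csv`, and then check the intrusion events against that file. The file name threat_ips.csv is hypothetical (a lookup definition pointing at it works too), and the sketch assumes every threat_list event carries a threat_type value, because a non-empty threat_type is used as the "matched" flag. Untested:

```spl
index=intrusion
| lookup threat_ips.csv src OUTPUT threat_type
| where isnotnull(threat_type)
| stats count as _geo_count by src
| geoip src
| search _geo=*
| stats sum(_geo_count) as _geo_count by _geo
```

Because only intrusion events are counted, _geo_count here is the number of attacks per threat-listed IP, which is the figure asked for above. With more than 2 million threat IPs per day the lookup file can get large, so a database lookup, as gauldridge suggests, may scale better.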
