Running into an issue with the "Substantial Increase In Port Activity" correlation search in ES. Essentially this search looks at network traffic and returns a count of how many times a specific destination port was used. Instead of using hard-coded thresholds, the search uses Extreme Search to find anomalies when compared to a baseline of the last 30 days.
The problem is that the context doesn't seem to be storing more than 50,000 records, which is an issue since TCP/UDP port numbers go up to 65,535. Therefore, when the correlation search runs I'm seeing a bunch of errors like this:
xsWhere-I-111: There is no context 'count_by_dest_port_1d' with class '55008' from container 'network_traffic' in scope 'none', using default context count_by_dest_port_1d
This repeats a bunch of times, each time with a different port number (class). Then Splunk says: The limit has been reached for log messages in info.csv. 3203 messages have not been written to info.csv. Please refer to search.log for these messages or limits.conf to configure this limit. So basically, that means there are over 3k ports that can't be compared properly.
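Since info.csv truncates the messages, one way to see exactly which ports are affected is to pull them out of search.log. Here's a rough Python sketch of that. The regex is built only from the message text quoted above; the layout of the rest of each search.log line (timestamps, log level, and so on) is an assumption, so the pattern matches the message body wherever it appears. The function name `missing_classes` is just something I made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches the body of the xsWhere-I-111 message quoted above.
# Anything before or after it on the log line is ignored (assumed layout).
MISSING_CLASS = re.compile(
    r"xsWhere-I-111: There is no context '(?P<context>[^']+)' "
    r"with class '(?P<cls>[^']+)' from container '(?P<container>[^']+)'"
)


def missing_classes(lines, context="count_by_dest_port_1d"):
    """Count how often each class (port) is reported missing for one context."""
    counts = Counter()
    for line in lines:
        match = MISSING_CLASS.search(line)
        if match and match.group("context") == context:
            counts[match.group("cls")] += 1
    return counts


# Usage: python missing_ports.py /path/to/search.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        found = missing_classes(handle)
    print(f"{len(found)} distinct classes missing")
    # Numeric classes sort by port number; anything non-numeric sorts first.
    for cls in sorted(found, key=lambda c: int(c) if c.isdigit() else -1):
        print(cls, found[cls])
```

The resulting list of ports can then be checked one by one with the xsListContexts search shown further down.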
So,
Additional troubleshooting info:
I can confirm that the following search returns no results:
| xsListContexts in network_traffic | search Context="count_by_dest_port_1d" Class="55008"
This search returns "50,000":
| xsListContexts in network_traffic | search Context="count_by_dest_port_1d" Class=* | stats count
The number of classes in a context is unlimited. The reason you see these INFO messages is that the contexts for these specific ports have not yet been created. This is not unexpected if these ports were accessed for the first time since the last time an xsCreateDDContext (or xsUpdateDDContext) was run. If you rerun the Context Gen search associated with the 'count_by_dest_port_1d' context, these messages should go away.
I have been playing around with this rule as well, but haven't hit this particular issue. I just had a thought, though. I get that the limit is too low for the number of ports that are out there, but if this is looking at outbound traffic, what is attempting to connect to all of these ports? Is there a system that is misconfigured, or some malware that's trying to call home? I wouldn't think that Splunk should ever have to keep track of that many ports. Just thinking out loud. Let me know what you think or if you get this figured out! We might make a change too if you come up with anything. Thanks for sharing!