I'm running eNcore in our lab environment to replace the Splunk eStreamer Add-On. The connection events are coming through just fine as are Access Control Policy Metadata, however I'm not seeing any Intrusion Events being logged in Splunk.
I'm a bit confused about whether I need to manually edit the configuration files, as the documentation reads as though it should work out of the box. I have skimmed through the estreamer.conf file, and core as well as intrusion are both set to true. The old eStreamer Add-On is receiving Intrusion Events, so I know the FMC is capable of sending them. I'm running Splunk 6.5.3 and FirePower 6.2.2.
Example of the metadata being logged instead of intrusion events:
rec_type=145 name=DMZ rec_type_desc="Access Control Policy Metadata" rec_type_simple="ACCESS CONTROL POLICY" sensor=zdxidsxxxxx uuid=00000000-0000-0000-0000-xxxxxxxxxxxx
I'm also seeing the following warning logged:
2017-11-13 10:26:06,927 estreamer.metadata.cache WARNING Metadata key ('uuid') missing on object ({'recordLength': 8, 'checksum': 0, 'blockLength': 8, 'archiveTimestamp': 0, 'blockType': 15, 'recordType': 119}). Ignoring
I am seeing a very similar issue. Our old environment using the old eStreamer app is working fine. After installing the eNcore app in a completely new environment (so we don't have to worry about the old TA messing w/ things), we are seeing only a trickle of IPS events (1/20th of what FMC is seeing). After reading the comments above, I disabled the connection events, but after a few hours I haven't seen any progress. I am not seeing my python process or system resources (core, mem, etc) overtaxed.
Is there a manual configuration setting (outside of the .confs) that needs to be made?
From an initial read by the experts, you may be hitting a performance limitation. The current version generally maxes out at 1.33 CPU cores. We have plans for a more scalable version but I don't have a date yet.
What sort of event rate does your deployment run at for Connection Events? Any ideas?
If it is a performance limit, it might explain the timestamp gap you see. The event queue gets pruned, and if eNcore falls behind, the older events can get pruned before transmission; when it resumes, you get the subsequent events that weren't pruned. We've seen this in some networks with very high event rates.
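To make the pruning effect concrete, here is a toy model in Python. It is only a sketch of the general idea (a bounded queue with a consumer that is slower than the producer) and is not how the FMC or eNcore is actually implemented. The function name, the rates and the queue size are all made up for illustration.

```python
from collections import deque


def simulate(produce_rate, consume_rate, queue_size, ticks):
    """Toy model of a bounded event queue with a slow reader.

    Hypothetical illustration only; not the FMC or eNcore code.
    Returns (delivered, pruned) as lists of event ids.
    """
    queue = deque()
    next_id = 0
    delivered = []
    pruned = []
    for _ in range(ticks):
        # Producer side: new events arrive; when the queue is full,
        # the oldest unread event is pruned to make room.
        for _ in range(produce_rate):
            if len(queue) == queue_size:
                pruned.append(queue.popleft())
            queue.append(next_id)
            next_id += 1
        # Consumer side: the reader drains at most consume_rate events.
        for _ in range(min(consume_rate, len(queue))):
            delivered.append(queue.popleft())
    return delivered, pruned


if __name__ == "__main__":
    delivered, pruned = simulate(produce_rate=3, consume_rate=1,
                                 queue_size=4, ticks=5)
    print("delivered:", delivered)  # gaps appear between delivered ids
    print("pruned:   ", pruned)
```

When the reader keeps up (for example produce_rate=1, consume_rate=1), nothing is pruned. When it falls behind, the delivered ids have holes in them, which is the same pattern as events missing in the middle of an otherwise continuous stream.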
I recall reading in the documentation and in the configuration files that eNcore still has to process all events sent to it by FirePOWER but it's only writing the Intrusion events to Splunk since I don't have connection events enabled in the configuration...Is that accurate? If that's true it now makes sense why it may be backlogged as it still has to process the connection events and drop them.
Connection Events: ~500 per second
On connection events, if you disable them at the FMC's eStreamer configuration page that should prevent them from being sent and free up CPU for eNcore as it won't have to read/write/format those events.
On the rate of the events, it should be possible to support 500 events per second with sufficient resources. The CPU clock speed will be a huge factor. What does this 16 CPU platform have for CPUs? Looks like you have plenty of Disk and RAM.
Disabling the connection events (and anything else aside from Intrusion Events) at the FMC seems to have corrected the Intrusion Event lag to Splunk. Surprisingly, there is still a single python process at 100% CPU utilization. I would be really surprised if hardware is the limiting factor: the server has 2x Intel E7-4830 (2.13GHz base and 2.4GHz Boost). We have Splunk heavy forwarders ingesting Checkpoint Firewall OpsecLEA connections at 5k+ EPS running on 4 core, 8GB ram virtual machines.
We've seen a few customers hit some performance limitations where one CPU hits 100 percent and a second at 33%. We know we need to make some enhancements to make it more scalable.
Do you have a second CPU available?
If you shut off Connection Events, do the Intrusion events show up?
System resources should not be an issue. We are using our old ArcSight SIEM box, which has 16 physical cores, 32 virtual, 256GB ram and 12TB flash storage. This system is only used as a test box for onboarding Splunk apps and data, so it sits relatively unused.
Here's an output from top with connection events off. The top python process always remains at 100%. The second two generally hover around 20% and the last one is generally under 2%. Velocity seems to jump from negative 1 to positive 2 even with minimal events coming in.
54708 splunk 20 0 133784 13784 3908 R 100.0 0.0 89182:56 python
17940 splunk 20 0 287516 12524 2668 S 21.9 0.0 1440:51 python
17939 splunk 20 0 287480 11268 1600 S 18.3 0.0 1045:41 python
17932 splunk 20 0 361280 14928 5032 S 1.7 0.0 141:10.29 python
Intrusion Events are coming in but appear to be quite delayed. In the last hour according to Splunk I have 40 Intrusion Events while FirePOWER FMC shows 64. The latest in Splunk has a timestamp of 11/21/17 07:39:17 while the latest in FirePOWER is 11/21/17 07:46:25. This accounts for 20 of the events. The rest of the missing events are mixed in throughout the past.
For example:
Splunk | FMC (Sorry for the crappy output.)
Splunk _time, signature | FMC time, classification
- - | 11/21/2017 7:46 SERVER-APACHE Apache Struts remote code execution attempt (1:39191:2)
- - | 11/21/2017 7:45 SERVER-WEBAPP Java XML deserialization remote code execution attempt (1:44315:2)
- - | 11/21/2017 7:45 OS-OTHER Bash CGI environment variable injection attempt (1:31978:5)
- - | 11/21/2017 7:45 SERVER-APACHE Apache Struts remote code execution attempt (1:39191:2)
- - | 11/21/2017 7:45 SERVER-APACHE Apache Struts remote code execution attempt (1:41922:3)
- - | 11/21/2017 7:45 SERVER-WEBAPP Java XML deserialization remote code execution attempt (1:44315:2)
- - | 11/21/2017 7:44 OS-OTHER Bash CGI environment variable injection attempt (1:31977:5)
- - | 11/21/2017 7:44 SERVER-WEBAPP JBoss JMXInvokerServlet access attempt (1:24343:4)
- - | 11/21/2017 7:44 FILE-FLASH Adobe Flash Player MSIMG32.dll dll-load exploit attempt (1:38872:1)
- - | 11/21/2017 7:43 SERVER-APACHE Apache Struts2 blacklisted method redirect (1:29747:6)
- - | 11/21/2017 7:43 SERVER-WEBAPP JBoss web console access attempt (1:24342:3)
- - | 11/21/2017 7:43 SERVER-WEBAPP JBoss JMX console access attempt (1:21516:9)
- - | 11/21/2017 7:43 SERVER-APACHE Apache Struts2 blacklisted method redirect (1:29748:6)
- - | 11/21/2017 7:43 SERVER-APACHE Apache Struts remote code execution attempt (1:41922:3)
- - | 11/21/2017 7:43 SERVER-WEBAPP Java XML deserialization remote code execution attempt (1:44315:2)
- - | 11/21/2017 7:43 SERVER-APACHE Apache Struts2 blacklisted method redirect (1:29747:6)
- - | 11/21/2017 7:43 SERVER-APACHE Apache Struts remote code execution attempt (1:39191:2)
- - | 11/21/2017 7:43 SERVER-APACHE Apache Struts2 blacklisted method redirect (1:29748:6)
- - | 11/21/2017 7:43 SERVER-APACHE Apache Struts remote code execution attempt (1:39191:2)
- - | 11/21/2017 7:42 SERVER-WEBAPP JBoss web console access attempt (1:24342:3)
2017-11-21T07:39:17.000-0600 OS-OTHER Bash CGI environment variable injection attempt | 11/21/2017 7:39 OS-OTHER Bash CGI environment variable injection attempt (1:31978:5)
2017-11-21T07:38:42.000-0600 FILE-FLASH Adobe Flash Player MSIMG32.dll dll-load exploit attempt | 11/21/2017 7:38 FILE-FLASH Adobe Flash Player MSIMG32.dll dll-load exploit attempt (1:38872:1)
- - | 11/21/2017 7:34 OS-OTHER Bash CGI environment variable injection attempt (1:31978:5)
2017-11-21T07:28:13.000-0600 SERVER-WEBAPP JBoss JMX console access attempt | 11/21/2017 7:28 SERVER-WEBAPP JBoss JMX console access attempt (1:21516:9)
2017-11-21T07:22:42.000-0600 SERVER-APACHE Apache Struts remote code execution attempt | 11/21/2017 7:22 SERVER-APACHE Apache Struts remote code execution attempt (1:39191:2)
- - | 11/21/2017 7:21 SERVER-APACHE Apache Struts remote code execution attempt (1:39191:2)
- - | 11/21/2017 7:21 SERVER-WEBAPP JBoss JMXInvokerServlet access attempt (1:24343:4)
2017-11-21T07:16:34.000-0600 SERVER-WEBAPP Java XML deserialization remote code execution attempt | 11/21/2017 7:19 SERVER-WEBAPP Java XML deserialization remote code execution attempt (1:44315:2)
- - | 11/21/2017 7:16 SERVER-WEBAPP Java XML deserialization remote code execution attempt (1:44315:2)
2017-11-21T07:16:02.000-0600 SERVER-APACHE Apache Struts remote code execution attempt | 11/21/2017 7:16 SERVER-APACHE Apache Struts remote code execution attempt (1:39191:2)
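As a quick sanity check on the numbers quoted before the comparison (40 events in Splunk vs 64 in FMC, latest timestamps 07:39:17 vs 07:46:25, 20 events newer than the latest one in Splunk), the arithmetic can be written out in a few lines of Python. The variable names are just for illustration; the values are the ones from this post.

```python
from datetime import datetime

# Values quoted in the post above.
FMT = "%m/%d/%y %H:%M:%S"
latest_splunk = datetime.strptime("11/21/17 07:39:17", FMT)
latest_fmc = datetime.strptime("11/21/17 07:46:25", FMT)
fmc_count = 64            # Intrusion Events in FMC, last hour
splunk_count = 40         # Intrusion Events in Splunk, last hour
after_latest_splunk = 20  # FMC events newer than the latest Splunk event

# How far Splunk trails the FMC, in seconds.
lag_seconds = int((latest_fmc - latest_splunk).total_seconds())

# Events missing in total, and how many of those are NOT explained by lag.
missing_total = fmc_count - splunk_count
missing_scattered = missing_total - after_latest_splunk

print("lag:", lag_seconds, "s")                # 428 s, i.e. 7 min 8 s
print("missing in total:", missing_total)      # 24
print("missing mid-stream:", missing_scattered)  # 4
```

So roughly 7 minutes of lag explains 20 of the 24 missing events, and the remaining 4 are gaps in the middle of the stream. That matches the four "- -" rows that sit between events Splunk did receive.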
Couple of easy things to check. Sorry if these are super obvious.
- check to make sure the Intrusion Events option in the eStreamer config page is toggled on
- make sure you see the events in the FMC UI
- search for rec_type=400 in the Splunk search
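If you also want to check the raw eNcore output outside of Splunk, a small Python sketch like the one below can pick out rec_type=400 lines from the key=value format shown earlier in this thread. The function names are hypothetical, the sample metadata line is the one posted above, and the one-line rec_type=400 string in the usage is a made-up stand-in, not real eNcore output.

```python
import shlex


def parse_event(line):
    """Split a key=value event line into a dict.

    shlex handles quoted values such as
    rec_type_desc="Access Control Policy Metadata".
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def is_intrusion_event(line):
    """True if the line carries rec_type=400 (the type searched for above)."""
    return parse_event(line).get("rec_type") == "400"


# Sample line copied from the first post in this thread.
METADATA_LINE = (
    'rec_type=145 name=DMZ rec_type_desc="Access Control Policy Metadata" '
    'rec_type_simple="ACCESS CONTROL POLICY" sensor=zdxidsxxxxx '
    'uuid=00000000-0000-0000-0000-xxxxxxxxxxxx'
)

if __name__ == "__main__":
    print(parse_event(METADATA_LINE)["rec_type_desc"])
    print(is_intrusion_event(METADATA_LINE))
    # Made-up minimal line, only to exercise the filter.
    print(is_intrusion_event("rec_type=400 sensor=example"))
```

Running this over the file or stream eNcore writes to would show whether intrusion records are being produced at all before Splunk indexes them.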
The old TA might be requesting an older IDS event type. Not sure this matters.
This has to be something super basic.
Doug
Yes we have intrusion events being sent from our production FMC to our production Splunk instance using the old TA and events are showing up in the FMC UI as well.
We do not have the old TA installed on our test Splunk box so that shouldn't be the issue.
I do have 1 rec_type=400 event from two days ago showing in Splunk but should have many more; we generally have 15-20 Intrusion events an hour minimum. One thing I find interesting is that my velocity is generally -.05 even though no events are being sent to Splunk other than status and log events from the python agents. One other thing to note is that there is a single Python process pegged at 100% CPU. The Splunk box is a very beefy box with all flash storage, so I wouldn't expect to have a velocity much lower than 0.
Would the add-on try to pull events from the past from the FMC or will it only pull data going forward?