Getting Data In

Sourcefire Encore data ingestion issue

vik_splunk
Communicator

Hi All,

We recently upgraded Splunk from 7.2.6 to 8.1.3, and since then we have been having issues with Sourcefire ingestion from FMC.

Splunk and Sourcefire versions prior to upgrade: 7.2.6 and 3.0.0

Splunk and Sourcefire versions post upgrade: 8.1.3 and 4.6.0

TA used - https://splunkbase.splunk.com/app/3662/


What we've attempted so far

  1. TA 3.0.0 with compatibility enabled for Python 2.x - errors out with "Connection reset by peer":

    estreamer.subscriber ERROR    error:
    Traceback (most recent call last):
      File "/opt/splunk/etc/apps/TA-eStreamer/bin/encore/estreamer/subscriber.py", line 198, in start
        self.connection.connect()
      File "/opt/splunk/etc/apps/TA-eStreamer/bin/encore/estreamer/connection.py", line 80, in connect
        self.socket.connect( ( host, port ) )
      File "/opt/splunk/lib/python2.7/ssl.py", line 864, in connect
        self._real_connect(addr, False)
      File "/opt/splunk/lib/python2.7/ssl.py", line 855, in _real_connect
        self.do_handshake()
      File "/opt/splunk/lib/python2.7/ssl.py", line 828, in do_handshake
        self._sslobj.do_handshake()
    error: [Errno 104] Connection reset by peer

  2. TA 4.6.0, upgraded in place on 8.1.3 - the connection succeeds and logs are collected for a while, but then we are met with the error "Invalid JSON in settings file" followed by "Process subscriberParser is dead". We also found this bug reference, similar to the error:

    https://quickview.cloudapps.cisco.com/quickview/bug/CSCvy06369

  3. TA 4.6.0, upgraded, with compatibility enabled for Python 2.x - same as above. The connection succeeds but collection eventually stops, erroring out with the same messages as in 2.
  4. Fresh install of 4.6.0 followed by a fresh config - connects fine to FMC but errors out with
    "Error state. Clearing queue"

    In a nutshell, what used to be a stable stream of logs from FMC is now completely broken/fragmented. In all cases, we are able to use the splencore test to establish a successful connection, and we have restarted the service, but no luck.

    We have been through all the community articles and all the suggested troubleshooting, with no luck. Any advice on getting this working is much appreciated.

@douglashurd - Can you please advise. Thanks!

1 Solution

vik_splunk
Communicator

Posting an update in the hope that it will help someone.

We had a ticket open with Cisco support, and after a few iterations of the TA, the most recent eStreamer version, 4.8.1 (https://splunkbase.splunk.com/app/3662/), fixed the issue for us.

Thanks to @skhademd for delivering the fix!

 

Anyone encountering this issue, please upgrade to version 4.8.1 of the TA, as that fixed it for us.


_joe
Communicator

Just a small comment on your clean command error.

The encore/default.conf file now writes data to a new location ("uri": "relfile:///data/splunk/encore.log{0}"):

"outputters": [
    {
        "name": "Splunk default",
        "adapter": "splunk",
        "enabled": true,
        "stream": {
            "uri": "relfile:///data/splunk/encore.log{0}",
            "options": {
                "rotate": true,
                "maxLogs": 9999
            }
        }
    }
]

However, if you upgraded like I did, you will previously have generated an encore/estreamer.conf with the older location, which overrides the default. That means you either need to change your output location or update the clean command path.
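Just to illustrate the precedence described above (a hedged sketch in Python, not the TA's actual code; the file contents here are hypothetical stand-ins), any key present in estreamer.conf wins over the shipped default.conf:

```python
# Hypothetical stand-ins for the two JSON files that live under
# .../TA-eStreamer/bin/encore/ in a default install.
default_conf = {
    "outputters": [
        {"stream": {"uri": "relfile:///data/splunk/encore.log{0}"}}
    ]
}
estreamer_conf = {
    "outputters": [
        {"stream": {"uri": "relfile:///data/old-location/encore.log{0}"}}
    ]
}

# Keys defined in estreamer.conf override the shipped defaults,
# so the older output location is the one actually used.
merged = {**default_conf, **estreamer_conf}
print(merged["outputters"][0]["stream"]["uri"])
```

So if the clean command has been updated to clear the new path, an old estreamer.conf will quietly keep writing to the old one.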

 


nvzFlow
Path Finder

Thanks Joe,

You are right, an estreamer.conf will override default.conf, and that is normal practice among our customers who use eNcore. I also wanted to take the opportunity to share the latest documentation we recently published: it not only provides a complete walkthrough of the install, but also contains a detailed Q&A section that highlights some of the issues mentioned here.

 

https://www.cisco.com/c/en/us/td/docs/security/firepower/70/api/eNcore/eNcore_Operations_Guide_v08.h...

 

As always, we regularly monitor the eNcore community mailer, encore-community@cisco.com, so feel free to post questions there. Thanks again!


vik_splunk
Communicator

Appreciate the inputs @nvzFlow and @_joe 

 

We have been working with @skhademd, and at the time of this message the issue isn't solved for us yet. @skhademd was able to replicate the issue in his lab and, after a few exchanges, has suggested setting "alwaysAttemptToContinue": true in estreamer.conf.

We are yet to implement the fix and will keep you posted if the issue persists. Along the way, we will use the manual shared by @nvzFlow to ensure any gotchas are avoided. Thanks!
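For reference, the suggested change would mean an estreamer.conf containing something like the fragment below. Only the "alwaysAttemptToContinue" key comes from this thread; treat the surrounding layout as an assumption, and merge the key into your existing file rather than replacing it:

```json
{
    "alwaysAttemptToContinue": true
}
```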


_joe
Communicator

Thanks, yes - please let us know. I think my current record is one week of stability before crashing with the "Configure ERROR Invalid JSON in settings file" error. I also have a ticket open with Cisco, but as of yet I cannot provide them with the FMC logs, so I don't think it will go anywhere.

 

One thing I found in our environment is that the stop and status commands just don't work. I am not sure the status command is even supposed to work (it doesn't appear to be in the main module for splencore.sh), but there is still an input for it in default/inputs.conf. At the very least, the stop command should work.

At the moment I am considering writing my own stop script and scheduling it to run once a day. I actually had my older Firepower (3.x) TA scheduled to stop once a day; I found that otherwise the volume of IDS events would be slightly lower than what the FMC reported.
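A once-a-day stop along those lines could be as simple as a cron entry. This is only a sketch: the splencore.sh path assumes a default TA install, and it relies on Splunk's scripted input restarting eNcore on its own schedule:

```
# Hypothetical crontab entry for the user running Splunk:
# stop eNcore at 02:00 daily; the TA's start input relaunches it.
0 2 * * * /opt/splunk/etc/apps/TA-eStreamer/bin/splencore.sh stop
```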

 


_joe
Communicator

Just want to report to everyone that I have been running eNcore 4.6.3 (with "alwaysAttemptContinue": true) for a few days. Overall it seems more stable... but it has already crashed once with a new error:

Decorator    ERROR    Message data too large. Enable debug if asked to do so.
Decorator    INFO     Error state. Clearing queue

On the positive side, I did get an "Invalid JSON in settings file" error which caused eNcore to stop logging to estreamer.log, but the normal ingest process continued without any data loss.
 
Update: the newest version (4.6.3, with alwaysAttemptContinue set to true) crashed the day after I posted this. It had been stable for 4 days.
 
Currently my Splunk inputs.conf is set up to stop eNcore multiple times a day; by default, the TA attempts to start it up again every two minutes. This has done an OK job of clearing the queue and restarting eNcore successfully after it crashes.

vik_splunk
Communicator

@skhademd @douglashurd Unfortunately, the new 4.6.1 version didn't help either. The same error is back, as always slightly later in the same day.

In parallel, we have had our network admins raise a ticket with Cisco. Cisco has acknowledged the problem and raised a bug, as can be seen below. I will e-mail the details of the case to the address shared by Doug.

 

We have a bug open for this, but it looks like there is no root cause found yet:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvy79722

 

Logs are shown below.

2021-06-30 18:21:51,735 Monitor INFO Running. 104600 handled; average rate 5.62 ev/sec;
2021-06-30 18:22:19,262 Receiver DEBUG FMC sent no data
2021-06-30 18:22:41,791 Receiver DEBUG FMC sent no data
2021-06-30 18:22:52,297 Receiver DEBUG FMC sent no data
2021-06-30 18:23:02,798 Receiver DEBUG FMC sent no data
2021-06-30 18:23:13,307 Receiver DEBUG FMC sent no data
2021-06-30 18:23:23,836 Receiver DEBUG FMC sent no data
2021-06-30 18:23:34,343 Receiver DEBUG FMC sent no data
2021-06-30 18:23:52,872 Receiver DEBUG FMC sent no data
2021-06-30 18:23:53,384 Monitor INFO Running. 104600 handled; average rate 5.59 ev/sec;
2021-06-30 18:24:11,898 Receiver DEBUG FMC sent no data
2021-06-30 18:24:22,400 Receiver DEBUG FMC sent no data
2021-06-30 18:24:32,907 Receiver DEBUG FMC sent no data
2021-06-30 18:24:43,413 Receiver DEBUG FMC sent no data
2021-06-30 18:24:53,929 Receiver DEBUG FMC sent no data
2021-06-30 18:25:00,952 Receiver DEBUG Got null message.
2021-06-30 18:25:15,971 Receiver DEBUG FMC sent no data
2021-06-30 18:25:26,472 Receiver DEBUG FMC sent no data
2021-06-30 18:25:45,003 Receiver DEBUG FMC sent no data
2021-06-30 18:25:51,028 Monitor INFO Running. 104700 handled; average rate 5.56 ev/sec;
2021-06-30 18:26:11,040 Receiver DEBUG FMC sent no data
2021-06-30 18:26:21,550 Receiver DEBUG FMC sent no data
2021-06-30 18:26:32,058 Receiver DEBUG FMC sent no data
2021-06-30 18:26:42,569 Receiver DEBUG FMC sent no data
2021-06-30 18:26:53,076 Receiver DEBUG FMC sent no data
2021-06-30 18:27:00,099 Receiver DEBUG Got null message.
2021-06-30 18:27:10,099 Receiver DEBUG FMC sent no data
2021-06-30 18:27:20,612 Receiver DEBUG FMC sent no data
2021-06-30 18:27:31,123 Receiver DEBUG FMC sent no data
2021-06-30 18:27:41,635 Receiver DEBUG FMC sent no data
2021-06-30 18:27:46,163 Monitor INFO Running. 104700 handled; average rate 5.52 ev/sec;
2021-06-30 18:27:56,154 Receiver DEBUG FMC sent no data
2021-06-30 18:28:10,174 Receiver DEBUG FMC sent no data
2021-06-30 18:28:25,192 Receiver DEBUG FMC sent no data
2021-06-30 18:28:41,212 Receiver DEBUG FMC sent no data
2021-06-30 18:29:05,236 Receiver DEBUG FMC sent no data
2021-06-30 18:29:15,746 Receiver DEBUG FMC sent no data
2021-06-30 18:29:32,268 Receiver DEBUG FMC sent no data
2021-06-30 18:29:45,291 Receiver DEBUG FMC sent no data
2021-06-30 18:29:55,802 Receiver DEBUG FMC sent no data
2021-06-30 18:29:56,312 Monitor INFO Running. 104700 handled; average rate 5.49 ev/sec;
2021-06-30 18:30:06,304 Receiver DEBUG FMC sent no data
2021-06-30 18:30:17,330 Receiver DEBUG FMC sent no data
2021-06-30 18:30:34,346 Receiver DEBUG FMC sent no data
2021-06-30 18:30:47,081 Service ERROR [no message or attrs]: Invalid JSON in settings file


AlexS
Loves-to-Learn

I tried the new 4.6.1 version, and for me the issue is also not fixed. I had already implemented the changes necessary for the clean command to run.

For now I have added another script input that calls splencore.sh with a stop statement every x hours, so the service gets restarted regularly. The error appears after a random amount of time: last time it appeared after 45 minutes, but the time before, it ran for more than 10 hours.
And still the same behavior: logging continues to work after the error is logged but stops some time later.
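For anyone wanting to do the same, a periodic stop via a scripted input might look like the inputs.conf sketch below. Hedged: the stanza path assumes a default TA install and the interval is an arbitrary example; the TA's own start input brings eNcore back up afterwards:

```
# Hypothetical inputs.conf stanza: run "splencore.sh stop" every 6 hours.
[script://$SPLUNK_HOME/etc/apps/TA-eStreamer/bin/splencore.sh stop]
interval = 21600
disabled = 0
```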

 


elee_splunk
Loves-to-Learn Everything

Will try that. I don't believe we have access to the 4.6.1 version of the TA yet; it doesn't look like it has been published to Splunkbase.


vik_splunk
Communicator

It appears I spoke too soon, @douglashurd.

It was fine for two days, and now, unfortunately, we are back to square one with the errors below.

This is still an issue and will require a fix. Please advise.

2021-06-23 09:23:53,924 Connection INFO Connecting to <IP>
2021-06-23 09:23:53,926 Connection INFO Using TLS v1.2
2021-06-23 09:23:53,926 Transformer INFO Starting process.
2021-06-23 09:23:53,927 Monitor INFO Starting Monitor.
2021-06-23 09:23:53,927 Decorator INFO Starting process.
2021-06-23 09:23:53,928 Transformer DEBUG Transformer
2021-06-23 09:23:53,928 Decorator DEBUG Decorator
2021-06-23 09:23:53,928 Writer INFO Starting process.
2021-06-23 09:23:53,929 Writer DEBUG Writer
2021-06-23 09:23:53,929 Monitor INFO Starting. 0 handled; average rate 0 ev/sec;
2021-06-23 09:25:54,081 Controller INFO Process subscriberParser is dead.
2021-06-23 09:25:54,081 Monitor INFO Running. 0 handled; average rate 0 ev/sec;
2021-06-23 09:25:54,133 Controller INFO Stopping...
2021-06-23 09:25:54,134 Controller INFO Process 7091 (Process-1) exit code: 1
2021-06-23 09:25:54,134 Decorator INFO Stop message received
2021-06-23 09:25:54,140 Decorator INFO Error state. Clearing queue
2021-06-23 09:25:54,141 Decorator INFO Exiting
2021-06-23 09:25:54,141 Controller INFO Process 7092 (Process-2) exit code: 0
2021-06-23 09:25:54,146 Transformer INFO Stop message received
2021-06-23 09:25:54,152 Transformer INFO Error state. Clearing queue
2021-06-23 09:25:54,152 Transformer INFO Exiting
2021-06-23 09:25:54,152 Controller INFO Process 7093 (Process-3) exit code: 0
2021-06-23 09:25:54,157 Writer INFO Stop message received
2021-06-23 09:25:54,163 Writer INFO Error state. Clearing queue
2021-06-23 09:25:54,163 Writer INFO Exiting
2021-06-23 09:25:54,163 Controller INFO Process 7096 (Process-4) exit code: 0
2021-06-23 09:25:54,163 Monitor INFO Stopping Monitor.
2021-06-23 09:25:54,333 Controller INFO Goodbye


elee_splunk
Loves-to-Learn Everything

Same here; it's happening to me on a daily basis.


vik_splunk
Communicator

Hmmm... just putting it out there; not sure if it contributes to the issue, @elee_splunk.

Which Linux version are you running Splunk on for the affected machine?

The noticeable difference in our environment is as below.

 

 

FMC | RHEL | Splunk | estreamer TA | customizations if any | Status
6.6.4 | 6.x | 8.1.3 | 4.6.0 | sourcetype slightly changed | Completely broken without a semblance of stability
6.6.4 | 7.x | 8.1.3 | 4.6.0 | default sourcetype | Pretty stable

 

@douglashurd  thoughts?


elee_splunk
Loves-to-Learn Everything

@douglashurd @vik_splunk 

I've switched back to my old HF and it's been very stable.

FMC | Ubuntu | Splunk | estreamer TA | customizations if any | Status
6.6.4 | 18.04.5 LTS | 7.2.10 | 3.6.8 | default sourcetype | Stable

 

 


vik_splunk
Communicator

Hi @elee_splunk  That's good to hear.

 

It appears the Python 2 script offers stability in your case. In our environment, it's a bit puzzling that we have one forwarder ingesting logs without issues on the new version while the prod environment keeps throwing errors.


elee_splunk
Loves-to-Learn Everything
FMC | Ubuntu | Splunk | estreamer TA | customizations if any | Status
6.6.4 | 20.04.2 LTS | 8.2 | 4.6.0 | default sourcetype | Intermittent

 

I have to stop the estreamer service and start it back up to get it going at least once every day or two. When that doesn't work, I have to reboot the whole server.


douglashurd
Builder

Please email a link to this thread to encore-community@cisco.com

 

Thanks,

 

Doug

 


vik_splunk
Communicator

Just did, @douglashurd . Thanks!
