Getting Data In

Sourcefire Encore data ingestion issue

vik_splunk
Communicator

Hi All,

We recently upgraded Splunk from 7.2.6 to 8.1.3, and since then we have been having issues with Sourcefire ingestion from FMC.

Splunk and Sourcefire versions prior to upgrade: 7.2.6 and 3.0.0

Splunk and Sourcefire versions post upgrade: 8.1.3 and 4.6.0

TA used - https://splunkbase.splunk.com/app/3662/


What we've attempted so far

  1. TA 3.0.0 with compatibility enabled for Python 2.x - errors out with "Connection reset by peer":

    estreamer.subscriber ERROR    error:
    Traceback (most recent call last):
      File "/opt/splunk/etc/apps/TA-eStreamer/bin/encore/estreamer/subscriber.py", line 198, in start
        self.connection.connect()
      File "/opt/splunk/etc/apps/TA-eStreamer/bin/encore/estreamer/connection.py", line 80, in connect
        self.socket.connect( ( host, port ) )
      File "/opt/splunk/lib/python2.7/ssl.py", line 864, in connect
        self._real_connect(addr, False)
      File "/opt/splunk/lib/python2.7/ssl.py", line 855, in _real_connect
        self.do_handshake()
      File "/opt/splunk/lib/python2.7/ssl.py", line 828, in do_handshake
        self._sslobj.do_handshake()
    error: [Errno 104] Connection reset by peer

  2. TA 4.6.0, upgraded in place on 8.1.3 - the connection succeeds and logs are collected for a while, but then we are met with the error "Invalid JSON in settings file" followed by "Process subscriberParser is dead". We also found this bug reference, similar to the error:

    https://quickview.cloudapps.cisco.com/quickview/bug/CSCvy06369

  3. TA 4.6.0, upgraded, with compatibility enabled for Python 2.x - same as above. The connection succeeds but collection eventually stops, erroring out with the same messages as in 2.
  4. Fresh install of 4.6.0 followed by a fresh config - connects fine to FMC but errors out with
    "Error state. Clearing queue"

    In a nutshell, what used to be a stable stream of logs from FMC is now completely broken/fragmented. In all cases, we are able to use the splencore test to establish a successful connection, and we have restarted the service, but no luck.

    We have been through all the community articles and all the suggested troubleshooting, with no luck. Any advice on getting this working is much appreciated.

@douglashurd - Can you please advise. Thanks!

1 Solution

vik_splunk
Communicator

Posting an update in the hope that it will help someone.

We had a ticket open with Cisco support, and after a few iterations of the TA, the most recent eStreamer version, 4.8.1 (https://splunkbase.splunk.com/app/3662/), fixed the issue for us.

Thanks to @skhademd for delivering the fix!

 

Anyone encountering this issue, please upgrade to version 4.8.1 of the TA, as that fixed it for us.


_joe
Communicator

Just a small comment on your clean command error.

The encore/default.conf file now writes data to a new location ("uri": "relfile:///data/splunk/encore.log{0}"):

"outputters": [
    {
        "name": "Splunk default",
        "adapter": "splunk",
        "enabled": true,
        "stream": {
            "uri": "relfile:///data/splunk/encore.log{0}",
            "options": {
                "rotate": true,
                "maxLogs": 9999
            }
        }
    }
]

However, if you upgraded like I did, you will previously have generated an encore/estreamer.conf with the older location, which overrides the default. That means you either need to change your output location or update the clean command path.
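Just to illustrate the precedence described above (a hedged sketch in Python, not the TA's actual code; the file contents here are hypothetical stand-ins), any key present in estreamer.conf wins over the shipped default.conf:

```python
# Hypothetical stand-ins for the two JSON files that live under
# .../TA-eStreamer/bin/encore/ in a default install.
default_conf = {
    "outputters": [
        {"stream": {"uri": "relfile:///data/splunk/encore.log{0}"}}
    ]
}
estreamer_conf = {
    "outputters": [
        {"stream": {"uri": "relfile:///data/old-location/encore.log{0}"}}
    ]
}

# Keys defined in estreamer.conf override the shipped defaults,
# so the older output location is the one actually used.
merged = {**default_conf, **estreamer_conf}
print(merged["outputters"][0]["stream"]["uri"])
```

So if the clean command has been updated to clear the new path, an old estreamer.conf will quietly keep writing to the old one.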

 


nvzFlow
Path Finder

Thanks Joe,

You are right, an estreamer.conf will override default.conf, and that is normal practice among our customers who use eNcore. I also wanted to take the opportunity to share the latest documentation we recently published: it not only provides a complete walkthrough of the install, but also contains a detailed Q&A section that highlights some of the issues mentioned here.

 

https://www.cisco.com/c/en/us/td/docs/security/firepower/70/api/eNcore/eNcore_Operations_Guide_v08.h...

 

As always, we regularly monitor the eNcore community mailer, encore-community@cisco.com, so feel free to post questions there. Thanks again!


vik_splunk
Communicator

Appreciate the inputs @nvzFlow and @_joe 

 

We have been working with @skhademd, and at the time of this message the issue isn't solved for us yet. @skhademd was able to replicate the issue in his lab and, after a few exchanges, has suggested setting "alwaysAttemptToContinue": true in estreamer.conf.

We are yet to implement the fix and will keep you posted if the issue persists. Along the way, we will use the manual shared by @nvzFlow to ensure any gotchas are avoided. Thanks!
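For reference, the suggested change would mean an estreamer.conf containing something like the fragment below. Only the "alwaysAttemptToContinue" key comes from this thread; treat the surrounding layout as an assumption, and merge the key into your existing file rather than replacing it:

```json
{
    "alwaysAttemptToContinue": true
}
```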


_joe
Communicator

Thanks, yes - please let us know. I think my current record is one week of stability before crashing with the "Configure ERROR Invalid JSON in settings file" error. I also have a ticket open with Cisco, but as of yet I cannot provide them with the FMC logs, so I don't think it will go anywhere.

 

One thing I found in our environment is that the stop and status commands just don't work. I am not sure the status command is even supposed to work (it doesn't appear to be in the main module for splencore.sh), but there is still an input for it in default/inputs.conf. At the very least, the stop command should work.

At the moment I am considering writing my own stop script and scheduling it to run once a day. I actually had my older Firepower (3.x) TA scheduled to stop once a day; I found that otherwise the volume of IDS events would be slightly lower than what the FMC reported.
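A once-a-day stop along those lines could be as simple as a cron entry. This is only a sketch: the splencore.sh path assumes a default TA install, and it relies on Splunk's scripted input restarting eNcore on its own schedule:

```
# Hypothetical crontab entry for the user running Splunk:
# stop eNcore at 02:00 daily; the TA's start input relaunches it.
0 2 * * * /opt/splunk/etc/apps/TA-eStreamer/bin/splencore.sh stop
```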

 


_joe
Communicator

Just want to report to everyone that I have been running eNcore 4.6.3 (with "alwaysAttemptContinue": true) for a few days. Overall it seems more stable... but it has already crashed once with a new error:

Decorator    ERROR    Message data too large. Enable debug if asked to do so.
Decorator    INFO     Error state. Clearing queue

On the positive side, I did get an "Invalid JSON in settings file" error which caused eNcore to stop logging to estreamer.log, but the normal ingest process continued without any data loss.
 
Update: the newest version (4.6.3, with alwaysAttemptContinue set to true) crashed the day after I posted this. It had been stable for 4 days.
 
Currently my Splunk inputs.conf is set up to stop eNcore multiple times a day; by default, the TA attempts to start it up again every two minutes. This has done an OK job of clearing the queue and restarting eNcore successfully after it crashes.

vik_splunk
Communicator

@skhademd @douglashurd Unfortunately, the new 4.6.1 version didn't help either. The same error is back, as always slightly later in the same day.

In parallel, we have had our network admins raise a ticket with Cisco. Cisco has acknowledged the problem and raised a bug, as can be seen below. I will e-mail the details of the case to the address shared by Doug.

 

We have a bug open for this, but it looks like there is no root cause found yet:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvy79722

 

Logs are shown below.

2021-06-30 18:21:51,735 Monitor INFO Running. 104600 handled; average rate 5.62 ev/sec;
2021-06-30 18:22:19,262 Receiver DEBUG FMC sent no data
2021-06-30 18:22:41,791 Receiver DEBUG FMC sent no data
2021-06-30 18:22:52,297 Receiver DEBUG FMC sent no data
2021-06-30 18:23:02,798 Receiver DEBUG FMC sent no data
2021-06-30 18:23:13,307 Receiver DEBUG FMC sent no data
2021-06-30 18:23:23,836 Receiver DEBUG FMC sent no data
2021-06-30 18:23:34,343 Receiver DEBUG FMC sent no data
2021-06-30 18:23:52,872 Receiver DEBUG FMC sent no data
2021-06-30 18:23:53,384 Monitor INFO Running. 104600 handled; average rate 5.59 ev/sec;
2021-06-30 18:24:11,898 Receiver DEBUG FMC sent no data
2021-06-30 18:24:22,400 Receiver DEBUG FMC sent no data
2021-06-30 18:24:32,907 Receiver DEBUG FMC sent no data
2021-06-30 18:24:43,413 Receiver DEBUG FMC sent no data
2021-06-30 18:24:53,929 Receiver DEBUG FMC sent no data
2021-06-30 18:25:00,952 Receiver DEBUG Got null message.
2021-06-30 18:25:15,971 Receiver DEBUG FMC sent no data
2021-06-30 18:25:26,472 Receiver DEBUG FMC sent no data
2021-06-30 18:25:45,003 Receiver DEBUG FMC sent no data
2021-06-30 18:25:51,028 Monitor INFO Running. 104700 handled; average rate 5.56 ev/sec;
2021-06-30 18:26:11,040 Receiver DEBUG FMC sent no data
2021-06-30 18:26:21,550 Receiver DEBUG FMC sent no data
2021-06-30 18:26:32,058 Receiver DEBUG FMC sent no data
2021-06-30 18:26:42,569 Receiver DEBUG FMC sent no data
2021-06-30 18:26:53,076 Receiver DEBUG FMC sent no data
2021-06-30 18:27:00,099 Receiver DEBUG Got null message.
2021-06-30 18:27:10,099 Receiver DEBUG FMC sent no data
2021-06-30 18:27:20,612 Receiver DEBUG FMC sent no data
2021-06-30 18:27:31,123 Receiver DEBUG FMC sent no data
2021-06-30 18:27:41,635 Receiver DEBUG FMC sent no data
2021-06-30 18:27:46,163 Monitor INFO Running. 104700 handled; average rate 5.52 ev/sec;
2021-06-30 18:27:56,154 Receiver DEBUG FMC sent no data
2021-06-30 18:28:10,174 Receiver DEBUG FMC sent no data
2021-06-30 18:28:25,192 Receiver DEBUG FMC sent no data
2021-06-30 18:28:41,212 Receiver DEBUG FMC sent no data
2021-06-30 18:29:05,236 Receiver DEBUG FMC sent no data
2021-06-30 18:29:15,746 Receiver DEBUG FMC sent no data
2021-06-30 18:29:32,268 Receiver DEBUG FMC sent no data
2021-06-30 18:29:45,291 Receiver DEBUG FMC sent no data
2021-06-30 18:29:55,802 Receiver DEBUG FMC sent no data
2021-06-30 18:29:56,312 Monitor INFO Running. 104700 handled; average rate 5.49 ev/sec;
2021-06-30 18:30:06,304 Receiver DEBUG FMC sent no data
2021-06-30 18:30:17,330 Receiver DEBUG FMC sent no data
2021-06-30 18:30:34,346 Receiver DEBUG FMC sent no data
2021-06-30 18:30:47,081 Service ERROR [no message or attrs]: Invalid JSON in settings file


AlexS
Loves-to-Learn

I tried the new 4.6.1 version, and for me the issue is also not fixed. I had already implemented the changes necessary for the clean command to run.

For now I have added another script input that calls splencore.sh with a stop statement every x hours, so the service gets restarted regularly. The error appears after a random amount of time: last time it appeared after 45 minutes, but the time before, it ran for more than 10 hours.
And still the same behavior: logging continues to work after the error is logged but stops some time later.
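For anyone wanting to do the same, a periodic stop via a scripted input might look like the inputs.conf sketch below. Hedged: the stanza path assumes a default TA install and the interval is an arbitrary example; the TA's own start input brings eNcore back up afterwards:

```
# Hypothetical inputs.conf stanza: run "splencore.sh stop" every 6 hours.
[script://$SPLUNK_HOME/etc/apps/TA-eStreamer/bin/splencore.sh stop]
interval = 21600
disabled = 0
```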

 


elee_splunk
Loves-to-Learn Everything

Will try that. I don't believe we have access to the 4.6.1 version of the TA yet; it doesn't look like it has been published to Splunkbase.


vik_splunk
Communicator

It appears I spoke too soon, @douglashurd.

It was fine for two days, and now, unfortunately, we are back to square one with the errors below.

This is still an issue and will require a fix. Please advise.

2021-06-23 09:23:53,924 Connection INFO Connecting to <IP>
2021-06-23 09:23:53,926 Connection INFO Using TLS v1.2
2021-06-23 09:23:53,926 Transformer INFO Starting process.
2021-06-23 09:23:53,927 Monitor INFO Starting Monitor.
2021-06-23 09:23:53,927 Decorator INFO Starting process.
2021-06-23 09:23:53,928 Transformer DEBUG Transformer
2021-06-23 09:23:53,928 Decorator DEBUG Decorator
2021-06-23 09:23:53,928 Writer INFO Starting process.
2021-06-23 09:23:53,929 Writer DEBUG Writer
2021-06-23 09:23:53,929 Monitor INFO Starting. 0 handled; average rate 0 ev/sec;
2021-06-23 09:25:54,081 Controller INFO Process subscriberParser is dead.
2021-06-23 09:25:54,081 Monitor INFO Running. 0 handled; average rate 0 ev/sec;
2021-06-23 09:25:54,133 Controller INFO Stopping...
2021-06-23 09:25:54,134 Controller INFO Process 7091 (Process-1) exit code: 1
2021-06-23 09:25:54,134 Decorator INFO Stop message received
2021-06-23 09:25:54,140 Decorator INFO Error state. Clearing queue
2021-06-23 09:25:54,141 Decorator INFO Exiting
2021-06-23 09:25:54,141 Controller INFO Process 7092 (Process-2) exit code: 0
2021-06-23 09:25:54,146 Transformer INFO Stop message received
2021-06-23 09:25:54,152 Transformer INFO Error state. Clearing queue
2021-06-23 09:25:54,152 Transformer INFO Exiting
2021-06-23 09:25:54,152 Controller INFO Process 7093 (Process-3) exit code: 0
2021-06-23 09:25:54,157 Writer INFO Stop message received
2021-06-23 09:25:54,163 Writer INFO Error state. Clearing queue
2021-06-23 09:25:54,163 Writer INFO Exiting
2021-06-23 09:25:54,163 Controller INFO Process 7096 (Process-4) exit code: 0
2021-06-23 09:25:54,163 Monitor INFO Stopping Monitor.
2021-06-23 09:25:54,333 Controller INFO Goodbye


elee_splunk
Loves-to-Learn Everything

Same here; it's happening to me on a daily basis.


vik_splunk
Communicator

Hmmm... just putting it out there; not sure if it contributes to the issue, @elee_splunk.

Which Linux version are you running Splunk on for the affected machine?

The noticeable difference in our environment is as below.

 

 

FMC | RHEL | Splunk | estreamer TA | customizations if any | Status
6.6.4 | 6.x | 8.1.3 | 4.6.0 | sourcetype slightly changed | Completely broken without a semblance of stability
6.6.4 | 7.x | 8.1.3 | 4.6.0 | default sourcetype | Pretty stable

 

@douglashurd  thoughts?


elee_splunk
Loves-to-Learn Everything

@douglashurd @vik_splunk 

I've switched back to my old HF and it's been very stable.

FMC | Ubuntu | Splunk | estreamer TA | customizations if any | Status
6.6.4 | 18.04.5 LTS | 7.2.10 | 3.6.8 | default sourcetype | Stable

 

 


vik_splunk
Communicator

Hi @elee_splunk  That's good to hear.

 

It appears the Python 2 script offers stability in your case. In our environment, it's a bit puzzling that we have one forwarder ingesting logs without issues on the new version while the prod environment keeps throwing errors.


elee_splunk
Loves-to-Learn Everything
FMC | Ubuntu | Splunk | estreamer TA | customizations if any | Status
6.6.4 | 20.04.2 LTS | 8.2 | 4.6.0 | default sourcetype | Intermittent

 

I have to stop the estreamer service and start it back up to get it going at least once every day or two. When that doesn't work, I have to reboot the whole server.


douglashurd
Builder

Please email a link to this thread to encore-community@cisco.com

 

Thanks,

 

Doug

 


vik_splunk
Communicator

Just did, @douglashurd . Thanks!
