We had an issue with IPFIX crashing about 10 times a day. Splunk fixed that in IPFIX v5.0.2, but now we're getting 300,000 WARNING events (over 75 MB of logs) every minute in /opt/splunk/var/log/ipfix.log.
Is anyone else seeing this issue?
Basically, we get an error for every number in the AppFlow data. Here are a couple of log lines:
2015-01-22 15:01:59,886 WARNING pid=15901 tid=MainThread file=IPFIXData.py:__init__:129 | Parsed egressInterface of type unsigned32 (4) [Id 0:14] for template 262. Data(!L): Encode '2147483651' failed because of the non-unicode data. Use 2147483651 instead.
2015-01-22 15:05:28,027 WARNING pid=15901 tid=MainThread file=IPFIXData.py:__init__:129 | Parsed netscalerFlowFlags of type unsigned64 (8) [Id 5951:132] for template 280. Data(!Q): Encode '8673961984' failed because of the non-unicode data. Use 8673961984 instead.
I've been running 5.0.3 for about a week now with no issues.
Thanks for the fix!
Hopefully you can now spend some time on adding template caching between reloads! 😉
Also experiencing the same problem. With the previous version of IPFIX we were only able to run for about 10 minutes before having to restart splunkd on the heavy forwarder.
This version, as stated above, does correct the crash, but we have over 2 million events for the last 15 minutes. Watching this thread and awaiting an updated version of the TA.
For the time being I've stopped the TA from running and deleted the logs.
I've just re-enabled visibility of the 5.0.1 release -- we're pretty sure we've found the right fix and will post a new maintenance version shortly. I'm sorry for the delay.
Thank you! The AppFlow data isn't mission critical to us, and we are still gathering syslog (ns_log) from the NetScalers, so we are okay with waiting for a proper fix from Splunk.
Thank you for the updates.
We've just released version 5.0.3, which should resolve the issues found. Sorry for the delay.
Has there been any development on this?
We were thinking of going from free to a paid version of Splunk but since NS is our main source of data at the moment we've put those plans on hold until the integration actually works.
Just a note that this is dropping about 80% of the logs so I'm rolling back to the one that crashes 10 times a day!
I'm testing a 5.0.3 beta of this and it hasn't crashed since I installed it; about a day. No strange errors and it doesn't seem to be dropping any logs. It does still log to ipfix.log AND splunkd.log but there is a logging.conf.sample in defaults that might be able to correct that.
I've provided support with this same info. Hopefully it'll be released soon.
Thanks dfronck! Got your report from support, we're going to get this out ASAP.
5.0.3 is released, please let us know if that fixes it. Sorry for the delay.
Instead of crashing it logs that the source tried to crash it. Your gear is still sending non-compliant data. It should stop doing that.
We use standard Python logging, so you can decrease the logging level. That will mean we are silently deleting messages instead of logging deletion, which is why we don't do it by default.
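For anyone who wants to try that, here is a minimal sketch of raising the threshold with the standard `logging` module so WARNING records are no longer written. The logger name `"ipfix"` is a placeholder I made up; the name the TA actually uses isn't stated in this thread, and in practice you would probably set the level in the TA's logging configuration rather than in code.

```python
import logging

# Hypothetical logger name: the TA's real logger name is not given in this thread.
LOGGER_NAME = "ipfix"


def raise_log_threshold(logger_name=LOGGER_NAME, level=logging.ERROR):
    """Set the named logger's level so records below `level` are discarded."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    return logger


logger = raise_log_threshold()

# WARNING is below ERROR, so this record is dropped without being written.
logger.warning("Parsed egressInterface ... failed")

# The logger itself reports which levels it will still emit.
warning_enabled = logger.isEnabledFor(logging.WARNING)
error_enabled = logger.isEnabledFor(logging.ERROR)
```

Keep in mind the trade-off mentioned above: with the level at ERROR, the discarded records disappear without any trace in the log.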
Hi there!
I have the same problem. Where can I download the IPFIX add-on 5.0.1, if it works?
So basically you're saying that ALL NetScalers running v10 are broken?
Only 1 of the templates is data from our apps. The rest is just NetScaler Built-In performance metric crap like Round Trip Times and Response Status. If that's broken for us, I'd assume that it's broken for everyone. I'll try to compare what we're getting in Splunk to what the NetScalers are sending once the NetScaler guy gets back but it appears that what's making it into Splunk is valid.
Also, we were only crashing about 10 times a day, not 300,000 times a minute!
Is data being thrown out when you get those warnings, or is it still being logged?
It's being thrown out.
@dfronck, support should have a build for you to try.
The RFC says non-Unicode string characters are not ok. Python says the data isn't Unicode, and crashes when we parse it as Unicode. I'm open to understanding more about how we could process that scenario in a better way.
OK, I think I'm missing something. These are supposed to be ints and floats not strings.
They're defined in the XML as "unsigned64" and "unsigned32" and "unsigned8" and "dateTimeMilliseconds" and "dateTimeMicroseconds" which I assumed were integers or floats.
The warning message says that netscalerFlowFlags should be an unsigned64.
2015-01-22 15:05:28,027 WARNING pid=15901 tid=MainThread file=IPFIXData.py:__init__:129 | Parsed netscalerFlowFlags of type unsigned64 (8) [Id 5951:132] for template 280. Data(!Q): Encode '8673961984' failed because of the non-unicode data. Use 8673961984 instead.
Are you saying that the NetScaler should be passing these numbers as Unicoded strings?
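To illustrate the point about types: the `!L` and `!Q` in those warnings are Python `struct` format codes for network-order unsigned 32-bit and 64-bit integers. Below is a rough sketch of decoding such fields as integers. It is not the TA's code; `decode_unsigned` is a helper name I invented, and the byte strings are built locally from the values in the log lines rather than captured from a NetScaler.

```python
import struct

# Network byte order ("!") struct codes for unsigned integers, keyed by field width.
UNSIGNED_FORMATS = {1: "!B", 2: "!H", 4: "!L", 8: "!Q"}


def decode_unsigned(raw: bytes) -> int:
    """Decode a 1, 2, 4 or 8 byte big-endian unsigned integer field."""
    return struct.unpack(UNSIGNED_FORMATS[len(raw)], raw)[0]


# Rebuild the wire bytes from the two values quoted in the warnings.
egress_raw = struct.pack("!L", 2147483651)  # unsigned32 egressInterface
flags_raw = struct.pack("!Q", 8673961984)   # unsigned64 netscalerFlowFlags

egress_value = decode_unsigned(egress_raw)
flags_value = decode_unsigned(flags_raw)

# Treating the same four bytes as UTF-8 text fails, because the leading
# byte 0x80 is not a valid start byte for a UTF-8 sequence.
try:
    egress_raw.decode("utf-8")
    decoded_as_text = True
except UnicodeDecodeError:
    decoded_as_text = False
```

Read as integers, both fields come back as the numbers in the log. Read as text, the `egressInterface` bytes cannot be decoded at all, which looks consistent with the "non-unicode data" wording, though whether the TA actually does this internally is a guess on my part.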
dfronck, thanks for pointing that out. Do you have a support case open?