Having all sorts of problems with syslog-based reception of JSON-type events. Would like to be able to capture these events using the HTTP Event Collector (HEC) as I can't seem to get the HTTP RESTful API working. Any advice would be helpful.
Hi Tony,
You can try forwarding those logs using syslogger.
HEC is a valid option, but the major drawback is potential data loss if the receiver (a heavy forwarder) goes down.
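For reference, enabling HEC on the receiving Splunk instance comes down to an inputs.conf stanza along these lines (a sketch only; the token value, stanza name, and index here are placeholders, not values from the TA):

```ini
# inputs.conf on the indexer or heavy forwarder -- a sketch;
# the token, input name, and index are placeholders
[http]
disabled = 0

[http://fireeye_hec]
token = <your-generated-token>
sourcetype = fe_json
index = fireeye
disabled = 0
```

Events would then be POSTed to `https://<receiver>:8088/services/collector/event` with an `Authorization: Splunk <token>` header.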
Hi Tony,
As you mentioned in point 2 above, I'm currently having a problem where the syslog for some FireEye events is parsed down to just the content inside the curly brackets { }. These events currently appear as "malformed" in Splunk and subsequently go missing from notables (because the other fields in the header got stripped off).
"When data is sent in as syslog and the header is stripped, Splunk parses the data, which is great; however, when you expand an event it can be more than your browser can handle. For example, a normal XML or JSON event can be 300,000 lines long with over a million parsable fields. When expanded, your browser's memory can spike trying to display all the data, causing the browser to freeze or crash."
Can I just comment out the particular stanza that is doing the stripping?
Detailed issue is in this link:
https://answers.splunk.com/answers/689053/can-you-help-me-with-a-syslog-issue-in-the-fireeye.html
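For what it's worth, disabling an index-time transform usually means overriding the relevant TRANSFORMS- line in a local props.conf rather than editing the TA's defaults. Roughly (a sketch; the actual attribute name must match whatever the TA's default/props.conf uses):

```ini
# local/props.conf -- a sketch; the TRANSFORMS- attribute name here
# is assumed and must match the one in the TA's default/props.conf
[syslog]
# Setting the attribute to empty in local/ overrides the default
# and disables the header-stripping transform:
TRANSFORMS-fireeye =
```

An empty local value takes precedence over the default, which survives TA upgrades better than commenting out the shipped file.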
Tony,
I am currently working with the TA and seeing some parsing issues, with EX logs being parsed incorrectly by the TA. Do you know if the JSON-via-syslog format issue you mentioned above has been resolved? We are considering switching the EX collection to JSON over HTTPS due to the issues with syslog and JSON. We currently have no issues with our NX/HX devices, so it seems to be a problem with just our EX device log collection.
Thanks in advance, and sorry for the bump.
I also dug into why the JSON over syslog isn't working correctly. We tried to make Splunk parse all fields within JSON and XML. In order to do this, we added a feature to strip syslog headers and convert the event into JSON or XML that Splunk would recognize using a transform called "syslog-header-stripper-ts-host-proc", under the syslog stanza. There are two problems with this approach.
1) When the data is sent directly as fe_json_syslog (rather than as syslog and then auto-converted to fe_json_syslog), the header is not stripped, so Splunk does not recognize the event as JSON data because the syslog header is still present.
2) When data is sent in as syslog and the header is stripped, Splunk parses the data, which is great; however, when you expand an event it can be more than your browser can handle. For example, a normal XML or JSON event can be 300,000 lines long with over a million parsable fields. When expanded, your browser's memory can spike trying to display all the data, causing the browser to freeze or crash.
We are working on a fix for #1 and then a fix for #2.
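The header-stripping approach described above can be sketched as an index-time transform that rewrites _raw (illustrative only; the regex below is an assumption, and the TA's actual pattern will differ):

```ini
# transforms.conf -- illustrative sketch; the real regex in the TA differs
[syslog-header-stripper-ts-host-proc]
# Match a classic "Mon DD HH:MM:SS host process:" syslog prefix and
# keep only what follows it
REGEX = ^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+\S+\s+(.*)$
FORMAT = $1
DEST_KEY = _raw

# props.conf
[syslog]
TRANSFORMS-fe_strip = syslog-header-stripper-ts-host-proc
```

Because the transform rewrites _raw at index time, the stripped header fields are gone for good, which is exactly why downstream notables lose them.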
Thanks for the question. Not sure when we will support HEC, however let's see if we can troubleshoot your current issues.
In terms of sending over the HTTP RESTful API: did you see pages 11 and 30 of the configuration guide found below?
https://www.fireeye.com/content/dam/fireeye-www/global/en/partners/pdfs/config-guide-fireeye-app-for...
Page 11 provides steps to set it up. Page 30 provides some troubleshooting tricks.
If those don't work, feel free to send me an email via the Help -> Send Feedback feature in the app itself and we can troubleshoot some more.
Thanks for getting back to me, Tony.
The troubleshooting piece seemed to help and I figured out how to format my RESTful query appropriately. The challenge that I'm seeing now is that it appears the JSON stream that is being sent to the Splunk server is incomplete. I'm seeing truncation occur at the beginning and the end of the events. So the JSON parser doesn't seem to see it as a complete message. Samples follow. I'm using the latest version of the TA in a distributed environment running Splunk 6.5.1.
Truncation at the end:
{
"product": "Web MPS",
"appliance-id": "<obfuscated>",
"appliance": "<obfuscated>",
"alert": {
"src": {
"ip": "<obfuscated>",
"mac": "<obfuscated>",
"vlan": "0",
"port": "<obfuscated>"
},
"severity": "crit",
"alert-url": "<obfuscated>",
"explanation": {
"malware-detected": {
"malware": {
"name": "<obfuscated>",
"stype": "<obfuscated>",
"sid": "<obfuscated>"
}
},
"cnc-services": {
"cnc-service": {
"location": "<obfuscated>",
"protocol": "tcp",
"port": "80",
"channel": "<obfuscated>",
"address": "<obfuscated>"
}
},
"protocol": "tcp",
"analysis": "content"
},
"locations": "<obfuscated>",
"id": "<obfuscated>",
"action": "notified",
Truncation at the beginning:
"occurred": "2016-12-16 04:31:05+00",
"interface": {
"interface": "pether3",
"mode": "tap",
"label": "A1"
},
"dst": {
"ip": "<obfuscated>",
"mac": "<obfuscated>",
"port": "80"
},
"name": "<obfuscated>"
},
"version": "<obfuscated>",
"msg": "extended"
}
Splunk, by default, looks for a date/time as the start of an event. Thus it sees "occurred": "2016-12-16 04:31:05+00", assumes that is the start of a new event, and breaks there, truncating the original one. This means the line breaker is not working properly for some reason.
What does Splunk indicate as the sourcetype when viewing the event?
These events are the ones being sent RESTful-y, so they're being parsed with a sourcetype of "fe_json".
If I had to guess, the sample above is actually one event that is being split at the "occurred" field.
You are correct per my previous comment:
"Splunk, by default, looks for a date/time as the start of an event. Thus it sees "occurred": "2016-12-16 04:31:05+00", assumes that is the start of a new event, and breaks there, truncating the original one. This means the line breaker is not working properly for some reason."
I am trying to replicate the issue now.
Any luck on this? We're still seeing this behavior. I tried to counteract it by setting the linebreaker to be the following:
LINE_BREAKER = \{\s+\"product\"\:
but I'm still seeing similar behavior.
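One thing to check: LINE_BREAKER in Splunk requires a capturing group, and the text matched by the first group is consumed as the boundary between events; a pattern with no group will not break events where you expect. A sketch, assuming the sourcetype stanza is [fe_json] (the stanza name is an assumption):

```ini
# props.conf -- a sketch; assumes the sourcetype stanza is [fe_json]
[fe_json]
SHOULD_LINEMERGE = false
# Group 1 (the newlines) is consumed as the event boundary; the
# lookahead keeps the opening brace with the start of the next event
LINE_BREAKER = ([\r\n]+)(?=\{\s*"product")
```

With SHOULD_LINEMERGE disabled, Splunk also stops re-merging lines based on the detected timestamp, which should prevent the split at "occurred".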