Hi Splunk Community,
I’ve set up Azure Firewall logging, selecting all firewall logs and archiving them to a storage account (Event Hub was avoided due to cost concerns). The configuration steps taken are as follows:
Log Archival:
Splunk Add-on for Microsoft Cloud Services
Input/Action: Azure Storage Blob / Azure Storage Table
API: N/A
Permissions: Access key OR Shared Access Signature
- Allowed services: Blob, Table
- Allowed resource types: Service, Container, Object
- Allowed permissions: Read, List
Role (IAM): N/A
Default Sourcetype(s) / Sources:
- mscs:storage:blob ✅ (received this)
- mscs:storage:blob:json ❌
- mscs:storage:blob:xml ❌
- mscs:storage:table ❌
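To confirm what this input is actually writing into Splunk, a quick tstats check shows the sourcetypes and source files arriving (the index name below is an assumption; substitute the index the input is configured to use):

```
| tstats count where index=azure_fw by sourcetype, source
```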
We are receiving events from the source files in JSON format, but there are two issues:
Field Extraction:
JSON fields are not being extracted from the received events.
Incomplete Logs:
Far fewer events are being received than the traffic on the Azure Firewall would suggest. Attached is a sample of the logs showing the errors mentioned in the question.
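To quantify the second issue, the Heavy Forwarder's internal logs can be checked for line-breaking and truncation warnings. A sketch, assuming _internal from the HF reaches Splunk Cloud:

```
index=_internal sourcetype=splunkd log_level=WARN
    (component=LineBreakingProcessor OR component=AggregatorMiningProcessor)
| stats count by component, data_sourcetype
```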
________________________________________________________________
Environment Details:
• Log Collector: Heavy Forwarder (HF) hosted in Azure.
• Data Flow: Logs are forwarded from the HF to Splunk Cloud.
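Since the HF is the first full Splunk instance in the path, index-time parsing (line breaking, truncation, timestamp extraction) happens there. To see which props settings are currently in effect for the incoming sourcetype on the HF:

```
$SPLUNK_HOME/bin/splunk btool props list mscs:storage:blob --debug
```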
Questions:
Ultimate Goal:
Receive Azure Firewall logs with fields extracted, just like any other firewall logs received over syslog (Fortinet, for example).
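(For illustration, once the events are well-formed JSON, spath gives this kind of search-time extraction even before a dedicated sourcetype is built; the index name and the category/properties field names are assumptions based on typical Azure diagnostic records:)

```
index=azure_fw sourcetype=mscs:storage:blob
| spath
| table _time, category, operationName, properties.msg
```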
Any guidance or troubleshooting suggestions would be much appreciated!
Splunk Support Update:
Regarding your question about the best way to ingest Azure Firewall logs into Splunk, I would recommend using Event Hub for this purpose. Event Hub allows you to stream real-time data, which is ideal for continuous log ingestion. On the other hand, using Storage Blob as an input can lead to delays, especially as log sizes increase, and could also result in data duplication.
As usual, there is probably more than one solution to a problem (in your case, ingestion of Azure Firewall logs). True, Event Hub will give you near-real-time delivery (it's not strictly real time since it's pull-based, as far as I remember), but the storage-based method might be cheaper, and if you're OK with the latency it might be sufficient.
Your original problems were most probably caused by a misconfigured sourcetype. The input data was not broken into events properly and/or the events were too long and got truncated.
As a result, JSON field extractions didn't happen because the events were not well-formed JSON.
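Something along these lines in props.conf is the usual shape of the fix; treat it as a sketch (the TRUNCATE value is a guess, and it assumes each line in the blob is one JSON record):

```
[mscs:storage:blob]
# Index-time settings - these must live on the Heavy Forwarder (first parsing tier)
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 100000
TIME_PREFIX = "time"\s*:\s*"
MAX_TIMESTAMP_LOOKAHEAD = 40

# Search-time setting - belongs on the search head (Splunk Cloud)
KV_MODE = json
```

Keep in mind that index-time settings only affect data indexed after the change; events that were already broken or truncated won't be fixed retroactively.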
Thank you for your input. It might be the LINE_BREAKER setting that is causing this.
In addition, the number of events received is low considering it's an Azure Firewall producing 10-15 GB of logs daily.
That's to be expected as well. If your input is not broken into single events properly, you can end up with a small number of huge data blobs (each effectively consisting of several "atomic" events). Since they'd get cut off at the TRUNCATE point, all the data following that point would be lost.
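An easy way to confirm this is to look at raw event sizes; events that were cut off tend to cluster right at the TRUNCATE limit (10000 bytes by default). A quick sketch, with the index name as an assumption:

```
index=azure_fw sourcetype=mscs:storage:blob
| eval raw_len = len(_raw)
| stats count, max(raw_len) AS max_len, perc95(raw_len) AS p95_len
```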