
Struggling to correctly parse JSON data

LOP22456
Explorer

Hello,

We have multiple FortiGate devices forwarding to a Logstash server that stores all the devices' logs in one file (I can't change this, unfortunately). This is then forwarded to our HF, and then to Splunk Cloud.

These logs then enter Splunk, sometimes with 20+ logs in a single event, and I can't get them to break out into individual events by host.

Below are samples of two logs, but a single event could contain 20+ logs. I cannot get these to parse correctly into one event per host (hostnames redacted).

{"log":{"syslog":{"priority":189}},"host":{"hostname":"redact"},"fgt":{"proto":"1","tz":"+0200","vpntype":"ipsecvpn","rcvdbyte":"3072","policyname":"MW","type":"traffic","identifier":"43776","trandisp":"noop","logid":"0001000014","srcintfrole":"undefined","policyid":"36","rcvdpkt":"3","vd":"root","duration":"180","dstintfrole":"undefined","dstip":"10.53.6.1","level":"notice","eventtime":"1750692044675283970","policytype":"policy","subtype":"local","srcip":"10.53.4.119","dstintf":"root","srcintf":"HUB1-VPN1","sessionid":"5612390","action":"accept","service":"PING","app":"PING","sentbyte":"3072","sentpkt":"3","dstcountry":"Reserved","poluuid":"cb0c79de-2400-51f0-7067-d28729f733cf","srccountry":"Reserved"},"timestamp":"2025-06-23T15:20:45Z","data_stream":{"namespace":"default","dataset":"fortinet.fortigate","type":"logs"},"@timestamp":"2025-06-23T15:20:45.000Z","type":"fortigate","logstash":{"hostname":"no_logstash_hostname"},"tags":["_grokparsefailure"],"@version":"1","system":{"syslog":{"version":"1"}},"event":{"created":"2025-06-23T15:20:45.563831683Z","original":"<189>1 2025-06-23T15:20:45Z redact - - - - eventtime=1750692044675283970 tz=\"+0200\" logid=\"0001000014\" type=\"traffic\" subtype=\"local\" level=\"notice\" vd=\"root\" srcip=10.53.4.119 identifier=43776 srcintf=\"redact\" srcintfrole=\"undefined\" dstip=10.53.6.1 dstintf=\"root\" dstintfrole=\"undefined\" srccountry=\"Reserved\" dstcountry=\"Reserved\" sessionid=5612390 proto=1 action=\"accept\" policyid=36 policytype=\"policy\" poluuid=\"cb0c79de-2400-51f0-7067-d28729f733cf\" policyname=\"MW\" service=\"PING\" trandisp=\"noop\" app=\"PING\" duration=180 sentbyte=3072 rcvdbyte=3072 sentpkt=3 rcvdpkt=3 vpntype=\"ipsecvpn\""},"observer":{"ip":"10.53.12.113"}}

{"log":{"syslog":{"priority":189}},"host":{"hostname":"redact"},"fgt":{"proto":"1","tz":"+0200","rcvdbyte":"3072","policyname":"redact (ICMP)","type":"traffic","identifier":"43776","trandisp":"noop","logid":"0001000014","srcintfrole":"wan","policyid":"40","rcvdpkt":"3","vd":"root","duration":"180","dstintfrole":"undefined","dstip":"10.52.25.145","level":"notice","eventtime":"1750692044620716079","policytype":"policy","subtype":"local","srcip":"10.53.4.119","dstintf":"root","srcintf":"wan1","sessionid":"8441941","action":"accept","service":"PING","app":"PING","sentbyte":"3072","sentpkt":"3","dstcountry":"Reserved","poluuid":"813c45e0-3ad6-51f0-db42-8ec755725c23","srccountry":"Reserved"},"timestamp":"2025-06-23T15:20:45Z","data_stream":{"namespace":"default","dataset":"fortinet.fortigate","type":"logs"},"@timestamp":"2025-06-23T15:20:45.000Z","type":"fortigate","logstash":{"hostname":"no_logstash_hostname"},"tags":["_grokparsefailure"],"@version":"1","system":{"syslog":{"version":"1"}},"event":{"created":"2025-06-23T15:20:45.639474828Z","original":"<189>1 2025-06-23T15:20:45Z redact - - - - eventtime=1750692044620716079 tz=\"+0200\" logid=\"0001000014\" type=\"traffic\" subtype=\"local\" level=\"notice\" vd=\"root\" srcip=10.53.4.119 identifier=43776 srcintf=\"wan1\" srcintfrole=\"wan\" dstip=10.52.25.145 dstintf=\"root\" dstintfrole=\"undefined\" srccountry=\"Reserved\" dstcountry=\"Reserved\" sessionid=8441941 proto=1 action=\"accept\" policyid=40 policytype=\"policy\" poluuid=\"813c45e0-3ad6-51f0-db42-8ec755725c23\" policyname=\"redact (ICMP)\" service=\"PING\" trandisp=\"noop\" app=\"PING\" duration=180 sentbyte=3072 rcvdbyte=3072 sentpkt=3 rcvdpkt=3"},"observer":{"ip":"10.52.31.14"}}


I have edited props.conf to contain the following stanza, but still no luck:


[fortigate_log]
SHOULD_LINEMERGE = false
LINE_BREAKER = }(\s*)\{


Any direction on where to go from here? 


LAME-Creations
Path Finder
Thanks for sharing the details of your FortiGate log parsing issue in Splunk Cloud! It sounds like your Logstash server is combining multiple FortiGate logs into a single file, which is then sent to your Heavy Forwarder (HF) and ingested into Splunk Cloud as multi-line events (sometimes 20+ logs per event). Your props.conf configuration isn’t breaking these into individual events per host, likely due to an incorrect LINE_BREAKER regex or misconfigured parsing settings. The _grokparsefailure tag suggests additional parsing issues, possibly from Logstash or Splunk misinterpreting the syslog format. Below is a solution to parse these logs into individual events per FortiGate host, tailored for Splunk Cloud and your HF setup.
Why the Current Configuration Isn’t Working
  • LINE_BREAKER Issue: Your LINE_BREAKER = }(\s*)\{ aims to split events between JSON objects (e.g., } {), but it may not account for the syslog headers or whitespace correctly, causing Splunk to treat multiple JSON objects as one event. The regex might also be too restrictive or not capturing all cases.
  • SHOULD_LINEMERGE: Setting SHOULD_LINEMERGE = false is correct to disable line merging, but without a precise LINE_BREAKER, Splunk may still fail to split events.
  • Logstash Aggregation: Logstash is bundling multiple FortiGate logs into a single file, and the _grokparsefailure tag indicates Logstash’s grok filter (or Splunk’s parsing) isn’t correctly processing the syslog format, leading to malformed events.
  • Splunk Cloud Constraints: In Splunk Cloud, props.conf changes must be applied on the HF, as you don’t have direct access to the indexers. The current configuration may not be properly deployed or tested.
Solution: Parse Multi-Line FortiGate Logs into Individual Events
To break the multi-line events into individual events per FortiGate host, you’ll need to refine the props.conf configuration on the Heavy Forwarder and ensure proper event breaking at ingestion time. Since the logs are JSON with syslog headers, you can use Splunk’s JSON parsing capabilities and a corrected LINE_BREAKER to split events.
Step 1: Update props.conf on the Heavy Forwarder
Modify the props.conf file on your HF to correctly break events and parse the JSON structure. Place this in $SPLUNK_HOME/etc/system/local/props.conf or an app’s local directory (e.g., $SPLUNK_HOME/etc/apps/<your_app>/local/props.conf).
[fortigate_log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\{"log":\{"syslog":\{"priority":\d+\}\})
INDEXED_EXTRACTIONS = json
KV_MODE = json
TIMESTAMP_FIELDS = timestamp
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
BREAK_ONLY_BEFORE = ^\{"log":\{"syslog":\{"priority":\d+\}\}
TRUNCATE = 10000
category = Structured
disabled = false
pulldown_type = true
Explanation:
  • LINE_BREAKER = ([\r\n]+)(?=\{"log":\{"syslog":\{"priority":\d+\}\}): Splits events on newlines (\r\n) followed by the start of a new JSON object (e.g., {"log":{"syslog":{"priority":189}}). The positive lookahead (?=) ensures the JSON start is not consumed, preserving the event.
  • SHOULD_LINEMERGE = false: Prevents Splunk from merging lines, relying on LINE_BREAKER for event boundaries.
  • INDEXED_EXTRACTIONS = json: Automatically extracts JSON fields (e.g., host.hostname, fgt.srcip) at index time on the HF, reducing search-time parsing issues.
  • KV_MODE = json: Ensures search-time field extraction for JSON fields, complementing index-time parsing.
  • TIMESTAMP_FIELDS = timestamp: Uses the timestamp field (e.g., 2025-06-23T15:20:45Z) for event timestamps.
  • TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ: Matches the timestamp format in the logs.
  • BREAK_ONLY_BEFORE: Reinforces event breaking by matching the start of a JSON object, as a fallback if LINE_BREAKER struggles.
  • TRUNCATE = 10000: Ensures large events (up to 10,000 characters) aren’t truncated, accommodating multi-log events.
  • category and pulldown_type: Improves Splunk Cloud’s UI compatibility for source type selection.
Step 2: Deploy and Restart the Heavy Forwarder
  • Deploy props.conf:
    • Place the updated props.conf in $SPLUNK_HOME/etc/system/local/ or a custom app directory on the HF.
    • If using a custom app, ensure it’s deployed via a Deployment Server or manually copied to the HF.
  • Restart the HF:
    • On the Windows HF, open a Command Prompt as Administrator.
    • Navigate to $SPLUNK_HOME\bin (e.g., cd "C:\Program Files\Splunk\bin").
    • Run: splunk restart
  • Note: In Splunk Cloud, you can’t modify indexer configurations directly. The HF applies these parsing rules before forwarding to the Splunk Cloud indexers.
Step 3: Verify Event Breaking
  • Run a search in Splunk Cloud to confirm events are split correctly:
    index=<your_index> sourcetype=fortigate_log | stats count by host.hostname
  • Check that each FortiGate host (from host.hostname) appears as a separate event with the correct count. Each event should correspond to one JSON log entry (e.g., one per host.hostname like “redact”).
  • If events are still merged, inspect $SPLUNK_HOME/var/log/splunk/splunkd.log on the HF for parsing errors (e.g., grep -i "fortigate_log" splunkd.log).
Step 4: Address _grokparsefailure Tag
The _grokparsefailure tag suggests Logstash’s grok filter isn’t correctly parsing the FortiGate syslog format, which may contribute to event merging. Since you can’t modify the Logstash setup, you can mitigate this in Splunk:
  • Override Logstash Tags: In props.conf, add a transform to remove the _grokparsefailure tag and ensure clean parsing:
    [fortigate_log]
    ...
    TRANSFORMS-remove_grok_failure = remove_grokparsefailure
    In $SPLUNK_HOME/etc/system/local/transforms.conf:
    [remove_grokparsefailure]
    REGEX = .
    FORMAT = tags::none
    DEST_KEY = _MetaData:tags
  • Restart the HF after adding the transform.
  • This clears the _grokparsefailure tag, ensuring Splunk doesn’t inherit Logstash’s parsing issues.
Step 5: Optimize FortiGate Integration (Optional)
  • Install Fortinet FortiGate Add-On: If not already installed, add the Fortinet FortiGate Add-On for Splunk on the HF and Search Head to improve field extraction and CIM compliance.
    • Install on the HF for index-time parsing (already handled by INDEXED_EXTRACTIONS = json).
    • Install on the Search Head for search-time field mappings and dashboards.
  • Verify Syslog Configuration: Ensure FortiGate devices send logs to Logstash via UDP 514 or TCP 601, as per Fortinet’s syslog standards.
  • Check Logstash Output: If possible, verify Logstash’s output plugin (e.g., Splunk HTTP Event Collector or TCP output) is configured to send individual JSON objects without excessive buffering, which may contribute to event merging.
Troubleshooting Tips
  • Test Parsing: Ingest a small sample log file via the HF’s Add Data wizard in Splunk Web to test the props.conf settings before processing live data.
  • Check Event Boundaries: Run index=<your_index> sourcetype=fortigate_log | head 10 and verify each event contains only one JSON object with a unique host.hostname.
  • Logstash Buffering: If Logstash continues to bundle logs, consider asking your Logstash admin to adjust the output plugin (e.g., splunk output with HEC) to flush events more frequently, though you noted this isn’t changeable.
  • Splunk Cloud Support: If parsing issues persist, contact Splunk Cloud Support to validate the HF configuration or request assistance with indexer-side parsing (though the HF should handle most parsing).

PickleRick
SplunkTrust

Ok. Can you please stop posting random copy-pastes from LLMs? LLMs are a useful tool... if they supplement your knowledge and expertise. Otherwise you're only introducing confusing, wrong advice into the thread.

Your advice about both indexed extractions and kv mode at the same time is simply wrong - it will lead to duplicate fields. Your line breaker is also needlessly complicated. BREAK_ONLY_BEFORE has no effect with line merging disabled.

Your advice about an addon for Fortigate is completely off because the TA for Fortigate available on Splunkbase handles the default Fortigate event format, not JSON. Adjusting the events to be parsed by that addon will require more than just installing said addon.

And there is no _MetaData:tags key!

LLMs are known for making things up. Copy-pasting their delusions here isn't helping anyone! Just stop leading people astray.

@LOP22456 I assume that it's either multiple events per line in your input file or your events are multiline, and therefore the usual approach of splitting the file on line breaks doesn't work.

Unfortunately, there's no bulletproof solution for this, since handling structured data with regexes alone is bound to be wrong in border cases. You can assume that your input breaks where you have two "touching" braces without a comma between them (even better if they must be on separate lines - that would give you a "stronger" line breaker), but there could still be a border case where such a string appears inside your JSON. But in most cases something like

LINE_BREAKER = }([\r\n\s]*){

should do. In most cases. In some border cases you might end up with broken events.
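
For reference, plugged into the stanza from the original post this would look something like the sketch below. Only the text matched by the capturing group is discarded at the break, so the closing brace stays with the previous event and the opening brace starts the next one.

[fortigate_log]
# Let LINE_BREAKER alone define event boundaries
SHOULD_LINEMERGE = false
# Break between a closing and an opening brace; only the captured
# whitespace between them is discarded
LINE_BREAKER = }([\r\n\s]*){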

LOP22456
Explorer

Thank you my friend, that worked! Most events are now being parsed properly. However, I am still seeing some very large 200+ line events that aren't getting parsed, many of them exactly 257 lines. Any idea what could be causing these not to parse?


PickleRick
SplunkTrust

Unparsed or incorrectly broken? If they are incorrectly broken, you might want to tweak that line breaker. Use https://regex101.com to test your ideas against your data.

If they are unparsed or incorrectly parsed, either the events are malformed or you might be hitting extraction limits (there are limits to the size of the data and the number of fields that are automatically extracted, if I remember correctly).
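
If it is the limits, the relevant knobs would be TRUNCATE in props.conf (applied at parse time on the HF) and the automatic key-value extraction limits in limits.conf (applied at search time). A sketch with illustrative values only; verify the setting names and defaults against the limits.conf documentation for your version.

In props.conf on the HF:

[fortigate_log]
# Events longer than this many characters are cut off (default 10000)
TRUNCATE = 100000

In limits.conf on the search head:

[kv]
# Only this many characters of _raw are scanned for automatic
# key-value extraction (default 10240)
maxchars = 102400
# Maximum number of fields automatic extraction will generate;
# consult the docs for your version's default
limit = 200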

livehybrid
Super Champion

@LAME-Creations @LOP22456 

Please do not set both INDEXED_EXTRACTIONS and KV_MODE = json.

See props.conf docs for more info - https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf

When 'INDEXED_EXTRACTIONS = JSON' for a particular source type, do not also set 'KV_MODE = json' for that source type. This causes the Splunk software to extract the JSON fields twice: once at index time, and again at search time.
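
In practice that means picking one or the other. A sketch of the two valid setups, reusing the sourcetype name from this thread:

Index-time extraction (on the HF):

[fortigate_log]
INDEXED_EXTRACTIONS = json

and on the search head, suppress the duplicate search-time pass:

[fortigate_log]
KV_MODE = none

Or search-time extraction only (no INDEXED_EXTRACTIONS anywhere, just this on the search head):

[fortigate_log]
KV_MODE = json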

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing
