How to troubleshoot data ingestion latency and related issues using logs
Here is my experience troubleshooting Splunk data ingestion-related issues.
1. Search for the top 3 issues in your environment:
index=_internal host=<indexer_host or HF_host> source="/opt/splunk/var/log/splunk/splunkd.log" log_level=WARN
| top 3 component
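If you run multiple indexers or heavy forwarders, a small variation of the same search (just a sketch, assuming the same splunkd.log path) breaks the top offending components down per host, which helps you see whether the problem is environment-wide or isolated to one box:
index=_internal host=<indexer_host or HF_host> source="/opt/splunk/var/log/splunk/splunkd.log" log_level=WARN
| top 3 component by host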
2. Address the top issue and review. I'll walk through an example component: in my case, DateParserVerbose was the top issue. When you drill down into those logs, you notice that the fields you need to narrow down the issue are not parsed out by the TA or by Splunk. Here is the SPL to extract them using rex:
index=_internal host=<indexer_host or HF_host> source="/opt/splunk/var/log/splunk/splunkd.log" log_level=WARN component=DateParserVerbose
| rex "\] - (?P<source_message>.+)source\S(?P<data_source>.+)\|host\S(?P<data_host>\w{5}\d{4}\S$m$\S$msk$\S$mask$\S$msk$)"
Note: $m$, $msk$, $mask$ are masked values. You need to put your own domain pattern here.
3. Then you can see which source is causing which issue. You can even break the message down further with rex to find the common cause, and then repeat step 1. Append the following to the search from step 2:
| stats values(data_source) as data_source by data_host source_message
| top source_message
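To tie this back to the latency in the title: once you know which sources have timestamp or parsing problems, you can measure how far behind they are being indexed by comparing index time with event time. This is just a sketch; <your_index> and <your_sourcetype> are placeholders for wherever the affected data lands:
index=<your_index> sourcetype=<your_sourcetype>
| eval latency_seconds=_indextime-_time
| stats avg(latency_seconds) as avg_latency max(latency_seconds) as max_latency by sourcetype host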
Happy Splunkin'! I wish I had this in the beginning, but like most Splunkers I was tasked with so many other things that there was never enough time to troubleshoot ingestion-related issues.
How do you handle other types of troubleshooting, like filtering events and sending them to the null queue, or blacklisting sources? Do you see any of that in the logs?
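For anyone who wants a starting point on the null queue side, this is a minimal sketch of the standard props.conf/transforms.conf filtering setup applied on the indexer or heavy forwarder that parses the data; the sourcetype name and regex are placeholders you would replace with your own:

props.conf:
[your_sourcetype]
TRANSFORMS-null = setnull

transforms.conf:
[setnull]
# events matching this pattern are routed to nullQueue and never indexed
REGEX = <pattern of events to discard>
DEST_KEY = queue
FORMAT = nullQueue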
