Hello All,
I need to work on building SPL to fetch information related to corrupt data.
The conditions I have narrowed down to for determining whether data is corrupt are: -
1. Improper breaking of data into individual lines.
- When fewer events than expected are ingested into the index.
- When multiple events are grouped together into one large event.
- Truncation of lines in lengthy events, when the line length exceeds the defined limit.
2. Improper breaking of events.
- When events are not properly recognized by Splunk.
- When we see more or fewer events than expected.
3. Incorrect timestamp extraction.
- Timestamp issues related to: -
a. DATETIME_CONFIG
b. TIME_PREFIX
c. TIME_FORMAT
d. MAX_TIMESTAMP_LOOKAHEAD
e. When time observed in _time does not match the raw data.
- splunkd components that log errors related to timestamp extraction issues: -
a. AggregatorMiningProcessor
b. DateParserVerbose
Thus, I need your assistance to understand the approach to building SPL queries that fetch details when the above Splunk failure conditions occur.
So far, I have been able to document the below two queries: -
index=_internal TERM(AggregatorMiningProcessor) | stats count BY event_message
index=_internal TERM(DateParserVerbose) | stats count BY event_message
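I am also assuming the line-breaking and truncation warnings (condition 1 above) are logged by splunkd under the LineBreakingProcessor component, and that those warning events carry data_host and data_sourcetype context fields; I have not verified either, so please treat the below as a sketch only: -
index=_internal sourcetype=splunkd component=LineBreakingProcessor (log_level=WARN OR log_level=ERROR) | stats count BY event_message
index=_internal sourcetype=splunkd component=LineBreakingProcessor (log_level=WARN OR log_level=ERROR) | stats count BY data_sourcetype, data_host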
Thank you
I tried to use the dbinspect command to fetch the details of corrupt buckets in Splunk indexes: -
|dbinspect index=* corruptonly=true cached=false
Can you please suggest the approach to fetch the following: -
1. How many searches were impacted due to corrupt buckets?
2. How many users were impacted due to corrupt buckets?
3. How to fetch which forwarder was part of each corrupt bucket?
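For questions 1 and 2, the closest I have got so far is the rough two-step sketch below. It assumes the _audit index records the search string and user of completed searches, and it only matches searches that explicitly name an index that currently holds a corrupt bucket, so I am not sure how reliable it is. my_index is just a placeholder for an index name returned by the first search: -
| dbinspect index=* corruptonly=true cached=false | stats count AS corrupt_buckets BY index, splunk_server
index=_audit action=search info=completed search="*index=my_index*" | stats count AS impacted_searches, dc(user) AS impacted_users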
Apart from the above, I found data can be corrupted due to the below scenarios: -
1. Improper breaking of data into individual lines, such as: -
-> Fewer than expected events are ingested because multiple events are grouped together, creating one large event.
-> Splunk is not breaking the events as expected because events are not properly recognized.
2. Incorrect timestamp parsing issues, such as: -
-> Many events are assigned the same timestamp value.
Thus, it would be helpful if you could share the approach to fetch details of the above scenarios and other similar issues that may be leading to corrupt data.
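For example, for the repeated-timestamp symptom under point 2, I tried counting how many events share exactly the same _time value per sourcetype; the one hour window and the 100 threshold are just guesses on my part: -
index=* earliest=-1h | stats count BY _time, sourcetype | where count > 100 | sort - count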
Thank you
Hi @Taruchit
Data quality is not about corruption of indexed buckets. Once incoming data has been ingested and written to a bucket (indexed), it cannot be modified or changed. Data quality is about configuring Splunk to extract timestamps, break data into events (line breaking), and so on, before the data gets indexed.
There is no quick fix and the best you can do if you have poorly configured ingestion rules is to go back to the start and work through each one to improve them. The Data Quality dashboard can help identify the worst ones, which is always a great place to start. Work through them one at a time.
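If you want to see the raw numbers behind that dashboard, the warnings it is built from live in splunkd.log, so something along these lines should give a similar per-sourcetype overview. Note the data_sourcetype field name is what I believe the dashboard's own searches key on, so verify it matches in your environment:
index=_internal source=*splunkd.log* (component=LineBreakingProcessor OR component=DateParserVerbose OR component=AggregatorMiningProcessor) (log_level=WARN OR log_level=ERROR) | stats count BY component, data_sourcetype | sort - count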
It would be worthwhile reading the Splunk docs on getting data in as a start. Regarding event processing, start here...
https://docs.splunk.com/Documentation/Splunk/9.0.4/Data/Overviewofeventprocessing
If the Splunk data quality is a real mess then consider getting an independent Splunk Consultant involved to assist in fixing things up. There is no one answer for this, I'm afraid.
Hope that helps
Hi @yeahnah,
Thank you for sharing your inputs about data quality.
I need some help to understand what you mean when you say: -
1. Data quality is not about corruption of indexed buckets.
2. Once incoming data has been ingested and written to a bucket (indexed) then it cannot be modified or changed.
As I understand: -
1. The dbinspect command with corruptonly=true gives details of all buckets that are corrupt, along with the reason reported by Splunk.
2. Data comes from the source system to forwarders. Forwarders send the data to indexers. Indexers break the data into events, apply transformation logic, and then index them, storing the events in buckets. Finally, the search head uses the events from the buckets to fetch and return results based on search queries executed by users.
3. In the dbinspect command results, we get details of both corrupt buckets and non-corrupt buckets in Splunk.
Thus, I want your help to understand: -
1. When does a bucket become corrupt, which also impacts the data quality in Splunk?
2. Can you please elaborate on why data quality is not about corruption of indexed buckets, especially when the dbinspect command returns details of corrupt buckets?
It would be very helpful to get your inputs to understand the details at a more granular level.
Thank you
A bucket is not corrupted because of how data is ingested. A bucket becomes corrupted because of a system or application error. Bucket corruption is not a data quality issue and data quality does not corrupt buckets. However, if a bucket cannot be read because it is corrupted then data in that bucket cannot be included in search results.
Data in Splunk is immutable. Once the data is written to disk it cannot be modified in any way until it is deleted.
You asked "When does the bucket becomes corrupt which also impacts the data quality in Splunk?". I'll say it again: bucket corruption and data quality are two different and unrelated concepts. Please do not conflate them. Don't even use them in the same sentence.
Data quality is about how Splunk processes events so they can be written to disk. Bucket corruption is what happens to the files that are used to store the data. One does not cause or affect the other.
You can start with the "Alerts for Admins" app. Also be aware that almost every Splunk PS company does free/low-cost health checks, and you can search through "index=_audit" afterwards. We do Health Checks.
Hi @woodcock,
Thank you for sharing your inputs.
Unfortunately, I did not find the "Alerts for Admins" app in the dropdown list.
Thank you
Hi @Taruchit
It's a 3rd party app that would need to be installed first. It can be downloaded from Splunkbase here
https://splunkbase.splunk.com/app/3796
Hi @yeahnah,
Thank you for sharing that it's a 3rd party app.
However, in my case, I will not be able to download and install any apps, and thus would need to rely on SPL queries and approaches.
Thank you
Hi @Taruchit
You've not described your environment (standalone or distributed), but the best place to start is to take a look at your Splunk management server (assuming distributed env), open its Monitoring Console and look at the Indexing > Inputs > Data quality dashboard.
https://docs.splunk.com/Documentation/Splunk/latest/DMC/Dataquality
Hope this helps
Hi @yeahnah,
Thank you for sharing about the Data Quality dashboard in the Monitoring Console.
I checked the Data Quality dashboard and observed the following: -
1. Event processing issues by Source Type
-> Sourcetype
-> Total issues
-> Host Count
-> Source Count
-> Line Breaking Issues
-> Timestamp Parsing Issues
-> Aggregation issues
-> Metrics Schema issues
-> Metrics issues
Do you know if there is any way to drill down to fetch the details behind each of these counts?
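For example, to drill into the Line Breaking Issues count for one sourcetype, would something like the below be the right direction? I am guessing the data_host, data_source and data_sourcetype field names from the dashboard columns and from your earlier reply, and my_sourcetype is just a placeholder: -
index=_internal source=*splunkd.log* component=LineBreakingProcessor (log_level=WARN OR log_level=ERROR) data_sourcetype=my_sourcetype | stats count BY data_host, data_source, event_message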
Thank you