We have a new sourcetype that's using the AWS Add-on to grab data from S3 (SQS-based). Whenever we run stats count, timechart, or a similar statistical command, we get counts that are 2-4x the actual number of events. For instance, if we do a stats count by a unique ID field, most of the IDs return a count of 2, but when we drill down, there's only one matching record. When we do a "top", the percentages shown add up to well over 100%.
We've confirmed that there are no duplicates in the source data. None of our other sources seem to have this problem.
Also of note: When we set up data model acceleration and use tstats, the numbers are correct.
The unique ID field (and possibly all others) is most likely multi-value, with the value contained twice. I'm guessing that's JSON data, and you have both INDEXED_EXTRACTIONS = json and KV_MODE = json set? That would cause this behaviour.
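As a sketch, the conflicting configuration would look something like this in props.conf (the sourcetype name here is made up; use your own):

```
# props.conf -- hypothetical stanza name
# With BOTH of these set, each JSON field is extracted at index time
# AND again at search time, so every field becomes multi-value with
# the same value twice -- which inflates stats/timechart/top counts.
[aws:s3:myjson]
INDEXED_EXTRACTIONS = json
KV_MODE = json
```

This also fits the tstats observation in the question: tstats over the accelerated data model reads the indexed copy only, so it doesn't see the search-time duplicate.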
Usually you'll want to keep the indexed extractions and turn off the search-time extraction that's duplicating the fields.
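A sketch of that fix, again with a hypothetical sourcetype name:

```
# props.conf -- keep index-time extraction, disable the search-time pass
[aws:s3:myjson]
INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false
```

(The alternative, as the original poster ended up doing, is the reverse: INDEXED_EXTRACTIONS = none with search-time JSON extraction left on. Either way, only one of the two extraction passes should be active.)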
I just set INDEXED_EXTRACTIONS=none and that looks like it solved it, and it's still recognized as JSON. Thanks for putting me on the right path!
It is JSON data. We do have INDEXED_EXTRACTIONS=json, but we don't have KV_MODE=json set. We did have AUTO_KV_JSON=true, but I removed that and it didn't seem to make a difference.
You are on the right track with the multi-value fields, though. I piped my search through | table id and the values were duplicated in the output.
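For anyone hitting the same symptom, a quick way to confirm the duplication before touching props.conf is to count the values per event (field name `id` as in the example above; substitute your own sourcetype and field):

```
sourcetype=aws:s3:myjson
| eval value_count = mvcount(id)
| stats count by value_count
```

If the double-extraction problem is present, most events will show a value_count of 2 rather than 1.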