I am having a VERY strange problem with my summary indexing. I have the following search running every hour at 20 minutes past the however, doing a summary of -1h@h to @h
index="app_silayer7" "*~HTTP://*" NOT "*~ERR~*" |dedup _raw|bucket _time span=5m
|rex field=_raw "^[^~\n]*~(?P<Domain>\w+)"
|rex field=_raw "^(?:[^~\n]*~){2}(?P<Service>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){3}(?P<Operation>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){4}(?P<NameSpace>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){5}(?P<Consumer>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){6}(?P<BackendTime>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){7}(?P<TotalTime>[^~]+)"
|rex field=NameSpace "(?<Version>[^\/])(|\/)$"
|eval Version="V".Version
|stats count as Count, avg(TotalTime) as TotalTime, min(TotalTime) as MinTime, max(TotalTime) as MaxTime, stdev(TotalTime) as STDDEV, perc95(TotalTime) as 95_Percentile ,by _time, Consumer, Domain, Service, Operation, Version
Every once in a while however, I get some duplicate events. Some hours there are no duplicates, but some hours there are. The interesting thing is, on the hours that I see duplicates, I review the job results in the job inspector, and the results look clean! Has anyone run into this issue before? The raw events don't have any duplication in them, but it almost seems like when Splunk is stashing the results in my summary index, it hiccups and adds a few extra duplicate rows.
For example, here are my results from the 9:00 hour:
Service DetailCount SummCount delta
Processor 36 72 36
Profile 55 110 55
ProfileAnd 185 370 185
It is exactly doubling the rows that were entered.
... View more