Getting Data In

Why am I seeing data duplication in my summary index?

Builder

I am having a VERY strange problem with my summary indexing. I have the following search running every hour at 20 minutes past the hour, summarizing -1h@h to @h:

index="app_silayer7" "*~HTTP://*" NOT "*~ERR~*" |dedup _raw|bucket _time span=5m
|rex field=_raw "^[^~\n]*~(?P<Domain>\w+)" 
|rex field=_raw "^(?:[^~\n]*~){2}(?P<Service>[^~]+)" 
|rex field=_raw "^(?:[^~\n]*~){3}(?P<Operation>[^~]+)" 
|rex field=_raw "^(?:[^~\n]*~){4}(?P<NameSpace>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){5}(?P<Consumer>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){6}(?P<BackendTime>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){7}(?P<TotalTime>[^~]+)"
|rex field=NameSpace "(?<Version>[^\/])(|\/)$"
|eval Version="V".Version
|stats count as Count, avg(TotalTime) as TotalTime, min(TotalTime) as MinTime, max(TotalTime) as MaxTime, stdev(TotalTime) as STDDEV, perc95(TotalTime) as 95_Percentile by _time, Consumer, Domain, Service, Operation, Version
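To make it easier to see what the rex commands above pull out of each event, here is a minimal Python sketch that applies the same patterns to one line. The sample event and the `extract_fields` helper are hypothetical (the thread never shows a raw event), and the Version pattern is rewritten with `(?P<...>)` because Python's `re` does not accept the `(?<...>)` spelling.

```python
import re

# Patterns copied from the rex commands above. Each one skips N fields that
# end in "~" and captures the field that follows.
FIELD_PATTERNS = [
    r"^[^~\n]*~(?P<Domain>\w+)",
    r"^(?:[^~\n]*~){2}(?P<Service>[^~]+)",
    r"^(?:[^~\n]*~){3}(?P<Operation>[^~]+)",
    r"^(?:[^~\n]*~){4}(?P<NameSpace>[^~]+)",
    r"^(?:[^~\n]*~){5}(?P<Consumer>[^~]+)",
    r"^(?:[^~\n]*~){6}(?P<BackendTime>[^~]+)",
    r"^(?:[^~\n]*~){7}(?P<TotalTime>[^~]+)",
]

# Last non-slash character of NameSpace, allowing one optional trailing slash.
VERSION_PATTERN = r"(?P<Version>[^/])(|/)$"

# Hypothetical event: the real log layout is an assumption.
SAMPLE_EVENT = (
    "2018-03-01 09:00:01~ORDERS~OrderService~getOrder~"
    "http://example.com/orders/v2/~PortalApp~120~135~HTTP://backend"
)


def extract_fields(raw):
    """Apply every pattern to the raw event and collect the named groups."""
    fields = {}
    for pattern in FIELD_PATTERNS:
        match = re.search(pattern, raw)
        if match:
            fields.update(match.groupdict())
    version = re.search(VERSION_PATTERN, fields.get("NameSpace", ""))
    if version:
        # Mirrors: eval Version="V".Version
        fields["Version"] = "V" + version.group("Version")
    return fields


print(extract_fields(SAMPLE_EVENT))
```

Note that Version is a single character, so a namespace ending in a two-digit version would only keep its last digit.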

Every once in a while however, I get some duplicate events. Some hours there are no duplicates, but some hours there are. The interesting thing is, on the hours that I see duplicates, I review the job results in the job inspector, and the results look clean! Has anyone run into this issue before? The raw events don't have any duplication in them, but it almost seems like when Splunk is stashing the results in my summary index, it hiccups and adds a few extra duplicate rows.

For example, here are my results from the 9:00 hour:

Service     DetailCount  SummCount  delta
Processor   36           72         36
Profile     55           110        55
ProfileAnd  185          370        185

It is exactly doubling the rows that were entered.
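As a quick check of the arithmetic, this small Python sketch computes the summary-to-detail ratio for the three services in the 9:00 table. The counts are copied from the table; the `duplication_ratio` helper name is just for illustration.

```python
# (DetailCount, SummCount) pairs copied from the 9:00 table above.
COUNTS = {
    "Processor": (36, 72),
    "Profile": (55, 110),
    "ProfileAnd": (185, 370),
}


def duplication_ratio(detail, summary):
    """How many times each detail row appears in the summary, on average."""
    return summary / detail


ratios = {svc: duplication_ratio(d, s) for svc, (d, s) in COUNTS.items()}
print(ratios)  # {'Processor': 2.0, 'Profile': 2.0, 'ProfileAnd': 2.0}
```

A ratio of exactly 2.0 for every service is consistent with the whole result set being written twice, rather than a few stray rows being repeated.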

Explorer

I have exactly the same issue. We were running Splunk Enterprise in a Rackspace environment and did a lift and shift to AWS. Since the lift and shift to AWS we have had this issue. In AWS we are still using Splunk Enterprise. We also upgraded to Splunk 7.0.2; this did not solve our problem. We have a search head cluster of 3 servers with 5 indexers. A single search populates our summary index. Some entries are duplicated and some are not. No idea why. Looking in the summary index I can see that I have multiple splunk_server records. In the past I always had 1 splunk_server for my result set.

Builder

Hmm, try changing the "count" field: give it a different field name. I had this issue once; I changed "count" to "events" or something and it stopped duplicating.

Builder

Interesting. We will give it a shot. Also testing the same summary indexing using 'collect' vs the summary indexing built into scheduled reporting to see if it makes any difference. (@cjmckenna)

New Member

Changed "Count" to "Events" and it still created dupes.

What is interesting is that we set something up to see what could be causing the issue.

We still have our original scheduled search using the built-in summary indexing function, as well as a second saved search using |collect that goes to a different index. Both are using the EXACT SAME search.

For the 9:00 hour today, the original one had dupes but the other one did not. SAME SEARCH. This is driving me nuts.

SplunkTrust

I've seen that kind of thing. Splunk stats-type commands occasionally get confused between an incoming field called count and the count they are doing themselves.

Builder

The interesting thing is that all other values are correct; the only value that is 'doubled' is the "Count" value that is inserted into the summary index. It is as if the "count" command has a glitch that doubles the value before it is inserted into the summary index.

Problem is that it doesn't happen every time.

And just to be sure, we checked the summary index, and for each of the hours, the data is coming from a single search head.

Builder

What search do you use to report on your summary index data?

Builder

Main Index
index="app_silayer7" "*~HTTP://*" NOT "*~ERR~*" |dedup _raw|bucket _time span=5m
|rex field=_raw "^[^~\n]*~(?P<Domain>\w+)"
|rex field=_raw "^(?:[^~\n]*~){2}(?P<Service>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){3}(?P<Operation>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){4}(?P<NameSpace>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){5}(?P<Consumer>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){6}(?P<BackendTime>[^~]+)"
|rex field=_raw "^(?:[^~\n]*~){7}(?P<TotalTime>[^~]+)"
|rex field=NameSpace "(?<Version>[^\/])(|\/)$"
|eval Version="V".Version
|stats count as Count by _time, Consumer, Domain, Service, Operation, Version

Summary Index
search index="summary_eu" | stats sum(Count) as Count_summary by _time, Consumer, Domain, Service, Operation, Version

This is how we can see that the counts are duplicated.

New Member

So, I work with Paimon and we have been trying to unravel this issue for a few days now.
Here is exactly what is happening:
Every hour at 20 minutes past the hour the query in the original post executes and summarizes our data from our original index for the previous hour using -1h@h to @h. The output from that scheduled job NEVER has duplicates in it.

What we see in the summary index is that sometimes... not every hour, Splunk decides to duplicate the inserts to the summary index.

Example - Service ABC gets summarized for the hour in 5 minute intervals. When the job runs the output produces 12 "rows" to be inserted in the summary. This is correct at this point. The trouble is that Splunk then takes those 12 entries and puts them in the summary twice so in the summary I get 24 rows.

When you look at the raw events in the summary, it's clear that every event from the scheduled query was inserted twice in the summary.
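The 12-rows-become-24 pattern can be illustrated with a short Python sketch that groups rows by their key and flags any key seen more than once. The date, the row layout, and the `find_duplicates` helper are hypothetical; only the service name "ABC", the 5-minute buckets, and the doubling come from the description above.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical job output: one row per 5-minute bucket for service "ABC".
hour_start = datetime(2018, 3, 1, 8, 0)
job_output = [(hour_start + timedelta(minutes=5 * i), "ABC") for i in range(12)]

# Simulate the failure described above: every row lands in the summary twice.
summary_rows = job_output + job_output


def find_duplicates(rows):
    """Return {row_key: times_seen} for every key that appears more than once."""
    seen = Counter(rows)
    return {key: n for key, n in seen.items() if n > 1}


duplicates = find_duplicates(summary_rows)
print(len(job_output), len(summary_rows), len(duplicates))  # 12 24 12
```

Running the same grouping on the job output alone returns an empty result, which matches the observation that the scheduled job's own results are clean.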

Again, this is NOT happening every hour. It's a crapshoot when Splunk decides to double the entries.

I know folks keep asking for queries, but the duplication is seen even by just looking at the raw events in the summary and not using a query.

SplunkTrust

Are you sure those at 9:00 are doubled? I see counts like 55 and 185...

Builder

Yep. DetailCount is from the actual source index, and SummCount is the count within the summary index.

New Member

Each of the rows that Paimon posted is for a different service. The detail count (DetailCount) is from the original scheduled job that produced the data to be summarized, and the summary count (SummCount) is the count after the data was put in the summary index. The delta is the difference between the two, showing that the summary is EXACTLY duplicating the detail.

SplunkTrust

What happens when you run the duplicate-producing query an hour later using -2h@h and -1h@h, or some later time using the appropriate offset? Does it produce duplicates consistently for that time frame?
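To spell out why those offsets cover the same events, here is a rough Python sketch of the time arithmetic. It assumes "@h" means "round down to the top of the hour" after the offset is applied; the date and the `window` helper are hypothetical.

```python
from datetime import datetime, timedelta


def snap_to_hour(ts):
    """Round a timestamp down to the top of its hour (the "@h" part)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def window(run_time, earliest_hours_back, latest_hours_back):
    """Return (earliest, latest) for -<n>h@h style bounds relative to run_time."""
    earliest = snap_to_hour(run_time - timedelta(hours=earliest_hours_back))
    latest = snap_to_hour(run_time - timedelta(hours=latest_hours_back))
    return earliest, latest


# Scheduled run at 9:20 with -1h@h to @h.
scheduled = window(datetime(2018, 3, 1, 9, 20), 1, 0)
# Manual re-run at 10:20 with -2h@h to -1h@h.
rerun = window(datetime(2018, 3, 1, 10, 20), 2, 1)

print(scheduled == rerun)  # True: both cover 08:00 to 09:00
```

Because both runs read the same window, any difference in what ends up in the summary would point at the write path rather than at the search results.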

Builder

So are you saying we should clear out our summary index (no big deal) and force our query to produce results for 8am-9am each time?

My assumption is that you are trying to take time out of the equation and see whether the exact same result set behaves differently over time?

To rehash: the actual result set never has duplicates, but it seems that somewhere on the path between 'I got my results' and 'I'm putting my results in the index', some duplication occurs.

Also FYI, different hours were 'duplicated' in the summary index yesterday, just in case we were thinking something strange was happening with time parsing.

SplunkTrust

Suppose you ran the report at 7:20 for all events from 6:00 to 7:00 and got a duplicated count.

If you run it AGAIN at, say, 4:00, for all events from 6:00 to 7:00, do you STILL get a duplicated count, or is it correct?
