We use several scheduled reports to ensure that we do not have any duplicate events in our indexes.
Our searches look as follows:
index=ng | eval myUniqueId = index + "_" + _cd + "_" + splunk_server | stats count by myUniqueId | where count > 1
AND
index=ng | streamstats count as DuplicateNumber by _raw | search DuplicateNumber>1
Above searches worked fine until splunk 6.2 and did not find any duplicate events for index ng.
We did upgrade Splunk in the last weeks to version 6.3.1 and last week also to version 6.3.3. For Splunk 6.3, the above searches do not return correct results.
If we run above searches for the last 7 days, then I get duplicates e.g. for 04.02.2016 between 00:00 and 02:00 UTC. However, if I limit the same search to search for duplicates on 04.02.2016 between 00:00 and 02:00 UTC, then no duplicates are returned - this doesn't make any sense.
It get's even more confusing: If we afterwards limit the time window of the search to 02.02.2016 00:00 to 05.02.2016 00:00 (which would include the duplicates, that have been detected at 04.02.2016 between 00:00 and 02:00 UTC) I get now only duplicates on 03.02.2016 between 00:00 and 02:00 - but no duplicates on the 04.02.2016. So it looks like there is something seriously broken.
Also if we explicitly search for an event, that according to above searches in Splunk 6.3 is a duplicate, we only find this event once.
After above results, we stopped Splunk 6.3 and moved the index to a different Splunk instance, that has still version 6.2: Above searches worked fine on Splunk 6.2 again - so we do not assume that the index ng is broken somehow.
Can anybody confirm our observations?
Thank you.
This issue has been solved since splunk 6.4.0.
What I miss is some clear indication on Splunk 6.4.0 release notes, that this issue has been solved.
This issue has been solved since splunk 6.4.0.
What I miss is some clear indication on Splunk 6.4.0 release notes, that this issue has been solved.
have you turned on parallelIngestionPipelines?
can you paste the result for following command
./splunk cmd btool server list --debug | grep parallelIngestionPipelines
Here is the result of ./splunk cmd btool server list --debug | grep parallelIngestionPipelines:
/opt/tst-splunk/etc/system/default/server.conf parallelIngestionPipelines = 1
Thanks for your response. I tried to post yesterday as well but not sure why my comments made it to the forum.
I ran your query on splunk 6.3, time picker set to 01.02.2016 to 08.02.2016 and got 7267 output lines (2 of them below):
_raw | _time | count | max(DuplicateNumber)
2016-02-04T00:00:00.063000+00:00 d11a-f02 process1 [22143] SEV="info" | 2016-02-04T00:00:00.063+0000 | 1 | 2
2016-02-04T00:00:00.136000+00:00 d11a-f02 script2 [22393] SEV="info" | 2016-02-04T00:00:00.136+0000 | 1 | 2
All 7267 output lines are on th 04.02.2016. All output lines have count =1 and max(DuplicateNumber) = 2
It looks to me that for all output events the timestam in _raw equals _time
Afterwards I ran your query once again on splunk 6.2, again with timepicker limited to 01.02.2016 to 08.02.2016. I got 0 output lines.
My setup:
I have one server, which has 2 versions of splunk (6.2.1 and 6.3.3) installed.
Each splunk version has been setup as it's own service, using different tcp ports to avoid conflicts.
Each service has it's own indexer, search head and of course it's own data directory.
I wanted to avoid further changing the index ng - so I disabled all input sources for index ng.
Then I copied the index ng from the splunk 6.2 data directory to the splunk 6.3 data directory (both services 6.2 and 6.3 were stopped before) - so both versions have to deal with the same index.
So what I currently fail to understand is why the splunk 6.3 service returns more than 7000 output lines for above query, when at the same time the splunk 6.2 service returns 0 output lines for the very same query and as said before: Both services/versions use the same index.
Yes, both search heads return the same amount for eventcount.
Strange - if they have the same eventcount then it would be the 6.2 Splunkforwarder that is in error. That is there ARE duplicates on your 6.2 indexer but it cannot see them. You probably did this already but if you run this search:
index=ng [search index=ng | streamstats count as DuplicateNumber by _raw,_time| search DuplicateNumber>1 | stats count max(DuplicateNumber) by _raw,_time | fields _raw,_time]
You get 14534 rows (ie you see all originals and the duplicates? And if you sort by the _time or _raw column you can clearly see the duplicate pairs. But on the 6.2 server you get no results for the same search?
If so then it's probably a good thing that you upgraded.
index=ng [search index=ng | streamstats count as DuplicateNumber by _raw,_time| search DuplicateNumber>1 | stats count max(DuplicateNumber) by _raw,_time | fields _raw,_time]
For this search I do get 0 events - on the instance with splunk 6.2 and 6.3 - so this time both versions of splunk agree for above query that there are 0 events - which is strange.
Right afterwards I ran the query from before once again:
index=ng | streamstats count as DuplicateNumber by _raw,_time| search DuplicateNumber>1 | stats count max(DuplicateNumber) by _raw,_time
btw. the date range for all queries was 01.02.2016 to 08.02.2016.
If splunk 6.2 is wrong in telling me that there are no duplicates, as you assume, then I do not understand why I cannot find the duplicates that splunk 6.3 identifies using streamstats.
I took parameters of the events that splunk 6.3 identified with the query
index=ng | streamstats count as DuplicateNumber by _raw,_time| search DuplicateNumber>1 | stats count max(DuplicateNumber) by _raw,_time
e.g. one of the duplicates contains a unique number in the message part.
If I search for this unique number I would expect to get more than 1 event in return, as the query above clearly states that this is a duplicate event.
I searched for this unique number in index ng - date/time picker was set to All time: I could find only one event - so no duplicate events in this index.
In my opinion the splunk 6.3 streamstats function has a bug and returns wrong results OR there was some new feature introduced in splunk 6.3 that has some side effects like the one with streamstats.
Does | eventcount index=ng
return the same value on both search heads?
Looks like my last comment got deleted - I'm not sure why.
If you run this search
index=ng | streamstats count as DuplicateNumber by _raw,_time| search DuplicateNumber>1 | stats count max(DuplicateNumber) by _raw,_time
Do you get this exact output ? or 7267 lines of this output? Are the times different between events or different? Actually if you could post a sample of the search results thta would make things easier
| _raw | _time |count | max(DuplicateNumber) |
| xxxx | 12:00 | 1 | 2 |
What is the cd field in your first search? Also how are you getting a time range in your results for the first search? I'm trying to replicate your result
Sorry, there was something missing due markup language.
This is the correct search definition:
index=ng | eval myUniqueId = index + "_" + _cd + "_" + splunk_server | stats count by myUniqueId | where count > 1
Additional info can be found here: https://answers.splunk.com/answers/334613/is-there-a-unique-event-id-for-each-event-in-the-i.html
Also how are you getting a time range in your results for the first search? --> Not sure if this answers your question. Time range: you just select a specific time range within the search form in splunk, right next to the search button.
Also: Both searches return events and not time ranges. If events are returned, then this means that the returned events can be found more than once in this index within the selected time window.
By time range I mean you said :
If we run above searches for the last 7 days, then I get duplicates e.g. for 04.02.2016 between 00:00 and 02:00 UTC.
How do you know there are duplicates in that timerange for the first search? All you see is the bucket id and the splunk server
I see your point.
You can lookup for the unique ids, that's what I did in the first place.
I've extended the search from before to include the timestamp next to the unique id:
index=ng | eval time=strftime(_time, "%y-%m-%d %H:%M:%S ") | eval myUniqueId = index + "_" + _cd + "_" + splunk_server + "_" + time | stats count by myUniqueId | where count > 1
OK - and so you see duplicate events for this search
index=ng | eval time=strftime(_time, "%y-%m-%d %H:%M:%S ") | eval myUniqueId = index + "_" + _cd + "_" + splunk_server + "_" + time | stats count by myUniqueId,time | where count > 1
But if you change the time picker to that range you see no events? My first thought would be your splunk user has their timezone set to a different time than the _time field for the event. So you see the events when searching a large range, but if you restrict your time picker range you wont see those events as they'll actually be outside it
I thought about different time zones as well - but I doubt that this is the root cause: We use UTC everywhere (client, server, logs).
Still, the duplicates found by this search do not make a lot of sense.
I tried some additional searches, but used now the secondary search, as it visualizes the issue better:
index=ng | streamstats count as DuplicateNumber by _raw | search DuplicateNumber>1
Here are the results and the time picker settings for each run:
- Time picker set for a date range between 01.02.2016 and 08.02.2016: 7281 duplicate events on 04.02.2016 between 00:00 and 02:00
- Time picker set for a date range between 02.02.2016 and 06.02.2016: 7281 duplicate events on 03.02.2016 between 00:00 and 02:00
- Time picker set for a date range between 03.02.2016 and 05.02.2016: 7276 duplicate events on 04.02.2016 between 00:00 and 02:00
- Time picker set for a date range between 04.02.2016 and 04.02.2016: 0 duplicate events
- Time picker set for a date range between 03.02.2016 and 03.02.2016: 0 duplicate events
- Time picker set for a date range between 05.02.2016 and 05.02.2016: 0 duplicate events
- Time picker set for a date range between 04.02.2016 and 06.02.2016: 0 duplicate events
- Time picker set for a date range between 03.02.2016 and 07.02.2016: 7276 duplicate events on 04.02.2016 between 00:00 and 02:00
Well, if the events they duplicate are outside that time range then they will no longer be duplicates - that might be why the count drops to zero
Not sure If I understand correctly. In my opinion duplicates need to be in the same time window - otherwise they are not duplicates at all.
After above tests I copied the index ng to a different splunk instance running splunk 6.2.
I worked through all time picker settings as above and the results for each query was 0.
So there are no duplicates at all in this index.
Is there some new "feature" in splunk 6.3 that causes this strange behavior or is there a general issue with the stats functions?
Not neccessarily - your search only considers the raw field - it is possible that your events are not having their time field extracted correctly and so the _time field is actually when the event was indexed.
If you run this search between 01.02.2016 and 08.02.2016 what do you see?
index=ng | streamstats count as DuplicateNumber by _raw,_time| search DuplicateNumber>1 | stats count max(DuplicateNumber) by _raw,_time
I ran your query with the specified time range on splunk 6.3 and got:
7267 events returned
count is for all returned events 1
max(DuplicateNumber) is for all returned events 2
I ran the same query on the same index with splunk 6.2:
0 events returned
jplumsdaine22 wrote:
When you say you ran the same query on 6.2, you have a 6.2 search head or this is a completely separate indexer?
When you run the search is the output you get?
| _raw | _time |count | max(DuplicateNumber) |
| xxxx | 12:00 | 1 | 2 |
My response:
I have one PC, where I have installed splunk 6.2 side by side with splunk 6.3. 2 services of splunk with different versions, having each of them their own search head, indexer etc. - also using different network ports. All input configurations have been removed so that the index ng does not change anymore.
Each splunk service has it's own data directory and so I copied the index ng from the splunk 6.2 service data directory to the splunk 6.3 data directory.
This way I am able to quickly compare the results between splunk 6.2 and 6.3.
The results look exactly as you expected:
| _raw | _time |count | max(DuplicateNumber) |
| xxxx | 12:00 | 1 | 2 |