Getting Data In

Why Was My Event Indexed Twice? Same Host, Same File, Same _raw, Same Indextime, different splunk_server

Builder

Hi All;

Noticed something very interesting and I don't seem to find the smoking gun. Today I was alerted by one of my users that he saw some duplicate events, more recently today at 2:20pm. I took a look, and sure enough I was able to find that an event was duplicated. I narrowed the result down to the following search:

index=app_harmony sourcetype=harmony:\*:access "[04/Jan/2017:14:20*" host=cimasked0047 | eval idxtime=_indextime | table _time idxtime host source splunk_server _raw

The results were as follows:

_time   idxtime host    source  splunk_server   _raw
2017-01-04 14:20:02 1483557602  hostname    /usr/local/openresty/nginx/logs/access.log  splunkindex0006 xx.xx.xxx.xxx - - [04/Jan/2017:14:20:02 -0500] "GET /api/harmony/v1/User/G_IAS_6e680df53be6a06f9e11faf40812dc8c?domain=internal HTTP/1.1" 200 317 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36" "-"

2017-01-04 14:20:02 1483557602  hostname    /usr/local/openresty/nginx/logs/access.log  splunkindex0009 xx.xx.xx.xxx - - [04/Jan/2017:14:20:02 -0500] "GET /api/harmony/v1/User/G_IAS_6e680df53be6a06f9e11faf40812dc8c?domain=internal HTTP/1.1" 200 317 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36" "-"

The raw event is exactly the same, as is the index time. The only difference between the two events is our indexer. We have a cluster of 5 indexers with a RF of 3.

I double checked the file, and there isn't more than 1 event at that time.

I also verified that I dont see any random '.filepart' or temp files or anything appearing

the input for that particular file is pretty strict and as follows:

[monitor:///usr/local/openresty/nginx/logs/access.log]
index = app_harmony
sourcetype = harmony:openresty:access

I took a look at the internal logs, and didn't see any errors around that time.

1 Solution

SplunkTrust
SplunkTrust

Actually, you have an unnecessary second tcpout stanza in there. I don't know if that's the problem, but you should remove one tcpout stanza.

Do you use a master server? If so, change your tcpout stanzas to something like this (and the others settings you'd like to have):

[tcpout]
defaultGroup = companyPVSNewIndexers
indexAndForward = false

[tcpout:companyPVSNewIndexer]
indexerDiscovery = YourIndexerCluster
//additional settings for SSL required when enabled

[indexer_discovery:YourIndexerCluster]
pass4SymmKey = YourKey    //defined on master
master_uri = https://YourMasterServer:8089

If you don't use a master (I don't think this is recommended), just merge both tcpout stanzas into one. Not sure about the autoLB settings though.

Edit: typo

View solution in original post

SplunkTrust
SplunkTrust

Actually, you have an unnecessary second tcpout stanza in there. I don't know if that's the problem, but you should remove one tcpout stanza.

Do you use a master server? If so, change your tcpout stanzas to something like this (and the others settings you'd like to have):

[tcpout]
defaultGroup = companyPVSNewIndexers
indexAndForward = false

[tcpout:companyPVSNewIndexer]
indexerDiscovery = YourIndexerCluster
//additional settings for SSL required when enabled

[indexer_discovery:YourIndexerCluster]
pass4SymmKey = YourKey    //defined on master
master_uri = https://YourMasterServer:8089

If you don't use a master (I don't think this is recommended), just merge both tcpout stanzas into one. Not sure about the autoLB settings though.

Edit: typo

View solution in original post

Builder

Oh very interesting! We do have a cluster master. We actually had a consultant set our outputs.conf up that way. I wonder why he did it that way instead of relying on the CM to delegate the indexers

0 Karma

Builder

I think that was the issue, the duplicate stanza! Thanks @skalliger for pointing that one out! Sent a note to the admins to promote as the correct answer.

0 Karma

SplunkTrust
SplunkTrust

Glad it helped you! Just wondering why you reported my comment though, haha.

0 Karma

Builder

Because you hurt my feelings with your answer. Lol no i reported it as 'other' to get the mods attention to promote it to an 'accepted answer'

0 Karma

SplunkTrust
SplunkTrust

Ah, alright. I get way too many Splunk Answers notifications, haha. I converted it. 🙂

0 Karma

Builder

You rock, thanks again.

SplunkTrust
SplunkTrust

I've had a number of issues the the indexer discovery and I prefer not to use it now!

Your duplicate tcpout stanza might be an issue though...that's an easy fix.

0 Karma

Builder

FYI I have tried running the forwarder with useACK=false with no success. Any other ideas would be greatly appreciated as this is still happening for the index 😞

0 Karma

SplunkTrust
SplunkTrust

So just to confirm, same source, sourcetype, index and host is producing duplicate log entries?

If you count the number of lines per hour or similar, does that add up to more than the number of lines within the source file on the server ?

If it is consistently duplicated then it would be worth checking the configuration of your outputs.conf ...

0 Karma

Builder

So here is what I got:

In Splunk-> index=app_harmony source=/usr/local/openresty/nginx/logs/access.log "13/Jan/2017:08:"
events = 4678

Grepping the actual log file : less access.log | grep -o '13/Jan/2017:08' | wc -l
lines = 1080

0 Karma

SplunkTrust
SplunkTrust

If you run a dedup in Splunk so:
index=app_harmony source=/usr/local/openresty/nginx/logs/access.log "13/Jan/2017:08:" | dedup

Do you see much less events ? The numbers are strange you have a lot more than double the number of events in Splunk...also I notice you did not narrow down to a host, are there multiple hosts that this log comes from or just one?

0 Karma

Builder

Ah Jeez!! You hare 100% correct, sorry, retried:

[splkadmn@cmasked0083 logs]$ less access.log | grep "13/Jan/2017:08:" | wc -l
1140

index=app_harmony host=cmasked0083 source=/usr/local/openresty/nginx/logs/access.log "13/Jan/2017:08:"
1,959 events

index=app_harmony host=cmasked0083 source=/usr/local/openresty/nginx/logs/access.log "13/Jan/2017:08:" | dedup _raw
1,140 events

0 Karma

SplunkTrust
SplunkTrust

Interesting, during the time period where you have the duplicates, are there any warning/error messages from the splunkd? crc checksum or duplication errors in particular...

0 Karma

Builder

I wish there were, but nothing in the splunkd on those forwarders 😞

0 Karma

Splunk Employee
Splunk Employee

Does the forwarder have a useACK=true in the outputs.conf?

One of the known side effects of this setting can be data duplication in rare cases.

Search for this message in the _internal splunkd.log on the forwarder:

WARN TcpOutputProc - Possible duplication of events with

0 Karma

Builder

I took another look and did notice that I do uave 'useACK' as true. This is a default on all of our forwarders:

[tcpout]
defaultGroup = companyPVSNewIndexers
# These two options below are required for forwarders when clustering.
# Max queue size ensures that the forwarder has enough of a buffer while
# waiting for the ACK from the indexer; without useACK, the search head
# will spout yellow warning banners in a clustered environment.
maxQueueSize = 7MB
useACK = true

Should I try running without useACK for this one forwarder?

0 Karma

SplunkTrust
SplunkTrust

maybe it's worth checking the setting for the servers in outputs.conf as well, because if they are wrong this can end in duplicated events as well.

0 Karma

Builder

So here is the whole outputs file. To save space on this post I have removed all the settings that have been commented out.

Company BASE SETTINGS

[tcpout]
defaultGroup = companyPVSNewIndexers
# These two options below are required for forwarders when clustering.
# Max queue size ensures that the forwarder has enough of a buffer while
# waiting for the ACK from the indexer; without useACK, the search head
# will spout yellow warning banners in a clustered environment.
maxQueueSize = 7MB
useACK = true

[tcpout]
defaultGroup = companyPVSNewIndexers
indexAndForward = false

# When indexing a large continuous file that grows very large, a universal
# or light forwarder may become "stuck" on one indexer, trying to reach
# EOF before being able to switch to another indexer. The symptoms of this
# are congestion on *one* indexer in the pool while others seem idle, and
# possibly uneven loading of the disk usage for the target index.
# In this instance, forceTimebasedAutoLB can help!
# ** Do not enable if you have events > 64kB **
forceTimebasedAutoLB = true

# Correct an issue with the default outputs.conf for the Universal Forwarder
# or the SplunkLightForwarder app; these don't forward _internal events.
forwardedindex.2.whitelist = (_audit|_introspection|_internal)

[tcpout:companyPVSNewIndexers]
server = masked6.com:9997, masked7.com:9997, masked8.com:9997, masked9.com:9997, masked10.com:9997
autoLB = true
autoLBFrequency = 40

I checked the servers, and they are correct

0 Karma

Builder

Thanks for the quick response. I checked for that yesterday around the time of the duplication, and didn't see anything. Also tried a week long search to verify.

0 Karma