Hi
I have the following configuration in inputs.conf:
[monitor:///<directory>]
index=results
crcSalt = <SOURCE>
sourcetype = results
My intent was to index the data according to its location. However, the following search shows duplicates for the same source (location):
... | stats count by source
How can I fix this problem?
Output:
source: count
<directory>/filename1 2
<directory>/filename2 2
<directory>/filename3 2
<directory>/filename4 2
Edit:
There is a workaround, but it is undesirable because the duplicate data is still in the index.
Workaround:
... | dedup source
Find any outputs.conf files on your server (which, BTW, is a forwarder) and show us what is inside them (and where they are). For example, if you have two indexers and have configured the forwarder to send the same events to each indexer separately, that would cause this problem. You can get more insight by changing your test search to this:
... | stats dc(splunk_server) count by source
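The reasoning behind this diagnostic can be sketched in Python. This is a hypothetical re-implementation of stats dc(splunk_server) count by source over made-up events (the indexer names idx1/idx2 and the /data/results paths are assumptions), not anything Splunk ships. The idea: if copies of a source sit on more than one indexer, output cloning is a likely suspect; if they all sit on one indexer, the file was more likely read twice.

```python
from collections import defaultdict


def dc_server_count_by_source(events):
    """Rough equivalent of `stats dc(splunk_server) count by source`."""
    servers = defaultdict(set)  # source -> distinct indexers holding it
    counts = defaultdict(int)   # source -> number of events
    for event in events:
        servers[event["source"]].add(event["splunk_server"])
        counts[event["source"]] += 1
    return {src: (len(servers[src]), counts[src]) for src in counts}


def likely_cause(dc_servers, count):
    """Heuristic reading of one result row; a hint, not a verdict."""
    if count <= 1:
        return "no duplicates"
    if dc_servers > 1:
        return "copies on different indexers: check outputs.conf for cloning"
    return "copies on one indexer: the input was probably read twice"


# Hypothetical events: filename1 cloned to two indexers, filename2 re-read on one indexer.
events = [
    {"source": "/data/results/filename1", "splunk_server": "idx1"},
    {"source": "/data/results/filename1", "splunk_server": "idx2"},
    {"source": "/data/results/filename2", "splunk_server": "idx1"},
    {"source": "/data/results/filename2", "splunk_server": "idx1"},
]

for source, (dc_servers, count) in sorted(dc_server_count_by_source(events).items()):
    print(source, dc_servers, count, likely_cause(dc_servers, count))
```

A row with dc(splunk_server) = 1 and count = 2 falls into the second case in this sketch.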
I have only four outputs.conf files:
find ./ -name "outputs.conf"
/etc/modules/distributedDeployment/classes/deployable/outputs.conf
/etc/system/default/outputs.conf
/etc/apps/SplunkLightForwarder/default/outputs.conf
/etc/apps/SplunkForwarder/default/outputs.conf
File at .../classes/deployable:
[tcpout]
disabled=false
# Replace 'YourDeploymentServerHostname' with the ip-address where your deployment server is running.
[tcpout:RouteMetricsToDeploymentServer]
disabled=false
server=YourDeploymentServerHostname:9997
File at /SplunkForwarder/default:
[tcpout]
maxQueueSize = 500kb
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_introspection)
forwardedindex.filter.disable = false
File at /SplunkLightForwarder/default:
[tcpout]
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_introspection)
forwardedindex.filter.disable = false
File at .../system/default:
[tcpout]
maxQueueSize = auto
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_internal|_introspection)
forwardedindex.filter.disable = false
indexAndForward = false
autoLBFrequency = 30
blockOnCloning = true
compressed = false
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
heartbeatFrequency = 30
maxFailuresPerInterval = 2
secsInFailureInterval = 1
maxConnectionsPerIndexer = 2
forceTimebasedAutoLb = false
sendCookedData = true
connectionTimeout = 20
readTimeout = 300
writeTimeout = 300
tcpSendBufSz = 0
ackTimeoutOnShutdown = 30
useACK = false
blockWarnThreshold = 100
sslQuietShutdown = false
[syslog]
type = udp
priority = <13>
dropEventsOnQueueFull = -1
maxEventSize = 1024
... | stats dc(splunk_server) count by source
Output:
source: dc(splunk_server) count
<directory>/filename1 1 2
<directory>/filename2 1 2
<directory>/filename3 1 2
<directory>/filename4 1 2
All dc(splunk_server) values are 1, and I haven't changed any of those outputs.conf files.
Do you need to include crcSalt = <SOURCE>? Best practice is to use it only when needed and not leave it set.
Was it always there, or did you add it?
That is likely what is causing the data to be re-indexed when the file name stays the same.
Try:
your search | eval indextime=strftime(_indextime, "%Y-%m-%d %H:%M:%S") | stats count by source, indextime
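A rough Python equivalent of this search shows why it helps: a source that appears under two different index times was ingested twice. The events and their epoch values are hypothetical, and the sketch formats times in UTC, whereas Splunk's strftime renders them in the searching user's time zone.

```python
from collections import Counter
from datetime import datetime, timezone


def count_by_source_and_indextime(events):
    """Rough equivalent of: eval indextime=strftime(_indextime, ...) | stats count by source, indextime."""
    rows = Counter()
    for event in events:
        # Assumption: format in UTC so the result does not depend on the local time zone.
        stamp = datetime.fromtimestamp(event["_indextime"], tz=timezone.utc)
        rows[(event["source"], stamp.strftime("%Y-%m-%d %H:%M:%S"))] += 1
    return rows


def reindexed_sources(rows):
    """Sources that show up under more than one distinct index time."""
    times_per_source = Counter(source for source, _ in rows)
    return sorted(src for src, n in times_per_source.items() if n > 1)


# Hypothetical events: filename1 ingested twice, filename2 once.
events = [
    {"source": "/data/results/filename1", "_indextime": 1444834094},
    {"source": "/data/results/filename1", "_indextime": 1444991245},
    {"source": "/data/results/filename2", "_indextime": 1444834094},
]

rows = count_by_source_and_indextime(events)
for (source, indextime), count in sorted(rows.items()):
    print(source, indextime, count)
print(reindexed_sources(rows))
```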
Hi
I included crcSalt because all the files are very similar, and if Splunk thinks two files are the same, the later one will not be indexed. crcSalt ensures that every file with a different source (location) is indexed. Also, if I disable crcSalt, new files added to the directory will not be indexed.
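Why near-identical files collide, and why crcSalt = <SOURCE> separates them, can be shown with a simplified Python model. It assumes the checksum covers only the first 256 bytes of a file (chosen to mirror the default initCrcLength) plus the salt string; Splunk's real fingerprinting differs in detail, and the file paths and contents are hypothetical.

```python
import zlib

HEAD_BYTES = 256  # assumption: only the first 256 bytes of a file feed the checksum


def file_crc(content: bytes, salt: str = "") -> int:
    """Simplified model: CRC32 over the file head plus an optional salt string."""
    return zlib.crc32(content[:HEAD_BYTES] + salt.encode("utf-8"))


# Two hypothetical result files whose first 300 bytes are identical.
shared_header = b"#" * 300
files = {
    "/data/results/filename1": shared_header + b"run 1 values\n",
    "/data/results/filename2": shared_header + b"run 2 values\n",
}

# No salt: identical heads give identical checksums, so filename2 looks already seen.
unsalted = {path: file_crc(body) for path, body in files.items()}
# Path as salt (the idea behind crcSalt = <SOURCE>): the checksums now differ per file.
salted = {path: file_crc(body, salt=path) for path, body in files.items()}

print(len(set(unsalted.values())), len(set(salted.values())))
```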
Output of your search:
source: indextime count
<directory>/filename1 2015-10-14 14:48:14 1
<directory>/filename1 2015-10-16 10:27:25 1
<directory>/filename2 2015-10-14 14:48:14 1
<directory>/filename2 2015-10-16 10:27:25 1
The output shows that those files were indexed a second time on 2015-10-16, which caused the problem. I remember that on that day I added the crcSalt setting, because I hadn't been able to index all the files due to their similarity. Once I added the setting, all files were indexed. It looks like Splunk re-indexed every file, even those that had already been indexed with the same source value.
This means that after the crcSalt setting in inputs.conf is changed, Splunk ignores what it has already indexed and reads the files again. Thanks for your help. Now, how could I solve this issue?
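The re-indexing follows from the same kind of simplified model: the record of already-read files is keyed by checksum, and adding a salt changes every checksum, so none of the old records match. In this Python sketch a plain set is a hypothetical stand-in for Splunk's fishbucket, the checksum is assumed to cover a 256-byte head plus the salt, and the files are made up; it is not Splunk's implementation.

```python
import zlib
from collections import Counter

HEAD_BYTES = 256  # assumption: only the first 256 bytes feed the checksum


def file_crc(content: bytes, salt: str = "") -> int:
    """Simplified model: CRC32 over the file head plus an optional salt string."""
    return zlib.crc32(content[:HEAD_BYTES] + salt.encode("utf-8"))


def scan(files, seen, use_salt):
    """Return the paths that would be indexed, remembering their checksums in `seen`."""
    indexed = []
    for path, body in files.items():
        crc = file_crc(body, salt=path if use_salt else "")
        if crc not in seen:  # unknown checksum: treated as a new file
            seen.add(crc)
            indexed.append(path)
    return indexed


# Hypothetical files that already differ within their first 256 bytes.
files = {f"/data/results/filename{i}": f"results\nrun {i}\n".encode() for i in range(1, 5)}

seen = set()  # hypothetical stand-in for the fishbucket
day1 = scan(files, seen, use_salt=False)  # before crcSalt was added
day2 = scan(files, seen, use_salt=True)   # after adding crcSalt = <SOURCE>

print(Counter(day1 + day2))
```

In this model every path ends up indexed twice, matching the count of 2 per source in the results above.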
Hi edrivera3, some possible explanations:
Let me know if this helps!
Are you saying that ... | stats count by source shows more than one row with the same value for source? That is practically impossible, given how stats works. So if that is what you're seeing, I suspect there is some tiny difference between them, possibly as small as a trailing space character on one of them. Can you click each one to drill down and see which search terms it yields?
Well, it is possible. The search returns events with the same source (location).
The output:
source: count
<directory>/filename1 2
<directory>/filename2 2
<directory>/filename3 2
<directory>/filename4 2
Ah, that makes more sense. Sorry, I didn't realize that this sourcetype is configured to index each entire file as a single event. Muebel's answer shows how to proceed with troubleshooting.