Our setup: 2 search heads, 1 deployment server, 1 license server, and 2 indexers. The two indexers are also syslog servers, and Splunk reads the input files directly from the syslog folder for indexing. I suspect Splunk is ingesting archived syslog files that have already been ingested. How do I verify this?
You can run a search like this:
index=your_index | stats count by source
and check whether the event count for each source matches the number of events in the corresponding file.
Otherwise, you can run:
index=your_index | stats count by _raw | where count>1
If there are results, you have duplicated events.
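To check specifically whether the duplicates come from archived syslog files, a variation of the search above can list which sources and indexers each repeated event came from. This is only a sketch: your_index is a placeholder, source_count, sources and indexers are labels chosen here, and it assumes rotated or compressed files show up with recognisable names (for example a .gz suffix or a date) in the source field.

```spl
index=your_index
| stats count dc(source) AS source_count values(source) AS sources values(splunk_server) AS indexers by _raw
| where count>1
| sort - count
```

If source_count is greater than 1 and sources lists both the live file and a rotated copy, the archive is probably being read again. Keep in mind that identical _raw values can also be legitimate repeats, for example a device that logs the same message several times within the same second.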
You can also check whether your monitored servers are configured to send to both indexers.
The best approach is to put a load balancer between the monitored servers and the indexers, so that the syslog flow is reliably ingested.
Splunk can act as a syslog server itself, and it is efficient (if you have fast disks). Why do you run a separate syslog server on your indexers?
I ran index=* | stats count by _raw | where count>1 over the last 24 hours and I see more than 400,000 events.
We do have an F5 load balancer in front of the indexer/syslog cluster, and I do not know why our previous Splunk admin put the indexers and syslog servers on the same boxes.
Having the indexers and syslog servers on the same machines is not necessarily a problem; it depends only on the load they have to manage. If you have few events to ingest and index, you can leave them on the indexers; if you have many events, it is surely better to have two dedicated Heavy Forwarders for syslog ingestion.
Anyway, regarding the log duplication, check your F5 configuration and the destination host configured on your monitored servers; they may be configured to send directly to both indexer addresses instead of the virtual IP.
From the query and the results I described earlier, do you mean that we have a data duplication issue? I also checked the F5 config: both indexers are mapped to a virtual IP, and all monitored hosts, such as firewalls and switches, have this VIP in their syslog logging configuration.
Just one additional check. Run:
index=* | stats count by host _raw
to verify whether the duplicated log is sent by one host or by two (it is really unlikely that the same log is sent by different hosts!).
As for the cause, the only possibility is that there are two sources for the same data: either the F5 sends to both indexers, or there are two inputs.conf configurations reading the same data.
Verify the F5 configuration.
Then check your input configurations to see whether more than one input covers the same source.
Are your hostnames expressed as IPs?
If you haven't configured hostnames, run the same search using the IP instead of the host.
The purpose of this search is to exclude identical logs that arrive from different hosts.
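One possible way to see this in a single pass, as a sketch only, is to count how many distinct hosts each duplicated event came from. The names host_count and hosts are just labels chosen here.

```spl
index=*
| stats count dc(host) AS host_count values(host) AS hosts by _raw
| where count>1
| sort - count
```

Rows with host_count=1 are events repeated by a single host, which would point toward the inputs or the F5 rather than the devices themselves.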