Splunk will look at the first few bytes of a file (256, I think) and compute a CRC over them to determine the "instance" of a logfile (unless you alter this behavior with crcSalt). I would suspect that the 10,000-line export frequently does not begin with the same 256 bytes - if the export is a rolling window, all it takes is appending 5 or 6 lines at the bottom to push 5 or 6 lines off the top and change the first 256 bytes. Splunk would then treat each export as a brand-new file, which could leave you with 9,990+ duplicated lines.
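The CRC behavior is controlled per-input in inputs.conf. A minimal sketch - the monitor path and sourcetype here are assumptions, not from your setup:

```ini
# inputs.conf -- hypothetical monitor stanza for the exported file
[monitor:///var/log/websphere/export.log]
sourcetype = websphere_export
# Mix the full source path into the CRC so files that happen to share
# their first 256 bytes at different paths are tracked separately.
# "<SOURCE>" is a literal Splunk token here, not a placeholder to fill in.
crcSalt = <SOURCE>
```

Note that crcSalt does not help when the *same* path keeps getting a file with different leading bytes - that is exactly the duplicate-indexing case described above.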
This could be a good use of Splunk's "batch" input - basically, configure a spool directory for Splunk to read from and then discard. The whole file is read each time, indexed, and deleted. The trick then is to send only whole files of "new" events to Splunk. I've not worked on z/OS in a long time, but I assume you have WebSphere configured to write to the system SPOOL. This WebSphere technote appears to describe a way to have WebSphere "rotate" its SPOOL occasionally - possibly making all of this easier. http://www-01.ibm.com/support/docview.wss?uid=swg1PK26722
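A batch input along those lines can be sketched in inputs.conf; the spool path and sourcetype are assumptions for illustration:

```ini
# inputs.conf -- hypothetical batch (spool) input
# Splunk reads each file dropped into this directory exactly once;
# move_policy = sinkhole tells it to delete the file after indexing.
[batch:///var/spool/splunk/websphere]
move_policy = sinkhole
sourcetype = websphere_export
disabled = false
```

Whatever process produces the export would then copy only complete files of new events into that directory, rather than rewriting one file in place.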
This technote looks useful/related as well: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD103695
Hope this helps
If z/OS forwarder support is important to you, please make sure to file an Enhancement Request with Splunk support.