I'm looking for a bit of advice on how to extract Websphere logs from an IBM Z10 Mainframe running z/OS.
The logs don't actually get written to a file; they are stored in a 'console'. To get these logs out, we can run a script to export them to a file.
Due to the size and number of the logs, our z/OS administrators are reluctant to download the full files regularly because of the overhead it would place on the mainframe, so their solution is to export the last 10,000 lines every hour.
If we do this, each exported log file will be overwritten every time, and the file will only ever be a maximum of 10,000 lines.
I understand that we may miss data doing this, but my main concern is how Splunk will handle the log file being overwritten each time. In most cases, the overwritten log will contain lines that were there in the previous hour (but not all of them), and a whole heap of new lines.
From my understanding, Splunk will try to work out where it was up to previously, but I don't think it will be able to, because the start of the file has changed. Therefore, it would just re-index the whole log file, causing duplicate data to be stored in Splunk.
So, these are my questions:
1. Is my understanding above correct?
2. Is there any way of getting Splunk to recognise the duplicate data before indexing?
3. Has anyone else been able to successfully obtain logs from z/OS and index them into Splunk? I've seen a couple of posts in Answers about this, but there doesn't seem to be a definitive way of doing it. It would be great if there was a mainframe agent.
Splunk will look at the first few bytes of a file (256, I think) to compute a CRC that determines the "instance" of a logfile (unless you change this behaviour with a crcSalt). I would suspect that the 10,000-line export would frequently not begin with the same 256 bytes, because all it takes is appending 5 or 6 lines at the bottom to push different content into the first 256 bytes. That would leave you with potentially 9,990+ duplicated lines.
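For reference, the CRC behaviour is controlled per stanza in inputs.conf. A minimal sketch, assuming a hypothetical export path; note that `crcSalt = <SOURCE>` mixes the file's full path into the CRC rather than disabling the check, and it does not deduplicate overlapping content, so it won't fix the rolling-window problem on its own:

```
# inputs.conf -- the monitor path below is hypothetical
[monitor:///exports/websphere/console_dump.log]
sourcetype = websphere_console
# Mix the source path into the initial CRC so differently named
# copies are treated as distinct files. This does NOT help when
# the same file is overwritten in place with shifted content.
crcSalt = <SOURCE>
```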
This could be a good use of Splunk's "batch mode" inputs: basically, configure a spool directory for Splunk to read from and discard. The whole file is read each time, indexed, and then deleted. The trick then is to send only whole files of "new" events to Splunk. I've not worked on z/OS in a long time, but I assume you have WebSphere configured to write to the system SPOOL. This WebSphere technote appears to describe a way to have WebSphere "rotate" its SPOOL occasionally, which might make all of this easier. http://www-01.ibm.com/support/docview.wss?uid=swg1PK26722
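A batch input could be sketched like this in inputs.conf (the spool directory is hypothetical; `move_policy = sinkhole` tells Splunk to delete each file once it has been fully indexed):

```
# inputs.conf -- hypothetical spool directory for hourly exports
[batch:///exports/websphere/spool]
# sinkhole: index each file completely, then delete it
move_policy = sinkhole
sourcetype = websphere_console
disabled = false
```

For this to avoid duplicates, the export script on the z/OS side would need to drop each hourly dump into the spool directory as a uniquely named file containing only lines newer than the previous export, rather than overwriting one file in place.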