I'm starting with Splunk Enterprise with the aim of generating CUCM CDR reports. I downloaded all the June CDR/CMR raw files to the Debian system where Splunk is installed, and created a data input pointing to the folder where I placed them (Data inputs » Files & directories):
Full path to your data: /home/user/CUCMCDRJun2016
Set host: Constant Value
Source type: cucmcdr
Set the destination index: default
Number of files: 1888
App: ciscocdr
Status: Enabled | Disable
Actions: Delete
So, I have these questions:
1) The number of files keeps increasing. Does this mean Splunk will parse all 17,069 files (2.2 GB) in the folder before I can generate any report?
2) Since I only need to report on 3 extensions, is there a more efficient way to declare the data input so that only CDRs whose calling/called fields match those extensions get indexed?
3) Can Splunk parse and manage CDR folders of 17,069 files (2.2 GB) this way? Can the Cisco CDR Reporting and Analytics app locate and use CDRs stored on the OS?
1) No. Even if only a portion of those files have been indexed so far, your reports will pick up the call data in whatever files are already in. In healthy circumstances Splunk will index the files within seconds. (As to unhealthy circumstances, read on below.)
2) As to indexing JUST the fields you need... you'd have to write a scripted input, or try your hand at a SED command that parses the commas and surgically deletes every column except the ones you want. This would be tricky and possibly quite brittle. Also, FWIW, the app would not function correctly: you'd need to retain at least a certain set of core fields for it not to freak out.
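If the real goal is just to cut indexed volume, a different index-time option (not column deletion, but dropping whole events) is Splunk's standard nullQueue routing: discard every event, then re-route the ones that mention your extensions back to the index queue. A sketch, assuming the `cucmcdr` sourcetype from the question and three made-up example extensions (1001, 1002, 1003) — and note the caveat above still applies, since the app expects to see the full call volume:

```ini
# props.conf -- applies both transforms, in order, to the question's sourcetype
[cucmcdr]
TRANSFORMS-set = setnull, setparsing

# transforms.conf
# First: send every event to the nullQueue (i.e. discard it)...
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

# ...then: events matching one of the example extensions are routed back
# to the index queue, so only those survive. Replace 1001|1002|1003 with
# your real extensions.
[setparsing]
REGEX = \b(1001|1002|1003)\b
DEST_KEY = queue
FORMAT = indexQueue
```

This is the documented "keep specific events, discard the rest" pattern; it reduces license usage but doesn't change which files Splunk has to read.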
3) Yes, of course. Best practice these days is for a host to receive the CDR/CMR files from CUCM via SFTP and then forward them to Splunk with a Splunk Universal Forwarder, but for a standalone indexer deployment you can absolutely index the files straight from the local filesystem like this (see below for more details).
The best thing to do when you're setting up our app is to follow the instructions and use the data input setup wizard. In particular if you're on a standalone indexer the wizard has a step that will create the data inputs for you automatically.
However, it's worth noting that the data input we create is a destructive one - it will delete the CDR and CMR files from the local filesystem as we index them. We do this because
a) in the larger context, we assume you're following the overall setup docs and these files have been sent to this host by CUCM via SFTP, just to be indexed in Splunk.
b) the more normal "monitor" inputs behave very badly when asked to monitor tens or even hundreds of thousands of files.
We used to have everyone set up monitor inputs plus a script that ran once a day and deleted any file older than 3 days. That's easy and works great, as long as the script keeps running. The problem is that we saw a LOT of cases where organizational or IT changes broke those scripts, and within a month the Splunk instance was spending almost all of its resources monitoring hundreds of thousands of files for appended changes that would never come.
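For reference, the daily cleanup in question was never more than a one-liner like the sketch below (the directory name here is just an example; in practice it would be cron-scheduled against wherever CUCM drops the files):

```shell
# Hypothetical daily cleanup used alongside monitor inputs: delete any
# CDR/CMR file older than 3 days so the monitor input's file count stays
# bounded. CDR_DIR defaults to a demo path; override it for real use.
CDR_DIR="${CDR_DIR:-./cdr_archive}"
mkdir -p "$CDR_DIR"   # ensure the directory exists for a standalone demo run
# -mtime +3 matches files last modified more than 3 days ago; -delete removes them
find "$CDR_DIR" -type f -mtime +3 -delete
```

Simple, but it's exactly this script silently dying that caused the disasters described above.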
After about ten of these disasters we switched and now the app and the docs firmly recommend sinkhole inputs.
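Concretely, a sinkhole input is a Splunk "batch" input with `move_policy = sinkhole`. A minimal hand-written equivalent of what the wizard generates, assuming the folder and sourcetype from the question (the wizard's actual stanza may differ), would look like:

```ini
# inputs.conf -- batch/sinkhole input: Splunk indexes each file once and
# then DELETES it from disk. Path and sourcetype are taken from the
# question; prefer letting the app's setup wizard generate this for you.
[batch:///home/user/CUCMCDRJun2016]
move_policy = sinkhole
sourcetype = cucmcdr
disabled = false
```

Because files are consumed and removed, the file count can never pile up the way it does with a broken cleanup script and a monitor input.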
Anyway, I'm happy to help get the app set up either with sinkhole inputs OR monitor inputs.