I am new to Splunk (6.3) and, in addition to the original question, am interested in knowing a few things:
A. Assuming I can connect to a locally residing MySQL database (5.7) and extract rows from it, is it more efficient to:
1. Have Splunk operate directly on the results of queries against the database, OR
2. Have Splunk operate on the results of the query stored as a CSV file on the Splunk server?
B. How do I estimate (ahead of time) the size of the index that will be created using either method?
The best way to query a database (local or remote) is with Splunk DB Connect (v2). DB Connect v2 handles connection pooling, caching, and so on. It can import a table block by block, so you can test your plan before loading the whole system. (You can also operate on the database like a lookup if you don't want to index it.)
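To illustrate the block-by-block idea outside Splunk, here is a minimal Python sketch. It uses sqlite3 purely as a stand-in for MySQL (with a real MySQL driver the DB-API cursor calls are the same); the table name, column names, and block size are made up for the example:

```python
import sqlite3

# Stand-in table; for MySQL you would open a connection with your
# MySQL driver instead (assumption: any DB-API-compliant connector).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, msg TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, f"event {i}") for i in range(250)])

def fetch_in_blocks(cur, block_size=100):
    """Yield query results one block at a time (DB-API fetchmany)."""
    while True:
        rows = cur.fetchmany(block_size)
        if not rows:
            break
        yield rows

cur = conn.execute("SELECT id, msg FROM events")
sizes = [len(block) for block in fetch_in_blocks(cur)]
print(sizes)  # 250 rows in blocks of 100 -> [100, 100, 50]
```

Fetching in fixed-size blocks like this lets you sample a few blocks first and extrapolate data volume before committing to a full load, which is the same planning benefit DB Connect's block-wise import gives you.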
Unless you can automate producing the CSV, exporting it from MySQL, and importing it into Splunk, that approach becomes cumbersome. Also, consider the size limits on a CSV; I'm not sure how big your datasets are.
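If you do go the CSV route, the export is straightforward to script, and you can estimate index size from the raw byte count at the same time. A rough sketch (sqlite3 again stands in for MySQL; the 50% factor is only the common rule of thumb that indexed data on disk often lands around half the raw size once compressed rawdata and index files are combined, so treat it strictly as a planning number):

```python
import csv
import io
import sqlite3

# Stand-in data source; swap in your MySQL connection for real use.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, msg TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, f"event {i}") for i in range(1000)])

# Export the query result to CSV. An in-memory buffer is used here;
# use open("export.csv", "w", newline="") to write an actual file.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "msg"])  # header row
for row in conn.execute("SELECT id, msg FROM events"):
    writer.writerow(row)

raw_bytes = len(buf.getvalue().encode("utf-8"))
# Assumed rule of thumb: on-disk index size is roughly half the raw
# data size. Verify against your own data with a small test load.
est_index_bytes = raw_bytes * 0.5
print(f"raw CSV: {raw_bytes} bytes, estimated index: {est_index_bytes:.0f} bytes")
```

The most reliable estimate, though, is empirical: index one block of representative rows, check the actual index growth, and multiply by your total row count.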