I've been wondering about this for a while, but haven't found a worthwhile answer in the documentation. I have clustered indexers and distributed search set up (but not search head clustering / search head pooling, etc.).
For .csv files that I want to index (so that all search heads can get to the data), where should I upload the files? I typically use the web UI to upload these types of files, so should I be doing that on an indexer or on the index cluster master? I assume that I shouldn't use the search head, since only that search head would have access to the file (in its local index only), but I'm not 100% sure on that.
So where should I upload it?
You should upload it to one of your indexers - preferably through the web UI so that you can leverage Data Preview.
I would recommend describing your use case, or why you are trying to do this. There may be a variety of solutions.
I would first ask whether you are looking to index these CSV files, since Splunk is a time-series index: do they have data that is associated with an event in time? That would be more of a log file in CSV format. If so, you would just index them with a universal forwarder. Search heads also have forwarding ability if the CSVs reside on one of the search heads.
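If it helps, a minimal inputs.conf monitor stanza on a universal forwarder might look something like this (the path, sourcetype, and index names are just placeholders):

# inputs.conf on the universal forwarder (hypothetical path and names)
[monitor:///var/log/myapp/*.csv]
sourcetype = myapp_csv
index = main
disabled = false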
Or are you just using the CSVs to enhance Splunk's knowledge, i.e. a CSV full of web codes that would translate the code 404 to "Not Found"? If so, you would likely define the CSVs as a lookup table, and you may have to script something to update the lookup table file, i.e. copy an updated version into the appropriate location you have defined. Typically CSV lookup tables are hosted on your search heads within an app.
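As a rough sketch, a file-based lookup is defined in transforms.conf, with the CSV placed in the app's lookups directory on the search head (the stanza, file, and field names below are hypothetical):

# transforms.conf in your app (CSV goes in <app>/lookups/)
[http_status]
filename = http_status.csv

In a search, you could then enrich events that carry a status field with something like:

... | lookup http_status status OUTPUT status_description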
Good question. Not really. I am just using Splunk to count things and do some data viz of this giant spreadsheet. I'm not setting it up as a lookup, since it's sort of standalone data.
Another project I'm working on uploads a .csv with "today's timestamp" set in props.conf, monitors the file using inputs.conf, and then sends it out via autoLB = true to ALL my indexers - but this one is just a one-off upload.
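Roughly, the pieces of that setup look like this (the server names, paths, and sourcetype are placeholders, and I'm assuming DATETIME_CONFIG = CURRENT is what gives each upload today's timestamp):

# props.conf - treat the file as CSV; stamp events with the current time
[daily_csv]
INDEXED_EXTRACTIONS = csv
DATETIME_CONFIG = CURRENT

# inputs.conf - monitor the drop location
[monitor:///data/drops/daily.csv]
sourcetype = daily_csv

# outputs.conf - auto load-balance across all indexers
[tcpout:my_indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
autoLB = true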
On a side note, this is one of my primary complaints about Splunk. It is fantastic at eating time-series data and also reference data, but it doesn't really have a place for regular, slightly messy data. It's almost easier to use Tableau or R or Python to munge the data, but then I have to convert everything before I display it in Splunk, which is annoying, so I end up uploading .csv and .xml files into a private index quite often for this purpose.
A lookup can be used for standalone data. It doesn't have to be attached to another search.
The following example would show all records in a lookup table. You could then use the Splunk query language to filter and format the results.
| inputlookup my_lookup_table.csv
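For instance, assuming the table has a numeric status field (hypothetical here), you could filter and summarize it directly:

| inputlookup my_lookup_table.csv | where status >= 400 | stats count by status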
You may also want to check out using a kvstore.
http://dev.splunk.com/view/webframework-features/SP-CAAAEZK
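As a sketch, a KV store lookup is defined with a collection plus a lookup stanza that points at it; all names below are hypothetical:

# collections.conf - define the collection and its fields
[http_status_collection]
field.status = number
field.status_description = string

# transforms.conf - expose the collection as a lookup
[http_status_kv]
external_type = kvstore
collection = http_status_collection
fields_list = _key, status, status_description

You could then read and write it with | inputlookup http_status_kv and | outputlookup http_status_kv.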
"You should upload it to one of your indexers - preferably through the web UI so that you can leverage Data Preview."
This would be correct for an all-in-one Splunk instance, where a single instance performs the log monitoring, search head, and indexer functions; however, it may not work for a distributed environment.
You can run Data Preview from any Splunk instance with a UI running, which implies that you are preparing to index the CSV. The typical flow for indexing is to monitor the file with a forwarder and configure the forwarder to send to one or more indexers.
If you are just loading a CSV file as a lookup table, this would typically be done on your search head instances.