Getting Data In

Where should I "Upload" a CSV file in a distributed search and indexer clustering environment?

proletariat99
Communicator

I've been wondering this for a while, but haven't found a worthwhile answer in the documentation. I have clustered indexers and distributed search set up (but not search clusters / search head pooling, etc).

For .csv files that I want to index (so that all search heads can get to the data), where should I upload the files? I typically use the WebUI to upload these types of files, so should I be doing that on an indexer or on the index cluster master? I assume that I shouldn't use the search head since only that search head would have access to the file (on the local index only), but I'm not 100% on that.

So where should I upload it?

1 Solution

MillerTime
Splunk Employee

You should upload it to one of your indexers - preferably through the web UI so that you can leverage Data Preview.


bandit
Motivator

I would recommend describing your use case, or why you are trying to do this. There may be a variety of solutions.


bandit
Motivator

I would first ask whether you are looking to index these CSV files, since Splunk is a time-series index. That is, do they contain data associated with events in time? That would be more like a log file in CSV format. If so, you would just index them with a universal forwarder. Search heads also have forwarding ability if the CSVs reside on one of the search heads.
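If they are time-series logs, the forwarder-side setup is roughly the following. This is a minimal sketch; the path, index name, sourcetype, and timestamp field are placeholders, not from the thread. Note that with INDEXED_EXTRACTIONS, the props.conf stanza must live on the universal forwarder itself, since structured-data parsing happens there.

```
# inputs.conf on the universal forwarder
[monitor:///var/log/myapp/*.csv]
index = my_csv_index
sourcetype = my_csv_logs
disabled = false

# props.conf on the same forwarder
[my_csv_logs]
INDEXED_EXTRACTIONS = csv
TIMESTAMP_FIELDS = timestamp
```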

Or are you just using the CSVs to enhance Splunk's knowledge? e.g. a CSV full of web status codes that translates 404 to "Not Found". If so, you would likely define the CSVs as a lookup table, and you may have to script something to update the lookup table file, i.e. copy an updated version into the appropriate location you have defined. Typically CSV lookup tables are hosted on your search heads within an app.
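Defining a CSV as a file-based lookup comes down to a transforms.conf stanza in an app on the search head. A sketch, with hypothetical stanza and field names:

```
# transforms.conf in an app on the search head
[http_status_lookup]
filename = http_status.csv    # placed in the app's lookups/ directory

# example usage in a search, assuming events have a "status" field:
... | lookup http_status_lookup status OUTPUT status_description
```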

proletariat99
Communicator

Good question. Not really. I am just using Splunk to count things and do some data viz on this giant spreadsheet. I'm not setting it up as a lookup, since it's sort of standalone data.

Another project I'm working on uploads a .csv with "Today's Timestamp" set in props.conf, monitors the file via inputs.conf, and then sends it out with AutoLB=true to ALL my indexers, but this one is just a one-off upload.
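For reference, the load-balanced forwarding described here is configured in outputs.conf, something like the sketch below. Server names are placeholders; in recent Splunk versions, simply listing multiple servers in a tcpout group load-balances across them, so the explicit autoLB setting is largely implicit.

```
# outputs.conf on the forwarder
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
server = indexer1:9997, indexer2:9997, indexer3:9997
autoLB = true
```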

On a side note, this is one of my primary complaints about Splunk. It is fantastic at eating time-series data and also reference data, but it doesn't really have a place for regular, slightly messy data. It's almost easier to use Tableau or R or Python to data munge, but then I have to convert everything before I display it in Splunk, which is annoying, so I end up uploading .csv and .xml files into a private index quite often for this purpose.


bandit
Motivator

A lookup can be used for standalone data; it doesn't require another search.
The following example shows all records in a lookup table. You could then use the Splunk query language to filter and format.
| inputlookup my_lookup_table.csv
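From there, ordinary SPL applies. For example, assuming the table has code and description columns (hypothetical field names):

```
| inputlookup my_lookup_table.csv
| where code >= 400
| stats count by description
```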

You may also want to check out using a kvstore.
http://dev.splunk.com/view/webframework-features/SP-CAAAEZK
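A KV store collection can back a lookup in much the same way, defined via collections.conf and transforms.conf on the search head. A sketch with placeholder names:

```
# collections.conf in an app on the search head
[status_codes]

# transforms.conf
[status_kvstore_lookup]
external_type = kvstore
collection = status_codes
fields_list = _key, code, description
```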



bandit
Motivator

This would be correct for an all-in-one Splunk instance, where a single instance performs the log monitoring, search head, and indexer functions; however, it may not work for a distributed environment.

You can run Data Preview from any Splunk instance with a UI running; that implies you are preparing to index the CSV. The typical flow for indexing is to monitor the file with a forwarder and configure the forwarder to send to one or more indexers.

If you are just loading a CSV file as a lookup table, that would typically be done on your search head instances.
