Getting Data In

Where should I "Upload" a CSV file in a distributed search and indexer clustering environment?

proletariat99
Communicator

I've been wondering this for a while, but haven't found a worthwhile answer in the documentation. I have clustered indexers and distributed search set up (but not search clusters / search head pooling, etc).

For .csv files that I want to index (so that all search heads can get to the data), where should I upload the files? I typically use the WebUI to upload these types of files, so should I be doing that on an indexer or on the index cluster master? I assume that I shouldn't use the search head, since only that search head would have access to the file (in its local index), but I'm not 100% sure.

So where should I upload it?

1 Solution

MillerTime
Splunk Employee

You should upload it to one of your indexers - preferably through the web UI so that you can leverage Data Preview.


bandit
Motivator

I would recommend describing your use case, or why you are trying to do this. There may be a variety of solutions.


bandit
Motivator

I would first ask whether you are looking to index these CSV files, since Splunk is a time-series indexer. I.e., do they contain data associated with events in time? That would be more like a log file in CSV format. If so, you would just index them with a universal forwarder. Search heads also have forwarding capability if the CSVs reside on one of the search heads.
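As a minimal sketch, monitoring CSV logs with a universal forwarder and load-balancing them across the indexers could look something like the following (the paths, sourcetype, and indexer hostnames here are hypothetical placeholders, not from the thread):

# inputs.conf on the universal forwarder
[monitor:///var/log/exports/*.csv]
index = main
sourcetype = csv_log

# outputs.conf on the same forwarder
[tcpout:primary_indexers]
server = indexer1.example.com:9997,indexer2.example.com:9997
autoLB = true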

Are you just using the CSVs to enhance Splunk's knowledge? E.g., a CSV full of web status codes that would translate the code 404 to "Not Found". If so, you would likely define the CSVs as a lookup table, and may have to script something to update the lookup table file, i.e., copy an updated version into the appropriate location you have defined. Typically CSV lookup tables are hosted on your search heads within an app.
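As a rough sketch, defining such a CSV as a lookup in an app on the search head could look like this (the stanza name, filename, and field names are hypothetical examples):

# transforms.conf in an app on the search head
[http_status_lookup]
filename = http_status.csv

A search could then enrich events with it, e.g.:

... | lookup http_status_lookup status OUTPUT status_description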

proletariat99
Communicator

Good question. Not really. I am just using Splunk to count things and do some data viz on this giant spreadsheet. I'm not setting it up as a lookup, since it's sort of standalone data.

Another project I'm working on uploads a .csv with "Today's Timestamp" set in props.conf, monitors the file via inputs.conf, and then sends it out with autoLB = true to ALL my indexers, but this one is just a one-off upload.
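For reference, assigning the current ("today's") timestamp at index time can be done with DATETIME_CONFIG = CURRENT; a minimal sketch, assuming a hypothetical sourcetype name:

# props.conf
[daily_csv]
INDEXED_EXTRACTIONS = csv
DATETIME_CONFIG = CURRENT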

On a side note, this is one of my primary complaints about Splunk. It is fantastic at ingesting time-series data and also reference data, but it doesn't really have a place for regular, slightly messy data. It's almost easier to munge the data in Tableau, R, or Python, but then I have to convert everything before I display it in Splunk, which is annoying, so I end up uploading .csv and .xml files into a private index quite often for this purpose.


bandit
Motivator

A lookup can be used for standalone data; it doesn't require another search.
The following example would show all records in a lookup table. You could then use the Splunk query language to filter and format:
| inputlookup my_lookup_table.csv
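For example, filtering and aggregating the lookup contents might look like this (the field names here are hypothetical):

| inputlookup my_lookup_table.csv
| search status >= 400
| stats count by status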

You may also want to check out using the KV store.
http://dev.splunk.com/view/webframework-features/SP-CAAAEZK


MillerTime
Splunk Employee

You should upload it to one of your indexers - preferably through the web UI so that you can leverage Data Preview.

bandit
Motivator

This would be correct for an all-in-one Splunk instance, where a single instance performs log monitoring, search head, and indexer functions; however, it may not work for a distributed environment.

You can run data preview from any Splunk instance with a UI running. This implies that you are preparing to index the CSV. The typical flow for indexing is to monitor with a forwarder and configure the forwarder to send to one or more indexers.

If you are just loading a CSV file as a lookup table, this would typically be done on your search head instances.
