I've been wondering about this for a while, but haven't found a worthwhile answer in the documentation. I have clustered indexers and distributed search set up (but not search head clustering / search head pooling, etc.).
For .csv files that I want to index (so that all search heads can get to the data), where should I upload the files? I typically use the web UI to upload these types of files, so should I be doing that on an indexer or on the index cluster master? I assume that I shouldn't use the search head, since only that search head would have access to the file (in its local index only), but I'm not 100% sure on that.
So where should I upload it?
You should upload it to one of your indexers - preferably through the web UI so that you can leverage Data Preview.
I would recommend describing your use case, or why you are trying to do this. There may be a variety of solutions.
I would first ask whether you are looking to index these CSV files, since Splunk is a time-series index: do they have data that is associated with an event in time? That would be more of a log file in CSV format. If so, you would just index them with a universal forwarder. Search heads also have forwarding ability if the CSVs reside on one of the search heads.
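If it helps, a minimal inputs.conf monitor stanza on a universal forwarder might look something like this (the path, sourcetype, and index names are just placeholders):

# inputs.conf on the universal forwarder (hypothetical path and names)
[monitor:///var/log/myapp/*.csv]
sourcetype = myapp_csv
index = main
disabled = false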
Or are you just using the CSVs to enhance Splunk's knowledge, i.e. a CSV full of web codes that would translate the code 404 to "Not Found"? If so, you would likely define the CSVs as a lookup table, and you may have to script something to update the lookup table file, i.e. copy an updated version into the appropriate location you have defined. Typically CSV lookup tables are hosted on your search heads within an app.
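As a rough sketch, a file-based lookup is defined in transforms.conf, with the CSV placed in the app's lookups directory on the search head (the stanza, file, and field names below are hypothetical):

# transforms.conf in your app (CSV goes in <app>/lookups/)
[http_status]
filename = http_status.csv

In a search, you could then enrich events that carry a status field with something like:

... | lookup http_status status OUTPUT status_description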
Good question. Not really. I am just using Splunk to count things and do some data viz of this giant spreadsheet. I'm not setting it up as a lookup, since it's sort of standalone data.
Another project I'm working on uploads a .csv with "today's timestamp" set in props.conf, monitors the file using inputs.conf, and then sends it out via autoLB = true to ALL my indexers - but this one is just a one-off upload.
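Roughly, the pieces of that setup look like this (the server names, paths, and sourcetype are placeholders, and I'm assuming DATETIME_CONFIG = CURRENT is what gives each upload today's timestamp):

# props.conf - treat the file as CSV; stamp events with the current time
[daily_csv]
INDEXED_EXTRACTIONS = csv
DATETIME_CONFIG = CURRENT

# inputs.conf - monitor the drop location
[monitor:///data/drops/daily.csv]
sourcetype = daily_csv

# outputs.conf - auto load-balance across all indexers
[tcpout:my_indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
autoLB = true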
On a side note, this is one of my primary complaints about Splunk. It is fantastic at eating time-series data and also reference data, but it doesn't really have a place for regular, slightly messy data. It's almost easier to use Tableau or R or Python to munge the data, but then I have to convert everything before I display it in Splunk, which is annoying, so I end up uploading .csv and .xml files into a private index quite often for this purpose.
A lookup can be used for standalone data. It doesn't have to be attached to another search.
The following example would show all records in a lookup table. You could then use the Splunk query language to filter and format the results.
| inputlookup my_lookup_table.csv
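For instance, assuming the table has a numeric status field (hypothetical here), you could filter and summarize it directly:

| inputlookup my_lookup_table.csv | where status >= 400 | stats count by status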
You may also want to check out using a kvstore.
http://dev.splunk.com/view/webframework-features/SP-CAAAEZK
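As a sketch, a KV store lookup is defined with a collection plus a lookup stanza that points at it; all names below are hypothetical:

# collections.conf - define the collection and its fields
[http_status_collection]
field.status = number
field.status_description = string

# transforms.conf - expose the collection as a lookup
[http_status_kv]
external_type = kvstore
collection = http_status_collection
fields_list = _key, status, status_description

You could then read and write it with | inputlookup http_status_kv and | outputlookup http_status_kv.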
"You should upload it to one of your indexers - preferably through the web UI so that you can leverage Data Preview."
This would be correct for an all-in-one Splunk instance, where a single instance performs the log monitoring, search head, and indexer functions; however, it may not work for a distributed environment.
You can run Data Preview from any Splunk instance with a UI running, which implies that you are preparing to index the CSV. The typical flow for indexing is to monitor the file with a forwarder and configure the forwarder to send to one or more indexers.
If you are just loading a CSV file as a lookup table, this would typically be done on your search head instances.