I am new to Hadoop, so this is only what I have understood so far.

If your data upload is not an actual service of the cluster (which should run on one of the cluster's edge nodes), then you can configure your own computer to work as an edge node.

An edge node does not need to be known by the cluster (except for security purposes), since it neither stores data nor runs compute jobs. That is basically what being an edge node means: it is connected to the Hadoop cluster but does not participate in it.

In case it can help someone, here is what I did to connect to a cluster that I don't administer:

- Get an account on the cluster, say myaccount.
- Create an account on your computer with the same name: myaccount.
- Configure your computer to access the cluster machines (SSH without passphrase, registered IP, ...).
- Get the Hadoop configuration files from an edge node of the cluster.
- Get a Hadoop distribution (e.g. from here).
- Uncompress it where you want, say /home/myaccount/hadoop-x.x.
- Add the following environment variables: JAVA_HOME and HADOOP_HOME (/home/myaccount/hadoop-x.x).
- (If you'd like) add the Hadoop bin directory to your path: export PATH=$HADOOP_HOME/bin:$PATH
- Replace your Hadoop configuration files with those you got from the edge node. With Hadoop 2.5.2, they are in the folder $HADOOP_HOME/etc/hadoop.
- I also had to change the value of a couple of $JAVA_HOME definitions in the conf files. To find them, use: grep -r "export.*JAVA_HOME"

Then run hadoop fs -ls /, which should list the root directory of the cluster's HDFS.
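For what it's worth, the environment part of the steps above could be collected into one small script. This is only a rough sketch under assumptions: the hadoop-x.x directory is the placeholder from the steps, the fallback JAVA_HOME path /usr/lib/jvm/default-java is just an example and will differ on your machine, and the helper name find_java_home_overrides is made up for illustration.

```shell
#!/bin/sh
# Sketch of the client-side environment setup. All paths are placeholders.

# Where the Hadoop distribution was unpacked (hadoop-x.x is a placeholder).
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-x.x}"
export HADOOP_HOME

# Assumed example location of a local Java install; adjust to your machine.
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"
export JAVA_HOME

# Optional: put the hadoop command on the PATH.
PATH="$HADOOP_HOME/bin:$PATH"
export PATH

# With Hadoop 2.5.2 the configuration files live in this folder.
CONF_DIR="$HADOOP_HOME/etc/hadoop"

# Hypothetical helper: list the config lines that hard-code JAVA_HOME,
# so they can be edited to match the local machine.
find_java_home_overrides() {
    grep -rn "export.*JAVA_HOME" "$1"
}

if [ -d "$CONF_DIR" ]; then
    find_java_home_overrides "$CONF_DIR" || echo "no JAVA_HOME overrides found"
fi

# Final check: list the root directory of the cluster's HDFS.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls /
else
    echo "hadoop not found on PATH; check HADOOP_HOME" >&2
fi
```

Running it as a script only affects that one process, so to keep the variables you would probably put the export lines in your shell profile (or source the file) instead.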