I'm running Splunk 6.2 and trying to connect Hive through DB connect. I am able to add the DB type in (db_connection_types.conf) but i am not able to create any connection successfully. It looks like splunk makes some calls to hive JDBC but the method is not there in the hive driver class. Below are the exceptions,
While validating connection - Validating connection with URL [jdbc:hive2://myhiveserver:10000/default] failed: java.sql.SQLException:,Method not supported
While executing Query - External search command 'dbxquery' returned error code 1. Script output = "RuntimeError: Failed to run query: "SELECT * FROM (select * from mytable limit 1 ) t", caused by: AvroRemoteException(u'java.sql.SQLException: java.sql.SQLException:java.sql.SQLException: Method not supported',). "
Below are my configutations,
[hiveServer2] displayName = Hive Server 2 serviceClass = com.splunk.dbx2.DefaultDBX2JDBC jdbcDriverClass = org.apache.hive.jdbc.HiveDriver port = 10000 jdbcUrlFormat = jdbc:hive2://<host>:<port>/<database> testQuery = select * from table limit 1 database = default useConnectionPool = false #Got connection pool error without above settings
[myHive] connection_type = hiveServer2 database = default host = myhiveserver identity = hiveCred jdbcUrlFormat = jdbc:hive2://<host>:<port>/<database> jdbcUseSSL = 0 port = 10000 readonly = 0 disabled = 0
The versions of different tools are,
Splunk - 6.2 and 6.4
DB Connect v2 - 2.2.0
Java - jdk1.8.0_66
Hadoop - Hortonworks Hadoop 18.104.22.168.3.4.0-3485
Hive - 22.214.171.124.3.4.0-3485
Command ran - | dbxquery connection=myHive query="select * from table limit 1 "
Also i verified that i could execute the below command in beeline and run queries,
beeline -u jdbc:hive2://myhiveserver:10000/default
Any suggestion on the next steps would be helpful.
Hi. Did you ever get this to work?
I think I am running into this with Hive 1.2. Apparently it was fixed in 2.0 but I don't have the option of running that right now.
Actually, in the end, in order to get the Apache JDBC driver to work with us for Hive 1.2 we had to do the following.
1) readonly=0 as discussed above
2) there were at least three other methods that had be manually patched. If you would like to collaborate on this, let me know.
The GOOD news is that when I did end up using the JDBC driver for data that is not well partitioned I got much better performance.
We have a minimal working set of changes but some features are missing such as the ability to kill running jobs and some field names with null table names. There are work arounds for some of these issues such as the shortnames=1 to remove the table name, but the current status is not production ready.
We are working to push the Hive JDBC changes upstream to Apache.
Anyone who is interested in collaborating with me on testing and improving, please send mail to my last name at yahoo-inc.com. You should already be able to recompile Hive from source in order to apply these changes.
The blog neatly explained different steps required to establish hive connection using DB Connect 1. But it didn't help me out because of the below reasons,
1) Wanted to use DB Connect v2 as it the supported version going forward (V1 support lifecycle ends on July 29, 2016 and works only till splunk 6.3). I don't want this to be a problem while upgrading splunk.
2) We are using hortonworks hiveserver 2 whereas the blog talks about cloudera. I have taken the hortonworks hive jdbc driver but I'm getting the "Method not supported" exception.