We have DSE Cassandra v5.0.8 running on multiple nodes as a cluster, and we use the DataStax-provided Hunk connector for Cassandra to access this data. When we started using the connector around November last year (it was the latest version available at the time), we were able to configure a single Cassandra node IP in the virtual index configuration.
Now we need to set up the configuration, in indexes.conf or from the UI, so that the connector accepts multiple connection points (the vix.cassandra.connection.point setting for the virtual index). That way, if one of the nodes fails, Hunk can retrieve the data from the other Cassandra nodes/connection points specified in the virtual index configuration. Is this possible in the latest version of the Cassandra connector for Hunk? P.S.: In the older connector version we are using, we tried giving multiple node IPs as comma-separated values in the vix.cassandra.connection.point field, but Hunk does not accept them. So, can cluster IPs be provided in the conf file in the latest Cassandra connector versions? If yes, how? Thank you.
I do not believe that option is supported. Here is the workaround I used in the past:
[provider:cassandra_erp1]
vix.family = cassandra_erp_family
vix.cassandra.connection.point = host1
[provider:cassandra_erp2]
vix.family = cassandra_erp_family
vix.cassandra.connection.point = host2
[cassandra_video1]
vix.cassandra.cql.cmd = SELECT JSON * FROM videodb.users
vix.cassandra.datetime.field = created_date
vix.cassandra.max.days.hence = 1000
vix.provider = cassandra_erp1
[cassandra_video2]
vix.cassandra.cql.cmd = SELECT JSON * FROM videodb.users
vix.cassandra.datetime.field = created_date
vix.cassandra.max.days.hence = 1000
vix.provider = cassandra_erp2
In the Splunk search I used something like this:
index=cassandra_video1 OR index=cassandra_video2
Thank you @rdagan. This approach is nice. But proceeding with this would mean I have to compromise on search performance, right? What about latency, and how does Splunk handle it when, say, host1 is up and host2 is down?
Yes, searching both virtual indexes with OR makes two calls to Cassandra and brings back double the results. Since Cassandra is fast, using something like index=cassandra_video1 OR index=cassandra_video2 | dedup someid might solve the issue.
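To make the effect of the dedup concrete, here is a rough Python sketch of what happens to the doubled results. It is only an illustration, not how Splunk or the connector is implemented: the rows are made up, someid is the placeholder field name from the search above, and the dedup function is a hypothetical stand-in that keeps the first event per field value and drops events missing the field (which is dedup's documented default).

```python
def dedup(events, field):
    """Keep the first event seen for each value of `field`, preserving order.

    Hypothetical stand-in for `| dedup <field>`; events that lack the
    field are dropped, mirroring dedup's default (keepempty=false).
    """
    seen = set()
    unique = []
    for event in events:
        if field not in event:
            continue  # event has no value for the dedup field
        key = event[field]
        if key in seen:
            continue  # a copy of this row was already returned
        seen.add(key)
        unique.append(event)
    return unique


# Made-up rows: both virtual indexes run the same CQL against the same
# cluster, so each one returns the same users.
video1 = [{"someid": 1, "name": "ann"}, {"someid": 2, "name": "bob"}]
video2 = [{"someid": 1, "name": "ann"}, {"someid": 2, "name": "bob"}]

merged = video1 + video2          # index=cassandra_video1 OR index=cassandra_video2
unique = dedup(merged, "someid")  # ... | dedup someid

print(len(merged), len(unique))
```

The extra cost is the second call and the doubled intermediate result set; after the dedup, one copy of each row remains. If only one of the two lists has rows, the function returns that list unchanged.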
Other potential solutions are:
1) Ask DataStax, the owner of the Hunk Cassandra ERP, to enable this HA option
2) See if Splunk DB Connect, used with this JDBC driver for Cassandra, has this HA option: https://documentation.progress.com/output/DataDirect/jdbccassandrahelp/#page/cassandrahelp%2Fpassing...