With the Splunk Add-on for Kafka, where can I see the consumer lag?
It appears that the consumer offset is not stored in Kafka or Zookeeper. Where is it stored? How can I tell how far behind (if at all) Splunk is in reading messages from a topic?
Aside:
The Splunk Modular Input for Kafka stores the offset in Kafka or Zookeeper, so the standard Kafka tools can be used to find the consumer offset. However, that input seems MUCH slower (roughly 20x) than the Splunk Add-on for Kafka.
Please answer:
With the Splunk Add-on for Kafka, where can I see the consumer lag?
I should not have mentioned The Splunk Modular Input for Kafka in the same question.
However, that input seems MUCH slower (20x)
Well, perhaps you have not configured your setup of https://splunkbase.splunk.com/app/1817/ optimally for performance. Multi-threading, memory limits, data output to Splunk, and horizontal scalability are some things that come to mind.
Multi Threading: to achieve this, set up multiple Kafka stanzas in inputs.conf pointing to the same topic. Each stanza runs in its own thread within the same JVM modular input instance.
#global settings
[kafka]
group_id = my_test_group
index = main
sourcetype = kafka
topic_name = test
zookeeper_connect_host = localhost
zookeeper_connect_port = 2181
zookeeper_session_timeout_ms = 400
zookeeper_sync_time_ms = 200
output_type = stdout
#each polling thread (all run in the same JVM instance) can inherit or override global settings
[kafka://kafka_test_thread_1]
disabled = 0
[kafka://kafka_test_thread_2]
disabled = 0
Memory Limits: you can increase the JVM memory in kafka_ta/bin/kafka.py
Data Output to Splunk: you can override STDOUT and use HEC (HTTP Event Collector) for better "modular input to Splunk" pipeline throughput
Multi Processes: you can scale out horizontally by running multiple modular input instances in separate processes (or many multi-threaded instances across many processes)
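As a sketch of the HEC route mentioned above: the endpoint path and Authorization header follow the standard HEC collector API, while the host, port, token, index, and function name here are placeholders of my own. Building the payload with json.dumps also guarantees the event body is escaped as valid JSON, whatever bytes came off the Kafka topic:

```python
import json
import urllib.request

def hec_request(host, token, event, sourcetype="kafka", index="main"):
    """Build an HTTP Event Collector request for /services/collector.
    Serialising the whole payload with json.dumps escapes the event
    body correctly regardless of its content."""
    body = json.dumps(
        {"event": event, "sourcetype": sourcetype, "index": index}
    ).encode("utf-8")
    return urllib.request.Request(
        "https://%s:8088/services/collector" % host,
        data=body,
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json; charset=UTF-8",
        },
    )

# To actually send (token value is hypothetical):
# urllib.request.urlopen(hec_request("localhost", "<your-hec-token>", "hello"))
```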
[root@splunk-xl kafka_mod]# pwd
/opt/splunk/var/lib/splunk/modinputs/kafka_mod
[root@splunk-xl kafka_mod]# ls -la
total 128
drwx------ 2 root root 4096 Aug 1 21:05 .
drwx------. 4 root root 4096 Aug 1 19:08 ..
-rw------- 1 root root 82 Aug 1 21:05 a2Fma2EwOjkwOTIsa2Fma2ExOjkwOTIsa2Fma2EyOjkwOTI=_logs_0
-rw------- 1 root root 82 Aug 1 21:05 a2Fma2EwOjkwOTIsa2Fma2ExOjkwOTIsa2Fma2EyOjkwOTI=_logs_1
-rw------- 1 root root 83 Aug 1 21:05 a2Fma2EwOjkwOTIsa2Fma2ExOjkwOTIsa2Fma2EyOjkwOTI=_logs_10
[...]
[root@splunk-xl kafka_mod]# cat a2Fma2EwOjkwOTIsa2Fma2ExOjkwOTIsa2Fma2EyOjkwOTI=_logs_0
{"kafka_partition": 0, "kafka_topic": "logs", "kafka_partition_offset": 306627169}
Memory Limits - did that
Data Output to Splunk - I'm getting BAD REQUEST from HEC because the input doesn't escape some characters properly (it fails every time when generating messages with heka-flood in ascii mode or with kafka-consumer-perf-test), so I cannot test with HEC
Multi Processes - did that
I'm getting 400 msg/sec with this input vs. 17,000 with the Splunk Add-on for Kafka.
I have not seen any case studies which show what kind of performance can be expected with either Kafka consumer.
Did you set up multi-threading correctly? Memory sizing is intrinsically related to this.
Multi Processing ... I don't see how you could have deployed this correctly and not seen a throughput increase. And I have seen countless deployments of this app.
"...getting BAD REQUEST from HEC because it doesn't handle some characters properly...": I can't help here unless you can provide more concrete information to diagnose. Can you elaborate on "some characters"?
Note : This is related to Splunk Modular Input for Kafka. I am not familiar with Splunk Add-on for Kafka.
Here is a capture of traffic between the Splunk Modular Input for Kafka and HEC. The quote before C3M causes the JSON not to lint. This was generated by heka-flood with the ascii-only option on. Unfortunately, I did not capture the traffic from heka-flood to Kafka, or from Kafka to the Splunk Modular Input for Kafka. The modular input fails consistently with traffic from either heka-flood or the Kafka-provided consumer perf test.
POST /services/collector HTTP/1.1^M
Authorization: Splunk C28170BC-215A-44A8-8229-B05D708ECDD3^M
Content-Length: 425^M
Content-Type: application/json; charset=UTF-8^M
Host: localhost:8088^M
Connection: Keep-Alive^M
User-Agent: Apache-HttpAsyncClient/4.1 (Java/1.7.0_101)^M
^M
{"event":"Tue Jul 19 20:34:48 UTC 2016 name=kafka_msg_received event_id= msg_body={\"logger\":\"\",\"type\":\"logfile\",\"tags\":[\"ih\",\"services\"],\"host\":\"pod.test.heka-sq05z\",\"message\":\"hekabench: pod.test.heka-sq05z - `U)G\\7{Cu| KYih@_E@@TU*:|cP\\"C3M <_M.r:B58eRNT![IbBoN(.7[6J,:w\/1\/+c+rZ\\#;Pi5>\\-\\yjo*7us<8.LHP4vb'0`3W\"}","source":"kbrown2-kafka-logs","time":"1468960488565","sourcetype":"infinitehome"}è<8e><8e>W-º^H^@m^A^@^@m^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^H^@E^@^A_-Á@^@@^F^MÖ^?^@^@^A^?^@^@^A^_<98>©äàVý<89>=ï <80>^X^D^@ÿS^@^@^A^A^H
^EÍ´<84>^EÍ´<80>HTTP/1.1 400 Bad Request^M
Date: Tue, 19 Jul 2016 20:34:48 GMT^M
Content-Type: application/json; charset=UTF-8^M
X-Content-Type-Options: nosniff^M
Content-Length: 64^M
Connection: Keep-Alive^M
X-Frame-Options: SAMEORIGIN^M
Server: Splunkd^M
^M
{"text":"Invalid data format","code":6,"invalid-event-number":0}
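For what it's worth, the failure above is reproducible without Kafka or HEC. A message containing a backslash followed by a quote breaks any escaper that only escapes quotes: the backslash pair closes the escape and the quote terminates the JSON string early, which matches the "Invalid data format" (code 6) response. A minimal sketch (raw_msg is a stand-in for the heka-flood payload; the escaping logic is my reconstruction, not the modular input's actual code):

```python
import json

# A message that ends in backslash-then-quote, like the ...cP\"C3M... in the capture.
raw_msg = 'cP\\"C3M rest'

# Naive escaping (quotes only) turns \" into \\" -- an escaped backslash
# followed by a bare quote, which closes the JSON string mid-message.
broken = '{"event": "' + raw_msg.replace('"', '\\"') + '"}'
try:
    json.loads(broken)
    broken_ok = True
except json.JSONDecodeError:
    broken_ok = False  # parse fails, as HEC's parser would

# json.dumps escapes backslashes as well as quotes, so the payload is valid.
fixed = json.dumps({"event": raw_msg})
```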
Can you share your inputs.conf so I can guide you on setting up multi-threading correctly, which will then determine the JVM heap size?