Fellow Splunksters,
I have been sending data to Splunk via TCP sockets for a while and never had any issues. I recently switched some of our apps over to the HTTP Event Collector via the Splunk Python SDK, for example:
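import splunklib.client as splunk_client
# '<username>' and '<password>' are placeholders for real credentials
service = splunk_client.connect(host='127.0.0.1', port=8089, username='<username>', password='<password>')
index = service.indexes['my_index']
# message holds the payload of a single event
index.submit(message, sourcetype='_json', host='local')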
I think it is important to note that the data coming into my script is MQTT data. While the data arrives steadily (about 1 event every 2 seconds), Splunk indexes it just fine. However, if the data stream is interrupted, the events are stored until the connection is re-established, and then they all flood to the Splunk server at once. When that happens, it takes about 10-15 minutes to index anywhere from 150 to 300 events. I certainly expect some delay, just not 15 minutes.
I'm wondering if anybody else has had this issue with the HTTP Event Collector? Is there a more efficient way of indexing data so this doesn't happen? Is a TCP socket faster than the HEC?
I am currently waiting for our IT department to allocate more resources (such as RAM and CPU cores) to our Splunk server, so maybe that will help increase performance?
Thanks!
That definitely seems like too long. I'm not sure if this will help at all, but it can't hurt: this is a .conf session from 2017 that had some good tuning tips, though it may be a bit outdated now.
https://conf.splunk.com/files/2017/slides/measuring-hec-performance-for-fun-and-profit.pdf
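One thing that might be worth checking: as far as I know, index.submit() in splunklib posts each event to the management port (8089 in the snippet above) through the receivers/simple REST endpoint, one request per event, which is not the same path as HEC. If the backlog is sent through HEC itself, several events can be combined into a single request. Below is a rough sketch of that idea, not a tested fix. It assumes HEC is enabled on its default port 8088, that a token has been created, and that the server certificate is trusted by Python. The function names build_hec_batch and send_hec_batch, the URL, and the token value are all placeholders I made up for illustration.

```python
import json
import urllib.request


def build_hec_batch(events, index="my_index", sourcetype="_json", host="local"):
    """Wrap each buffered event in an HEC envelope and concatenate them.

    HEC takes a batch as event objects placed back to back in one
    request body, not as a JSON array.
    """
    envelopes = (
        {"event": event, "index": index, "sourcetype": sourcetype, "host": host}
        for event in events
    )
    return "".join(json.dumps(envelope) for envelope in envelopes).encode("utf-8")


def send_hec_batch(events, token,
                   url="https://127.0.0.1:8088/services/collector/event"):
    """POST the whole backlog in one request (URL and token are assumptions)."""
    request = urllib.request.Request(
        url,
        data=build_hec_batch(events),
        headers={"Authorization": "Splunk " + token},
        method="POST",
    )
    # Raises urllib.error.URLError if the collector is unreachable or the
    # certificate cannot be verified.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

After a reconnect, the script could call something like send_hec_batch(buffered_events, token='<hec-token>') once for the 150 to 300 stored events instead of submitting them one at a time. Whether that removes the 10-15 minute delay depends on where the time is actually being spent, so it is worth measuring before and after.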