Solved: Any options to speed up a falling-behind JMS Messaging Modular Input?

maciep · ‎04-16-2018

We are using the JMS Messaging TA to pull messages from IBM MQ queues. Recently, a couple of these queues started receiving production load and the TA is falling behind during peak hours (7a-3p). It does seem to eventually catch up late at night around 10p, but of course that kinda defeats the purpose of "real time" monitoring.

One queue does about 120k messages an hour during peak hours and we're behind by about an hour. On the other queue we're typically behind by about 4 hours, which I understand is about 600k messages on the queue. The payloads can be quite large as well (HL7 messages).

I don't have access to the queues themselves as another team manages all of that - I just point to where they tell me and use the bindings file they provide. But they want this resolved too, so I can get questions to them if needed. I did notice some errors in the log like the following - not all that often, but they are there.

04-12-2018 13:34:37.867 -0400 ERROR ExecProcessor - message from "python /opt/splunk/etc/apps/jms_ta/bin/jms.py" Stanza jms://queue/:SOME_QUEUE : Error running message receiver : com.ibm.msg.client.jms.DetailedJMSException: JMSWMQ2002: Failed to get a message from destination 'SOME_QUEUE'
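One way to check whether these errors cluster in the peak window is to tally them per hour. The following is only a rough sketch: it assumes the log lines keep the timestamp layout shown above, and the function name `count_jms_get_failures` is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the timestamp layout of the log line above, e.g.
# "04-12-2018 13:34:37.867 -0400 ERROR ExecProcessor - ... JMSWMQ2002 ..."
LINE_RE = re.compile(
    r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) ERROR ExecProcessor .*JMSWMQ2002"
)

def count_jms_get_failures(lines):
    """Return a dict mapping 'YYYY-MM-DD HH:00' to the number of JMSWMQ2002 errors."""
    per_hour = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a JMSWMQ2002 error line from ExecProcessor
        stamp = datetime.strptime(match.group(1), "%m-%d-%Y %H:%M:%S.%f %z")
        per_hour[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return dict(per_hour)
```

Feeding it the lines of the log file where these errors appear gives a per-hour count that can be compared against the 7a-3p peak.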

The add-on is installed on a heavy forwarder (6.5.2) and here is an example of how one of these queues is configured:

[jms://queue/:Some_Queue]
browse_mode = stats
browse_queue_only = 0
destination_pass = gobbledygook
destination_user = some_queue_user
durable = 0
hec_batch_mode = 0
hec_https = 0
index = some_queue_index
index_message_header = 1
index_message_properties = 1
init_mode = jndi
jms_connection_factory_name = SomeFactoryName
jndi_initialcontext_factory = com.sun.jndi.fscontext.RefFSContextFactory
jndi_provider_url = File:/<path to bindings file>
output_type = stdout
sourcetype = queue:message
strip_newlines = 1

I did increase the Java heap to 512MB on line 97 of the jms.py script, but I have no idea whether that should make much of a difference (it didn't seem to).

Any other suggestions for increasing performance? Or any way to determine whether the problem is on the queue itself and not in Splunk? Or could it be that the HF is parsing too slowly (though I'm not sure the messages would still be on the queue in that case)?

Also, I'm assuming this is the code used by the add-on, is that correct?
https://github.com/damiendallimore/SplunkModularInputsJavaFramework/blob/master/jms/src/com/splunk/m...

Thanks!

Damien_Dallimor · ‎04-16-2018

To achieve scale, you should try these in order:

1) Add/clone more JMS input stanzas pulling from the same queue. This will effectively run multiple consumers in multiple threads in the same JMS Modular Input JVM instance, thereby also taking advantage of any increased JVM heap limits.

2) Add more JMS Modular Inputs, deployed horizontally across multiple Universal Forwarders.

3) A combination of 1 and 2.
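The idea behind option 1, several consumers competing for messages on one queue, can be illustrated with a small Python sketch. This is an analogy only: it uses Python's in-process `queue.Queue` rather than JMS or IBM MQ, and the `drain` function and its parameters are hypothetical names for illustration, not part of the add-on.

```python
import queue
import threading
import time

def drain(num_consumers, num_messages=200, work_seconds=0.001):
    """Drain one shared queue with several consumer threads.

    Returns (elapsed_seconds, per_consumer_counts).
    """
    shared = queue.Queue()
    for i in range(num_messages):
        shared.put(i)
    counts = [0] * num_consumers

    def consumer(idx):
        while True:
            try:
                shared.get_nowait()  # each message goes to exactly one consumer
            except queue.Empty:
                return  # queue drained, consumer exits
            time.sleep(work_seconds)  # stand-in for receive and output cost
            counts[idx] += 1  # each thread only writes its own slot

    threads = [threading.Thread(target=consumer, args=(i,))
               for i in range(num_consumers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, counts
```

Calling `drain(1)` and then `drain(4)` and comparing the elapsed times shows the effect of adding consumers while the per-message cost dominates. How far that carries over to a real queue depends on the environment.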

Please contact us if you require formal support: www.baboonbones.com

maciep · ‎04-18-2018

Still trying to determine how many inputs we may need for each queue in total, but adding additional stanzas has been working to improve performance. I may end up deploying across multiple heavy forwarders at some point as well.

Damien_Dallimor · ‎04-18-2018

The answer is "it depends" on the specifics of your environment: message throughput, message size, any pre-processing, available compute resources, etc.

The way you are going about it is just fine: scale up incrementally by using approach (1) first.
See what performance improvements you get, and keep an eye on CPU and memory usage.
When you start to max out the performance achievable with approach (1), start to look at approaches (2) and (3) to continue scaling horizontally as far as you need to meet your performance SLAs.
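For a rough first guess at how many consumers to aim for, a back-of-the-envelope estimate like the one below may help. It is only a sketch under a strong assumption: that throughput scales linearly with the number of consumers, which real environments will not do exactly. The function names, the 20% headroom default, and the per-consumer rate used in the usage note are hypothetical; only the 120k messages per hour arrival rate comes from this thread.

```python
import math

def consumers_needed(arrival_per_hour, per_consumer_per_hour, headroom=1.2):
    """Consumers required to keep up with arrivals, with some spare capacity.

    Assumes (optimistically) that throughput scales linearly per consumer.
    """
    if per_consumer_per_hour <= 0:
        raise ValueError("per_consumer_per_hour must be positive")
    return math.ceil(arrival_per_hour * headroom / per_consumer_per_hour)

def hours_to_drain(backlog, arrival_per_hour, total_consume_per_hour):
    """Hours until an existing backlog is cleared while messages keep arriving."""
    surplus = total_consume_per_hour - arrival_per_hour
    if surplus <= 0:
        return math.inf  # consumers cannot keep up, backlog never clears
    return backlog / surplus
```

For example, with 120k messages arriving per hour and a measured (here invented) rate of 60k per hour for a single stanza, `consumers_needed(120_000, 60_000)` suggests three stanzas. Measure the actual per-stanza rate after each increment rather than trusting the estimate.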

maciep · ‎04-17-2018

Thanks. Number 1 seems like a pretty obvious answer now that you say it. I'll give that a go today!
