My client uses an email solution that produces a log entry for each step of email processing, so we have a variable number of log entries for each email sent or received.
In order to work with the received data and build queries around it, we use the transaction command of Splunk to aggregate logs by email ID.
The problem is that this is an extremely heavy computational task, and searches over long time ranges cannot be executed.
I'm thinking of using Kafka Streams to aggregate the logs by email ID before sending them to Splunk. I started by working through the Java example code offered by Apache, and I have some difficulties:
First, is this feasible? Has anybody here successfully achieved that?
How can I assign the email ID extracted from the log as the key in a KTable?
How can I manage 'windowing', i.e. how long to wait for further logs with the same email ID?
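
To make the goal concrete, here is a rough plain-Java sketch (no Kafka dependency) of the behaviour I would like to reproduce in Kafka Streams: extract the email ID from each log line, use it as the grouping key, and start a new group for an ID once no log for it has arrived within a given gap. The `email_id=<value>` log format, the `LogEvent` class and the method names are assumptions I made up for illustration; the real logs look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; this is NOT Kafka Streams API code.
class EmailLogAggregator {

    // Assumed log format: each line contains "email_id=<value>".
    private static final Pattern EMAIL_ID = Pattern.compile("email_id=(\\S+)");

    // Hypothetical event shape: event time in epoch milliseconds plus the raw line.
    static final class LogEvent {
        final long timestampMs;
        final String line;

        LogEvent(long timestampMs, String line) {
            this.timestampMs = timestampMs;
            this.line = line;
        }
    }

    // Returns the email ID found in the line, or null if the line has none.
    static String extractEmailId(String line) {
        Matcher m = EMAIL_ID.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Groups events by email ID. A new group is started for an ID when the gap
    // since that ID's previous event is larger than maxGapMs.
    // Assumes the input is already sorted by timestamp.
    static List<List<LogEvent>> aggregate(List<LogEvent> events, long maxGapMs) {
        Map<String, List<LogEvent>> openGroups = new HashMap<>();
        List<List<LogEvent>> result = new ArrayList<>();
        for (LogEvent e : events) {
            String id = extractEmailId(e.line);
            if (id == null) {
                continue; // lines without an ID are skipped in this sketch
            }
            List<LogEvent> group = openGroups.get(id);
            boolean expired = group != null
                    && e.timestampMs - group.get(group.size() - 1).timestampMs > maxGapMs;
            if (group == null || expired) {
                group = new ArrayList<>();
                openGroups.put(id, group);
                result.add(group); // groups are listed in order of their first event
            }
            group.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<LogEvent> events = Arrays.asList(
                new LogEvent(0L, "step=receive email_id=A"),
                new LogEvent(1_000L, "step=receive email_id=B"),
                new LogEvent(2_000L, "step=scan email_id=A"),
                new LogEvent(100_000L, "step=deliver email_id=A"));
        for (List<LogEvent> group : aggregate(events, 60_000L)) {
            System.out.println(extractEmailId(group.get(0).line) + ": " + group.size() + " log(s)");
        }
    }
}
```

What I can't work out is how to express these two pieces (the key extraction and the "wait at most N seconds for more logs with the same ID" rule) with the Kafka Streams DSL.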
I find the Apache examples a bit short, and I'm having a hard time figuring out where to start in order to learn and build my own Java Kafka Streams app. If anyone can help me with this, it would be highly appreciated.