I have Splunk Universal Forwarder installed on one machine and Splunk Enterprise installed on another machine.
On the machine with the Splunk Forwarder, a running process produces a log file in the following strange format:
2019-05-03 nodes signals requests
23:50:11 7 56 348
23:51:02 7 31 784
23:52:13 8 24 1022
23:53:15 8 12 98
23:54:11 8 17 34
2019-05-03 nodes signals requests
23:55:07 8 24 123
23:56:10 8 33 211
23:57:03 5 101 215
23:58:11 5 9 213
23:59:01 5 6 211
2019-05-04 nodes signals requests
00:00:06 3 21 115
00:01:12 3 31 304
00:02:03 3 98 215
00:03:19 5 9 213
00:04:01 5 6 34
I want the forwarder to send this log to Splunk Enterprise on the other machine, and there I want the log parsed into reasonable events. For instance, I want this line:
23:52:13 8 24 1022
to produce an event like this:
timestamp: 2019-05-03 23:52:13
nodes: 8
signals: 24
requests: 1022
How can I achieve this?
I can quite easily write a Python script that converts this strange format into CSV, but I have no good idea how to make Splunk Enterprise or the Splunk Forwarder use my script. I know that I can configure a scripted input on the Splunk Forwarder: the forwarder would run the script periodically, the script would read the whole log file, convert it to CSV and print it, and the forwarder would send the CSV to Splunk Enterprise. However, the log file only grows and its beginning stays the same, so the script would re-read that beginning on every run and the same events would be sent to Splunk Enterprise multiple times, producing duplicates. I could improve the script so that it remembers somewhere (for instance, in a database) which events it has already printed and does not print them again, but then the script becomes complicated.
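For reference, the "remember what was already printed" idea does not necessarily need a database. A minimal sketch, assuming a small JSON state file that stores a byte offset into the log, is shown below. The function name read_new_lines, the state file layout and both paths are hypothetical and not anything Splunk provides; the sketch also assumes that a shrinking file means the log was truncated or rotated. A real script would additionally have to persist the most recent date header, since that header may sit before the saved offset.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call.

    The byte offset reached so far is kept in a JSON file at state_path
    (a hypothetical layout: {"offset": <int>}).
    """
    state = {"offset": 0}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    # Assumption: if the file is now shorter than the saved offset,
    # it was truncated or rotated, so start again from the beginning.
    if os.path.getsize(log_path) < state["offset"]:
        state["offset"] = 0

    with open(log_path, "rb") as f:
        f.seek(state["offset"])
        data = f.read()

    # Consume only complete lines; a partially written last line
    # is left for the next run.
    end = data.rfind(b"\n") + 1
    state["offset"] += end

    with open(state_path, "w") as f:
        json.dump(state, f)

    return data[:end].decode("utf-8").splitlines()
```

Each run of the scripted input would then call read_new_lines, convert only the returned lines, and print them, so already-sent rows are not emitted again.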
I could also use my script in another way: it could run non-stop, tail the log, convert it to CSV and write the result to another file, and I would then configure the Splunk Forwarder to monitor that other file. But then my script would have to handle log file rotation, and I would need some mechanism to restart the script if it is somehow killed, so this solution is complicated as well.
Is there any better way to achieve my goal?
An important part of the problem is that the date is in the header rows, while the hours, minutes and seconds are in the normal rows, so I somehow have to combine them to get the timestamp. For that reason I think KV_MODE=multi will not help me, and I must parse the log with some code (for instance, a Python script).
IMHO, go with the Python script to transform the data. It should be able to produce a very simple CSV / TSV / PSV file here, with a full timestamp on every row.
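A minimal sketch of such a conversion follows, assuming that every header ("<date> nodes signals requests") and every data row sits on its own line of the log. The names convert and to_csv and the two regular expressions are illustrative choices, not part of Splunk. The idea is simply to remember the date from the most recent header row and prepend it to the time of each data row.

```python
import csv
import io
import re

# Header row: a date followed by the fixed column names.
DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+nodes\s+signals\s+requests$")
# Data row: a time followed by three integer values.
ROW_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(\d+)\s+(\d+)\s+(\d+)$")


def convert(lines):
    """Yield [timestamp, nodes, signals, requests] for each data row."""
    current_date = None
    for line in lines:
        line = line.strip()
        header = DATE_RE.match(line)
        if header:
            # Remember the date until the next header row appears.
            current_date = header.group(1)
            continue
        row = ROW_RE.match(line)
        # Rows seen before any header cannot be dated, so they are skipped.
        if row and current_date is not None:
            time, nodes, signals, requests = row.groups()
            yield [f"{current_date} {time}", nodes, signals, requests]


def to_csv(lines):
    """Return the converted rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "nodes", "signals", "requests"])
    writer.writerows(convert(lines))
    return out.getvalue()
```

With the sample from the question, the line "23:52:13 8 24 1022" under the "2019-05-03" header becomes the CSV row "2019-05-03 23:52:13,8,24,1022", which Splunk can then ingest as a structured file with an explicit timestamp column.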
While you are at it, you can quietly grunt in annoyance about this weird log format and the poor choices its developers made.