Hello, I am working with historical log data from a train system and I have two different types of log files: log1: each row is an event that was logged every time a train arrived at a station. log2: each row is an event that was logged every time a train station sign displayed a message. The messages predicted how many minutes it will take for the next train to arrive. My goal: Add a new column to the log2 events which is a calculated field. The calculation is the difference between the predicted arrival time (time of log2 event + predicted minutes until arrival) and the actual arrival time (time of the log1 event which corresponds to the log2 event). My proposed solution: For each log1 row, generate a table of all the corresponding log2 rows. Corresponding means: The log2 timestamp is within a 30 minute time range BEFORE the log1 timestamp. The log2 "next_serial" field is equal to the log1 "serial" field. The log2 "platform" field is equal to the log1 "platform" field. In other words, for every train that arrived, I will list all the logged train station sign messages that predicted the arrival time of that train. After creating the table of log2 events specified above, I will calculate the new field, which I will call "prediction_deviation", for all the log2 rows in the table. I am able to do this because I now know which log1 event all the rows in this table correspond with. Question: Can someone please help me implement my proposed solution, or suggest a better method to create this calculated field? Which SPL commands can I use? I have been looking at the documentation for the stats, eval, and transaction commands, but I have not found a way to use these to solve my problem. I would appreciate any help or guidance. Thank you.
... View more