I have logs from two different sources in one search. One source provides a time range, while the other provides a single timestamp. I am wondering how these could best be matched with a minimum of processing power.
The first source returns:
startTime | endTime | userId | task
12:00     | 12:47   | 34     | Processing
12:10     | 13:11   | 22     | Initiating
12:50     | 12:55   | 34     | Cleaning
13:12     | 13:22   | 22     | Processing
The other returns:
timestamp | userId | actionStatus
12:05     | 34     | Error
12:20     | 22     | Finished
12:45     | 22     | Error
13:00     | 22     | Error
Both are searched together, so on each row the fields that belong to the other index are empty.
The full table is:
| table startTime, endTime, userId, task, timestamp, actionStatus
Each row of the second batch (the one with actionStatus) needs to be enriched with the task that was running for that userId during the time range containing its timestamp.
So the final result should be:
timestamp | userId | actionStatus | task
12:05     | 34     | Error        | Processing
12:20     | 22     | Finished     | Initiating
12:45     | 22     | Error        | Initiating
13:00     | 22     | Error        | Initiating
Is there a good way to enrich the data like this? The major complication is that the userId and timestamp of one batch need to be compared with the userId and time range of the other batch. The only (rather bad) approach that comes to mind is breaking the time ranges and timestamps up into hourly or per-minute brackets and then grouping by them, but that seems quite messy.
| join type=left earlier=true userId [search index=yourfirstset | eval _time=startTime]
| where timestamp > startTime and timestamp < endTime
| stats values(task) as task by timestamp, userId, actionStatus
Thanks. Unfortunately the amount of data is too big for that approach, as the inner search caps out at 50k rows. This search will likely return a few hundred thousand, if not millions, of rows.
It would be possible to reduce that by putting a more compact version of the main search into the join subsearch to limit the total amount, but the resulting search would exceed the subsearch time limit.
This is why my idea is to use just two main searches, as in:
(index=A fields=values) OR (index=B fields=values)
And then trying to merge them together.
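To make the merge step concrete, here is a minimal sketch of the logic in Python rather than SPL, so it is not a drop-in search. It sorts the combined rows by userId and time, carries the most recent task forward per user, and attaches it to each status row whose timestamp is still inside that task's range. The function names (`to_minutes`, `enrich`) are made up for illustration, and two assumptions are baked in: the ranges of a single user do not overlap (true for the sample data), and both range boundaries are inclusive. In Splunk the same idea would presumably translate to a sort followed by a running per-user carry-forward (for example with streamstats), which avoids the subsearch entirely, but that translation is untested here.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def enrich(rows):
    """Attach the running task to every status row.

    rows: dicts from both sources in one list. Range rows carry
    startTime/endTime/task, status rows carry timestamp/actionStatus.
    Assumes ranges of one user do not overlap (hypothetical constraint).
    """
    def sort_key(row):
        is_status = row.get("timestamp") is not None
        time_field = row["timestamp"] if is_status else row["startTime"]
        # On equal times, range rows (False) sort before status rows (True).
        return (row["userId"], to_minutes(time_field), is_status)

    result = []
    current = None  # (userId, end minute, task) of the last range row seen
    for row in sorted(rows, key=sort_key):
        if row.get("timestamp") is None:
            current = (row["userId"], to_minutes(row["endTime"]), row["task"])
            continue
        task = None
        if (current is not None
                and current[0] == row["userId"]
                and to_minutes(row["timestamp"]) <= current[1]):
            task = current[2]
        result.append({
            "timestamp": row["timestamp"],
            "userId": row["userId"],
            "actionStatus": row["actionStatus"],
            "task": task,
        })
    return result


# Sample data from the question, both sources mixed in one list.
sample_rows = [
    {"startTime": "12:00", "endTime": "12:47", "userId": 34, "task": "Processing"},
    {"startTime": "12:10", "endTime": "13:11", "userId": 22, "task": "Initiating"},
    {"startTime": "12:50", "endTime": "12:55", "userId": 34, "task": "Cleaning"},
    {"startTime": "13:12", "endTime": "13:22", "userId": 22, "task": "Processing"},
    {"timestamp": "12:05", "userId": 34, "actionStatus": "Error"},
    {"timestamp": "12:20", "userId": 22, "actionStatus": "Finished"},
    {"timestamp": "12:45", "userId": 22, "actionStatus": "Error"},
    {"timestamp": "13:00", "userId": 22, "actionStatus": "Error"},
]

enriched = enrich(sample_rows)
for row in sorted(enriched, key=lambda r: r["timestamp"]):
    print(row["timestamp"], row["userId"], row["actionStatus"], row["task"])
```

Because every row is visited once after a single sort, the cost grows with the sort rather than with the number of range/timestamp pairs, which is the property that matters at a few hundred thousand rows. A status row that falls outside every range of its user gets an empty task instead of being dropped.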