I have a common field, a phone number, shared between two events. One event has details about the phone model and the other is a RADIUS accounting start message for the same phone number. These two events occur within seconds of each other.
I then want to take these two events and combine them with
| transaction phonenum maxpause=5s
which I have done without issue. My problem is that I am trying to do a subsequent search to include the accounting stop message along with the first transaction's output. Because the original transaction also includes some additional fields, specifically a session_id field, I can ensure that the correct stop message is associated with it when it eventually shows up.
So basically there are three events in total, but the third event could take hours or days to show up, and I don't simply want to do a | transaction over days because of the CPU time it would consume.
I have spent a lot of time searching and reading to find the best approach, but haven't really found anything that speaks to my scenario.
My initial thought was to run the first transaction and store the fields I care about in a summary index every 15 minutes, and then run a second search that tries to | transaction the new combined event with the last event (the accounting stop message). However, as the summary index grows, and with stop messages potentially taking days to arrive, this could lead to 30-day-plus searches to correlate the full transaction.
My next thought was to run the first | transaction and store it with outputlookup, and then do a periodic lookup against this table for just the accounting stop messages, which would filter out the sessions that had truly stopped. However, my lookup table contains several fields (session_id, handset_model, etc.), and because only session_id matches the accounting stop message, the handset_model field that was present in the original outputlookup CSV isn't retained in the found events.
I then took this same approach but ran it with | join and managed to get a single event with both the stop message and the handset model present, but I have read all over Answers that | join is not a good choice, so I am now questioning its use.
So I guess I am looking for some pointers on the best way to tackle this issue. My end goal is to produce a new event that includes a customer's start, stop and handset model, so I can easily report data usage by handset over long periods of time (30 days, for example) and not cripple the box. It would seem I need a combination of scheduled searches, with the final cooked event sent to a summary index for reporting purposes.
Would someone be so kind as to give me their 2 cents?
1) If a summary index is set up right (meaning that there's very little data going into the summary index relative to the data going into the main index), then searching over many days might not be bad at all; it should be orders of magnitude faster than searches against the main index. (If it's not, then you have bigger problems.)
So I wouldn't necessarily be scared of searching many days' worth of summary data. You can test it out and see what kind of event density you're going to be dealing with in the summary index. It may be fine, and if so that's the way to go.
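To make option 1 a bit more concrete, here is a rough sketch rather than a tested implementation. The index name session_summary and the field some_accounting_stop_field are placeholders I'm assuming, and <your search> and <accounting stop searchterms> stand in for your real search terms. The first search would be scheduled (every 15 minutes, say) and writes the stitched start-plus-handset events to the summary index:

```
<your search>
| transaction phonenum maxpause=5s
| table _time phonenum session_id handset_model
| collect index=session_summary
```

The reporting search could then pair each summary event with its stop message using stats by session_id, which avoids both transaction and join:

```
(index=session_summary) OR (<accounting stop searchterms>)
| stats values(phonenum) as phonenum, values(handset_model) as handset_model, values(some_accounting_stop_field) as some_accounting_stop_field by session_id
| where isnotnull(handset_model) AND isnotnull(some_accounting_stop_field)
```

The where clause keeps only sessions that have both halves, so sessions still waiting on a stop message simply drop out until a later run.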
2) The lookup approach could maybe work too, although there are a lot of moving parts here. Probably enough that at least one thing would go wrong. I could totally be overthinking this, but here are the moving parts I see:
You're using the lookup as a queue basically, if I understand correctly.
A) So some scheduled search has to append-and-dedup into that lookup, sort of like this:
| inputlookup myQueue | append [ <your search> | transaction phonenum] | stats first(session_id) as session_id, first(handset_model) as handset_model by phonenum | outputlookup myQueue
Then there's the scheduled search that runs against the accounting stop events. You have a lookup on that search that stitches in your old data. Simplistically it looks something like this, and I'm just using the lookup command because it's easier to follow than the conf-style:
<accounting stop searchterms> | lookup myQueue phonenum
(I may be missing something, but the fact that only one field matches is fine; that's actually perfectly normal for lookups.)
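If the match really is on session_id, as described in the question, an explicit OUTPUT clause spells out which lookup column gets copied onto the stop events. This is only a sketch, and it assumes the myQueue lookup has session_id and handset_model columns:

```
<accounting stop searchterms>
| lookup myQueue session_id OUTPUT handset_model
```

Without an OUTPUT clause, lookup copies every non-matching column from the table onto the event, so either form should bring handset_model along.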
However, that search is overly simplistic, because we cannot let this lookup grow without bound. (They start getting ugly somewhere in the tens of thousands of rows.) Which means we have to clear the queue, and if the queue's going to be cleared then these nice transactions have to go somewhere more permanent. Ick.
B) Put the completed transactions somewhere more permanent.
<accounting stop searchterms> | lookup myQueue phonenum | search handset_model=*
and that scheduled search writes those events to some
C) and then slightly staggered behind that scheduled search, have yet another search that tries to clear the queue lookup.
| inputlookup myQueue | join type=outer phonenum [search <accounting stop searchterms>] | where isnull(some_accounting_stop_field) | outputlookup myQueue
My red light started blinking a while back. It's either the "this is too clever" light, the "I've horribly misunderstood this question" light, or the "I've totally overthought this and someone else will write a better answer" light.
This sounds like araitz's approach in http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/ . He recommends lookups as the "best" way to maintain state over a long period of time - which is what this sounds like.