I have a registration log and a session log. When performing a search against the session log, I would like to know if a user is registered or not. I know that I can search both logs, but this seems inefficient since I would have to search the registration log for all time to know if a user has registered or not.
Another way would be to maintain a lookup table of registered users to determine registration status. The lookup table would have millions of records. Is this efficient? It would also contain mostly the same data as the registration log (user_id, registration_date).
Is there a best practice for this type of search?
If you have millions of users, I assume you have that information in a database somewhere. Rather than maintaining a copy of it in a lookup file, check out this Splunk app: https://splunkbase.splunk.com/app/2686/ . It lets you configure a lookup that uses your database, so you can easily check a user's registration status.
Thanks, I currently use this app for other searches, but I wasn't sure if it would be efficient to use in this case since there would be many queries against the db. Perhaps this isn't an issue and this is the best way to handle this situation. Thoughts?
It's a bit hard to say without knowing more about what you want to do. From what you're saying you have millions of users. As such I am guessing that you have hundreds of millions, possibly billions, of events.
If you want to run millions of DB lookups then yes, it will be slow. I think you would be better off extrapolating the data another way. For instance, if a user can log in, can you assume they are registered? As I said, it's hard to say without knowing what your data looks like.
You're correct that it's probably better to extrapolate the data in a different way, which is what I've ended up doing for now. However, I'm not really satisfied with this solution. Perhaps doing this in Splunk is not the best approach, and I should instead combine it with an ETL process into another database for this type of analysis. Determining registration was just one example; there is a variety of additional business data that I would like to combine as well.
In terms of efficiency, if you construct your search in a clever way, DB lookups should be fine.
For example, say you want to search user sessions for your top ten users. You first reduce the session events to those ten users, and only then run a DB lookup against your registration database for them. Ten queries should be trivial.
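To make the ordering concrete, here is a rough sketch of the pattern in Python rather than Splunk search syntax. Everything in it is made up for illustration: the `sessions` list, the `registration_db` dict standing in for the real database, and the `lookup_registration` helper. The only idea carried over from above is to aggregate first and look up afterward.

```python
from collections import Counter

# Hypothetical session events: one (user_id, action) pair per event.
sessions = [
    ("alice", "view"), ("bob", "view"), ("alice", "click"),
    ("carol", "view"), ("alice", "view"), ("bob", "click"),
]

# Stand-in for the registration database (assumed data, not a real schema).
registration_db = {"alice": "2015-01-04", "bob": "2015-02-11"}

db_queries = 0  # counts how many lookups are actually issued


def lookup_registration(user_id):
    """Hypothetical single-row lookup; returns None for unregistered users."""
    global db_queries
    db_queries += 1
    return registration_db.get(user_id)


def top_users_with_registration(events, n):
    # Step 1: aggregate the session events down to the n busiest users.
    counts = Counter(user_id for user_id, _ in events)
    # Step 2: look up registration only for those n users.
    return [
        (user_id, count, lookup_registration(user_id))
        for user_id, count in counts.most_common(n)
    ]


report = top_users_with_registration(sessions, n=2)
```

Here six events across three users result in only two lookups, one per user left after the aggregation. The cost of the lookup step depends on how many rows survive the aggregation, not on how many events were searched.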
Have a look at the DB Connect app: https://splunkbase.splunk.com/app/2686/
If your data has millions of records, then lookups are definitely not advisable. Try joins!
I thought joins were to be avoided in Splunk? Also, wouldn't that require a subsearch which has a result limit?
Do you have an example?