I am trying to run a search where it counts the number of new users who have made purchases in the previous day, and the number of old users who have purchased in the previous day. A sample log is:
"2015-07-27 05:15:34" splunk_index_order=20150727052534, id=1182317, number="R539008334", item_total=1000.00, total=1084.25, state="complete", adjustment_total=14.25, user_id=845817, completed_at=1437974134993, bill_address_id=12502, ship_address_id=12501, payment_total=84.25, shipping_method_id=0, shipment_state="ready", payment_state="paid", email="firstname.lastname@example.org", special_instructions="NULL", created_at=1437973787491, updated_at=1437974145529, currency="USD", last_ip_address="NULL", created_by_id=1178357, oms_id="NULL", campaign_source="NULL", has_gift_box=## NOT SUPPORTED TYPE ##, abandoned_email_sent_at=NULL
Basically, the issue is, the only way to check if a user is distinct is by their userid tag. How can I find the unique userid fields in the past day?
Before anyone writes
... | dc(user_id) | ...
that doesn't work, as it doesn't do a true distinct count because the user could have ordered two days previously or three years previously, and would still show up as a unique user as the time range isn't constricted.
Is this search possible in Splunk? I can't seem to figure it out.
Thanks for any and all answers. 🙂
This should give you what you need to start; let's say you run this for "Last 30 days":
latest=0d@d YourBaseSearchHere | stats values(user_id) AS user_id | eval type=Last30Days | append [search earliest=0@d latest = now YourBaseSearchHere | stats values(user_id) AS user_id | eval type=Yesterday] | stats dc(type) AS numTypes values(*) AS * BY user_id | eval userType=if((numTypes==1), "New User", "Old User") | where type=Yesterday
In this search if a user is in the logs for the last 29 days, he is an "old user" but if he is only today, he is a "new user". This is probably not exactly what you need but you can modify the mechanics easily enough now that you have a starting point.
success! But if you look above at the accepted answer, he does it really nicely and simply instead of in a separate nested search.
Yes, sometimes I answer the question as asked and sometimes I go smarty-pants and answer the question that you should have asked. In this case, since I was already making several broad assumptions, I figured I would stick closer to what you asked. I do thank you for the up-vote, though.
Also do note that you should be able to "Accept" more than one answer if more than one of them works. This would be totally appropriate in this case.
I see you already you have accept an answer but here is another way.
... | eval rt=relative_time(now(),"-1d@d")| eval ts=_time| eval purchasedToday=if(rt=<ts, userID, null()) | purchasedYesterday=(rt>ts, userID, null()) | table userID, purchasedToday, purchasedYesterday | selfjoin userID | stats dc(userID) as TotalDC, dc(purchasedToday) as purchasedTodayDC, dc(purchasedYesterday) as purchasedYesterdayDC
Doesn't work. purchasedYesterday isn't a command is the error thrown. What command did you want before purchasedYesterday?