I have an index with start and finish time of user`s workday
27.08.2014 user="userA" weekday="monday" worktime="10.00-18.00"
27.08.2014 user="userB" weekday="monday" worktime="12.00-16.00"
.....
27.08.2014 user="userZZ" weekday="monday" worktime="09.00-16.00"
search for this looks like index=workday |....| table date, user, weekday, worktime
AND I have a csv timetable for every user for every day of week in different index(index=timetable)
user;monday_work_hours;tuesday_work_hours;......;sunday_work_hours
userA; 8.00-12.00;13.00-19.00;..........................................;12.00-16.00
.....
userZ; 7.00-12.00;11.00-19.00;..........................................12.00-19.00
AS a result i need a table like this
date user worktime work_hours
27.08.2014 "userA" "10.00-18.00" 8.00-12.00
now I have only idea about join and case
index=workday | eval....| fields date, user, weekday, worktime | join user [search index=timetable | eval work_hours=case(weekday="monday", monday_work_hours,.....,weekday="sunday", sunday_work_hours)| table date user worktime work_hours
but I have no idea how put weekday value to search in join((
If I will make LOOKUP table from csv table i have no idea how to compare field value and field name
for example for userA weekday="monday" and value in table monday_work_hours=8.00-12.00
Thanks all for help.
Assumptions
I'm going to use two source types, but you could easily swap one or both of them with csv data.
Timetable
I'm going to start with timetable, I indexed the following data as sourcetype timetable
user;monday_work_hours;tuesday_work_hours;wednesday_work_hours;thursday_work_hours;friday_work_hours;saturday_work_hours;sunday_work_hours
userA; 8.00-12.00;13.00-19.00;12.00-16.00;12.00-16.00;12.00-16.00;12.00-16.00;12.00-16.00;
userZ; 7.00-12.00;11.00-19.00;12.00-19.00;;;;7.00-19.00
I then want to get the latest schedule per user. If you're doing a csv, it may only have the current data, but this way I can update one users schedule and get the correct result. If I really wanted to do this, I would use stream stats to line up the two sourcetypes so that I could process this on historical data.
sourcetype=timetable | stats latest(*_work_hours) as * by user
This gives me a table with user and then the days of the week.
Work Day
I then queried to get the work day data (sourcetype=answers_work_day
). I then joined this to the query I previously ran (sourcetype=timetable | stats latest(*_work_hours) as * by user
) joining on user
sourcetype=answers_work_day
| join user [search sourcetype=timetable | stats latest(*_work_hours) as * by user]
Getting Something Useful
The next step is to use the weekday
field from the answers_work_day data to pick the correct work schedule. The way I did this (apologize caffeine hasn't kicked in yet) was to use a really ugly case statement the compares the value of weekday to the known weekdays. You could use lower
to make this a bit more reliable.
[...] | eval scheduledhours=case(
weekday="monday", monday,
weekday="tuesday", tuesday,
weekday="wednesday", wednesday,
weekday="thursday", thursday,
weekday="friday", friday,
weekday="saturday", saturday,
weekday="sunday", sunday
1=1,"fixme")
Finally, I added a table so you'd see just the columns that you want (see below)
Final Solution
sourcetype=answers_work_day
| join user [search sourcetype=timetable | stats latest(*_work_hours) as * by user]
| eval scheduledhours=case(
weekday="monday", monday,
weekday="tuesday", tuesday,
weekday="wednesday", wednesday,
weekday="thursday", thursday,
weekday="friday", friday,
weekday="saturday", saturday,
weekday="sunday", sunday)
| table user weekday worktime scheduledhours
While this solution works, I try to avoid subqueries. If you're honestly using a lookup for timetable, the subquery method would work. If you use join user [ inputlookup "timetable" [...] ]
, then hopefully this meets your needs.
The problem I have is what if you have historical data and schedules have changed over time. Typically you can avoid using sub-queries by using stats
to combined rows. You may be able to use streamstats
then to handle the case of changing schedules, but as I said I haven't had caffeine yet this morning so an obvious solution is escaping me and I've already sank more time then I should have into this answer.
Assumptions
I'm going to use two source types, but you could easily swap one or both of them with csv data.
Timetable
I'm going to start with timetable, I indexed the following data as sourcetype timetable
user;monday_work_hours;tuesday_work_hours;wednesday_work_hours;thursday_work_hours;friday_work_hours;saturday_work_hours;sunday_work_hours
userA; 8.00-12.00;13.00-19.00;12.00-16.00;12.00-16.00;12.00-16.00;12.00-16.00;12.00-16.00;
userZ; 7.00-12.00;11.00-19.00;12.00-19.00;;;;7.00-19.00
I then want to get the latest schedule per user. If you're doing a csv, it may only have the current data, but this way I can update one users schedule and get the correct result. If I really wanted to do this, I would use stream stats to line up the two sourcetypes so that I could process this on historical data.
sourcetype=timetable | stats latest(*_work_hours) as * by user
This gives me a table with user and then the days of the week.
Work Day
I then queried to get the work day data (sourcetype=answers_work_day
). I then joined this to the query I previously ran (sourcetype=timetable | stats latest(*_work_hours) as * by user
) joining on user
sourcetype=answers_work_day
| join user [search sourcetype=timetable | stats latest(*_work_hours) as * by user]
Getting Something Useful
The next step is to use the weekday
field from the answers_work_day data to pick the correct work schedule. The way I did this (apologize caffeine hasn't kicked in yet) was to use a really ugly case statement the compares the value of weekday to the known weekdays. You could use lower
to make this a bit more reliable.
[...] | eval scheduledhours=case(
weekday="monday", monday,
weekday="tuesday", tuesday,
weekday="wednesday", wednesday,
weekday="thursday", thursday,
weekday="friday", friday,
weekday="saturday", saturday,
weekday="sunday", sunday
1=1,"fixme")
Finally, I added a table so you'd see just the columns that you want (see below)
Final Solution
sourcetype=answers_work_day
| join user [search sourcetype=timetable | stats latest(*_work_hours) as * by user]
| eval scheduledhours=case(
weekday="monday", monday,
weekday="tuesday", tuesday,
weekday="wednesday", wednesday,
weekday="thursday", thursday,
weekday="friday", friday,
weekday="saturday", saturday,
weekday="sunday", sunday)
| table user weekday worktime scheduledhours
While this solution works, I try to avoid subqueries. If you're honestly using a lookup for timetable, the subquery method would work. If you use join user [ inputlookup "timetable" [...] ]
, then hopefully this meets your needs.
The problem I have is what if you have historical data and schedules have changed over time. Typically you can avoid using sub-queries by using stats
to combined rows. You may be able to use streamstats
then to handle the case of changing schedules, but as I said I haven't had caffeine yet this morning so an obvious solution is escaping me and I've already sank more time then I should have into this answer.
It really works cool!
Now I am trying to understad how))
If you have questions, please feel free to ask; I'd be more then happy to try and flush out my answer.
You're probably not the only one with the question, so it may help some one else enough that they get it and don't have to ask a question (which is MUCH faster for them since they aren't waiting on some one).
I see unusual thing for me in Splunk
for request
sourcetype=timetable | stats latest(*_hours) as * by user
I see ONLY 6 days for every user(7 columns, fist column is list of users and 6 columns represent timetable for days sorted by alphabet from friday to tuesday -there is no wednesday)
BUT for more precise request
sourcetype=timetable | stats latest(*_work_hours) as * by user
I see the table which consists of 8 columns( fist column is list of users and 7 columns represent timetable)
I assume that it happens because wednesday starts with w according to default Splunk sorting A=>Z wednesday is the last workday but why it is absent in table?
Can you provide a sample of what's in timetable?
It looks like the *
is missing from your query, so I'm assuming the non-working one is:
sourcetype=timetable | stats count(*_hours) as * by user
I haven't been able to reproduce your problem.
With that said, what if you specify all the days (kinda a pain, but since its workdays, its possible):
sourcetype=timetable | stats count(monday_work_hours) as monday count(tuesday_work_hours) as tuesday count(wednesday_work_hours) as wednesday count(thursday_work_hours) as thursday count(friday_work_hours) as friday by user
If that above query is still missing Wednesday, I'd double check the extractions with:
sourcetype=timetable | table user monday_work_hours tuesday_work_hours wednesday_work_hours thursday_work_hours friday_work_hours
Sorry I'm not more helpful; its hard when you can't reproduce a problem.
Sorry, now there are corrected requests which give different results. As it turns asterisk displays here(on portal) when you mark it as code but not in plain text even you see it before press submit button.
timetable is an csv file which looks like
user;monday_work_hours;.....;sunday_work_hours
UserA;11.00-12.00;........;12.00-19.00
...
userZ;09.00;....;12.00-20.00
Thank you for answer now it is not the problem but strange case.
Sure I resolved it by specifying _work_hours instead of _hours