Splunk Search

How to join field name and field value?

andrey2007
Contributor

I have an index with start and finish time of user`s workday

27.08.2014 user="userA" weekday="monday" worktime="10.00-18.00"
27.08.2014 user="userB" weekday="monday" worktime="12.00-16.00"
.....
27.08.2014 user="userZZ" weekday="monday" worktime="09.00-16.00"
search for this looks like index=workday |....| table date, user, weekday, worktime

AND I have a csv timetable for every user for every day of week in different index(index=timetable)

user;monday_work_hours;tuesday_work_hours;......;sunday_work_hours
userA; 8.00-12.00;13.00-19.00;..........................................;12.00-16.00
.....
userZ; 7.00-12.00;11.00-19.00;..........................................12.00-19.00

AS a result i need a table like this
date user worktime work_hours
27.08.2014 "userA" "10.00-18.00" 8.00-12.00

now I have only idea about join and case

index=workday | eval....| fields date, user, weekday, worktime | join user [search index=timetable | eval work_hours=case(weekday="monday", monday_work_hours,.....,weekday="sunday", sunday_work_hours)| table date user worktime work_hours
but I have no idea how put weekday value to search in join((

If I will make LOOKUP table from csv table i have no idea how to compare field value and field name
for example for userA weekday="monday" and value in table monday_work_hours=8.00-12.00

Thanks all for help.

1 Solution

triest
Communicator

Assumptions

  1. You say you will make a lookup table from a csv table, but then in the example query you posted you have index=timetable
  2. That what you want out of this is to list the data in work day and append the timetable data
  3. That the time table data represents the scheduled time (really just affects that way I named things in the final solution)

I'm going to use two source types, but you could easily swap one or both of them with csv data.

Timetable

I'm going to start with timetable, I indexed the following data as sourcetype timetable

user;monday_work_hours;tuesday_work_hours;wednesday_work_hours;thursday_work_hours;friday_work_hours;saturday_work_hours;sunday_work_hours
userA; 8.00-12.00;13.00-19.00;12.00-16.00;12.00-16.00;12.00-16.00;12.00-16.00;12.00-16.00;
userZ; 7.00-12.00;11.00-19.00;12.00-19.00;;;;7.00-19.00

I then want to get the latest schedule per user. If you're doing a csv, it may only have the current data, but this way I can update one users schedule and get the correct result. If I really wanted to do this, I would use stream stats to line up the two sourcetypes so that I could process this on historical data.

sourcetype=timetable | stats latest(*_work_hours) as * by user

This gives me a table with user and then the days of the week.

Work Day

I then queried to get the work day data (sourcetype=answers_work_day). I then joined this to the query I previously ran (sourcetype=timetable | stats latest(*_work_hours) as * by user) joining on user

sourcetype=answers_work_day 
 | join user [search sourcetype=timetable | stats latest(*_work_hours) as * by user]

Getting Something Useful

The next step is to use the weekday field from the answers_work_day data to pick the correct work schedule. The way I did this (apologize caffeine hasn't kicked in yet) was to use a really ugly case statement the compares the value of weekday to the known weekdays. You could use lower to make this a bit more reliable.

 [...] | eval scheduledhours=case(
                          weekday="monday", monday,
                           weekday="tuesday", tuesday,
                           weekday="wednesday", wednesday,
                           weekday="thursday", thursday,
                           weekday="friday", friday,
                           weekday="saturday", saturday,
                           weekday="sunday", sunday
                           1=1,"fixme")

Finally, I added a table so you'd see just the columns that you want (see below)

Final Solution

sourcetype=answers_work_day 
| join user [search sourcetype=timetable | stats latest(*_work_hours) as * by user]
| eval scheduledhours=case(
                         weekday="monday", monday,
                          weekday="tuesday", tuesday,
                          weekday="wednesday", wednesday,
                          weekday="thursday", thursday,
                          weekday="friday", friday,
                          weekday="saturday", saturday,
                          weekday="sunday", sunday)
| table user weekday worktime scheduledhours

While this solution works, I try to avoid subqueries. If you're honestly using a lookup for timetable, the subquery method would work. If you use join user [ inputlookup "timetable" [...] ], then hopefully this meets your needs.

The problem I have is what if you have historical data and schedules have changed over time. Typically you can avoid using sub-queries by using stats to combined rows. You may be able to use streamstats then to handle the case of changing schedules, but as I said I haven't had caffeine yet this morning so an obvious solution is escaping me and I've already sank more time then I should have into this answer.

View solution in original post

triest
Communicator

Assumptions

  1. You say you will make a lookup table from a csv table, but then in the example query you posted you have index=timetable
  2. That what you want out of this is to list the data in work day and append the timetable data
  3. That the time table data represents the scheduled time (really just affects that way I named things in the final solution)

I'm going to use two source types, but you could easily swap one or both of them with csv data.

Timetable

I'm going to start with timetable, I indexed the following data as sourcetype timetable

user;monday_work_hours;tuesday_work_hours;wednesday_work_hours;thursday_work_hours;friday_work_hours;saturday_work_hours;sunday_work_hours
userA; 8.00-12.00;13.00-19.00;12.00-16.00;12.00-16.00;12.00-16.00;12.00-16.00;12.00-16.00;
userZ; 7.00-12.00;11.00-19.00;12.00-19.00;;;;7.00-19.00

I then want to get the latest schedule per user. If you're doing a csv, it may only have the current data, but this way I can update one users schedule and get the correct result. If I really wanted to do this, I would use stream stats to line up the two sourcetypes so that I could process this on historical data.

sourcetype=timetable | stats latest(*_work_hours) as * by user

This gives me a table with user and then the days of the week.

Work Day

I then queried to get the work day data (sourcetype=answers_work_day). I then joined this to the query I previously ran (sourcetype=timetable | stats latest(*_work_hours) as * by user) joining on user

sourcetype=answers_work_day 
 | join user [search sourcetype=timetable | stats latest(*_work_hours) as * by user]

Getting Something Useful

The next step is to use the weekday field from the answers_work_day data to pick the correct work schedule. The way I did this (apologize caffeine hasn't kicked in yet) was to use a really ugly case statement the compares the value of weekday to the known weekdays. You could use lower to make this a bit more reliable.

 [...] | eval scheduledhours=case(
                          weekday="monday", monday,
                           weekday="tuesday", tuesday,
                           weekday="wednesday", wednesday,
                           weekday="thursday", thursday,
                           weekday="friday", friday,
                           weekday="saturday", saturday,
                           weekday="sunday", sunday
                           1=1,"fixme")

Finally, I added a table so you'd see just the columns that you want (see below)

Final Solution

sourcetype=answers_work_day 
| join user [search sourcetype=timetable | stats latest(*_work_hours) as * by user]
| eval scheduledhours=case(
                         weekday="monday", monday,
                          weekday="tuesday", tuesday,
                          weekday="wednesday", wednesday,
                          weekday="thursday", thursday,
                          weekday="friday", friday,
                          weekday="saturday", saturday,
                          weekday="sunday", sunday)
| table user weekday worktime scheduledhours

While this solution works, I try to avoid subqueries. If you're honestly using a lookup for timetable, the subquery method would work. If you use join user [ inputlookup "timetable" [...] ], then hopefully this meets your needs.

The problem I have is what if you have historical data and schedules have changed over time. Typically you can avoid using sub-queries by using stats to combined rows. You may be able to use streamstats then to handle the case of changing schedules, but as I said I haven't had caffeine yet this morning so an obvious solution is escaping me and I've already sank more time then I should have into this answer.

andrey2007
Contributor

It really works cool!
Now I am trying to understad how))

0 Karma

triest
Communicator

If you have questions, please feel free to ask; I'd be more then happy to try and flush out my answer.

You're probably not the only one with the question, so it may help some one else enough that they get it and don't have to ask a question (which is MUCH faster for them since they aren't waiting on some one).

andrey2007
Contributor

I see unusual thing for me in Splunk
for request

sourcetype=timetable | stats latest(*_hours) as * by user
I see ONLY 6 days for every user(7 columns, fist column is list of users and 6 columns represent timetable for days sorted by alphabet from friday to tuesday -there is no wednesday)
BUT for more precise request

sourcetype=timetable | stats latest(*_work_hours) as * by user

I see the table which consists of 8 columns( fist column is list of users and 7 columns represent timetable)
I assume that it happens because wednesday starts with w according to default Splunk sorting A=>Z wednesday is the last workday but why it is absent in table?

0 Karma

triest
Communicator

Can you provide a sample of what's in timetable?

It looks like the * is missing from your query, so I'm assuming the non-working one is:
sourcetype=timetable | stats count(*_hours) as * by user

I haven't been able to reproduce your problem.

With that said, what if you specify all the days (kinda a pain, but since its workdays, its possible):
sourcetype=timetable | stats count(monday_work_hours) as monday count(tuesday_work_hours) as tuesday count(wednesday_work_hours) as wednesday count(thursday_work_hours) as thursday count(friday_work_hours) as friday by user

If that above query is still missing Wednesday, I'd double check the extractions with:
sourcetype=timetable | table user monday_work_hours tuesday_work_hours wednesday_work_hours thursday_work_hours friday_work_hours

Sorry I'm not more helpful; its hard when you can't reproduce a problem.

0 Karma

andrey2007
Contributor

Sorry, now there are corrected requests which give different results. As it turns asterisk displays here(on portal) when you mark it as code but not in plain text even you see it before press submit button.
timetable is an csv file which looks like
user;monday_work_hours;.....;sunday_work_hours
UserA;11.00-12.00;........;12.00-19.00
...
userZ;09.00;....;12.00-20.00

Thank you for answer now it is not the problem but strange case.
Sure I resolved it by specifying _work_hours instead of _hours

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...