Hi I've json data in one index containing student_id,grade_id,class_id . and another static dataset(csv) in another index which contains a the names based on ids. Say student_name,student_id,grade_name,grade_id.
I need to show a dashboard in realtime which will update the count of students per grade per class , however the filter are on the names and not the ids. So I need to join these two datasets. The grade_id,class_id are inside nested json and are extracted at search time before they are joined with the static index. This query is taking 20seconds!!! too slow. What techniques can be done to make it under 2-3seconds since data volume is very small.
Even better, I would use a scheduled search to turn the 2nd CSV dataset back into a lookup like this:
(index=<student data> AND sourcetype=<student data>)
| dedup student_name student_id grade_name grade_id
| table student_name student_id grade_name grade_id
| outputlookup MyStudentLookup.csv
Then do this:
index=<json data> AND sourcetype=<json data>
| lookup MyStudentLookup.csv student_id grade_id
This is one of the main reasons you keep hearing us say never use join
; instead use stats
like this:
(index=<json data> AND sourcetype=<json data> OR (index=<student data> AND sourcetype=<student data>)
| stats values(*) AS * BY student_id grade_id
Then do your stuff from here.
Please share your search.
Also, please explain why a dashboard needs to be real-time. Is the number of students in a class changing so fast a 20-second query can't keep up? If a human is processing the results of the query then real-time is not necessary.
Perhaps by "realtime" you mean "fast". That's different and by seeing your query we may be able to suggest improvements.
What is the size of the "very small" data volume?
So we need to monitor student activity. So any instant if a student is inactive for less than 5 seconds we see that. So we want it realtime.