Dashboards & Visualizations
Highlighted

Performance very slow for index join on 200MB data only

New Member

Hi I've json data in one index containing studentid,gradeid,classid . and another static dataset(csv) in another index which contains a the names based on ids. Say studentname,studentid,gradename,gradeid.
I need to show a dashboard in realtime which will update the count of students per grade per class , however the filter are on the names and not the ids. So I need to join these two datasets. The grade
id,class_id are inside nested json and are extracted at search time before they are joined with the static index. This query is taking 20seconds!!! too slow. What techniques can be done to make it under 2-3seconds since data volume is very small.

0 Karma
Highlighted

Re: Performance very slow for index join on 200MB data only

SplunkTrust
SplunkTrust

Please share your search.
Also, please explain why a dashboard needs to be real-time. Is the number of students in a class changing so fast a 20-second query can't keep up? If a human is processing the results of the query then real-time is not necessary.

Perhaps by "realtime" you mean "fast". That's different and by seeing your query we may be able to suggest improvements.
What is the size of the "very small" data volume?

---
If this reply helps you, an upvote would be appreciated.
0 Karma
Highlighted

Re: Performance very slow for index join on 200MB data only

New Member

So we need to monitor student activity. So any instant if a student is inactive for less than 5 seconds we see that. So we want it realtime.

0 Karma
Highlighted

Re: Performance very slow for index join on 200MB data only

Esteemed Legend

This is one of the main reasons you keep hearing us say never use join; instead use stats like this:

(index=<json data> AND sourcetype=<json data> OR (index=<student data> AND sourcetype=<student data>)
| stats values(*) AS * BY student_id grade_id

Then do your stuff from here.

0 Karma
Highlighted

Re: Performance very slow for index join on 200MB data only

Esteemed Legend

Even better, I would use a scheduled search to turn the 2nd CSV dataset back into a lookup like this:

(index=<student data> AND sourcetype=<student data>)
| dedup student_name student_id grade_name grade_id
| table student_name student_id grade_name grade_id
| outputlookup MyStudentLookup.csv

Then do this:

 index=<json data> AND sourcetype=<json data>
| lookup MyStudentLookup.csv student_id grade_id
0 Karma
Speak Up for Splunk Careers!

We want to better understand the impact Splunk experience and expertise has has on individuals' careers, and help highlight the growing demand for Splunk skills.