Splunk Search

Issue with performance on join command?

jsoohoo
New Member

Hello there,
Sorry for asking a noob question! But I'm struggling to determine why my join isn't working across all data. My main search which queries the total number of accesses by a student or staff (in a separate query) and then the inner search tries to match the username to find the faculty. Then I try to dedup and combine the faculties if they are studying more than one degree.

I can query the summary index directly and retrieve a single student's faculty information successfully however if I query the entire summary index I do not get many faculty results returned. The summary index has 3.8mill events. Could it be performance limitation affecting the results? Honestly I dont know the details of the limit, as the Sys Admins set this enterprise version.

Is it correct to assume there is capacity issues or is it my query?

index="123" source="/path/to/logs.log"
| rex field=uri "(?[^.]+).(com|org|de|co|net|gov)"
| table username, url
| join type=left username [search index="student-summary-index" source="SummaryIndex"
| table username, FACULTY]

0 Karma

kamlesh_vaghela
SplunkTrust
SplunkTrust

@jsoohoo

If you want to go with join then I suggest restricting events by adding a field filter. Like ( uri=* and username=* FACULTY=*) This will fetch only the required events.
Check this:

index="123" source="/path/to/logs.log" uri=*
| rex field=uri "(?[^.]+).(com|org|de|co|net|gov)" 
| table username, url 
| join type=left username 
    [ search index="student-summary-index" source="SummaryIndex" username=* FACULTY=* 
    | table username, FACULTY]

I would like you to try below search also. Here I have used append for achieving the same.

index="123" source="/path/to/logs.log" uri=*
| rex field=uri "(?[^.]+).(com|org|de|co|net|gov)" 
| table username, url 
| append 
    [ search index="student-summary-index" source="SummaryIndex" username=* FACULTY=* 
    | table username, FACULTY] 
    | stats values(url) as url values(FACULTY) as FACULTY by username

Note: Make sure about uri and url fields. 🙂
Thanks

0 Karma

lakshman239
SplunkTrust
SplunkTrust

If you have 3.8m events in the summary index, do you want to check for all users in the index or only active users or users logged/accessed in the last 7 days or so. Perhaps limiting the results from summary index can return fewer results and thereby produce faster search times. One thing you could also do is change the sub-search to something like search index="student-summary-index" source="SummaryIndex"
| stats count by username, FACULTY | fields username, FACULTY
, this will return fewer records. Include additional filtering before stats, if you have [ e.g. enabled users, last 7 days etc.]

The other option is to run a scheduled search against your summary index to produce a lookup [ again try to limit the fields/contents going to lookup] and use that lookup in place of sub-search or directly alone with your main search to OUTPUT FACULTY using username.

0 Karma