Splunk Search

Optimize join for audit logs

Explorer

I have a search that returns correct results. However, the join subsearch portion is constantly hitting the max 50000 results limit. I'd like to run this against a larger timerange so I can produce a weekly report. Right now, I have to keep the timerange small to get any results.

index=os sourcetype=linux_audit type=SYSCALL key=pci
| join msg [search index=os sourcetype=linux_audit type=CWD]
| table _time, host, exe, comm, success, auid, cwd

The field I want to use within the join is the msg field. Is there a way to pass the msg value in the join to speed up the search?

Some sample data from the log messages:

    type=SYSCALL msg=audit(1524096248.939:201277):  success=yes pid=6561 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=33671 comm="rm" exe="/bin/rm" key="pci"
 type=CWD msg=audit(1524096248.939:201277):  cwd="/home/user"

    type=SYSCALL msg=audit(1524096249.335:201280): success=yes pid=6561 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=33671 comm="rm" exe="/bin/rm" key="pci"
type=CWD msg=audit(152409649.335:201280):  cwd="/home/user"

The expected results match based on the contents of the msg field
alt text

None of the provided answers seems to be what I need. Anyone else able to answer this?

0 Karma

Esteemed Legend

Assuming that the timestamps are exactly the same for the events that need to be connected, this is a perfect use case for selfjoin:

index=os sourcetype=linux_audit AND ((type=SYSCALL AND key=pci) OR type=CWD)
| selfjoin msg
| table _time, host, exe, comm, success, auid, cwd
0 Karma

Explorer

While this is a cool command that I didn't know existed, it doesn't give me the results that I need. I end up with over a million results. My posted search gives me two results. I will update the initial question with some sample data and expected results.

Thanks for the attempt.
Regards

0 Karma

Legend

@BrandonKeep while the actual query would be based on sample data and correlation between two sourcetype and fields coming from each sourcetype

  index=os (sourcetype=linux_audit type=SYSCALL key=pci) OR (index=os sourcetype=linux_audit type=CWD)
 | stats count as eventCount values(type) as types earliest(_time) as EarliestTime latest(_time) as LatestTime by msg
 | search eventCount>1 types="SYSCALL" AND types="linux_audit"

PS: stats aggregate above needs to have other fields (like exe, comm, success) included as per need and their correlation/aggregation.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

Explorer

Thanks for the reply. I have added some sample log data and a screenshot of the expected output. Can you clarify your example a bit as it isn't clear to me how to get my expected output.

Regards,

0 Karma