Splunk Search

Optimize join for audit logs


I have a search that returns correct results. However, the join subsearch portion is constantly hitting the max 50000 results limit. I'd like to run this against a larger timerange so I can produce a weekly report. Right now, I have to keep the timerange small to get any results.

index=os sourcetype=linux_audit type=SYSCALL key=pci
| join msg [search index=os sourcetype=linux_audit type=CWD]
| table _time, host, exe, comm, success, auid, cwd

The field I want to use within the join is the msg field. Is there a way to pass the msg value in the join to speed up the search?

Some sample data from the log messages:

    type=SYSCALL msg=audit(1524096248.939:201277):  success=yes pid=6561 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=33671 comm="rm" exe="/bin/rm" key="pci"
 type=CWD msg=audit(1524096248.939:201277):  cwd="/home/user"

    type=SYSCALL msg=audit(1524096249.335:201280): success=yes pid=6561 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=33671 comm="rm" exe="/bin/rm" key="pci"
type=CWD msg=audit(152409649.335:201280):  cwd="/home/user"

The expected results match based on the contents of the msg field
alt text

None of the provided answers seems to be what I need. Anyone else able to answer this?

0 Karma

Esteemed Legend

Assuming that the timestamps are exactly the same for the events that need to be connected, this is a perfect use case for selfjoin:

index=os sourcetype=linux_audit AND ((type=SYSCALL AND key=pci) OR type=CWD)
| selfjoin msg
| table _time, host, exe, comm, success, auid, cwd
0 Karma


While this is a cool command that I didn't know existed, it doesn't give me the results that I need. I end up with over a million results. My posted search gives me two results. I will update the initial question with some sample data and expected results.

Thanks for the attempt.

0 Karma


@BrandonKeep while the actual query would be based on sample data and correlation between two sourcetype and fields coming from each sourcetype

  index=os (sourcetype=linux_audit type=SYSCALL key=pci) OR (index=os sourcetype=linux_audit type=CWD)
 | stats count as eventCount values(type) as types earliest(_time) as EarliestTime latest(_time) as LatestTime by msg
 | search eventCount>1 types="SYSCALL" AND types="linux_audit"

PS: stats aggregate above needs to have other fields (like exe, comm, success) included as per need and their correlation/aggregation.

| makeresults | eval message= "Happy Splunking!!!"
0 Karma


Thanks for the reply. I have added some sample log data and a screenshot of the expected output. Can you clarify your example a bit as it isn't clear to me how to get my expected output.


0 Karma
Get Updates on the Splunk Community!

Admin Your Splunk Cloud, Your Way

Join us to maximize different techniques to best tune Splunk Cloud. In this Tech Enablement, you will get ...

Cloud Platform | Discontinuing support for TLS version 1.0 and 1.1

Overview Transport Layer Security (TLS) is a security communications protocol that lets two computers, ...

New Customer Testimonials

Enterprises of all sizes and across different industries are accelerating cloud adoption by migrating ...