Splunk Search

Optimize join for audit logs

BrandonKeep
Explorer

I have a search that returns correct results, but the join subsearch keeps hitting the 50,000-result maximum. I'd like to run it over a larger time range so I can produce a weekly report. Right now, I have to keep the time range small to get any results.

index=os sourcetype=linux_audit type=SYSCALL key=pci
| join msg [search index=os sourcetype=linux_audit type=CWD]
| table _time, host, exe, comm, success, auid, cwd

The field I want to use within the join is the msg field. Is there a way to pass the msg value in the join to speed up the search?

Some sample data from the log messages:

    type=SYSCALL msg=audit(1524096248.939:201277): success=yes pid=6561 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=33671 comm="rm" exe="/bin/rm" key="pci"
    type=CWD msg=audit(1524096248.939:201277): cwd="/home/user"

    type=SYSCALL msg=audit(1524096249.335:201280): success=yes pid=6561 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=33671 comm="rm" exe="/bin/rm" key="pci"
    type=CWD msg=audit(1524096249.335:201280): cwd="/home/user"

The expected results match based on the contents of the msg field.
[Screenshot of the expected results]

None of the provided answers seems to be what I need. Anyone else able to answer this?


woodcock
Esteemed Legend

Assuming that the timestamps are exactly the same for the events that need to be connected, this is a perfect use case for selfjoin:

index=os sourcetype=linux_audit AND ((type=SYSCALL AND key=pci) OR type=CWD)
| selfjoin msg
| table _time, host, exe, comm, success, auid, cwd

BrandonKeep
Explorer

While this is a cool command that I didn't know existed, it doesn't give me the results that I need. I end up with over a million results. My posted search gives me two results. I will update the initial question with some sample data and expected results.

Thanks for the attempt.
Regards


niketn
Legend

@BrandonKeep, the actual query would depend on your sample data and on how the fields from the two event types correlate, but you can try something along these lines:

index=os (sourcetype=linux_audit type=SYSCALL key=pci) OR (index=os sourcetype=linux_audit type=CWD)
| stats count as eventCount values(type) as types earliest(_time) as EarliestTime latest(_time) as LatestTime by msg
| search eventCount>1 types="SYSCALL" AND types="CWD"

PS: The stats aggregation above needs to include the other fields (like exe, comm, success), aggregated as your correlation requires.
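
As a rough sketch of what that PS describes (not a tested answer): assuming the msg field is extracted identically for the SYSCALL and CWD events, and that each reported field has a single value per msg, the stats could carry the fields from the original table command. The name typeCount is made up for this illustration; it counts the distinct type values per msg, so that only msg values seen in both a SYSCALL and a CWD event are kept.

```spl
index=os sourcetype=linux_audit ((type=SYSCALL key=pci) OR type=CWD)
| stats min(_time) as _time
        values(host) as host
        values(exe) as exe
        values(comm) as comm
        values(success) as success
        values(auid) as auid
        values(cwd) as cwd
        dc(type) as typeCount
        by msg
| where typeCount=2
| table _time, host, exe, comm, success, auid, cwd
```

Because this uses a single base search with stats instead of a join subsearch, the subsearch result limit should not come into play. If msg values can repeat across hosts, grouping by host and msg instead of msg alone may be safer.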

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

BrandonKeep
Explorer

Thanks for the reply. I have added some sample log data and a screenshot of the expected output. Can you clarify your example a bit? It isn't clear to me how to get my expected output from it.

Regards,
