Getting Data In

Adaptive filtering/Stateful cross-event logic to filter unwanted events

fatsug
Builder

When collecting Linux logs with a Universal Forwarder, we pick up a lot of unnecessary audit logs from cron jobs that gather server status and other information, in particular the I/O from a little script.

To build grep-based filtering logic on a Heavy Forwarder, there would have to be a long list of very specific "grep" strings so as not to lose ALL grep events. In a similar manner, commands like 'uname' and 'id' are even harder to filter out.
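For reference, the usual way to do this kind of grep-style dropping on a Heavy Forwarder is index-time nullQueue filtering. A sketch of what we'd end up maintaining (the sourcetype name and regex are examples, not our real config):

```ini
# props.conf on the Heavy Forwarder -- sourcetype name is an example
[linux_audit]
TRANSFORMS-drop_noise = drop_cron_noise

# transforms.conf -- the regex here is illustrative; in practice it grows
# into a long list of very specific patterns
[drop_cron_noise]
REGEX = comm="(grep|uname|id)"
DEST_KEY = queue
FORMAT = nullQueue
```

The problem is exactly that a static regex can't tell the script's grep/uname/id invocations apart from legitimate ones.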

The logic needed to reliably filter out only I/O generated by the script would be to find events with comm="script-name", get the pid value from that initial event, and drop all events for the next, say, 10 seconds whose ppid matches that pid.
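To illustrate, that stateful logic could be sketched in Python as a small filter over auditd-style key=value events (the class, field parsing, and 10-second window are purely illustrative; this is not something Splunk components can run in transit):

```python
import re
import time


class AuditFilter:
    """Drop events whose ppid matches the pid of a recent
    comm="script-name" event, for a fixed time window."""

    def __init__(self, script_name="script-name", window=10.0):
        self.script_name = script_name
        self.window = window
        self.blocked = {}  # pid (str) -> expiry timestamp

    def keep(self, event, now=None):
        """Return True if the event should be kept, False to drop it."""
        now = time.time() if now is None else now
        # Expire pids whose window has passed
        self.blocked = {p: t for p, t in self.blocked.items() if t > now}

        comm = re.search(r'comm="([^"]+)"', event)
        pid = re.search(r'\bpid=(\d+)', event)
        ppid = re.search(r'\bppid=(\d+)', event)

        # Triggering event: remember its pid and drop it
        if comm and comm.group(1) == self.script_name and pid:
            self.blocked[pid.group(1)] = now + self.window
            return False
        # Child of the script within the window: drop
        if ppid and ppid.group(1) in self.blocked:
            return False
        return True
```

The point is exactly the state: the decision for one event depends on an earlier event, which is what per-event forwarder filtering cannot express.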

To make things more complicated, there is no control over the logs/files on the endpoints; there is only what the Universal Forwarder, and then the Heavy Forwarder, can do before the log is indexed.

Is there any way to accomplish this kind of adaptive filtering/stateful cross-event logic in transit and under these conditions? Is this something that may be possible using the new and shiny Splunk Edge Processor once it is generally available?

0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

EP will not do that cross event analysis, at least not in its current form, but I would imagine that aggregations, and hence the ability to handle event relationships, is something that will come.

Your comment 'a lot of unnecessary audit logs from a little script' makes me wonder if your little script could be pruned to be even smaller 😀


fatsug
Builder

That's a shame, but reasonable I suppose. Such a feature would be resource-heavy, as it would need to keep a bunch of data in memory while running "searches" at high frequency, but it would have been a nice fix for my problem.

'Pruned' is an understatement. Not to give it all away (in case someone happens to google their way here), but I've repeatedly pointed out that "you do not need to check if you're running on a virtual machine/VPS once every hour, every day, forever". Especially if you are the service provider and know that you will never ever run on "Alibaba Cloud" and will most likely never use QEMU. But that's a problem with "drop-in solutions" for metric collection, which no one has any interest in optimizing and over which I have no control.

Hence, insanely detailed grep solutions and letting some completely pointless audit logs trickle in is just the way it has to be for now.

Thank you for your feedback, much appreciated.

 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Splunk got rid of DSP, which did that, and I'm aware that the aggregation features of DSP are something that EP is hopefully going to support at some point. Note that Cribl could do this if you really wanted to go that route, although it would mean another tech stack if you don't already use it.

0 Karma

fatsug
Builder

Ah, yes, Cribl. I looked at their product earlier for other issues. It seems like a good product and solution but, as you point out, another product to manage.

Supposedly, a Logstash instance in front of our heavy forwarders could also do the trick. But that also means another product, and I assume log shipping would then become HEC traffic, which in my (very limited, but still) experience also means dealing with a new log format and sourcetype for a standard Linux audit log.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Dunno about Edge Processor, but the general answer to any "inter-event" question regarding forwarding/filtering is "no". Splunk processes each event independently and doesn't carry any state from one event to another (which makes sense when you realize that two subsequent events from the same source can be forwarded to two different receivers and end up on two completely different indexers).
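If filtering in transit is off the table, the closest equivalent is at search time, e.g. excluding events whose ppid matches the pid of the script's own events via a subsearch. A sketch (index, sourcetype, and comm value are placeholders):

```
index=linux sourcetype=linux_audit
    NOT [ search index=linux sourcetype=linux_audit comm="script-name"
          | fields pid
          | rename pid AS ppid ]
```

This doesn't enforce the 10-second window from the original question and still pays the license/storage cost of indexing the noise, but it does express the pid/ppid relationship that per-event forwarding cannot.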
