How to do Firewall log summarization before indexing?

edoardo_vicendo
Contributor

Hello,

In our environment we are dealing with hundreds of GB/day of logs coming from firewalls.

Although we have already fixed some noisy sources, we are struggling to reduce the load.

I was wondering whether any of you have already tackled this problem.

Our configuration is:

FW --> Load Balancer --> Syslog servers --> file --> Splunk HFs --> Splunk Indexer

The Splunk HFs are installed on the same servers where the syslog service is running. Syslog receives the data from the firewalls and writes it to a file, and the Splunk HF then monitors those files.

The idea is to use a component that, every "n" minutes, consolidates the information written to the file by the syslog server and produces a summarized output. The summarized file is then read by the Splunk HF.

FW --> Load Balancer --> Syslog servers --> file --> Summarization tool --> summarized file --> Splunk HFs --> Splunk Indexer

I can write a script for this use case, but do you know whether a tool already exists that can do the job?

I was looking at logwatch, but maybe you have a better suggestion.

Thanks a lot,

Edoardo


edoardo_vicendo
Contributor

We ended up changing the integration pattern from API queries to a syslog flow. This allowed us to reduce the payload.

We then also filtered out useless information with a sed command.

Thanks to both actions we were able to halve the data ingested.

Later on we analyzed the data and saw that aggregating it would have reduced the volume by only about 10%, so the effort of implementing a summarization tool before ingestion was not worthwhile, at least for us.
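A rough way to run this kind of check before building anything is to count how many lines would survive a group-by. The sketch below is only an illustration and not necessarily the analysis described above: it assumes each line has already had its timestamp and any other per-event fields stripped, and the function name `aggregation_saving` and the sample records are hypothetical. It measures saved lines, not saved bytes.

```python
def aggregation_saving(lines):
    """Return the fraction of lines that a group-by with a count would remove."""
    records = [line.strip() for line in lines if line.strip()]
    if not records:
        return 0.0
    distinct = len(set(records))  # one summary line per distinct record
    return 1 - distinct / len(records)


# Hypothetical sample: 10 records, 9 of them distinct
sample = ["fw01,10.0.0.5,192.0.2.10,443,drop"] * 2 + [
    f"fw01,10.0.0.{i},192.0.2.10,443,drop" for i in range(6, 14)
]
print(f"{aggregation_saving(sample):.0%} of lines saved")
```

If the figure is small on a representative sample of a day's traffic, a summarization step is unlikely to pay for its complexity.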

If you still need this, you can look at some commercial tools that cover the use case.

Otherwise you can create your own script that reads the data and summarizes it before Splunk ingests it, but there are pros and cons: for example, it has to be very robust so that it never stops the data ingestion.
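One part of making such a script robust is never letting the forwarder see a half-written summary file. A minimal sketch of one common approach, assuming the summary is written to a temporary file in the same directory and then renamed into place; the function name `write_atomically` and the file names are hypothetical, and the monitor input would need to ignore the temporary name (that configuration is not shown).

```python
import os
import tempfile


def write_atomically(path, lines):
    """Write lines to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".summary-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            for line in lines:
                handle.write(line + "\n")
            handle.flush()
            os.fsync(handle.fileno())  # make sure the data is on disk before the rename
        os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    except BaseException:
        # Clean up the partial temp file and let the caller see the error
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demo in a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "fw_summary.log")
    write_atomically(target, ["fw01,10.0.0.5,192.0.2.10,443,drop,count=100"])
    with open(target) as handle:
        print(handle.read(), end="")
```

If the write fails, the previous summary file (if any) is left untouched, so a failure in the script delays data rather than corrupting it.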

Best Regards,

Edoardo


PickleRick
SplunkTrust

I don't know of any tool that "summarizes" firewall logs and, honestly, I doubt one even exists, since it's a very unusual use case.

But.

If it's a firewall, maybe it could export data using NetFlow/IPFIX? With those you can often set up aggregation at the source (at the expense of increased resource usage there).


gcusello
SplunkTrust

Hi @edoardo_vicendo,

No, you cannot implement your idea: summarization is a process that runs on Indexers, not on HFs.

The first thing to do is to work out whether all the logs are useful to you and, if possible, to reduce the logs to index.

Then you index the remaining logs and afterwards build the summarization (or Data Models) to use for quick searches. In any case, summary indexes are built from indexed logs, not before, also because summary indexes don't consume license, so it isn't possible to create them before indexing.

In other words: summary indexes are useful for faster searches, and they are created from indexed logs.

You could pre-parse the logs with a script, but this is outside Splunk, and you can do it on the HF: I did this to anonymize proxy logs, encrypting accounts (reversibly) with a certificate.

Ciao.

Giuseppe


edoardo_vicendo
Contributor

Hi @gcusello 

Thanks for your feedback. Yes, I know that summary indexing runs in Splunk over indexed data.

My aim here is to summarize the data with a tool before it is indexed, to save license and storage.

I just posted here to find out whether someone has already done something similar, or has suggestions for tools that could fit my needs.


gcusello
SplunkTrust

Hi @edoardo_vicendo,

using Splunk you can filter logs before indexing, discarding the events you don't use.

In addition you can truncate the size of each event, but I don't think this is useful here, because firewall logs aren't verbose but very numerous.

For this reason I suggest looking at https://docs.splunk.com/Documentation/Splunk/9.0.0/Forwarding/Routeandfilterdatad#Filter_event_data_... where it's described how to do it.

In short, you have to:

  • identify the regexes of the events to discard,
  • create a props.conf and a transforms.conf on your HFs (if you have them), otherwise on the Indexers.

The files are:

props.conf

[your_sourcetype]
TRANSFORMS-null = setnull

transforms.conf

[setnull]
REGEX = <your_regex>
DEST_KEY = queue
FORMAT = nullQueue

Remember to restart Splunk after creating these files.
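Before deploying the transform, it can help to dry-run the regex against a sample of real events to see what would be discarded. The sketch below is an approximation under stated assumptions: Splunk uses PCRE while this uses Python's `re` module, so unusual patterns may behave differently; the helper name `dry_run`, the sample events, and the pattern `action=accept` are all hypothetical.

```python
import re


def dry_run(pattern, events):
    """Split events into (kept, dropped); events matching the pattern are dropped."""
    regex = re.compile(pattern)
    kept, dropped = [], []
    for event in events:
        # An unanchored search, since a match anywhere in the event counts
        (dropped if regex.search(event) else kept).append(event)
    return kept, dropped


# Hypothetical sample events
events = [
    "fw01 action=accept src=10.0.0.5 dst=192.0.2.10 dport=443",
    "fw01 action=drop src=10.0.0.9 dst=192.0.2.10 dport=23",
    "fw02 action=accept src=10.0.0.7 dst=192.0.2.20 dport=22",
]
kept, dropped = dry_run(r"action=accept", events)
print(f"kept {len(kept)}, dropped {len(dropped)}")
```

Running this over a few thousand lines from the syslog file gives a quick estimate of how much volume the filter would remove.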

Ciao.

Giuseppe


edoardo_vicendo
Contributor

Hi @gcusello 

Thanks again for your feedback. We already do routing and filtering for several sourcetypes.

For firewall logs, however, the scenario we want to achieve is different.

The logs are mostly similar, and they contain the following fields:

FirewallID, sourceIP, destinationIP, destinationPort, action etc...

So every time a host attempts a connection, it is logged by the firewall.

What we want to do is summarize the data as follows before indexing it in Splunk:

FirewallID, sourceIP, destinationIP, destinationPort, action etc... <count>

So basically, if a host attempts to connect, let's say 100 times, to the same destinationIP and destinationPort and the connection is always dropped, then instead of 100 log lines we want 1 log line with the count.

It is like doing a group by.

The only way I see to do that is to have a "Summarization tool" in place before Splunk indexes the data.
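The group-by described above can be sketched in a few lines. This is a minimal illustration, not an existing tool: it assumes the syslog file has already been reduced to comma-separated records of FirewallID, sourceIP, destinationIP, destinationPort and action with no timestamp, and the function name `summarize` and the `count=` suffix are hypothetical.

```python
from collections import Counter


def summarize(lines):
    """Group identical connection records and append a count to each."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key = tuple(field.strip() for field in line.split(","))
        counts[key] += 1
    # One output line per distinct record, in first-seen order
    return [",".join(key) + f",count={n}" for key, n in counts.items()]


# Hypothetical sample records
sample = [
    "fw01,10.0.0.5,192.0.2.10,443,drop",
    "fw01,10.0.0.5,192.0.2.10,443,drop",
    "fw01,10.0.0.7,192.0.2.10,22,accept",
]
for row in summarize(sample):
    print(row)
```

A real script would also need to decide which timestamp each summary line carries (for example the start of the "n"-minute window), since the individual event times are lost once the lines are grouped.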


harendram
Engager

Were you able to get an answer on log summarization? I am interested to know whether there is a solution.


gcusello
SplunkTrust

Hi @edoardo_vicendo ,

good for you, see you next time!

Let us know if we can help you further; otherwise, please accept one answer for the benefit of the other people in the Community.

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

gcusello
SplunkTrust

Hi @edoardo_vicendo,

I'm sorry I cannot help you; I don't know of any tool for this purpose (if one exists).

Ciao.

Giuseppe
