
Dealing with a UF client that is sending too much data

eddpot
New Member

I have a number of Windows clients using the Universal Forwarder (UF) to send a small log file to Splunk, typically around 15 KB per day per client.

However, while testing this I found a client that is sending almost 1 GB a day rather than the expected 15 KB. This client appears to be malfunctioning and is writing a massive number of errors to its log every day.

If I scale the UF deployment for this app out to more clients, I am concerned that multiple clients with this issue could push my data ingest up to an unsustainable level.

I need to reduce the amount of data this client (and any future clients with the same issue) is sending, but I don't want to exclude it entirely, as then I won't be able to see which clients have this manic log-writing issue.

What is the best way to solve this? Can I limit the total data that can be forwarded per client for this app, or can I de-duplicate the data prior to forwarding to reduce the amount sent? The client writes the same log lines repeatedly within the same timestamp.
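
One partial option I've found is the forwarder-side bandwidth cap in limits.conf, though as far as I can tell it only throttles the send rate (events queue up on the client and are sent later) rather than discarding the excess. Something like this, where the value is just an example:

# limits.conf on the UF, e.g. %SPLUNK_HOME%\etc\system\local\limits.conf
# Cap the forwarder's send rate well below the UF default of 256 KB/s.
# This throttles rather than drops: data queues on the client and is
# forwarded when bandwidth allows.
[thruput]
maxKBps = 32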

Thanks for any advice you can offer.


gcusello
SplunkTrust

Hi @eddpot,
In Splunk it's possible to filter logs before indexing.
To do this, you should work out which subset of these logs you really need, both in normal and in error conditions.
Then you have to find a way to identify them (or to identify the logs to discard) using one or more regexes, and finally filter the unwanted logs before indexing, e.g. with the sketch below (for more information see https://docs.splunk.com/Documentation/Splunk/8.0.0/Forwarding/Routeandfilterdatad).
Obviously, in this way you lose data that you might otherwise need for debugging or other use cases.
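
For example, something like this on the indexers (or on a Heavy Forwarder; the UF doesn't parse the data, so these settings have no effect there). The sourcetype, the stanza name, and the regex are only placeholders that you have to adapt:

# props.conf
[my_app:log]
TRANSFORMS-drop_noise = drop_repeated_errors

# transforms.conf
[drop_repeated_errors]
# Placeholder regex: match the runaway error lines you decided to discard
REGEX = ERROR\s+SomeRecurringException
# Send matching events to the nullQueue so they are never indexed
DEST_KEY = queue
FORMAT = nullQueue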

Ciao.
Giuseppe


eddpot
New Member

Thanks @gcusello

That seems like a good solution. Unfortunately, I can't come up with a good filter that reduces the entries significantly while still leaving enough data to identify a faulty client.

If it's not possible to de-duplicate the logs before indexing (and I don't believe it is), then there may not be a good solution available to me.

However, your reply did answer the question, so would it be good form for me to mark your answer as 'Accepted'?


gcusello
SplunkTrust

Thank you!
Ciao, and see you next time!
Giuseppe

gcusello
SplunkTrust

Hi @eddpot,
if you can identify these exceptional logs, you could route them to a different index with a low retention time (e.g. one or two days) that you can use to analyze the flow and troubleshoot; search Splunk Answers for how to do it, you don't need another question! There's a rough sketch below.
This way you still consume license, but not much storage.
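
As a rough sketch (again, the index name, regex, and stanza names are placeholders, and the retention shown is two days):

# transforms.conf: route the matching events to a short-lived index
[route_noise_to_debug]
REGEX = ERROR\s+SomeRecurringException
DEST_KEY = _MetaData:Index
FORMAT = debug_shortlived

# props.conf
[my_app:log]
TRANSFORMS-route_noise = route_noise_to_debug

# indexes.conf on the indexers: keep the routed data for only two days
[debug_shortlived]
homePath   = $SPLUNK_DB/debug_shortlived/db
coldPath   = $SPLUNK_DB/debug_shortlived/colddb
thawedPath = $SPLUNK_DB/debug_shortlived/thaweddb
frozenTimePeriodInSecs = 172800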

Ciao.
Giuseppe
