As seen in Solved: How to establish secure connection between Univers... - Splunk Community
there are ways to secure the connection between the forwarder and the indexer. This stops unauthorized users from forwarding data to the Splunk indexer and from managing the other Splunk components.
More detailed steps on SSL and on token authentication, for stopping unauthorized components from connecting, can be seen there.
However, this does not stop the forwarder from sending rubbish data to the indexer. Is there any way for the forwarder, or some other component, to inspect the data and stop rubbish or strange data from being sent to the indexer?
OK, this is my final question.
We have a requirement to stream the events from a cloud Splunk SIEM to an on-prem SIEM
This is the splunk flow:
Cloud Splunk -> On Prem Splunk Fwd (9997) -> On prem Indexer
Cloud Splunk currently sends indexA, indexB, indexC
On my Splunk Forwarder end, I only want data from indexA.
Based on https://docs.splunk.com/Documentation/Splunk/latest/Data/Monitornetworkports
I configured inputs.conf as follows:
[splunktcp://9997]
disabled = 0
There doesn't seem to be a whitelist for the index: whatever I receive, I will forward to the indexer.
I looked through the documentation; it doesn't seem possible to filter by index, only to filter events by regex in outputs.conf.
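For what it's worth, on a heavy (full) forwarder there is a community-known technique for filtering on the index metadata field using index-time transforms; the stanza names and regex below are illustrative only. An important caveat: data arriving over splunktcp from another Splunk instance is already parsed ("cooked"), and by default a heavy forwarder does not run it through the parsing pipeline again, so this may not apply to the Cloud Splunk -> forwarder scenario described here.

```ini
# props.conf (hypothetical sketch)
[default]
TRANSFORMS-dropidx = drop_unwanted_indexes

# transforms.conf (hypothetical sketch)
[drop_unwanted_indexes]
# Match on the index metadata instead of the raw event text
SOURCE_KEY = _MetaData:Index
# Discard events destined for indexB or indexC; keep indexA
REGEX = ^(indexB|indexC)$
DEST_KEY = queue
FORMAT = nullQueue
```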
Let's say we monitor a folder which has syslog.log in it.
In this log there are 3 lines:
the first and third lines contain gibberish data and do not conform to the source type;
the second line is correct data.
Is the Splunk forwarder intelligent enough to just forward the second line and ignore the first and third line, using the whitelist method you have described?
Hi @z080236,
Yes, I'm pretty sure that Splunk is intelligent enough to delete the non-relevant part of an event.
If you want to delete a part of each event, you have to intervene in props.conf using the SEDCMD command.
In other words, you have to find the regex that identifies the event part that you want to keep and create a SEDCMD command to delete the other parts.
You can find how to do this at https://community.splunk.com/t5/Splunk-Search/How-do-I-ignore-part-of-an-event/m-p/345367 and https://community.splunk.com/t5/Getting-Data-In/Need-SEDCMD-Help/td-p/409993 and https://docs.splunk.com/Documentation/Splunk/8.1.3/Admin/Propsconf
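As a rough sketch of what a SEDCMD looks like (the sourcetype name and the pattern here are placeholders, not taken from this thread): SEDCMD applies a sed-style substitution to the raw event text at parse time.

```ini
# props.conf (illustrative example only)
[my_sourcetype]
# Replace every occurrence of PATTERN_TO_REMOVE with nothing,
# i.e. strip it from _raw before indexing
SEDCMD-strip_gibberish = s/PATTERN_TO_REMOVE//g
```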
Ciao.
Giuseppe
If a regular expression can describe the gibberish (or the non-gibberish) lines, then a set of transforms can be written to send the gibberish to the nullQueue.
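The usual pattern for "keep only events matching a regex" is a pair of transforms: the first sends everything to the nullQueue, and the second routes matching events back to the indexQueue. The sourcetype name and the timestamp regex below are placeholders; substitute a pattern describing your own well-formed lines.

```ini
# props.conf (sketch; transforms run in the order listed)
[my_sourcetype]
TRANSFORMS-filter = drop_all, keep_good

# transforms.conf (sketch)
[drop_all]
# Match every event and discard it
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keep_good]
# Events matching the expected format are routed back for indexing;
# this example assumes lines start with an ISO-8601 timestamp
REGEX = ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
DEST_KEY = queue
FORMAT = indexQueue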
What is your definition of "rubbish data" and how is Splunk to know which is rubbish and which is not?
I will consider anything that does not conform to the sourcetype of the monitored file, which is configured at the heavy forwarder, to be "rubbish data".
Is there any way to validate at the heavy forwarder end?
Hi @z080236,
I think that you should design your inputs more carefully!
In other words, you should:
Ciao.
Giuseppe
at first, list and analyze the data to be taken from a source and identify the data that you want to index
Yes, we have already done this. We already use a monitor on a folder at the Splunk forwarder to define the sourcetype and the index.
However, we can't stop an external person from sending us invalid file types and weird content; it will just get ingested.
So far, I have searched Splunk Answers and the documentation, and there is no way to ensure the content is "clean".
What the whitelist can do is match the file extension only.
For example, to monitor only files with the .log extension, make the following change:
[monitor:///mnt/logs]
whitelist = \.log$
or match the file name, but it can't check the content.
This one is based on a specific regex expression; it doesn't seem to fit, as we are looking for a whitelist.
Thus, if I am monitoring this folder
[monitor:///var/log/putlogshere]
whitelist = \.log$
sourcetype = xx
index = index1
and I implement the whitelist, the user can still send in a log with weird data which fulfils the whitelist condition, and it will still be forwarded to the indexer. Is that correct?
Hi @z080236,
At first, why do you have users that can change Splunk forwarder configurations?
Anyway, if you have users that modify inputs by adding stanzas that take in weird logs, you could do three things:
Ciao.
Giuseppe
At first, why do you have users that can change Splunk forwarder configurations?
This is not what I want to achieve. What I want to achieve is to ensure that the content sent to the forwarder is clean. In the solutions above, you proposed a whitelist, and filtering and routing based on regex. Another way I have heard of is to check the timestamp of the log: if it is older than xx days, it won't be ingested. However, assuming I implement all 3 checks, is that enough to ensure that the content received by the forwarder is clean?
I can't stop a remote user who one day puts in a wrong macro file, renames its extension to .log, and sends it to me.
Then, I can't be sure whether Splunk will just ingest content like this or not.
Anyway, if you have users that modify inputs by adding stanzas that take in weird logs, you could do three things:
This is not what I want to achieve. I just want the forwarder to be able to prevent those weird entries from being ingested if, one day, the user's server files get corrupted, the corrupted file reaches my system, and it gets ingested.
Hi @z080236,
if you want each user to be able to ingest any kind of data, the only hint I can give you is to monitor those inputs.
In other words, when someone needs other logs, maintain control of the flow by giving the rules for ingestion to the user.
I say this because Splunk can filter data, but it needs one or more rules for data ingestion, and you're saying that there isn't any rule for ingestion and users can create every kind of ingestion.
The only way is to create a governance of ingestion:
About the check you asked about for ingestion of old data, you can get this by inserting into inputs.conf on the forwarders:
ignoreOlderThan = <non-negative integer>[s|m|h|d]
as described in https://docs.splunk.com/Documentation/Splunk/Latest/Admin/Inputsconf
or on indexers, put in props.conf:
MAX_DAYS_AGO = <integer>
as you can see at https://docs.splunk.com/Documentation/Splunk/Latest/Admin/Propsconf
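Putting the two settings above into the monitor stanza discussed earlier in this thread, a combined sketch might look like this (the 7-day values are illustrative, not from the thread):

```ini
# inputs.conf on the forwarder (sketch)
[monitor:///var/log/putlogshere]
whitelist = \.log$
sourcetype = xx
index = index1
# Skip files whose modification time is older than 7 days
ignoreOlderThan = 7d

# props.conf on the indexers (sketch)
[xx]
# Reject events with timestamps more than 7 days in the past
MAX_DAYS_AGO = 7
```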
But in this way you solve only a small part of the problem: as I said, you have a governance problem!
Ciao.
Giuseppe
I say this because Splunk can filter data, but it needs one or more rules for data ingestion, and you're saying that there isn't any rule for ingestion and users can create every kind of ingestion.
This is not what I am saying.
I am saying that I have already planned the sourcetype, index, and filepath/filetype that I am receiving from my user.
But there's no stopping the user from putting in a file that is different from the sourcetype that I am monitoring.
In the case of the above scenario, what will likely happen?
1. Splunk ingests the content and forwards it to the indexer, if it is not picked up by the filters.
2. Splunk filters out the content, if it is picked up by a filter using the 3 methods: timestamp, file extension/file name, and regular-expression whitelist.
If Splunk is monitoring a particular filepath for data in a certain format, and someone inserts a file with data in a different format into that filepath, then Splunk very likely will not recognize the change. It will do its best to interpret the alien file as though it were normal. Depending on how alien the file is, Splunk may complain about the timestamps being in the wrong format or not in the expected location. However, Splunk will NOT say "you told me to expect sourcetype 'foo', but this is sourcetype 'bar', so I'm not touching it".