Getting Data In

How to prevent a Splunk forwarder from sending gibberish data to the indexer?

z080236
Explorer

As seen in Solved: How to establish secure connection between Univers... - Splunk Community

there are ways to secure the connection between the forwarder and the indexer. This stops unauthorized users from forwarding to the Splunk indexer and from managing the other Splunk components.

More detailed steps on SSL and tokens for stopping unauthorized components from connecting can be seen here.

 

However, this does not stop the forwarder from sending rubbish data to the indexer. Is there any way for the forwarder or some other component to inspect the data and stop rubbish or strange data from being sent to the indexer?

 


z080236
Explorer

OK this is my final question

We have a requirement to stream the events from a cloud Splunk SIEM to an on-prem SIEM

This is the splunk flow:

Cloud Splunk  -> On Prem Splunk Fwd (9997) -> On prem Indexer

 

Cloud Splunk currently sends indexA, indexB, indexC

 

On my Splunk Forwarder end, I only want data from indexA.

 

Based on https://docs.splunk.com/Documentation/Splunk/latest/Data/Monitornetworkports

I configure the inputs.conf as follows:

[splunktcp://9997]
disabled = 0

There doesn't seem to be a whitelist for the index; it seems that whatever I receive, I will forward to the indexer.

I looked through the documentation, and it doesn't seem that it can filter by index; it can only filter events by regex in outputs.conf.

https://docs.splunk.com/Documentation/Splunk/8.2.0/Forwarding/Routeandfilterdatad#Filter_data_by_tar...
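For reference, outputs.conf.spec documents `forwardedindex.<n>.whitelist` and `forwardedindex.<n>.blacklist` settings that filter forwarded events by target index name. The following is an untested sketch for the on-prem forwarder in the flow above. It assumes these filters also apply to events received on the splunktcp input, and that setting the default filters to empty clears them; both should be verified against the Splunk version in use.

```ini
# outputs.conf on the on-prem forwarder (sketch only; indexA is the index named in the post)
[tcpout]
# Forward only events whose target index is indexA.
forwardedindex.0.whitelist = indexA
# Assumption: empty values clear the default blacklist/whitelist entries.
forwardedindex.1.blacklist =
forwardedindex.2.whitelist =
# Keep index-based filtering enabled.
forwardedindex.filter.disable = false
```

As far as I know, these filters are honored only in the global `[tcpout]` stanza, not in individual target-group stanzas.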


z080236
Explorer

Let's say we monitor a folder that has syslog.log in it.

This log contains three lines. The first and third lines are gibberish and do not conform to the sourcetype; the second line is correct data.

Is the Splunk forwarder intelligent enough to forward only the second line and ignore the first and third lines, using the whitelist method you have described?

 


gcusello
Legend

Hi @z080236,

Yes, I'm pretty sure that Splunk is intelligent enough to delete an irrelevant part of an event.

If you want to delete a part of each event, you have to configure props.conf using the SEDCMD setting.

In other words, you have to find the regex that identifies the part of the event you want to keep and create a SEDCMD rule that deletes the other parts.

You can find how to do this at https://community.splunk.com/t5/Splunk-Search/How-do-I-ignore-part-of-an-event/m-p/345367 and https://community.splunk.com/t5/Getting-Data-In/Need-SEDCMD-Help/td-p/409993 and https://docs.splunk.com/Documentation/Splunk/8.1.3/Admin/Propsconf
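As a rough illustration of the SEDCMD approach, the sketch below strips a trailing fragment from every event of one sourcetype. The sourcetype name `xx` is borrowed from elsewhere in this thread, and the class name and regular expression are hypothetical placeholders. Note that SEDCMD rewrites text inside events at parse time (on a heavy forwarder or indexer); it does not drop whole events.

```ini
# props.conf (heavy forwarder or indexer)
[xx]
# Hypothetical rule: delete everything from " DEBUGDATA=" to the end of the event.
SEDCMD-strip_debugdata = s/ DEBUGDATA=.*$//
```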

Ciao.

Giuseppe


richgalloway
SplunkTrust

If a regular expression can describe the gibberish or non-gibberish lines then a set of transforms can be written to send the gibberish to nullQueue.
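A minimal sketch of such a transform, assuming the gibberish can be recognized by the presence of non-printable characters. The sourcetype `xx` comes from elsewhere in this thread; the stanza name and the regular expression are hypothetical and would need to be adapted to the real data.

```ini
# props.conf (heavy forwarder or indexer; universal forwarders do not parse events)
[xx]
TRANSFORMS-drop_gibberish = drop_gibberish

# transforms.conf
[drop_gibberish]
# Hypothetical test: the event contains a character outside tab and printable ASCII.
REGEX = [^\x09\x20-\x7E]
# Matching events are routed to nullQueue, i.e. discarded before indexing.
DEST_KEY = queue
FORMAT = nullQueue
```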

---
If this reply helps you, Karma would be appreciated.

richgalloway
SplunkTrust

What is your definition of "rubbish data" and how is Splunk to know which is rubbish and which is not?


z080236
Explorer

I consider anything that does not match the sourcetype configured for the monitored file at the heavy forwarder to be "rubbish data".

Is there any way to validate this at the heavy forwarder end?

 


gcusello
Legend

Hi @z080236,

I think you should design your inputs more carefully!

In other words, you should:

  • first, list and analyze the data available from a source and identify what you want to index;
  • then modify your inputs to take only the data you want and nothing else (using whitelists and blacklists); in other words, modify your inputs.conf to avoid *.* and select only the files you want;
  • then associate with each type of data a sourcetype that characterizes it;
  • finally, if there is still unwanted data, you can create a filter on the indexer to delete that data before indexing (https://docs.splunk.com/Documentation/Splunk/8.1.3/Forwarding/Routeandfilterdatad#Filter_event_data_...).
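The input-side part of these steps could look roughly like the sketch below. The path, file-name patterns, sourcetype, and index are hypothetical placeholders. Both `whitelist` and `blacklist` are regular expressions matched against the file path, so they select files by name only and never look at file content.

```ini
# inputs.conf on the forwarder (all names below are placeholders)
[monitor:///var/log/myapp]
# Take only files ending in .log ...
whitelist = \.log$
# ... but skip files whose name starts with "tmp_".
blacklist = /tmp_[^/]*\.log$
# Give the selected data its own sourcetype and index.
sourcetype = myapp_log
index = myapp
```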

Ciao.

Giuseppe

z080236
Explorer

at first, list and analyze the data to take from a source and identify the ones that you want to index

yes, we have already done this. We already used monitor on folder at the Splunk forwarder to define the sourcetype and the index.

However, we can't stop an external person from sending us invalid file types or weird content; Splunk will simply ingest it.

So far, I have searched Splunk answers and documentation, there is no way to ensure the content is "clean".

 

What a whitelist can do is filter by file extension only.

For example, to monitor only files with the .log extension, make the following change:

[monitor:///mnt/logs]
    whitelist = \.log$

 

or by file name, but it can't check the content.

This one is based on a specific regular expression and doesn't seem to fit, as we are looking for a whitelist.

 

Thus, if I am monitoring this folder

[monitor:///var/log/putlogshere]
whitelist = \.log$
sourcetype = xx
index = index1

and I implement the whitelist, the user can still send in a log with weird data that fulfils the whitelist condition, and it will still be forwarded to the indexer. Is that correct?

gcusello
Legend

Hi @z080236,

First of all, why do you have users who can change Splunk forwarder configurations?

Anyway, if you have users who modify inputs by adding stanzas that take in weird logs, you could do three things:

  • first, manage the forwarders (all of them, or at least the ones that users can modify) with a Deployment Server, which checks the forwarders and pushes the correct configurations when they are modified;
  • in addition, you can put a filter on your indexers (you have to identify each log to keep using regexes) that keeps only the logs you want and discards the others;
  • then you could create an alert that fires when configurations are changed and/or you receive weird logs, so you can rap those users' knuckles! 😉
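The "keep only the logs you want" filter from the second point is usually written as two transforms: one that sends everything to nullQueue, followed by one that routes matching events back to indexQueue. In the sketch below, the sourcetype `xx` is taken from the thread, while the stanza names and the syslog-style pattern are hypothetical and only stand in for whatever describes valid events.

```ini
# props.conf (indexer or heavy forwarder)
[xx]
# Order matters: discard everything first, then rescue the events that match.
TRANSFORMS-keep_valid = setnull, keep_valid

# transforms.conf
[setnull]
# Matches every non-empty event and sends it to nullQueue.
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keep_valid]
# Hypothetical allow-list: event starts with a timestamp like "Oct  3 14:07:55 ".
REGEX = ^\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s
DEST_KEY = queue
FORMAT = indexQueue
```

With this layout, any event that does not match the `keep_valid` pattern is dropped, which is the closest thing to a content whitelist that this mechanism offers.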

Ciao.

Giuseppe


z080236
Explorer

First of all, why do you have users who can change Splunk forwarder configurations?

This is not what I want to achieve. What I want is to ensure that the content sent to the forwarder is clean. Based on the solutions above, you proposed a whitelist, and filtering and routing based on regex. Another way I heard of is to check the timestamp of the log and, if it is older than xx days, not ingest it. However, assuming I implement all three checks, is that enough to ensure that the content received from the forwarder is clean?

I can't stop a remote user from one day taking a wrong macro file, renaming its extension to .log, and sending it to me.

In that case, I am not sure whether Splunk will simply ingest the content as-is.

Anyway, if you have users who modify inputs by adding stanzas that take in weird logs, you could do three things:

This is not what I want to achieve either. I just want the forwarder to be able to prevent weird entries from being ingested if, one day, the user's server files get corrupted and such a file then reaches my system and is ingested.


gcusello
Legend

Hi @z080236,

If you want every user to be able to ingest any kind of data, the only advice I can give you is to monitor those inputs.

In other words, when someone needs other logs, keep control of the flow by defining the ingestion rules for the user yourself.

I say this because Splunk can filter data, but it needs one or more rules at data ingestion, and you're saying that there aren't any ingestion rules and that users can create any kind of ingestion.

The only way is to create governance around ingestion:

  • you can plan the ingestions with your users and create the correct inputs together,
  • you can leave the users free to create their inputs and then monitor the flows and create the filters.

As for the check you asked about on ingestion of old data, you can get it by adding this to inputs.conf on the forwarders:

 

ignoreOlderThan = <non-negative integer>[s|m|h|d]

 

as described in https://docs.splunk.com/Documentation/Splunk/Latest/Admin/Inputsconf 

or on indexers, put in props.conf:

 

MAX_DAYS_AGO = <integer>

 

as you can see at https://docs.splunk.com/Documentation/Splunk/Latest/Admin/Propsconf
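Put together with the monitor stanza discussed earlier in this thread, the two settings might be used as sketched below; the 7-day values are arbitrary examples. As far as I know, they behave differently: `ignoreOlderThan` looks at the file's modification time and skips whole files, while `MAX_DAYS_AGO` only limits which extracted timestamps are accepted, and events with older dates are still indexed with a substituted timestamp rather than dropped.

```ini
# inputs.conf on the forwarder
[monitor:///var/log/putlogshere]
whitelist = \.log$
sourcetype = xx
index = index1
# Skip files whose modification time is more than 7 days old (example value).
ignoreOlderThan = 7d

# props.conf on the indexer or heavy forwarder
[xx]
# Treat extracted timestamps more than 7 days in the past as invalid (example value).
MAX_DAYS_AGO = 7
```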

But this way you solve only a small part of the problem: as I said, you have a governance problem!

Ciao.

Giuseppe


z080236
Explorer

I say this because Splunk can filter data, but it needs one or more rules at data ingestion, and you're saying that there aren't any ingestion rules and that users can create any kind of ingestion.

This is not what I am saying.

 

I am saying that I have already planned the sourcetype, index, and file path/file type that I receive from my user.

But there's nothing stopping the user from submitting a file that differs from the sourcetype I am monitoring.

In that scenario, what will likely happen?

1. Splunk ingests the content and forwards it to the indexer, if it is not picked up by the filter.

2. Splunk filters out the content, if it is picked up by the filter using the three methods: timestamp, file extension/file name, and regular expression whitelist.

 

 


richgalloway
SplunkTrust

If Splunk is monitoring a particular filepath for data in a certain format and someone inserts a file with data in a different format into that filepath then Splunk very likely will not recognize the change.  It will do its best to interpret the alien file as though it was normal.  Depending on how alien the file is Splunk may complain about the timestamps being in the wrong format or not in the expected location.  However, Splunk will NOT say "you told me to expect sourcetype 'foo', but this is sourcetype 'bar' so I'm not touching it".
