As seen in Solved: How to establish secure connection between Univers... - Splunk Community
there are ways to secure the connection between the forwarder and the indexer. This stops unauthorized users from forwarding data to the Splunk indexer and from managing the other Splunk components.
More detailed steps on SSL and on token authentication, for stopping unauthorized components from connecting, can be seen there.
However, this does not stop the forwarder from sending rubbish data to the indexer. Is there any way for the forwarder, or some other component, to inspect the data and stop rubbish or strange data from being sent to the indexer?
OK, this is my final question.
We have a requirement to stream the events from a cloud Splunk SIEM to an on-prem SIEM
This is the splunk flow:
Cloud Splunk -> On Prem Splunk Fwd (9997) -> On prem Indexer
Cloud Splunk currently sends indexA, indexB, indexC
On my Splunk Forwarder end, I only want data from indexA.
Based on https://docs.splunk.com/Documentation/Splunk/latest/Data/Monitornetworkports
I configured inputs.conf as follows:
[splunktcp://9997]
disabled = 0
There doesn't seem to be a whitelist for the index: whatever I receive, I will forward to the indexer.
I looked through the documentation; it doesn't seem possible to filter by index, only to filter events by regex in outputs.conf.
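For what it's worth, on a heavy (full) forwarder there is a community-known technique for filtering on the index metadata field using index-time transforms; the stanza names and regex below are illustrative only. An important caveat: data arriving over splunktcp from another Splunk instance is already parsed ("cooked"), and by default a heavy forwarder does not run it through the parsing pipeline again, so this may not apply to the Cloud Splunk -> forwarder scenario described here.

```ini
# props.conf (hypothetical sketch)
[default]
TRANSFORMS-dropidx = drop_unwanted_indexes

# transforms.conf (hypothetical sketch)
[drop_unwanted_indexes]
# Match on the index metadata instead of the raw event text
SOURCE_KEY = _MetaData:Index
# Discard events destined for indexB or indexC; keep indexA
REGEX = ^(indexB|indexC)$
DEST_KEY = queue
FORMAT = nullQueue
```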
Let's say we monitor a folder which has syslog.log in it.
In this log there are 3 lines:
the first and third lines contain gibberish data and do not conform to the source type;
the second line is correct data.
Is the Splunk forwarder intelligent enough to just forward the second line and ignore the first and third line, using the whitelist method you have described?
Hi @z080236,
Yes, I'm pretty sure that Splunk is intelligent enough to delete the non-relevant part of an event.
If you want to delete a part of each event, you have to intervene in props.conf using the SEDCMD command.
In other words, you have to find the regex that identifies the event part that you want to keep and create a SEDCMD command to delete the other parts.
You can find how to do this at https://community.splunk.com/t5/Splunk-Search/How-do-I-ignore-part-of-an-event/m-p/345367 and https://community.splunk.com/t5/Getting-Data-In/Need-SEDCMD-Help/td-p/409993 and https://docs.splunk.com/Documentation/Splunk/8.1.3/Admin/Propsconf
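As a rough sketch of what a SEDCMD looks like (the sourcetype name and the pattern here are placeholders, not taken from this thread): SEDCMD applies a sed-style substitution to the raw event text at parse time.

```ini
# props.conf (illustrative example only)
[my_sourcetype]
# Replace every occurrence of PATTERN_TO_REMOVE with nothing,
# i.e. strip it from _raw before indexing
SEDCMD-strip_gibberish = s/PATTERN_TO_REMOVE//g
```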
Ciao.
Giuseppe
If a regular expression can describe the gibberish (or the non-gibberish) lines, then a set of transforms can be written to send the gibberish to the nullQueue.
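The usual pattern for "keep only events matching a regex" is a pair of transforms: the first sends everything to the nullQueue, and the second routes matching events back to the indexQueue. The sourcetype name and the timestamp regex below are placeholders; substitute a pattern describing your own well-formed lines.

```ini
# props.conf (sketch; transforms run in the order listed)
[my_sourcetype]
TRANSFORMS-filter = drop_all, keep_good

# transforms.conf (sketch)
[drop_all]
# Match every event and discard it
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[keep_good]
# Events matching the expected format are routed back for indexing;
# this example assumes lines start with an ISO-8601 timestamp
REGEX = ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
DEST_KEY = queue
FORMAT = indexQueue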
What is your definition of "rubbish data" and how is Splunk to know which is rubbish and which is not?
I will consider anything that does not conform to the sourcetype of the monitored file, which is configured at the heavy forwarder, to be "rubbish data".
Is there any way to validate at the heavy forwarder end?
Hi @z080236,
I think that you should design your inputs more carefully!
In other words, you should:
Ciao.
Giuseppe
at first, list and analyze the data to be taken from a source and identify the data that you want to index
Yes, we have already done this. We already use a monitor on a folder at the Splunk forwarder to define the sourcetype and the index.
However, we can't stop an external person from sending us invalid file types and weird content; it will just get ingested.
So far, I have searched Splunk Answers and the documentation, and there is no way to ensure the content is "clean".
What the whitelist can do is match the file extension only.
For example, to monitor only files with the .log extension, make the following change:
[monitor:///mnt/logs]
whitelist = \.log$
or match the file name, but it can't check the content.
This one is based on a specific regex expression; it doesn't seem to fit, as we are looking for a whitelist.
Thus, if I am monitoring this folder
[monitor:///var/log/putlogshere]
whitelist = \.log$
sourcetype = xx
index = index1
and I implement the whitelist, the user can still send in a log with weird data which fulfils the whitelist condition, and it will still be forwarded to the indexer. Is that correct?
Hi @z080236,
At first, why do you have users that can change Splunk forwarder configurations?
Anyway, if you have users that modify inputs by adding stanzas that take in weird logs, you could do three things:
Ciao.
Giuseppe
At first, why do you have users that can change Splunk forwarder configurations?
This is not what I want to achieve. What I want to achieve is to ensure that the content sent to the forwarder is clean. In the solutions above, you proposed a whitelist, and filtering and routing based on regex. Another way I have heard of is to check the timestamp of the log: if it is older than xx days, it won't be ingested. However, assuming I implement all 3 checks, is that enough to ensure that the content received by the forwarder is clean?
I can't stop a remote user who one day puts in a wrong macro file, renames its extension to .log, and sends it to me.
Then, I can't be sure whether Splunk will just ingest content like this or not.
Anyway, if you have users that modify inputs by adding stanzas that take in weird logs, you could do three things:
This is not what I want to achieve. I just want the forwarder to be able to prevent those weird entries from being ingested if, one day, the user's server files get corrupted, the corrupted file reaches my system, and it gets ingested.
Hi @z080236,
if you want each user to be able to ingest any kind of data, the only hint I can give you is to monitor those inputs.
In other words, when someone needs other logs, maintain control of the flow by giving the rules for ingestion to the user.
I say this because Splunk can filter data, but it needs one or more rules for data ingestion, and you're saying that there isn't any rule for ingestion and users can create every kind of ingestion.
The only way is to create a governance of ingestion:
About the check you asked about for ingestion of old data, you can get this by inserting into inputs.conf on the forwarders:
ignoreOlderThan = <non-negative integer>[s|m|h|d]
as described in https://docs.splunk.com/Documentation/Splunk/Latest/Admin/Inputsconf
or on indexers, put in props.conf:
MAX_DAYS_AGO = <integer>
as you can see at https://docs.splunk.com/Documentation/Splunk/Latest/Admin/Propsconf
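Putting the two settings above into the monitor stanza discussed earlier in this thread, a combined sketch might look like this (the 7-day values are illustrative, not from the thread):

```ini
# inputs.conf on the forwarder (sketch)
[monitor:///var/log/putlogshere]
whitelist = \.log$
sourcetype = xx
index = index1
# Skip files whose modification time is older than 7 days
ignoreOlderThan = 7d

# props.conf on the indexers (sketch)
[xx]
# Reject events with timestamps more than 7 days in the past
MAX_DAYS_AGO = 7
```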
But in this way you solve only a small part of the problem: as I said, you have a governance problem!
Ciao.
Giuseppe
I say this because Splunk can filter data, but it needs one or more rules for data ingestion, and you're saying that there isn't any rule for ingestion and users can create every kind of ingestion.
This is not what I am saying.
I am saying that I have already planned the sourcetype, index, and filepath/filetype that I am receiving from my user.
But there's no stopping the user from putting in a file that is different from the sourcetype that I am monitoring.
In the case of the above scenario, what will likely happen?
1. Splunk ingests the content and forwards it to the indexer, if it is not picked up by the filters.
2. Splunk filters out the content, if it is picked up by a filter using the 3 methods: timestamp, file extension/file name, and regular-expression whitelist.
If Splunk is monitoring a particular filepath for data in a certain format, and someone inserts a file with data in a different format into that filepath, then Splunk very likely will not recognize the change. It will do its best to interpret the alien file as though it were normal. Depending on how alien the file is, Splunk may complain about the timestamps being in the wrong format or not in the expected location. However, Splunk will NOT say "you told me to expect sourcetype 'foo', but this is sourcetype 'bar', so I'm not touching it".