Hi all,
I receive syslog from a Forcepoint web proxy and the data volume is very large. I analyzed the data and found that some URLs are repeated many times in the logs, and I need to exclude this data from indexing.
Below is a sample of this data:
Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint product=Security product_version=8.5.4 action=blocked severity=7 category=9 user=LDAP://xx.xx.xx.xx OU\=users,OU\=xx_xx,OU\=xxxx,DC\=domain,DC\=xxxxxx,DC\=com,DC\=jo/XXXX XXXXX loginID=x.xxxx src_host=xx.xx.xx.xx src_port=55231 dst_host=otelrules.azureedge.net dst_ip=13.107.227.65 dst_port=443 bytes_out=0 bytes_in=0 http_response=0 http_method=GET http_content_type=- http_user_agent=Microsoft_Office/16.0_(Windows_NT_10.0;_Microsoft_Word_16.0.16327;_Pro) http_proxy_status_code=302 reason=- disposition=1025 policy=Super_Administrator**Default role=8 duration=4 url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml logRecordSource=OnPrem
Hi gcusello,
thank you for the reply. I mean that this data is not needed, it is huge (almost 15 GB), and it consumes the license.
I also contacted the IT admin, and he cannot remove this data at the source.
Hi @alaa_ahmad,
as I said, it's possible to filter data before indexing to reduce license consumption, but then you cannot use the discarded events (or the discarded parts of them).
If your events contain a redundant part that can be discarded, you have to find a regex that identifies either the relevant part to keep or the irrelevant part to remove.
If you cannot, the only way is a larger license.
As I said, if you want to remove entire events, you can follow the procedure described at https://docs.splunk.com/Documentation/SplunkCloud/latest/Forwarding/Routeandfilterdatad#Filter_event... keeping only the relevant data and discarding the rest, or discarding a part of the events.
If instead you want to reduce the size of the events, you can follow the anonymization procedure I described in my previous message.
In all these solutions, you have to define one or more regexes that identify the part of the events to keep or to discard.
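Before putting a regex into a filter, it can help to test it offline against a few sample events. The following is only a minimal sketch in Python, assuming the unwanted events are the ones whose dst_host is otelrules.azureedge.net (taken from the sample posted in this thread). The function name should_discard and the second sample event are hypothetical, and the regex flavor used by Splunk may differ in details from Python's, so re-check the pattern there.

```python
import re

# Assumption: events to discard are identified by this destination host,
# taken from the sample event in the thread. Adjust to your own data.
DISCARD_RE = re.compile(r"\bdst_host=otelrules\.azureedge\.net\b")


def should_discard(event: str) -> bool:
    """Return True if the event matches the discard pattern."""
    return DISCARD_RE.search(event) is not None


sample_events = [
    # Shortened version of the event posted in the thread.
    "Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint action=blocked "
    "dst_host=otelrules.azureedge.net dst_port=443 "
    "url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml",
    # Hypothetical event that should be kept.
    "Jun 3 23:59:59 xx.xx.xx.xx vendor=Forcepoint action=permitted "
    "dst_host=example.com dst_port=443 url=https://example.com/",
]

kept = [e for e in sample_events if not should_discard(e)]
print(f"kept {len(kept)} of {len(sample_events)} events")
```

Running this over a larger export of real events gives a rough idea of how much volume the filter would remove before you change anything on the Splunk side.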
Ciao.
Giuseppe
Hi @alaa_ahmad,
if you have duplicated events, you have to analyze your data flow to understand why this happens.
If you want to remove a part of all events, you can follow two approaches:
then you can intervene by following the same procedure used to anonymize data, which you can find at https://docs.splunk.com/Documentation/Splunk/9.0.4/Data/Anonymizedata
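To illustrate the idea of removing only a part of each event, here is a minimal Python sketch of a sed-style substitution. It assumes, purely as an example, that the http_user_agent value is the part you consider not worth indexing. The function name trim_event is hypothetical, and the event is a shortened version of the sample from this thread.

```python
import re

# Assumption: the http_user_agent value is the redundant part of the event.
# \S+ matches the value up to the next whitespace character.
USER_AGENT_RE = re.compile(r"http_user_agent=\S+")


def trim_event(event: str) -> str:
    """Replace the user agent value with '-' and leave the rest untouched."""
    return USER_AGENT_RE.sub("http_user_agent=-", event)


event = (
    "Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint action=blocked "
    "dst_host=otelrules.azureedge.net http_content_type=- "
    "http_user_agent=Microsoft_Office/16.0_(Windows_NT_10.0;"
    "_Microsoft_Word_16.0.16327;_Pro) http_proxy_status_code=302 "
    "url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml"
)

trimmed = trim_event(event)
print(trimmed)
print(f"{len(event)} -> {len(trimmed)} characters")
```

The same kind of pattern and replacement can then be tried in the anonymization procedure linked above. Keep in mind that whatever you strip this way is not available for searches later.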
Ciao.
Giuseppe