Getting Data In

How to remove unneeded data from imported logs on Splunk?

alaa_ahmad
Loves-to-Learn Everything

Hi all,

I receive syslog from a Forcepoint web proxy and the data volume is very large. I analyzed the data and found that some URLs are duplicated many times in the same logs, and I need to exclude this data from indexing.

Below is a sample of this data:

Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint product=Security product_version=8.5.4 action=blocked severity=7 category=9 user=LDAP://xx.xx.xx.xx OU\=users,OU\=xx_xx,OU\=xxxx,DC\=domain,DC\=xxxxxx,DC\=com,DC\=jo/XXXX  XXXXX loginID=x.xxxx src_host=xx.xx.xx.xx src_port=55231 dst_host=otelrules.azureedge.net dst_ip=13.107.227.65 dst_port=443 bytes_out=0 bytes_in=0 http_response=0 http_method=GET http_content_type=- http_user_agent=Microsoft_Office/16.0_(Windows_NT_10.0;_Microsoft_Word_16.0.16327;_Pro) http_proxy_status_code=302 reason=- disposition=1025 policy=Super_Administrator**Default role=8 duration=4 url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml logRecordSource=OnPrem


alaa_ahmad
Loves-to-Learn Everything

Hi gcusello

Thank you for the reply. What I mean is that this data is not needed, it is huge (almost 15 GB), and it consumes the license.

I also contacted the IT admin, and he cannot remove this data at the source.

gcusello
SplunkTrust

Hi @alaa_ahmad,

as I said, it's possible to filter data before indexing to reduce license consumption, but then you can no longer use the discarded events (or the discarded parts of them).

If your events contain a redundant part that can be discarded, you have to find a regex that identifies either the relevant part to keep or the irrelevant part to remove.

If you cannot, the only option is a larger license.

As I said, if you want to remove entire events, you can follow the procedure described at https://docs.splunk.com/Documentation/SplunkCloud/latest/Forwarding/Routeandfilterdatad#Filter_event... either keeping only the relevant events and discarding the rest, or discarding only a specific subset of events.

If instead you want to reduce the size of the events, you can follow the anonymization procedure I described in my earlier message.

In all of these solutions, you have to define one or more regexes that identify the part of the events to keep or to discard.
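Before putting a filter regex into the Splunk configuration, it can help to check offline which events it would discard and roughly how much volume that saves. The following is only a minimal Python sketch of that check, not Splunk configuration. The rule itself (drop every event whose dst_host is otelrules.azureedge.net, taken from the sample event) is a hypothetical example, and the names DROP, split_events and saved_fraction are made up for illustration.

```python
import re

# Hypothetical rule: discard events whose destination host is the host seen
# in the sample event. Replace it with whatever is truly unneeded for you.
# The lookahead makes sure the host name ends there (space or end of event).
DROP = re.compile(r"\bdst_host=otelrules\.azureedge\.net(?=\s|$)")


def split_events(events):
    """Return (kept, dropped) lists according to the DROP regex."""
    kept, dropped = [], []
    for event in events:
        (dropped if DROP.search(event) else kept).append(event)
    return kept, dropped


def saved_fraction(events):
    """Fraction of raw bytes that the DROP rule would remove."""
    _, dropped = split_events(events)
    total = sum(len(e.encode("utf-8")) for e in events)
    if total == 0:
        return 0.0
    return sum(len(e.encode("utf-8")) for e in dropped) / total
```

Running saved_fraction over an exported sample of the proxy log gives a rough idea of whether the filter is worth the loss of those events. Regex dialects differ slightly between Python and Splunk, so the pattern should be tested again in a non-production Splunk instance.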

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @alaa_ahmad,

if you have duplicated events, you have to analyze your data flow to understand why this happens.

If you want to remove a part of every event, you can follow two approaches:

  • truncate all the characters that exceed a fixed length (I don't recommend this!),
  • analyze your logs to find a regex (if possible) that matches either the relevant part of your logs or the part to remove.

Then you can apply the change using the same procedure used to anonymize data, which you can find at https://docs.splunk.com/Documentation/Splunk/9.0.4/Data/Anonymizedata
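To illustrate the regex approach, here is a minimal Python sketch that prototypes a pattern on a shortened copy of the sample event. Which part is redundant is purely an assumption here (the LDAP path in user= and the http_user_agent field). The names SAMPLE, REDUNDANT and strip_redundant are hypothetical, and the code only tests the regex; it is not Splunk configuration.

```python
import re

# Shortened, masked event modelled on the sample in the question.
SAMPLE = (
    "Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint product=Security "
    "action=blocked user=LDAP://xx.xx.xx.xx OU\\=users,DC\\=com/XXXX  XXXXX "
    "loginID=x.xxxx dst_host=otelrules.azureedge.net "
    "http_user_agent=Microsoft_Office/16.0_(Windows_NT_10.0) "
    "http_proxy_status_code=302 "
    "url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml "
    "logRecordSource=OnPrem"
)

# Assumption: the LDAP path and the user agent are the parts nobody searches.
# The user= value contains spaces, so it is matched lazily up to "loginID=";
# the user agent value contains no spaces, so \S+ is enough.
REDUNDANT = re.compile(r"user=LDAP://.*?(?=loginID=)|http_user_agent=\S+ ?")


def strip_redundant(event: str) -> str:
    """Return the event with the assumed-redundant spans removed."""
    return REDUNDANT.sub("", event)


if __name__ == "__main__":
    cleaned = strip_redundant(SAMPLE)
    print(cleaned)
    print(f"{len(SAMPLE)} -> {len(cleaned)} characters")
```

Once a pattern behaves as expected on real samples, it can be carried over into the sed-style substitution that the anonymization page describes. The regex should be tested again there, because Python's regex dialect is not identical to the one Splunk uses.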

Ciao.

Giuseppe 
