Getting Data In

How to remove unneeded data from imported logs on Splunk?

alaa_ahmad
Loves-to-Learn Everything

Hi all,

I have syslog data coming from a Forcepoint web proxy, and the volume is very large. I analyzed the data and found that some URLs are repeated many times in the same logs, and I need to exclude this data from indexing.

Below is a sample of this data:

Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint product=Security product_version=8.5.4 action=blocked severity=7 category=9 user=LDAP://xx.xx.xx.xx OU\=users,OU\=xx_xx,OU\=xxxx,DC\=domain,DC\=xxxxxx,DC\=com,DC\=jo/XXXX  XXXXX loginID=x.xxxx src_host=xx.xx.xx.xx src_port=55231 dst_host=otelrules.azureedge.net dst_ip=13.107.227.65 dst_port=443 bytes_out=0 bytes_in=0 http_response=0 http_method=GET http_content_type=- http_user_agent=Microsoft_Office/16.0_(Windows_NT_10.0;_Microsoft_Word_16.0.16327;_Pro) http_proxy_status_code=302 reason=- disposition=1025 policy=Super_Administrator**Default role=8 duration=4 url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml logRecordSource=OnPrem


alaa_ahmad
Loves-to-Learn Everything

Hi gcusello,

Thank you for the reply. I mean that this data is not needed, it is huge (almost 15 GB), and it consumes the license.

I also contacted the IT admin, and he cannot remove this data at the source.

 

 


gcusello
SplunkTrust

Hi @alaa_ahmad,

As I said, it's possible to filter data before indexing to reduce license consumption, but you will then be unable to use the discarded events (or the discarded parts of them).

If your events contain a redundant part that can be discarded, you have to find a regex that identifies either the relevant part to keep or the irrelevant part to remove.

If you cannot, the only option is a larger license.

As I said, if you want to remove entire events, you can follow the procedure described at https://docs.splunk.com/Documentation/SplunkCloud/latest/Forwarding/Routeandfilterdatad#Filter_event... keeping only the relevant events and discarding the others, or discarding only some of the events.

If instead you want to shorten the events, you can follow the anonymization procedure I described in my previous message.

In all these solutions, you have to find one or more regexes that identify the part of the events to keep or to discard.
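Before putting a regex into Splunk, it can help to prototype it offline. The following is a minimal Python sketch of the "discard entire events" case, not a Splunk configuration. The list of noisy hosts, the function name keep_event, and the second test event are assumptions made up for illustration; the first test event is a shortened copy of the sample posted in this thread. In Splunk itself, a regex like this would go in a transforms.conf stanza that sends matching events to nullQueue (DEST_KEY = queue, FORMAT = nullQueue), referenced from props.conf with a TRANSFORMS-<class> setting. Splunk uses PCRE, so only syntax common to both engines is used here.

```python
import re

# Assumption: events going to these hosts are considered noise.
# Replace this list with the hosts found in your own analysis.
NOISY_HOSTS = ["otelrules.azureedge.net"]

# Match dst_host=<noisy host> followed by whitespace or end of line.
DROP_RE = re.compile(
    r"\bdst_host=(?:" + "|".join(re.escape(h) for h in NOISY_HOSTS) + r")(?=\s|$)"
)


def keep_event(raw: str) -> bool:
    """Return True if the event should be indexed, False if it would be dropped."""
    return DROP_RE.search(raw) is None


SAMPLE_EVENTS = [
    # Shortened copy of the sample event from this thread.
    "Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint action=blocked "
    "dst_host=otelrules.azureedge.net dst_ip=13.107.227.65 dst_port=443",
    # Hypothetical event, invented only to show a non-matching case.
    "Jun 3 23:59:59 xx.xx.xx.xx vendor=Forcepoint action=blocked "
    "dst_host=example.com dst_ip=xx.xx.xx.xx dst_port=443",
]

kept = [event for event in SAMPLE_EVENTS if keep_event(event)]
print(f"kept {len(kept)} of {len(SAMPLE_EVENTS)} events")
```

Running this against a larger export of real events gives a rough idea of how much volume the filter would remove before you change anything on the indexer or heavy forwarder.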

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @alaa_ahmad,

If you have duplicated events, you have to analyze your data flow to understand why this happens.

If you want to remove a part of every event, you can follow two approaches:

  • truncate all the characters that exceed a fixed length (I don't recommend this!),
  • analyze your logs to find a regex (if possible) that matches the relevant part of your logs or the part to remove.

Then you can apply the change by following the same procedure used to anonymize data, which you can find at https://docs.splunk.com/Documentation/Splunk/9.0.4/Data/Anonymizedata
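The regex approach can be tried offline first. The sketch below uses Python's re.sub to mimic a sed-style substitution with an empty replacement, which is the same idea as the SEDCMD-<class> = s/<regex>/<replacement>/g setting in props.conf covered by the anonymization documentation. The choice of http_user_agent and url as the fields to drop is only an assumption for illustration (the url value largely repeats dst_host in the posted sample); the names UNNEEDED_FIELDS and trim_event are hypothetical, and the test string is an excerpt of the sample event from this thread.

```python
import re

# Assumption: these key=value fields are not needed at search time.
UNNEEDED_FIELDS = ["http_user_agent", "url"]

# Match one whitespace character, the field name, "=", and the value
# up to the next whitespace. Values containing spaces are not handled.
STRIP_RE = re.compile(
    r"\s(?:" + "|".join(re.escape(f) for f in UNNEEDED_FIELDS) + r")=\S+"
)


def trim_event(raw: str) -> str:
    """Return the event with the unneeded key=value pairs removed."""
    return STRIP_RE.sub("", raw)


# Excerpt of the sample event posted in this thread.
SAMPLE = (
    "dst_host=otelrules.azureedge.net dst_ip=13.107.227.65 dst_port=443 "
    "http_method=GET http_content_type=- "
    "http_user_agent=Microsoft_Office/16.0_(Windows_NT_10.0;_Microsoft_Word_16.0.16327;_Pro) "
    "http_proxy_status_code=302 duration=4 "
    "url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml "
    "logRecordSource=OnPrem"
)

trimmed = trim_event(SAMPLE)
print(trimmed)
print(f"{len(SAMPLE)} -> {len(trimmed)} characters")
```

The pattern only removes fields that are preceded by whitespace, so a field at the very start of the event is left alone. Once the pattern behaves as expected on real samples, the same regex can be carried over to the sed expression.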

Ciao.

Giuseppe 
