Getting Data In

How to remove unneeded data from imported logs on Splunk?

alaa_ahmad
Loves-to-Learn Everything

Hi all,

I have syslog data coming from a Forcepoint web proxy, and the volume is very large. I analyzed the data and found that some URLs are repeated many times in the logs, and I need to exclude this data from indexing.

Below is a sample of this data:

Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint product=Security product_version=8.5.4 action=blocked severity=7 category=9 user=LDAP://xx.xx.xx.xx OU\=users,OU\=xx_xx,OU\=xxxx,DC\=domain,DC\=xxxxxx,DC\=com,DC\=jo/XXXX  XXXXX loginID=x.xxxx src_host=xx.xx.xx.xx src_port=55231 dst_host=otelrules.azureedge.net dst_ip=13.107.227.65 dst_port=443 bytes_out=0 bytes_in=0 http_response=0 http_method=GET http_content_type=- http_user_agent=Microsoft_Office/16.0_(Windows_NT_10.0;_Microsoft_Word_16.0.16327;_Pro) http_proxy_status_code=302 reason=- disposition=1025 policy=Super_Administrator**Default role=8 duration=4 url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml logRecordSource=OnPrem


alaa_ahmad
Loves-to-Learn Everything

Hi gcusello,

Thank you for the reply. I mean that this data is not needed, and it is huge (almost 15 GB), so it consumes the license.

I also contacted the IT admin, and he cannot remove this data at the source.

 

 


gcusello
SplunkTrust

Hi @alaa_ahmad,

As I said, it is possible to filter data before indexing to reduce license consumption, but the discarded events (or parts of events) will then not be available for searching.

If your events contain a redundant part that can be discarded, you have to find a regex that identifies either the relevant part to keep or the irrelevant part to remove.

If you cannot, the only option is a larger license.

As I said, if you want to remove entire events, you can follow the procedure described at https://docs.splunk.com/Documentation/SplunkCloud/latest/Forwarding/Routeandfilterdatad#Filter_event... either keeping only the relevant events and discarding the rest, or discarding only a specific subset of events.
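
As a rough sketch of that filtering procedure, the stanzas below route matching events to the nullQueue so that they are never indexed. The sourcetype name forcepoint:syslog, the transform name forcepoint_drop_otelrules, and the choice of dst_host=otelrules.azureedge.net as the unwanted traffic are assumptions taken from the sample event, so adjust them to your data. These settings belong on the instance that parses the data (a heavy forwarder or the indexers), not on a universal forwarder.

```ini
# props.conf
# "forcepoint:syslog" is an assumed sourcetype name; use the one your input assigns.
[forcepoint:syslog]
TRANSFORMS-drop_noise = forcepoint_drop_otelrules

# transforms.conf
# Events whose raw text matches REGEX are sent to the nullQueue (discarded).
# The host below is only an example taken from the sample event.
[forcepoint_drop_otelrules]
REGEX = dst_host=otelrules\.azureedge\.net
DEST_KEY = queue
FORMAT = nullQueue
```

A restart of the parsing instance is typically needed for the change to take effect, and it only applies to events that arrive afterwards.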

If instead you want to shrink the events, you can follow the anonymization procedure I described in my previous message.

In all of these solutions, you have to define one or more regexes that identify the part of the events to keep or to discard.

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @alaa_ahmad,

If you have duplicated events, you have to analyze your data flow to understand why this happens.

If you want to remove a part of every event, you can follow two approaches:

  • truncate every event at a fixed length (I don't recommend this!),
  • analyze your logs to find, with a regex if possible, either the relevant part of each event or the part to remove.

Then you can apply the same procedure used to anonymize data, which you can find at https://docs.splunk.com/Documentation/Splunk/9.0.4/Data/Anonymizedata
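
For illustration only, the regex approach could look like the following SEDCMD setting, which rewrites each event before it is indexed. The sourcetype name forcepoint:syslog and the choice of http_user_agent as the field to strip are assumptions based on the sample event, and the pattern relies on the value containing no spaces, as in that sample.

```ini
# props.conf (on the heavy forwarder or indexers that parse the data)
# "forcepoint:syslog" is an assumed sourcetype name.
[forcepoint:syslog]
# sed-style substitution applied to the raw event at index time:
# removes " http_user_agent=<value>" where <value> has no whitespace.
SEDCMD-strip_user_agent = s/ http_user_agent=\S+//g
```

Several SEDCMD-<name> lines with different names can be added to the same stanza if more than one part of the event has to go.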

Ciao.

Giuseppe 
