How to remove unneeded data from imported logs on Splunk?

alaa_ahmad
Loves-to-Learn Everything

Hi all,

I receive syslog from a Forcepoint web proxy and the data volume is very large. I analyzed the data and found that some URLs are repeated many times in the logs, and I need to exclude this data from indexing.

Below is a sample of the data:

Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint product=Security product_version=8.5.4 action=blocked severity=7 category=9 user=LDAP://xx.xx.xx.xx OU\=users,OU\=xx_xx,OU\=xxxx,DC\=domain,DC\=xxxxxx,DC\=com,DC\=jo/XXXX  XXXXX loginID=x.xxxx src_host=xx.xx.xx.xx src_port=55231 dst_host=otelrules.azureedge.net dst_ip=13.107.227.65 dst_port=443 bytes_out=0 bytes_in=0 http_response=0 http_method=GET http_content_type=- http_user_agent=Microsoft_Office/16.0_(Windows_NT_10.0;_Microsoft_Word_16.0.16327;_Pro) http_proxy_status_code=302 reason=- disposition=1025 policy=Super_Administrator**Default role=8 duration=4 url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml logRecordSource=OnPrem


alaa_ahmad
Loves-to-Learn Everything

Hi gcusello,

Thank you for the reply. I mean that this data is not needed, it is huge (almost 15 GB), and it consumes the license.

I also contacted the IT admin, and he cannot remove this data at the source.

 

 


gcusello
SplunkTrust

Hi @alaa_ahmad,

As I said, it's possible to filter data before indexing to reduce license consumption, but then the discarded events (or the discarded parts of them) are no longer available for searching.

If your events contain a redundant part that can be discarded, you have to find a regex that identifies either the relevant part to keep or the irrelevant part to remove.

If you cannot, the only way is a larger license.

As I said, if you want to remove entire events, keeping only the relevant data and discarding the rest (or discarding only a subset of events), you can follow the procedure described at https://docs.splunk.com/Documentation/SplunkCloud/latest/Forwarding/Routeandfilterdatad#Filter_event...

If instead you want to reduce the size of the events, you can follow the anonymization procedure I described in my other message.

In all these solutions, you have to define one or more regexes that identify the part of the events to keep or to discard.
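
Before putting a regex into production, it can help to prototype it offline against a few sample events. The following is only a minimal sketch in Python: the pattern (dropping events for dst_host=otelrules.azureedge.net) is an assumption taken from the sample event in the question, and should_discard is a hypothetical helper, not part of Splunk. Python's re syntax is close to, but not identical to, the PCRE used by Splunk, so the final regex should still be verified on the Splunk side.

```python
import re

# Assumed pattern: discard events whose destination host is the noisy one
# from the sample event. Adjust it to the hosts/URLs you really want to drop.
DISCARD_RE = re.compile(r"\bdst_host=otelrules\.azureedge\.net\b")


def should_discard(event: str) -> bool:
    """Hypothetical helper: True if the event matches the discard pattern."""
    return DISCARD_RE.search(event) is not None


# Shortened, made-up variants of the sample event, for illustration only.
samples = [
    "Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint action=blocked "
    "dst_host=otelrules.azureedge.net dst_port=443",
    "Jun 3 23:59:59 xx.xx.xx.xx vendor=Forcepoint action=permitted "
    "dst_host=example.com dst_port=443",
]

# Events that would still be indexed after filtering.
kept = [e for e in samples if not should_discard(e)]
print(len(kept))
```

Once the pattern behaves as expected, the same regex would go into the REGEX setting of a transforms.conf stanza that sends matching events to nullQueue, as described in the linked documentation.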

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @alaa_ahmad,

If you have duplicated events, you have to analyze your data flow to understand why this happens.

If you want to remove a part of all events, you can follow two approaches:

  • truncate all the characters that exceed a fixed length (I don't recommend this!),
  • analyze your logs to find, with a regex if possible, the relevant part of your logs or the part to remove.

Then you can apply the same procedure used to anonymize data, which you can find at https://docs.splunk.com/Documentation/Splunk/9.0.4/Data/Anonymizedata
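
The anonymization procedure relies on sed-style substitutions (the SEDCMD setting in props.conf), so the regex that removes part of an event can be tried out offline first. The following is a rough sketch in Python, not a definitive implementation: the list of fields to strip is purely an assumption for illustration, strip_fields is a hypothetical helper, and the event is a shortened variant of the sample in the question.

```python
import re

# Assumed fields to drop; which ones are really redundant depends on
# how the data is used in your searches.
FIELDS_TO_STRIP = ("http_user_agent", "http_content_type", "logRecordSource")

# Matches " key=value" for the listed keys, where the value has no whitespace.
STRIP_RE = re.compile(
    r"\s(?:%s)=\S*" % "|".join(map(re.escape, FIELDS_TO_STRIP))
)


def strip_fields(event: str) -> str:
    """Hypothetical helper: remove the listed key=value pairs from an event."""
    return STRIP_RE.sub("", event)


event = (
    "Jun 3 23:59:58 xx.xx.xx.xx vendor=Forcepoint action=blocked "
    "http_content_type=- "
    "http_user_agent=Microsoft_Office/16.0_(Windows_NT_10.0;_Pro) "
    "http_proxy_status_code=302 "
    "url=https://otelrules.azureedge.net/rules/rule12019v1s19.xml "
    "logRecordSource=OnPrem"
)

shortened = strip_fields(event)
print(shortened)
print(len(event), "->", len(shortened))
```

In Splunk, the equivalent would be a SEDCMD-<class> = s/<regex>/<replacement>/g line under the relevant sourcetype stanza in props.conf, with an empty replacement. Fields whose values contain spaces (such as the user= field in the sample) need a more careful pattern than the one sketched here.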

Ciao.

Giuseppe 
