Splunk Search

Newbie and blacklisting

s05tsom
New Member

I am getting killed on licensing by the amount of useless data from my IronPort WSA. At this point Splunk is used by HR for individual IP reporting. What is the best place to block the data I don't need? Everything inside the < > is useless to them except the C_A110. Any help would be greatly appreciated.

1289937535.401 32 10.135.73.188 TCP_MISS/304 229 GET http://photos-b.ak.fbcdn.net/photos-ak-snc1/v27562/209/148475945166653/app_2_148475945166653_1896.gi... - DIRECT/photos-b.ak.fbcdn.net image/gif ALLOW_CUSTOMCAT_11-Aurora_Base_Policy-DefaultGroup-NONE-NONE-NONE-DefaultGroup <C_All0,-,"-","-",-,-,-,"-","-",-,-,-,"-","-",-,"-","-",-,-,-,-,"-","-","-","-","-","-",57.25,0,-,"-","-"> -_h::1 _s::1 _st::1 _indextime::1290789832 timestartpos::0 timeendpos::14 _subsecond::.401 date_second::55 date_hour::19 date_minute::58 date_year::2010 date_month::november date_mday::16 date_wday::tuesday date_zone::0 punct::.__..._/___://-.../--////._-_/-..._/_------_<,-,\"- 

Genti
Splunk Employee

This can be done, but rewriting events at index time can be expensive, so you should keep an eye on indexing performance. How much data per day are you collecting from this source? You will have to weigh whether the reduction in license volume is worth the drop in indexing performance.

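Using props.conf and transforms.conf, the same way you set up field extractions, you can tell Splunk to change what it stores as the raw event (the _raw field). These are index-time settings, so they belong on the instance that parses the data (the indexer or a heavy forwarder), not on a universal forwarder.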

Something like this should do what you want:

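  1. In props.conf:

    [source::]
    # Apply the "crop" transform (defined in transforms.conf) at index time
    TRANSFORMS-crop = crop

  2. In transforms.conf:

    [crop]
    # Overwrite the raw event with the first capture group of REGEX
    REGEX =
    DEST_KEY = _raw
    FORMAT = $1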

In your case, I believe the regex should be something like:

^(.*?\s<\w+)

This will capture the following:

1289937535.401 32 10.135.73.188 TCP_MISS/304 229 GET http://photos-b.ak.fbcdn.net/photos-ak-snc1/v27562/209/148475945166653/app_2_148475945166653_1896.gi... - DIRECT/photos-b.ak.fbcdn.net image/gif ALLOW_CUSTOMCAT_11-Aurora_Base_Policy-DefaultGroup-NONE-NONE-NONE-DefaultGroup <C_All0

And it should throw away the rest:

,-,"-","-",-,-,-,"-","-",-,-,-,"-","-",-,"-","-",-,-,-,-,"-","-","-","-","-","-",57.25,0,-,"-","-">
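If you want to check the regex offline before trying it on a dev indexer, here is a minimal Python sketch that emulates what DEST_KEY = _raw with FORMAT = $1 does: when the regex matches, the event is replaced by capture group 1; otherwise it is left unchanged. It assumes the anchored pattern ^(.*?\s<\w+) (Python's re and Splunk's PCRE treat this simple pattern the same way) and uses the sample event from the question with the encoding debris and trailing field listing removed. The names crop and SAMPLE_EVENT are hypothetical, and this does not reproduce Splunk's indexing pipeline or its performance.

```python
import re

# Sample WSA event from the question (leading encoding debris and the trailing
# Splunk field listing removed; the truncated URL is kept exactly as posted).
SAMPLE_EVENT = (
    '1289937535.401 32 10.135.73.188 TCP_MISS/304 229 GET '
    'http://photos-b.ak.fbcdn.net/photos-ak-snc1/v27562/209/148475945166653/'
    'app_2_148475945166653_1896.gi... - DIRECT/photos-b.ak.fbcdn.net image/gif '
    'ALLOW_CUSTOMCAT_11-Aurora_Base_Policy-DefaultGroup-NONE-NONE-NONE-DefaultGroup '
    '<C_All0,-,"-","-",-,-,-,"-","-",-,-,-,"-","-",-,"-","-",-,-,-,-,'
    '"-","-","-","-","-","-",57.25,0,-,"-","-">'
)

# Keep everything from the start of the event through "<" plus the first word after it.
CROP_REGEX = re.compile(r'^(.*?\s<\w+)')


def crop(raw: str) -> str:
    """Emulate DEST_KEY = _raw / FORMAT = $1 (hypothetical helper).

    Returns capture group 1 if the regex matches, otherwise the event unchanged."""
    match = CROP_REGEX.search(raw)
    return match.group(1) if match else raw


if __name__ == "__main__":
    kept = crop(SAMPLE_EVENT)
    print("kept:   ", kept)
    print("dropped:", SAMPLE_EVENT[len(kept):])
```

Running it against a few hundred real events copied from the WSA log is a cheap way to spot lines where the regex does not match and would therefore be indexed in full.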

I'd suggest testing this in a dev environment first, both for performance and to confirm that it works.

Hope this helps,
.gz


s05tsom
New Member

When we originally put the WSA in, we were indexing roughly 10 GB per day. We acquired a small business partner without a noticeable difference in volume, but since the upgrade we are at roughly 22 GB per day. Performance wouldn't be an issue at this point; we could have the files moved off-hours or every 4-6 hours. Thank you for the input; I will post an update after the changes.
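With about 22 GB per day coming in, a rough way to estimate what cropping could save is to measure how much of each event it removes and scale that to the daily volume. The Python sketch below does this for the single sample event from the question, using ^(.*?\s<\w+) as an anchored form of the crop regex. It assumes that (a) the sample is representative of all WSA events, which it may well not be, and (b) license usage is metered on the event after the index-time rewrite; both should be checked against real data. The function names are hypothetical.

```python
import re

# Sample WSA event from the question (encoding debris and trailing field
# listing removed; the truncated URL is kept exactly as posted).
SAMPLE_EVENT = (
    '1289937535.401 32 10.135.73.188 TCP_MISS/304 229 GET '
    'http://photos-b.ak.fbcdn.net/photos-ak-snc1/v27562/209/148475945166653/'
    'app_2_148475945166653_1896.gi... - DIRECT/photos-b.ak.fbcdn.net image/gif '
    'ALLOW_CUSTOMCAT_11-Aurora_Base_Policy-DefaultGroup-NONE-NONE-NONE-DefaultGroup '
    '<C_All0,-,"-","-",-,-,-,"-","-",-,-,-,"-","-",-,"-","-",-,-,-,-,'
    '"-","-","-","-","-","-",57.25,0,-,"-","-">'
)

# Anchored crop regex: keep the event through "<" plus the first word after it.
CROP_REGEX = re.compile(r'^(.*?\s<\w+)')


def dropped_fraction(raw: str) -> float:
    """Fraction of the event's bytes the crop would remove (0.0 if the regex does not match)."""
    match = CROP_REGEX.search(raw)
    kept = match.group(1) if match else raw
    return 1 - len(kept.encode("utf-8")) / len(raw.encode("utf-8"))


def projected_daily_gb(current_gb: float, fraction_dropped: float) -> float:
    """Daily volume after cropping, assuming every event shrinks by the same fraction."""
    return current_gb * (1 - fraction_dropped)


if __name__ == "__main__":
    frac = dropped_fraction(SAMPLE_EVENT)
    print(f"dropped per event: {frac:.1%}")
    print(f"projected volume:  {projected_daily_gb(22, frac):.1f} GB/day")
```

For this one sample the crop removes roughly a quarter of the event, because most of the bracketed section is "-" placeholders; events with more populated fields inside the < > would save proportionally more.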
