Splunk Search

makemv: Reducing a multivalued field down to a single value based on a lookup.

Hi Splunkers/Splunkettes,

To begin, I'm sorry about the length of the question.

Scenario

I have a large volume of BlueCoat proxy logs that need to be reported on by the category BlueCoat has assigned to each request. Here is an example log from the BlueCoat app's datagen:

2011-06-15 10:59:31.252088 13 10.0.0.1 sneezy FTW - OBSERVED "News/Media;Reference" - - 200 TCP_HIT GET text/html - "www.associatedbank.com" - "/N/K9USERE07E/CIPCWM03" - aspx Firefox/3.6.3 125.17.14.100 12960 1071 -

In this instance, the category value is News/Media;Reference, so there are two categories: News/Media and Reference.

The BlueCoat app handles this by applying makemv to the category value. For reporting purposes, this effectively counts the usage for this record (1071 bytes) twice: once in the News/Media category and once in the Reference category.
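A report like the following makes the double counting visible. It is a sketch that reuses the bcoat_request eventtype and sc_bytes field from the searches further down. When stats groups BY a multivalued field, it places the event in one group per value. As a result, the 1071 bytes from the sample event are summed under both News/Media and Reference.

```spl
eventtype=bcoat_request
| makemv delim=";" allowempty=t category
| stats sum(sc_bytes) AS total_bytes BY category
```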

End Goal

What I would like to do is redefine the category using a priority lookup table, so that the usage is counted only once, in the category with the highest priority (the lowest priority number).

Given the following lookup table (category_priority.csv):

category,   priority
---------------------
News/Media,        1
Reference,         2

Running the search over the above event should give me a single event in the 'News/Media' category with 1071 bytes against it.
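You can try an approach against just the sample event, without searching the full proxy data, by mocking it up with makeresults. This is a sketch: only the fields used in this thread are populated, and the values are copied from the log line above.

```spl
| makeresults
| eval dest_host="www.associatedbank.com", category="News/Media;Reference", sc_bytes=1071
| makemv delim=";" allowempty=t category
| table dest_host, category, sc_bytes
```

Append the lookup and reduction steps to this. If the approach works, the result should be the single News/Media row with 1071 bytes described above.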

The thing is, I have got this working... kind of...

What I've Tried

This search splits the category field into its component categories and applies a priority to each:

eventtype=bcoat_request | makemv delim=";" allowempty=t category | lookup category_priority.csv category | table dest_host, category, priority, sc_bytes

Adding an mvexpand command breaks the event out into two events that are identical except for the category field, so a sort followed by a dedup gives me what I'm after:

eventtype=bcoat_request | makemv delim=";" allowempty=t category | mvexpand category | lookup cs_category_summary.csv category | sort priority | dedup dest_host | table dest_host, category, priority

But...

Issues with this approach

  1. I have a LOT of data (~2.4 billion records a month), so dedup isn't really practical or best practice; and
  2. Even with microsecond timestamps, I have identical (but not duplicate) events that a dedup would filter out. Adding more fields to the dedup would only make the search more computationally expensive, and it still wouldn't guarantee that I'm not filtering out valid usage.
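The second issue can be addressed by tagging each event with its own ID before mvexpand and keeping the best-priority row per ID, instead of deduping on dest_host. This is a sketch that assumes a running row number from streamstats is an acceptable per-event key. The field names event_id and top_priority are hypothetical.

```spl
eventtype=bcoat_request
| streamstats count AS event_id
| makemv delim=";" allowempty=t category
| mvexpand category
| lookup category_priority.csv category OUTPUT priority
| eventstats min(priority) AS top_priority BY event_id
| where priority=top_priority
| table dest_host, category, priority, sc_bytes
```

Identical events keep separate IDs, so none of them is collapsed. However, this still expands every event and runs eventstats over the result, so it does nothing for the cost concern in the first point. Two edge cases apply. Events whose categories are all missing from the lookup are dropped by the where clause. Ties on priority keep more than one row.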

Question

Is there a way to do this purely on a per-event basis using eval statements? I tried applying a sort to the category field after the makemv command but before the mvexpand command, but it had no effect.
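Here is a minimal per-event sketch. It assumes Splunk 8.0 or later for the mvmap eval function, and it hardcodes the two priorities from category_priority.csv, because eval cannot call a lookup. The search prefixes each category value with its priority. mvsort then orders the values within the event (unlike the sort command, which orders events), and substr strips the prefix from the first value. The field names ranked and top_category are hypothetical.

```spl
eventtype=bcoat_request
| makemv delim=";" allowempty=t category
| eval ranked=mvmap(category, case(category=="News/Media", "1|News/Media", category=="Reference", "2|Reference", true(), "9|".category))
| eval top_category=substr(mvindex(mvsort(ranked), 0), 3)
| table dest_host, top_category, sc_bytes
```

mvsort compares values as strings, so this relies on every priority being a single digit. Categories not listed in the case() fall back to priority 9. Each event yields exactly one row, so identical events are never collapsed.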

Sorry for the length of the question 😛 Hoping someone can help!

TL;DR: Need to reduce a multivalued field down to a single value based on a lookup.

How about something like this?

eventtype=bcoat_request | makemv delim=";" allowempty=t category | lookup category_priority.csv category | sort priority | eval newcategory=mvindex(category,0) | table dest_host, newcategory, priority, sc_bytes
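One caveat with this search: the sort command reorders whole events, not the values inside each event's multivalued category field. As a result, mvindex(category,0) still returns whichever category BlueCoat listed first. Also, sort without a count returns at most 10,000 results by default.

A per-event alternative that avoids both sort and mvexpand checks the categories in priority order with mvfind, which returns the index of the first value matching a regex, or null if none matches. This sketch hardcodes the two example priorities; top_category is a hypothetical field name.

```spl
eventtype=bcoat_request
| makemv delim=";" allowempty=t category
| eval top_category=case(isnotnull(mvfind(category, "^News/Media$")), "News/Media", isnotnull(mvfind(category, "^Reference$")), "Reference", true(), mvindex(category, 0))
| table dest_host, top_category, sc_bytes
```

The priority order lives in the order of the case() clauses, so it has to be kept in sync with category_priority.csv by hand. Categories not covered fall back to the first listed value.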
