
makemv: Reducing a multivalued field down to a single value based on a lookup.

rturk
Builder

Hi Splunkers/Splunkettes,

To begin, I'm sorry about the length of the question.

Scenario

I have a large volume of Blue Coat proxy logs that need to be reported on by the category the proxy has assigned to each request. Here is an example log from the Bluecoat app's datagen:

2011-06-15 10:59:31.252088 13 10.0.0.1 sneezy FTW - OBSERVED "News/Media;Reference" - - 200 TCP_HIT GET text/html - "www.associatedbank.com" - "/N/K9USERE07E/CIPCWM03" - aspx Firefox/3.6.3 125.17.14.100 12960 1071 -

In this event the category value is News/Media;Reference, i.e. two categories: News/Media and Reference.

The Bluecoat app handles this by applying makemv to the category field, which effectively counts this record's usage (1071 bytes) twice for reporting purposes: once in the News/Media category and once in the Reference category.
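
To make the double counting concrete, here is a rough Python model (not Splunk code) of splitting the category on ";" the way makemv delim=";" allowempty=t does, then totalling bytes per category. The function names are hypothetical, and the event is the example above with 1071 as its byte count.

```python
from collections import Counter


def split_categories(raw: str, delim: str = ";") -> list[str]:
    # Roughly mimics makemv delim=";" allowempty=t: split on every delimiter,
    # keeping empty values between consecutive delimiters.
    return raw.split(delim)


def bytes_per_category(events):
    # Each event contributes its FULL byte count to every category it carries,
    # which is what per-category reporting on a multivalued field does.
    totals = Counter()
    for category, nbytes in events:
        for value in split_categories(category):
            totals[value] += nbytes
    return totals


events = [("News/Media;Reference", 1071)]
totals = bytes_per_category(events)
print(dict(totals))          # {'News/Media': 1071, 'Reference': 1071}
print(sum(totals.values()))  # 2142
```

The per-category totals add up to 2142 bytes for a request that transferred 1071, which is the inflation the priority lookup is meant to remove.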

End Goal

What I would like to do is reassign each event's category using a priority lookup table, so that its usage is counted only once, in the category with the highest priority (the lowest priority number).

Given the lookup table below (category_priority.csv):

category,   priority
---------------------
News/Media,        1
Reference,         2

Running the search over the above event should give me a single event in the 'News/Media' category with 1071 bytes against it.
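
The desired per-event reduction can be sketched in Python as a model of the logic rather than a Splunk search. It assumes, as the expected result implies, that priority 1 is the highest. The helper names, and the rule that categories missing from the lookup rank last, are hypothetical choices.

```python
import csv
import io

# Contents of category_priority.csv; 1 = highest priority (assumed from the example)
LOOKUP_CSV = """category,priority
News/Media,1
Reference,2
"""


def load_priorities(text: str) -> dict[str, int]:
    reader = csv.DictReader(io.StringIO(text))
    return {row["category"]: int(row["priority"]) for row in reader}


def primary_category(raw: str, priorities: dict[str, int], delim: str = ";") -> str:
    # Keep only the category with the lowest priority number for this event.
    # Categories absent from the lookup rank last (hypothetical rule).
    values = raw.split(delim)
    return min(values, key=lambda c: priorities.get(c, float("inf")))


priorities = load_priorities(LOOKUP_CSV)
event = {"category": "News/Media;Reference", "bytes": 1071}
event["category"] = primary_category(event["category"], priorities)
print(event)  # {'category': 'News/Media', 'bytes': 1071}
```

Because the choice is made inside each event, no cross-event step such as dedup is needed, and the 1071 bytes land in exactly one category.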

The problem is, I have got this working... kinda...

What I've Tried

This search splits the category field into its component categories and applies a priority to each:

eventtype=bcoat_request | makemv delim=";" allowempty=t category | lookup category_priority.csv category | table dest_host, category, priority, sc_bytes


Adding an mvexpand command breaks the event out into two events that are identical apart from the category field, so a sort followed by a dedup gives me what I'm after:

eventtype=bcoat_request | makemv delim=";" allowempty=t category | mvexpand category | lookup category_priority.csv category | sort priority | dedup dest_host | table dest_host, category, priority
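
The mvexpand-then-dedup pipeline above behaves roughly like the following Python model (a sketch, not Splunk internals). It assumes every category appears in the lookup; `expand` and `sort_then_dedup` are hypothetical stand-ins for mvexpand and sort priority | dedup dest_host.

```python
# Rough model of: makemv | mvexpand category | lookup | sort priority | dedup dest_host
PRIORITY = {"News/Media": 1, "Reference": 2}  # values from category_priority.csv


def expand(events):
    # mvexpand: one row per category value; all other fields copied unchanged.
    # Assumes every category is present in the lookup (KeyError otherwise).
    for ev in events:
        for cat in ev["category"].split(";"):
            yield {**ev, "category": cat, "priority": PRIORITY[cat]}


def sort_then_dedup(rows, key="dest_host"):
    # sort priority (ascending), then keep the first row seen for each key value
    seen, kept = set(), []
    for row in sorted(rows, key=lambda r: r["priority"]):
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


events = [{"dest_host": "www.associatedbank.com", "category": "News/Media;Reference", "bytes": 1071}]
rows = list(expand(events))
print(len(rows))  # 2: identical apart from category and priority
print(sort_then_dedup(rows))
# [{'dest_host': 'www.associatedbank.com', 'category': 'News/Media', 'bytes': 1071, 'priority': 1}]
```

For a single event this gives the right answer, but the dedup key spans all events, not just the rows produced from one event.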


But...

Issues with this approach

  1. I have a LOT of data (~2.4 billion records a month), so dedup isn't really a practical option or best practice; and
  2. Even with microsecond timestamps, I have identical (but not duplicate) events that a dedup would filter out. Adding more fields to the dedup only makes the search more expensive to compute, and still doesn't guarantee that I won't filter out valid usage.
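
Issue 2 can be reproduced with a small Python model, assuming two hypothetical requests that share every field, including the microsecond timestamp. The `dedup` helper is a stand-in that keeps the first row for each distinct combination of the given fields, which is how dedup treats matching events by default.

```python
# Two separate but identical requests (hypothetical): both are real usage.
events = [
    {"_time": "2011-06-15 10:59:31.252088", "dest_host": "www.associatedbank.com",
     "category": "News/Media", "bytes": 1071},
    {"_time": "2011-06-15 10:59:31.252088", "dest_host": "www.associatedbank.com",
     "category": "News/Media", "bytes": 1071},
]


def dedup(rows, *fields):
    # Keep the first row for each distinct combination of the given fields
    seen, kept = set(), []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


true_total = sum(e["bytes"] for e in events)
for fields in (("dest_host",), ("_time", "dest_host", "category", "bytes")):
    kept = dedup(events, *fields)
    print(fields, sum(r["bytes"] for r in kept), "of", true_total)
```

Both field sets keep 1071 of the 2142 bytes: when events are genuinely identical, adding fields to the dedup cannot tell them apart.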

Question

Is there a way to do this purely on a per-event basis using eval statements? I tried sorting the category field after the makemv command but before the mvexpand command, but that had no effect.

Sorry for the length of the question 😛 Hoping someone can help!

TL;DR: Need to reduce a multivalued field down to a single value based on a lookup.

1 Solution

Lucas_K
Motivator

How about something like this?

eventtype=bcoat_request | makemv delim=";" allowempty=t category | lookup category_priority.csv category | sort priority | eval new_category=mvindex(category,0) | table dest_host, new_category, priority, sc_bytes
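
One caveat, hedged: as far as I know, sort reorders whole results rather than the values inside a multivalued field, so mvindex(category,0) returns whichever category came first in the raw value. That gives News/Media for the example event, but would likely give Reference for "Reference;News/Media". A per-event alternative is to pair each priority with its category, e.g. eval pair=mvzip(priority, category, "|"), take mvindex(mvsort(pair),0), and split the category back out. Because mvsort sorts lexicographically, priorities would need zero-padding (for example in the lookup itself) once they reach two digits. The Python below models that idea only; it assumes every category has a lookup entry so the priority and category lists stay aligned, and the padding width is a hypothetical choice.

```python
def reduce_by_priority(categories, priorities, width=3):
    # Models: eval pair=mvzip(priority, category, "|"), mvindex(mvsort(pair), 0),
    # then splitting off the category. Pairs are compared as STRINGS, as mvsort
    # does, so priorities are zero-padded to `width` digits (hypothetical choice)
    # to make string order match numeric order. zip() assumes aligned lists.
    pairs = [f"{str(p).zfill(width)}|{c}" for p, c in zip(priorities, categories)]
    best = sorted(pairs)[0]
    return best.split("|", 1)[1]


# Raw value "Reference;News/Media" after makemv and the lookup
categories = ["Reference", "News/Media"]
priorities = [2, 1]

print(categories[0])                               # Reference (first value, as mvindex(category,0) returns)
print(reduce_by_priority(categories, priorities))  # News/Media

# Why padding matters: unpadded strings put "10" before "2"
print(sorted(["2|Reference", "10|Other"])[0])                # 10|Other
print(reduce_by_priority(["Reference", "Other"], [2, 10]))   # Reference
```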
