Splunk Search

Combine mvexpand and stats(sum) in one command

claudio_manig
Communicator

Hi Ninjas

I struggle with query including several "challenges".

I got proxy events like:

time="10-27-17 10:00:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=5000000 domain="applepiesamurai.org" url_cat="1,5"
time="10-27-17 10:02:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=7000000 domain="applepiesamurai.org" url_cat="1,5"

Now i have a csv lookup look like the following:

url_cat,category_name
1,"Business"
5,"Cooking"

What now wanted to create is a sum of the response_size by domain, day, user and then list the proxy category_name(s) by each domain:

Day| user| domain| response_size| category_name
10-27-17|arnold.schwarzenegger|applepiesamurai.org|12000000|Business Cooking

Turned out that this isn't quite as easy as i thought.

As url_cat can consist of multiple values i created a new field called "cat" as a multivalue field using transforms "MV_ADD=true".

In the search, I use mv_expand on cat to do the lookup and get all the category_name's by each event. But using that, the sum of the response size is misscalculated as mv_expand creates x-times events as it has different cat values and therefore multiplies the sum x-times in my stats sum command.

After wasting hours with appends and evals I had to pause before I smashed my keyboard.

Any ideas ond how i get the correct calculation but showing all the category_names?

0 Karma
1 Solution

claudio_manig
Communicator

Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:

base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat

Now the size gets summarized before the expand, merging it together by using stats list by-

View solution in original post

0 Karma

claudio_manig
Communicator

Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:

base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat

Now the size gets summarized before the expand, merging it together by using stats list by-

0 Karma

HiroshiSatoh
Champion

Try this!
Please change the part of stats to efficient one.

(your search)
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup your_lookup_file link_key as url_cat OUTPUT category_name
| stats first(*) as *,values(category_name) as join_category_name by _raw
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name

claudio_manig
Communicator

Thats what i did now, still the same problem:
base search
| eval link_key=urlf_categories
| makemv delim="," link_key
| mvexpand link_key
| lookup proxy_categories url_cat as link_key OUTPUT category_name
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,values(category_name) as join_category_name by date, user, domain
| eval sizeMB=round(size/1024/1024,2)
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name

gets back exactly the same (wrong) sum size as my first attempts with the multivalue field. The mvexpand before the stats sum causes multiplication of the response_size as well, ends up with a x times higher sum as it effectly is.
At the end the query should bring back the exact same same as it would without the mvexpand adding the extra category_name field

0 Karma

HiroshiSatoh
Champion

This is a process of returning multiple events to singular.

| stats first(*) as *,values(category_name) as join_category_name by _raw

It is not the same.

| stats sum(response_size) as size,values(category_name) as join_category_nam`e by date, user, domain

You should sum it after returning.

 (your search)
 | eval link_key=url_cat
 | makemv delim="," link_key
 | mvexpand link_key
 | lookup your_lookup_file link_key as url_cat OUTPUT category_name
 | stats first(*) as *,values(category_name) as join_category_name by _raw
 | nomv join_category_name
 | fields - category_name
 | rename join_category_name as category_name
 | eval date=strftime(_time,"%m-%d-%y")
 | stats sum(response_size) as size by date, user, domain ,category_name
 | eval sizeMB=round(size/1024/1024,2)
0 Karma

claudio_manig
Communicator

I figure it out already, thanks

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...