Splunk Search

Combine mvexpand and stats(sum) in one command

Communicator

Hi Ninjas

I struggle with query including several "challenges".

I got proxy events like:

time="10-27-17 10:00:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=5000000 domain="applepiesamurai.org" url_cat="1,5"
time="10-27-17 10:02:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=7000000 domain="applepiesamurai.org" url_cat="1,5"

Now i have a csv lookup look like the following:

url_cat,category_name
1,"Business"
5,"Cooking"

What now wanted to create is a sum of the response_size by domain, day, user and then list the proxy category_name(s) by each domain:

Day| user| domain| response_size| category_name
10-27-17|arnold.schwarzenegger|applepiesamurai.org|12000000|Business Cooking

Turned out that this isn't quite as easy as i thought.

As url_cat can consist of multiple values i created a new field called "cat" as a multivalue field using transforms "MV_ADD=true".

In the search, I use mv_expand on cat to do the lookup and get all the category_name's by each event. But using that, the sum of the response size is misscalculated as mv_expand creates x-times events as it has different cat values and therefore multiplies the sum x-times in my stats sum command.

After wasting hours with appends and evals I had to pause before I smashed my keyboard.

Any ideas ond how i get the correct calculation but showing all the category_names?

0 Karma
1 Solution

Communicator

Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:

base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat

Now the size gets summarized before the expand, merging it together by using stats list by-

View solution in original post

0 Karma

Communicator

Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:

base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat

Now the size gets summarized before the expand, merging it together by using stats list by-

View solution in original post

0 Karma

Champion

Try this!
Please change the part of stats to efficient one.

(your search)
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup your_lookup_file link_key as url_cat OUTPUT category_name
| stats first(*) as *,values(category_name) as join_category_name by _raw
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name

Communicator

Thats what i did now, still the same problem:
base search
| eval link_key=urlf_categories
| makemv delim="," link_key
| mvexpand link_key
| lookup proxy_categories url_cat as link_key OUTPUT category_name
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,values(category_name) as join_category_name by date, user, domain
| eval sizeMB=round(size/1024/1024,2)
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name

gets back exactly the same (wrong) sum size as my first attempts with the multivalue field. The mvexpand before the stats sum causes multiplication of the response_size as well, ends up with a x times higher sum as it effectly is.
At the end the query should bring back the exact same same as it would without the mvexpand adding the extra category_name field

0 Karma

Champion

This is a process of returning multiple events to singular.

| stats first(*) as *,values(category_name) as join_category_name by _raw

It is not the same.

| stats sum(response_size) as size,values(category_name) as join_category_nam`e by date, user, domain

You should sum it after returning.

 (your search)
 | eval link_key=url_cat
 | makemv delim="," link_key
 | mvexpand link_key
 | lookup your_lookup_file link_key as url_cat OUTPUT category_name
 | stats first(*) as *,values(category_name) as join_category_name by _raw
 | nomv join_category_name
 | fields - category_name
 | rename join_category_name as category_name
 | eval date=strftime(_time,"%m-%d-%y")
 | stats sum(response_size) as size by date, user, domain ,category_name
 | eval sizeMB=round(size/1024/1024,2)
0 Karma

Communicator

I figure it out already, thanks

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!