Splunk Search

Combine mvexpand and stats(sum) in one command

claudio_manig
Communicator

Hi Ninjas

I struggle with query including several "challenges".

I got proxy events like:

time="10-27-17 10:00:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=5000000 domain="applepiesamurai.org" url_cat="1,5"
time="10-27-17 10:02:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=7000000 domain="applepiesamurai.org" url_cat="1,5"

Now i have a csv lookup look like the following:

url_cat,category_name
1,"Business"
5,"Cooking"

What now wanted to create is a sum of the response_size by domain, day, user and then list the proxy category_name(s) by each domain:

Day| user| domain| response_size| category_name
10-27-17|arnold.schwarzenegger|applepiesamurai.org|12000000|Business Cooking

Turned out that this isn't quite as easy as i thought.

As url_cat can consist of multiple values i created a new field called "cat" as a multivalue field using transforms "MV_ADD=true".

In the search, I use mv_expand on cat to do the lookup and get all the category_name's by each event. But using that, the sum of the response size is misscalculated as mv_expand creates x-times events as it has different cat values and therefore multiplies the sum x-times in my stats sum command.

After wasting hours with appends and evals I had to pause before I smashed my keyboard.

Any ideas ond how i get the correct calculation but showing all the category_names?

0 Karma
1 Solution

claudio_manig
Communicator

Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:

base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat

Now the size gets summarized before the expand, merging it together by using stats list by-

View solution in original post

0 Karma

claudio_manig
Communicator

Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:

base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat

Now the size gets summarized before the expand, merging it together by using stats list by-

0 Karma

HiroshiSatoh
Champion

Try this!
Please change the part of stats to efficient one.

(your search)
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup your_lookup_file link_key as url_cat OUTPUT category_name
| stats first(*) as *,values(category_name) as join_category_name by _raw
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name

claudio_manig
Communicator

Thats what i did now, still the same problem:
base search
| eval link_key=urlf_categories
| makemv delim="," link_key
| mvexpand link_key
| lookup proxy_categories url_cat as link_key OUTPUT category_name
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,values(category_name) as join_category_name by date, user, domain
| eval sizeMB=round(size/1024/1024,2)
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name

gets back exactly the same (wrong) sum size as my first attempts with the multivalue field. The mvexpand before the stats sum causes multiplication of the response_size as well, ends up with a x times higher sum as it effectly is.
At the end the query should bring back the exact same same as it would without the mvexpand adding the extra category_name field

0 Karma

HiroshiSatoh
Champion

This is a process of returning multiple events to singular.

| stats first(*) as *,values(category_name) as join_category_name by _raw

It is not the same.

| stats sum(response_size) as size,values(category_name) as join_category_nam`e by date, user, domain

You should sum it after returning.

 (your search)
 | eval link_key=url_cat
 | makemv delim="," link_key
 | mvexpand link_key
 | lookup your_lookup_file link_key as url_cat OUTPUT category_name
 | stats first(*) as *,values(category_name) as join_category_name by _raw
 | nomv join_category_name
 | fields - category_name
 | rename join_category_name as category_name
 | eval date=strftime(_time,"%m-%d-%y")
 | stats sum(response_size) as size by date, user, domain ,category_name
 | eval sizeMB=round(size/1024/1024,2)
0 Karma

claudio_manig
Communicator

I figure it out already, thanks

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...