Splunk Search

Combine mvexpand and stats(sum) in one command

claudio_manig
Communicator

Hi Ninjas

I struggle with query including several "challenges".

I got proxy events like:

time="10-27-17 10:00:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=5000000 domain="applepiesamurai.org" url_cat="1,5"
time="10-27-17 10:02:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=7000000 domain="applepiesamurai.org" url_cat="1,5"

Now i have a csv lookup look like the following:

url_cat,category_name
1,"Business"
5,"Cooking"

What now wanted to create is a sum of the response_size by domain, day, user and then list the proxy category_name(s) by each domain:

Day| user| domain| response_size| category_name
10-27-17|arnold.schwarzenegger|applepiesamurai.org|12000000|Business Cooking

Turned out that this isn't quite as easy as i thought.

As url_cat can consist of multiple values i created a new field called "cat" as a multivalue field using transforms "MV_ADD=true".

In the search, I use mv_expand on cat to do the lookup and get all the category_name's by each event. But using that, the sum of the response size is misscalculated as mv_expand creates x-times events as it has different cat values and therefore multiplies the sum x-times in my stats sum command.

After wasting hours with appends and evals I had to pause before I smashed my keyboard.

Any ideas ond how i get the correct calculation but showing all the category_names?

0 Karma
1 Solution

claudio_manig
Communicator

Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:

base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat

Now the size gets summarized before the expand, merging it together by using stats list by-

View solution in original post

0 Karma

claudio_manig
Communicator

Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:

base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat

Now the size gets summarized before the expand, merging it together by using stats list by-

0 Karma

HiroshiSatoh
Champion

Try this!
Please change the part of stats to efficient one.

(your search)
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup your_lookup_file link_key as url_cat OUTPUT category_name
| stats first(*) as *,values(category_name) as join_category_name by _raw
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name

claudio_manig
Communicator

Thats what i did now, still the same problem:
base search
| eval link_key=urlf_categories
| makemv delim="," link_key
| mvexpand link_key
| lookup proxy_categories url_cat as link_key OUTPUT category_name
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,values(category_name) as join_category_name by date, user, domain
| eval sizeMB=round(size/1024/1024,2)
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name

gets back exactly the same (wrong) sum size as my first attempts with the multivalue field. The mvexpand before the stats sum causes multiplication of the response_size as well, ends up with a x times higher sum as it effectly is.
At the end the query should bring back the exact same same as it would without the mvexpand adding the extra category_name field

0 Karma

HiroshiSatoh
Champion

This is a process of returning multiple events to singular.

| stats first(*) as *,values(category_name) as join_category_name by _raw

It is not the same.

| stats sum(response_size) as size,values(category_name) as join_category_nam`e by date, user, domain

You should sum it after returning.

 (your search)
 | eval link_key=url_cat
 | makemv delim="," link_key
 | mvexpand link_key
 | lookup your_lookup_file link_key as url_cat OUTPUT category_name
 | stats first(*) as *,values(category_name) as join_category_name by _raw
 | nomv join_category_name
 | fields - category_name
 | rename join_category_name as category_name
 | eval date=strftime(_time,"%m-%d-%y")
 | stats sum(response_size) as size by date, user, domain ,category_name
 | eval sizeMB=round(size/1024/1024,2)
0 Karma

claudio_manig
Communicator

I figure it out already, thanks

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

[Puzzles] Solve, Learn, Repeat: Character substitutions with Regular Expressions

This challenge was first posted on Slack #puzzles channelFor BORE at .conf23, we had a puzzle question which ...

Splunk Community Badges!

  Hey everyone! Ready to earn some serious bragging rights in the community? Along with our existing badges ...

[Puzzles] Solve, Learn, Repeat: Matching cron expressions

This puzzle (first published here) is based on matching timestamps to cron expressions.All the timestamps ...