Hi Ninjas
I struggle with query including several "challenges".
I got proxy events like:
time="10-27-17 10:00:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=5000000 domain="applepiesamurai.org" url_cat="1,5"
time="10-27-17 10:02:00" url="www.applepiesamurai.org/get_more_apple_pie" user="arnold.schwarzenegger" response_size=7000000 domain="applepiesamurai.org" url_cat="1,5"
Now i have a csv lookup look like the following:
url_cat,category_name
1,"Business"
5,"Cooking"
What now wanted to create is a sum of the response_size by domain, day, user and then list the proxy category_name(s) by each domain:
Day| user| domain| response_size| category_name
10-27-17|arnold.schwarzenegger|applepiesamurai.org|12000000|Business Cooking
Turned out that this isn't quite as easy as i thought.
As url_cat can consist of multiple values i created a new field called "cat" as a multivalue field using transforms "MV_ADD=true".
In the search, I use mv_expand on cat to do the lookup and get all the category_name's by each event. But using that, the sum of the response size is misscalculated as mv_expand creates x-times events as it has different cat values and therefore multiplies the sum x-times in my stats sum command.
After wasting hours with appends and evals I had to pause before I smashed my keyboard.
Any ideas ond how i get the correct calculation but showing all the category_names?
Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:
base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat
Now the size gets summarized before the expand, merging it together by using stats list by-
Ok with parts of Hiroshi's query and some hints from collegues and the fact that due to that I was able to do the mvexpand after the stats sum i figured it out:
base search
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,by date, user, domain, url_cat
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup lookuptablename url_cat as link_key OUTPUT category_name
| stats list(*) as * by date, user, domain, size
| eval sizeMB=round(size/1024/1024,2)
| fields- size, link_key, url_cat
Now the size gets summarized before the expand, merging it together by using stats list by-
Try this!
Please change the part of stats to efficient one.
(your search)
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup your_lookup_file link_key as url_cat OUTPUT category_name
| stats first(*) as *,values(category_name) as join_category_name by _raw
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name
Thats what i did now, still the same problem:
base search
| eval link_key=urlf_categories
| makemv delim="," link_key
| mvexpand link_key
| lookup proxy_categories url_cat as link_key OUTPUT category_name
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size,values(category_name) as join_category_name by date, user, domain
| eval sizeMB=round(size/1024/1024,2)
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name
gets back exactly the same (wrong) sum size as my first attempts with the multivalue field. The mvexpand before the stats sum causes multiplication of the response_size as well, ends up with a x times higher sum as it effectly is.
At the end the query should bring back the exact same same as it would without the mvexpand adding the extra category_name field
This is a process of returning multiple events to singular.
| stats first(*) as *,values(category_name) as join_category_name by _raw
It is not the same.
| stats sum(response_size) as size,values(category_name) as join_category_nam`e by date, user, domain
You should sum it after returning.
(your search)
| eval link_key=url_cat
| makemv delim="," link_key
| mvexpand link_key
| lookup your_lookup_file link_key as url_cat OUTPUT category_name
| stats first(*) as *,values(category_name) as join_category_name by _raw
| nomv join_category_name
| fields - category_name
| rename join_category_name as category_name
| eval date=strftime(_time,"%m-%d-%y")
| stats sum(response_size) as size by date, user, domain ,category_name
| eval sizeMB=round(size/1024/1024,2)
I figure it out already, thanks