Hello experts,
How can I get an additional column in the output by matching on a common field. Lets say, below query gives me:
index=1 sourcetype=b | stats sum(price) as total by products, region
products region total
a v 111
b w 222
c x 333
d y 444
e z 555
Now, I have another source that may have tags for each product and there can be multiple tags associated to a single products. The second source has different list and count, i.e, it may have more number of products. So, how can I get something like below for the matching ones for column products from above query:
products region total tags
a v 111 qw,as
b w 222 we
c x 333
d y 444 sd,zx,xc
e z 555
Can I get some help with the query please.
Give this a try (will work fine unless the subsearch has too many rows to produce and/or takes lots of time to run)
index=1 sourcetype=b | stats sum(price) as total by products, region
| append [search index=2 sourcetype=c | stats values(tags) as tags by product]
| stats values(region) as region, values(total) as total values(tags) as "total tags" by product
| where isnotnull(region)
Give this a try (will work fine unless the subsearch has too many rows to produce and/or takes lots of time to run)
index=1 sourcetype=b | stats sum(price) as total by products, region
| append [search index=2 sourcetype=c | stats values(tags) as tags by product]
| stats values(region) as region, values(total) as total values(tags) as "total tags" by product
| where isnotnull(region)
@somesoni2 - what do you think about something like this as a way of getting around the possibility of the second search returning too much data?
index=1 sourcetype=b | stats sum(price) as total by product, region
| appendpipe
[ | stats by product | format | rename search as myproductlist
| map search="search index=2 sourcetype=c $myproductlist$ | stats values(tags) as tags by product" ]
| stats values(region) as region, values(total) as total, values(tags) as "total tags" by product
| where isnotnull(region)
Thank you for your responses @somesoni2 and @DalJeanis.
I may have multiple product values in my second source which again can have 0 or more tags. I think it will make more sense if I add some data for the second source here.
index=2 sourcetype=c | stats values(tags) as tags by product
product tags
a,f qw,as
b we
c,g,h
d sd,zx,xc
e
So, the product 'a' gets tags 'qw,as' in the final output.
Did you get a chance to try any of the answers?
There is a problem.
The products column from source1 is single valued but the productlist column in source2 is multivalued.
index=1 sourcetype=b | stats sum(price) as total by products, region
products region total
a v 111
b w 222
c x 333
d y 444
e z 555
index=2 sourcetype=c | stats values(tags) as tags by productlist
productlist tags
a,f qw,as
b we
c,g,h
d sd,zx,xc
e
Output:
products region total tags
a v 111 qw,as
b w 222 we
c x 333
d y 444 sd,zx,xc
e z 555
I guess the problem is not productlist being multivalued field (the stats should've made it single valued), but the different column names in two sources. Try these
index=1 sourcetype=b | stats sum(price) as total by products, region
| append [search index=2 sourcetype=c | stats values(tags) as tags by productlist | rename productlist as product]
| stats values(region) as region, values(total) as total values(tags) as "total tags" by product
| where isnotnull(region)
AND
( index=1 sourcetype=b) OR (index=2 sourcetype=c)
| eval product=coalesce(product,productlist)
| eventstats values(tags) as tags_all by product
| eval tags=coalesce(tags,tags_all)
| search index=1 sourcetype=b
| stats sum(price) as total values(tags) as "total tags" by product region
Umm, I still think its the problem with the multivalue. All of these queries return data for only those rows which have single values in productlist.
We are renaming productlist as product but 'productlist' may not always have the same matching column values as that in 'product'.
product
a
b
c
d
e
productlist
a,f
b
c,g,h
d
e
For the examples that I have provided, the output is going to be like below and there is not going to be any tags for product 'a':
products region total tags
a v 111
b w 222 we
c x 333
d y 444 sd,zx,xc
e z 555
subsearch query:
append [ search index=2 sourcetype=c | eval temp=split(productlist,",") | mvexpand temp | stats values(tags) as tags by temp | rename temp as product ]
Thank you @somesoni2 and @DalJeanis for helping with this.
Map command in a subsearch can be brutal (there are good chances that it'll get auto-finalized after 2 mins if there are many products). I would rather use an eventstats implementation for that (to avoid subsearches completely) like this
( index=1 sourcetype=b) OR (index=2 sourcetype=c)
| eventstats values(tags) as tags_all by product
| eval tags=coalesce(tags,tags_all)
| search index=1 sourcetype=b
| stats sum(price) as total values(tags) as "total tags" by product region