topic Re: Lookup command distinct column values in Getting Data In

Lookup command distinct column values

matthewb4 — Wed, 31 May 2017 17:29:49 GMT

I am trying to copy the counts of field X (already in logs) into a new field Y (from a lookup csv) so that they have the exact same counts but field Y has better named values.

The problem I am running into is that my csv file has multiple of the same X values in it so instead of field Y ending up with the same counts as field X, it is becoming (X * # of matched duplicates in csv). Is there any way for me to only get distinct rows from a lookup csv based on the matching field value (X in this case).

I am currently just using this... "lookup test.csv X AS X OUTPUTNEW Y AS Y"

And my csv file look something like this

Y      X      Z

y1  |  x1  |  z1
y1  |  x1  |  z2
y1  |  x1  |  z3
y2  |  x2  |  z1
y2  |  x2  |  z2
y3  |  x3  |  z1
y3  |  x3  |  z2
y3  |  x3  |  z3
y3  |  x3  |  z4
y3  |  x3  |  z5
y3  |  x3  |  z6

Given the previous csv file, if my counts for x1 in splunk logs were 1000, my new y1 value in field Y would get the value 3000. This is unwanted behavior for me, I would like for them to be the same. Any help on how I can modify my query is greatly appreciated.

Re: Lookup command distinct column values

woodcock — Wed, 31 May 2017 18:13:51 GMT

Like this:

... | lookup test.csv X OUTPUTNEW Y | eval Y = mvindex(Y, 0) | ...

Re: Lookup command distinct column values

matthewb4 — Wed, 31 May 2017 18:21:23 GMT

Great! This worked, do you mind explaining what mvindex is doing exactly? Each time outputnew is run for a specific Y value is it appending it to a multivalue list? I thought that it was simply adding to the current count, not creating a new index every time.

Re: Lookup command distinct column values

woodcock — Wed, 31 May 2017 18:30:20 GMT

The lookup is creating a multivalued field so we just take the first value and throw away the rest. Probably an even better solution would be to use | eval Y = mvdedup(Y) which would keep every distinct value of Y (which, if your lookup is correct should be only a single value, but if the lookup is broken, this version of the solution would give you some hope of discovering that).

Re: Lookup command distinct column values

woodcock — Wed, 31 May 2017 18:31:16 GMT

Don't forget to click Accept!

Re: Lookup command distinct column values

matthewb4 — Wed, 31 May 2017 19:00:53 GMT

Thank you for the followup, I just have one more question. How was I to know that this field would become multivalued. If I omit the mvindex command, I do not see a comma delimited list or anything like that. I don't even see something like "y1, y1, y1".

Re: Lookup command distinct column values

woodcock — Wed, 31 May 2017 19:10:24 GMT

A single value of X has multiple values of Y in your lookup file. How else could this be sensibly handled other than to create a multivalued output for Y. I guess the answer is "common sense" but maybe not totally obvious. I have requested Splunk update the documentation here:

http://docs.splunk.com/Documentation/Splunk/6.6.0/SearchReference/Lookup

Re: Lookup command distinct column values

matthewb4 — Wed, 31 May 2017 19:26:27 GMT

Would there be any way for me to see this multivalued list as a sanity check. I only see the combined chunk count for each new Y value, which is limiting my comprehension a bit. I think why this is most troubling to me is that when you said " single value of X has multiple values of Y in your lookup file. How else could this be sensibly handled other than to create a multivalued output for Y?" In my case the multiple values of Y are actually all the exact same string for the same X, I'm guessing this is why it's not actually showing up as a list but being treated as one?

Re: Lookup command distinct column values

woodcock — Wed, 31 May 2017 19:30:16 GMT

Sure, do this:

... lookup test.csv X OUTPUTNEW Y | nomv Y ...

This will flatten the list into a single value.