Solved: Lookup command distinct column values

matthewb4 · ‎05-31-2017

I am trying to copy the counts of field X (already in logs) into a new field Y (from a lookup csv) so that they have the exact same counts but field Y has better named values.

The problem I am running into is that my csv file has multiple of the same X values in it so instead of field Y ending up with the same counts as field X, it is becoming (X * # of matched duplicates in csv). Is there any way for me to only get distinct rows from a lookup csv based on the matching field value (X in this case).

I am currently just using this... "lookup test.csv X AS X OUTPUTNEW Y AS Y"

And my csv file look something like this

Y      X      Z

y1  |  x1  |  z1
y1  |  x1  |  z2
y1  |  x1  |  z3
y2  |  x2  |  z1
y2  |  x2  |  z2
y3  |  x3  |  z1
y3  |  x3  |  z2
y3  |  x3  |  z3
y3  |  x3  |  z4
y3  |  x3  |  z5
y3  |  x3  |  z6

Given the previous csv file, if my counts for x1 in splunk logs were 1000, my new y1 value in field Y would get the value 3000. This is unwanted behavior for me, I would like for them to be the same. Any help on how I can modify my query is greatly appreciated.

woodcock · ‎05-31-2017

Like this:

... | lookup test.csv X OUTPUTNEW Y | eval Y = mvindex(Y, 0) | ...

View solution in original post

woodcock · ‎05-31-2017

Like this:

... | lookup test.csv X OUTPUTNEW Y | eval Y = mvindex(Y, 0) | ...

matthewb4 · ‎05-31-2017

Great! This worked, do you mind explaining what mvindex is doing exactly? Each time outputnew is run for a specific Y value is it appending it to a multivalue list? I thought that it was simply adding to the current count, not creating a new index every time.

woodcock · ‎05-31-2017

Don't forget to click Accept!

matthewb4 · ‎05-31-2017

Thank you for the followup, I just have one more question. How was I to know that this field would become multivalued. If I omit the mvindex command, I do not see a comma delimited list or anything like that. I don't even see something like "y1, y1, y1".

woodcock · ‎05-31-2017

A single value of X has multiple values of Y in your lookup file. How else could this be sensibly handled other than to create a multivalued output for Y. I guess the answer is "common sense" but maybe not totally obvious. I have requested Splunk update the documentation here:

http://docs.splunk.com/Documentation/Splunk/6.6.0/SearchReference/Lookup

woodcock · ‎05-31-2017

The lookup is creating a multivalued field so we just take the first value and throw away the rest. Probably an even better solution would be to use | eval Y = mvdedup(Y) which would keep every distinct value of Y (which, if your lookup is correct should be only a single value, but if the lookup is broken, this version of the solution would give you some hope of discovering that).

matthewb4 · ‎05-31-2017

Would there be any way for me to see this multivalued list as a sanity check. I only see the combined chunk count for each new Y value, which is limiting my comprehension a bit. I think why this is most troubling to me is that when you said " single value of X has multiple values of Y in your lookup file. How else could this be sensibly handled other than to create a multivalued output for Y?" In my case the multiple values of Y are actually all the exact same string for the same X, I'm guessing this is why it's not actually showing up as a list but being treated as one?

woodcock · ‎05-31-2017

Sure, do this:

... lookup test.csv X OUTPUTNEW Y | nomv Y ...

This will flatten the list into a single value.

Lookup command distinct column values

Prove Your Splunk Prowess at .conf25—No Prereqs Required!

Splunk Observability Cloud's AI Assistant in Action Series: Observability as Code

Splunk Answers Content Calendar, July Edition I

Are you a member of the Splunk Community?

Lookup command distinct column values

Prove Your Splunk Prowess at .conf25—No Prereqs Required!

Splunk Observability Cloud's AI Assistant in Action Series: Observability as Code

Splunk Answers Content Calendar, July Edition I