Solved: Re: Lookup command distinct column values

matthewb4 · ‎05-31-2017

I am trying to copy the counts of field X (already in logs) into a new field Y (from a lookup csv) so that they have the exact same counts but field Y has better named values.

The problem I am running into is that my csv file has multiple of the same X values in it so instead of field Y ending up with the same counts as field X, it is becoming (X * # of matched duplicates in csv). Is there any way for me to only get distinct rows from a lookup csv based on the matching field value (X in this case).

I am currently just using this... "lookup test.csv X AS X OUTPUTNEW Y AS Y"

And my csv file look something like this

Y      X      Z

y1  |  x1  |  z1
y1  |  x1  |  z2
y1  |  x1  |  z3
y2  |  x2  |  z1
y2  |  x2  |  z2
y3  |  x3  |  z1
y3  |  x3  |  z2
y3  |  x3  |  z3
y3  |  x3  |  z4
y3  |  x3  |  z5
y3  |  x3  |  z6

Given the previous csv file, if my counts for x1 in splunk logs were 1000, my new y1 value in field Y would get the value 3000. This is unwanted behavior for me, I would like for them to be the same. Any help on how I can modify my query is greatly appreciated.

woodcock · ‎05-31-2017

Like this:

... | lookup test.csv X OUTPUTNEW Y | eval Y = mvindex(Y, 0) | ...

View solution in original post

woodcock · ‎05-31-2017

Like this:

... | lookup test.csv X OUTPUTNEW Y | eval Y = mvindex(Y, 0) | ...

matthewb4 · ‎05-31-2017

Great! This worked, do you mind explaining what mvindex is doing exactly? Each time outputnew is run for a specific Y value is it appending it to a multivalue list? I thought that it was simply adding to the current count, not creating a new index every time.

woodcock · ‎05-31-2017

Don't forget to click Accept!

matthewb4 · ‎05-31-2017

Thank you for the followup, I just have one more question. How was I to know that this field would become multivalued. If I omit the mvindex command, I do not see a comma delimited list or anything like that. I don't even see something like "y1, y1, y1".

woodcock · ‎05-31-2017

A single value of X has multiple values of Y in your lookup file. How else could this be sensibly handled other than to create a multivalued output for Y. I guess the answer is "common sense" but maybe not totally obvious. I have requested Splunk update the documentation here:

http://docs.splunk.com/Documentation/Splunk/6.6.0/SearchReference/Lookup

woodcock · ‎05-31-2017

The lookup is creating a multivalued field so we just take the first value and throw away the rest. Probably an even better solution would be to use | eval Y = mvdedup(Y) which would keep every distinct value of Y (which, if your lookup is correct should be only a single value, but if the lookup is broken, this version of the solution would give you some hope of discovering that).

matthewb4 · ‎05-31-2017

Would there be any way for me to see this multivalued list as a sanity check. I only see the combined chunk count for each new Y value, which is limiting my comprehension a bit. I think why this is most troubling to me is that when you said " single value of X has multiple values of Y in your lookup file. How else could this be sensibly handled other than to create a multivalued output for Y?" In my case the multiple values of Y are actually all the exact same string for the same X, I'm guessing this is why it's not actually showing up as a list but being treated as one?

woodcock · ‎05-31-2017

Sure, do this:

... lookup test.csv X OUTPUTNEW Y | nomv Y ...

This will flatten the list into a single value.

Lookup command distinct column values

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

From Data to Insight: Announcing the Winners of the Splunk Dashboard Contest

Splunk Developers: Construct Your Future at the .conf26 Builder Bar

Quick connection discovery mode for forwarders

Join the Conversation