Lookup command returning incorrect null values and...

Bastelhoff · ‎02-09-2019

I encountered a very weird behaviour. This has now also been reported as bug.

Update: I did manage to create some fake data now and further isolated the issue. I cannot attach the files nor links. So you'd need to create them on your own.

So first run this to create some fake logs:

Now that we got the files, run this:

Result:
color1 is null
color2 is "Purple" as it should be

Things you can play around with:
- Remove the mvindex -> it will populate color1 correctly
- Remove the eventstats -> it will populate color1 correctly (even though there is nothing which affects this field at that point)
- in the testlog.csv switch around the two values -> it will populate color1 correctly
- switch the value which is NOT looked up (816341959) to 100 -> it will populate color1 correctly (from 816295885)
- now switch the value which is NOT looked up to 916341959 -> it will populate color1 correctly (from 816295885)

This basically means that with numbers in this range of ~816295885 splunk becomes unreliable, if it is at a location of a similar lookup, which is extremely bad.

Even though only 816295885 is in the lookup, the color will also be found for 816341959, which is even more extremely bad I guess.

Splunk Version 7.2 was used in this case. Don't know if this is version specific.

woodcock · ‎02-11-2019

The only version of the 7.2.* stream of Splunk that is production-stable is the newly released v7.2.4. No previous versions of 7.2.* should be used in production due to devastating bugs like this one.

martin_mueller · ‎02-11-2019

I ran your three test searches, and get both color and color2 as Purple, Splunk 7.2.3.
Ran the final search, only get one row with Purple as it should be.

Bastelhoff · ‎02-11-2019

Tried it in 7.1.2 now which is working as well.
7.2.1 however is broken.

martin_mueller · ‎02-11-2019

Update then 🙂

woodcock · ‎02-10-2019

There are definitely strange bugs when you recycle field names using multi-valued fields and the mv* commands. Also, field names with spaces or periods are evil. Try this:

|inputlookup faketestlogs.csv
| eval test_number=mvindex('test.number',0)
| fields - $test.number$
| lookup fakedictionary.csv test_number output color
| eval mydump=test_number
| eval mydump2=color
| eventstats dc(test.id) AS num_ids BY test_number
| lookup fakedictionary.csv test_number output color AS color2
| search test_number=500

Also, it is poor form (and produces warnings) to use lookup files directly. You should create a lookup definition and use that instead.

Bastelhoff · ‎02-11-2019

I did these changes (including cutting one ,0) but it did not change anything.
I also tried both | fields - $test.number$ (which didnt remove the field) and | fields - test.number, which did remove the field, but it did not have any impact on the final result.

ofc I still had to add the old name in the lookup as this is the name of the field there.
So it was in this example | lookup fakedictionary.csv test.number as test_number output color

Whether the lookup definition or the csv files are used directly makes no difference in this case. But as lookup definitions can have some additional rules associated with them I decided to go with the .csv directly in this example to exclude additional sources of the problem.

martin_mueller · ‎02-10-2019

So... removing the eventstats from your search changes the values of color? That feels weird.
Do post the values of optimizedSearch from the job inspector for both cases.

This would be much easier if you had shareable reproducing data. Can't you anonymize your data?

Bastelhoff · ‎02-10-2019

I checked the job inspector regarding the optimized search.
In the case where it does not work it changes the following:
| search test.number=500 becomes | search "test.number"=500
and this line goes one line up before the second lookup. Also both evals are put together.
No other changes besides of adding quotation marks around the first file name and "test.number" as shown above.

In the case where I removed the eventstats the | search "test.number"=500 goes up all the way until it is right below the mvindex line. No other differences to the other one (besides of the missing line now obviously)

So far I failed to create fake data which worked.
The fields are identical though:
test.number - has same name structure with dot, and is a number (and in these cases not a multi value field. However it can happen that due to issues it becomes a multivalue field, hence the mvindex to remove duplicate entries)
test.id - also same name structure with dot and is a number
color - has same name structure and is a string

I can replicate it with an initial csv as input which has ~17k entries
The real dictionary uses identical names (hence test.number) and has ~3500 entries

martin_mueller · ‎02-09-2019

Do you have a reproducing data set to share?

Bastelhoff · ‎02-10-2019

Unfortunately I cannot share the original data and I am unable to reproduce fake data which comes to the same result. I just updated the issue description though.

pkarpushin · ‎02-09-2019

Hi @Bastelhoff ,

Could you please explain what is your desire with putting [|inputlookup listofnumbers.csv | fields number] to the beginning of your initial search?
If you say that listofnumbers.csv contains 10000 values, this subsearch[|inputlookup listofnumbers.csv | fields number] will return nothing.
If you make your subsearch look like this: [|inputlookup listofnumbers.csv | fields number | return number] - it will append number="<the "number" value of the first row in listofnumbers.csv file>" to your main search.
If you make your subsearch look like this: [|inputlookup listofnumbers.csv | fields number | format] - subsearch will append all values of "number" field in your file in this way: ( ( number="1" ) OR ( number="2" ) OR ( number="3" ) OR ... ( number="10000" ) ).
I mean to say, that your main search might be incorrect and this is the reason you get incorrect results.

Bastelhoff · ‎02-09-2019

Thanks! listofnumbers contains about 5000 values. It would work with 11000 as well, but would be cropped to 10k in this case (and therefore give incomplete results). The main search will however find more than one instance per number.

But you are right that my example here was incorrect. Due to the dedup which was placed before the first lookup it would have been reduced down to the initial value (at max). So I adjusted the code.

That said I was not aware of your workaround. I am using a multivalue field with modulo 10000 usually, so that I get 10000 entries (cap) with zero to X values, expanding it to basically endless (unless the time cap breaks it).

I will do some more tests today and hopefully have a more detailed/conclusive example during the next 12h.

Lookup command returning incorrect null values and values for another entry

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!