Splunk Search

Lookup command returning incorrect null values and values for another entry

Bastelhoff
Path Finder

I encountered a very weird behaviour. This has now also been reported as bug.

Update: I did manage to create some fake data now and further isolated the issue. I cannot attach the files nor links. So you'd need to create them on your own.

So first run this to create some fake logs:

| makeresults | eval id=1 | eval number= 816341959
| append [|makeresults | eval id=1 | eval number=816295885]
| fields - _time
| outputlookup testlog.csv

And now run this to create a lookup:
| makeresults | eval color="Purple" | eval number=816295885 | fields - _time
| outputlookup testlookup.csv

Now that we got the files, run this:

|inputlookup testlog.csv
| eval number=mvindex(number,0,0)
| lookup testlookup.csv number output color as color1
| eval mydump=number
| eval mydump2=color
| eventstats dc(test.id) as ids by number
| lookup testlookup.csv number output color as color2
| search number=816295885

Result:
color1 is null
color2 is "Purple" as it should be

Things you can play around with:
- Remove the mvindex -> it will populate color1 correctly
- Remove the eventstats -> it will populate color1 correctly (even though there is nothing which affects this field at that point)
- in the testlog.csv switch around the two values -> it will populate color1 correctly
- switch the value which is NOT looked up (816341959) to 100 -> it will populate color1 correctly (from 816295885)
- now switch the value which is NOT looked up to 916341959 -> it will populate color1 correctly (from 816295885)

This basically means that with numbers in this range of ~816295885 splunk becomes unreliable, if it is at a location of a similar lookup, which is extremely bad.

You can even do this:
| makeresults | eval number= 816295885
| append [ makeresults | eval number=816341959]
| table number
| lookup testlookup.csv number output color

Even though only 816295885 is in the lookup, the color will also be found for 816341959, which is even more extremely bad I guess.

Splunk Version 7.2 was used in this case. Don't know if this is version specific.

0 Karma

woodcock
Esteemed Legend

The only version of the 7.2.* stream of Splunk that is production-stable is the newly released v7.2.4. No previous versions of 7.2.* should be used in production due to devastating bugs like this one.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

I ran your three test searches, and get both color and color2 as Purple, Splunk 7.2.3.
Ran the final search, only get one row with Purple as it should be.

0 Karma

Bastelhoff
Path Finder

Tried it in 7.1.2 now which is working as well.
7.2.1 however is broken.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Update then 🙂

0 Karma

woodcock
Esteemed Legend

There are definitely strange bugs when you recycle field names using multi-valued fields and the mv* commands. Also, field names with spaces or periods are evil. Try this:

|inputlookup faketestlogs.csv
| eval test_number=mvindex('test.number',0)
| fields - $test.number$
| lookup fakedictionary.csv test_number output color
| eval mydump=test_number
| eval mydump2=color
| eventstats dc(test.id) AS num_ids BY test_number
| lookup fakedictionary.csv test_number output color AS color2
| search test_number=500

Also, it is poor form (and produces warnings) to use lookup files directly. You should create a lookup definition and use that instead.

0 Karma

Bastelhoff
Path Finder

I did these changes (including cutting one ,0) but it did not change anything.
I also tried both | fields - $test.number$ (which didnt remove the field) and | fields - test.number, which did remove the field, but it did not have any impact on the final result.

ofc I still had to add the old name in the lookup as this is the name of the field there.
So it was in this example | lookup fakedictionary.csv test.number as test_number output color

Whether the lookup definition or the csv files are used directly makes no difference in this case. But as lookup definitions can have some additional rules associated with them I decided to go with the .csv directly in this example to exclude additional sources of the problem.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

So... removing the eventstats from your search changes the values of color? That feels weird.
Do post the values of optimizedSearch from the job inspector for both cases.

This would be much easier if you had shareable reproducing data. Can't you anonymize your data?

0 Karma

Bastelhoff
Path Finder

I checked the job inspector regarding the optimized search.
In the case where it does not work it changes the following:
| search test.number=500 becomes | search "test.number"=500
and this line goes one line up before the second lookup. Also both evals are put together.
No other changes besides of adding quotation marks around the first file name and "test.number" as shown above.

In the case where I removed the eventstats the | search "test.number"=500 goes up all the way until it is right below the mvindex line. No other differences to the other one (besides of the missing line now obviously)

So far I failed to create fake data which worked.
The fields are identical though:
test.number - has same name structure with dot, and is a number (and in these cases not a multi value field. However it can happen that due to issues it becomes a multivalue field, hence the mvindex to remove duplicate entries)
test.id - also same name structure with dot and is a number
color - has same name structure and is a string

I can replicate it with an initial csv as input which has ~17k entries
The real dictionary uses identical names (hence test.number) and has ~3500 entries

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Do you have a reproducing data set to share?

0 Karma

Bastelhoff
Path Finder

Unfortunately I cannot share the original data and I am unable to reproduce fake data which comes to the same result. I just updated the issue description though.

0 Karma

pkarpushin
Path Finder

Hi @Bastelhoff ,

Could you please explain what is your desire with putting [|inputlookup listofnumbers.csv | fields number] to the beginning of your initial search?
If you say that listofnumbers.csv contains 10000 values, this subsearch[|inputlookup listofnumbers.csv | fields number] will return nothing.
If you make your subsearch look like this: [|inputlookup listofnumbers.csv | fields number | return number] - it will append number="<the "number" value of the first row in listofnumbers.csv file>" to your main search.
If you make your subsearch look like this: [|inputlookup listofnumbers.csv | fields number | format] - subsearch will append all values of "number" field in your file in this way: ( ( number="1" ) OR ( number="2" ) OR ( number="3" ) OR ... ( number="10000" ) ).
I mean to say, that your main search might be incorrect and this is the reason you get incorrect results.

0 Karma

Bastelhoff
Path Finder

Thanks! listofnumbers contains about 5000 values. It would work with 11000 as well, but would be cropped to 10k in this case (and therefore give incomplete results). The main search will however find more than one instance per number.

But you are right that my example here was incorrect. Due to the dedup which was placed before the first lookup it would have been reduced down to the initial value (at max). So I adjusted the code.

That said I was not aware of your workaround. I am using a multivalue field with modulo 10000 usually, so that I get 10000 entries (cap) with zero to X values, expanding it to basically endless (unless the time cap breaks it).

I will do some more tests today and hopefully have a more detailed/conclusive example during the next 12h.

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...