Splunk Search

Why does lookup return null when there are multiple matches?

yuanliu
SplunkTrust
SplunkTrust

I have a lookup with multiple columns (keys).  Some combinations make a unique match, but I need an ambiguous search on a single key to return all matched items of a particular field.  In a simplified form, the lookup is like this

QID IP Detected
12345 127.0.0.1 2022-12-10
45678 127.0.0.1 2023-01-21
12345 127.0.0.2 2023-01-01
45678 127.0.0.2 2022-12-15
23456 ... ...

QID and IP determines a unique Detected value; you can say the combination is a primary key.  No problem with search by primary key.  My requirement is to search by QID alone.  For 12345, for example, I expect the return to be multivalued (2022-12-10, 2023-01-01).

If I hard code QID in an emulation, that's exactly what I get.

 

 

| makeresults
| eval QID=12345
| lookup mylookup QID
| table QID Detected

 

 

This will give me

QID Detected
12345

2022-12-10
2023-01-01

But if use the same lookup in a search, e.g.,

 

index=myindex QID=12345
| stats count by QID ``` result is the same whether or not stats precedes lookup ```
| lookup mylookup QID
| table QID Detected

 

the result is blank

QID Detected
12345

 

The behavior can be more complex if the search returns more than one QID (e.g., QID IN (12345, 45678)).  Sometimes one of them will get Detected populated, but not others.

How can I make sure multiple matches are all returned?

Labels (1)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

It looks like it's related to the size of the lookup, which exceeds max_memtable_bytes (default 25MB) and there is a memory issue, which can give somewhat random results.

When a lookup exceeds that value, Splunk uses an indexing mechanism to access the lookup, but it seems that does not work properly.

If you increase max_memtable_bytes in limits.conf to a size larger than your lookup, the problem goes away, e.g. doubling it to 50MB
 

 

[lookup]
max_memtable_bytes = 52428800

 

 

 

View solution in original post

Tags (1)

bowesmana
SplunkTrust
SplunkTrust

It looks like it's related to the size of the lookup, which exceeds max_memtable_bytes (default 25MB) and there is a memory issue, which can give somewhat random results.

When a lookup exceeds that value, Splunk uses an indexing mechanism to access the lookup, but it seems that does not work properly.

If you increase max_memtable_bytes in limits.conf to a size larger than your lookup, the problem goes away, e.g. doubling it to 50MB
 

 

[lookup]
max_memtable_bytes = 52428800

 

 

 

Tags (1)

yuanliu
SplunkTrust
SplunkTrust

Thank you so much, @bowesmana!  This had been bugging me for years.  Given the unpredictability, I should have suspected a memory-related problem.

So, there seems to be two separate problems.  One is that index mechanism (mylookup.csv_XXXXX.idx) is not performing as designed.  The second problem is the lack of any indication in splunkd.log that suggests max_memtable_bytes might be exceeded.

bowesmana
SplunkTrust
SplunkTrust

As your makeresults example shows, you should get both results, so it points to something different about the true value of QID in the index=myindex case.

If you add the following after stats count for a bit of diagnostics

| eval len=len(QID)
| eval Q=12345
| lookup mylookup QID OUTPUT Detected as D_1
| lookup mylookup QID as Q OUTPUT Detected as D_2
| eval match=if(QID=Q, "true", "false")
| table count QID len Q match D_*

 what do you get?

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Thanks for the diagnostic sequence, @bowesmana.  As I try to implement, I realized two things.  First, I didn't record the real-world QID in the original problem statement, even if the statement was precise.  Now I have doubts about the precision because if I filter down to a single QID, I haven't found one that can trigger a blank return.

Second, I notice that even with makeresults, I can construct a group of QIDs to trigger this behavior. (The behavior seems to be the same as the group of QIDs appear in an index search.)  For example,

 

| makeresults
| fields - _time
| eval QID = split("257210,257212,257100", ",")
| mvexpand QID
| lookup mylookup QID output Detected

 

and

 

index=myindex QID IN ("257210","257212","257100")
| stats count by QID
| lookup mylookup QID output Detected

 

both return something like

QIDDetected
257210 
257212 
2571002022-12-10
2023-01-01
...

Within this group, which one returns null and which real values do not change no matter the order of QIDs.  If I search individual ones, each of them has non-null return.

In short, I couldn't reproduce a difference between simulated QID and real-world QID.  The difference seems to be whether a group of events contain different QIDs or just a single QID.

How can I diagnose this further?

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Is it KV store or CSV - if CSV is mylookup a definition or the CSV directly. If the definition, remove the layer of abstraction and try CSV. If KV store, try writing to the CSV and trying CSV.

Try making it a wildcard(QID) in the definition and adding for a couple of the failing examples, the ** around the QID.

Is it always the case that one of the QID Detected outputs is MV, as in your example? If so, disable one of the entries in the lookup. If it's a definition, set it to return 1 max result to see if that then causes the others to populate.

Just stabbing in the dark here to see what gives...

 

yuanliu
SplunkTrust
SplunkTrust

This is CSV with definition.  I didn't even know that you can use CSV directly.  So I tried.  It made no difference.  I then redefined QID's match type to wildcard.  It also made no difference; if I surround any test value with **, no value can return because the entries themselves do not have wildcard characters.

The third test, unfortunately also doesn't give me additional info.  I use the following as a basic test

 

| inputlookup mylookup
| stats count by QID
| lookup mylookup QID
| where isnull(Detected)

 

Somewhat to my surprise, there are many null outputs with count of 1. (Also, because how many QIDs are in search events is known to make a difference in which QIDs return null, I cannot say that the above test is definitive.  It only shows that it is possible for single-entries to return null.)

0 Karma

bowesmana
SplunkTrust
SplunkTrust

For the wildcard definition, it is the CSV version that needs the **, not the data itself, so the lookup should contain

*257210*

and the data has 257210 - just in case there was something odd.

Can you DM me your lookup - as I can't reproduce it

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...