Splunk Search

Fields from lookup with Join missing

cdevoe57
Explorer

I have a query that detects missing systems.  the lookup table has fields System, Location, responsible.

I am trying to get the location and responsible to show in the end result.  It appears the join is losing those values.   Is there a way to get those values to the final result?

| inputlookup system_info.csv
| eval System_Name=System
| table System_Name
| join type=left Sensor_Name [| search index=servers sourcetype=logs
    | stats latest(_time) as Time by System_Name
    | eval mytime=strftime(Time,"%Y-%m-%dT%H:%M:%S")
    | sort Time asc | eval now_time = now()
    | eval last_seen_ago_in_seconds = now_time - Time
    | sort -last_seen_ago_in_seconds ]
| stats values(*) as * by System
_Name
| eval MISSING = if(isnull(last_seen_ago_in_seconds) OR last_seen_ago_in_seconds>7200,"MISSING","GOOD")
| where MISSING=="MISSING"
| table System
_Name Location Responsible MISSING

Labels (2)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

As @bowesmana points out, join is not the correct approach.  @livehybrid gives a logic that implements your requirements.  But the implementation inherits some convoluted logic in your original attempt. (The use of tstats requires that field System_Name is the same as host or is otherwise extracted at index time.  But your original SPL seems to imply the opposite.  I will not make such assumption below.)

Your original expression | eval MISSING = if(isnull(last_seen_ago_in_seconds) OR last_seen_ago_in_seconds>7200,"MISSING","GOOD") really just says a System contained in system_info.csv is MISSING if

  1. no matching System_Name appears in sourcetype log, or
  2. a matching System_name appears in sourcetype log but is more than 2 hours ago.

Is this correct?  There should be no need to even evaluate last_seen_ago_in_seconds if you simply filter search with earliest=-2h which is more efficient, too.  Additionally, the field MISSING is unnecessary in the table because it will always have value "MISSING" according to your logic.

Here is a much simplified logic:

index=servers sourcetype=logs earliest=-2h
| stats latest(_time) as Time by System_Name sourcetype
| append
    [inputlookup system_info.csv
    | fields System Location Responsible
``` ^^^ only necessary if there are more than these three fields ```
    | rename System as System_Name]
| stats values(sourcetype) as _last2hours values(Location) as Location
  values(Responsible) as Responsible by System_Name
| where isnull(_last2hours)

 

0 Karma

livehybrid
Super Champion

Hi @cdevoe57 

If you want to use the lookup as a source of truth for the list of hosts I would use the following, also just a note that I'm suggesting tstats here which is *much* more performant than a regular index= search.

| tstats latest(_time) as _time WHERE index=servers sourcetype=logs by host
| eval last_seen_ago_in_seconds = now() - _time
| eval System_Name = host
| append [|inputlookup system_info.csv | eval last_seen_ago_in_seconds=9999]
| stats min(last_seen_ago_in_seconds) as last_seen_ago_in_seconds, values(Location) AS Location, values(Responsible) AS Responsible by System_Name
| eval MISSING = if(isnull(last_seen_ago_in_seconds) OR last_seen_ago_in_seconds>7200, "MISSING", "GOOD")
| where MISSING=="MISSING"
| sort -last_seen_ago_in_seconds

This works by appending the system_info.csv with a large last_seen_ago_in_seconds which is updated by a lower last_seen_ago_in_seconds value if the host has been found.

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

0 Karma

Prewin27
Contributor

@cdevoe57  As mentioned by @bowesmana  Its not best to use join, as it can sometimes cause fields from lookup to be lost. but anyway can you try below if you still want to use join,

| inputlookup system_info.csv
| eval System_Name=System
| join type=left System_Name [
| search index=servers sourcetype=logs
| stats latest(_time) as Time by System_Name
| eval mytime=strftime(Time,"%Y-%m-%dT%H:%M:%S")
| eval now_time = now()
| eval last_seen_ago_in_seconds = now_time - Time
]
| stats values(*) as * by System_Name
| lookup system_info.csv System_Name OUTPUT Location Responsible
| eval MISSING = if(isnull(last_seen_ago_in_seconds) OR last_seen_ago_in_seconds>7200,"MISSING","GOOD")
| where MISSING=="MISSING"
| table System_Name Location Responsible MISSING

or you can also try and check below, without join.


index=servers sourcetype=logs
| stats latest(_time) as Time by System_Name
| eval last_seen_ago_in_seconds = now() - Time
| eval MISSING = if(isnull(last_seen_ago_in_seconds) OR last_seen_ago_in_seconds>7200, "MISSING", "GOOD")
| where MISSING=="MISSING"
| lookup system_info.csv System_Name OUTPUT Location Responsible
| table System_Name Location Responsible MISSING last_seen_ago_in_seconds
| sort -last_seen_ago_in_seconds

Regards,
Prewin
Splunk Enthusiast | Always happy to help! If this answer helped you, please consider marking it as the solution or giving a kudos/Karma. Thanks!

cdevoe57
Explorer

Thanks for the input.  In fact, your first solution is what I ended up doing.  That one works.

The second solution does not work.   The query doesn't have the list of all systems when it calculates the missing

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Firstly, join is not a good way to do things in Splunk - it has limitations and can almost always be avoided by using stats and your search pattern is not how you would combine search and lookup elements.

You are discarding Location and Responsible from the lookup because of your table statements line 3.

Anyway, try this (I removed the unused elements in your search and the double sort in your join).

index=servers sourcetype=logs 
| stats latest(_time) as Time by System_Name 
``` Calculate time differences ```
| eval last_seen_ago_in_seconds = now() - Time 
| eval MISSING = if(isnull(last_seen_ago_in_seconds) OR last_seen_ago_in_seconds>7200,"MISSING","GOOD") 
| where MISSING=="MISSING" 
``` Find the Location and Responsible ```
| lookup system_info.csv System as System_Name 

``` Now render the results ```
| table System_Name Location Responsible MISSING
| sort -last_seen_ago_in_seconds

 

 

0 Karma

cdevoe57
Explorer

I had tried your method before

Apparently I screwed up the syntax


The | lookup system_info.csv System as System_Name line was failing.

0 Karma

cdevoe57
Explorer

So let me start by saying I've been struggling with these lookup commands.

My objective here is to use the lookup as it contains all known servers to find the servers that are not logging.  Including those that have yet to log.  This modified query now gives me the other fields.   However, the results are wrong.

In the ned I need to  get the list of servers in the lookup that are not in the query results (index=servers sourcetype=logs)

0 Karma

bowesmana
SplunkTrust
SplunkTrust

So, this is a common pattern and the solution is to work your logic like this

```search indexes to get list of servers```
...
| stats max(_time_) as latest count by System
| rename System_Name as System
``` At this point all Systems found will have a count > 0 ```

``` So now add in your control group to the end of the list ```
| inputlookup append=t system_info.csv
``` Now this will "join" the two possible sets of 'System' together ```
| stats values(*) as * max(count) as count by System
``` And any of those with count = 0 are those that came from your
    lookup control and this gives you the ones not found in your data ```
| where count=0

The final where clause will cause only the missing items to show, but you can of course do what you need there with any time calculations based on the latest value from the top search.

You can if you want use the lookup as a subsearch contstraint on the outer search so that it finds ONLY those in the lookup, as opposed to all systems, e.g.

index=servers sourcetype=logs [ 
  | inputlookup system_info.csv 
  | fields System 
  | rename System as System_Name
]
...

Note that this assumes your data contains a field called System_Name, but your field in the lookup is System.

 

0 Karma
Get Updates on the Splunk Community!

Splunk Answers Content Calendar, July Edition I

Hello Community! Welcome to another month of Community Content Calendar series! For the month of July, we will ...

Secure Your Future: Mastering Upgrade Readiness for Splunk 10

Spotlight: The Splunk Health Assistant Add-On  The Splunk Health Assistant Add-On is your ultimate companion ...

Observability Unlocked: Kubernetes & Cloud Monitoring with Splunk IM

Ready to master Kubernetes and cloud monitoring like the pros? Join Splunk’s Growth Engineering team on ...