Knowledge Management

Data flows between index search and lookups

amitlookin
Loves-to-Learn Lots

I have a whole lot of servers data indexed for our project (index=* sourcetype=* source=) that needs to be searched based on my look up file on basis of host. I have a join query mentioned below:
| inputlookup bsl_project_host.csv | table host
| join type=left host[search index=
sourcetype=* source=* [ | inputlookup bsl_project_host.csv | table host]

It would be really nice, if some one could help me understand how exactly does the data flows in this query, specially the search statement in the query.

Labels (1)
Tags (1)
0 Karma
1 Solution

to4kawa
Ultra Champion
| inputlookup bsl_project_host.csv 
| table host
| join type=left host [search index= sourcetype= 
source= [ | inputlookup bsl_project_host.csv | table host]]

OK, 2nd subsearch result is host=A OR host=B OR ....
and search index= sourcetype= source= (host=A OR host=B OR ....) have host
at last, If bsl_project_host.csv's host and
subsearch's host are same, these are results.

I think,

index= sourcetype= source= [ | inputlookup bsl_project_host.csv | table host |format] |table host

This could be OK.

View solution in original post

0 Karma

to4kawa
Ultra Champion
| inputlookup bsl_project_host.csv 
| table host
| join type=left host [search index= sourcetype= 
source= [ | inputlookup bsl_project_host.csv | table host]]

OK, 2nd subsearch result is host=A OR host=B OR ....
and search index= sourcetype= source= (host=A OR host=B OR ....) have host
at last, If bsl_project_host.csv's host and
subsearch's host are same, these are results.

I think,

index= sourcetype= source= [ | inputlookup bsl_project_host.csv | table host |format] |table host

This could be OK.

0 Karma

amitlookin
Loves-to-Learn Lots

Thank you for the reply, i think this cleared my doubt about how the sub search flow is happening i.e. sub search result is acting like a criteria for the main search and then the results are returned.

0 Karma

DalJeanis
Legend

To be clear - 

 

1) Searches work from the farthest inside  braces, and work out, with the exception of the braces that follow an appendpipe (we'll come back to that.)

2) The subsearch in braces will return some set of information, and what set of information is defined by the verb that preceded it, if any.

3) If the braces are before the first pipe in the outside search, then the braces will return data as if the inside search had been run, then piped to the format command.  thus, if the end of the search in braces looked like this...

 

 

 

[ search index=foo  ... more search terms ... | table host ]

 

 

 

then it returns as if the search had been run like this

 

 

 

index=foo ... more search  terms ... | table host | format

 

 

 

and the output field called search contains something like this

 

 

 

( host="value1" ) OR ( host="value2" ) OR .... ( host="value99" ) ) 

 

 

 

Before the first pipe, the value of field search is what will be dropped into the outer search.

In your case, the inner search is just taking the data out of the lookup table to create the above stuff.  

The next step in your code, it runs that against the index as a subsearch.  This is the place where @woodcock 's alarms went off.  There are limits on how much data a subsearch will return, so you may get results that are wonky, depending on how long you run the search over. 

But let's assume it works. Your data comes back from that subsearch, and then it gets run through on the right side of a join.

Now, joins in Splunk are always, at their most basic level, left joins.  Every record on the left side gets matched to the first matching record on the right side.  If you have NOT told it to match every record on the right, then only the first gets matched. Then, if you've told it you only want an inner join, it will throw away unmatched records from the left side.

(These are some of the reasons we suggest people avoid join when they can.) 

Finally, what you are matching to on the far left is the same lookup table records you had at the far inside.

 

On the other hand, if you use woodcock's version, then it works this way 

All the records from any host are read.

For each record, the lookup table is accessed in memory for a match.  If it is found, then the host value from the lookup table is copied  to a new field.  Next, if the new field is not found, the record is dropped as not being wanted.

Now, if it was possible that there would be no records for any particular host, and you wanted to make sure they were there, then you could add an append at the end like this

 

| inputlookup bsl_project_host.csv append=true

 

 

And, one final thing.  after appendpipe, instead of the braces being processed first, they are processed when the data reaches that point in the search.  the entire set of records up until that point is put through the logic inside the braces, and then the output of that (subject to subsearch limitations) is added onto the end of the current set of records. 

 

 

0 Karma

woodcock
Esteemed Legend

Whenever possible, use lookup files with the lookup command like this:

index="*"
| lookup bsl_project_host.csv host OUTPUT host AS matched
| where isnotnull(matched)

amitlookin
Loves-to-Learn Lots

Thank you for the response. Can you please explain how the search is working here in the join part:
search [index= sourcetype= source= [ | inputlookup bsl_project_host.csv | table host], i mean the data flow.

Thank in advance.

0 Karma

woodcock
Esteemed Legend

It is a very inefficient and non-scalable way to do it. That is why I answered the way that I did. It should NOT be done that way.

0 Karma

amitlookin
Loves-to-Learn Lots

As a beginner i appreciate these inputs and i am going to keep them in my mind going forward. Actually my intent was to understand the data flow, guess what i now know the data flow and the best practice as well. Thank you for the replies.

0 Karma
Get Updates on the Splunk Community!

Monitoring Postgres with OpenTelemetry

Behind every business-critical application, you’ll find databases. These behind-the-scenes stores power ...

Mastering Synthetic Browser Testing: Pro Tips to Keep Your Web App Running Smoothly

To start, if you're new to synthetic monitoring, I recommend exploring this synthetic monitoring overview. In ...

Splunk Edge Processor | Popular Use Cases to Get Started with Edge Processor

Splunk Edge Processor offers more efficient, flexible data transformation – helping you reduce noise, control ...