Splunk Search

How to get list from inputlookup and run search on this smaller list?

mihir_hardas
Explorer

Hi All,

I have an SPL query that runs on an index and sourcetype which have millions of job names.

I want my SPL to read through a list of job names from a different query and use it as a subsearch,
OR
I have created a lookup.csv for this list of 16,000 job names and want to run my search against it.

 

How do I do that?

 

Main SPL that runs on millions of job names:

earliest=-7d index=log-13120-nonprod-c laas_appId=qbmp.prediction* "jobPredictionAnalysis" prediction lastEndDelta | table jobname, prediction_status, predicted_end_time

Below is an input lookup, freq_used_jobs_bmp_3months.csv, which is a simple two-column file:

jobName, freq_count

 

 

I tried to join the main query with this input file. I want to operate and write SPL queries on this list of jobNames only.

 

 

earliest=-7d index=log-13120-nonprod-c laas_appId=qbmp.prediction* "jobPredictionAnalysis" prediction lastEndDelta
| table jobname, prediction_status, predicted_end_time
| lookup freq_used_jobs_bmp_3months.csv jobName output freq_count
| table jobname, freq_count



The above query fails with this error:

na_prod_secure-ist-indexer-1_iapp724.randolph.ms.com-23000] Streamed search execute failed because: Error in 'lookup' command: Could not construct lookup 'freq_used_jobs_bmp_3months.csv, jobName, output, freq_count'. See search.log for more details

I removed any null rows in the file.  Still I get the same error.


Another option is to somehow combine/join the main query with a subsearch instead of a lookup file.

Main query:

earliest=-7d index=log-13120-nonprod-c laas_appId=qbmp.prediction* "jobPredictionAnalysis" prediction lastEndDelta | table jobname, prediction_status, predicted_end_time

Subsearch that will list the smaller number of jobNames used in the last 3 months:
earliest=-90d index="log-13120-prod-c" sourcetype="autosys_service_secondary:app" OR "autosys_service_primary:app" "request:JobSearch" installation="P*" NOT "*%*" | stats count as freq_count by jobName



 

Now, how do I join the above two?

 

 


PickleRick
SplunkTrust
SplunkTrust

Your first attempt might be failing simply due to a field-name case mismatch. You had jobname in your events and jobName in the lookup.

I'm not sure, however, which approach would be more effective here (you'd have to check for yourself). As I understand it, you want to return only those events from the "main search" which are included in your lookup.

One approach which you already tried is to list all the events from the main search, do a lookup to fill the destination field (in your case it would be that freq_count field) and filter to include only the results with that field filled.

But that approach has its downside - you have to process the whole huge set of results from the main search.
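A minimal sketch of this lookup-then-filter approach, using the field names from the original post (the jobName AS jobname clause handles the case mismatch, and the isnotnull filter is one way to keep only matched rows):

```spl
earliest=-7d index=log-13120-nonprod-c laas_appId=qbmp.prediction* "jobPredictionAnalysis" prediction lastEndDelta
| lookup freq_used_jobs_bmp_3months.csv jobName AS jobname OUTPUT freq_count
| where isnotnull(freq_count)
| table jobname, prediction_status, predicted_end_time, freq_count
```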

As an alternative approach you can simply use a subsearch to generate a list of jobNames

<your_search_conditions> [ | inputlookup freq_used_jobs_bmp_3months.csv | table jobName | rename jobName as jobname ] | table ...

One more thing - don't do search | table | lookup. In fact, you almost never want to do search | table | anything at all. The table command moves the processing of your events from the indexers to the search head, which kills any parallelization for the commands further down the stream. table should almost always be the last command in the chain.
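Applied to the search from the question, the subsearch approach might look like this (a sketch; field names taken from the earlier posts):

```spl
earliest=-7d index=log-13120-nonprod-c laas_appId=qbmp.prediction* "jobPredictionAnalysis" prediction lastEndDelta
    [ | inputlookup freq_used_jobs_bmp_3months.csv | table jobName | rename jobName as jobname ]
| table jobname, prediction_status, predicted_end_time
```

The subsearch expands into an OR of jobname=... conditions, so only events for those 16,000 jobs are retrieved from the indexers in the first place.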


mihir_hardas
Explorer

Trying a subsearch, and this fails:

 

 

earliest=-1d index=log-13120-nonprod-c laas_appId=qbmp.prediction* "jobPredictionAnalysis" prediction lastEndDelta
| join jobname [search earliest=-90d index="log-13120-prod-c" sourcetype="autosys_service_secondary:app" OR "autosys_service_primary:app" "request:JobSearch" installation="P*" NOT "*%*" | stats count by jobName | table jobName, freq_count]
| table jobname, jobName, freq_count

Please note that jobname is the outer query's field, but jobName is the subsearch's field name. Both these searches run successfully in two different windows.


PickleRick
SplunkTrust
SplunkTrust

No. Don't use subsearches for stuff like this. Remember that subsearches have a limit on the number of returned results (50k by default) and a limited execution time (60 seconds? I don't remember). And what's most annoying about them is that they fail silently if they hit any of those limits. So an inputlookup with a predictable number of results is a relatively good candidate for a subsearch. A complicated search with a long execution time and many returned results is not.

Anyway, your subsearch has one mistake (you do stats count and then want to table a non-existent field; I assume it's a mistake in re-typing the search here) and one very ugly condition - the NOT "*%*" part. It makes Splunk go and read all your events from the last 90 days to check whether they match or not; it can't use its indexes. So the search will most definitely time out and - as I wrote above - fail silently, producing incomplete results if any at all.
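For reference, the re-typed subsearch could be corrected along these lines; it remains subject to the subsearch limits and the slow NOT "*%*" condition described above. (I've also repeated sourcetype= on both sides of the OR, which the original condition probably intended.)

```spl
earliest=-90d index="log-13120-prod-c" sourcetype="autosys_service_secondary:app" OR sourcetype="autosys_service_primary:app" "request:JobSearch" installation="P*" NOT "*%*"
| stats count as freq_count by jobName
| table jobName, freq_count
```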


mihir_hardas
Explorer

The below works, slowly. It results in a correct count of approximately 16,300 records:

earliest=-1d index=log-13120-nonprod-c laas_appId=qbmp.prediction* "jobPredictionAnalysis" prediction lastEndDelta
| join
    [ | inputlookup freq_used_jobs_bmp_3months.csv ]
| stats count, values(freq_count) by jobname

 

 

But can I do the reverse, like this?

My logic: only the inputlookup list of jobs will be considered, and then the other columns will be matched from the index and laas_appId. Will this be optimized?

Rather than iterating through millions of job names from the index, how do I have only those 16,500 job names iterated?

 

| inputlookup freq_used_jobs_bmp_3months.csv
| join
    [ search earliest=-1d index=log-13120-nonprod-c laas_appId=qbmp.prediction* "jobPredictionAnalysis" prediction lastEndDelta ]
| stats count, values(freq_count) by jobname


mihir_hardas
Explorer

What is the workaround ?

 

To put the problem statement generically:

 

I want Splunk queries and data analysis to run on a smaller list of jobs (around 20 thousand in number). I have that list as a CSV and created a lookup.

 

I now want SPLs to read this pool of jobs ONLY for analysis.


PickleRick
SplunkTrust
SplunkTrust

As you already have the lookup, just use the contents of this lookup as a subsearch. Don't use join!

 

mihir_hardas
Explorer


| lookup freq_used_jobs_bmp_3months.csv jobname output freq_count ... this does NOT work!

Only the below SPL works!

 

 

index=log-13120-nonprod-c laas_appId=qbmp.prediction* sourcetype="qbmp.prediction_analyzer:app" "jobPredictionAnalysis" "prediction" "lastEndDelta"
| eval accuracy_category = case( abs(lastEndDelta) <= 600, 10, (abs(lastEndDelta) > 600 and abs(lastEndDelta) <= 1200), 20, (abs(lastEndDelta) > 1200 and abs(lastEndDelta) <= 1800), 30, 1==1, 40)
| eval timeDistance_category = case(timeDistance <= 3600, 1, (timeDistance>3600 and timeDistance<=7200), 2, (timeDistance>7200 and timeDistance<=10800), 3, 1==1, 4)
| where instance like "P%" and timeDistance_category<=3
| join
    [ inputlookup freq_used_jobs_bmp_3months.csv ]
| chart count(predicted_end_time) by timeDistance_category, accuracy_category
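Following the advice elsewhere in this thread, the join here could be replaced with a subsearch so that only the lookup's jobs are pulled from the index at all. A sketch, assuming the lookup field jobName corresponds to the event field jobname (the case() clauses are simplified because case evaluates conditions in order, so the lower-bound checks are redundant):

```spl
index=log-13120-nonprod-c laas_appId=qbmp.prediction* sourcetype="qbmp.prediction_analyzer:app" "jobPredictionAnalysis" "prediction" "lastEndDelta"
    [ | inputlookup freq_used_jobs_bmp_3months.csv | table jobName | rename jobName as jobname ]
| eval accuracy_category = case(abs(lastEndDelta) <= 600, 10, abs(lastEndDelta) <= 1200, 20, abs(lastEndDelta) <= 1800, 30, 1==1, 40)
| eval timeDistance_category = case(timeDistance <= 3600, 1, timeDistance <= 7200, 2, timeDistance <= 10800, 3, 1==1, 4)
| where instance like "P%" AND timeDistance_category <= 3
| chart count(predicted_end_time) by timeDistance_category, accuracy_category
```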


mihir_hardas
Explorer

And there has to be a join.

 

Without a join, how will my large number of events (in the index) match, like an equijoin, with the list of 17K jobs in the lookup file?


mihir_hardas
Explorer

More generically:
(a lookup file that has one column of 2,000 job names)
(join this to a search such that the execution is for the 2,000 jobs above, and not the millions of jobs and other fields on the index)


PickleRick
SplunkTrust
SplunkTrust

If you want to use a lookup to generate a set of conditions, you can use a subsearch (keep in mind all the restrictions regarding subsearches).

So if you have a lookup jobs that has a field called ID, and you want to find only those events whose jobid field corresponds to that field, you can construct your search like this:

<your_base_search> [ | inputlookup jobs | table ID | rename ID as jobid ]

But this works relatively well for a "reasonable" number of entries in the lookup. I have no idea how (in)effective it will be with several thousand conditions.
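For illustration, if the jobs lookup held three hypothetical ID values J1, J2, J3, Splunk would expand the subsearch into an OR of field conditions, roughly:

```spl
<your_base_search> ( ( jobid="J1" ) OR ( jobid="J2" ) OR ( jobid="J3" ) )
```

You can inspect the actual expanded form of any subsearch in the search Job Inspector.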
