Splunk Search
Highlighted

Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

Engager

Hi,

I have the below search query to monitor the process/instances running on our servers and the sub-search within the search is getting auto-finalized and the false alerts are triggered.

Below is the search I am having issues with:

| inputlookup process.csv
| eval count=0
| join type=left PROCESS_NAME, SERVER_NAM overwrite=true 
[| search index="nix" sourcetype="ps" USER="web" host IN(server1, server1) ARGS="*process.conf" 
| dedup host PID  
| rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)" 
| search PROCESS_NAME="*"" 
| rename host as SERVER_NAM 
| stats count by SERVER_NAM,PROCESS_NAME] 
| search count=0 NOT Maintenance="*"" 
| sort SERVER_NAM 
| streamstats count as row 
| table row,SERVER_NAM,PROCESS_NAME,DESC,Comment

As a workaround, I have created a new search for alerting purpose to avoid false alerts but that doesn't seem to work either.

New search:

| inputlookup process.csv 
| eval count=0 
| appendcols override=true maxtime=100 timeout=100 
[| search index="nix" sourcetype="ps" USER="web" host IN(server1, server1) ARGS="*process.conf" 
| dedup host PID  
| rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)" 
| rename host as SERVER_NAM 
| stats count by SERVER_NAM,PROCESS_NAME
| sort SERVER_NAM] 
| where count="0" AND isnull(Maintenance)  
| fields SERVER_NAM, PROCESS_NAME, DESC, Comment

Can you please help me with this search? I've searched for related questions on Splunk Answers but nothing seemed to work.

0 Karma
Highlighted

Re: Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

SplunkTrust
SplunkTrust

Hi @shashank8,

Usually the join should be used with the smaller data set within the subsearch. I advise you to do one of two things.

1- Run the search for index="nix" .... and then join the lookup to that.

2- Instead of using joins use the lookup command to simply append the columns you want from process.csv directly to your logs.

Cheers,
David

View solution in original post

Highlighted

Re: Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

Engager

Thank you for your suggestions and I kind of tried both of those options earlier:

1- Run the search for index="nix" .... and then join the lookup to that.
If there are no/zero results or if the process/instance is down on one of the hosts then even if I join my lookup there will be no result for the missing data and the alert won't be triggered. Hence I used the inputlookup as my main/primary search so that if there are any missing values or zero results for any host or process then the query will compare the count field value and if it is zero there will be results and an alert is triggered.

2- Instead of using joins use the lookup command to simply append the columns you want from process.csv directly to your logs.
The same thing as in the above scenario happened with my query when I tried to match the values with the lookup file using a lookup command.

0 Karma
Highlighted

Re: Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

SplunkTrust
SplunkTrust

In case you want all the values to show regardless whether they are in the logs or not then at the end of your search append the content of your CSV and dedup and non-empty value. All remaining empty values are the ones that didnt match in your logs so you can just run a fillnull on that. Does that make sense to you ?

0 Karma
Highlighted

Re: Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

Engager

Thank you so much for your help, I have appended the csv file values at the end and it worked as I tested it by removing one of the server from the query and I am able to see the results for the missing server. Will let you know how this works with in the next few days to see if it still triggers any false alerts which it should not as the sub-search is a lookup file which would load in a few seconds.

Below is the query I used:

index="nix" sourcetype="ps" USER="web" host IN(server1, server1) ARGS="*process.conf" 
 | dedup host PID  
 | rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)" 
 | rename host as SERVER_NAM 
 | stats count by SERVER_NAM,PROCESS_NAME
 | appendcols override=false 
 [| inputlookup process.csv
 | eval count=0]
 | where count=0 AND isnull(Maintenance)
 | fields SERVER_NAM, PROCESS_NAME, Comment, DESC
0 Karma
Highlighted

Re: Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

Engager

@DavidHourani

The search seemed to work for a few days but then I realized it is not giving me the correct results when one of the servers is missing.

index="nix" sourcetype="ps" USER="web" host IN(server1, server2, server3, server4) ARGS="*process.conf" 
  | dedup host PID  
  | rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)" 
  | rename host as SERVER_NAM 
  | stats count by SERVER_NAM,PROCESS_NAME
  | appendcols override=false 
  [| inputlookup process.csv
  | eval count=0]
  | where count=0 AND isnull(Maintenance)
  | fields SERVER_NAM, PROCESS_NAME, Comment, DESC

In this search if there are no or zero results for server1 the expected result for this query should be:

SERVER_NAM  PROCESS_NAME    Comment DESC
server1                   process.conf                                 DC_SERVER_1

But the result I am seeing is:

SERVER_NAM  PROCESS_NAME    Comment DESC
server4                   process.conf                                 DC_SERVER_1

Here is my csv lookup file content for reference:

PROCESS_NAME    Comment       DESC           Maintenance     SERVER_NAM
process.conf                                DC_SERVER_1                              server1
process.conf                                DC_SERVER_2                      server2
process.conf                                DC_SERVER_3                      server3
process.conf                                DC_SERVER_4                      server4

Can you please tell me what I am missing in my query ?

0 Karma
Highlighted

Re: Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

SplunkTrust
SplunkTrust

Sure, try it like this instead :

     index="nix" sourcetype="ps" USER="web" host IN(server1, server2, server3, server4) ARGS="*process.conf" 
      | dedup host PID  
      | rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)" 
      | rename host as SERVER_NAM 
      | stats count by SERVER_NAM,PROCESS_NAME
      | append 
      [| inputlookup process.csv
      | eval count=0]
      | stats sum(count) as count by SERVER_NAM,PROCESS_NAME
0 Karma
Highlighted

Re: Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

Engager

Thanks David for your help,

I can see the correct results (the host that is actually missing) when a host or results are missing from the logs.

But I just need to include a few more fields from my lookup file in the output

I want to show the fields Comment, DESC and Maintenance in the end results so that users can identify what is the server description and also to use the condition

   | where count=0 AND isnull(Maintenance)
   | fields SERVER_NAM, PROCESS_NAME, Comment, DESC

to control/manage the alerting. When there is any ongoing maintenance then users will be setting the Maintenance field to "YES" or "Y" in the lookup file there by suppressing the alerts during the maintenance window as the where condition will not show the results as the "Maintenance" field is not null and alerts will not be triggered or sent out even though the host/process is down.

0 Karma
Highlighted

Re: Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

SplunkTrust
SplunkTrust

So to add extra fields make sure you include them either directly in the CSV or as eval with values set to default as follows:

 index="nix" sourcetype="ps" USER="web" host IN(server1, server2, server3, server4) ARGS="*process.conf" 
   | dedup host PID  
   | rex field=ARGS "\/conf\/(?\w+.*)" 
   | rename host as SERVER_NAM 
   | stats count by SERVER_NAM,PROCESS_NAME
   | append 
   [| inputlookup process.csv
   | eval count=0 | eval extrafield1="default1", extrafield2="default2"...]
   | stats sum(count) as count by SERVER_NAM,PROCESS_NAME, extrafield1, extrafield2,...

so long as the fields exist in the logs and in the csv your stats should work fine.

0 Karma
Highlighted

Re: Why is sub-search getting auto-finalized after 60 seconds and triggering false alerts

Engager

Thanks again for your help,

I've tried your query by tweaking it a little bit as I am actually looking for a different output because the fields Comment, DESC and Maintenance I am trying to show or populate in the output are existing only in the lookup file and not in the logs.

This is the query that is working for me, I've tested this query by forcefully removing a server name from the main/top search and I am able to see the same server name in the results with a zero count and I believe this query will not pose any auto-finalizing issues moving forward:

index="nix" sourcetype="ps" USER="web" host IN(server1, server2, server3, server4) ARGS="*process.conf" 
| dedup host PID  
| rex field=ARGS "\/conf\/(?\w+.*)" 
| rename host as SERVER_NAM 
| stats count by SERVER_NAM,PROCESS_NAME
| append 
[| inputlookup process.csv
| eval count=0]
| stats sum(count) as count by SERVER_NAM,PROCESS_NAME
| lookup AEM_MCC_STL_KSC_apache_process_monitor_list.csv SERVER_NAM as SERVER_NAM, PROCESS_NAME as PROCESS_NAME OUTPUT SERVER_NAM as SERVER_NAM, PROCESS_NAME as PROCESS_NAME, Comment as Comment, DESC as DESC, Maintenance as Maintenance
| where count=0 AND isnull(Maintenance) 
| fields SERVER_NAM,PROCESS_NAME, Comment, DESC

Basically I am trying to match or compare the SERVERNAME and PROCESSNAME fields between the logs and lookup file and if the count from the logs and lookup file matches that means the process/host is down and additionally I am trying to include the fields Comment, DESC and Maintenance for user to understand the fields and to control the alerting by setting the Maintenance field to YES during the maintenance to suppress the alerts.

0 Karma