Hi,
I have the below search query to monitor the process/instances running on our servers and the sub-search within the search is getting auto-finalized and the false alerts are triggered.
Below is the search I am having issues with:
| inputlookup process.csv
| eval count=0
| join type=left PROCESS_NAME, SERVER_NAM overwrite=true
[| search index="nix" sourcetype="ps" USER="web" host IN(server1, server1) ARGS="*process.conf"
| dedup host PID
| rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)"
| search PROCESS_NAME="*""
| rename host as SERVER_NAM
| stats count by SERVER_NAM,PROCESS_NAME]
| search count=0 NOT Maintenance="*""
| sort SERVER_NAM
| streamstats count as row
| table row,SERVER_NAM,PROCESS_NAME,DESC,Comment
As a workaround, I have created a new search for alerting purpose to avoid false alerts but that doesn't seem to work either.
New search:
| inputlookup process.csv
| eval count=0
| appendcols override=true maxtime=100 timeout=100
[| search index="nix" sourcetype="ps" USER="web" host IN(server1, server1) ARGS="*process.conf"
| dedup host PID
| rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)"
| rename host as SERVER_NAM
| stats count by SERVER_NAM,PROCESS_NAME
| sort SERVER_NAM]
| where count="0" AND isnull(Maintenance)
| fields SERVER_NAM, PROCESS_NAME, DESC, Comment
Can you please help me with this search? I've searched for related questions on Splunk Answers but nothing seemed to work.
Hi @shashank8,
Usually the join should be used with the smaller data set within the subsearch. I advise you to do one of two things.
1- Run the search for index="nix" ....
and then join the lookup to that.
2- Instead of using joins use the lookup
command to simply append the columns you want from process.csv directly to your logs.
Cheers,
David
Hi @shashank8,
Usually the join should be used with the smaller data set within the subsearch. I advise you to do one of two things.
1- Run the search for index="nix" ....
and then join the lookup to that.
2- Instead of using joins use the lookup
command to simply append the columns you want from process.csv directly to your logs.
Cheers,
David
Thank you for your suggestions and I kind of tried both of those options earlier:
1- Run the search for index="nix" .... and then join the lookup to that.
If there are no/zero results or if the process/instance is down on one of the hosts then even if I join my lookup there will be no result for the missing data and the alert won't be triggered. Hence I used the inputlookup as my main/primary search so that if there are any missing values or zero results for any host or process then the query will compare the count field value and if it is zero there will be results and an alert is triggered.
2- Instead of using joins use the lookup command to simply append the columns you want from process.csv directly to your logs.
The same thing as in the above scenario happened with my query when I tried to match the values with the lookup file using a lookup command.
In case you want all the values to show regardless whether they are in the logs or not then at the end of your search append the content of your CSV and dedup and non-empty value. All remaining empty values are the ones that didnt match in your logs so you can just run a fillnull on that. Does that make sense to you ?
Thank you so much for your help, I have appended the csv file values at the end and it worked as I tested it by removing one of the server from the query and I am able to see the results for the missing server. Will let you know how this works with in the next few days to see if it still triggers any false alerts which it should not as the sub-search is a lookup file which would load in a few seconds.
Below is the query I used:
index="nix" sourcetype="ps" USER="web" host IN(server1, server1) ARGS="*process.conf"
| dedup host PID
| rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)"
| rename host as SERVER_NAM
| stats count by SERVER_NAM,PROCESS_NAME
| appendcols override=false
[| inputlookup process.csv
| eval count=0]
| where count=0 AND isnull(Maintenance)
| fields SERVER_NAM, PROCESS_NAME, Comment, DESC
@DavidHourani
The search seemed to work for a few days but then I realized it is not giving me the correct results when one of the servers is missing.
index="nix" sourcetype="ps" USER="web" host IN(server1, server2, server3, server4) ARGS="*process.conf"
| dedup host PID
| rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)"
| rename host as SERVER_NAM
| stats count by SERVER_NAM,PROCESS_NAME
| appendcols override=false
[| inputlookup process.csv
| eval count=0]
| where count=0 AND isnull(Maintenance)
| fields SERVER_NAM, PROCESS_NAME, Comment, DESC
In this search if there are no or zero results for server1 the expected result for this query should be:
SERVER_NAM PROCESS_NAME Comment DESC
server1 process.conf DC_SERVER_1
But the result I am seeing is:
SERVER_NAM PROCESS_NAME Comment DESC
server4 process.conf DC_SERVER_1
Here is my csv lookup file content for reference:
PROCESS_NAME Comment DESC Maintenance SERVER_NAM
process.conf DC_SERVER_1 server1
process.conf DC_SERVER_2 server2
process.conf DC_SERVER_3 server3
process.conf DC_SERVER_4 server4
Can you please tell me what I am missing in my query ?
Sure, try it like this instead :
index="nix" sourcetype="ps" USER="web" host IN(server1, server2, server3, server4) ARGS="*process.conf"
| dedup host PID
| rex field=ARGS "\/conf\/(?<PROCESS_NAME>\w+.*)"
| rename host as SERVER_NAM
| stats count by SERVER_NAM,PROCESS_NAME
| append
[| inputlookup process.csv
| eval count=0]
| stats sum(count) as count by SERVER_NAM,PROCESS_NAME
Thanks David for your help,
I can see the correct results (the host that is actually missing) when a host or results are missing from the logs.
But I just need to include a few more fields from my lookup file in the output
I want to show the fields Comment, DESC and Maintenance in the end results so that users can identify what is the server description and also to use the condition
| where count=0 AND isnull(Maintenance)
| fields SERVER_NAM, PROCESS_NAME, Comment, DESC
to control/manage the alerting. When there is any ongoing maintenance then users will be setting the Maintenance field to "YES" or "Y" in the lookup file there by suppressing the alerts during the maintenance window as the where condition will not show the results as the "Maintenance" field is not null and alerts will not be triggered or sent out even though the host/process is down.
So to add extra fields make sure you include them either directly in the CSV or as eval with values set to default as follows:
index="nix" sourcetype="ps" USER="web" host IN(server1, server2, server3, server4) ARGS="*process.conf"
| dedup host PID
| rex field=ARGS "\/conf\/(?\w+.*)"
| rename host as SERVER_NAM
| stats count by SERVER_NAM,PROCESS_NAME
| append
[| inputlookup process.csv
| eval count=0 | eval extrafield1="default1", extrafield2="default2"...]
| stats sum(count) as count by SERVER_NAM,PROCESS_NAME, extrafield1, extrafield2,...
so long as the fields exist in the logs and in the csv your stats should work fine.
Thanks again for your help,
I've tried your query by tweaking it a little bit as I am actually looking for a different output because the fields Comment, DESC and Maintenance I am trying to show or populate in the output are existing only in the lookup file and not in the logs.
This is the query that is working for me, I've tested this query by forcefully removing a server name from the main/top search and I am able to see the same server name in the results with a zero count and I believe this query will not pose any auto-finalizing issues moving forward:
index="nix" sourcetype="ps" USER="web" host IN(server1, server2, server3, server4) ARGS="*process.conf"
| dedup host PID
| rex field=ARGS "\/conf\/(?\w+.*)"
| rename host as SERVER_NAM
| stats count by SERVER_NAM,PROCESS_NAME
| append
[| inputlookup process.csv
| eval count=0]
| stats sum(count) as count by SERVER_NAM,PROCESS_NAME
| lookup AEM_MCC_STL_KSC_apache_process_monitor_list.csv SERVER_NAM as SERVER_NAM, PROCESS_NAME as PROCESS_NAME OUTPUT SERVER_NAM as SERVER_NAM, PROCESS_NAME as PROCESS_NAME, Comment as Comment, DESC as DESC, Maintenance as Maintenance
| where count=0 AND isnull(Maintenance)
| fields SERVER_NAM,PROCESS_NAME, Comment, DESC
Basically I am trying to match or compare the SERVER_NAME and PROCESS_NAME fields between the logs and lookup file and if the count from the logs and lookup file matches that means the process/host is down and additionally I am trying to include the fields Comment, DESC and Maintenance for user to understand the fields and to control the alerting by setting the Maintenance field to YES during the maintenance to suppress the alerts.