Splunk Search

How to remove null values then add fields?

user33
Path Finder

Hi all, would love help with this one. 

I currently have a query where I have 4 different processing times by sessionId. I want the ability to remove/ delete any sessionId from the results that has a blank/ null value. If any one of the four processing times,  has a blank or null value, remove the sessionId from the stats. 

After that, I would like the ability to add those four processing times into one processing time by _time and take the perc95. 

Any assistance is appreciated. Let me know if more clarification is needed. Thank you!!Splunk75.png

index= [...]
| bucket _time span=1h
| eval apiIdentifier=coalesce(msg.apiIdentifier,apiIdentifier)
| eval apiName=coalesce(msg.apiName,apiName)
| eval apiVersion=coalesce(msg.apiVersion,apiVersion)
| eval clientRequestId=coalesce(msg.clientRequestId,clientRequestId)
| eval companyId=coalesce(msg.companyId,companyId)
| eval contentType=coalesce(msg.contentType,contentType)
| eval datacenter=coalesce(msg.datacenter,datacenter)
| eval entityId=coalesce(msg.entityId,entityId)
| eval logType=coalesce(msg.logType,logType)
| eval processingTime=coalesce(msg.processingTime,processingTime)
| eval responseCode=coalesce(msg.responseCode,responseCode)
| eval serverId=coalesce(msg.serverId,serverId)
| eval sessionId=coalesce(msg.sessionId,sessionId)
| eval timestamp=coalesce(msg.timestamp,timestamp)
| eval totalResponseTime=coalesce(msg.totalResponseTime,totalResponseTime)
| eval session-id=coalesce(a_session_id, sessionId)
| eval AM2JSRT = if(a_log_type=="Response" AND isNum(a_req_process_time), a_req_process_time,0) ,JS2ISRT = if(logType=="JS2IS", processingTime, 0), JS2AMRT = if(logType=="JS2AM", processingTime, 0), AM2DPRT = if(a_log_type=="Response" AND isNum(a_res_process_time), a_res_process_time,0)
| stats SUM(AM2JSRT) as AM2JSRespTime, SUM(JS2ISRT) as JS2ISRespTime, SUM(JS2AMRT) as JS2AMRespTime, SUM(AM2DPRT) as AM2DPRespTime by sessionId
| eval gw_processingTime=(AM2JSRespTime+JS2ISRespTime+JS2AMRespTime+AM2DPRespTime

 

Labels (2)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

That is why I asked about your desire to include time but didn't have _time in your stats.  The command you are looking for is eventstats.  

index=* (sourcetype="*" OR sourcetype="*" OR sourcetype="*") ("Response" OR "IS2JS")
| eval session_id= coalesce(a_session_id, sessionId)
| bucket _time span=1h
| fields processingTime, apiName, a_log_type, a_api_responsetime, a_api_name, responsetime, IS2JSRT, session_id, a_session_id, sessionId, PlatformProcessingTime
| eventstats max(eval(a_api_responsetime)) as responsetime, max(eval(processingTime)) as IS2JSRT by session_id
| eval PlatformProcessingTime = (responsetime - IS2JSRT)
| timechart perc95(PlatformProcessingTime)

View solution in original post

Tags (1)

user33
Path Finder

Thank you Taruchit and yuanliu  !! I will take those into consideration for removing null values. 

It seems combining the two logs with eval coalesce as well as removing null values caused issues for the query. So, for now, I have decided to ignore that. I will create two separate queries later. 

I have decided to go a different, simpler route this time. I now have two processing times/ fields to work with: 

processingTime and a_api_responsetime. 

The math to get the PlatformProcessingTime is 

PlatformProcessingTime = a_api_responsetime - processingTime. 

My largest issue that I cannot seem to solve is calculating the processing time by session_id (or one API call) then talking the 95th percentile of the PlatformProcessingTime by _time. The PlatformProcessingTime has to be calculated by session_is. But then, how do I display in the Splunk stats the 95th percentile of PlatformProcessingTime  by time?

Any assistance is appreciated please!

What I have so far:

 

 

index=* (sourcetype="*" OR sourcetype="*" OR sourcetype="*") ("Response" OR "IS2JS")
| eval session_id= coalesce(a_session_id, sessionId)
| bucket _time span=1h
| fields processingTime, apiName, a_log_type, a_api_responsetime, a_api_name, responsetime, IS2JSRT, session_id, a_session_id, sessionId, PlatformProcessingTime
| stats max(eval(a_api_responsetime)) as responsetime, max(eval(processingTime)) as IS2JSRT by session_id
| eval PlatformProcessingTime = (responsetime - IS2JSRT)

 

 

user33_0-1687011223070.png

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

That is why I asked about your desire to include time but didn't have _time in your stats.  The command you are looking for is eventstats.  

index=* (sourcetype="*" OR sourcetype="*" OR sourcetype="*") ("Response" OR "IS2JS")
| eval session_id= coalesce(a_session_id, sessionId)
| bucket _time span=1h
| fields processingTime, apiName, a_log_type, a_api_responsetime, a_api_name, responsetime, IS2JSRT, session_id, a_session_id, sessionId, PlatformProcessingTime
| eventstats max(eval(a_api_responsetime)) as responsetime, max(eval(processingTime)) as IS2JSRT by session_id
| eval PlatformProcessingTime = (responsetime - IS2JSRT)
| timechart perc95(PlatformProcessingTime)
Tags (1)

user33
Path Finder

Ah, ok. I see. I missed that part. Unfamiliar with eventstats. Thank you very much. That worked!!

0 Karma

yuanliu
SplunkTrust
SplunkTrust

You should first clearly state what those "4 different processing times" are.  If they refer to the columns shown in the screenshot - again, using text illustration is much better than using screenshot, they would be AM2JSRespTime, JS2ISRespTime, JS2AMRespTime, and AM2DPRespTime.  According to illustrated code, they are calculated from three fields, namely msg.processingTime, processingTime, and a_req_process_time.

In your formula, there will be no chance for AM2JRT and AM2DPRT to be null because they are either from a_req_process_time that you checked isNum or zero.  It is possible for msg.processingTime and processingTime to be null, hence JS2ISRT and JS2AMRT can be null.  But then, your stats sums each up by session_Id.  Does each session_Id have only one processingTime and one a_req_process_time?  If so, the task can be even simpler. 

If not, what you ask can easily be processed after stats, like

index= [...]
| bucket _time span=1h
| eval apiIdentifier=coalesce(msg.apiIdentifier,apiIdentifier)
| eval apiName=coalesce(msg.apiName,apiName)
| eval apiVersion=coalesce(msg.apiVersion,apiVersion)
| eval clientRequestId=coalesce(msg.clientRequestId,clientRequestId)
| eval companyId=coalesce(msg.companyId,companyId)
| eval contentType=coalesce(msg.contentType,contentType)
| eval datacenter=coalesce(msg.datacenter,datacenter)
| eval entityId=coalesce(msg.entityId,entityId)
| eval logType=coalesce(msg.logType,logType)
| eval processingTime=coalesce(msg.processingTime,processingTime) 
| eval responseCode=coalesce(msg.responseCode,responseCode)
| eval serverId=coalesce(msg.serverId,serverId)
| eval sessionId=coalesce(msg.sessionId,sessionId)
| eval timestamp=coalesce(msg.timestamp,timestamp)
| eval totalResponseTime=coalesce(msg.totalResponseTime,totalResponseTime)
| eval session-id=coalesce(a_session_id, sessionId)
| eval AM2JSRT = if(a_log_type=="Response" AND isNum(a_req_process_time), a_req_process_time,0) ,JS2ISRT = if(logType=="JS2IS", processingTime, 0), JS2AMRT = if(logType=="JS2AM", processingTime, 0), AM2DPRT = if(a_log_type=="Response" AND isNum(a_res_process_time), a_res_process_time,0)
| stats SUM(AM2JSRT) as AM2JSRespTime, SUM(JS2ISRT) as JS2ISRespTime, SUM(JS2AMRT) as JS2AMRespTime, SUM(AM2DPRT) as AM2DPRespTime by sessionId _time
| where isnotnull(AM2JSRespTime) AND isnotnull(JS2ISRespTime) AND isnotnull(JS2AMRespTime) AND isnotnull(AM2DPRespTime)
| eventstats perc95(eval(AM2JSRespTime+JS2ISRespTime+JS2AMRespTime+AM2DPRespTime)) as gw_processingTime by sessionId _time

Note when you describe your desire to calculate perc95, you also said "by _time".  But your illustrated code does not include _time in stats, therefore you cannot get any information related to _time.  In the above, I added _time in both stats and eventstats.

Hope this helps.

 

 

Tags (1)
0 Karma

Taruchit
Contributor

Hi @user33,

As I understand from your details, you have a set of results with session id and 4-5 columns where each column may have null values in some rows. And you want to remove all those session ids from your records against which there are one or more null values in the corresponding columns. You have replaced all null values with 0 (zero). Please correct if I have misunderstood. 

In such case, you can try using |where command and place the names of the columns on which you want to apply the filter.

|where column1=0 OR column2=0 OR column3=0

This will check all column values in each row and if even one condition is satisfied for a given row, that row gets removed from the resultset.

Please share your results.

Thank you
Taruchit

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...