Why are multiple base queries in a Splunk dashboard giving improper visualizations?

mandlikarbaaz
Loves-to-Learn Everything

Hello,

I have three base queries in my Splunk dashboard, but when the dashboard loads, only one or two of them run and populate their panels with data and visualizations. Could you please help me with this? Please find the dashboard XML below:

<form>
<label>All Errors</label>
<description>Errors</description>
<fieldset submitButton="false">
<input type="time" token="Time" searchWhenChanged="true">
<label>Time</label>
<default>
<earliest>-24h</earliest>
<latest>now</latest>
</default>
</input>
</fieldset>
<search id="search_urls">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=* | rex field=_raw "POST\s|GET\s(?&lt;URL&gt;[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?&lt;TRACE_ID&gt;[a-zA-Z0-9]+)\"" | rex field=_raw "(?&lt;METHOD&gt;POST|GET)" | rex field=_raw "HTTP\/1.1\"\s+(?&lt;STATUS&gt;\d\d\d)\s" | join TRACE_ID [search index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=* cf_instance_index="*APP/PROC/WEB*" (severity!=INFO OR tag=error) | rex field=_raw "(?&lt;ERROR_MESSAGE&gt;com.tmobile[a-zA-Z0-9\W].+)$" | rex field=_raw "\,(?&lt;TRACE_ID&gt;[0-9a-zA-Z]+)\,"]
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<search id="performance_urls">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=* | rex field=_raw "POST\s|GET\s(?&lt;URL&gt;[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?&lt;TRACE_ID&gt;[a-zA-Z0-9]+)\"" | rex field=_raw "(?&lt;METHOD&gt;POST|GET)" | rex field=_raw "response_time\:(?&lt;RESPONSE_TIME&gt;[\d\.\d]+)" | rex field=_raw "HTTP\/1.1\"\s+(?&lt;STATUS&gt;\d\d\d)\s"
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<search id="errors">
<query>index=abc sourcetype=abc cf_org_name=abc cf_instance_index="*APP/PROC/WEB*" cf_app_name=* | rex field=_raw "(?&lt;ORA_ERROR&gt;ORA\-.+)$" | rex field=_raw "(?&lt;KAFKA_ERROR&gt;org.apache.kafka[a-zA-Z0-9\W].+)$" | rex field=_raw "(?&lt;ERROR_MESSAGE&gt;com.tmobile[0-9a-zA-Z\W].+)$" | rex field=_raw "message\:\s+(?&lt;errorMessage&gt;[0-9a-zA-Z\W].+)$"
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<row>
<panel>
<title>Timechart based on URLs (only 4xx/5xx)</title>
<chart>
<search base="search_urls">
<query> search STATUS&gt;=400 AND URL!="/" | timechart span=1h count by URL usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Timechart based on URLs (only 4xx/5xx - Unique Trace IDs)</title>
<chart>
<search base="search_urls">
<query> search STATUS&gt;=400 AND URL!="/" | dedup TRACE_ID | timechart span=1h count by URL usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Statistics based on URL, STATUS, METHOD, cf_app_name, ERROR_MESSAGE (Sorted by maximum counts)</title>
<table>
<search base="search_urls">
<query> search STATUS&gt;=400 AND URL!="/" | stats count by URL, STATUS, METHOD, cf_app_name, ERROR_MESSAGE | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="percentagesRow">false</option>
<option name="totalsRow">false</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Timechart based on URLs (including 2xx/3xx)</title>
<chart>
<search base="search_urls">
<query> | timechart span=1h count by URL useother=f usenull=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Timechart based on URLs (including 2xx/3xx - Unique Trace IDs)</title>
<chart>
<search base="search_urls">
<query> | dedup TRACE_ID | timechart span=1h count by URL useother=f usenull=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Statistics based on URL, STATUS, METHOD, cf_app_name, ERROR_MESSAGE (including 2xx/3xx)</title>
<table>
<search base="search_urls">
<query> | stats count by URL, STATUS, METHOD, cf_app_name, ERROR_MESSAGE | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="totalsRow">false</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Database Errors (Timechart)</title>
<chart>
<search base="errors">
<query> search tag=error | timechart span=1h count by ORA_ERROR usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Database Errors by cf_app_name, Error Message (sorted by maximum counts)</title>
<table>
<search base="errors">
<query> search tag=error | stats count by cf_app_name, ORA_ERROR | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Timechart of generic messages</title>
<chart>
<search base="errors">
<query> search errorMessage!="null" | timechart span=1h count by errorMessage useother=f usenull=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Statistics of generic messages based on cf_app_name</title>
<table>
<search base="errors">
<query> search errorMessage!="null" | stats count by cf_app_name, errorMessage | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Timechart of Kafka Errors</title>
<chart>
<search base="errors">
<query> search severity!=INFO OR tag=error | timechart span=1h count by KAFKA_ERROR usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Statistics of Kafka Errors based on cf_app_name</title>
<table>
<search base="errors">
<query> search severity!=INFO OR tag=error | stats count by cf_app_name, KAFKA_ERROR | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>RMQ Errors (Timechart)</title>
<chart>
<search base="errors">
<query> search ERROR_MESSAGE="*RMQ*" AND (severity!=INFO OR tag=error) | timechart span=1h count by ERROR_MESSAGE usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Statistics of RMQ Errors based on cf_app_name</title>
<table>
<search base="errors">
<query> search ERROR_MESSAGE="*RMQ*" AND (severity!=INFO OR tag=error) | stats count by cf_app_name, ERROR_MESSAGE | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="refresh.display">progressbar</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Deep Errors (Timechart)</title>
<chart>
<search base="errors">
<query> search ERROR_MESSAGE="*deep*" AND (severity!=INFO OR tag=error) | timechart span=1h count by ERROR_MESSAGE usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Statistics of Deep Errors based on cf_app_name</title>
<table>
<search base="errors">
<query> search ERROR_MESSAGE="*deep*" AND (severity!=INFO OR tag=error) | stats count by cf_app_name, ERROR_MESSAGE | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Performance of 4xx/5xx URLs - Response &gt; 10 sec (Timechart)</title>
<chart>
<search base="performance_urls">
<query> search STATUS&gt;=400 AND URL!="/" AND RESPONSE_TIME&gt;10 | timechart span=1h count by URL usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Statistics of response time &gt; 10 sec for 4xx/5xx URLs</title>
<table>
<search base="performance_urls">
<query> search STATUS&gt;=400 AND URL!="/" AND RESPONSE_TIME&gt;10 | stats count by URL, cf_app_name, STATUS, METHOD | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Performance of URLs 2xx/3xx/4xx/5xx - Response &gt; 10 sec (Timechart)</title>
<chart>
<search base="performance_urls">
<query> search URL!="/" AND RESPONSE_TIME&gt;10 | timechart span=1h count by URL useother=f usenull=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Statistics of response time &gt; 10 sec for 2xx/3xx/4xx/5xx URLs</title>
<table>
<search base="performance_urls">
<query> search URL!="/" AND RESPONSE_TIME&gt;10 | stats count by URL, cf_app_name, STATUS, METHOD | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="refresh.display">progressbar</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
</form>

mandlikarbaaz
Loves-to-Learn Everything

The results from the base queries are somehow truncated to the last 1 hour, even though the time token is set to the last 6 hours.
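
One possible factor, going by Splunk's documentation on post-process searches: a base search that returns raw events keeps only a limited number of them (500,000 is the documented figure), and since events are returned newest first, the cut-off could look like a shorter time range. The same documentation recommends naming the fields that the post-process searches need. A minimal sketch for the performance_urls base search, assuming its panels only use the fields listed in the final fields command (that list is inferred from the panel queries above, not confirmed):

```xml
<!-- Sketch only: same extractions as the posted performance_urls search. -->
<!-- The closing "fields" command is the addition; its field list is an assumption. -->
<search id="performance_urls">
  <query>
    index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=*
    | rex field=_raw "POST\s|GET\s(?&lt;URL&gt;[a-zA-Z0-9\W].+)\?|\s\HTTP"
    | rex field=_raw "(?&lt;METHOD&gt;POST|GET)"
    | rex field=_raw "response_time\:(?&lt;RESPONSE_TIME&gt;[\d\.\d]+)"
    | rex field=_raw "HTTP\/1.1\"\s+(?&lt;STATUS&gt;\d\d\d)\s"
    | fields _time, cf_app_name, URL, METHOD, STATUS, RESPONSE_TIME
  </query>
  <earliest>$Time.earliest$</earliest>
  <latest>$Time.latest$</latest>
</search>
```

Ending the base search with a transforming command such as stats shrinks the result set further and is usually the more robust option when the panels allow it.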

mandlikarbaaz
Loves-to-Learn Everything

Hello,

Thanks for this wonderful guide. It helped me simplify some of my base queries. However, the first base query still limits the chart and stats data, even though I am now using transforming commands. Can you please help?

isoutamo
SplunkTrust

Can you post your current base query, since I understand you have modified it, along with the queries that use it? Please use the </> button to include them.

mandlikarbaaz
Loves-to-Learn Everything
<search id="base1">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=* | rex field=_raw "response_time\:(?&lt;RESPONSE_TIME&gt;[\d\.\d]+)" | rex field=_raw "POST\s|GET\s(?&lt;URL&gt;[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?&lt;TRACE_ID&gt;[a-zA-Z0-9]+)\"" | rex field=_raw "(?&lt;METHOD&gt;POST|GET)" | rex field=_raw "HTTP\/1.1\"\s+(?&lt;STATUS&gt;\d\d\d)\s" | join TRACE_ID [search index=cloudfoundry sourcetype=cloudfoundry_apps cf_org_name=Eligibility-Engine cf_space_name=PROD cf_app_name=* cf_instance_index="*APP/PROC/WEB*" (severity!=INFO OR tag=error) | rex field=_raw "(?&lt;ERROR_MESSAGE&gt;com.tmobile[a-zA-Z0-9\W].+)$" | rex field=_raw "\,(?&lt;TRACE_ID&gt;[0-9a-zA-Z]+)\,"] | stats count by cf_app_name, URL, METHOD, STATUS, ERROR_MESSAGE
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<search id="base2">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_org_name=Eligibility-Engine cf_space_name=PROD cf_app_name=* | rex field=_raw "response_time\:(?&lt;RESPONSE_TIME&gt;[\d\.\d]+)" | rex field=_raw "POST\s|GET\s(?&lt;URL&gt;[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?&lt;TRACE_ID&gt;[a-zA-Z0-9]+)\"" | rex field=_raw "(?&lt;METHOD&gt;POST|GET)" | rex field=_raw "HTTP\/1.1\"\s+(?&lt;STATUS&gt;\d\d\d)\s" | stats count by _time, RESPONSE_TIME, URL, TRACE_ID, METHOD, STATUS, cf_app_name
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<search id="base3">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_org_name=Eligibility-Engine cf_space_name=PROD cf_app_name=* | rex field=_raw "response_time\:(?&lt;RESPONSE_TIME&gt;[\d\.\d]+)" | rex field=_raw "POST\s|GET\s(?&lt;URL&gt;[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?&lt;TRACE_ID&gt;[a-zA-Z0-9]+)\"" | rex field=_raw "(?&lt;METHOD&gt;POST|GET)" | rex field=_raw "HTTP\/1.1\"\s+(?&lt;STATUS&gt;\d\d\d)\s" | rex field=_raw "(?&lt;ERROR_MESSAGE&gt;com.tmobile[a-zA-Z0-9\W].+)$" | rex field=_raw "\,(?&lt;traceID&gt;[0-9a-zA-Z]+)\," | eval newTraceID=if(TRACE_ID==traceID, "no_match", TRACE_ID) | stats count by _time, cf_app_name, newTraceID, URL, RESPONSE_TIME, STATUS, METHOD
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>

mandlikarbaaz
Loves-to-Learn Everything

Is it because of the 'join' command I am using in the 'base1' query?

isoutamo
SplunkTrust

Quite probably that is the issue. You can check what the Job Inspector says about your query. I suspect the subsearch was stopped because it exceeded its 60-second time limit.
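
A common way to sidestep subsearch limits altogether is to drop join and correlate the two kinds of events with stats on the trace ID. The following is only a sketch under several assumptions: request and error events live in the same index and sourcetype, a com.tmobile match only appears in error events (so the severity/tag and cf_instance_index filters from the subsearch are omitted), and REQ_TRACE_ID and ERR_TRACE_ID are hypothetical helper field names. It counts traces rather than individual events, so the numbers will not match the join version exactly.

```xml
<!-- Sketch only: one pass over the data, no join and no subsearch. -->
<!-- REQ_TRACE_ID and ERR_TRACE_ID are hypothetical helper fields. -->
<search id="base1">
  <query>
    index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=*
    | rex field=_raw "POST\s|GET\s(?&lt;URL&gt;[a-zA-Z0-9\W].+)\?|\s\HTTP"
    | rex field=_raw "(?&lt;METHOD&gt;POST|GET)"
    | rex field=_raw "HTTP\/1.1\"\s+(?&lt;STATUS&gt;\d\d\d)\s"
    | rex field=_raw "(?&lt;ERROR_MESSAGE&gt;com.tmobile[a-zA-Z0-9\W].+)$"
    | rex field=_raw "x_b3_traceid\:\"(?&lt;REQ_TRACE_ID&gt;[a-zA-Z0-9]+)\""
    | rex field=_raw "\,(?&lt;ERR_TRACE_ID&gt;[0-9a-zA-Z]+)\,"
    | eval TRACE_ID=coalesce(REQ_TRACE_ID, ERR_TRACE_ID)
    | stats first(cf_app_name) as cf_app_name first(URL) as URL first(METHOD) as METHOD first(STATUS) as STATUS first(ERROR_MESSAGE) as ERROR_MESSAGE by TRACE_ID
    | where isnotnull(URL) AND isnotnull(ERROR_MESSAGE)
    | stats count by cf_app_name, URL, METHOD, STATUS, ERROR_MESSAGE
  </query>
  <earliest>$Time.earliest$</earliest>
  <latest>$Time.latest$</latest>
</search>
```

The where clause keeps only traces that have both a request event and an error event, which plays the role of the inner join.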

mandlikarbaaz
Loves-to-Learn Everything

I can see ttl=600 and runtime auto_cancel=90

isoutamo
SplunkTrust

Those values apply to the search itself (which here is also the base search), not to the subsearch.

mandlikarbaaz
Loves-to-Learn Everything

Can you please tell me which value in the Job Inspector I should be looking for? There are a lot of values.

mandlikarbaaz
Loves-to-Learn Everything

This is what I found in the Job Inspector. Please find a snippet below:

-------------------------------------------inspector-----------------------------------------
info : Search finalized.
info : The search auto-finalized after it reached its time limit: 420 seconds.
info : [subsearch]: Subsearch produced 50000 results, truncating to maxout [subsearch_maxout] 50000.
--------------------------------------------logs----------------------------------------------
sid='abc.5641278_20B7EC22-0142-4E25-BEA2-C1A08D00C00E'
05-20-2022 02:15:39.796 INFO ReducePhaseExecutor - Downloading all remote search.log / search_telemetry.json files took 1.408 seconds
05-20-2022 02:15:39.799 INFO ReducePhaseExecutor - Ending phase_1
05-20-2022 02:15:39.799 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:39.799 INFO ReducePhaseExecutor - ReducePhaseExecutor=1 action=FINALIZE
05-20-2022 02:15:39.799 INFO DispatchExecutor - User applied action=FINALIZE while status=2
05-20-2022 02:15:39.804 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:39.805 INFO DispatchStorageManager - Remote storage disabled for search artifacts.
05-20-2022 02:15:39.805 INFO DispatchManager - DispatchManager::dispatchHasFinished(id='abc.5641278_20B7EC22-0142-4E25-BEA2-C1A08D00C00E', username='abc')
05-20-2022 02:15:39.805 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:40.022 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:40.695 INFO SearchStatusEnforcer - SearchStatusEnforcer is already terminated
05-20-2022 02:15:40.696 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:40.696 INFO LookupDataProvider - Clearing out lookup shared provider map

isoutamo
SplunkTrust
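You found the correct lines. They show that both limits were reached: the search was auto-finalized at its 420-second time limit, and the subsearch output was truncated at 50000 results.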
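
Without touching limits.conf, the subsearch can at least be made to return fewer rows, which helps with the 50000-result truncation. A minimal sketch based on the posted base1, assuming one error message per trace ID is enough for the panels; the only change is the stats command at the end of the subsearch, which collapses the error events to a single row per TRACE_ID before the join:

```xml
<!-- Sketch only: same base search, but the subsearch returns one row per TRACE_ID. -->
<search id="base1">
  <query>
    index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=*
    | rex field=_raw "POST\s|GET\s(?&lt;URL&gt;[a-zA-Z0-9\W].+)\?|\s\HTTP"
    | rex field=_raw "x_b3_traceid\:\"(?&lt;TRACE_ID&gt;[a-zA-Z0-9]+)\""
    | rex field=_raw "(?&lt;METHOD&gt;POST|GET)"
    | rex field=_raw "HTTP\/1.1\"\s+(?&lt;STATUS&gt;\d\d\d)\s"
    | join TRACE_ID
        [ search index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=* cf_instance_index="*APP/PROC/WEB*" (severity!=INFO OR tag=error)
          | rex field=_raw "(?&lt;ERROR_MESSAGE&gt;com.tmobile[a-zA-Z0-9\W].+)$"
          | rex field=_raw "\,(?&lt;TRACE_ID&gt;[0-9a-zA-Z]+)\,"
          | stats first(ERROR_MESSAGE) as ERROR_MESSAGE by TRACE_ID ]
    | stats count by cf_app_name, URL, METHOD, STATUS, ERROR_MESSAGE
  </query>
  <earliest>$Time.earliest$</earliest>
  <latest>$Time.latest$</latest>
</search>
```

This does not raise either limit. The subsearch can still be truncated if more than 50000 distinct trace IDs have errors in the time range, and the 420-second auto-finalize is a separate limit that a smaller time range or a narrower base search would have to address. With this change cf_app_name comes from the request events, because the subsearch no longer returns it.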

mandlikarbaaz
Loves-to-Learn Everything

OK. As a power user, I don't have admin access to change the .conf files, so I will have to raise a ticket asking the admins to change the stanzas in limits.conf.

However, if you could suggest ideal values, that would be really helpful.

Thanks
