Hello,
I have 3 base queries in my Splunk dashboard, but when the dashboard loads, only 1 or 2 of them run and display data and visualizations. Could you please help me with this? Please find the XML below:
<form>
<label>All Errors</label>
<description>Errors</description>
<fieldset submitButton="false">
<input type="time" token="Time" searchWhenChanged="true">
<label>Time</label>
<default>
<earliest>-24h</earliest>
<latest>now</latest>
</default>
</input>
</fieldset>
<search id="search_urls">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=* | rex field=_raw "POST\s|GET\s(?<URL>[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?<TRACE_ID>[a-zA-Z0-9]+)\"" | rex field=_raw "(?<METHOD>POST|GET)" | rex field=_raw "HTTP\/1.1\"\s+(?<STATUS>\d\d\d)\s" | join TRACE_ID [search index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=* cf_instance_index="*APP/PROC/WEB*" (severity!=INFO OR tag=error) | rex field=_raw "(?<ERROR_MESSAGE>com.tmobile[a-zA-Z0-9\W].+)$" | rex field=_raw "\,(?<TRACE_ID>[0-9a-zA-Z]+)\,"]
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<search id="performance_urls">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=* | rex field=_raw "POST\s|GET\s(?<URL>[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?<TRACE_ID>[a-zA-Z0-9]+)\"" | rex field=_raw "(?<METHOD>POST|GET)" | rex field=_raw "response_time\:(?<RESPONSE_TIME>[\d\.\d]+)" | rex field=_raw "HTTP\/1.1\"\s+(?<STATUS>\d\d\d)\s"
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<search id="errors">
<query>index=abc sourcetype=abc cf_org_name=abc cf_instance_index="*APP/PROC/WEB*" cf_app_name=* | rex field=_raw "(?<ORA_ERROR>ORA\-.+)$" | rex field=_raw "(?<KAFKA_ERROR>org.apache.kafka[a-zA-Z0-9\W].+)$" | rex field=_raw "(?<ERROR_MESSAGE>com.tmobile[0-9a-zA-Z\W].+)$" | rex field=_raw "message\:\s+(?<errorMessage>[0-9a-zA-Z\W].+)$"
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<row>
<panel>
<title>Timechart based on URLs (only 4xx/5xx)</title>
<chart>
<search base="search_urls">
<query> search STATUS>=400 AND URL!="/" | timechart span=1h count by URL usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Timechart based on URLs (only 4xx/5xx - Unique Trace IDs)</title>
<chart>
<search base="search_urls">
<query> search STATUS>=400 AND URL!="/" | dedup TRACE_ID | timechart span=1h count by URL usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Statistics based on URL, STATUS, METHOD, cf_app_name, ERROR_MESSAGE (Sorted by maximum counts)</title>
<table>
<search base="search_urls">
<query> search STATUS>=400 AND URL!="/" | stats count by URL, STATUS, METHOD, cf_app_name, ERROR_MESSAGE | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="percentagesRow">false</option>
<option name="totalsRow">false</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Timechart based on URLs (including 2xx/3xx)</title>
<chart>
<search base="search_urls">
<query> | timechart span=1h count by URL useother=f usenull=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Timechart based on URLs (including 2xx/3xx - Unique Trace IDs)</title>
<chart>
<search base="search_urls">
<query> | dedup TRACE_ID | timechart span=1h count by URL useother=f usenull=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Statistics based on URL, STATUS, METHOD, cf_app_name, ERROR_MESSAGE (including 2xx/3xx)</title>
<table>
<search base="search_urls">
<query> | stats count by URL, STATUS, METHOD, cf_app_name, ERROR_MESSAGE | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="totalsRow">false</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Database Errors (Timechart)</title>
<chart>
<search base="errors">
<query> search tag=error | timechart span=1h count by ORA_ERROR usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Database Errors by cf_app_name, Error Message (sorted by maximum counts)</title>
<table>
<search base="errors">
<query> search tag=error | stats count by cf_app_name, ORA_ERROR | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Timechart of generic messages</title>
<chart>
<search base="errors">
<query> search errorMessage!="null" | timechart span=1h count by errorMessage useother=f usenull=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Statistics of generic messages based on cf_app_name</title>
<table>
<search base="errors">
<query> search errorMessage!="null" | stats count by cf_app_name, errorMessage | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Timechart of Kafka Errors</title>
<chart>
<search base="errors">
<query> search severity!=INFO OR tag=error | timechart span=1h count by KAFKA_ERROR usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Statistics of Kafka Errors based on cf_app_name</title>
<table>
<search base="errors">
<query> search severity!=INFO OR tag=error | stats count by cf_app_name, KAFKA_ERROR | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>RMQ Errors (Timechart)</title>
<chart>
<search base="errors">
<query> search ERROR_MESSAGE="*RMQ*" AND (severity!=INFO OR tag=error) | timechart span=1h count by ERROR_MESSAGE usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Statistics of RMQ Errors based on cf_app_name</title>
<table>
<search base="errors">
<query> search ERROR_MESSAGE="*RMQ*" AND (severity!=INFO OR tag=error) | stats count by cf_app_name, ERROR_MESSAGE | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="refresh.display">progressbar</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Deep Errors (Timechart)</title>
<chart>
<search base="errors">
<query> search ERROR_MESSAGE="*deep*" AND (severity!=INFO OR tag=error) | timechart span=1h count by ERROR_MESSAGE usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Statistics of Deep Errors based on cf_app_name</title>
<table>
<search base="errors">
<query> search ERROR_MESSAGE="*deep*" AND (severity!=INFO OR tag=error) | stats count by cf_app_name, ERROR_MESSAGE | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Performance of 4xx/5xx URLs - Response > 10 sec (Timechart)</title>
<chart>
<search base="performance_urls">
<query> search STATUS>=400 AND URL!="/" AND RESPONSE_TIME>10 | timechart span=1h count by URL usenull=f useother=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Statistics of response time > 10 sec for 4xx/5xx URLs</title>
<table>
<search base="performance_urls">
<query> search STATUS>=400 AND URL!="/" AND RESPONSE_TIME>10 | stats count by URL, cf_app_name, STATUS, METHOD | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Performance of URLs 2xx/3xx/4xx/5xx - Response > 10 sec (Timechart)</title>
<chart>
<search base="performance_urls">
<query> search URL!="/" AND RESPONSE_TIME>10 | timechart span=1h count by URL useother=f usenull=f</query>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>Statistics of response time > 10 sec for 2xx/3xx/4xx/5xx URLs</title>
<table>
<search base="performance_urls">
<query> search URL!="/" AND RESPONSE_TIME>10 | stats count by URL, cf_app_name, STATUS, METHOD | sort - count | head 6</query>
</search>
<option name="drilldown">row</option>
<option name="refresh.display">progressbar</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
</form>
The results from the base queries are somehow truncated to the last 1 hour, even though the time token is set to the last 6 hours.
Hi,
You are probably hitting the limit on events returned by a non-transforming base search (500k / 60s). Here are some old answers about it.
Those are only some of them; you can easily find more.
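As a rough illustration of what "transforming" means here, a base search can aggregate before handing results to the panels, so that only summary rows are passed on instead of raw events. This is only a sketch: the search id `base_urls_sketch` and the choice of `by` fields are assumptions for illustration, not taken from the dashboard above, and the fragment is meant to sit inside the existing `<form>`.

```xml
<!-- Hypothetical base search: bucket events per hour and aggregate,
     so the panels receive summary rows rather than raw events. -->
<search id="base_urls_sketch">
  <query><![CDATA[
index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=*
| rex field=_raw "HTTP\/1.1\"\s+(?<STATUS>\d\d\d)\s"
| bin _time span=1h
| stats count by _time, cf_app_name, STATUS
  ]]></query>
  <earliest>$Time.earliest$</earliest>
  <latest>$Time.latest$</latest>
</search>

<!-- Hypothetical panel: the post-process search sums the pre-aggregated
     counts instead of counting rows again. -->
<chart>
  <search base="base_urls_sketch">
    <query>search STATUS>=400 | timechart span=1h sum(count) as count by cf_app_name usenull=f useother=f</query>
  </search>
  <option name="charting.chart">line</option>
</chart>
```

Two details matter in a setup like this: `_time` has to be kept in the `stats ... by` clause if any panel runs `timechart`, and the post-process search should use `sum(count)` because each row already represents many events.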
r. Ismo
Hello,
Thanks for this wonderful guide. It helped me simplify some of my base queries. However, the first base query still limits the chart and stats data even though I am now using transforming commands. Can you please help?
Can you post your current base query, since I understand you have modified it, along with the queries that use it? Please use the </> button to include them.
<search id="base1">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=* | rex field=_raw "response_time\:(?<RESPONSE_TIME>[\d\.\d]+)" | rex field=_raw "POST\s|GET\s(?<URL>[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?<TRACE_ID>[a-zA-Z0-9]+)\"" | rex field=_raw "(?<METHOD>POST|GET)" | rex field=_raw "HTTP\/1.1\"\s+(?<STATUS>\d\d\d)\s" | join TRACE_ID [search index=cloudfoundry sourcetype=cloudfoundry_apps cf_org_name=Eligibility-Engine cf_space_name=PROD cf_app_name=* cf_instance_index="*APP/PROC/WEB*" (severity!=INFO OR tag=error) | rex field=_raw "(?<ERROR_MESSAGE>com.tmobile[a-zA-Z0-9\W].+)$" | rex field=_raw "\,(?<TRACE_ID>[0-9a-zA-Z]+)\,"] | stats count by cf_app_name, URL, METHOD, STATUS, ERROR_MESSAGE
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<search id="base2">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_org_name=Eligibility-Engine cf_space_name=PROD cf_app_name=* | rex field=_raw "response_time\:(?<RESPONSE_TIME>[\d\.\d]+)" | rex field=_raw "POST\s|GET\s(?<URL>[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?<TRACE_ID>[a-zA-Z0-9]+)\"" | rex field=_raw "(?<METHOD>POST|GET)" | rex field=_raw "HTTP\/1.1\"\s+(?<STATUS>\d\d\d)\s" | stats count by _time, RESPONSE_TIME, URL, TRACE_ID, METHOD, STATUS, cf_app_name
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
<search id="base3">
<query>
index=abc sourcetype=abc cf_org_name=abc cf_org_name=Eligibility-Engine cf_space_name=PROD cf_app_name=* | rex field=_raw "response_time\:(?<RESPONSE_TIME>[\d\.\d]+)" | rex field=_raw "POST\s|GET\s(?<URL>[a-zA-Z0-9\W].+)\?|\s\HTTP" | rex field=_raw "x_b3_traceid\:\"(?<TRACE_ID>[a-zA-Z0-9]+)\"" | rex field=_raw "(?<METHOD>POST|GET)" | rex field=_raw "HTTP\/1.1\"\s+(?<STATUS>\d\d\d)\s" | rex field=_raw "(?<ERROR_MESSAGE>com.tmobile[a-zA-Z0-9\W].+)$" | rex field=_raw "\,(?<traceID>[0-9a-zA-Z]+)\," | eval newTraceID=if(TRACE_ID==traceID, "no_match", TRACE_ID) | stats count by _time, cf_app_name, newTraceID, URL, RESPONSE_TIME, STATUS, METHOD
</query>
<earliest>$Time.earliest$</earliest>
<latest>$Time.latest$</latest>
</search>
Is it because of the 'join' command I am using in the 'base1' query?
Quite probably that is the issue. You can check what the Job Inspector says about your query. I suspect it stopped because the subsearch exceeded its 60s time limit.
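One common way to sidestep `join` and its subsearch limits is to search both kinds of events in a single pass and correlate them with `stats ... by TRACE_ID`. The following is only a sketch under assumptions: the search id `search_urls_nojoin` and the helper fields `ACCESS_TRACE_ID` and `APP_TRACE_ID` are made up for illustration, the `rex` patterns are copied from the queries above, and the extra filters from the original subsearch (`cf_instance_index`, `severity`, `tag`) are left out for brevity.

```xml
<!-- Hypothetical replacement for the join-based base search.
     ACCESS_TRACE_ID / APP_TRACE_ID are illustrative names: one is extracted
     from the access-log events, the other from the application error events. -->
<search id="search_urls_nojoin">
  <query><![CDATA[
index=abc sourcetype=abc cf_org_name=abc cf_space_name=PROD cf_app_name=*
| rex field=_raw "x_b3_traceid\:\"(?<ACCESS_TRACE_ID>[a-zA-Z0-9]+)\""
| rex field=_raw "\,(?<APP_TRACE_ID>[0-9a-zA-Z]+)\,"
| rex field=_raw "HTTP\/1.1\"\s+(?<STATUS>\d\d\d)\s"
| rex field=_raw "(?<ERROR_MESSAGE>com.tmobile[a-zA-Z0-9\W].+)$"
| eval TRACE_ID=coalesce(ACCESS_TRACE_ID, APP_TRACE_ID)
| stats min(_time) as _time, values(STATUS) as STATUS, values(ERROR_MESSAGE) as ERROR_MESSAGE, values(cf_app_name) as cf_app_name by TRACE_ID
| where isnotnull(STATUS) AND isnotnull(ERROR_MESSAGE)
  ]]></query>
  <earliest>$Time.earliest$</earliest>
  <latest>$Time.latest$</latest>
</search>
```

The final `where` keeps only trace IDs that have both a status and an error message, which roughly mimics the inner join. Whether the comma-delimited pattern picks up unrelated values on access-log lines would need checking against real events.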
I can see ttl=600 and runtime auto_cancel=90
Those values apply to the search itself (the base search included), not to the subsearch.
Can you please tell me which value in the Job Inspector I should be looking for? There are a lot of values.
This is what I found in the Job Inspector. Please find a snippet below:
-------------------------------------------inspector-----------------------------------------
info : Search finalized.
info : The search auto-finalized after it reached its time limit: 420 seconds.
info : [subsearch]: Subsearch produced 50000 results, truncating to maxout [subsearch_maxout] 50000.
--------------------------------------------logs----------------------------------------------
sid='abc.5641278_20B7EC22-0142-4E25-BEA2-C1A08D00C00E'
05-20-2022 02:15:39.796 INFO ReducePhaseExecutor - Downloading all remote search.log / search_telemetry.json files took 1.408 seconds
05-20-2022 02:15:39.799 INFO ReducePhaseExecutor - Ending phase_1
05-20-2022 02:15:39.799 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:39.799 INFO ReducePhaseExecutor - ReducePhaseExecutor=1 action=FINALIZE
05-20-2022 02:15:39.799 INFO DispatchExecutor - User applied action=FINALIZE while status=2
05-20-2022 02:15:39.804 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:39.805 INFO DispatchStorageManager - Remote storage disabled for search artifacts.
05-20-2022 02:15:39.805 INFO DispatchManager - DispatchManager::dispatchHasFinished(id='abc.5641278_20B7EC22-0142-4E25-BEA2-C1A08D00C00E', username='abc')
05-20-2022 02:15:39.805 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:40.022 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:40.695 INFO SearchStatusEnforcer - SearchStatusEnforcer is already terminated
05-20-2022 02:15:40.696 INFO UserManager - Unwound user context: abc -> NULL
05-20-2022 02:15:40.696 INFO LookupDataProvider - Clearing out lookup shared provider map
OK. As a power user, I don't have admin access to change those conf files, so I will have to raise a ticket asking the admins to change the stanzas in limits.conf.
However, if you could suggest ideal values, that would be really helpful.
Thanks