Hi
We have a dashboard that is getting this error. I am on 8.1.9.
The "Unknown sid" message might stay there for 2 minutes, even though multiple refreshes might have happened. I can't manually refresh it, and in a demo it looks really, really bad. Any ideas what it is and how I can make it stop happening?
You have quite an interesting dashboard, especially those base searches! There are some common mistakes in your base searches that you must avoid to get them to work.
Here is a blog post about dashboards that also covers using base searches in them: https://www.splunk.com/en_us/blog/tips-and-tricks/splunk-clara-fication-dashboarding-best-practices....
Your main issue seems to be that your base searches return events, not summaries. So they aren't transforming searches, as all base searches should be! In some cases you can also use non-transforming base searches, but then there is a real possibility that they don't work and you don't notice it!
You are using "sort 0", which usually means that you have a lot of events, and this means that 1) your base searches return more rows than sXML can handle and 2) some timeouts can be reached.
You also have a refresh time of 5s on the base search, which is quite low. Maybe this can lead to a situation where a new base search is already running before the other searches have run (this is just a guess on my part, as I haven't checked/read how this really works). Maybe you should try to extend this to something like 60s to 5min? How many new events do you get into Splunk within 5s, 60s or 5min?
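As a rough illustration of the refresh suggestion above, here is a minimal Simple XML fragment with a longer interval on the base search. The 60s value is only an assumed starting point, and the query is a short stand-in rather than the real base search. With refreshType set to delay, the countdown starts when the search finishes; with interval, it starts when the search is dispatched.

```xml
<!-- Fragment only: meant to sit inside the existing <form>. -->
<!-- The query below is a stand-in; keep your own base search query. -->
<search id="basesearch">
  <query>index="murex_logs" | stats count by mx.env</query>
  <earliest>$time_token.earliest$</earliest>
  <latest>$time_token.latest$</latest>
  <!-- 60s is an assumed value to try first, not a documented recommendation -->
  <refresh>60s</refresh>
  <!-- delay: the next countdown starts only after this run has finished -->
  <refreshType>delay</refreshType>
</search>
```

Whether this removes the "Unknown sid" message would still need to be confirmed by watching the dashboard for a while.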
Another option could be to create a "base search" and then reuse it with loadjob, like here: https://community.splunk.com/t5/Splunk-Search/Is-it-possible-to-use-base-search-in-append-sub-search...
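A minimal sketch of the loadjob idea, not taken from the linked thread: it assumes a scheduled saved search that already produces a summary with an events_Number count per Severity. The saved search name mx3_sig_events_summary and the owner nobody are hypothetical; the app name MX.3_MONITORING comes from the app_name token in the dashboard.

```xml
<!-- Fragment only: a panel search that reuses the latest results of a scheduled saved search. -->
<!-- "nobody" and "mx3_sig_events_summary" are hypothetical names used for illustration. -->
<search>
  <query>| loadjob savedsearch="nobody:MX.3_MONITORING:mx3_sig_events_summary"
| stats sum(events_Number) as events_Number by Severity</query>
</search>
```

Since scheduled searches use cron schedules with one-minute granularity, the loaded results would be at most about a minute fresh, so this may not fit a 5s refresh requirement.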
But anyhow, your current base search does not follow Splunk's best practices (or even the documented way; see Post-process searches).
I hope the hints above help you solve the issue. One more thing you can try is to look at what jobs the dashboard is generating and then inspect them. Their output should explain more about what happened and what went wrong (see e.g. https://www.splunk.com/en_us/blog/tips-and-tricks/splunk-clara-fication-job-inspector.html; some .conf presentations are also available).
r. Ismo
Hi
I have added it below. I have made 3 base searches, each of which builds on the original one.
I do this so that we do 1 big data pull in basesearch1 and then filter in no. 2 and no. 3, so the dashboard does not jump if a user clicks on a dropdown.
We are doing a high-frequency refresh, which is what the business wants, so that might be causing the issues.
This is a cut-down version of the code, but the issue could happen on any panel.
<refresh>5s</refresh>
<form theme="dark">
<label>MX.3 SIGNIFICANT EVENTS</label>
<search>
<query>| makeresults count=1 annotate=false </query>
<done>
<set token="token_event_search">*</set>
<!--set token="host_token">PDT</set-->
<set token="app_name">MX.3_MONITORING</set>
<set token="pid_token">*</set>
<set token="pid_token1">*</set>
</done>
</search>
<search>
<query>| makeresults count=1 annotate=false </query>
<done>
<!-- Set tokens if a URL has come in, to reset some of the tokens so they will be visible on the screen -->
<condition match="$URL_TRAP$==&quot;SET&quot;">
<set token="form.host_token">$host_token1$</set>
<set token="form.Severity_token">$Severity_token1$</set>
<set token="Severity_token">$Severity_token1$</set>
<set token="form.Service_Name">$Service_Name1$</set>
<set token="form.pid_token">$pid_token1$</set>
<set token="pid_token">$pid_token1$</set>
</condition>
<condition match="match($ERROR_FILTER$,&quot;true&quot;)">
<set token="form.Severity_token">"FATAL" "ERROR"</set>
<set token="Severity_token">"FATAL" "ERROR"</set>
</condition>
<condition match="match($EVENT_FILTER$,&quot;EVENT&quot;)">
<eval token="form.time_token.earliest">$latest$</eval>
<eval token="form.time_token.latest">$latest$+300</eval>
<set token="form.Severity_token">*</set>
</condition>
<condition>
<set token="Severity_token">*</set>
<set token="Service_Name">*</set>
</condition>
</done>
</search>
<search id="basesearch">
<query>index="murex_logs"
```using regex as search and where are not reliable and sometimes they don't work```
| regex mx.env="$host_token$"
| regex log.type="sig-event"
| rename code as Code
| rename otel.log.severity.text as Severity
| rename _raw as Description
| rename component.name as Component
| rename service.name as Service_Name
| rename file as evtFile | rex mode=sed field=Service_Name "s/ //g"
| table _time Code Severity Description Component pid Service_Name evtFile
| search pid=$pid_token$
| sort 0 - _time
</query>
<earliest>$time_token.earliest$</earliest>
<latest>$time_token.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5s</refresh>
<refreshType>delay</refreshType>
</search>
<search base="basesearch" id="basesearch2">
<query>
| search Service_Name IN ($Service_Name$) | search Severity IN ($Severity_token$)
</query>
</search>
<search base="basesearch2" id="basesearch3">
<query>
| eval Module="UNDEFINED"
| join type=left Service_Name
[| mstats min(mx.service.dependencies.status) as Dependencies_x WHERE "index"="murex_metrics" AND mx.env="$host_token$" span=10s BY "service.name" "service.type" used.by
| sort 0 - _time
| dedup service.name
| rename "used.by" as Module1
| table service.name service.type Module1
| eval Module=split(Module1, ",")
```used.by which is a comma separated fields, the mvexpand is to make it a multivalue field to be able to join on each of the comma separated values```
| mvexpand Module
```remove all the services that have the used.by field empty as it will not have a related module```
| search Module != ""
```join on the Module to see what are the services that are related by dependency relation to bpc names and not only to other services ```
| join Module
[ ```get all the BPC names that have configured dependencies in order to map them to the different services```
| mstats avg("mx.bpc.status") as BPC_STATUS WHERE "index"="murex_metrics" AND mx.env=$host_token$ span=10s BY "bpc.name"
| dedup bpc.name
| rename bpc.name as Module
| table Module
]
```if there is a service that has several bpc names in its used.by, until now it will show several times in several lines with Module equal to one of the bpc names each time. The below command is to combine these bpc names into one field to have one single line with this service.name```
| stats list(*) as * by service.name service.type Module1
| rename service.name as Service_Name
```below is to convert the multivalue field module into a single value field containing the different bpc names related to this service with a whitespace delimiter```
| rex mode=sed field=Service_Name "s/ //g"
| nomv Module
| fields - Module1]
| search Module IN ($Module_token$)
</query>
</search>
<fieldset submitButton="false" autoRun="true">
<html>
<a href="http://hp737srv:8000/en-US/app/$app_name$/pac_plo_events?form.host_token=$host_token$">Reload All</a>
/
<a href="http://hp737srv:8000/en-US/app/$app_name$/pacplo_production_monitoring?form.host_token=$host_token$">Back To Overall System View</a>
</html>
<input type="dropdown" token="host_token">
<label>HOST</label>
<fieldForLabel>mx.env</fieldForLabel>
<fieldForValue>mx.env</fieldForValue>
<search>
<query>index="murex_logs" | regex log.type="sig-event" | stats count by mx.env | sort 0 mx.env | table mx.env</query>
<earliest>$time_token.earliest$</earliest>
<latest>$time_token.latest$</latest>
</search>
<default>dell1215srv:15017</default>
<initialValue>dell1215srv:15017</initialValue>
</input>
<input type="time" token="time_token" searchWhenChanged="true">
<label>Time</label>
<default>
<earliest>-5m</earliest>
<latest>now</latest>
</default>
</input>
<input type="multiselect" token="Severity_token">
<label>Severity</label>
<default>*</default>
<initialValue>*</initialValue>
<fieldForLabel>Severity</fieldForLabel>
<fieldForValue>Severity</fieldForValue>
<search base="basesearch2">
<query>
| sort 0 - _time
| table Severity | dedup Severity</query>
</search>
</input>
<input type="multiselect" token="Module_token">
<label>Module</label>
<default>*</default>
<initialValue>*</initialValue>
<fieldForLabel>Module</fieldForLabel>
<fieldForValue>Module</fieldForValue>
<search base="basesearch3">
<query>
| sort 0 - _time
| table Module | dedup Module</query>
</search>
</input>
<input type="multiselect" token="Service_Name" searchWhenChanged="true">
<label>Service_Name</label>
<default>*</default>
<initialValue>*</initialValue>
<fieldForLabel>Service_Name</fieldForLabel>
<fieldForValue>Service_Name</fieldForValue>
<search base="basesearch3">
<query>| sort 0 - _time
| table Service_Name | dedup Service_Name</query>
</search>
<valuePrefix>"</valuePrefix>
<valueSuffix>"</valueSuffix>
<delimiter> </delimiter>
</input>
<input type="radio" token="reset_filters" searchWhenChanged="true">
<label></label>
<choice value="true">Reset_Filters</choice>
<default></default>
<change>
<condition value="true">
<set token="token_event_search">*</set>
<set token="form.Severity_token">*</set>
<set token="form.Module_token">*</set>
<unset token="code_token"></unset>
<set token="form.Service_Name">*</set>
<set token="pid_token">*</set>
<unset token="form.reset_filters"></unset>
</condition>
</change>
</input>
</fieldset>
<row>
<panel>
<title>Nb of events by Severity</title>
<table>
<search base="basesearch3">
<query>| stats count as events_Number by Severity</query>
</search>
<option name="count">100</option>
<option name="dataOverlayMode">none</option>
<option name="drilldown">row</option>
<option name="percentagesRow">false</option>
<option name="refresh.display">none</option>
<option name="rowNumbers">false</option>
<option name="totalsRow">false</option>
<option name="wrap">true</option>
<format type="color" field="Severity">
<colorPalette type="map">{"INFO":#4FA484,"ERROR":#DC4E41,"WARN":#F8BE34}</colorPalette>
</format>
<format type="color" field="events_Number">
<colorPalette type="minMidMax" maxColor="#53A051" minColor="#FFFFFF"></colorPalette>
<scale type="minMidMax"></scale>
</format>
<drilldown>
<set token="form.Severity_token">$row.Severity$</set>
</drilldown>
</table>
</panel>
<panel>
<title>Nb of events by Module</title>
<table>
<search base="basesearch3">
<query>
| stats count as events_Number by Module</query>
</search>
<option name="count">100</option>
<option name="dataOverlayMode">none</option>
<option name="drilldown">row</option>
<option name="percentagesRow">false</option>
<option name="refresh.display">none</option>
<option name="rowNumbers">false</option>
<option name="totalsRow">false</option>
<option name="wrap">true</option>
<format type="color" field="Module">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="events_Number">
<colorPalette type="minMidMax" maxColor="#53A051" minColor="#FFFFFF"></colorPalette>
<scale type="minMidMax"></scale>
</format>
<format type="color" field="Module">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<drilldown>
<set token="form.Module_token">$row.Module$</set>
</drilldown>
</table>
</panel>
<panel>
<title>Nb of events by Service_Name</title>
<table>
<search base="basesearch3">
<query>| stats count as events_Number by Service_Name</query>
</search>
<option name="count">10</option>
<option name="dataOverlayMode">none</option>
<option name="drilldown">row</option>
<option name="percentagesRow">false</option>
<option name="refresh.display">none</option>
<option name="rowNumbers">false</option>
<option name="totalsRow">false</option>
<option name="wrap">true</option>
<format type="color" field="Service_Name">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="events_Number">
<colorPalette type="minMidMax" maxColor="#53A051" minColor="#FFFFFF"></colorPalette>
<scale type="minMidMax"></scale>
</format>
<drilldown>
<set token="form.Service_Name">$row.Service_Name$</set>
</drilldown>
</table>
</panel>
</row>
</form>
Hi
This is a great answer, thanks. Lots for me to think about in this one.
For the record, the search normally finishes in 0.2 seconds.
I will try to set up the base searches as transforming searches and see how it works for me.
I will also look at some of the other points to see if the issue goes away 100%.
The dashboard is monitoring the production system. The business wants a 5-second refresh rate; I think it is too much as well, but I am trying to give them what they want.
Regards
Robert
One way to change those into transforming searches is to use stats with several by clauses in the 1st phase. Then in the next phase (basesearch2 etc.), use stats again to summarise those results as you need. Something like
stats count(a) as a sum(b) as b by c, d, e
``` in the next phase ```
stats sum(a) as CountA
Maybe this is not obvious at first, but after a little thought it works in quite a lot of cases.
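Applied to the dashboard posted in this thread, the two-phase idea might look roughly like the fragment below. This is only a sketch under assumptions: it keeps the original filters, replaces the table and sort 0 at the end of the base search with a stats by Severity and Service_Name, and covers only the Severity panel. The Module join in basesearch3 is left out and would need its own rework. Note that stats ... by drops events where a by field is missing.

```xml
<!-- Fragment only: meant to sit inside the existing <form>. -->
<!-- Phase 1: the base search returns a small summary instead of raw events. -->
<search id="basesearch">
  <query>index="murex_logs"
| regex mx.env="$host_token$"
| regex log.type="sig-event"
| search pid=$pid_token$
| rename otel.log.severity.text as Severity, service.name as Service_Name
| rex mode=sed field=Service_Name "s/ //g"
| stats count as events_Number by Severity Service_Name</query>
  <earliest>$time_token.earliest$</earliest>
  <latest>$time_token.latest$</latest>
</search>
<!-- Filtering on the dropdown tokens still works on the summarised rows. -->
<search base="basesearch" id="basesearch2">
  <query>| search Service_Name IN ($Service_Name$) Severity IN ($Severity_token$)</query>
</search>
<!-- Phase 2: each panel re-aggregates the summary with sum() instead of count. -->
<row>
  <panel>
    <title>Nb of events by Severity</title>
    <table>
      <search base="basesearch2">
        <query>| stats sum(events_Number) as events_Number by Severity</query>
      </search>
    </table>
  </panel>
</row>
```

The Service_Name panel could follow the same pattern with by Service_Name in the second stats.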
r. Ismo
@robertlynch2020 - I've seen a similar error many times, but it usually happens when I opened the dashboard more than 10 minutes earlier, because the lifetime of an ad hoc search is 10 minutes.
Again, it's not true for all dashboards all the time; this usually happens when the dashboard gets partially refreshed, which is what I can see with two of your input filters showing the "populating" status.
Hi
Yes, we are running a high refresh rate on this dashboard, so the dropdowns would be updating sometimes.
In my case, the dashboard has refreshed within the last 1 minute.
I have one base search in the dashboard, and all dropdowns and panels are linked to it.
So it's a bit strange when one of them gives the error and the others do not.
Any ideas on how to fix it?
Regards
Robert
Yeah, dashboard XML would be useful to see.
Hi @robertlynch2020
sid is the search ID of the query used in the panel.
Check whether the search used in the panel ran and completed successfully.
Try running the query separately and use the Inspect Job option to see its execution details; it may give you some idea of why it's failing.
Also, check whether the query uses the loadjob command; if so, verify the underlying saved search.