I have a custom view with the following hierarchy:
some SideView fields
  SideView Button
    SideView Search
      SideView Pager
        SideView Table
      SideView Pager
        SideView Table
When I launch the search using the button, only one Pager module displays its results; the other one keeps waiting for data. If I launch the search again, the other one might be the one to get results. Which one gets the data seems to be random.
From Chromium's Developer tools' Network tab, I see only one render request is sent to /en-GB/module/system/Something, which is why the other one doesn't receive data.
When I remove one Pager module and keep only the Table module (and the other Pager module), both always render. This is because, for some reason, SideView's Table module doesn't send a render request and gets data in another way (how?).
The following XML shows the aforementioned behaviour (only one (random) Pager module renders):
<view isVisible="true" onunloadCancelJobs="true" template="dashboard.html" isSticky="False">
  <label>My view</label>
  <module name="AccountBar" layoutPanel="appHeader" />
  <module name="AppBar" layoutPanel="appHeader" />
  <module name="SideviewUtils" layoutPanel="appHeader" />
  <module name="Message" layoutPanel="messaging">
    <param name="filter">*</param>
    <param name="maxSize">2</param>
    <param name="clearOnJobDispatch">False</param>
  </module>
  <module name="GenericHeader" layoutPanel="panel_row1_col1">
    <param name="label">Form</param>
    <module name="TextField">
      <param name="label">Host</param>
      <param name="name">hn</param>
      <module name="TextField">
        <param name="label">Something</param>
        <param name="name">cl</param>
        <module name="TextField">
          <param name="label">Some other thing</param>
          <param name="name">room</param>
          <module name="TextField">
            <param name="label">System</param>
            <param name="name">sys</param>
            <module name="TextField">
              <param name="label">Subsystem</param>
              <param name="name">subsys</param>
              <module name="Button">
                <param name="label">Go</param>
                <module name="Search">
                  <param name="search">
                    *
                  </param>
                  <module name="GenericHeader" layoutPanel="panel_row2_col1">
                    <param name="label">Pager1</param>
                    <module name="Pager">
                      <param name="count">25</param>
                      <param name="maxPages">20</param>
                      <module name="Table">
                        <param name="hiddenFields">time</param>
                      </module>
                    </module>
                  </module>
                  <module name="GenericHeader" layoutPanel="panel_row3_col1">
                    <param name="label">Pager2</param>
                    <module name="Pager">
                      <param name="count">25</param>
                      <param name="maxPages">20</param>
                      <module name="Table">
                        <param name="hiddenFields">time</param>
                      </module>
                    </module>
                  </module>
                </module>
              </module>
            </module>
          </module>
        </module>
      </module>
    </module>
  </module>
</view>
Here's a screenshot of the network trace: 0x3b.org/ss/bronchopulmonary634.png.
The following XML works all the time: both Pager modules render. The only difference is that I removed one TextField module.
<view isVisible="true" onunloadCancelJobs="true" template="dashboard.html" isSticky="False">
  <label>My view</label>
  <module name="AccountBar" layoutPanel="appHeader" />
  <module name="AppBar" layoutPanel="appHeader" />
  <module name="SideviewUtils" layoutPanel="appHeader" />
  <module name="Message" layoutPanel="messaging">
    <param name="filter">*</param>
    <param name="maxSize">2</param>
    <param name="clearOnJobDispatch">False</param>
  </module>
  <module name="GenericHeader" layoutPanel="panel_row1_col1">
    <param name="label">Form</param>
    <module name="TextField">
      <param name="label">Host</param>
      <param name="name">hn</param>
      <module name="TextField">
        <param name="label">Something</param>
        <param name="name">cl</param>
        <module name="TextField">
          <param name="label">Some other thing</param>
          <param name="name">room</param>
          <module name="TextField">
            <param name="label">System</param>
            <param name="name">sys</param>
            <module name="Button">
              <param name="label">Go</param>
              <module name="Search">
                <param name="search">
                  *
                </param>
                <module name="GenericHeader" layoutPanel="panel_row2_col1">
                  <param name="label">Pager1</param>
                  <module name="Pager">
                    <param name="count">25</param>
                    <param name="maxPages">20</param>
                    <module name="Table">
                      <param name="hiddenFields">time</param>
                    </module>
                  </module>
                </module>
                <module name="GenericHeader" layoutPanel="panel_row3_col1">
                  <param name="label">Pager2</param>
                  <module name="Pager">
                    <param name="count">25</param>
                    <param name="maxPages">20</param>
                    <module name="Table">
                      <param name="hiddenFields">time</param>
                    </module>
                  </module>
                </module>
              </module>
            </module>
          </module>
        </module>
      </module>
    </module>
  </module>
</view>
And its network trace: 0x3b.org/ss/solera370.png.
Why does adding or removing one simple text field change the behaviour that much?
If someone has any idea why this is happening, please tell me!
I use Splunk 5.0.5.
This is a pretty cool bug. It's a real bug and a very deep one down in the Splunk UI code, but it's only reproducible if some pretty unusual things are all true.
I'll give you the quick fix and then I'll post back with what happens under the hood to make it go so horribly wrong.
The quick fix is that there's no reason to ever nest anything inside GenericHeader, or any other module like GenericHeader that doesn't contribute anything downstream. Granted, it's supposed to be entirely safe to do this! In this case, and for extremely convoluted reasons, that nesting helps you fall into this bizarre bug.
So anyway, the fixed XML would look like this, and as if by magic the bug also goes away.
<module name="GenericHeader" layoutPanel="panel_row1_col1">
  <param name="label">Form</param>
</module>
<module name="TextField" layoutPanel="panel_row1_col1">
  <param name="label">Host</param>
  <param name="name">hn</param>
  <module name="TextField">
    <param name="label">Something</param>
    <param name="name">cl</param>
    <module name="TextField">
      <param name="label">Some other thing</param>
      <param name="name">room</param>
      <module name="TextField">
        <param name="label">System</param>
        <param name="name">sys</param>
        <module name="TextField">
          <param name="label">Subsystem</param>
          <param name="name">subsys</param>
          <module name="Button">
            <param name="label">Go</param>
            <module name="Search">
              <param name="search">
                *
              </param>
              <module name="JobProgressIndicator"/>
              <module name="GenericHeader" layoutPanel="panel_row2_col1">
                <param name="label">Pager1</param>
              </module>
              <module name="Pager" layoutPanel="panel_row2_col1">
                <param name="count">25</param>
                <param name="maxPages">20</param>
                <module name="Table">
                  <param name="hiddenFields">time</param>
                </module>
              </module>
              <module name="GenericHeader" layoutPanel="panel_row3_col1">
                <param name="label">Pager2</param>
              </module>
              <module name="Pager" layoutPanel="panel_row3_col1">
                <param name="count">25</param>
                <param name="maxPages">20</param>
                <module name="Table">
                  <param name="hiddenFields">time</param>
                </module>
              </module>
            </module>
          </module>
        </module>
      </module>
    </module>
  </module>
</module>
This also fixes another problem in your view - that you were actually dispatching two copies of that search. The "overview of the advanced XML" page in Sideview Utils can help you understand why that was happening, although it's still pretty bizarre. Basically the GenericHeaders being there was forking the single cascading push into two cascading pushes, and they were both getting dispatched separately by the framework when they hit the Pager modules just downstream from GenericHeader.
As to the rest of it, I will update this answer with a deeper explanation, but I want to get to a more complete understanding of all the moving parts first.
UPDATE:
Suspect #1 -- Pager
Pager comes on the scene and in its initialization it checks something in the upstream config while the page is loading.
Unfortunately this triggers weird behavior in the module framework (called "backfill") where all the modules upstream from Pager mark their cache of upstream data as "stale". Basically they all make a promise to remember to refresh their cached context data at the first available opportunity.
If Pager didn't do that load time check, the problem would go away, but it's not his fault.
Suspect #2 -- The Module Framework.
From that point, the core module framework code does a bad job at fixing that staleness issue. Each time it should clear all the staleness out of all upstream layers it only ends up clearing the most upstream layer. Then it'll try to proceed with pushing data down through the page. Then a bit later it'll notice "hey what the heck there are still N-1 stale layers" and it'll try to clear them. To make a long story short, in this view it ends up taking like 3 or 4 times, and yes, this is exactly why taking exactly one of the TextFields away is another way to "fix" the bug. It has nothing to do with the TextField. With 4 stale modules up there though, it would take 4 attempts to clear the staleness and in this view it only ends up being able to make 3.
Then when the job completes, one of the two Pagers will fire first. This triggers the fourth check, and in the middle of the fourth check, it will get its context data given to it. Some other obscure "staleness" code will try to be careful, but it ends up doing it the wrong way, so in the end the second Pager+Table are left with no data and no search results.
So in the end, the Pager and the TextField are innocent bystanders in a game with some broken rules. However I believe I can do a lot to improve the general situation.
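The arithmetic in that explanation can be made concrete with a toy Python model. This is only a sketch and not the Splunk module framework code: the function name `stale_layers_left` and the list-of-booleans representation are made up for illustration. The only things taken from the explanation above are that each clearing pass clears just the most upstream stale layer, and that this view manages 3 passes where 4 would be needed.

```python
def stale_layers_left(stale_layers, passes):
    """Toy model: each pass clears only the most upstream stale layer.

    Returns how many layers are still stale after `passes` attempts.
    """
    # Index 0 is the most upstream module; True means "cache is stale".
    stale = [True] * stale_layers
    for _ in range(passes):
        if True not in stale:
            break  # nothing left to clear
        # Clear only the first (most upstream) stale layer.
        stale[stale.index(True)] = False
    return sum(stale)


if __name__ == "__main__":
    # 4 stale layers but only 3 passes: one layer is still stale.
    print(stale_layers_left(4, 3))  # prints 1
    # One TextField fewer: 3 stale layers, 3 passes, nothing left stale.
    print(stale_layers_left(3, 3))  # prints 0
```

In this model, removing one layer or allowing one more pass both bring the count to zero, which matches the observation that taking away exactly one TextField makes the symptom disappear.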
Bottom Line:
It's not the modules; it's the framework code underneath them.
The easiest fix is for you to rework the XML hierarchy as above. Remove the pointless nesting inside GenericHeader modules and for very convoluted reasons the problem will go away.
Long term improvements in Sideview Utils - I will make Pager stop doing that check. Sideview Utils was written to cut out and circumvent old problematic code exactly like this and I want none of this "staleness" code to ever execute when you're using appropriate Sideview modules. I will review all Sideview modules to try and eliminate any others that might be happening.
I will also try to patch the core module code from the Sideview Utils app to fix the problem entirely, although this is a more delicate proposition.
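For anyone who wants to check their own views for the nesting described above, here is a minimal Python sketch that lists modules nested directly inside a GenericHeader. It uses only the standard library's `xml.etree.ElementTree`. The function name `find_nested_modules`, the `PASSIVE_MODULES` set, and the two sample views are hypothetical and exist only for this example; GenericHeader is the only module the answer names as contributing nothing downstream.

```python
import xml.etree.ElementTree as ET

# Assumption: modules that contribute nothing downstream. Only GenericHeader
# is named in the answer; extend this set at your own discretion.
PASSIVE_MODULES = {"GenericHeader"}


def find_nested_modules(view_xml):
    """Return (parent, child) name pairs where a module sits inside a passive module."""
    root = ET.fromstring(view_xml)
    hits = []
    for parent in root.iter("module"):
        if parent.get("name") in PASSIVE_MODULES:
            # findall("module") matches direct children only.
            for child in parent.findall("module"):
                hits.append((parent.get("name"), child.get("name")))
    return hits


# Hypothetical sample: Pager nested inside GenericHeader (the problematic shape).
NESTED = """
<view>
  <module name="GenericHeader" layoutPanel="panel_row2_col1">
    <param name="label">Pager1</param>
    <module name="Pager">
      <module name="Table" />
    </module>
  </module>
</view>
"""

# Hypothetical sample: GenericHeader and Pager as siblings (the reworked shape).
FLAT = """
<view>
  <module name="GenericHeader" layoutPanel="panel_row2_col1">
    <param name="label">Pager1</param>
  </module>
  <module name="Pager" layoutPanel="panel_row2_col1">
    <module name="Table" />
  </module>
</view>
"""

if __name__ == "__main__":
    print(find_nested_modules(NESTED))  # [('GenericHeader', 'Pager')]
    print(find_nested_modules(FLAT))    # []
```

An empty result only means nothing is nested directly under the listed modules; it does not prove the view is free of the underlying framework bug.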
No, you're not supposed to know or care about any of this. Also don't thank me quite so fast... I used to work at Splunk and I was the principal developer for all the code involved here. So there is blood on my hands here.
Splunk at that time had a problem allowing developers to spend time on hard-to-quantify framework improvements, which is partly why we sometimes spent even more time doing little atomic patches on fundamentally bad systems.
Sideview doesn't have this problem, which is why Sideview Utils has over the years thrown the majority of the Splunk ui code away and replaced it.
Thank you so much! I would never have found this myself. And I just don't see why a Splunk app developer would ever need to care about the inner mechanics of the module framework; it is supposed to make the promise that it always works. We're lucky to have dedicated people like you who understand all this. Is this bug fixed in Splunk 6?