I am trying to understand the data path for the latest CEF app release (https://splunkbase.splunk.com/app/1847/).
In the new app, when you create a new output, you need to push a generated TA out to your indexers. This TA contains app.conf, indexes.conf, inputs.conf, outputs.conf, and props.conf files and that's it.
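To be concrete, the generated TA is essentially just a bundle of conf files, something like this (the TA name and layout here are from memory, so treat them as placeholders):

```
TA-cefout/
    default/
        app.conf
        indexes.conf
        inputs.conf
        outputs.conf
        props.conf
```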
This is where I am a little lost if I follow the documentation to the letter.
Ripping the app apart it looks like this:
1. The searches are kicked off on the search head.
2. The results are transformed into CEF.
3. They are saved as a stash file into the spool directory.
4. The local search head's inputs.conf picks these up via a batch input.
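Step 4 would imply a spool-directory batch input on the search head. I haven't copied this from the app, but a minimal sketch of that kind of stanza would be:

```ini
# inputs.conf - hypothetical sketch of a spool batch input, not taken from the app
# move_policy = sinkhole deletes each stash file once it has been ingested
[batch://$SPLUNK_HOME/var/spool/splunk]
move_policy = sinkhole
```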
How do the indexers send this data to the output destination?
The app instructions do not explicitly say you need to set outputs on the search head pointing to the indexers (even though this is good standard practice for _internal logs).
So if we assume the search head forwards the stash to the indexers, there are no props/transforms that would enable this app to actually recognise this data.
So in short ...
The old 1.0 app processing pipeline was:
search head -> dist search to indexers -> local CEF event conversion -> local stash parsing -> local output group -> destination CEF TCP output via TCP routing
The new 2.0 app processing pipeline seems to be:
search head -> dist search to indexers -> local CEF event conversion -> local stash parsing ... some process here ... indexer stash ingestion -> output via TCP routing to 3rd-party destination.
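Whatever the "some process here" turns out to be, the forwarding leg itself should just be a named tcpout group in the TA's outputs.conf on the indexers. A minimal sketch, with the group name and destination entirely made up:

```ini
# outputs.conf on the indexers - group name and server are placeholders
[tcpout:cef_syslog_destination]
server = cef-receiver.example.com:514
sendCookedData = false
```

Events tagged with that routing group would then leave the indexer directly rather than coming back through the search head.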
I don't understand how the individual indexers are supposed to find this data to forward to the 3rd party, as the TA doesn't have a savedsearches.conf.
Is the indexer-based inputs.conf a furphy? Does the app actually use the index-based props as the basis for forwarding instead, meaning the searches are actually STILL run on the search head and not the indexers as the documentation states?
" The indexers are responsible for performing the CEF mapping searches and forwarding the results" - http://docs.splunk.com/Documentation/CEFapp/2.0.0/DeployCEFapp/Howtheappworks
Anyone know the cef 2.0 processing order?
From looking at the docs, it looks like the indexers would be running a custom command.
So, although the search head kicks off the search, part of that search is a custom command that runs on each indexer and outputs the data.
I would go a step further and say that while the CEF app installs on the search head for the purpose of building the field mappings, the actual heavy lifting all occurs at the indexer(s). The outputs.conf is what triggers the output to the destinations from the indexer. Unlike CEF 1.0, which ran at the search head, pulled all the results together, and then started forwarding, this approach provides a more rapid forwarding capability that leverages the native outputs.conf forwarding mechanism.
I need to fully understand this pipeline as we have some high availability compliance issues around where the processing occurs.
If the documentation is true and the searches run on the indexers, then the search head is just a UI for the field mappings, as you said.
If the app is working the way dshpritz suggests, isn't the TA-cefout missing the custom command?
Looking at the doco again, there is this bit:
"When you save this search, it begins to run immediately to filter and map data in real time. The running search is appended with an additional command: | cefout routing=<name_of_your_routing_group>. cefout is a custom streaming command which takes a single parameter, routing. The custom streaming command allows the search to perform summary indexing directly on indexers and then routes the output to the destination specified by the routing parameter."
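In other words, the saved mapping search as actually dispatched would look something like this (the base search and the routing group name are made up for illustration):

```
index=pan_logs sourcetype=pan:traffic
| cefout routing=cef_syslog_destination
```

The search head schedules and dispatches it, but cefout streams on the indexers, which is presumably where the forwarding happens.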
This would suggest that the search head is NOT just a UI for mapping the fields but integral to how the entire app functions on an ongoing basis. i.e. if you shut down the search head, none of the searches would run and no events would be forwarded.
I am also guessing that the custom streaming command and its required scripts reach the indexers as part of the knowledge bundle that the search head sends them, as they are not included in the created TA but are required if the indexers are to properly process the search.
So a better description is :
1. The search head's job scheduler triggers a normal scheduled search.
2. This search is distributed to the indexers as per normal. A knowledge bundle that includes the search head's apps, and therefore the custom streaming command, is also sent to each indexer.
3. The search is executed by each indexer; however, the streaming command keeps the data local to the indexer rather than returning it to the requesting search head, as would be normal Splunk behaviour.
4. The custom command marks the events with local stash TCP routing so they are delivered to the TA's predefined destinations.
If the streaming script is sent via the bundle, wouldn't this make the TA redundant, as you could send up-to-date output definitions via the bundle instead?
It also feels like the app has a split personality. On one hand it removes the search head dependency, yet on the other all scheduling is still performed there. I can see that it gives you flexibility in modifying the mappings, yet the outputs are static as they need to be deployed.
If the outputs are static, then wouldn't it have been better to wrap the entire bundle into a TA that you push to the indexers? There would be zero search head dependency then.