Since 6.1 (6.0?) Splunk forwarders have shipped with an introspection app that is designed to generate Splunk resource utilization logs. Has anyone come up with a good way to enable that in a distributed environment managed by a deployment server?
The app.conf file in the app has state = disabled. Despite that, it still generates disk usage data (wut?) but doesn't forward the data to the indexers. Ultimately I'd like to turn on the resource usage AND get the data into Splunk. The server.conf file in the app's default directory has disabled = false, which tells me the linchpin to the whole operation is app.conf. In a distributed environment I can't simply use the deployment server to push an app under another name to enable the app.conf, because those settings are app specific.
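For anyone poking at this locally in the meantime, one way to see which setting wins is btool. This is a sketch; the forwarder install path is an assumption, so adjust it for your environment:
/opt/splunkforwarder/bin/splunk cmd btool --app=introspection_generator_addon app list install --debug
/opt/splunkforwarder/bin/splunk cmd btool server list introspection:generator:resource_usage --debug
The --debug flag prints the file each effective setting comes from, which makes the app.conf vs. server.conf precedence question concrete.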
UPDATE: This procedure is now officially documented here.
In order to enable a built-in app such as "introspection_generator_addon" on a deployment client, your deployment server needs to be upgraded to Splunk Enterprise 6.2, which introduced the excludeFromUpdate option to serverclass.conf. From serverclass.conf.spec:
excludeFromUpdate = <path>[,<path>]...
* Specifies paths to one or more top-level files or directories (and their contents) to exclude from being touched during app update. Note that each comma-separated entry MUST be prefixed by "$app_root$/" (otherwise a warning will be generated).
So, the idea here is that in order to turn on this built-in app, your deployment server needs to ship a version of it that only contains an app.conf file in the "local" directory, with the state = enabled directive. All other directories in the app would need to be excluded from update.
Here's an example of how that can be achieved:
Deployment server, simplified serverclass.conf:
[global]
whitelist.0=*
[serverClass:AllApps]
[serverClass:AllApps:app:introspection_generator_addon]
excludeFromUpdate = $app_root$/default, $app_root$/bin
restartSplunkd = True
Note that we are fine-tuning what is being pushed at the app level and excluding both the "bin" and "default" directories. We ONLY want to push stuff to the "local" directory for this app - namely, an app.conf that enables it on the forwarder:
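One detail that's easy to miss: after editing serverclass.conf and staging the app under etc/deployment-apps, the deployment server has to reload its configuration before clients pick anything up. A minimal sketch, assuming the deployment server install path from the listings below:
/opt/cm/splunk/bin/splunk reload deploy-server
With restartSplunkd = True in the server class, the forwarder restarts itself after downloading the app, which is what actually activates the new state.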
[root@sosdev-sh:/opt/cm/splunk]# ls -lR etc/deployment-apps/introspection_generator_addon/
etc/deployment-apps/introspection_generator_addon/:
total 4
drwxr-xr-x. 2 root root 4096 Dec 5 22:43 local
etc/deployment-apps/introspection_generator_addon/local:
total 4
-rw-r--r--. 1 root root 26 Dec 5 22:43 app.conf
Contents of app.conf:
[root@sosdev-sh:/opt/cm/splunk]# cat etc/deployment-apps/introspection_generator_addon/local/app.conf
[install]
state = enabled
That's it - that's all that our deployment app contains: a local/app.conf file that enables the app.
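If you're recreating this from scratch, the whole deployment app boils down to two commands; a sketch, assuming the same working directory as the listings above:
mkdir -p etc/deployment-apps/introspection_generator_addon/local
printf '[install]\nstate = enabled\n' > etc/deployment-apps/introspection_generator_addon/local/app.conf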
On the Deployment client, post-deployment:
[root@sosdev-idx3 apps]# ls -lR introspection_generator_addon/
introspection_generator_addon/:
total 16
drwxr-xr-x. 2 506 506 4096 Oct 22 17:05 bin
drwxr-xr-x. 2 506 506 4096 Oct 22 17:05 default
drwx------. 2 root root 4096 Dec 5 23:18 local
drwx------. 2 root root 4096 Dec 5 23:18 metadata
introspection_generator_addon/bin:
total 4
-r-xr-xr-x. 1 506 506 53 Oct 22 16:32 collector.path
introspection_generator_addon/default:
total 16
-r--r--r--. 1 506 506 216 Oct 22 17:07 app.conf
-r--r--r--. 1 506 506 180 Oct 22 16:34 inputs.conf
-r--r--r--. 1 506 506 691 Oct 22 16:32 README
-r--r--r--. 1 506 506 560 Oct 22 16:34 server.conf
introspection_generator_addon/local:
total 4
-rw-------. 1 root root 26 Dec 5 23:18 app.conf
introspection_generator_addon/metadata:
total 4
-rw-------. 1 root root 67 Dec 5 23:18 local.meta
Note the presence of local/app.conf, as expected. Everything else is still there and untouched.
[root@sosdev-idx3 apps]# /opt/fwd5/splunkforwarder/bin/splunk display app
introspection_generator_addon CONFIGURED ENABLED INVISIBLE
[root@sosdev-idx3 apps]# /opt/fwd5/splunkforwarder/bin/splunk cmd btool server list introspection:generator:resource_usage --debug
/opt/fwd5/splunkforwarder/etc/apps/introspection_generator_addon/default/server.conf [introspection:generator:resource_usage]
/opt/fwd5/splunkforwarder/etc/apps/introspection_generator_addon/default/server.conf acquireExtra_i_data = false
/opt/fwd5/splunkforwarder/etc/apps/introspection_generator_addon/default/server.conf disabled = false
Note that now that the introspection_generator_addon app has been enabled by our app.conf push, the resource_usage data input is enabled! And indeed, the collector is running:
[root@sosdev-idx3 apps]# ps -ef | grep instrument-resource-usage
root 27697 27229 0 23:23 ? 00:00:00 /opt/fwd5/splunkforwarder/bin/splunkd instrument-resource-usage -p 8100
...and indeed, we get data:
[root@sosdev-idx3 apps]# tail -1 /opt/fwd5/splunkforwarder/var/log/introspection/resource_usage.log
{"datetime":"12-05-2014 23:33:06.435 -0800","log_level":"INFO","component":"Hostwide","data":{"mem":"7872.797","mem_used":"7100.086","swap":"14095.992","swap_used":"230.082","pg_paged_out":"8409558722","pg_swapped_out":"855499","forks":"86227439","runnable_process_count":"3","normalized_load_avg_1min":"0.91","cpu_user_pct":"66.43","cpu_system_pct":"5.97","cpu_idle_pct":"27.59"}}
I enabled introspection on one of my UFs to see what it would provide. Where in the DMC is this new information displayed? I already had information for forwarders under the "Forwarders" section. I'm trying to determine what added benefit, if any, there is to enabling this on UFs.
Thanks
Thanks for the detailed post!
Of course what I'm curious about is how many support calls have been dealt with on this issue since I posted this question 3 1/2 months ago, which prompted such a detailed post now. I think that somewhat validates my comments above /sigh. I know I can be a PITA to deal with, but FFS this could have been handled better.
Actually, not that many! That being said, I agree with you that more celerity in addressing this blind spot would have been preferable. My apologies for failing to make this move any faster than it did.
Introspection may be a new feature as of Splunk 6.1, so you cannot get introspection logs from a 6.0 UF.
I can collect the resource utilization logs from UF 6.1 or later by using a custom TA ("custom_introspection_ta").
"custom_introspection_ta" contains the bin directory and a local directory copied from the native introspection app.
You should try the btool command to check configuration file precedence.
Also, be careful with the bin directory, which contains collector paths that depend on the OS.
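For reference, a rough sketch of how such a custom TA could be assembled; the TA name and the choice of copied pieces follow the commenter's description and are not a verified procedure, and collector.path should be checked against the target OS before deploying:
cd $SPLUNK_HOME/etc/deployment-apps
mkdir -p custom_introspection_ta/local
cp -r $SPLUNK_HOME/etc/apps/introspection_generator_addon/bin custom_introspection_ta/
cp $SPLUNK_HOME/etc/apps/introspection_generator_addon/default/server.conf custom_introspection_ta/local/
cp $SPLUNK_HOME/etc/apps/introspection_generator_addon/default/inputs.conf custom_introspection_ta/local/
printf '[install]\nstate = enabled\n' > custom_introspection_ta/local/app.conf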
I assume it is generating the disk usage stuff because it is CONSTANTLY running ./splunkd instrument-resource-usage (interval = 0 in inputs.conf). Again, this is despite the app being disabled.
The only data expected to populate disk_objects.log on a universal forwarder pertains to the fishbucket size and contents. This represents about 1 line of data every 10 minutes, and should therefore be negligible.
I would also not expect the splunkd instrument-resource-usage process to be running on universal forwarders by default. If you find that this is the case on Splunk 6.1.4 or later, please file a support case, as we did have a product defect between Splunk 6.1 and 6.1.3 where the "introspection_generator_addon" was incorrectly enabled by default on universal forwarders.
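For anyone still on an affected 6.1.x forwarder, one workaround (a sketch, not an officially documented fix) is to push a local server.conf override that turns the input off, mirroring the enable procedure above in reverse. Contents of etc/apps/introspection_generator_addon/local/server.conf on the forwarder:
[introspection:generator:resource_usage]
disabled = true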
Why wasn't this just created as a TA and put on Splunkbase /sigh
The main reason for which we didn't build it as a "stand-alone" TA is that we wanted to ensure that this would be a feature available out-of-the-box.
I admit, however, that the interaction with the deployment server wasn't planned as well as it should have been.
Thanks for the updates; this was created before you and I talked. What kills me is putting the two options for addressing this issue on a scale.
On one side there is: in UF 6.n+1, change app.conf to enabled and the default inputs.conf to disabled, and write a blog post about the issue and the changes.
On the other side: engineer a new capability within the deployment server, and don't ship said change until several months after the issue was identified. Larger organizations have a much harder time dealing with change management for their SH/indexer infrastructure than for forwarders (as of today my 28 SH/indexers are still on 6.1.3). Add the cumulative man-months of lost time for Splunk support and customers trying to figure out why the hell CPU has gone up after upgrade and/or to get this operational, plus the Splunk dev time spent working through bugs, and of course the time that will be spent in future conversations and efforts dealing with divergent methodologies within Splunk to support this fundamental change (99% of the time Splunk works like "this", except for introspection). On top of that, I think your detailed post below should still be put into an official blog post.
On the whole I can totally understand why Splunk decided to go with option 2 - but only if whoever's pride got in the way is taken into account. Coming out and saying "whoops, we tried something new and have to adjust fire" isn't nearly as big a deal as the wasted time and the bugs/issues listed above, and it should have taken a back seat to those.
To clarify:
- The fact that the "introspection_generator_addon" app shipped enabled by default for universal forwarders in 6.1 was a product defect (SPL-83778 to be accurate). By that token, the change in version 6.1.4 was not to revert a decision but to correct a bug. Our intention was for this feature to have zero impact on universal forwarders upon upgrade to 6.1. We regret any impact caused by this defect.
- The excludeFromUpdate option for deployment server was not specifically added to solve the problem of "how to remotely control the state of a built-in app on a deployment client?", but it was within the scope of this option to allow such a thing.
- You are absolutely right that the procedure detailed in my answer to remotely control the state of the "introspection_generator_addon" on deployment clients needs to be officially documented. This will happen as soon as our QA department has had the chance to perform a few differential tests on it and our documentation writers have had the chance to re-shape its prose.
Yes, I'll apologize again for any pain that this problem has caused. I also freely admit that the collection of this data from forwarders (and how to control it) was not at the forefront of our minds when we implemented it. It was intended for search heads and indexers, where resource consumption may be high, but I can also see a use case for forwarders here, to verify that the UF footprint is low. This absolutely wasn't intended to go out the door enabled by default on forwarders.
I would even go a step further here and warn that the primary interfaces where we view this resource collection data in the product, namely the Distributed Management Console as of Splunk 6.2, are not prepared to deal with this kind of data yet. That's not to say you couldn't build something yourself from the raw logs, but we're evaluating how to add several aspects of forwarder behavior to our out-of-the-box self-monitoring interfaces, this definitely being one of them.