I have built out the OVA DCN, bumped the cores up to 8, and doubled the RAM, but we are still getting this error:
2016-11-07 16:03:05,757 ERROR [ta_vmware_collection_worker://gamma:17063] [getJob] job=job_4be4c1baa54611e69173005056ac7e38 of task=hostvmperf has expired and will not be run
I have 2 questions:
Can I point 2 or more DCNs to one vCenter?
Can I build my own DCN on a physical machine to ensure I get all my data?
I'm looking for some suggestions please.
I ended up doubling mine to get all the data in. Also, go into your dashboards under Search and Reporting; you'll have a Hydra Framework dashboard there, and it's pretty helpful.
In my case, the scheduler is running on the Search Head.
1. When I read the docs, I don't find any configs related to hostvmperf_interval and hostvmperf_expiration in \etc\apps\Splunk_TA_vmware\local, but I'm thinking of putting all four changes (hostvmperf_interval, hostvmperf_expiration, hostinv_interval, hostinv_expiration) in \etc\apps\Splunk_TA_vmware\local under a [default] stanza.
2. Did you also change the vCenter timeout value?
Increase the timeout period in the vpxd file on your vCenter.
Open the vpxd.cfg file, located in C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg file (C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg on Windows 2008) using a text editor.
Yeah, you have to create the local directory and then override the values there.
We did end up increasing the timeout value here as well. Try #1 first to see what happens. Use the dashboard I mentioned above; it's a big help.
I made these changes on the Search Head where the scheduler was running, but I still see these job expirations on the DCNs when I look at the HydraFramework dashboard:
2017-05-09 10:59:51,007 ERROR [ta_vmware_collection_worker://eta:26954] [getJob] job=job_2016307834e011e7be7aecf4bbd1db64 of task=hostvmperf has expired and will not be run
Do I need to make these changes on the DCNs too?
Try increasing the timeout value (towards the bottom of this page):
https://docs.splunk.com/Documentation/VMW/3.3.2/Installation/TroubleshoottheSplunkAppforVMware
Problem
Incomplete or no data coming from vCenters that are properly configured and connected to by a DCN. Data collection tasks are failing and/or connections between the DCN and vCenter are closing before all data is transferred. This could be due to one of two issues:
Collection tasks are taking longer than the vCenter and the app expect.
Collection intervals are overloading your Data Collection Nodes (DCNs) and your vCenters.
Resolution
Change collection intervals to reduce the load on your Data Collection Nodes (DCNs) and your vCenters.
Change the time interval for your host inventory job.
On the instance where your scheduler is running, navigate to \etc\apps\Splunk_TA_vmware\default.
Open the ta_vmware_collection.conf file.
Change hostinv_interval and hostinv_expiration from the 900 second default to a larger number (maximum 2700 seconds). Keep hostinv_interval and hostinv_expiration at the same number of seconds.
Save your changes and exit.
Change the time interval for host performance data.
On the instance where your scheduler is running, navigate to \etc\apps\Splunk_TA_vmware\local.
Open the ta_vmware_collection.conf file.
Change hostvmperf_interval and hostvmperf_expiration from the 180 second default to a larger number (maximum 1200 seconds). Keep hostvmperf_interval and hostvmperf_expiration at the same number of seconds.
Save your changes and exit.
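Putting both steps together, the local overrides could look something like the sketch below. The values are illustrative only, and the assumption (discussed earlier in this thread) is that the four settings go under a [default] stanza in the local directory; stay within the documented maximums and keep each interval equal to its matching expiration.

```ini
# $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/local/ta_vmware_collection.conf
# Illustrative values only -- hostinv max is 2700s, hostvmperf max is 1200s,
# and each interval should equal its expiration.
[default]
hostinv_interval = 1800
hostinv_expiration = 1800
hostvmperf_interval = 600
hostvmperf_expiration = 600
```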
Some users use physical machines with 32 CPU cores for their DCNs. You can have multiple DCNs for one vCenter environment; the VMware app's scheduler takes care of which DCN collects which data type.
With one vCenter serving 2000 VMs, vCenter's responses to API calls may often be slow, or vCenter may sometimes hit its own timeout.
We're still getting errors, and we have a case open with Splunk. We upgraded to 3.3.1 and are still getting errors. I'll keep this thread updated.
Did you get any resolution for this issue? We are seeing the same error you mentioned above.
We're on Splunk App for VMware v3.3.2 with 5 DCNs pointing to the vCenter. I can see the data with the search sourcetype=vmware:perf*, but the dashboard on the home page says No data.
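For a quick sanity check, a search like the one below (building on the sourcetype just mentioned) shows which perf sourcetypes and hosts are actually being indexed. If this returns events while the home-page dashboard stays empty, the mismatch may be in the app's dashboard scoping rather than in collection; that is an assumption to verify, not a confirmed cause.

```
sourcetype=vmware:perf* | stats count by sourcetype, host
```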
A couple of things to make sure of: all of your add-ons are installed on the indexers/search heads, and the account you're logging in with has the right role tied to it.
I did check on the roles, and they look good. It seems the error is related to this known issue:
VMW-4466 Frequent job expiration leads to not all data being collected, task=hostvmperf has expired and will not be running.
https://docs.splunk.com/Documentation/AddOns/released/VMW/Releasenotes#Known_Issues