All Apps and Add-ons

TrackMe App: how to adjust "max allowed lagging value" depending on the priority?

woodentree
Communicator

Hello,

Do you know if there is a way to modify a data source's maximum allowed lagging value automatically depending on its priority? For example, 600 seconds for 'high', 3600 for 'medium' and 86400 for 'low'.

Thanks for the help.

1 Solution

guilmxm
SplunkTrust
SplunkTrust


Hi @woodentree 

This actually looks like a good idea for an enhancement request; in the future, feel free to open an issue on GitHub:

https://github.com/guilhemmarchand/trackme/issues

To answer your question, given that TrackMe stores everything in KVstore-based lookups, this is very straightforward: you would create a report in TrackMe and schedule it to run on the schedule of your choice (the scheduling cost will be near zero). For example, for data sources:

 

| inputlookup trackme_data_source_monitoring | eval keyid=_key
| eval data_max_lag_allowed=case(priority="low", 86400, priority="medium", 3600, priority="high", 600)
| outputlookup append=t key_field=keyid trackme_data_source_monitoring
| stats c

 


Schedule this as a new report, say "TrackMe - custom lagging policy maintenance", and it will update the lagging value based on your rule.
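If you prefer to manage the report from configuration files rather than the UI, a minimal savedsearches.conf sketch could look like this (the cron schedule and dispatch window are assumptions, adjust them to your needs):

# savedsearches.conf -- minimal sketch, schedule values are assumptions
[TrackMe - custom lagging policy maintenance]
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -5m
dispatch.latest_time = now
search = | inputlookup trackme_data_source_monitoring | eval keyid=_key \
| eval data_max_lag_allowed=case(priority="low", 86400, priority="medium", 3600, priority="high", 600) \
| outputlookup append=t key_field=keyid trackme_data_source_monitoring \
| stats c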

Note that with this, any change made in the UI to the max lagging value would be overridden by your rule.
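If you want the scheduled rule to leave manually tuned entities untouched, a minimal sketch (assuming you mark such entities by setting the data_override_lagging_class field of the same KVstore collection to "true") would be to filter them out before writing back:

| inputlookup trackme_data_source_monitoring | eval keyid=_key
| search NOT data_override_lagging_class="true"
| eval data_max_lag_allowed=case(priority="low", 86400, priority="medium", 3600, priority="high", 600)
| outputlookup append=t key_field=keyid trackme_data_source_monitoring
| stats c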

The following search would maintain this for all data sources / hosts in one operation:

| inputlookup trackme_data_source_monitoring | eval keyid=_key
| eval data_max_lag_allowed=case(priority="low", 86400, priority="medium", 3600, priority="high", 600)
| outputlookup append=t key_field=keyid trackme_data_source_monitoring
| stats c
| inputlookup append=t trackme_host_monitoring | eval keyid=_key | where isnull(c) | fields - c
| eval data_max_lag_allowed=case(priority="low", 86400, priority="medium", 3600, priority="high", 600)
| outputlookup append=t key_field=keyid trackme_host_monitoring
| stats c


I will most certainly look at extending the lagging class feature so that a class can be created based on the priority.

Let me know if you have any questions.

Guilhem

 

 

 



guilmxm
SplunkTrust
SplunkTrust

Hi @woodentree 

TrackMe 1.2.27 was published today and allows you to create built-in lagging classes based on priority for data sources:

https://trackme.readthedocs.io/en/latest/userguide.html#lagging-classes
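If you want to quickly verify that the lagging classes were applied as expected, a simple hedged check (reusing the same fields as the searches above) could be:

| inputlookup trackme_data_source_monitoring
| stats count by priority, data_max_lag_allowed

Each priority should line up with the max lag value you defined in the corresponding lagging class.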

Let me know if you have any questions.

Kind regards,


Guilhem

 

woodentree
Communicator

Hi @guilmxm 

We’ve set up the lagging class definitions based on priorities, but it looks like we have an issue with them. We configured 4 lagging classes in total: 3 for data sources and 1 for data hosts.

Lagging classes conf.png

It works great for ‘medium’ and ‘low’ priorities, but not for ‘high’: the value of 'data_max_lag_allowed' does not change (for data sources as well as for data hosts).

Lagging classes sourcetype.png

We’ve tried recreating the ‘high’ lagging classes and changing their order, and we’ve checked their spelling as well, but nothing seems to help.

Do you have an idea what could be the issue’s root cause?

Thanks for the help.


guilmxm
SplunkTrust
SplunkTrust

Hi @woodentree 

This should work fine for high priority too; most likely the entities you are looking at (those in high priority) have the override lagging class field set to true:

guilmxm_0-1613397121368.png


You can check this out the following way too:

For data sources:

| inputlookup trackme_data_source_monitoring | eval keyid=_key
| search priority="high" AND data_override_lagging_class="true"

 

And for data hosts:

| inputlookup trackme_host_monitoring | eval keyid=_key
| search priority="high" AND data_override_lagging_class="true"


You could change (bulk change) this the following way:

For data sources:

 

| inputlookup trackme_data_source_monitoring | eval keyid=_key
| search priority="high" AND data_override_lagging_class="true"
| eval data_override_lagging_class="false"
| outputlookup append=t key_field=keyid trackme_data_source_monitoring

 

And for data hosts:

 

| inputlookup trackme_host_monitoring | eval keyid=_key
| search priority="high" AND data_override_lagging_class="true"
| eval data_override_lagging_class="false"
| outputlookup append=t key_field=keyid trackme_host_monitoring

 

 

Of course, if there are some entities that should keep overriding lagging classes, you need to either exclude them from the commands above, or handle them within the UI.
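For example, a hedged sketch of such an exclusion for data sources (the data_name field and the entity names are assumptions, adjust to whatever identifies your entities):

| inputlookup trackme_data_source_monitoring | eval keyid=_key
| search priority="high" AND data_override_lagging_class="true"
| search NOT data_name IN ("my_excluded_datasource_1", "my_excluded_datasource_2")
| eval data_override_lagging_class="false"
| outputlookup append=t key_field=keyid trackme_data_source_monitoring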

Then, after you run the trackers (or wait a bit), this should be applied.

Let me know

Guilhem


 

woodentree
Communicator

Hi @guilmxm,

Thanks for the help.

Actually, the “override lagging classes” option was the first thing we checked. And for all data sources and data hosts it’s set to “false”. Here’s the search you provided for data sources:

data_override_lagging_class_true.png

And the same for 'false':

data_override_lagging_class_false.png

Regards


guilmxm
SplunkTrust
SplunkTrust

@woodentree 

Hmm, it's actually weird, there should have been a value. Do you remember which version you're on?
It might have been something that was fixed later on; the macro defines a default value in case the field is null.

Can you run:

| inputlookup trackme_data_source_monitoring | eval keyid=_key
| eval data_override_lagging_class=if(isnull(data_override_lagging_class), "false", data_override_lagging_class)
| outputlookup trackme_data_source_monitoring key_field=keyid


Then run the tracker and verify? This would fix any null values and set them to "false" (the default).
This is safe to do: only this value will get updated, and it won't change a thing for the other entities.

If this fixes the issue, you can apply the same fix for data hosts too.
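For reference, a minimal sketch of the equivalent for data hosts (the same logic pointed at the host collection, assuming the field is named the same way there) would be:

| inputlookup trackme_host_monitoring | eval keyid=_key
| eval data_override_lagging_class=if(isnull(data_override_lagging_class), "false", data_override_lagging_class)
| outputlookup trackme_host_monitoring key_field=keyid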

I will check how this is expected to behave in the TrackMe logic too 😉


woodentree
Communicator

Hi @guilmxm,

It didn’t fly 😞

We executed the provided search, but it didn’t change a thing. We also tried to rewrite the “data_override_lagging_class” value from the Lookup Editor app, but it wasn’t helpful.

The TrackMe version we use in our test environment is 1.2.28

Regards.


guilmxm
SplunkTrust
SplunkTrust

Hi @woodentree 

Ok, I will check this out in depth and verify, then share some additional things to check.
First, please upgrade the app to its latest version, especially since you are in testing, as there have been many bug fixes since 😉
Then please verify whether the behaviour persists after the update and let me know.

Thank you

woodentree
Communicator

Hi @guilmxm ,

Yes, sure. I’ll ask our Tool team to update the app to its latest version.

Thanks for the help!

woodentree
Communicator

Hi @guilmxm ,

I asked our Tool Team to update the TrackMe version. We also performed a few tests.

Good news: there is no longer an issue with “unapplied” lagging classes for data sources.

Unfortunately, there is some bad news as well.

  1. We still face the same issue with hosts

1.png

2.png

3.png

4.png

 

2. Second issue: it looks like the loading time after the update is way higher than before.

 

We also discovered one more thing we’d like to ask you about. We use logical groups a lot and we’d like to be able to create lagging classes based on them. My colleague from the Tool Team just created a feature request related to it: https://github.com/guilhemmarchand/trackme/issues/272

 

P.S. Feel free to answer us in French if you prefer 🙂

Regards.


guilmxm
SplunkTrust
SplunkTrust

Bonjour @woodentree !

Ha ha, my keyboard doesn't have any of the French characters I'd need, so I'd go mad pretty quickly lol

So, to answer: great, the data sources issue is fixed, olé!

Then, regarding:

- Data hosts

This, I think, is actually working; the thing is that for hosts it behaves a little differently, and the value cannot be lower than the highest value among all the sourcetypes identified for the host.

The reason is specific to data hosts, which can behave on a per-sourcetype basis or globally (the global alerting mode), so I had to come up with a compromise.

The proof in image:

guilmxm_0-1614706804412.png

guilmxm_1-1614706838267.png

Because the default is 3600 secs, going below this value would mean that the sourcetypes themselves go below it; data hosts will take whichever is the biggest value.
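If you want to double check the effective value that was retained per host, a hedged sketch (the data_host field name is an assumption) could be:

| inputlookup trackme_host_monitoring
| search priority="high"
| table data_host, priority, data_max_lag_allowed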

Let me know what you think, I know this part is a bit tricky.

- Loading performance:

Gotcha, it's in my plans to work on this and review the UI globally, which will take time; that said, there shouldn't have been such an increase, so I will investigate!

Received for the Git issue, in da back log!

Cheers

Guilhem

 

woodentree
Communicator

Hi @guilmxm ,

 

Ok, got it for data hosts. Yep, it's a bit tricky, but it makes sense after the explanation.

 

Thanks for the help and for your engagement!

Regards.


woodentree
Communicator

Hi @guilmxm ,

Great news, glad to hear it.
Thanks for the update!

 


woodentree
Communicator

Hi @guilmxm 

Awesome. We'll certainly implement those reports.

Thanks for the help!

 

 
