Splunk Search

Sharding searches / mcollect

jbuecse
New Member

We have several summary searches that collect data into metric indexes. They run nightly, and some of them create quite a large number of events (~100k). As a result we sometimes see warnings that the metric indexes cannot be optimised fast enough.

A typical query looks like this:

index=uhdbox  sourcetype="tvclients:log:analytics" name="app*" name="*Play*" OR name="*Open*"   earliest=-1d@d+3h latest=-0d@d+3h 
| bin _time AS day span=24h aligntime=@d+3h
| stats count as eventCount earliest(_time) as _time  by day, eventName, releaseTrack, partnerId, deviceId 
| fields - day 
| mcollect index=uhdbox_summary_metrics split=true marker="name=UHD_AppsDetails, version=1.1.0" eventName, releaseTrack, partnerId, deviceId

 

 

The main contributor to the large number of events is the cardinality of deviceId (~100k), which is effectively a "MAC" address with a common prefix and defined length. I could create 4 / 8 / 16 reports, each selecting a subset of deviceIds, and schedule them at different times, but it would be quite a burden to maintain those basically identical copies.
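For illustration, each of those copies could differ only in a single deviceId filter placed right after the base search. A minimal sketch of the first of 8 such reports, assuming deviceId ends in a hex character (only the regex line would change between copies):

index=uhdbox  sourcetype="tvclients:log:analytics" name="app*" name="*Play*" OR name="*Open*"   earliest=-1d@d+3h latest=-0d@d+3h
| regex deviceId="[01]$" ``` copy 1 of 8: keep only deviceIds ending in hex 0 or 1; the other copies use [23], [45], ... ```
| bin _time AS day span=24h aligntime=@d+3h
| stats count as eventCount earliest(_time) as _time  by day, eventName, releaseTrack, partnerId, deviceId
| fields - day
| mcollect index=uhdbox_summary_metrics split=true marker="name=UHD_AppsDetails, version=1.1.0" eventName, releaseTrack, partnerId, deviceId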

So...

I wonder if there is a mechanism to shard the search results and feed them into many separate mcollects that are spaced apart by some delay. Something like this:

index=uhdbox  sourcetype="tvclients:log:analytics" name="app*" name="*Play*" OR name="*Open*"   earliest=-1d@d+3h latest=-0d@d+3h
| shard by deviceId bins=10 sleep=60s
| bin _time AS day span=24h aligntime=@d+3h
| stats count as eventCount earliest(_time) as _time  by day, eventName, releaseTrack, partnerId, deviceId
| fields - day
| mcollect index=uhdbox_summary_metrics split=true marker="name=UHD_AppsDetails, version=1.1.0" eventName, releaseTrack, partnerId, deviceId

Maybe my pseudocode above is not so clear. What I would like to achieve is that, instead of one huge mcollect, I get 10 mcollects (each for approximately 1/10th of the events). They should be scheduled approximately 60s apart from each other...
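As far as I can tell there is no such shard command. The bin-assignment half of the idea could be expressed with an eval (a sketch, assuming deviceId ends in a hex character; shard is just an illustrative field name):

index=uhdbox  sourcetype="tvclients:log:analytics" name="app*" name="*Play*" OR name="*Open*"   earliest=-1d@d+3h latest=-0d@d+3h
| eval shard=tonumber(substr(deviceId, -1, 1), 16) % 10 ``` assign each deviceId to one of 10 bins by its last hex digit ```
| bin _time AS day span=24h aligntime=@d+3h
| stats count as eventCount earliest(_time) as _time  by day, eventName, releaseTrack, partnerId, deviceId, shard

but I don't see any way to turn each shard value into its own delayed mcollect within the same search, which is really what I'm after.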


bowesmana
SplunkTrust

What you suggest is not possible in a single search. Assuming the cardinality does not change much over the 24h period, I don't suppose there is any benefit in running the search hourly - which would produce more metrics that would need to be aggregated on consumption.

However, you could create N searches where the body of each search is a single macro that runs your base SPL, and you call the macro with the deviceId prefixes you want that search to cover. Not an elegant solution - but functional.
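Something along these lines, as a sketch only - the macro name, the argument name and the example prefixes are made up, and the definition is just your existing SPL with a $device_filter$ token added:

macros.conf:

[uhd_apps_summary(1)]
args = device_filter
definition = index=uhdbox sourcetype="tvclients:log:analytics" name="app*" name="*Play*" OR name="*Open*" earliest=-1d@d+3h latest=-0d@d+3h $device_filter$ | bin _time AS day span=24h aligntime=@d+3h | stats count as eventCount earliest(_time) as _time by day, eventName, releaseTrack, partnerId, deviceId | fields - day | mcollect index=uhdbox_summary_metrics split=true marker="name=UHD_AppsDetails, version=1.1.0" eventName, releaseTrack, partnerId, deviceId

Each scheduled report is then a one-liner that differs only in its argument, e.g.

`uhd_apps_summary((deviceId="00:1A:2B:0*" OR deviceId="00:1A:2B:1*"))`

so the base SPL lives in one place and the N reports only carry their prefix filter.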

I don't understand the message you say you are getting though - I am not familiar with that. Secondly, what is the impact of that message occurring - does it break the collected data in some way, and does it stop other searches from working?
