Knowledge Management

Strategies for maintaining summary index consistency.

Lucas_K
Motivator

Does anyone have a way to build "report acceleration"-style automation into summary-index-generating jobs?

My method (which I'm currently working on) is to have scheduled jobs that check for the expected number of summary jobs per hour/day and then run the backfill script over that time frame if the count is wrong. Because the backfill script runs its own search looking for time frames without summary searches, it is technically safe to run all the time; for the sake of efficiency, though, it is only run when the number of jobs is less than the scheduled number.
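
As a rough sketch of that check, assuming the summary index is simply called summary and the populating search is scheduled hourly under the name "my_hourly_summary" (both names are placeholders to adjust), something like this lists the hours in the last day that have no summary events at all and therefore need a backfill:

index=summary search_name="my_hourly_summary" earliest=-24h@h latest=@h
| timechart span=1h count
| where count=0

timechart zero-fills empty buckets, so each row returned is an hour the scheduled job never wrote anything for and the backfill script should be pointed at.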

The flaw in this is that, by default, there is no way to verify from information within the summary index that a job actually finished 100%. That is to say, you can't tell whether the search head running the job crashed midway through, saving only part of the data to the summary index (which would result in the backfill being skipped for that time period).

So a modified fill_summary_index.py script would have to be used that looks for an appended "end of job" marker for each matching search (example: http://splunk-base.splunk.com/answers/41525/add-a-row-to-end-of-table).

i.e.

<my search> | append [|stats count |eval count="complete"| rename count as "info_search_marker" ]

Then, in the modified backfill Python script, change

dedupsearch = 'search index=$index$ $namefield$="$name$" | stats count by $timefield$'

to

info_search_marker = 'complete'
dedupsearch = 'search index=$index$ $namefield$="$name$" info_search_marker="$info_search_marker$" | stats count by $timefield$'

This will allow jobs that previously ran but did not complete to be re-added to the backfill list.
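
To see which time spans that modification would pick up, a manual check along these lines (same placeholder index and search name as above) lists the hours that have summary events but no completion marker, i.e. jobs that started writing but never finished:

index=summary search_name="my_hourly_summary" earliest=-24h@h latest=@h
| timechart span=1h count as total, count(eval(info_search_marker="complete")) as completed
| where total > 0 AND completed = 0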

The other issues:

  • Anything referencing data within this summary MUST be able to handle possible duplicate data caused by re-runs. Possibly add a deletion option for that specific partially run search (a sketch of this follows the list).
  • No verification-like ability. If something changes in the data source (fields, late-arriving events, etc.), the summary can't be easily rebuilt with a single click the way report acceleration can.
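
For the deletion option in the first point, one possibility (again using the placeholder names from above, and remembering that | delete requires the can_delete capability and only masks events rather than reclaiming disk space) is to target the partial run by its search_now value before the backfill re-runs that window; the epoch value here is just a stand-in for the actual run time:

index=summary search_name="my_hourly_summary" search_now=1357000000
| delete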

Does anyone have any better ideas for keeping summary indexes populated and recovering from missing searches?

Alternatively, is there any way to directly reference report-accelerated searches so that new search terms can be inserted before the first pipe?


Lucas_K
Motivator

OK, an update on this.

To throw an extra wrinkle into this issue: how can one mitigate an index (or indexes) not being available at the time a scheduled summary-creating search runs? Using a marker for a completed job will only show that it wasn't interrupted (on the search head) and at least tried to run for that time period. It won't know that a particular indexer wasn't providing the raw events to the search head. In that case we are missing raw events and have created a summary index that is actually incorrect.

So my only thought as to a solution would be another alerting script that saves the time of any indexer outage. Another backfill re-runner job would look for these outage windows and run a clobbering summary job over that time span.
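
As a rough sketch of that alerting script, a scheduled search could poll the search head's view of its peers and write any peer that isn't up into its own small index (summary_outages is just a placeholder name); the backfill re-runner would then read that index back to decide which spans to clobber:

| rest /services/search/distributed/peers
| search status!="Up"
| eval outage_time=now()
| table outage_time title status
| collect index=summary_outages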

To put clustering into the mix, it might also be possible to use the search factor values as an indicator of when a scheduled job needs to be re-run. I haven't looked to see the rate at which (or even whether) this is monitored and saved somewhere that I can refer to.

This also means dashboards need to be updated to cope with potential duplicate results. That probably wouldn't be solved with a simple dedup either; you'd have to use only the latest search_now results for each time period.
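
A sketch of that "latest run wins" idea, assuming the summarised rows carry the search_now field that summary indexing stamps on them, would be to keep only the rows written by the most recent run for each time bucket and then report off those as normal:

index=summary search_name="my_hourly_summary"
| eventstats max(search_now) as latest_run by _time
| where search_now == latest_run
| fields - latest_run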

0 Karma

jtrucks
Splunk Employee

I check the last timestamp on the newest entry in a summary index that has the same marker; I tag every entry with something related to the search populating the data, so I can use one summary index for multiple result sets and easily sort them out. I've checked for gaps manually in the past, but I plan on automating this in some way so that when I launch a job to populate the summary index, it will figure out for itself what timestamp to start the search at in order to backfill correctly. This would mean no gaps, in theory, so even if the job is supposed to run daily but doesn't get run for days for some odd reason, I still get complete data in the summary index.

I'm experimenting with something like this:

sourcetype=mylogs [search index=summary_mylogs | head 1 | eval earliest=_time | return earliest] latest=now | timechart span=1m count

This subsearch gets the timestamp of the last event in the summary index and outputs earliest=### into the main search, which then uses it as the earliest start time for the search.

So far it seems to work...

--
Jesse Trucks
Minister of Magic

jtrucks
Splunk Employee

I should add that my above search has | collect... after it to send my results into the summary, of course.
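
Put together, it would look something like this (collecting back into the same summary index the subsearch reads from):

sourcetype=mylogs [search index=summary_mylogs | head 1 | eval earliest=_time | return earliest] latest=now
| timechart span=1m count
| collect index=summary_mylogs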

--
Jesse Trucks
Minister of Magic
0 Karma

jtrucks
Splunk Employee

I look at the timestamp of the last event placed into the summary index, then use it as the starting point for pulling new data from the index holding the original data. You could expand this by doing an eval to compare the last event in the index against the last event in the summary, but I don't bother, since this approach will catch any gap at the end of the summary. If there are no index events after the last entry in the summary, then the search returns nothing and places nothing into the summary. The result is that the search runs, finds nothing to update, and so doesn't update.
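
If you did want that explicit comparison, a quick way to eyeball the gap (using the same example names as above) is something like this; a positive gap_seconds means raw events exist past the end of the summary:

sourcetype=mylogs | head 1 | eval idx_latest=_time | fields idx_latest
| appendcols [search index=summary_mylogs | head 1 | eval sum_latest=_time | fields sum_latest]
| eval gap_seconds = idx_latest - sum_latest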

--
Jesse Trucks
Minister of Magic
0 Karma

Lucas_K
Motivator

Very nice.

So what are you actually checking against? The last real event in the index vs. the latest timestamp in the summary index?

If the index timestamp is greater than the summary's for that time period, then there is a gap?

0 Karma