Knowledge Management

Is there any way to fill my summary index with only the newer portion of data every day from the raw index?

Motivator

Hi Splunk team,

I have a raw index and a summary index, plus a scheduled search that populates the summary index from the raw index. The scheduled search runs daily and fills the summary index, and everything is working as expected.

The problem: if the raw index later receives newer data for days that have already passed, I need to fill that into the summary index as well. I tried the backfill script for this, but it did not give me proper results.

Example :

index="main" -- raw index
index="summary" - summary index

Assume the main index has only Dec 12th data (i.e. events with _time on Dec 12th), and the summary-generating search ran on Dec 12th and populated all of that day's data into the summary index.

Now say that on Dec 15th a few more Dec 12th events arrive from the forwarder and land in the "main" index with a _time of Dec 12th. My problem is how to fill this newer portion of data into the summary index.

I have tried the backfill script and re-running the searches, but every time it creates duplicates. I used the nolocal option as well, but no luck.

Any way to fill only the newer portion of data every time to a summary index?

Many thanks,
Rakesh.

Esteemed Legend

I am unaware of any way to use the backfill script to do the kind of merge you are describing. However, there is a way to do it directly with an additional populating search: clone your original populating search and insert the following somewhere in the middle to limit it to gathering only late-arriving data. Adjust 2(days)*(24hours/day)*(60minutes/hour)*(60seconds/minute) to meet your needs:

... | eval lagSeconds = _indextime - _time | where (lagSeconds > (2*24*60*60)) | ...

You can then schedule both searches and fuhgeddaboutit.
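To make the lag filter concrete, here is a minimal Python sketch of the same arithmetic outside Splunk. It is only an illustration, not Splunk code: the `Event` class and the `is_late` and `late_events` helpers are hypothetical names, and timestamps are assumed to be epoch seconds, as `_time` and `_indextime` are.

```python
from dataclasses import dataclass

# 2 days * 24 hours * 60 minutes * 60 seconds, same arithmetic as the search above
LATE_THRESHOLD_SECONDS = 2 * 24 * 60 * 60


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for an indexed event."""
    time: int       # event timestamp (_time), epoch seconds
    indextime: int  # when the event was indexed (_indextime), epoch seconds


def is_late(event, threshold=LATE_THRESHOLD_SECONDS):
    """Mirror of: eval lagSeconds = _indextime - _time | where lagSeconds > threshold."""
    lag_seconds = event.indextime - event.time
    return lag_seconds > threshold


def late_events(events, threshold=LATE_THRESHOLD_SECONDS):
    """Keep only the events whose indexing lag exceeds the threshold."""
    return [e for e in events if is_late(e, threshold)]
```

With this model, an event stamped Dec 12th but indexed on Dec 15th has a lag of about three days and passes the filter, while an event indexed a minute after it occurred does not.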

Motivator

Thanks WoodCock for the workaround. But I was thinking: if I run a second search to find the late-landing events and populate them into the summary index, I will have late-landing events for each and every day. In that case, if I run the second search daily, it may cause duplicates in the summary index, right? Is there any way to avoid duplicates before filling the late-landing events into the summary index?

Esteemed Legend

You will have more than one entry per time period because you are running two different searches, but these are not "duplicates": each search's entry covers a mutually exclusive data set, one for "on-time" events and the other for "late-arriving" events.
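A small Python sketch of why the two entries add up rather than double count. This is a simplified model, not Splunk code: it assumes the on-time set is every event with a lag at or below the threshold, and the `summarize` and `combined` helpers are hypothetical.

```python
from collections import Counter

DAY = 86400          # seconds per day
THRESHOLD = 2 * DAY  # same 2-day lag threshold as the search above


def summarize(events, late):
    """Count events per day for one of the two populating searches.

    events: iterable of (event_time, index_time) pairs in epoch seconds.
    late:   True for the late-arriving search, False for the on-time one.
    """
    counts = Counter()
    for event_time, index_time in events:
        event_is_late = (index_time - event_time) > THRESHOLD
        if event_is_late == late:
            counts[event_time // DAY] += 1
    return counts


def combined(events):
    """Add the two summary entries together, as a report over the summary would."""
    return summarize(events, late=False) + summarize(events, late=True)
```

Because every event falls into exactly one of the two sets, summing the two entries for a day gives that day's full count.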

SplunkTrust

In theory, this is one of the use cases the backfill script was created for: http://docs.splunk.com/Documentation/Splunk/6.3.1/Knowledge/Managesummaryindexgapsandoverlaps

If that's not working as expected, you might need to raise it with Splunk. It should not create duplicates as far as I'm aware.

Out of curiosity, is there any reason you can't use report acceleration or a data model? They will take care of the gaps for you automatically.

Motivator

Hi Javiergn,

I am running my summary-generating searches daily at regular intervals, and they all run fine for the current day, say Dec 12th. My worry is that if Dec 12th data arrives in the raw index on Dec 13th, that newer portion of Dec 12th data is missing from the summary index. When I run the backfill script for Dec 12th, it creates duplicates. With the -dedup true and -nolocal true options, it says all the searches have already run and there are no searches to re-run.

Many thanks,
Rakesh.
