Re: Summary Indexing give duplicated records

lamca · ‎12-26-2018

Hello,

I set up a few scheduled reports that copy data from index A into index B (a summary index) every 15 minutes. However, nearly all events copied from index A to index B are duplicated (some appear more than 100 times), and the duplicates are exactly identical (_raw, _time, _indextime, and all fields are equal). I don't want to keep these duplicate events because they lower search performance. Could anyone help?

I have read several questions and answers in this forum but still cannot figure out how to solve the problem. Could you please advise?

Many thanks!

amitm05 · ‎12-27-2018

Can you also cross-check your cron schedule? If your search runs more than once within a 15-minute span, that would also duplicate your data.

Also check the history of your scheduled search:

index=_internal sourcetype=scheduler | table _time user savedsearch_name status scheduled_time run_time result_count
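
To turn that history into a direct check for overlapping runs, the scheduler events could be counted per 15-minute bucket. This is only a sketch: `my_summary_report` is a placeholder for the real report name, and it assumes the report is meant to complete exactly once per bucket.

```
index=_internal sourcetype=scheduler savedsearch_name="my_summary_report" status=success
| bin _time span=15m
| stats count AS runs, values(scheduled_time) AS scheduled_times BY _time, savedsearch_name
| where runs > 1
```

Any row returned means the report completed more than once in that 15-minute window, which could by itself write the same source events to the summary index more than once.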

p_gurav · ‎12-26-2018

Could you please share the search you are using to populate the summary index, along with some sample events?

lamca · ‎12-27-2018

The search that populates the summary index looks something like this:

index=indexA sourcetype=sourcetype1 earliest=-15m source="*XXX_*" field1="field1" field2="*-field2"
| table CREATION_TIME ELAPSED_TIME ... UUID _time source
| eval DATA_TYPE="DataType1"
| table CREATION_TIME ELAPSED_TIME ... UUID _time source DATA_TYPE
| eval _raw="CREATION_TIME=\"".CREATION_TIME."\", ELAPSED_TIME=\"".ELAPSED_TIME."\""."\", DATA_TYPE=\"".DATA_TYPE."\""
| eval _raw=if(isnull(UUID), _raw, _raw.", UUID=".UUID)
| eval _raw=if(isnull(_time), _raw, _raw.", _time="._time)
| eval _raw=if(isnull(source), _raw, _raw.", source=".source)
| dedup _raw
| collect index=indexB

Events will have the following fields:

CREATION_TIME="2018-12-17 05:27:22.156", ELAPSED_TIME=3669, DATA_TYPE="TypeA",UUID=123e4567-e89b-12d3-a456-42665544000, _time=1545024442.156, source=sourceA

Thanks!
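
Two things may be worth checking against the search above. First, `dedup _raw` only removes duplicates within the results of a single run, so it cannot catch events that an earlier run already wrote to indexB. Second, `earliest=-15m` has no explicit `latest`, so the window floats with the time of each run. A common approach, sketched here under the assumption that the report runs exactly once every 15 minutes, is to snap both ends of the window to the minute so that consecutive runs cover adjacent, non-overlapping ranges. Only the base search changes; the rest of the pipeline stays as posted:

```
index=indexA sourcetype=sourcetype1 earliest=-15m@m latest=@m source="*XXX_*" field1="field1" field2="*-field2"
```

To measure how many duplicates already exist in the summary index, a search along these lines could be used:

```
index=indexB
| stats count BY _raw
| where count > 1
| sort - count
```

If the duplicate counts stay high even when the scheduler shows a single run per window, the cause is more likely outside the search itself, for example in how the search head forwards its summary data.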

p_gurav · ‎12-27-2018

Do you have a distributed environment or a single instance? If distributed, what is in outputs.conf on your search head?
This answer may help:
https://answers.splunk.com/answers/290453/why-is-summary-indexing-creating-duplicate-records.html
