Knowledge Management

Summary indexing gives duplicated records

lamca
New Member

Hello,

I have set up a few scheduled reports that collect data from index A every 15 minutes into index B (which is a summary index). However, I found that nearly all events copied from index A to index B are duplicated (some even have more than 100 identical copies); the data inside the events is exactly the same (_raw, _time, _indextime, and all other fields are equal). I don't want to keep these duplicated events and degrade search performance. Could anyone help?

I tried the following search on indexB:

index=indexB | eval bucket=_bkt | eval indextime=_indextime | table _time, indextime, bucket splunk_server _raw | convert ctime(indextime) | stats count list(*) as * by _raw | where count>1 | fields * _raw | sort count desc

and found that the same events appear 100 times.
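If the immediate goal is just to get rid of the duplicates already sitting in indexB, one rough sketch (assuming the duplicate-detection logic above is sound; the delete command requires the can_delete role and only hides events from searches rather than freeing disk space, so test on a narrow time range first):

index=indexB | streamstats count as copy_number by _raw | where copy_number > 1 | delete

This keeps the first copy of each distinct _raw and marks the extra copies so subsequent searches no longer return them.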

I have read several questions and answers in this forum, but I still cannot figure out how to solve the problem. Could you please kindly advise?

Many thanks!

0 Karma

amitm05
Builder

Can you also cross-check your cron schedule? If your search is running multiple times within the 15-minute span, that would also duplicate your data.

Also check the history of your scheduled search -

index=_internal sourcetype=scheduler | table _time user savedsearch_name status scheduled_time run_time result_count
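For reference, a rough sketch of what a non-overlapping 15-minute schedule could look like in savedsearches.conf (the stanza name and search are placeholders for your own saved search):

[summary_indexA_to_indexB]
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -15m@m
dispatch.latest_time = @m
search = index=indexA sourcetype=sourcetype1 ... | collect index=indexB

With a time window snapped to the minute like -15m@m to @m, each run covers a distinct 15-minute slice, so overlapping time ranges are one less possible source of duplicate summary events.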

0 Karma

p_gurav
Champion

Could you please share the search you are using to create the summary index? Also some sample events.

0 Karma

lamca
New Member

The search to create the summary index is something like this:

index=indexA sourcetype=sourcetype1 earliest=-15m source="*XXX_*" field1="field1" field2="*-field2"
| table CREATION_TIME ELAPSED_TIME ... UUID _time source
| eval DATA_TYPE="DataType1"
| table CREATION_TIME ELAPSED_TIME ... UUID _time source DATA_TYPE
| eval _raw="CREATION_TIME=\"".CREATION_TIME."\", ELAPSED_TIME=\"".ELAPSED_TIME."\""."\", DATA_TYPE=\"".DATA_TYPE."\""
| eval _raw=if(isnull(UUID), _raw, _raw.", UUID=".UUID)
| eval _raw=if(isnull(_time), _raw, _raw.", _time="._time)
| eval _raw=if(isnull(source), _raw, _raw.", source=".source)
| dedup _raw
| collect index=indexB
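As a quick diagnostic sketch (run_marker is just an illustrative field name, not part of the original search), tagging each scheduled run before collect can show where the duplication happens:

... | eval _raw=_raw.", run_marker=".now() | collect index=indexB

If the duplicate copies in indexB carry different run_marker values, the search itself is executing more than once per window (a scheduling issue); if they all share the same value, a single run is being written to the index multiple times (an indexing or forwarding issue).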

Events will have the following fields:

CREATION_TIME="2018-12-17 05:27:22.156", ELAPSED_TIME=3669, DATA_TYPE="TypeA",UUID=123e4567-e89b-12d3-a456-42665544000, _time=1545024442.156, source=sourceA

Thanks!

0 Karma

p_gurav
Champion

Do you have a distributed environment or a single instance? If distributed, what does your outputs.conf on the search head look like?
This answer may help:
https://answers.splunk.com/answers/290453/why-is-summary-indexing-creating-duplicate-records.html
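If the search head is forwarding its data to the indexers, a minimal sketch of the outputs.conf settings worth checking there (group and host names are placeholders):

[tcpout]
defaultGroup = primary_indexers
indexAndForward = false

[tcpout:primary_indexers]
server = indexer01:9997, indexer02:9997

With indexAndForward = false the search head only forwards the collected summary events rather than also indexing a local copy; a search head that both indexes locally and forwards the same events is one commonly reported cause of duplicated summary-index records in distributed setups.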
