<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Deduplicate events in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Deduplicate-events-How-can-we-override/m-p/644833#M109743</link>
    <description>&lt;P&gt;If I understand you correctly, you'd like to deduplicate events at ingest time: either skip an event if one with the same value of a field called &lt;EM&gt;ID&lt;/EM&gt; has already been ingested, or overwrite the earlier event that has that value.&lt;/P&gt;&lt;P&gt;Well, that's not possible with native Splunk functionality.&lt;/P&gt;&lt;P&gt;1. The Splunk ingestion process works on one event at a time.&lt;/P&gt;&lt;P&gt;2. The Splunk ingestion process works "one-way": you can't "check what's already in the index". Remember that parsing can be performed way, way before the event even reaches the indexers (and different events from the same source can be processed on different components). Also, you don't have access to search-time extracted values during the ingestion process.&lt;/P&gt;&lt;P&gt;3. There is no "overwriting" in Splunk.&lt;/P&gt;&lt;P&gt;So if it's really essential for you that you don't ingest duplicated &lt;EM&gt;ID&lt;/EM&gt;s, you need to design your own ingestion process that keeps your events deduplicated before they reach Splunk (but for that you'd need some buffer window, which will increase latency).&lt;/P&gt;</description>
    <pubDate>Sun, 28 May 2023 09:42:51 GMT</pubDate>
    <dc:creator>PickleRick</dc:creator>
    <dc:date>2023-05-28T09:42:51Z</dc:date>
    <item>
      <title>Deduplicate events- How can we override?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Deduplicate-events-How-can-we-override/m-p/644829#M109742</link>
      <description>&lt;DIV&gt;&lt;SPAN&gt;We use a script as a data source, and events are sometimes duplicated (same ID). Using | dedup id in the search helps, but we want to override events with the same ID if possible. We have tried some solutions from the internet and the documentation, but none of them have worked.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;props.conf&lt;BR /&gt;[incidents_script]&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;TZ&lt;/SPAN&gt;&lt;SPAN&gt; = UTC&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;category&lt;/SPAN&gt;&lt;SPAN&gt; = Splunk App Add-on Builder&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;pulldown_type&lt;/SPAN&gt;&lt;SPAN&gt; = 1&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;python.version&lt;/SPAN&gt;&lt;SPAN&gt; = python3&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;TRUNCATE&lt;/SPAN&gt;&lt;SPAN&gt; = 1000000&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;INDEXED_EXTRACTIONS&lt;/SPAN&gt;&lt;SPAN&gt; = json&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;TIMESTAMP_FIELDS&lt;/SPAN&gt;&lt;SPAN&gt; = trigger_time&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;SHOULD_LINEMERGE&lt;/SPAN&gt;&lt;SPAN&gt; = false&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;AUTO_KV_JSON&lt;/SPAN&gt;&lt;SPAN&gt; = false&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;KV_MODE&lt;/SPAN&gt;&lt;SPAN&gt; = none&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;TRANSFORMS-index&lt;/SPAN&gt;&lt;SPAN&gt; = replace_existing deduplicate&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;REPORT-id&lt;/SPAN&gt;&lt;SPAN&gt; = extract_id&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;TRANSFORMS-debug&lt;/SPAN&gt;&lt;SPAN&gt; = debug_deduplicate&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;EXTRACT-id&lt;/SPAN&gt;&lt;SPAN&gt; = &lt;/SPAN&gt;&lt;SPAN&gt;"id"&lt;/SPAN&gt;&lt;SPAN&gt;\s*:\s*&lt;/SPAN&gt;&lt;SPAN&gt;"([^"&lt;/SPAN&gt;&lt;SPAN&gt;]+)&lt;/SPAN&gt;&lt;SPAN&gt;&lt;SPAN&gt;"&lt;BR /&gt;&lt;BR /&gt;transforms.conf&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;[replace_existing]&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;REGEX&lt;/SPAN&gt;&lt;SPAN&gt; = .&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;DEST_KEY&lt;/SPAN&gt;&lt;SPAN&gt; = _SYS_CHECKSUM&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;FORMAT&lt;/SPAN&gt;&lt;SPAN&gt; = index-replace&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;SPAN&gt;[deduplicate]&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;REGEX&lt;/SPAN&gt;&lt;SPAN&gt; = .&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;MV_ADD&lt;/SPAN&gt;&lt;SPAN&gt; = true&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;SPAN&gt;[debug_deduplicate]&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;REGEX&lt;/SPAN&gt;&lt;SPAN&gt; = .&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;MV_ADD&lt;/SPAN&gt;&lt;SPAN&gt; = true&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;SPAN&gt;[extract_id]&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;REGEX&lt;/SPAN&gt;&lt;SPAN&gt; = &lt;/SPAN&gt;&lt;SPAN&gt;"id"&lt;/SPAN&gt;&lt;SPAN&gt;\s*:\s*&lt;/SPAN&gt;&lt;SPAN&gt;"([^"&lt;/SPAN&gt;&lt;SPAN&gt;]+)&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;FORMAT = id::$1&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 30 May 2023 03:38:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Deduplicate-events-How-can-we-override/m-p/644829#M109742</guid>
      <dc:creator>cyberhaven</dc:creator>
      <dc:date>2023-05-30T03:38:03Z</dc:date>
    </item>
    <item>
      <title>Re: Deduplicate events</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Deduplicate-events-How-can-we-override/m-p/644833#M109743</link>
      <description>&lt;P&gt;If I understand you correctly, you'd like to deduplicate events at ingest time: either skip an event if one with the same value of a field called &lt;EM&gt;ID&lt;/EM&gt; has already been ingested, or overwrite the earlier event that has that value.&lt;/P&gt;&lt;P&gt;Well, that's not possible with native Splunk functionality.&lt;/P&gt;&lt;P&gt;1. The Splunk ingestion process works on one event at a time.&lt;/P&gt;&lt;P&gt;2. The Splunk ingestion process works "one-way": you can't "check what's already in the index". Remember that parsing can be performed way, way before the event even reaches the indexers (and different events from the same source can be processed on different components). Also, you don't have access to search-time extracted values during the ingestion process.&lt;/P&gt;&lt;P&gt;3. There is no "overwriting" in Splunk.&lt;/P&gt;&lt;P&gt;So if it's really essential for you that you don't ingest duplicated &lt;EM&gt;ID&lt;/EM&gt;s, you need to design your own ingestion process that keeps your events deduplicated before they reach Splunk (but for that you'd need some buffer window, which will increase latency).&lt;/P&gt;</description>
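      <content:encoded><![CDATA[
The "own ingestion process with a buffer window" idea from the reply could be sketched roughly as follows. This is only an illustration, not a Splunk feature or API: the class name DedupBuffer, its add and flush methods, and the window_seconds parameter are all hypothetical, and it assumes the script can hold events in memory before printing them for Splunk to pick up. Each event is assumed to be a dict with an "id" key, matching the JSON field in the question.

```python
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]


class DedupBuffer:
    """Hypothetical pre-ingestion buffer: keeps the latest event per ID."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # Maps event ID -> (time the ID was first buffered, latest event).
        self._pending: Dict[str, Tuple[float, Event]] = {}

    def add(self, event: Event, now: float) -> None:
        """Buffer an event; a repeated ID replaces the buffered copy."""
        event_id = event["id"]
        if event_id in self._pending:
            # Keep the original arrival time so the window is not extended.
            first_seen = self._pending[event_id][0]
        else:
            first_seen = now
        self._pending[event_id] = (first_seen, event)

    def flush(self, now: float) -> List[Event]:
        """Remove and return the events whose window has expired."""
        ready = [
            event_id
            for event_id, (first_seen, _) in self._pending.items()
            if now - first_seen >= self.window_seconds
        ]
        return [self._pending.pop(event_id)[1] for event_id in ready]


# Usage with explicit timestamps (seconds) so the behaviour is easy to follow.
buffer = DedupBuffer(window_seconds=60)
buffer.add({"id": "a", "status": "open"}, now=0)
buffer.add({"id": "a", "status": "closed"}, now=10)  # replaces the first "a"
buffer.add({"id": "b", "status": "open"}, now=30)
released = buffer.flush(now=60)  # only "a" has been buffered for 60 seconds
```

Only events returned by flush would be handed to Splunk, so every event is delayed by at least the window length, which is the latency cost the reply mentions. A duplicate that arrives after its ID has already been flushed is still forwarded, so search-time | dedup id remains useful as a fallback.
]]></content:encoded>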
      <pubDate>Sun, 28 May 2023 09:42:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Deduplicate-events-How-can-we-override/m-p/644833#M109743</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2023-05-28T09:42:51Z</dc:date>
    </item>
  </channel>
</rss>

