<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Duplicate entries with continuous csv indexing in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445944#M168899</link>
    <description>&lt;P&gt;What I failed to mention is what my input file looks like:&lt;BR /&gt;</description>
&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/5160iADAC77EEECC4F4E4/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;I rebuild this file from scratch with a python script, based on the xml file, every minute, and this csv file is monitored continuously in Splunk. On each update the values for a given MAC address change: signal strength, last seen, and so on. Yet every time I rebuild the file, the csv in Splunk shows a new entry for each MAC address, even if it was already indexed. My main point is not to add a new MAC entry but to update the signal value and the other fields.&lt;/P&gt;</description>
    <pubDate>Wed, 13 Jun 2018 18:04:08 GMT</pubDate>
    <dc:creator>haker146</dc:creator>
    <dc:date>2018-06-13T18:04:08Z</dc:date>
    <item>
      <title>Duplicate entries with continuous csv indexing</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445942#M168897</link>
      <description>&lt;P&gt;Hello, I am writing to you with a small problem. I am building a Wi-Fi monitoring system for my diploma thesis. I use the Kismet software, which creates a netxml file that I then parse to csv. I want to monitor this csv file continuously in Splunk and watch the changes in signal strength over time. Unfortunately, every time my netxml file is regenerated and a new csv is created under the same name, more and more duplicates of the same network appear, as shown in the figure below. &lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/5162iF5F8B80D3044975D/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Please help: what should I do so that these duplicates do not arise and there is only one original entry per network?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jun 2018 17:32:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445942#M168897</guid>
      <dc:creator>haker146</dc:creator>
      <dc:date>2018-06-13T17:32:29Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate entries with continuous csv indexing</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445943#M168898</link>
      <description>&lt;P&gt;When you reboot, it generates a new log file under the same name. Does it still contain the old entries? Are you able to control the name of the file it generates? It sounds like each time you reboot, the entire file is reindexed. You should be able to control this using your inputs.conf. Could you share what your inputs.conf for this input looks like?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jun 2018 17:46:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445943#M168898</guid>
      <dc:creator>mdsnmss</dc:creator>
      <dc:date>2018-06-13T17:46:31Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate entries with continuous csv indexing</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445944#M168899</link>
      <description>&lt;P&gt;What I failed to mention is what my input file looks like:&lt;BR /&gt;
&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/5160iADAC77EEECC4F4E4/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;I rebuild this file from scratch with a python script, based on the xml file, every minute, and this csv file is monitored continuously in Splunk. On each update the values for a given MAC address change: signal strength, last seen, and so on. Yet every time I rebuild the file, the csv in Splunk shows a new entry for each MAC address, even if it was already indexed. My main point is not to add a new MAC entry but to update the signal value and the other fields.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jun 2018 18:04:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445944#M168899</guid>
      <dc:creator>haker146</dc:creator>
      <dc:date>2018-06-13T18:04:08Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate entries with continuous csv indexing</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445945#M168900</link>
      <description>&lt;P&gt;By adding a timestamp to each event in the csv, and treating each csv as a snapshot in time, the duplicates can then be handled within Splunk. This also gives the advantage of being able to plot how the signal changes over time in Splunk. Splunk can be thought of as a time series database, so adding events with the same data but different timestamps is fine.&lt;/P&gt;

&lt;P&gt;Here is a generated-event query to show the concept of how signal strength changes over time:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=4 
| streamstats count 
| eval _time = _time - (count*3600) 
| eval mac_address = "00:1D:0F:FB:40:4A", channel=6, signal=(random()%10)-74 
| timechart max(signal) as Signal span=1h
&lt;/CODE&gt;&lt;/PRE&gt;
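
&lt;P&gt;Building on the same generated events, a de-duplicated table (one row per network, latest values only) can then be produced with stats, for example:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=4 
| streamstats count 
| eval _time = _time - (count*3600) 
| eval mac_address = "00:1D:0F:FB:40:4A", channel=6, signal=(random()%10)-74 
| stats latest(channel) AS channel, latest(signal) AS signal by mac_address
&lt;/CODE&gt;&lt;/PRE&gt;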

&lt;P&gt;Also, within Splunk you will be able to query the first seen and last seen values, so there is no need to generate these fields in the extract itself:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=4 
| streamstats count 
| eval _time = _time - (count*3600) 
| eval mac_address = "00:1D:0F:FB:40:4A", channel=6, signal=count-74 
| eval time=strftime(_time,"%y/%m/%d %H:%M:%S") 
| stats min(time) As first_seen, max(time) AS last_seen by mac_address
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;To map a field in the csv extract to a Splunk timestamp, see &lt;A href="https://docs.splunk.com/Documentation/Splunk/7.1.1/Data/HowSplunkextractstimestamps"&gt;https://docs.splunk.com/Documentation/Splunk/7.1.1/Data/HowSplunkextractstimestamps&lt;/A&gt;&lt;/P&gt;
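
&lt;P&gt;As a rough sketch only (the sourcetype stanza and the column name "time" are illustrative, assuming the csv has a header row and a time column written in the format used above), the props.conf side of that could look like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[kismet_csv]
INDEXED_EXTRACTIONS = csv
TIMESTAMP_FIELDS = time
TIME_FORMAT = %y/%m/%d %H:%M:%S
&lt;/CODE&gt;&lt;/PRE&gt;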

&lt;P&gt;The _time field in Splunk is where the timestamp is held.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jun 2018 08:13:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445945#M168900</guid>
      <dc:creator>msivill_splunk</dc:creator>
      <dc:date>2018-06-14T08:13:28Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate entries with continuous csv indexing</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445946#M168901</link>
      <description>&lt;P&gt;@msivill &lt;BR /&gt;
Thank you so much for your help. I still have one question: what if I want to make a table without repeating entries?&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jun 2018 16:40:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445946#M168901</guid>
      <dc:creator>haker146</dc:creator>
      <dc:date>2018-06-18T16:40:43Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate entries with continuous csv indexing</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445947#M168902</link>
      <description>&lt;P&gt;There is no concept of updating an event in Splunk. If you send the same data to Splunk twice, you will end up with two events. Using a timestamp when the events are saved into Splunk will help differentiate between them. The above example produces a view without repeating entries (though there will still be duplicate events within Splunk itself).&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jun 2018 08:00:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Duplicate-entries-with-continuous-csv-indexing/m-p/445947#M168902</guid>
      <dc:creator>msivill_splunk</dc:creator>
      <dc:date>2018-06-26T08:00:31Z</dc:date>
    </item>
  </channel>
</rss>

