<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Reimporting Event Data / Fixing Data Holes in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489861#M83740</link>
    <description>&lt;P&gt;Because of network problems between my HFs and my indexing tier, I have some "holes" in my data, i.e. periods with missing events. These holes need to be fixed. My idea is as follows:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;reindex all logs (and rotated logs) within the time range, but into a new index&lt;/LI&gt;
&lt;LI&gt;search for &lt;CODE&gt;index=newindex OR index=originalindex | eventstats count by raw | where count=1 | eval count=null&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;export these events to a file&lt;/LI&gt;
&lt;LI&gt;reimport these events into the original index&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;Generally this seems to work, but there is still one problem: my data comes from many different sources with many different sourcetypes. &lt;STRONG&gt;How can I export data with its source and sourcetype and keep those fields when reimporting?&lt;/STRONG&gt; I am also open to better solutions to my general problem.&lt;/P&gt;

&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
    <pubDate>Mon, 25 Nov 2019 13:25:16 GMT</pubDate>
    <dc:creator>jroedel</dc:creator>
    <dc:date>2019-11-25T13:25:16Z</dc:date>
    <item>
      <title>Reimporting Event Data / Fixing Data Holes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489861#M83740</link>
      <description>&lt;P&gt;Because of network problems between my HFs and my indexing tier, I have some "holes" in my data, i.e. periods with missing events. These holes need to be fixed. My idea is as follows:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;reindex all logs (and rotated logs) within the time range, but into a new index&lt;/LI&gt;
&lt;LI&gt;search for &lt;CODE&gt;index=newindex OR index=originalindex | eventstats count by raw | where count=1 | eval count=null&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;export these events to a file&lt;/LI&gt;
&lt;LI&gt;reimport these events into the original index&lt;/LI&gt;
&lt;/OL&gt;
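
&lt;P&gt;A minimal Python sketch of the comparison in step 2, assuming the events from both indexes are already available as dicts with &lt;CODE&gt;index&lt;/CODE&gt; and &lt;CODE&gt;_raw&lt;/CODE&gt; keys (hypothetical in-memory data, not a Splunk API). Note that &lt;CODE&gt;count=1&lt;/CODE&gt; alone can also match events that exist only in the original index, so the sketch additionally keeps only events from the new index, which in SPL would correspond to something like &lt;CODE&gt;| where count=1 AND index="newindex"&lt;/CODE&gt;.&lt;/P&gt;

```python
from collections import Counter


def find_missing(events, new_index="newindex"):
    """Return events present in the reindexed copy but absent from the original.

    `events` is a list of dicts with at least "index" and "_raw" keys,
    mirroring `index=newindex OR index=originalindex` (hypothetical input).
    """
    # Same idea as `eventstats count by _raw`: count occurrences of each raw text
    counts = Counter(event["_raw"] for event in events)
    # count == 1 means the text was seen in only one index; restricting to the
    # new index drops events that exist only in the original index
    return [
        event
        for event in events
        if counts[event["_raw"]] == 1 and event["index"] == new_index
    ]


if __name__ == "__main__":
    sample = [
        {"index": "originalindex", "_raw": "line A"},
        {"index": "newindex", "_raw": "line A"},
        {"index": "newindex", "_raw": "line B"},       # the hole
        {"index": "originalindex", "_raw": "line C"},  # not reindexed
    ]
    print([e["_raw"] for e in find_missing(sample)])  # ['line B']
```

&lt;P&gt;Counting by the full &lt;CODE&gt;_raw&lt;/CODE&gt; text assumes that identical lines do not legitimately occur more than once in the source logs; if they do, a missing copy would show up as a count of 3 rather than 1.&lt;/P&gt;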

&lt;P&gt;Generally this seems to work, but there is still one problem: my data comes from many different sources with many different sourcetypes. &lt;STRONG&gt;How can I export data with its source and sourcetype and keep those fields when reimporting?&lt;/STRONG&gt; I am also open to better solutions to my general problem.&lt;/P&gt;

&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Nov 2019 13:25:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489861#M83740</guid>
      <dc:creator>jroedel</dc:creator>
      <dc:date>2019-11-25T13:25:16Z</dc:date>
    </item>
    <item>
      <title>Re: Reimporting Event Data / Fixing Data Holes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489862#M83741</link>
      <description>&lt;P&gt;There is a mistake in the SPL query in step 2. It should be &lt;CODE&gt;eventstats count by _raw&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Nov 2019 13:27:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489862#M83741</guid>
      <dc:creator>jroedel</dc:creator>
      <dc:date>2019-11-25T13:27:50Z</dc:date>
    </item>
    <item>
      <title>Re: Reimporting Event Data / Fixing Data Holes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489863#M83742</link>
      <description>&lt;P&gt;How many events are we talking about here? Remember that both the reindexing in step 1 and the reimport in step 4 count against your license.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Nov 2019 21:57:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489863#M83742</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2019-11-25T21:57:08Z</dc:date>
    </item>
    <item>
      <title>Re: Reimporting Event Data / Fixing Data Holes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489864#M83743</link>
      <description>&lt;P&gt;Over the time range in question, run a search like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="foo" AND sourcetype="bar" | stats count BY source
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Then in the shell on your HF, do this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;for FILE in /path/to/files/in/question/*
do
   wc -l "${FILE}"
done
&lt;/CODE&gt;&lt;/PRE&gt;
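
&lt;P&gt;The cross-reference step that follows can be sketched in Python, assuming the &lt;CODE&gt;stats count BY source&lt;/CODE&gt; results have been copied into a dict keyed by full file path (the &lt;CODE&gt;source&lt;/CODE&gt; value of a monitored file) and that each event is exactly one line, which is what makes a line count a valid comparison. The function names are illustrative only.&lt;/P&gt;

```python
import glob


def count_lines(path):
    """Count lines in a file, similar to `wc -l` (assumes one event per line)."""
    with open(path, "rb") as fh:
        return sum(1 for _ in fh)


def file_line_counts(pattern):
    """Map each file matching a glob pattern to its line count."""
    return {path: count_lines(path) for path in glob.glob(pattern)}


def mismatched_sources(splunk_counts, file_counts):
    """Return sources whose indexed event count differs from the file's line count.

    Files missing from Splunk entirely are treated as having 0 events.
    """
    return sorted(
        path
        for path, lines in file_counts.items()
        if splunk_counts.get(path, 0) != lines
    )


if __name__ == "__main__":
    # hypothetical example values
    splunk = {"/var/log/app/a.log": 100, "/var/log/app/b.log": 95}
    files = {"/var/log/app/a.log": 100, "/var/log/app/b.log": 120}
    print(mismatched_sources(splunk, files))  # ['/var/log/app/b.log']
```

&lt;P&gt;On the HF, &lt;CODE&gt;file_line_counts("/path/to/files/in/question/*")&lt;/CODE&gt; would produce the second dict. Unlike &lt;CODE&gt;wc -l&lt;/CODE&gt;, &lt;CODE&gt;count_lines&lt;/CODE&gt; also counts a final line that has no trailing newline.&lt;/P&gt;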

&lt;P&gt;Then cross-reference these two lists. For each source whose counts do not match, do this in Splunk:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="foo" AND sourcetype="bar" source="bad" | delete
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Then in the shell on your HF, for each &lt;CODE&gt;bad&lt;/CODE&gt; file, do this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;/opt/splunk/bin/splunk add oneshot /path/to/files/in/question/bad.csv -index foo -sourcetype bar -auth admin:changeme
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 25 Nov 2019 23:51:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489864#M83743</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-11-25T23:51:35Z</dc:date>
    </item>
    <item>
      <title>Re: Reimporting Event Data / Fixing Data Holes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489865#M83744</link>
      <description>&lt;P&gt;Keep in mind that if the gaps are very large, doing this will erode your overall data retention, because &lt;CODE&gt;delete&lt;/CODE&gt; does not actually remove anything from disk; it only hides the events from searches, so the reindexed copies take up additional space in the index.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Nov 2019 23:53:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489865#M83744</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-11-25T23:53:32Z</dc:date>
    </item>
    <item>
      <title>Re: Reimporting Event Data / Fixing Data Holes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489866#M83745</link>
      <description>&lt;P&gt;With this solution I would have to repeat the procedure manually for every sourcetype. Furthermore, the HF only forwards most of the data, which comes from UFs on which I have no shell access.&lt;/P&gt;

&lt;P&gt;So, sorry, but this solution does not work for me.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Nov 2019 07:51:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489866#M83745</guid>
      <dc:creator>jroedel</dc:creator>
      <dc:date>2019-11-26T07:51:24Z</dc:date>
    </item>
    <item>
      <title>Re: Reimporting Event Data / Fixing Data Holes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489867#M83746</link>
      <description>&lt;P&gt;I found a solution myself in the meantime, specifically for steps 3 and 4:&lt;/P&gt;

&lt;P&gt;3) export the search results as a JSON file&lt;BR /&gt;
4.1) run the JSON file through my Python script (see below): &lt;CODE&gt;./script.py &amp;gt; /tmp/missingdata.txt&lt;/CODE&gt;&lt;BR /&gt;
4.2) oneshot the script's output into the index: &lt;CODE&gt;./splunk add oneshot /tmp/missingdata.txt -index foo -sourcetype logimport&lt;/CODE&gt;&lt;/P&gt;

&lt;HR /&gt;

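&lt;P&gt;My Python 3 script:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;#!/usr/bin/python3
# Convert a Splunk JSON export (one JSON object per line) into a text file
# that the "logimport" sourcetype below splits back into events.
import json

with open('./export.json', 'r') as fp:
  for line in fp:
    if not line.strip():
      continue  # skip blank lines instead of failing in json.loads
    result = json.loads(line)["result"]
    print(result["_raw"])
    # metadata lines, parsed out again at index time by transforms.conf
    print("HOST = " + result["host"])
    print("SOURCE = " + result["source"])
    print("SOURCETYPE = " + result["sourcetype"])
    print("###")  # event delimiter matched by LINE_BREAKER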
&lt;/CODE&gt;&lt;/PRE&gt;
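
&lt;P&gt;Because the output file relies on &lt;CODE&gt;###&lt;/CODE&gt; and the &lt;CODE&gt;HOST = &lt;/CODE&gt;, &lt;CODE&gt;SOURCE = &lt;/CODE&gt; and &lt;CODE&gt;SOURCETYPE = &lt;/CODE&gt; prefixes as markers, an exported &lt;CODE&gt;_raw&lt;/CODE&gt; that itself contains them would be split or parsed incorrectly. A small pre-check along these lines, assuming the same one-object-per-line &lt;CODE&gt;export.json&lt;/CODE&gt; that the script reads, could flag such events before the oneshot. The helper names are hypothetical.&lt;/P&gt;

```python
import json

# Text the logimport sourcetype treats as structure (see props/transforms.conf)
MARKERS = ("\nHOST = ", "\nSOURCE = ", "\nSOURCETYPE = ")


def collides(raw):
    """True if an event's _raw would confuse the LINE_BREAKER or the transforms."""
    # print() appends "\n", so a _raw ending in "###" also produces "###\n"
    if "###\n" in raw + "\n":
        return True
    # a marker inside _raw would be matched before the real metadata line
    return any(marker in raw for marker in MARKERS)


def find_collisions(path="./export.json"):
    """Return (line_number, _raw) pairs for exported results that need attention."""
    problems = []
    with open(path, "r") as fp:
        for number, line in enumerate(fp, start=1):
            if not line.strip():
                continue
            raw = json.loads(line)["result"]["_raw"]
            if collides(raw):
                problems.append((number, raw))
    return problems
```

&lt;P&gt;If &lt;CODE&gt;find_collisions()&lt;/CODE&gt; returns an empty list, the delimiter and metadata lines are unambiguous for that export; otherwise a less common delimiter would be needed in both the script and &lt;CODE&gt;LINE_BREAKER&lt;/CODE&gt;.&lt;/P&gt;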

&lt;HR /&gt;

&lt;P&gt;props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[logimport]
# break events on the "###" delimiter written by the script
LINE_BREAKER = (###\n)
SHOULD_LINEMERGE = false
TRANSFORMS-import = importsource, importsourcetype, importhost, importraw
&lt;/CODE&gt;&lt;/PRE&gt;
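
&lt;P&gt;To preview how the oneshot file should be broken into events and rewritten before touching a real index, the props.conf setting above and the transforms.conf stanzas below can be roughly emulated in Python. This is only an approximation: Python's &lt;CODE&gt;re&lt;/CODE&gt; stands in for Splunk's PCRE, and the &lt;CODE&gt;(?s)&lt;/CODE&gt; flag is used in the raw pattern so that multiline events are kept whole (without it, &lt;CODE&gt;.&lt;/CODE&gt; stops at the first newline and the pattern does not match multiline events).&lt;/P&gt;

```python
import re

# Approximations of the logimport props/transforms settings (Python re, not PCRE)
LINE_BREAKER = re.compile(r"###\n")
HOST = re.compile(r"\nHOST = (.*)")
SOURCE = re.compile(r"\nSOURCE = (.*)")
SOURCETYPE = re.compile(r"\nSOURCETYPE = (.*)")
RAW = re.compile(r"(?s)^(.*)\nHOST")  # greedy: stops at the last HOST line


def preview(text):
    """Split the oneshot file into events and apply the metadata rewrites.

    Raises AttributeError if a chunk lacks one of the metadata lines.
    """
    events = []
    for chunk in LINE_BREAKER.split(text):
        if not chunk.strip():
            continue  # empty text after the last delimiter
        events.append({
            "host": HOST.search(chunk).group(1),
            "source": SOURCE.search(chunk).group(1),
            "sourcetype": SOURCETYPE.search(chunk).group(1),
            "_raw": RAW.search(chunk).group(1),
        })
    return events


if __name__ == "__main__":
    sample = (
        "line one\nline two\n"
        "HOST = web01\nSOURCE = /var/log/app.log\nSOURCETYPE = app\n###\n"
    )
    for event in preview(sample):
        print(event)
```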

&lt;HR /&gt;

&lt;P&gt;transforms.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[importhost]
REGEX = \nHOST = (.*)
FORMAT = host::$1
DEST_KEY = MetaData:Host

[importsource]
REGEX = \nSOURCE = (.*)
FORMAT = source::$1
DEST_KEY = MetaData:Source

[importsourcetype]
REGEX = \nSOURCETYPE = (.*)
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype

# (?s) lets .* span multiline events; greedy matching stops at the last HOST line
[importraw]
REGEX = (?s)^(.*)\nHOST
DEST_KEY = _raw
FORMAT = $1
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 26 Nov 2019 08:02:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489867#M83746</guid>
      <dc:creator>jroedel</dc:creator>
      <dc:date>2019-11-26T08:02:20Z</dc:date>
    </item>
    <item>
      <title>Re: Reimporting Event Data / Fixing Data Holes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489868#M83747</link>
      <description>&lt;P&gt;Way to go sharing your code.  Come back and click &lt;CODE&gt;Accept&lt;/CODE&gt; on your answer to close the question and let other people know there is a good answer.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Nov 2019 16:42:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reimporting-Event-Data-Fixing-Data-Holes/m-p/489868#M83747</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-11-26T16:42:57Z</dc:date>
    </item>
  </channel>
</rss>

