<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What is the best way to load archived logs? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10680#M394</link>
    <description>&lt;P&gt;&lt;STRONG&gt;NOTE:&lt;/STRONG&gt;  This approach works with Splunk 3.4.x (possibly earlier) and in Splunk 4.0+.  The oneshot mode may be preferable if you are only working with Splunk 4.0+.&lt;/P&gt;

&lt;P&gt;I've had success by copying log files into the &lt;CODE&gt;$SPLUNK_HOME/var/spool/splunk&lt;/CODE&gt; folder, which is the default batch mode input.  This works best if your historical log files are already on an indexer or forwarding instance.&lt;/P&gt;

&lt;P&gt;You can add a special &lt;CODE&gt;***SPLUNK***&lt;/CODE&gt; header to the start of your file to give it the original path, if that is important to you.  (You can also set &lt;CODE&gt;index&lt;/CODE&gt;, &lt;CODE&gt;sourcetype&lt;/CODE&gt;, and &lt;CODE&gt;host&lt;/CODE&gt;.)&lt;/P&gt;

&lt;P&gt;You could, for example, load old mail logs from your &lt;CODE&gt;old&lt;/CODE&gt; directory with a command like this:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;(echo '***SPLUNK*** source=/var/log/mail'; zcat /var/log/old/mail*.gz) &amp;gt; $SPLUNK_HOME/var/spool/splunk/mail.log&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This uses your shell to match files with similar names and concatenate them into a single file in the spool directory.  You can use other simple shell scripting techniques to make this as simple or as complicated as you need.  You can also throttle the load with some &lt;CODE&gt;sleep&lt;/CODE&gt; commands to keep from overwhelming your indexer.  (Yeah, not a very high-tech solution, but it can work.)&lt;/P&gt;

&lt;P&gt;Be careful when loading events that are more than a year old if their timestamps lack the year.  (For example, syslog files often don't show the year.)  One way around this problem is to put a timestamp in the filename (or at least the 4-digit year portion of the timestamp).  Most often, Splunk will recognize this and load the events with the correct date, but it's best to keep an eye out for this problem.&lt;/P&gt;

&lt;P&gt;You may also want to consider using a separate index for loading historical data.  Splunk 4.0+ handles wide date ranges much better than earlier versions, but there could still be advantages to using this approach even with Splunk 4.0.  For example, if you aren't sure you have your &lt;CODE&gt;props.conf&lt;/CODE&gt; indexing settings set up correctly yet (timestamp parsing, sourcetype matching, etc.), then loading into an independent (and potentially throw-away) index could pay off big time in cleanup effort (especially compared to manually finding and deleting incorrectly indexed events).  You can always merge buckets from your temporary index into your desired destination index.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Apr 2010 03:33:33 GMT</pubDate>
    <dc:creator>Lowell</dc:creator>
    <dc:date>2010-04-06T03:33:33Z</dc:date>
    <item>
      <title>What is the best way to load archived logs?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10677#M391</link>
      <description>&lt;P&gt;I have a large archive of old data I want to load while also loading new real-time data.&lt;/P&gt;

&lt;P&gt;What is the most efficient way to load archived data?
I see batch, one-shot, and monitor. I want to make sure that I don't impact the loading of new real-time data.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Mar 2010 23:52:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10677#M391</guid>
      <dc:creator>Erik_Swan</dc:creator>
      <dc:date>2010-03-29T23:52:47Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to load archived logs?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10678#M392</link>
      <description>&lt;P&gt;Splunk has a configuration-free input type called oneshot that's ideal for this task.&lt;/P&gt;

&lt;P&gt;From the UI, it is labeled as "Manager &amp;gt;&amp;gt; Data inputs &amp;gt;&amp;gt; Files &amp;amp; Directories &amp;gt;&amp;gt; Add New &amp;gt;&amp;gt; Index a file on the Splunk server" and from the CLI it's invoked as "splunk add oneshot  [-source sourcename] [-sourcetype sourcetype]"&lt;/P&gt;

&lt;P&gt;When added, the input begins immediately regardless of whether Splunk has seen this particular file before.&lt;/P&gt;

&lt;P&gt;The inputs can be tracked via the REST management API like:&lt;/P&gt;

&lt;P&gt;wget &lt;A href="https://localhost:8089/services/data/inputs/oneshot" rel="nofollow"&gt;https://localhost:8089/services/data/inputs/oneshot&lt;/A&gt; --no-check-certificate --user admin --password changeme -O -&lt;/P&gt;

&lt;P&gt;Note that oneshot input can only load files (including archives). To load full directories, oneshot should be called per file in the directory.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Mar 2010 23:59:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10678#M392</guid>
      <dc:creator>Stephen_Sorkin</dc:creator>
      <dc:date>2010-03-29T23:59:23Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to load archived logs?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10679#M393</link>
      <description>&lt;P&gt;It may be useful to note that &lt;CODE&gt;oneshot&lt;/CODE&gt; also lets you specify &lt;CODE&gt;-host&lt;/CODE&gt; and &lt;CODE&gt;-index&lt;/CODE&gt; parameters.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Mar 2010 12:19:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10679#M393</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-03-30T12:19:27Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to load archived logs?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10680#M394</link>
      <description>&lt;P&gt;&lt;STRONG&gt;NOTE:&lt;/STRONG&gt;  This approach works with Splunk 3.4.x (possibly earlier) and in Splunk 4.0+.  The oneshot mode may be preferable if you are only working with Splunk 4.0+.&lt;/P&gt;

&lt;P&gt;I've had success by copying log files into the &lt;CODE&gt;$SPLUNK_HOME/var/spool/splunk&lt;/CODE&gt; folder, which is the default batch mode input.  This works best if your historical log files are already on an indexer or forwarding instance.&lt;/P&gt;

&lt;P&gt;You can add a special &lt;CODE&gt;***SPLUNK***&lt;/CODE&gt; header to the start of your file to give it the original path, if that is important to you.  (You can also set &lt;CODE&gt;index&lt;/CODE&gt;, &lt;CODE&gt;sourcetype&lt;/CODE&gt;, and &lt;CODE&gt;host&lt;/CODE&gt;.)&lt;/P&gt;

&lt;P&gt;You could, for example, load old mail logs from your &lt;CODE&gt;old&lt;/CODE&gt; directory with a command like this:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;(echo '***SPLUNK*** source=/var/log/mail'; zcat /var/log/old/mail*.gz) &amp;gt; $SPLUNK_HOME/var/spool/splunk/mail.log&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This uses your shell to match files with similar names and concatenate them into a single file in the spool directory.  You can use other simple shell scripting techniques to make this as simple or as complicated as you need.  You can also throttle the load with some &lt;CODE&gt;sleep&lt;/CODE&gt; commands to keep from overwhelming your indexer.  (Yeah, not a very high-tech solution, but it can work.)&lt;/P&gt;
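
&lt;P&gt;The throttling idea could be sketched roughly as the loop below.  Treat it as a sketch under assumptions, not a tested recipe: the function name &lt;CODE&gt;stage_archives&lt;/CODE&gt; and the variables &lt;CODE&gt;SPOOL_DIR&lt;/CODE&gt;, &lt;CODE&gt;SOURCE_NAME&lt;/CODE&gt;, and &lt;CODE&gt;PAUSE_SECONDS&lt;/CODE&gt; are made-up names for illustration (they are not Splunk settings), and &lt;CODE&gt;/opt/splunk&lt;/CODE&gt; is only an assumed install path.  It stages one archive at a time, each prefixed with the header line, and sleeps between files.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;#!/bin/bash
# Sketch: stage archived logs into the spool directory one file at a time.
# SPOOL_DIR, SOURCE_NAME and PAUSE_SECONDS are placeholder names, not Splunk settings.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"      # assumed install path
SPOOL_DIR="$SPLUNK_HOME/var/spool/splunk"
SOURCE_NAME="${SOURCE_NAME:-/var/log/mail}"    # value written into the header
PAUSE_SECONDS="${PAUSE_SECONDS:-300}"          # delay between files

stage_archives() {
  local f out bytes
  for f in "$@"; do
    [ -f "$f" ] || continue                    # skip a glob that matched nothing
    out="$SPOOL_DIR/$(basename "$f" .gz).log"
    # Header line first, then the decompressed events; tee writes the spool file.
    bytes=$( { echo "***SPLUNK*** source=$SOURCE_NAME"; gzip -dc "$f"; } | tee "$out" | wc -c )
    echo "Staged $f ($bytes bytes); pausing $PAUSE_SECONDS seconds."
    sleep "$PAUSE_SECONDS"
  done
}

stage_archives /var/log/old/mail*.gz
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Adjust the glob and the pause length to suit how much load your indexer can take.&lt;/P&gt;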

&lt;P&gt;Be careful when loading events that are more than a year old if their timestamps lack the year.  (For example, syslog files often don't show the year.)  One way around this problem is to put a timestamp in the filename (or at least the 4-digit year portion of the timestamp).  Most often, Splunk will recognize this and load the events with the correct date, but it's best to keep an eye out for this problem.&lt;/P&gt;

&lt;P&gt;You may also want to consider using a separate index for loading historical data.  Splunk 4.0+ handles wide date ranges much better than earlier versions, but there could still be advantages to using this approach even with Splunk 4.0.  For example, if you aren't sure you have your &lt;CODE&gt;props.conf&lt;/CODE&gt; indexing settings set up correctly yet (timestamp parsing, sourcetype matching, etc.), then loading into an independent (and potentially throw-away) index could pay off big time in cleanup effort (especially compared to manually finding and deleting incorrectly indexed events).  You can always merge buckets from your temporary index into your desired destination index.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Apr 2010 03:33:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10680#M394</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2010-04-06T03:33:33Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to load archived logs?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10681#M395</link>
      <description>&lt;P&gt;If you have a directory full of large log files that you want to feed to the Splunk oneshot command with a delay between files, the bash script below is an efficient and clean way to do it.&lt;/P&gt;

&lt;P&gt;For example, if you had &lt;CODE&gt;someapp2011-01-01.log&lt;/CODE&gt; through &lt;CODE&gt;someapp2011-12-31.log&lt;/CODE&gt; in &lt;CODE&gt;/some/directory&lt;/CODE&gt; and wanted to feed one file every 5 minutes:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;#!/bin/bash
SPLUNK_HOME=/your/path/to/splunk
for f in /some/directory/*.log
do
 echo "Processing $f file..."
 # $f already holds the full path, so pass it to oneshot as-is
 "$SPLUNK_HOME/bin/splunk" add oneshot "$f" -index someindex -sourcetype sometype -host somehost -auth admin:changeme
 echo "Finished feeding $f... pausing 5 minutes."
 sleep 300
done
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 24 Feb 2012 19:31:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10681#M395</guid>
      <dc:creator>mcluver</dc:creator>
      <dc:date>2012-02-24T19:31:00Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to load archived logs?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10682#M396</link>
      <description>&lt;P&gt;To oneshot an entire directory recursively from PowerShell, the following worked for me:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;forfiles /p D:\tutorialdata /s /c "cmd /c if @isdir==FALSE D:\Splunk\bin\splunk.exe add oneshot @PATH"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 31 Mar 2014 15:48:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/What-is-the-best-way-to-load-archived-logs/m-p/10682#M396</guid>
      <dc:creator>neiljpeterson</dc:creator>
      <dc:date>2014-03-31T15:48:19Z</dc:date>
    </item>
  </channel>
</rss>

