<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Data Import Question in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Data-Import-Question/m-p/195093#M56266</link>
    <description>Data Import Question in Splunk Search</description>
    <pubDate>Fri, 03 Jan 2014 16:19:07 GMT</pubDate>
    <dc:creator>SteveWu</dc:creator>
    <dc:date>2014-01-03T16:19:07Z</dc:date>
    <item>
      <title>Data Import Question</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Data-Import-Question/m-p/195093#M56266</link>
      <description>&lt;P&gt;So I have a log file that has a unique format similar to the following&lt;/P&gt;

&lt;P&gt;==============================================&lt;/P&gt;

&lt;H1&gt;&lt;TIMESTAMP&gt;&lt;/TIMESTAMP&gt;&lt;/H1&gt;

&lt;P&gt;==============Summary=========================&lt;BR /&gt;
Total Memory: 8834798374&lt;BR /&gt;
Cached: 39399&lt;BR /&gt;
...&lt;/P&gt;

&lt;P&gt;===============up time=========================&lt;BR /&gt;
19:00:20 up 5 days, 8:53&lt;/P&gt;

&lt;P&gt;=================memory========================&lt;BR /&gt;
USER PID COMMAND MEM%&lt;BR /&gt;
root 2919 /bash    9&lt;BR /&gt;
root 2023 top     14&lt;/P&gt;

&lt;P&gt;Based on what I've read in the documentation and the forum posts, it looks like I can either write a fairly sophisticated sourcetype definition or write a separate pre-processing script that parses the data and outputs it in a friendlier format for the engine. Am I missing something, or are these my only realistic options?&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2014 16:19:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Data-Import-Question/m-p/195093#M56266</guid>
      <dc:creator>SteveWu</dc:creator>
      <dc:date>2014-01-03T16:19:07Z</dc:date>
    </item>
    <item>
      <title>Re: Data Import Question</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Data-Import-Question/m-p/195094#M56267</link>
      <description>&lt;P&gt;You can certainly pre-process the data with a script, but Splunk already takes care of things like restarts and avoiding duplicate data, which can be a PITA to get right in a script. I don't think your sourcetype needs to be that difficult. There are two main tasks: (1) index the data and (2) set up the fields.&lt;/P&gt;

&lt;P&gt;First, create a test index. Use Data Preview to bring in a sample of the data. You will be able to set the event boundaries (line-breaking) and timestamps with Data Preview. Data Preview will create the sourcetype settings in &lt;CODE&gt;props.conf&lt;/CODE&gt; that you need to index the data. Put the &lt;CODE&gt;props.conf&lt;/CODE&gt; stanza on the indexer(s). Create the stanza in &lt;CODE&gt;inputs.conf&lt;/CODE&gt; to start reading/indexing the real data.&lt;/P&gt;
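&lt;P&gt;As a rough sketch (the sourcetype name, index name, monitor path, and the exact separator length below are placeholders, not tested settings), the generated configuration might look something like this, breaking a new event at each full line of &lt;CODE&gt;=&lt;/CODE&gt; signs:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# props.conf (on the indexer) -- hypothetical sourcetype name
[my_custom_report]
SHOULD_LINEMERGE = false
# break before a run of 20 or more equals signs; the section
# headers such as "===memory===" have shorter runs
LINE_BREAKER = ([\r\n]+)(?=={20})
# timestamp settings as generated by Data Preview go here

# inputs.conf (on the forwarder) -- hypothetical path
[monitor:///var/log/myapp/report.log]
sourcetype = my_custom_report
index = test
&lt;/CODE&gt;&lt;/PRE&gt;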

&lt;P&gt;Second, create the fields that you need. This will involve regular expressions and may be a bit trickier. On the other hand, you can change field extractions in production without having to re-index any data. The Interactive Field Extractor can help here, although it may not be able to handle all of the fields.&lt;/P&gt;
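&lt;P&gt;For example, assuming the sample format in the question (the field names here are made up), a couple of search-time extractions in &lt;CODE&gt;props.conf&lt;/CODE&gt; might look like:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[my_custom_report]
# "Total Memory: 8834798374" -&amp;gt; total_memory=8834798374
EXTRACT-total_memory = Total Memory:\s+(?&amp;lt;total_memory&amp;gt;\d+)
# "19:00:20 up 5 days, 8:53" -&amp;gt; uptime_days=5
EXTRACT-uptime = \bup\s+(?&amp;lt;uptime_days&amp;gt;\d+)\s+days
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You can also prototype the same regexes interactively with the &lt;CODE&gt;rex&lt;/CODE&gt; search command before committing them to &lt;CODE&gt;props.conf&lt;/CODE&gt;.&lt;/P&gt;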

&lt;P&gt;Finally, show a couple of sample events on the forum and people will help you write the field extractions.&lt;/P&gt;

&lt;P&gt;The biggest reason that people run into difficulty is that they load their data into production before testing it for a day or two in a test index. If you play with your new data for a bit, and even write a few searches, I think you will have a much better idea of what you want, even if you decide to write that pre-processing script in the end.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2014 18:28:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Data-Import-Question/m-p/195094#M56267</guid>
      <dc:creator>lguinn2</dc:creator>
      <dc:date>2014-01-03T18:28:48Z</dc:date>
    </item>
  </channel>
</rss>

