<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: I have a question about raw data and index data in indexer. Please help me to understand. in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227916#M44418</link>
    <description>&lt;P&gt;Seksit-&lt;BR /&gt;
I think you should understand how Splunk processes and stores files. That should lead to a better understanding of what's going on and how it relates to your use case.&lt;/P&gt;

&lt;P&gt;When you 'monitor' a file or directory, regardless of whether the file is manually copied or generated by an app, Splunk will read the files and index them. The indexing process reads in the 'raw' data and performs various operations such as assigning sourcetypes, placing the data in a defined index, and extracting timestamps and hostnames. Events are written to buckets (directories on disk) on the indexers, and associated metadata is created and stored with the buckets. When you search in Splunk, this is what is searched. The raw data in the buckets is stored compressed, so the indexed copy is typically smaller than the uncompressed source.&lt;/P&gt;

&lt;P&gt;So with that in mind, once you have indexed the monitored files, they can be deleted or rotated out. Of course, check your retention and legal compliance policies before deciding whether you can delete the files.&lt;/P&gt;

&lt;P&gt;On another note, compressed files and Splunk are a sticking point. Splunk's unarchiving tool is single-threaded, so when Splunk encounters a tar/zip/gzip/tgz file, it has to extract it before it can read it. If you are dealing with a lot of files at once, this will slow down your system and use more memory. &lt;/P&gt;</description>
    <pubDate>Mon, 18 Jan 2016 05:33:16 GMT</pubDate>
    <dc:creator>esix_splunk</dc:creator>
    <dc:date>2016-01-18T05:33:16Z</dc:date>
    <item>
      <title>I have a question about raw data and index data in indexer. Please help me to understand.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227911#M44413</link>
      <description>&lt;P&gt;Hi friend,&lt;/P&gt;

&lt;P&gt;I have a server with Splunk already installed. This server has many log files (tar.gz) that were imported from another server.&lt;/P&gt;

&lt;P&gt;I would like to use Splunk to monitor these logs via directories such as /var/log/2016/01, /var/log/2016/02.&lt;/P&gt;

&lt;P&gt;If Splunk monitors the directory, will Splunk store a second copy of the raw data (double raw data) from the log files?&lt;/P&gt;

&lt;P&gt;Please help me to understand it.&lt;/P&gt;

&lt;P&gt;Thank you &lt;/P&gt;

&lt;P&gt;Sorry for my English.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2016 03:36:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227911#M44413</guid>
      <dc:creator>seksit</dc:creator>
      <dc:date>2016-01-18T03:36:47Z</dc:date>
    </item>
    <item>
      <title>Re: I have a question about raw data and index data in indexer. Please help me to understand.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227912#M44414</link>
      <description>&lt;P&gt;Sorry, but what do you mean by double raw data? Splunk picks up files from the directory and indexes them. It won't pick up the same file twice; Splunk checks the first few bytes of a file to see if it was already indexed.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2016 04:10:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227912#M44414</guid>
      <dc:creator>renjith_nair</dc:creator>
      <dc:date>2016-01-18T04:10:23Z</dc:date>
    </item>
    <item>
      <title>Re: I have a question about raw data and index data in indexer. Please help me to understand.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227913#M44415</link>
      <description>&lt;P&gt;Hi renjith.nair, thank you for your advice.&lt;/P&gt;

&lt;P&gt;The log files were imported manually (copied from an external HDD), without using a Splunk forwarder. &lt;/P&gt;

&lt;P&gt;If Splunk monitors the directory, will Splunk store the raw data in the Splunk directory?&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2016 04:23:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227913#M44415</guid>
      <dc:creator>seksit</dc:creator>
      <dc:date>2016-01-18T04:23:18Z</dc:date>
    </item>
    <item>
      <title>Re: I have a question about raw data and index data in indexer. Please help me to understand.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227914#M44416</link>
      <description>&lt;P&gt;If you have configured Splunk to monitor a directory, Splunk picks up the files irrespective of whether they were copied manually or generated by an app. Splunk checks the first bytes of each file to determine whether it was indexed previously, and then stores the events. If you want to exclude some files from a directory, that's also possible.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2016 04:38:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227914#M44416</guid>
      <dc:creator>renjith_nair</dc:creator>
      <dc:date>2016-01-18T04:38:57Z</dc:date>
    </item>
    <item>
      <title>Re: I have a question about raw data and index data in indexer. Please help me to understand.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227915#M44417</link>
      <description>&lt;P&gt;Hi seksit,&lt;/P&gt;

&lt;P&gt;In your case, Splunk will index the data from the log files (present in directories such as /var/log/2016/01 and /var/log/2016/02) into the Splunk index directory &lt;CODE&gt;$SPLUNK_HOME/var/lib/splunk/&lt;/CODE&gt; in compressed format. &lt;/P&gt;

&lt;P&gt;In simple terms, this is a copy of the source data, but the size and format of the data are not the same. Splunk stores the data in a series of index files.&lt;/P&gt;
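
&lt;P&gt;For orientation, the on-disk layout for the default main index typically looks something like the sketch below. The bucket name is a placeholder pattern rather than a real directory name, and the exact contents can vary by Splunk version and configuration.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;$SPLUNK_HOME/var/lib/splunk/defaultdb/db/
    db_NEWESTTIME_OLDESTTIME_ID/     (one bucket; placeholder name)
        rawdata/journal.gz           (compressed raw events)
        *.tsidx                      (index files used by search)
&lt;/CODE&gt;&lt;/PRE&gt;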

&lt;P&gt;For more on how Splunk stores indexes, please refer to &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.3.1511/Indexer/HowSplunkstoresindexes"&gt;http://docs.splunk.com/Documentation/Splunk/6.3.1511/Indexer/HowSplunkstoresindexes&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Hope this answers your questions to some extent.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2016 04:42:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227915#M44417</guid>
      <dc:creator>Murali2888</dc:creator>
      <dc:date>2016-01-18T04:42:42Z</dc:date>
    </item>
    <item>
      <title>Re: I have a question about raw data and index data in indexer. Please help me to understand.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227916#M44418</link>
      <description>&lt;P&gt;Seksit-&lt;BR /&gt;
I think you should understand how Splunk processes and stores files. That should lead to a better understanding of what's going on and how it relates to your use case.&lt;/P&gt;

&lt;P&gt;When you 'monitor' a file or directory, regardless of whether the file is manually copied or generated by an app, Splunk will read the files and index them. The indexing process reads in the 'raw' data and performs various operations such as assigning sourcetypes, placing the data in a defined index, and extracting timestamps and hostnames. Events are written to buckets (directories on disk) on the indexers, and associated metadata is created and stored with the buckets. When you search in Splunk, this is what is searched. The raw data in the buckets is stored compressed, so the indexed copy is typically smaller than the uncompressed source.&lt;/P&gt;
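
&lt;P&gt;As a rough illustration of the monitoring setup described above, a monitor stanza in inputs.conf might look like the sketch below. The path comes from the original question; the whitelist pattern, index, and sourcetype values are assumptions chosen for the example, not anything confirmed in this thread, so adjust them to your environment.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# inputs.conf (sketch; whitelist, index, and sourcetype values are assumptions)
[monitor:///var/log/2016]
# Restrict the input to the archived logs mentioned in the question
whitelist = \.tar\.gz$
# Assumed target index; main is the default index
index = main
# Hypothetical sourcetype name, for illustration only
sourcetype = imported_logs
disabled = false
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Monitoring the parent directory should also cover subdirectories such as 01 and 02, since monitor inputs recurse into subdirectories by default.&lt;/P&gt;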

&lt;P&gt;So with that in mind, once you have indexed the monitored files, they can be deleted or rotated out. Of course, check your retention and legal compliance policies before deciding whether you can delete the files.&lt;/P&gt;

&lt;P&gt;On another note, compressed files and Splunk are a sticking point. Splunk's unarchiving tool is single-threaded, so when Splunk encounters a tar/zip/gzip/tgz file, it has to extract it before it can read it. If you are dealing with a lot of files at once, this will slow down your system and use more memory. &lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2016 05:33:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227916#M44418</guid>
      <dc:creator>esix_splunk</dc:creator>
      <dc:date>2016-01-18T05:33:16Z</dc:date>
    </item>
    <item>
      <title>Re: I have a question about raw data and index data in indexer. Please help me to understand.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227917#M44419</link>
      <description>&lt;P&gt;That's my understanding, and that's what I was trying to convey in response to seksit's question as well. The question was asked not by me but by seksit &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2016 05:39:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227917#M44419</guid>
      <dc:creator>renjith_nair</dc:creator>
      <dc:date>2016-01-18T05:39:48Z</dc:date>
    </item>
    <item>
      <title>Re: I have a question about raw data and index data in indexer. Please help me to understand.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227918#M44420</link>
      <description>&lt;P&gt;Updated, misread the first comment!&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2016 05:45:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/I-have-a-question-about-raw-data-and-index-data-in-indexer/m-p/227918#M44420</guid>
      <dc:creator>esix_splunk</dc:creator>
      <dc:date>2016-01-18T05:45:02Z</dc:date>
    </item>
  </channel>
</rss>

