<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Index based on Raw Data? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/60775#M12087</link>
    <description>&lt;P&gt;Can you paste the exact search you're using? &lt;/P&gt;

&lt;P&gt;In a nutshell, if Splunk is having to read all the data off disk, then the most likely reason is that your search terms are not in the initial search clause, i.e. you're doing something like &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=foo | &amp;lt;some other command(s)&amp;gt; | search &amp;lt;searchterms&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;There are also stranger possibilities. To take a random example, you could have a &lt;CODE&gt;foo="bar"&lt;/CODE&gt; term in the initial search clause, but something could have set INDEXED_VALUE to false for that field in fields.conf. &lt;/P&gt;

&lt;P&gt;In any event, without seeing the search it's hard to speculate on the answer, but there most likely is an answer, and it's probably fixable. &lt;/P&gt;</description>
    <pubDate>Sat, 26 May 2012 05:11:38 GMT</pubDate>
    <dc:creator>sideview</dc:creator>
    <dc:date>2012-05-26T05:11:38Z</dc:date>
    <item>
      <title>Index based on Raw Data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/60773#M12085</link>
      <description>&lt;P&gt;I run a Python script to get data from mdb files into an indexer. It basically creates events with source, host, sourcetype and raw data. We are almost always concerned only with reporting on the raw data. Millions of rows are generated in CSV format, and I have created custom fields within the raw data, which Splunk has no problem identifying.&lt;/P&gt;

&lt;P&gt;The issue is that a search to find 2 distinct values in those 12 million+ rows takes forever: it parses all 12 million rows before returning the values. &lt;/P&gt;</description>
      <pubDate>Fri, 25 May 2012 16:51:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/60773#M12085</guid>
      <dc:creator>Cuyose</dc:creator>
      <dc:date>2012-05-25T16:51:50Z</dc:date>
    </item>
    <item>
      <title>Re: Index based on Raw Data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/60774#M12086</link>
      <description>&lt;P&gt;What keyword are you searching for, and what is the exact query? How often does it occur in the raw data?&lt;/P&gt;</description>
      <pubDate>Fri, 25 May 2012 18:12:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/60774#M12086</guid>
      <dc:creator>Simeon</dc:creator>
      <dc:date>2012-05-25T18:12:42Z</dc:date>
    </item>
    <item>
      <title>Re: Index based on Raw Data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/60775#M12087</link>
      <description>&lt;P&gt;Can you paste the exact search you're using? &lt;/P&gt;

&lt;P&gt;In a nutshell, if Splunk is having to read all the data off disk, then the most likely reason is that your search terms are not in the initial search clause, i.e. you're doing something like &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=foo | &amp;lt;some other command(s)&amp;gt; | search &amp;lt;searchterms&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;
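
&lt;P&gt;By contrast, a rough sketch of the faster shape, reusing the same placeholders rather than your real search, puts the terms in the first clause so Splunk can use the index to narrow down events before any other command runs: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=foo &amp;lt;searchterms&amp;gt; | &amp;lt;some other command(s)&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;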

&lt;P&gt;There are also stranger possibilities. To take a random example, you could have a &lt;CODE&gt;foo="bar"&lt;/CODE&gt; term in the initial search clause, but something could have set INDEXED_VALUE to false for that field in fields.conf. &lt;/P&gt;
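
&lt;P&gt;Purely as an illustration, and assuming the field really is named &lt;CODE&gt;foo&lt;/CODE&gt;, that kind of setting would look roughly like the stanza below in fields.conf. With INDEXED_VALUE set to false, Splunk is told the value is not in the raw text of the event, so a &lt;CODE&gt;foo="bar"&lt;/CODE&gt; search is not expanded into an indexed lookup for the value and may end up reading far more events: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[foo]
INDEXED_VALUE = false
&lt;/CODE&gt;&lt;/PRE&gt;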

&lt;P&gt;In any event, without seeing the search it's hard to speculate on the answer, but there most likely is an answer, and it's probably fixable. &lt;/P&gt;</description>
      <pubDate>Sat, 26 May 2012 05:11:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/60775#M12087</guid>
      <dc:creator>sideview</dc:creator>
      <dc:date>2012-05-26T05:11:38Z</dc:date>
    </item>
    <item>
      <title>Re: Index based on Raw Data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/60776#M12088</link>
      <description>&lt;P&gt;index = perfdata | dedup LR_Run_Name&lt;/P&gt;

&lt;P&gt;LR_Run_Name is in the raw data, and we extracted the field value. Out of millions of rows there are only a handful of unique values in the indexed raw data.&lt;/P&gt;

&lt;P&gt;I checked fields.conf and there are no indexed values in there for the fields we extracted. They all look like this:&lt;/P&gt;

&lt;P&gt;[sourcetype]&lt;BR /&gt;
INDEXED = True&lt;BR /&gt;
INDEXED_VALUE = False&lt;/P&gt;

&lt;P&gt;Would I add something like this?&lt;BR /&gt;
[LR_Run_Name]&lt;BR /&gt;
INDEXED = True&lt;BR /&gt;
INDEXED_VALUE = False&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 11:52:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/60776#M12088</guid>
      <dc:creator>Cuyose</dc:creator>
      <dc:date>2020-09-28T11:52:35Z</dc:date>
    </item>
    <item>
      <title>Re: Index based on Raw Data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/515761#M87338</link>
      <description>&lt;P&gt;Could you please help me with how to search for multiple words in raw data?&lt;/P&gt;</description>
      <pubDate>Mon, 24 Aug 2020 12:00:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Index-based-on-Raw-Data/m-p/515761#M87338</guid>
      <dc:creator>Supriya</dc:creator>
      <dc:date>2020-08-24T12:00:53Z</dc:date>
    </item>
  </channel>
</rss>

