<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Part-time on-demand Indexing in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106257#M22376</link>
    <description>&lt;P&gt;According to the docs for &lt;CODE&gt;transforms.conf&lt;/CODE&gt;, &lt;CODE&gt;date_hour&lt;/CODE&gt; is not a supported field for &lt;CODE&gt;SOURCE_KEY&lt;/CODE&gt;.  So, I'm quite confident it is computed too late.  Agreed that dealing with epoch time would be insanely difficult.&lt;/P&gt;</description>
    <pubDate>Mon, 14 Nov 2011 02:54:50 GMT</pubDate>
    <dc:creator>dwaddle</dc:creator>
    <dc:date>2011-11-14T02:54:50Z</dc:date>
    <item>
      <title>Part-time on-demand Indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106249#M22368</link>
      <description>&lt;P&gt;Currently, I have a forwarder monitoring a directory of files that are being logged in real time. My indexer is receiving all of the latest info from the forwarder as expected.&lt;/P&gt;

&lt;P&gt;I now have the following requirement requested of me:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;Only index the events being logged in real time to these files between the hours of 8:00p and 6:00a each night.&lt;/LI&gt;
&lt;LI&gt;Allow all events logged between those hours each night to be searchable at any time of the day.&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;My thought is to set the followTail = True option and just turn off the forwarder when I don't need to index real-time events (e.g. between the hours of 6:01a and 7:59p).&lt;/P&gt;

&lt;P&gt;Does anyone else have a better idea?&lt;/P&gt;</description>
      <pubDate>Tue, 08 Nov 2011 14:50:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106249#M22368</guid>
      <dc:creator>maverick</dc:creator>
      <dc:date>2011-11-08T14:50:38Z</dc:date>
    </item>
    <item>
      <title>Re: Part-time on-demand Indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106250#M22369</link>
      <description>&lt;P&gt;Hmm, I'm not 100% sure that followTail=1 will be honoured in the way one may think. The following is from the &lt;A href="http://docs.splunk.com/Documentation/Splunk/4.2.3/Admin/inputsconf"&gt;docs for inputs.conf&lt;/A&gt;;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;followTail = [0|1]
* Determines whether to start monitoring at the beginning of a file or at the end (and then index all events 
  that come in after that). 
* If set to 1, monitoring begins at the end of the file (like tail -f).
* If set to 0, Splunk will always start at the beginning of the file. 
* This only applies to files the first time Splunk sees them. After that, Splunk's internal file position 
  records keep track of the file. 
* Defaults to 0.
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You have a few options, I guess, some of which may not be feasible:&lt;/P&gt;

&lt;P&gt;a) prevent the log files from being written to during the daytime (6am-8pm), or possibly write to a daytime directory which is not being monitored. Not a very neat solution.&lt;/P&gt;

&lt;P&gt;b) stop the forwarder as you suggested, and delete (parts of) the fishbucket, which should give your forwarder a convenient case of amnesia, thus allowing followTail=1 to work again. Depending on your setup, i.e. what else is being monitored by the forwarder, this is perhaps not so easy and/or may produce strange results. Then again, it may work just fine.&lt;/P&gt;

&lt;P&gt;c) &lt;A href="http://docs.splunk.com/Documentation/Splunk/4.2.3/Deploy/Routeandfilterdatad#Filter_event_data_and_send_to_queues"&gt;Route all events originating during the day to the nullQueue&lt;/A&gt;, so they do not get indexed. You would have to craft a regex to match event timestamps for 6am-8pm, but I'm not sure what fields are available to you at this part of the process. Would probably be the neatest way of doing it, but I haven't tried anything similar, so it may not work at all.&lt;/P&gt;

&lt;P&gt;UPDATE:&lt;/P&gt;

&lt;P&gt;d) as &lt;CODE&gt;_d_&lt;/CODE&gt; pointed out, you could work with &lt;CODE&gt;ignoreOlderThan&lt;/CODE&gt; to control which files will be read by the monitor. The option here would then be to &lt;BR /&gt;
i) ensure all logs are rotated at 7.59PM&lt;BR /&gt;
ii) use &lt;CODE&gt;ignoreOlderThan=1m&lt;/CODE&gt; for the directory monitor stanza&lt;BR /&gt;
iii) start the forwarder through cron or whatever at 8.01PM&lt;BR /&gt;
iv) stop the forwarder through cron or whatever at 6.00AM&lt;/P&gt;

&lt;P&gt;This ensures that the events from between 6AM-8PM will not get indexed, since &lt;CODE&gt;ignoreOlderThan&lt;/CODE&gt; goes by the modtime of the file.&lt;/P&gt;
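
&lt;P&gt;For option c), here is a minimal sketch of the kind of regex involved, written in Python only so the pattern can be checked. It assumes every raw event begins with a local-time timestamp like 2011-11-08 14:50:38, which is a guess about your log format. The names DAYTIME and is_daytime_event are made up for illustration. It treats 06:00:00 through 19:59:59 as daytime, and I have not tested the pattern inside a nullQueue transform.&lt;/P&gt;

```python
import re

# Assumption: each raw event begins with "YYYY-MM-DD HH:MM:SS" in local time.
# Hours 06 through 19 form the daytime window whose events should be dropped.
DAYTIME = re.compile(r"^\d{4}-\d{2}-\d{2} (?:0[6-9]|1[0-9]):")


def is_daytime_event(raw_event):
    """Return True when the leading timestamp falls in 06:00:00-19:59:59."""
    return DAYTIME.match(raw_event) is not None


if __name__ == "__main__":
    samples = (
        "2011-11-08 14:50:38 daytime event",
        "2011-11-08 20:00:00 night event",
        "2011-11-09 05:59:59 night event",
    )
    for line in samples:
        print(line, is_daytime_event(line))
```

&lt;P&gt;If the timestamp sits elsewhere in the event or uses a 12-hour clock, the pattern would need to change accordingly.&lt;/P&gt;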

&lt;P&gt;Hope this helps, or at least serves as inspiration to somebody more knowledgeable than me to work out the exact steps to take.&lt;/P&gt;

&lt;P&gt;regards,&lt;/P&gt;

&lt;P&gt;kristian&lt;/P&gt;</description>
      <pubDate>Sat, 12 Nov 2011 22:33:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106250#M22369</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2011-11-12T22:33:57Z</dc:date>
    </item>
    <item>
      <title>Re: Part-time on-demand Indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106251#M22370</link>
      <description>&lt;P&gt;Perhaps a better option than completely turning off the forwarder would be to simply disable that input. The assumption here is that you may need the forwarder to monitor other files. &lt;/P&gt;

&lt;P&gt;I normally pack an &lt;CODE&gt;app.conf&lt;/CODE&gt; and an &lt;CODE&gt;inputs.conf&lt;/CODE&gt; in a (rather conveniently named) input app; both files reside under &lt;CODE&gt;$SPLUNK_HOME/etc/apps/my_input_app/local&lt;/CODE&gt;. The &lt;CODE&gt;inputs.conf&lt;/CODE&gt; contains the monitor stanza that points to where your files reside, plus other options including &lt;CODE&gt;followTail=1&lt;/CODE&gt;; the &lt;CODE&gt;app.conf&lt;/CODE&gt; contains the following:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;[install]&lt;BR /&gt;
state = enabled&lt;/CODE&gt; &lt;/P&gt;

&lt;P&gt;I would then have a cron job that runs according to your schedule and does the following:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;overwrites or swaps out the normal app.conf with one that has &lt;CODE&gt;state=disabled&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;restarts forwarder's splunkd &lt;/LI&gt;
&lt;/UL&gt;
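
&lt;P&gt;The first of those two steps could be sketched roughly as follows. This is only an illustration: write_app_state is a made-up helper, it overwrites the whole app.conf (so it assumes the file holds nothing but the install stanza), and the demo writes to a scratch directory rather than the app's local directory. The restart is left to the cron job itself, for example by calling the splunk restart command afterwards.&lt;/P&gt;

```python
import os
import tempfile


def write_app_state(app_local_dir, enabled):
    """Write an app.conf whose [install] stanza enables or disables the app."""
    state = "enabled" if enabled else "disabled"
    path = os.path.join(app_local_dir, "app.conf")
    # Overwrites any existing app.conf in that directory.
    with open(path, "w") as handle:
        handle.write("[install]\nstate = " + state + "\n")
    return path


if __name__ == "__main__":
    # Demo against a scratch directory instead of a real Splunk install.
    scratch = tempfile.mkdtemp()
    with open(write_app_state(scratch, False)) as handle:
        print(handle.read())
```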

&lt;P&gt;EDIT_1: As Kristian points out, &lt;CODE&gt;followTail=1&lt;/CODE&gt; only applies to files the first time they are picked up. After that, Splunk's internal file position records keep track of the file. This means that the fishbucket files will tell Splunk where it left off, and it will pick up the old, unnecessary data as well as the real-time events. As I remark below, I would try playing with the &lt;CODE&gt;ignoreOlderThan&lt;/CODE&gt; setting (using seconds for better resolution).&lt;/P&gt;

&lt;P&gt;Hope this helps.&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;&amp;gt; please upvote and accept answer if you find it useful - thanks!&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 12 Nov 2011 23:27:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106251#M22370</guid>
      <dc:creator>_d_</dc:creator>
      <dc:date>2011-11-12T23:27:02Z</dc:date>
    </item>
    <item>
      <title>Re: Part-time on-demand Indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106252#M22371</link>
      <description>&lt;P&gt;Does the enabling/disabling of an app/monitor stanza actually clear the fishbucket for the inputs involved, i.e. won't the forwarder pick up where it left off?&lt;/P&gt;

&lt;P&gt;/k&lt;/P&gt;</description>
      <pubDate>Sun, 13 Nov 2011 01:45:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106252#M22371</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2011-11-13T01:45:44Z</dc:date>
    </item>
    <item>
      <title>Re: Part-time on-demand Indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106253#M22372</link>
      <description>&lt;P&gt;No, you're right - the fishbucket won't be purged. He can, though, try to play with the ignoreOlderThan setting (in minutes or seconds for better resolution). But, yes, it is not trivial and requires a lot of testing.&lt;/P&gt;</description>
      <pubDate>Sun, 13 Nov 2011 01:55:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106253#M22372</guid>
      <dc:creator>_d_</dc:creator>
      <dc:date>2011-11-13T01:55:42Z</dc:date>
    </item>
    <item>
      <title>Re: Part-time on-demand Indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106254#M22373</link>
      <description>&lt;P&gt;I UPDATED my original answer, since you may be on to something here.&lt;/P&gt;</description>
      <pubDate>Sun, 13 Nov 2011 11:39:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106254#M22373</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2011-11-13T11:39:20Z</dc:date>
    </item>
    <item>
      <title>Re: Part-time on-demand Indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106255#M22374</link>
      <description>&lt;P&gt;The &lt;CODE&gt;ignoreOlderThan&lt;/CODE&gt; and &lt;CODE&gt;followTail&lt;/CODE&gt; options are definitely interesting and might work.  But it sounds like the most straightforward approach is to have the originating system rotate logs at 20:00 and 06:00.  Or, even hourly, if the system is producing enough logs to justify it.  (And hourly might be easier to configure in something like log4j).  And then use &lt;CODE&gt;blacklist&lt;/CODE&gt; and &lt;CODE&gt;whitelist&lt;/CODE&gt;, both of which are well-known and surprise free.  The other options are complicated enough to worry me about long-term reliability.&lt;/P&gt;

&lt;P&gt;As Kristian mentioned, if you could &lt;CODE&gt;nullQueue&lt;/CODE&gt; this data, that would be the ideal approach and wouldn't require application changes.  From the docs on &lt;CODE&gt;transforms.conf&lt;/CODE&gt;, &lt;CODE&gt;_time&lt;/CODE&gt; is a valid field to use as a &lt;CODE&gt;SOURCE_KEY&lt;/CODE&gt;.  So, &lt;STRONG&gt;in theory&lt;/STRONG&gt;, you could precompute a series of regular expressions expressing periods of 06:01 - 19:59 in &lt;CODE&gt;time_t&lt;/CODE&gt; format for future dates. Such regexes would probably be nontrivial and would need to be maintained for the life of the system to add in new &lt;CODE&gt;time_t&lt;/CODE&gt; values.  I wouldn't suggest trying this at home.&lt;/P&gt;
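
&lt;P&gt;To give a feel for why, here is a rough Python sketch that computes the &lt;CODE&gt;time_t&lt;/CODE&gt; bounds of the 06:01 - 19:59 window for a few consecutive days. It assumes UTC and ignores time zone offsets and daylight saving, and daytime_epoch_window is a made-up helper name. Each day yields a different pair of ten-digit bounds, so each day would need its own set of range-matching regexes.&lt;/P&gt;

```python
import calendar
from datetime import date, timedelta


def daytime_epoch_window(day):
    """Return (start, end) time_t values for 06:01:00-19:59:59 UTC on `day`."""
    midnight = calendar.timegm(day.timetuple())
    start = midnight + 6 * 3600 + 60
    end = midnight + 19 * 3600 + 59 * 60 + 59
    return start, end


if __name__ == "__main__":
    first = date(2011, 11, 14)
    for offset in range(3):
        day = first + timedelta(days=offset)
        print(day, daytime_epoch_window(day))
```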

&lt;P&gt;If you can't get the providers of the logfiles to do rotation to help you, then I would suggest filing an ER to ask for something like a &lt;CODE&gt;_time_of_day&lt;/CODE&gt; key (In the format of HH:MM:SS.ssssss or similar) that would be usable in &lt;CODE&gt;transforms.conf&lt;/CODE&gt; for the purpose of sending data to the &lt;CODE&gt;nullQueue&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Sun, 13 Nov 2011 17:00:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106255#M22374</guid>
      <dc:creator>dwaddle</dc:creator>
      <dc:date>2011-11-13T17:00:47Z</dc:date>
    </item>
    <item>
      <title>Re: Part-time on-demand Indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106256#M22375</link>
      <description>&lt;P&gt;Exactly my point regarding the regex for _time - I haven't had time/reason to figure out if date_hour (which is derived from _time) is computed at the parsing stage, or rather if it's computed before the nullQueue routing would take place. Dealing directly with epoch time is more likely than not going to give headaches in the long run.&lt;/P&gt;

&lt;P&gt;/k&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 10:05:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106256#M22375</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2020-09-28T10:05:55Z</dc:date>
    </item>
    <item>
      <title>Re: Part-time on-demand Indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106257#M22376</link>
      <description>&lt;P&gt;According to the docs for &lt;CODE&gt;transforms.conf&lt;/CODE&gt;, &lt;CODE&gt;date_hour&lt;/CODE&gt; is not a supported field for &lt;CODE&gt;SOURCE_KEY&lt;/CODE&gt;.  So, I'm quite confident it is computed too late.  Agreed that dealing with epoch time would be insanely difficult.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Nov 2011 02:54:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Part-time-on-demand-Indexing/m-p/106257#M22376</guid>
      <dc:creator>dwaddle</dc:creator>
      <dc:date>2011-11-14T02:54:50Z</dc:date>
    </item>
  </channel>
</rss>

