<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Performing procedural search on daily indexed data, appending results as they appear. in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483012#M82741</link>
    <description>&lt;P&gt;Cool, thanks for the response! I have a follow up question:&lt;/P&gt;

&lt;P&gt;Can I leverage &lt;CODE&gt;monitor&lt;/CODE&gt; to make sure that new incoming logs are only searched once? At the moment, the search restarts and reads every indexed log. This becomes a problem when the amount of logs for the month grows to a very large size; the search begins to take many hours to complete.&lt;/P&gt;</description>
    <pubDate>Thu, 27 Feb 2020 00:28:20 GMT</pubDate>
    <dc:creator>jacksonmcarthur</dc:creator>
    <dc:date>2020-02-27T00:28:20Z</dc:date>
    <item>
      <title>Performing procedural search on daily indexed data, appending results as they appear.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483010#M82739</link>
      <description>&lt;H2&gt;Just looking for the best practice solution to the below problem. I'm pretty new to Splunk, so I feel the answer might be quite simple.&lt;/H2&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;The problem:&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt;
Currently, a million logs come into a location daily. At the end of every month, these logs are indexed, and a report based on search results is created. Since thirty million logs are all being processed in a block, it takes a lot of time to index them - and an even longer time to search.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;The fix:&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt;
A single search runs over the course of the month, indexing new logs as they arrive, searching them, and appending all results in one large XML, or CSV, or similar.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;The implementations:&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt;
• Set an alert that triggers on detecting new files to be indexed, i.e. if they are not already indexed, index them and immediately run the search on these new files, then append the resulting search data to a file.&lt;/P&gt;

&lt;P&gt;• Run &lt;CODE&gt;tscollect&lt;/CODE&gt; daily on data that is not already indexed in a &lt;CODE&gt;.tsidx&lt;/CODE&gt; file to collect a relevant subset of data from raw, then process it in a block to create a report at end-of-month using the quicker &lt;CODE&gt;tstats&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;• Simply set a scheduled search (searching last 24h) to run daily after the logs are indexed, appending results to file.&lt;/P&gt;

&lt;P&gt;&lt;EM&gt;Thanks for the help!&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Feb 2020 02:32:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483010#M82739</guid>
      <dc:creator>jacksonmcarthur</dc:creator>
      <dc:date>2020-02-26T02:32:15Z</dc:date>
    </item>
    <item>
      <title>Re: Performing procedural search on daily indexed data, appending results as they appear.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483011#M82740</link>
      <description>&lt;P&gt;Eh!? Why would you want all this complicated trigger-detection logic?&lt;/P&gt;

&lt;P&gt;Splunk's file inputs in inputs.conf ("monitor" and "batch") can do this for you automatically, i.e. they watch for files as they arrive, index them right away, and check whether a file has already been indexed.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/8.0.2/Data/Monitorfilesanddirectorieswithinputs.conf"&gt;https://docs.splunk.com/Documentation/Splunk/8.0.2/Data/Monitorfilesanddirectorieswithinputs.conf&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;In your case, I would just use a &lt;CODE&gt;monitor&lt;/CODE&gt; input and it will be all good:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///your/location/with/read/permission/*.log]
disabled = 0
sourcetype = mycustom_logs
index = my_index
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Also, please make sure you set up props &amp;amp; transforms for index-time and search-time extraction for your sourcetype.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Feb 2020 09:03:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483011#M82740</guid>
      <dc:creator>koshyk</dc:creator>
      <dc:date>2020-02-26T09:03:29Z</dc:date>
    </item>
    <item>
      <title>Re: Performing procedural search on daily indexed data, appending results as they appear.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483012#M82741</link>
      <description>&lt;P&gt;Cool, thanks for the response! I have a follow up question:&lt;/P&gt;

&lt;P&gt;Can I leverage &lt;CODE&gt;monitor&lt;/CODE&gt; to make sure that new incoming logs are only searched once? At the moment, the search restarts and reads every indexed log. This becomes a problem when the amount of logs for the month grows to a very large size; the search begins to take many hours to complete.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Feb 2020 00:28:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483012#M82741</guid>
      <dc:creator>jacksonmcarthur</dc:creator>
      <dc:date>2020-02-27T00:28:20Z</dc:date>
    </item>
    <item>
      <title>Re: Performing procedural search on daily indexed data, appending results as they appear.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483013#M82742</link>
      <description>&lt;P&gt;That is correct. Splunk's &lt;CODE&gt;monitor&lt;/CODE&gt; is quite powerful and performs a lot of sanity checks. Please read the inputs.conf details and look for initCrcLength and the redundancy (CRC) checks that Splunk does, along with their defaults. (You can of course change them, but that is normally not required.)&lt;BR /&gt;
I'm sure that in your case Splunk will index from the point where it last saw data, so it shouldn't be an issue at all.&lt;/P&gt;
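
&lt;P&gt;For reference, a minimal sketch of where that setting lives, reusing the example path from my earlier stanza. 256 is the documented default as far as I know, so you would only set it explicitly if your files share a long identical header:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///your/location/with/read/permission/*.log]
# number of bytes at the start of the file used for the CRC that identifies it
initCrcLength = 256
&lt;/CODE&gt;&lt;/PRE&gt;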

&lt;P&gt;If you find the answer is good, please upvote &amp;amp; accept. Cheers&lt;/P&gt;</description>
      <pubDate>Thu, 27 Feb 2020 20:44:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483013#M82742</guid>
      <dc:creator>koshyk</dc:creator>
      <dc:date>2020-02-27T20:44:46Z</dc:date>
    </item>
    <item>
      <title>Re: Performing procedural search on daily indexed data, appending results as they appear.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483014#M82743</link>
      <description>&lt;P&gt;Sorry Koshyk, I should clarify what I meant to ask. I'm all okay with indexing the raw data - what I'm asking about is whether or not it is possible to &lt;EM&gt;search&lt;/EM&gt; the indexed logs only &lt;EM&gt;once&lt;/EM&gt;.&lt;/P&gt;

&lt;P&gt;I want to perform a search, and output the resulting data to a .csv file. Here is some example code for that:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;index="myIndex" mySearchTerms | outputcsv myCsvFile.csv append=true&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;But if I perform this search once a day, logs in the index will get searched &lt;EM&gt;more than one time&lt;/EM&gt;, which leads to longer processing times (as the index gets larger), and redundant data making its way into the .csv output.&lt;/P&gt;

&lt;P&gt;Is it possible to perform a search only on data that has not been searched yet?&lt;/P&gt;</description>
      <pubDate>Thu, 27 Feb 2020 22:00:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483014#M82743</guid>
      <dc:creator>jacksonmcarthur</dc:creator>
      <dc:date>2020-02-27T22:00:00Z</dc:date>
    </item>
    <item>
      <title>Re: Performing procedural search on daily indexed data, appending results as they appear.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483015#M82744</link>
      <description>&lt;P&gt;For searching, it's up to your logic. What we normally do is "scheduled searches", i.e. run saved searches on a schedule/cron.&lt;/P&gt;

&lt;P&gt;E.g.&lt;BR /&gt;
- Run a search every 30 minutes over a fixed window (earliest=-1h and latest=-30m)&lt;BR /&gt;
- Run it continuously and each run will pick up only what earlier runs haven't searched&lt;BR /&gt;
- You can &lt;STRONG&gt;alert&lt;/STRONG&gt; on anything particular from this, write to a &lt;STRONG&gt;summary index&lt;/STRONG&gt;, or send it to &lt;STRONG&gt;outputcsv&lt;/STRONG&gt; as you wish&lt;/P&gt;
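
&lt;P&gt;A rough sketch of how that schedule could look in savedsearches.conf. The stanza name is made up, the index, search terms and CSV name are the placeholders from the question, and the &lt;CODE&gt;@m&lt;/CODE&gt; snapping is my assumption to keep consecutive windows from overlapping or leaving gaps:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# savedsearches.conf (sketch, adjust names to your environment)
[append_results_every_30m]
enableSched = 1
# run every 30 minutes
cron_schedule = */30 * * * *
# each run covers a 30-minute window, lagging 30 minutes behind "now"
dispatch.earliest_time = -1h@m
dispatch.latest_time = -30m@m
search = index="myIndex" mySearchTerms | outputcsv append=true myCsvFile.csv
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The 30-minute lag gives late-arriving events time to be indexed before their window is searched.&lt;/P&gt;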

&lt;P&gt;Also, Splunk searches very fast. For example, if your indexer is capable (or clustered), Splunk can search a billion events within about 30 seconds (a rough estimate).&lt;/P&gt;</description>
      <pubDate>Thu, 27 Feb 2020 22:22:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483015#M82744</guid>
      <dc:creator>koshyk</dc:creator>
      <dc:date>2020-02-27T22:22:01Z</dc:date>
    </item>
    <item>
      <title>Re: Performing procedural search on daily indexed data, appending results as they appear.</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483016#M82745</link>
      <description>&lt;P&gt;Awesome, didn't know that about scheduled searches! Thank you for all the help &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Feb 2020 22:46:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Performing-procedural-search-on-daily-indexed-data-appending/m-p/483016#M82745</guid>
      <dc:creator>jacksonmcarthur</dc:creator>
      <dc:date>2020-02-27T22:46:28Z</dc:date>
    </item>
  </channel>
</rss>

