I know this question has been asked before (http://answers.splunk.com/questions/712/put-data-in-separate-index-based-on-timestamp), but we will start our end-of-year tests soon. Some of our test servers will simulate what will happen on Dec 31st at midnight. We would like to have the data from those test servers in a different index somehow.
I'd like to know if anyone has done anything similar before. We're thinking about setting up a temporary indexer and then reconfiguring syslog and our Splunk forwarders to make sure that our main data does not get polluted.
Unless there is some piece of this setup that I am unaware of, it's pretty simple to do what you are asking because the index is set when you add the Data Input monitor in Splunk Manager.
The default index is set as 'main', but you can override that and specify a new test index that you create yourself in the Manager>>Indexes page.
Therefore, on your test servers, add the data inputs and be sure to specify the test index you create and all of that data will go into the index you specify.
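If you prefer config files over Manager, the same thing can be sketched in inputs.conf on each test server's forwarder. This is only an illustration: the monitored path and the index name test_ye are assumptions, and the index has to exist on the indexer before data arrives.

```
# inputs.conf on a test server's forwarder (path and index name are placeholders)
[monitor:///var/log/messages]
sourcetype = syslog
# Send everything from this input to the test index instead of 'main'
index = test_ye
```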
I hope this makes sense.
I didn't describe our setup properly. Changing the Splunk LWFs' configuration will be easy. We have a central syslog server (we had that before we had Splunk) that collects all the syslog data in a directory (with a lot of subdirectories for facilities and severities per server), and we just index that directory in Splunk recursively. So the syslog data from the tests will get mixed with our real data if we don't do anything. I was hoping for an easy switch that would separate everything that is in the future into a different place, so we won't end up with a mess in our main data.
Oh I see now. So you may have another option, in this case. But need to confirm: Will you have specific test hosts that will ONLY send syslog containing future timestamps? OR will your test hosts send both real and future timestamped syslog events?
We will change the system time on our test servers, so all the events will be in the future. The suggestion in your comment to Lowell's answer will help us. Thank you very much.
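For anyone in the same situation with a central syslog directory: when the test hosts only ever send future-dated events, one possible approach (not necessarily the one suggested in the comment mentioned above) is to route by source path rather than by timestamp. The sketch below assumes the per-server subdirectories live under /var/log/remote/ and that the test servers' directory names start with "testhost"; both are made-up placeholders, as are the transform and index names. These files belong on the instance that parses the data (the indexer in a lightweight-forwarder setup).

```
# props.conf
# "*" matches within one path segment, "..." matches any depth below it
[source::/var/log/remote/testhost*/...]
TRANSFORMS-route_test_hosts = route_test_hosts

# transforms.conf
[route_test_hosts]
# Match every event from the sources above and rewrite its index
REGEX = .
DEST_KEY = _MetaData:Index
FORMAT = test_ye
```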
Also keep in mind that in Splunk 4.0 and newer it is possible to have multiple "hot" buckets per index which helps in this kind of situation where you have data being loaded from different points in time (although more often this is used for historical data, there is no reason why future data would be handled differently.) I think the default bucket span is 90 days, so as of right now, loading any data for Dec 31, 2010 should cause a new bucket to be created (as the date approaches, this will no longer be true... it all depends on the rotation of your buckets.)
With that said, using a separate index would be best. And if you have any concern about missing inputs or not being able to separate everything out, then perhaps setting up a temporary "test" Splunk instance may be worth the effort. (If you've ever dealt with the results of messed-up timestamps before, you know how painful it can be to fix this after the fact.) Some of this will depend on whether or not you want to keep this test data around after you're done testing. You have to decide what you're comfortable with.
Config settings to consider:
Make sure you review the following settings in props.conf. You may need to customize these in order for Splunk to accept your future dates:
MAX_DAYS_HENCE = <integer>
MAX_DIFF_SECS_HENCE = <integer>
Also see the following setting in indexes.conf:
quarantineFutureSecs = <non-negative number>
I would suggest that you read the docs related to these settings and understand what is going on before trying this.
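As a rough illustration of where these settings go, here is a sketch. The numbers are assumptions picked for a test that runs a few months before Dec 31, not recommended values, and test_ye is a placeholder index name. Check the defaults for your version in the docs before changing anything.

```
# props.conf (per sourcetype; [syslog] is used here as an example)
[syslog]
# Accept extracted timestamps up to ~400 days in the future (illustrative value)
MAX_DAYS_HENCE = 400
# Allow large forward jumps between consecutive timestamps (illustrative value)
MAX_DIFF_SECS_HENCE = 34560000

# indexes.conf (per index; test_ye is a placeholder, add this to its stanza)
[test_ye]
# Events further in the future than this many seconds go to the quarantine bucket
quarantineFutureSecs = 34560000
```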
Another option is to use a transformer to set the _MetaData:Index property. I would only suggest this if you have very similar timestamps across all of your events; otherwise writing a proper regular expression will be very difficult.
This example assumes that only events for Dec 31, 2010 and Jan 1, 2011 will occur for this test. In other words, if you forget to correct your clock and the system rolls over to Jan 2, 2011, then your events will end up in your current index. Here is an example set of config files. (I would recommend you put them in an app that you disable as soon as your testing period is done. You obviously don't want your real events on Dec 31 and Jan 1 to end up in your testing index.)
props.conf:
[syslog]
TRANSFORMS-year_end_testing = route_index_YE_testing

[sourcetype-n]
TRANSFORMS-year_end_testing = route_index_YE_testing
...

transforms.conf:
[route_index_YE_testing]
REGEX = ^(Dec\s+31|Jan\s+1)\s
FORMAT = test_ye
DEST_KEY = _MetaData:Index
In this example, "test_ye" is the name of your testing index, which you must create. Also, "sourcetype-n" is a placeholder. You must explicitly list all sourcetypes that will be involved, and each sourcetype must use this transformer (or a similar transformer, if you create a different one for your timestamp formats), or only part of your data will be routed to the correct location.
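If you would rather define the index in a config file than through Manager, a minimal indexes.conf stanza could look like the sketch below. The paths follow the usual $SPLUNK_DB layout and are an assumption; adjust them to wherever you want the test data to live.

```
# indexes.conf on the indexer (index name and paths are placeholders)
[test_ye]
homePath   = $SPLUNK_DB/test_ye/db
coldPath   = $SPLUNK_DB/test_ye/colddb
thawedPath = $SPLUNK_DB/test_ye/thaweddb
```

Keeping this stanza in the same temporary app as the routing config makes it easier to find everything again when the testing period is over.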
If you aren't very familiar with index routing like this, aren't fluent in writing and testing regular expressions, or don't have full control over your sourcetypes, then one of the other options would probably be better. They all have different pros and cons, and this could be rather tricky to get right on the first try.
Cool, this is what I was hoping for. The challenge is that the test is a bit longer than just Dec 31/Jan 1 (I only wanted to illustrate what we are doing), and there will be several time jumps on those test servers. We will discuss whether we want to try this or set up the separate instance (which you don't consider to be the wrong approach). Thank you for your input.