We have an index, say 'index1', with log retention of up to 7 days. Because the log volume is huge, we don't want to retain all logs there for more than 7 days.
However, there is also a requirement to retain some logs for later use, say some error logs that we want to inspect later. The solution we thought of is to use 'collect' to copy them into a separate index, say 'index2', which has a longer retention, say 6 months.
So I planned to use the search below for this:
index=index1 level=ERROR | collect index=index2 output_format=hec
*Using output_format=hec because we want to keep the exact same source and sourcetype, so that the field extractions work exactly as they do for the original index.
However there are some questions I have with this.
1. Does this method use license?
The doc has the statements below, which are kind of confusing:
a) Allows the source, sourcetype, and host from the original data to be used directly in the summary index.
b) No license is counted for the internal stash source type. License is counted when the original source type is used instead of stash in output_format=hec.
https://docs.splunk.com/Documentation/Splunk/9.0.2/SearchReference/Collect
2. The document also mentions
"This command is considered risky because, if used incorrectly, it can pose a security risk or potentially lose data when it runs"
But it's not clear how it is risky, or what to check to avoid problems.
3. It looks like the 'collect' command can be used by any user. I tried removing the 'run_collect' capability, and even that doesn't prevent a role from using collect.
How can I allow only certain roles to use the 'collect' command?
4. The collect command is basically writing to an index. Is there a way to restrict a role from writing data to an index using 'collect' or any other command?
BTW, if you have a fairly limited set of transforms used in your sourcetype(s), you can "cheat" a bit and store the data as the "stash" sourcetype but be able to parse it later.
For example - I did a simple search on my lab server
index=firewall earliest=-1m | collect index=test1
That gave me a minute worth of events from the firewall logs.
Now if I just do
index=test1 sourcetype=stash
it will give me the stashed events and will not parse anything interesting out of them obviously.
But since I know that these are junos firewall logs, I can do
index=test1 sourcetype=stash
| extract auto_kv_for_rt_flow_session_create
| extract auto_kv_for_rt_flow_session_close
| extract auto_kv_for_rt_flow_session_deny
To get the events parsed.
As one can easily notice - that's in no way a convenient option for handling the events. You can't use inline extractions this way - only those that have their own transforms stanza. And you can't filter the events at search time until you parse _all_ of them. So it's more a proof of concept that it can be done at all than a solution to use in prod.
Why not just filter those logs into two separate indexes?
Retention period is one of the main reasons for using multiple indexes (along with permissions and cardinality).
This way you could expire your "main" index within - for example - 30 days and keep your "important" index for 10 years. Why not?
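As a rough illustration of per-index retention, here is a minimal indexes.conf sketch using the 7-day and roughly 6-month figures from the original question. The index names and paths are placeholders, and 180 days is an assumed stand-in for "6 months". Note that a bucket is only frozen once its newest event is older than the limit, so actual retention can run somewhat longer than the configured value.

```
# indexes.conf (sketch; index names and paths are illustrative)

# Short-lived, high-volume index: 7 days = 7 * 86400 seconds
[index1]
homePath   = $SPLUNK_DB/index1/db
coldPath   = $SPLUNK_DB/index1/colddb
thawedPath = $SPLUNK_DB/index1/thaweddb
frozenTimePeriodInSecs = 604800

# Long-lived index for selected events: 180 days = 180 * 86400 seconds
[index2]
homePath   = $SPLUNK_DB/index2/db
coldPath   = $SPLUNK_DB/index2/colddb
thawedPath = $SPLUNK_DB/index2/thaweddb
frozenTimePeriodInSecs = 15552000
```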
We do not want to continuously retain all 'ERROR' logs, or logs matching any such global filter. What we retain can be ad hoc, depending on the issue we are troubleshooting or on some other reason.
Licence usage is counted unless the sourcetype is stash, so with output_format=hec you will incur licence usage.
Not sure what the risks are, other than that it can increase your storage requirements and licence costs.
It does seem that a user without run_collect can still run collect - that seems wrong.
Hi
As @bowesmana said, it will use license, because you want those events to look like the originals.
Would it be possible to add props.conf and transforms.conf configuration that changes the index to index2 for those particular events at index time? That way there is no need to run collect later on and double your license usage. You would then just add the second index name to your queries, or even use an eventtype that covers both indexes.
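A minimal sketch of that index-time routing could look like the following. The sourcetype name `my_app_logs` and the `level=ERROR` pattern are assumptions for illustration, so adjust both to the real data. This has to live on the tier that parses the data (indexers or heavy forwarders), and it moves matching events to index2 rather than copying them, so they no longer appear in index1.

```
# props.conf (sketch; "my_app_logs" is a hypothetical sourcetype name)
[my_app_logs]
TRANSFORMS-route_to_index2 = route_errors_to_index2

# transforms.conf
[route_errors_to_index2]
# Assumes the raw event text literally contains "level=ERROR"
REGEX = level=ERROR
# Rewrite the destination index for matching events
DEST_KEY = _MetaData:Index
FORMAT = index2
```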
r. Ismo
In our case we do not want to actually move the events to a new index, because the main index is still used for some reports and stats that need all the events to be there. The expected use case is to retain a handful of logs without using a huge chunk of the license. We also do not expect this to happen too frequently, so I think even if it uses license, it won't be much, and we usually have plenty to spare every day.
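To check how much license the collected events actually consume, a search along these lines against the license usage log may help. It is a sketch that assumes the license manager's _internal data is searchable from where you run it, and that the target index is named index2 as above.

```
index=_internal source=*license_usage.log* type=Usage idx=index2
| stats sum(b) AS bytes BY st
| eval MB = round(bytes / 1024 / 1024, 2)
| sort - MB
```

Here `b` is the byte count, `idx` the index, and `st` the sourcetype recorded in each usage event.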
One of the ways I came up with safely using this is to have a search macro like below.
macro: retain_logs
head 1000 | collect index=index2 output_format=hec
So a user can run a search like the one below, which avoids mistakes in the syntax or the index name, and also limits how many events are retained.
<search> | `retain_logs`
The good thing about macros is that we can control permissions per role. But the 'collect' command can still be run directly by anyone, and I'm still not sure how to prevent that.
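For reference, the macro and its permissions could be defined on disk roughly as follows. This is a sketch: the role name `log_retainer` is hypothetical, and the same settings can be made in the UI under Settings > Advanced search > Search macros.

```
# macros.conf (in the app that owns the macro)
[retain_logs]
definition = head 1000 | collect index=index2 output_format=hec
iseval = 0

# metadata/local.meta (same app)
# "log_retainer" is a hypothetical role allowed to use the macro
[macros/retain_logs]
access = read : [ log_retainer ], write : [ admin ]
```

This only restricts who can use the macro. It does not stop anyone from typing the collect command directly.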
There is no way to "keep just a subset of our index". The data is rolled out as whole buckets so you need to copy the data out if you want to retain it for longer.
Using the Splunk 'collect' command as mentioned above works for us, except that it incurs additional license usage. As long as the license used stays under a safe limit, we are fine with this. At this point I'm just worried that 'collect' can be run by anyone to write to any index, and about the warning below in the 'collect' documentation.
"This command is considered risky because, if used incorrectly, it can pose a security risk or potentially lose data when it runs"
https://docs.splunk.com/Documentation/Splunk/9.0.2/SearchReference/Collect
If you are absolutely sure that those users don't have the run_collect capability (it could be inherited from another role, which may not be obvious in the GUI), then you should open a support case with Splunk as a bug report.
I can confirm - I just created a user with an extremely limited role - just a "search" capability. And this user can still run the collect command.
I was using the basic 'user' role and removed the run_collect capability from it, but I could still run the collect command as a user with this role. I saw this mentioned in some other forum discussions, and it looks like it's still the case. I have opened a case with support on this and will update here if there are any useful updates.
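Until there is a way to block it, one option is to at least watch who runs collect by searching the audit index. This is a sketch that assumes the default audit logging is in place and that you have read access to _audit. The wildcard match is loose, so it will also return searches that merely contain the word "collect".

```
index=_audit action=search info=granted search="*collect*"
| table _time user search
| sort - _time
```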