I want to know if the things below are possible in Splunk and, if yes, how they can be achieved:
1. Below are sample events:
2019-07-16|21:15:43.370|INFO|This is a statement
2019-07-16|21:16:43.370|INFO|Random statement
2019-07-16|21:17:43.370|INFO|Random statement
2019-07-16|21:18:43.370|INFO|This is a statement
2019-07-16|21:19:43.370|INFO|This is a statement
I have a heavy forwarder where I want to index only the first occurrence of the "This is a statement" line, and I do not want the other lines containing that string to be indexed, since the same line appears multiple times in the log file.
2. Below are more sample events:
2019-07-16|21:15:43.370|INFO|Temprature-30
2019-07-16|21:16:43.370|INFO|Temprature-30
2019-07-16|21:17:43.370|INFO|Temprature-30
2019-07-16|21:18:43.370|INFO|Temprature-32
2019-07-16|21:19:43.370|INFO|Temprature-32
Here I want only the two lines with distinct temperature values to be indexed.
Are these two scenarios possible in Splunk? I want this done before indexing, to reduce indexing volume.
Currently I am using nullQueue and indexQueue to parse the required data, but now I want to index only the first occurrence.
Appreciate your help.
This kind of logic is not possible on the HF alone, as the indexing pipeline doesn't keep a history of the indexed events. You can see here in more detail how that layer works:
My advice in your case is to create a scripted input and configure it in inputs.conf to run at an interval of 5-10 minutes (more or less depending on your needs). Within this script you can apply the required logic, and the output, i.e. the de-duplicated events, is the only thing that will get indexed.
Details on when to use scripted inputs can be found here: https://docs.splunk.com/Documentation/Splunk/7.3.0/AdvancedDev/ScriptedInputsIntro#Use_cases_for_scr...
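As a minimal sketch of that idea (assuming Python is available on the HF; the file path is hypothetical and the pipe-delimited format is taken from the sample events above), a scripted input could read the file and print only the first occurrence of each message to stdout, which is what Splunk indexes:

```python
# Hypothetical scripted-input sketch: emit only the first occurrence of each
# message. The message is taken as everything after the last "|", so the
# timestamp does not affect the comparison.
LOG_FILE = "C:/logs/app.log"  # hypothetical path, adjust to your environment

def dedupe(lines):
    seen = set()
    for line in lines:
        msg = line.rstrip("\n").rsplit("|", 1)[-1]
        if msg not in seen:
            seen.add(msg)
            yield line.rstrip("\n")

if __name__ == "__main__":
    with open(LOG_FILE, encoding="utf-8") as f:
        for event in dedupe(f):
            print(event)  # stdout of a scripted input is what gets indexed
```

Note that this keeps no state between runs; if the same message must also be suppressed across script invocations, the `seen` set would have to be persisted somewhere (e.g. a checkpoint file).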
If you're not comfortable with scripted inputs, you can simply cron a script to apply the cleansing on your files and rewrite them into new files without duplicates. Then you would index those new files instead of the original ones.
If I write a script with the de-duplication logic and run it, it will need to store the parsed files in another folder, which I would then monitor with a monitor stanza. That requires extra disk space, since all my files are zip files.
Is it possible to do this with the zip files directly, without storing any parsed log files in a separate folder, and send them straight for indexing? If yes, can you please help me with a sample script?
You're welcome @ips_mandar.
You don't have to unzip your files, then read, then delete. You can simply read them using zcat from the script:
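(Note that zcat reads gzip files; for .zip archives, a cross-platform alternative is Python's stdlib zipfile module, which streams the members without ever extracting them to disk. A minimal sketch, with a hypothetical path:)

```python
# Hypothetical sketch: read events straight out of a .zip archive without
# extracting it to disk first.
import io
import zipfile

def read_zip_lines(zip_path):
    """Yield text lines from every member of the archive, one at a time."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                for line in io.TextIOWrapper(member, encoding="utf-8"):
                    yield line.rstrip("\n")
```

These lines can then be fed through whatever de-duplication logic you choose before being written to stdout or a file.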
Let me know if that's what you're looking for 🙂
Sorry, I didn't mention that I am on a Windows server.
I am very new to scripting; it would be great if you could share a script I can run to remove duplicates from the zip files, bearing in mind that the duplicate lines will have different timestamps.
The logic should be as follows:
1- find the unique events
2- write them into new files
For Linux you can do that very easily using:
sort -u your_file > new_file
You could try finding the equivalent for Windows; it surely exists.
Also, you might need to handle the timestamp, because it makes every line different, so you'll need to exclude it from the "unique" comparison.
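A minimal sketch of that "exclude the timestamp" idea in Python (which also runs on Windows), assuming the pipe-delimited format from the sample events, where the first two fields are the date and time:

```python
# Keep the first line per unique "level|message" remainder, comparing lines
# by everything after the date and time fields so the timestamp is ignored.
def unique_ignoring_timestamp(lines):
    seen = set()
    out = []
    for line in lines:
        # sample format: date|time|level|message -> drop the first two fields
        key = line.split("|", 2)[-1]
        if key not in seen:
            seen.add(key)
            out.append(line)
    return out
```

For example, applied to the Temprature sample events, only the first 30-degree line and the first 32-degree line would survive.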