Question: is there a CRC equivalent for data indexed from a PowerShell function?
On a server, I have a log file that is generated every day. There are many events in this daily log file, but I only need to index one specific event.
My approach was to create a PowerShell script that matches a specified string and returns the lines (the content, not the line numbers) that contain these events. Running the script from the PowerShell command line appears to work as expected.
I then created the following stanza in inputs.conf on the UF of the server:
[powershell://apache_search_test]
script = . "C:<subdirectories>\scanApacheLog_Splunk.ps1"
sourcetype = apache_test
schedule = 60
disabled = 0
My test condition was to match the string "30/Jun/2017:09:59:17 -0400" - I see that the four events with this timestamp are indexed:
xxx.xxx.xxx.xxx - - [30/Jun/2017:09:59:17 -0400] "HEAD /enterprise HTTP/1.0" 302 -
xxx.xxx.xxx.xxx - - [30/Jun/2017:09:59:17 -0400] "POST /u/xmlrpc HTTP/1.1" 200 120
xxx.xxx.xxx.xxx - - [30/Jun/2017:09:59:17 -0400] "POST /u/xmlrpc HTTP/1.1" 200 120
xxx.xxx.xxx.xxx - - [30/Jun/2017:09:59:17 -0400] "POST /u/xmlrpc HTTP/1.1" 200 120
The problem is that these events are indexed again each time the stanza's schedule runs. I only want the four events indexed once (or fewer if they are not unique).
Is there something missing in the stanza? Do I need to add a stanza to props.conf (in case the timestamp is not being recognized)?
Any help is appreciated!
If I understand correctly, here are a few options for diagnosing this:
1. Have your PowerShell script write its output to a file and let Splunk read that file. This way you can tell whether the script itself is producing the duplicate events.
2. Write a unique ID into your messages (e.g. a session ID or process ID), so you can be sure whether two events are the same message.
3. Output timestamps with more precision, including milliseconds and the time zone.
4. Handle it at the Splunk level with props settings for duplicate events (not preferred).
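
As far as I know, the CRC check applies to file monitor inputs, while a PowerShell input simply indexes whatever the script prints on each scheduled run. One way to get similar behaviour is to have the script remember how far it has already read. The following is only a minimal sketch of that idea, not a tested solution for this environment: the function name Get-NewMatchingLines, its parameters, and the checkpoint file are all hypothetical, and it assumes the log is only ever appended to.

```powershell
# Hypothetical helper: emit only matching lines added since the previous run.
# Assumes the log file is append-only between runs.
function Get-NewMatchingLines {
    param(
        [string]$LogPath,         # log file to scan
        [string]$CheckpointPath,  # small file holding the count of lines already processed
        [string]$Pattern          # literal string to look for (not a regex)
    )

    # Number of lines handled by earlier runs (0 on the first run).
    $seen = 0
    if (Test-Path -LiteralPath $CheckpointPath) {
        $seen = [int](Get-Content -LiteralPath $CheckpointPath -TotalCount 1)
    }

    $lines = @(Get-Content -LiteralPath $LogPath)

    # If the file is shorter than the checkpoint, assume it was replaced and start over.
    if ($lines.Count -lt $seen) { $seen = 0 }

    # Write only the unseen lines that contain the pattern to the output stream.
    $lines | Select-Object -Skip $seen |
        Where-Object { $_.Contains($Pattern) }

    # Remember the new position for the next scheduled run.
    Set-Content -LiteralPath $CheckpointPath -Value $lines.Count
}
```

The script referenced in the stanza would then end with a call to this function, passing the real log path, a checkpoint path the forwarder can write to, and the search string. Because the log here is a new file each day, the checkpoint would probably need to be kept per file name; the line-count comparison above only catches a replaced file when the new one is shorter.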