Getting Data In

PowerShell script to extract events from a log file - many duplicate events indexed in Splunk

a_splunk_user
Path Finder

Question - is there a CRC equivalent for data indexed from a PowerShell function?

On a server, I have a log file that is generated every day. This daily log file contains many events, but I only need to index one specific event.

My approach was to create a PowerShell script that matches a specified string and returns the lines (the content, not the line numbers) containing it. Running the script from the PowerShell CLI appears to work as expected.

I then created the following stanza in inputs.conf on the server's Universal Forwarder (UF):
[powershell://apache_search_test]
script = . "C:<subdirectories>\scanApacheLog_Splunk.ps1"
sourcetype = apache_test
schedule = 60
disabled = 0

My test condition was to match the string "30/Jun/2017:09:59:17 -0400" - I see that the four events with this timestamp are indexed:
xxx.xxx.xxx.xxx - - [30/Jun/2017:09:59:17 -0400] "HEAD /enterprise HTTP/1.0" 302 -
xxx.xxx.xxx.xxx - - [30/Jun/2017:09:59:17 -0400] "POST /u/xmlrpc HTTP/1.1" 200 120
xxx.xxx.xxx.xxx - - [30/Jun/2017:09:59:17 -0400] "POST /u/xmlrpc HTTP/1.1" 200 120
xxx.xxx.xxx.xxx - - [30/Jun/2017:09:59:17 -0400] "POST /u/xmlrpc HTTP/1.1" 200 120

The problem is that these events are indexed again each time the stanza's schedule runs. I only want the four events indexed once (or fewer if they are not unique).

Is there something missing in the stanza? Do I need to add a stanza to props.conf (in case the timestamp is not being recognized)?

Any help is appreciated!


koshyk
Super Champion

If I understand correctly, here are a few options for diagnosing this:
1. Have your PowerShell script write its output to a file and let Splunk monitor that file. This way you can tell whether the script itself is creating the duplicate events.
2. Write a unique ID into each message (e.g. a session ID or process ID) so you can be sure whether two events are the same message.
3. Output timestamps with more precision, including milliseconds and time zone.
4. Handle it at the Splunk level by adding props settings for duplicate events (not preferred).
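
For option 1, a minimal sketch of what the monitor stanza might look like is below. The output file name apache_filtered.log is hypothetical, and the path keeps the same placeholder used in the question. It assumes the script writes its matches to that file and either appends only new matches or rewrites the file with identical content. File monitor inputs track what has already been read using a CRC of the start of the file plus a seek pointer, which is roughly the "CRC equivalent" the question asks about. As far as I know, a powershell:// input has no such tracking and indexes whatever the script emits on every run.

```
# inputs.conf on the UF (sketch only; apache_filtered.log is a hypothetical file name)
# Replaces the [powershell://apache_search_test] input for indexing purposes;
# the script would be run separately (e.g. by a scheduled task) and write to this file.
[monitor://C:<subdirectories>\apache_filtered.log]
sourcetype = apache_test
disabled = 0
```

If duplicates still show up with this setup, that points at the script writing the same lines more than once rather than at Splunk.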
