Getting Data In

When testing UF deployment on Windows endpoints, winevents are delayed. What is the best way to optimize inputs on the UF?

packet_hunter
Contributor

I am test deploying UFs to collect Windows event logs from Windows 10 endpoints.

I have installed the UF on Windows and entered the Deployment Server info during install.

I am using the DS to push out two deployment apps to the UF.
The first is a custom app that forwards to all indexers. Could the following be improved?

[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]

server = splunkindexer1.mycorp.com:9997, splunkindexer2.mycorp.com:9997, splunkindexer3.mycorp.com:9997, splunkindexer4.mycorp.com:9997


[tcpout-server://splunkindexer1.mycorp.com:9997]

The other deployment app I want to use is Splunk_TA_windows, which is currently being used to collect DC winevent logs without an issue.

# Copyright (C) 2005-2015 Splunk Inc. All Rights Reserved.
# DO NOT EDIT THIS FILE!
# Please make all changes to files in $SPLUNK_HOME/etc/apps/Splunk_TA_windows/local.
# To make changes, copy the section/stanza you want to change from $SPLUNK_HOME/etc/apps/Splunk_TA_windows/default
# into ../local and edit there.
#

[default]
evt_dc_name =
evt_dns_name =


###### OS Logs ######
[WinEventLog://Application]
disabled = false
start_from = oldest
current_only = 0
checkpointInterval = 5
index = wineventlog
renderXml=false

[WinEventLog://Security]
disabled = false
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
checkpointInterval = 5
blacklist1 = EventCode="4662" Message="Object Type:\s+(?!groupPolicyContainer)"
blacklist2 = EventCode="566" Message="Object Type:\s+(?!groupPolicyContainer)"
index = wineventlog
renderXml=false

[WinEventLog://System]
disabled = false
start_from = oldest
current_only = 0
checkpointInterval = 5
index = wineventlog
renderXml=false


####### OS Logs (Splunk 5.x only) ######
# If you are running Splunk 5.x remove the above OS log stanzas and uncomment these three.
#[WinEventLog:Application]
#disabled = 1
#start_from = oldest
#current_only = 0
#checkpointInterval = 5
#index = wineventlog
#
#[WinEventLog:Security]
#disabled = 1
#start_from = oldest
#current_only = 0
#evt_resolve_ad_obj = 1
#checkpointInterval = 5
#index = wineventlog
#
#[WinEventLog:System]
#disabled = 1
#start_from = oldest
#current_only = 0
#checkpointInterval = 5
#index = wineventlog


###### IIS ######
#[monitor://D:\inetpub\logs\LogFiles]
#sourcetype=iis
#disabled = 1

###### DHCP ######
[monitor://$WINDIR\System32\DHCP]
disabled = 0
whitelist = DhcpSrvLog*
crcSalt = <SOURCE>
sourcetype = DhcpSrvLog
index = windows


###### Windows Update Log ######
[monitor://$WINDIR\WindowsUpdate.log]
disabled = 1
sourcetype = WindowsUpdateLog
index = windows


###### Scripted Input (See also wmi.conf)
[script://.\bin\win_listening_ports.bat]
disabled = 1
## Run once per hour
interval = 3600
sourcetype = Script:ListeningPorts
index = windows

[script://.\bin\win_installed_apps.bat]
disabled = 1
## Run once per day
interval = 86400
sourcetype = Script:InstalledApps
index = windows

###### Host monitoring ######
[WinHostMon://Computer]
interval = 600
disabled = 1
type = Computer
index = windows

[WinHostMon://Process]
interval = 600
disabled = 1
type = Process
index = windows

[WinHostMon://Processor]
interval = 600
disabled = 1
type = Processor
index = windows

[WinHostMon://Application]
interval = 600
disabled = 1
type = Application
index = windows

[WinHostMon://NetworkAdapter]
interval = 600
disabled = 1
type = NetworkAdapter
index = windows

[WinHostMon://Service]
interval = 600
disabled = 1
type = Service
index = windows

[WinHostMon://OperatingSystem]
interval = 600
disabled = 1
type = OperatingSystem
index = windows

[WinHostMon://Disk]
interval = 600
disabled = 1
type = Disk
index = windows

[WinHostMon://Driver]
interval = 600
disabled = 1
type = Driver
index = windows

[WinHostMon://Roles]
interval = 600
disabled = 1
type = Roles
index = windows

###### Print monitoring ######
[WinPrintMon://printer]
type = printer
interval = 600
baseline = 1
disabled = 1
index = windows

[WinPrintMon://driver]
type = driver
interval = 600
baseline = 1
disabled = 1
index = windows

[WinPrintMon://port]
type = port
interval = 600
baseline = 1
disabled = 1
index = windows

###### Network monitoring ######
[WinNetMon://inbound]
direction = inbound
disabled = 1
index = windows

[WinNetMon://outbound]
direction = outbound
disabled = 1
index = windows

###### Splunk 5.0+ Performance Counters ######
## CPU
[perfmon://CPU]
counters = % Processor Time; % User Time; % Privileged Time; Interrupts/sec; % DPC Time; % Interrupt Time; DPCs Queued/sec; DPC Rate; % Idle Time; % C1 Time; % C2 Time; % C3 Time; C1 Transitions/sec; C2 Transitions/sec; C3 Transitions/sec
disabled = 1
instances = *
interval = 10
object = Processor
useEnglishOnly=true
index = perfmon

## Logical Disk
[perfmon://LogicalDisk]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
disabled = 1
instances = *
interval = 10
object = LogicalDisk
useEnglishOnly=true
index = perfmon

## Physical Disk
[perfmon://PhysicalDisk]
counters = Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
disabled = 1
instances = *
interval = 10
object = PhysicalDisk
useEnglishOnly=true
index = perfmon

## Memory
[perfmon://Memory]
counters = Page Faults/sec; Available Bytes; Committed Bytes; Commit Limit; Write Copies/sec; Transition Faults/sec; Cache Faults/sec; Demand Zero Faults/sec; Pages/sec; Pages Input/sec; Page Reads/sec; Pages Output/sec; Pool Paged Bytes; Pool Nonpaged Bytes; Page Writes/sec; Pool Paged Allocs; Pool Nonpaged Allocs; Free System Page Table Entries; Cache Bytes; Cache Bytes Peak; Pool Paged Resident Bytes; System Code Total Bytes; System Code Resident Bytes; System Driver Total Bytes; System Driver Resident Bytes; System Cache Resident Bytes; % Committed Bytes In Use; Available KBytes; Available MBytes; Transition Pages RePurposed/sec; Free & Zero Page List Bytes; Modified Page List Bytes; Standby Cache Reserve Bytes; Standby Cache Normal Priority Bytes; Standby Cache Core Bytes; Long-Term Average Standby Cache Lifetime (s)
disabled = 1
interval = 10
object = Memory
useEnglishOnly=true
index = perfmon

## Network
[perfmon://Network]
counters = Bytes Total/sec; Packets/sec; Packets Received/sec; Packets Sent/sec; Current Bandwidth; Bytes Received/sec; Packets Received Unicast/sec; Packets Received Non-Unicast/sec; Packets Received Discarded; Packets Received Errors; Packets Received Unknown; Bytes Sent/sec; Packets Sent Unicast/sec; Packets Sent Non-Unicast/sec; Packets Outbound Discarded; Packets Outbound Errors; Output Queue Length; Offloaded Connections; TCP Active RSC Connections; TCP RSC Coalesced Packets/sec; TCP RSC Exceptions/sec; TCP RSC Average Packet Size  
disabled = 1
instances = *
interval = 10
object = Network Interface
useEnglishOnly=true
index = perfmon

## Process
[perfmon://Process]
counters = % Processor Time; % User Time; % Privileged Time; Virtual Bytes Peak; Virtual Bytes; Page Faults/sec; Working Set Peak; Working Set; Page File Bytes Peak; Page File Bytes; Private Bytes; Thread Count; Priority Base; Elapsed Time; ID Process; Creating Process ID; Pool Paged Bytes; Pool Nonpaged Bytes; Handle Count; IO Read Operations/sec; IO Write Operations/sec; IO Data Operations/sec; IO Other Operations/sec; IO Read Bytes/sec; IO Write Bytes/sec; IO Data Bytes/sec; IO Other Bytes/sec; Working Set - Private
disabled = 1
instances = *
interval = 10
object = Process
useEnglishOnly=true
index = perfmon

## System
[perfmon://System]
counters = File Read Operations/sec; File Write Operations/sec; File Control Operations/sec; File Read Bytes/sec; File Write Bytes/sec; File Control Bytes/sec; Context Switches/sec; System Calls/sec; File Data Operations/sec; System Up Time; Processor Queue Length; Processes; Threads; Alignment Fixups/sec; Exception Dispatches/sec; Floating Emulations/sec; % Registry Quota In Use
disabled = 1
instances = *
interval = 10
object = System
useEnglishOnly=true
index = perfmon

[admon://default]
disabled = 1
monitorSubtree = 1

[WinRegMon://default]
disabled = 1
hive = .*
proc = .*
type = rename|set|delete|create
index = windows

[WinRegMon://hkcu_run]
disabled = 1
hive = \\REGISTRY\\USER\\.*\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\.*
proc = .*
type = set|create|delete|rename
index = windows

[WinRegMon://hklm_run]
disabled = 1
hive = \\REGISTRY\\MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\\.*
proc = .*
type = set|create|delete|rename
index = windows

The problem I noticed is that security events were delayed when I monitored the events from a test endpoint. I am not sure if that is normal at startup.

Can anyone provide guidance on building or modifying the inputs.conf for Windows event collection from endpoints, or point to a good reference?

Also, does anyone have tips on how they separated the winevent logs into different indexes? I am thinking about separating winevents from different servers (e.g. DC, DHCP, DNS, web) and endpoints by index, but I would like to learn if anyone has a different method or solution to keep events organized.

Thank you

1 Solution

gcusello
SplunkTrust

Hi packet_hunter,
how long is the delay you are seeing: one minute, or more?
By default a forwarder usually sends logs every 30 seconds, but if a single server has many logs, it splits them into packets to optimize transmission without using excessive bandwidth.
So if there are many logs at first, the delay will be longer and should settle to about 30 seconds after some time. Do you still see a delay after the forwarder has been running for a while?

It's possible to change the frequency of the forwarder's connection, but I have never done it in my projects because of bandwidth constraints.
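
As a rough illustration of the settings usually involved here (a sketch, not a tested recommendation): the 30-second figure appears to correspond to the default of `autoLBFrequency` in outputs.conf, which controls how often the forwarder switches to another indexer in the group. A universal forwarder also caps its output at 256 KBps by default through `maxKBps` in limits.conf, which can slow the first upload of a large backlog. The server names are the ones from the question; the raised throughput value is an assumption to adapt.

```ini
# outputs.conf (in the custom forwarding app)
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = splunkindexer1.mycorp.com:9997, splunkindexer2.mycorp.com:9997, splunkindexer3.mycorp.com:9997, splunkindexer4.mycorp.com:9997
# Seconds before the forwarder switches to another indexer in the group.
# 30 is the default; it is written out here only to make it explicit.
autoLBFrequency = 30

# limits.conf (same app, or another app deployed to the UF)
[thruput]
# A universal forwarder defaults to 256 KBps. 512 is an assumed value;
# 0 removes the cap entirely, at the cost of more bandwidth use.
maxKBps = 512
```

Comments sit on their own lines because Splunk .conf files do not support trailing comments after a value.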

About the choice to put logs in more indexes or indexers, it depends on the number of logs: what volume of logs are you expecting?
Did you follow the hardware requirements suggested by Splunk for the indexers?
If your hardware configuration is not sufficient to index and search your logs, you will surely have a delay in indexing.
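
One common way to split Windows events by role, sketched under assumptions: leave Splunk_TA_windows unchanged and deploy a small override app per server class whose local inputs.conf changes only `index`. The app names and index names below (`wineventlog_endpoints`, `wineventlog_dc`) are hypothetical, and each index must already exist on the indexers.

```ini
# org_endpoint_inputs/local/inputs.conf
# (hypothetical app, deployed only to the endpoint server class)
[WinEventLog://Application]
index = wineventlog_endpoints

[WinEventLog://Security]
index = wineventlog_endpoints

[WinEventLog://System]
index = wineventlog_endpoints

# org_dc_inputs/local/inputs.conf
# (hypothetical app, deployed only to the domain controller server class)
[WinEventLog://Security]
index = wineventlog_dc
```

Settings in an app's local directory take precedence over any app's default directory, so the TA's other settings for these stanzas still apply. With a shared prefix, a search such as index=wineventlog* still covers all roles at once.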

I hope this helps. In any case, see http://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Systemrequirements and http://docs.splunk.com/Documentation/Splunk/latest/Data/Getstartedwithgettingdatain

Bye.
Giuseppe

packet_hunter
Contributor

Thank you Giuseppe,

The system and application logs rolled in right away but there were no security logs.

I did contaminate my test a bit, because I edited the client name under the deployment server > Forwarder Management > server class > (my Windows endpoint class) > edit clients > whitelist.

I added two more aliases for the client alongside the original DNS name: the host name and the client name. So there were three names in total for the specific host/client I was testing. Shortly after that the security logs rolled in, and I am not sure whether that is a coincidence.
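
For reference, the whitelist edited in Forwarder Management is stored in serverclass.conf on the deployment server. Below is a minimal sketch of what three names for one client might look like; the server class name and the host and client names are hypothetical, not taken from the actual environment.

```ini
# serverclass.conf on the deployment server
[serverClass:windows_endpoints]
# Each entry is compared with the client's name, host name, DNS name, or IP address.
whitelist.0 = win10-test01.mycorp.com
whitelist.1 = win10-test01
whitelist.2 = test-endpoint

[serverClass:windows_endpoints:app:Splunk_TA_windows]
# Restart the forwarder after the app is installed or updated.
restartSplunkd = true
```

A client that matches any one entry receives the class's apps, so a single matching name is normally enough.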

Currently there are only about 300 UFs sending winevents from servers. The indexers seem to be handling the load fine, and I don't think adding one more workstation would cause the delay.

I was hoping to find an instructional "how to" post or reference specific to my Windows event collection project, where there is a mix of Windows servers and endpoints in an environment with more than 10k UFs feeding directly to the indexers.

Thank you

gcusello
SplunkTrust

Hi packet_hunter,
to debug the Security logs, run a check from the forwarder's bin directory using ./splunk cmd btool inputs list --debug > inputx.txt
so you can see whether there are other configurations for the Security logs (maybe there's a disabled=1 somewhere).

I don't think that the change you made on the Deployment Server could cause problems, but to be sure, try removing that configuration.

To understand how the indexers are working, you can use the Monitoring Console to see whether you have bottlenecks or delays in the indexing chain.

Just one basic question: how well does your storage perform?
Often the problem is that the disks don't meet the Splunk requirements (at least 800 IOPS, better 1200), so the indexers have trouble writing logs to storage.
You can check disk IOPS using an open source tool such as Bonnie++.

Bye.
Giuseppe

packet_hunter
Contributor

Thank you for the suggestions, I will use btool as you suggested. As for disk performance, we have been fine, since we use SSDs with on-demand expansion. I think I just need to retest with only the DNS name, and with btool.

Last question: besides changing the intervals, is the inputs.conf from Splunk_TA_windows the correct one to use?

gcusello
SplunkTrust

I don't know your requirements. In this TA-Windows I see only Security, Application, System and DHCP enabled, so check whether that is correct.
In any case, if you have doubts, download the latest version of this TA from Splunkbase.
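
If endpoints need something lighter than the server settings, one option is to override a few values in a local inputs.conf instead of editing the TA's default file. The sketch below is an assumption about what a Windows 10 workstation might need, not a recommendation that ships with the TA: it skips the existing backlog, turns off Active Directory object resolution, and disables the DHCP server log monitor.

```ini
# local/inputs.conf in an app deployed only to the endpoint server class
[WinEventLog://Security]
# Collect only events written after the input starts, instead of
# reading the whole existing log first (the default file above uses 0).
current_only = 1
# Skip Active Directory lookups for events (the default file above uses 1).
evt_resolve_ad_obj = 0

[monitor://$WINDIR\System32\DHCP]
# DHCP server logs are not expected on a workstation.
disabled = 1
```

Running btool on the endpoint afterwards shows which file each effective value comes from.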
Bye.
Giuseppe
