All Apps and Add-ons

Windows DNS Analytical and Diagnostic Logs: How are others monitoring DNS Analytical Logs?

coltwanger
Contributor

I've deployed the Windows DNS Analytical and Diagnostic Logs add-on to our DNS servers, but the PowerShell script returns the following error:

ERROR ExecProcessor - message from "C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -command "& 'C:\Program Files\SplunkUniversalForwarder\etc\apps\TA-windnsanalytical\bin\get_dns_analytics.ps1'"" Get-WinEvent : The requested operation cannot be performed over an enabled direct channel. The channel must first be disabled before performing the requested operation.

I suppose this is normal with ETL logs. The fix appears to be to let the log fill up and clear it manually, or let it roll off into archived files, which will eventually need to be deleted manually. This doesn't help the fact that Splunk still wouldn't be able to monitor the active ETL analytics log while it's being written to.

So I guess my option is to roll the log into archive files, and monitor the archive files (a batch input maybe)? How is everyone else monitoring the Analytics log? We really don't want to enable debug logging on our DNS servers due to the performance hit. It seems like this add-on can only work if the retention on the DNS Analytics log is configured to "Do not automatically overwrite events".

1 Solution

gjanders
SplunkTrust
SplunkTrust

Splunk Stream can do this work, we are using the stream forwarder on the non-windows platform and it's great for DNS traffic.

We are evaluating it's usage for windows as the documentation mentions that winpcap allows any local user to use it (not a Splunk issue but the Splunk Stream uses that library on Windows).

Alerts for Splunk Admins https://splunkbase.splunk.com/app/3796/
Version Control for Splunk https://splunkbase.splunk.com/app/4355/

View solution in original post

muhammadfahimma
Engager

Did anybody actually manage to parse the packet data? My initial understanding of the logs is that the PacketData field contains the answer to the query. I found this out by parsing the PacketData using the dnslib python library.

0 Karma

muhammadfahimma
Engager

@hughkelley

0 Karma

hughkelley
Path Finder

I saw your mention of dnslib and I skimmed the project but haven't had time to see what they are doing to decode. The dns-packet also looked promising -but neither can be called directly from Powershell. This article has no code, but does seem to explain the packet layout: Dissecting DNS Communications.

0 Karma

hughkelley
Path Finder

Good find.

For what it's worth, we ultimately found that the performance of Get-WinEvent was terrible. The time spent in FormatDescription() seems to be the issue. We ultimately rewrote the script a bit to bypass it and do the formatting ourselves. We don't send the raw packet data to Splunk (didn't have time to work out the formatting for it).

$logName = 'Microsoft-Windows-DNSServer/Analytical'
$filterXPath = "*[System[EventID!=280] and EventData[Data[@Name='InterfaceIP']!='127.0.0.1']]"

$eventlogSettings = get-winevent -ListLog $logName
$logFile = [System.Environment]::ExpandEnvironmentVariables($eventlogSettings.LogFilePath)  # expand the variables in the file path


# Extract the DNS "message types" from the event provider into a sparse array (element number == event ID).   This is used later for our lightweight message generation
$prov = Get-WinEvent -ListProvider $eventlogSettings.OwningProviderName 
$messageTypes= new-object string[] 999  # no four-digit event IDs so we should be fine with 999
$prov.Events | %{

    $description = $_.Description -replace ";\s+PacketData=%\d+", ""  # remove packetdata  (for now,  too complicated to parse)
    $description = $description -replace "%(?<token>\d{1,2})", "{`${token}}"   # convert for PS-based tokens
    $messageTypes[$_.Id]  = $description
}

# Do not use Get-WinEvent to avoid performance overhead of FormatDescription()

$events = @()

$query = New-Object System.Diagnostics.Eventing.Reader.EventLogQuery($logFile,[System.Diagnostics.Eventing.Reader.PathType]::FilePath , $filterXPath);
$reader = New-Object System.Diagnostics.Eventing.Reader.EventLogReader($query)
while(($record = $reader.ReadEvent()) -ne $null)
{
    # convert the raw data to the format, without relying on EventLogRecord.FormatDescription () 
    $propVals=@($null)
    foreach($prop in $record.psbase.Properties)
    {
        $propVals += $prop.value
    }

    $record | Add-Member -MemberType NoteProperty -Name Message -Value ($messageTypes[$record.Id] -f $propVals)

    $events += $record
}

# emit for Splunk UF to parse
$events | fl 

# Clear the log
$logSize = $eventlogSettings.Filesize  # before clearing
$swLogPaused = [Diagnostics.Stopwatch]::StartNew()
$eventlogSettings.IsEnabled = $false
$eventlogSettings.SaveChanges()
try
{
    [System.Diagnostics.Eventing.Reader.EventLogSession]::GlobalSession.ClearLog($eventlogSettings.LogName) 
}
catch  [System.Management.Automation.MethodException]
{ # eat this exception.   It says "The process cannot access the file because it is being used by another process" but it lies, the log is cleared    
}
$eventlogSettings.IsEnabled = $true
$eventlogSettings.SaveChanges()
$swLogPaused.Stop()

# calculate the run time
$elapsedTimeSecs = (New-TimeSpan -Start (Get-Process -Id $pid).StartTime  -End (Get-Date)).Seconds  

# Emit some performance stats
[pscustomobject]@{
    LoggingPausedMs=$swLogPaused.Elapsed.Milliseconds;
    LogFileMaxBytes=$eventlogSettings.MaximumSizeInBytes
    LoggedBytes=$logSize
    LoggedRecs=$events.Count
    LoggedTimespanSecs=(New-TimeSpan -Start $events[0].timecreated -End $events[-1].timecreated).Seconds
    ScriptRunSecs=$elapsedTimeSecs
}   | fl 
0 Karma

robert_miller
Path Finder

Hey @hughkelley ,

Since you are clearing out the log file via this script, how often are you running this? (i.e. every x seconds, minute, five minutes, 10 minutes, etc.)

0 Karma

hughkelley
Path Finder

We currently run it every minute - and it sometimes runs as long as 20s. I think we're going to filter further (possibly not logging results that were returned from authoritative zones).

In addition to the CPU load on our DCs we're chewing up quite a bit of license volume.

0 Karma

robert_miller
Path Finder

When the script runs as long as 20s, how much logs do you think are possibly not being captured? I noticed in your script you have "LoggingPausedMs" which might help answer this question. I am trying to determine if rolling the log file vs pausing/clearing is the right solution.

0 Karma

hughkelley
Path Finder

I think/fear we're missing that whole window. My next move (in what seems like a never-ending saga of optimization) is to switch to the following method for clearing the log and then read/parse from that backup. This will probably drive up the "paused" counter but should increase my visibility overall.

ClearLog (string logName, string backupPath);

I'm open to other ideas. One other thing I've observed, these logs get really slow to read once they get too large. I'd like to have a "clear" command that runs whenever the UF starts. Otherwise, the script simply can't handle the pile of data that might be sitting there in the log.

The original author of this TA doesn't seem to have it in GitHub/similar so I'm not sure how best to get all of our ideas back into the original work.

0 Karma

hughkelley
Path Finder

Here's a sample of the results from that approach. The two TimeCreated entries are at the front and back of the logs. As you can see, now that I'm using an "offline" copy of the file, there's ~100ms of pause but that seems to be the only data loss.

TimeCreated : 11/1/2018 5:43:31 PM
TimeCreated : 11/1/2018 5:43:43 PM

LoggingPausedMs : 69
DataRetrievalMs : 1237
LogFileMaxBytes : 1073741824
LoggedBytes : 737280
LoggedRecs : 2088
LoggedTimespanSecs : 11.8507947
ScriptRunSecs : 5.8446792

TimeCreated : 11/1/2018 5:43:43 PM
TimeCreated : 11/1/2018 5:45:09 PM

LoggingPausedMs : 114
DataRetrievalMs : 22110
LogFileMaxBytes : 1073741824
LoggedBytes : 6139904
LoggedRecs : 16581
LoggedTimespanSecs : 85.8057772
ScriptRunSecs : 26.4272838

0 Karma

hughkelley
Path Finder

I've pushed a "whitelist" (zone ignore) feature to the GitHub repo as well. At least in our environment, I'm burning up several GB per day indexing local and/or low-risk DNS queries, e.g. microsoft.com.

0 Karma

exel_wild
Engager

Nice update. I'm wondering if you have specific message parser on a Splunk side? Right now we have a custom source type and it gives us whole message output without field separation in message.

0 Karma

hughkelley
Path Finder

The props.conf has regexes (written by the original author) to extract the fields.

EXTRACT-win-dns-analytical-if-dest = InterfaceIP=(?<src>[^;]+);\s+Destination=(?<dest>[^;]+);
EXTRACT-win-dns-analytical-if-src = InterfaceIP=(?<dest>[^;]+);\s+Source=(?<src>[^;]+);
EXTRACT-win-dns-analytical-src-if = Source=(?<src>[^;]+);\s+InterfaceIP=(?<dest>[^;]+);

..........
0 Karma

gjanders
SplunkTrust
SplunkTrust

@hughkelley I see no specific license on that app, I think the approach would be create your own github with the app in it, attribute the original author and then make enhancements, my opinion only!

Alerts for Splunk Admins https://splunkbase.splunk.com/app/3796/
Version Control for Splunk https://splunkbase.splunk.com/app/4355/
0 Karma

hughkelley
Path Finder

I'm a novice at pushing to Github, but here goes.

https://github.com/hkelley/TA-windnsanalytical/

Warning - I added an "init" script to run at UF startup. This may not meet your needs.

gjanders
SplunkTrust
SplunkTrust

Looks fine to me, at least the updates are accessible. Publishing to SplunkBase will obviously get more downloads but github is great for sharing!

Alerts for Splunk Admins https://splunkbase.splunk.com/app/3796/
Version Control for Splunk https://splunkbase.splunk.com/app/4355/
0 Karma

hughkelley
Path Finder

I'll try to share it via Splunkbase once I get one more tweak checked in.

We burn up a lot of log volume on A lookups for microsoft.com (4x more than for our internal/authoritative zones). I want to get some sort of filter in place for one or more of:

  • zones hosted on the DNS server
  • an Alexa 100-like list (open to suggestions here)
  • manual entries in the UI or script

exel_wild
Engager

OK, so the difference here is:

Get-WinEvent -LogName 'Microsoft-Windows-DNSServer/Analytical' -Oldest

This will throw error about busy channel.
Get-WinEvent : The requested operation cannot be performed over an enabled direct channel. The channel must first be disabled before performing the requested operation
At line:1 char:2
+ Get-WinEvent -LogName 'Microsoft-Windows-DNSServer/Analytical' -Olde ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Get-WinEvent], EventLogException
+ FullyQualifiedErrorId : System.Diagnostics.Eventing.Reader.EventLogException,Microsoft.PowerShell.Commands.GetWinEventCommand

And code below would work perfectly fine:

Get-WinEvent -Path C:\Windows\System32\winevt\Logs\Microsoft-Windows-DNSServer%4Analytical.etl -Oldest

muhammadfahimma
Engager

I tried this, as a result I can parse the logs without actually stopping them which is pretty cool

0 Karma

mpasha
Path Finder

I have no issue reading the Events even when it is configured to overwrite the log.
it does not purge the CSV file, i use the Getdate function to append the date to the file so if the file exists the new data will be appended to the existing file and it will create a new file everyday.
you can use another process to deal with the older files. I personally delete the files older than 5 days.
to avoid the performance issue, i have scheduled it to run every 5 minutes and developed the PS script to read the file starting from 5 minutes ago.
it takes 60-70 seconds for our servers to append the new content for the past 5 minutes.

0 Karma

hughkelley
Path Finder

Are you able to read the logs via PS without getting the "enabled channel" error?

How/where does the CSV get purged?

I took a similar approach with my mods (my log is set for retention) but I read the log in place and then pause it just long enough to clear it (and write the logs to STDOUT).

Separate issue - performance of the Get-WinEvent cmdlet was terrible for us ( a few thousand records every minute) so I had to resort to reading via .NET classes and formatting my own string.

Is there a github site for this TA so that we could merge back some of these fixes?

0 Karma
.conf21 Now Fully Virtual!
Register for FREE Today!

We've made .conf21 totally virtual and totally FREE! Our completely online experience will run from 10/19 through 10/20 with some additional events, too!