Splunk Search

How to get a random sample of IIS events each day for the last X days to build control charts?

New Member

Suppose my IIS logs contain 60,000 transactions per 24 hours. How can I get a random sample of, say, 5,000 events? I need a random sample for each day over, say, the last 50 days. I want to build control charts based on response time (time_taken) from the IIS logs.

Esteemed Legend

Here is a random sample macro I use:

From macros.conf:

[Random_Sample(1)]
args = RandomSamplePercentEventsToKeep
definition = eval RandomSampleSeed = random()\
| sort 0 -RandomSampleSeed\
| eventstats count AS RandomSampleTotalEventCount\
| eval RandomSampleNumberToKeep = ceil($RandomSamplePercentEventsToKeep$ * RandomSampleTotalEventCount / 100)\
| streamstats count AS RandomSampleSerialNumber\
| where RandomSampleSerialNumber<=RandomSampleNumberToKeep\
| fields - RandomSample*
iseval = 0
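
The macro takes a percentage and samples across the whole result set, so 5,000 out of 60,000 events works out to roughly 8.3%. Run over 50 days, though, it keeps 8.3% of everything rather than exactly 5,000 events per day. A per-day variant could look roughly like the sketch below. The index and sourcetype names are assumptions (substitute your own), the sample_* field names are made up for illustration, and sorting 50 days of events in one search may be slow or hit memory limits on smaller installations.

```
index=iis sourcetype=iis earliest=-50d@d latest=@d
| eval sample_day = strftime(_time, "%Y-%m-%d")
| eval sample_rand = random()
| sort 0 sample_day sample_rand
| streamstats count AS sample_rank BY sample_day
| where sample_rank <= 5000
| fields - sample_rand sample_rank
```

The sort groups events by day and shuffles them within each day, streamstats numbers them per day, and the where clause keeps the first 5,000 of each day. Days with fewer than 5,000 events are kept in full.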

SplunkTrust

The latest Splunk Cloud version recently gained an event sampling feature, so it's reasonable to assume it will come to Splunk Enterprise at some point as well.
http://docs.splunk.com/Documentation/Splunk/6.3.1511/Search/Retrieveasamplesetofevents

Until then, you could fake a sampling rate of 1:60 by only looking at one specific date_second, or a sampling rate of 1:30 by looking at two seconds, and so on. If your data is sufficiently evenly spread over time, this non-random sampling should work well enough.

For both sampling approaches, make sure you don't break up your transactions if they consist of multiple events each.

Alternatively, just run over all your data without sampling.

SplunkTrust

For 1:60 sampling, add date_second = 42 to your search. Any other second will do.
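
As a rough sketch, assuming your IIS data lives in index=iis with sourcetype=iis (both names are placeholders) and that the date_second field is present on your events, a daily summary over the 1:60 sample could look like this:

```
index=iis sourcetype=iis earliest=-50d@d latest=@d date_second=42
| timechart span=1d count AS sampled_events avg(time_taken) AS avg_time_taken stdev(time_taken) AS stdev_time_taken
```

For 1:30 sampling, swap the filter for two seconds, e.g. (date_second=12 OR date_second=42). With 60,000 events per day, 1:60 yields about 1,000 events per day, so reaching roughly 5,000 per day would take five seconds' worth (1:12).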

To check whether this gives you a reasonable sample, you could run statistics by second to see if there are any outliers, e.g. lots of events generated at second zero by cron jobs.
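
Something along these lines should do for that check, again with the index and sourcetype names as placeholders:

```
index=iis sourcetype=iis earliest=-1d@d latest=@d
| stats count avg(time_taken) AS avg_time_taken BY date_second
| sort 0 - count
```

If the counts and averages are roughly flat across all 60 seconds, picking any one second should be fine. If a few seconds stand out at the top of the list, avoid those.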

You really should first consider running over your entire data set though. 60000 events over 24 hours really isn't that much if you have reference-spec hardware or better.
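
A minimal sketch of a control chart over the full data set, with no sampling, might look like the search below. The index and sourcetype names are assumptions, as are the output field names. The 3-sigma limits here are computed from the standard deviation of the daily means, which is a simplification; adjust this to whichever control chart type you actually need.

```
index=iis sourcetype=iis earliest=-50d@d latest=@d
| bin _time span=1d
| stats avg(time_taken) AS daily_mean BY _time
| eventstats avg(daily_mean) AS center_line stdev(daily_mean) AS sigma
| eval upper_control_limit = center_line + 3 * sigma
| eval lower_control_limit = center_line - 3 * sigma
| fields _time daily_mean center_line upper_control_limit lower_control_limit
```

Rendered as a line chart, this gives one point per day plus flat lines for the center line and both limits.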

New Member

Can you give me an example of how to fake the sampling?
