Getting Data In

Average Daily Unique IPs by Weekday in Web Log Files

Path Finder

Hi all,
I'm having a great time with Splunk and using it to analyze some IIS web logs. I've been successful in creating a search that counts the daily unique IPs. But I am wanting to use that same data and show the average number of unique IPs by weekday over different periods of time (3 months, 6 months, all time).

My issue is that I have created a successful search for 3 months, but when I try to run it for anything greater than that, the calculations are incorrect. I presume I am hitting some sort of summarization limit and it is grouping the data in an odd way.

Here is my search string that works at three months or less:

sourcetype=iis
| bin _time span=1d 
| dedup c_ip
| timechart count(c_ip) as daily_uniques
| eval weekday = strftime(_time, "%w")
| stats avg(daily_uniques) by weekday

Why would this search work correctly for the last three months but not for 6 or 12 month periods?

Tags (2)
1 Solution

Path Finder

Thanks to both linu1988 and martin_mueller, my question was answered. The correct query for what I am trying to accomplish is:

sourcetype=iis
| timechart span=1d dc(c_ip) as daily_uniques
| eval weekday = strftime(_time, "%w")
| stats avg(daily_uniques) by weekday

View solution in original post

0 Karma

Path Finder

Thanks to both linu1988 and martin_mueller, my question was answered. The correct query for what I am trying to accomplish is:

sourcetype=iis
| timechart span=1d dc(c_ip) as daily_uniques
| eval weekday = strftime(_time, "%w")
| stats avg(daily_uniques) by weekday

View solution in original post

0 Karma

Champion

That would give you the redundant count of IP. I was assuming that the c_ip was the IP extracted field. So it should be done as dc(IP) by date_wday that would be appropriate.

0 Karma

Path Finder

Brilliant. Just what i needed to know. Thanks!

0 Karma

SplunkTrust
SplunkTrust

Yeah, dedup is not aware of any bucketing but instead operates over the entire dataset.

Path Finder

martin, are you saying that dedup is not deduping on the time bucket of 1 day but across the entire dataset?

0 Karma

SplunkTrust
SplunkTrust

Your original query is incorrect because of the way you use dedup. If someone visits on two days you're only counting him for one of the days, dedup throws out the other visit before it was counted for that day.

0 Karma

Champion

logically they should work equally, but the timechart aggregation should be faster. Honestly i don't know internally how would splunk process but would only prefer a line of code rather than 3 lines doing the same thing.

would like to see this running.

|timechart dc(IP) by date_wday

0 Karma

Path Finder

so i ran the command you gave above:
sourcetype=iis
| timechart span=1d dc(c_ip) as daily_uniques
| eval weekday = strftime(_time, "%w")
| stats avg(daily_uniques) by weekday

and i'm getting results, but the counts are different than with my original query. can you tell me how these two command sequences are different?

timechart span=1d dc(c_ip) as daily_uniques

vs.

bin _time span=1d
| dedup c_ip
| timechart count(c_ip) as daily_uniques

why would is structure a search as you did versus the way i did?

0 Karma

Path Finder

looks like date_wday is present. I ran the following command on the last 3 months of data and got results as expected:

sourcetype=iis
| timechart count by date_wday

0 Karma

Champion

date_wday is available in splunk or not when you run with sourcetype=iis?

sourcetype=iis
| timechart span=1d dc(c_ip) as daily_uniques
| eval weekday = strftime(_time, "%w")
| stats avg(daily_uniques) by weekday

could you try if date_wday is not present?

0 Karma

Path Finder

when i do that, i get "No results found"

0 Karma

Champion

it was an after thought 😛

0 Karma

Path Finder

huh, the query didn't show when i first looked at your comment. trying now.

0 Karma

Champion

did you try the query which is provided? what was the result?

0 Karma

Path Finder

Sorry, I should have clarified that point.

So the commands above from sourcetype through timechart group the data by day, then dedup by client IP address, then counts the number of daily uniques. From there I organize the unique IP counts by weekday and calculate the average number of unique IPs by weekday.

When I choose a time slice larger than three months, however, my averages are no longer calculated by day of the week, but calculate to some unknown grouping.

This has the effect of not showing every single work day or showing hundreds (instead of 10s) of unique IPs as the average.

0 Karma

Champion

how do you say it's not working?

sourcetype=iis
| dedup c_ip,date_wday
|timechart avg(c_ip) by date_wday

could you try this and see where do you see the issue?

0 Karma