Subsearch results differ from execution by itself

ww9rivers · ‎06-10-2013

Got 2 searches that searches in MAC address data exported daily:

(1) Get the everyday count of unique MAC addresses for the last week:

sourcetype=usertracking earliest=-7d@d latest=-1d@d
  | bucket _time span=1d
  | dedup umac date_mday
  | eval DayOfWeek=strftime(_time, "%u.%a")
  | eval WoY=strftime(_time, "%W")
  | stats count AS count1 by DayOfWeek, WoY

(2) To generate a daily average of such counts for the last 8 weeks (EDIT: 7 weeks, actually):

sourcetype=usertracking earliest=-8w@w latest=-1w@w
    | bucket _time span=1d
    | dedup umac date_mday
    | eval DayOfWeek=strftime(_time, "%u.%a")
    | eval WoY=strftime(_time, "%W")
    | stats count AS count8 by DayOfWeek, WoY
    | stats avg(count8) AS Last8WeekAvg by DayOfWeek

Below are result sets for (1) and (2):

    DayOfWeek   WoY     count1
1   1.Mon       22  50041
2   2.Tue       22  49985
3   3.Wed       22  50387
4   4.Thu       22  50026
5   5.Fri       22  49990
6   6.Sat       22  38166

    DayOfWeek   Last8WeekAvg
1   1.Mon       29481.428571
2   2.Tue       30043.571429
3   3.Wed       31783.714286
4   4.Thu       39884.142857
5   5.Fri       39821.714286
6   6.Sat       27848.857143
7   7.Sun       22013.857143

(3) Now I combine the 2 searches:

sourcetype=usertracking earliest=-7d@d latest=-1d@d
  | bucket _time span=1d
  | dedup umac date_mday
  | eval DayOfWeek=strftime(_time, "%u.%a")
  | eval WoY=strftime(_time, "%W")
  | stats count AS count1 by DayOfWeek, WoY
  | append [ search sourcetype=usertracking earliest=-8w@w latest=-1w@w
    | bucket _time span=1d
    | dedup umac date_mday
    | eval DayOfWeek=strftime(_time, "%u.%a")
    | eval WoY=strftime(_time, "%W")
    | stats count AS count8 by DayOfWeek, WoY
    | stats avg(count8) AS AvgCount by DayOfWeek ]
  | stats first(count1) AS LastWeek, first(AvgCount) as Last8WeekAvg by DayOfWeek

The result for Last8WeekAvg changes:

    DayOfWeek   LastWeek    Last8WeekAvg
1   1.Mon       50041       48230.500000
2   2.Tue       49985       49876.000000
3   3.Wed       50387       50140.000000
4   4.Thu       50026       50347.500000
5   5.Fri       49990       49750.666667
6   6.Sat       38166       38134.000000
7   7.Sun                   36887.500000

Why?

How do I go about debugging a subsearch?

Help and insights would be much appreciated. Thanks.

[EDIT]

In trying to debug this issue, I ran this search:

sourcetype=usertracking earliest=-7d@d latest=-1d@d
  | bucket _time span=1d
  | dedup umac date_mday
  | eval DayOfWeek=strftime(_time, "%u.%a")
  | eval WoY=strftime(_time, "%W")
  | stats count AS count1 by DayOfWeek, WoY
  | append [ search sourcetype=usertracking earliest=-9w@w latest=-1w@w
    | bucket _time span=1d
    | dedup umac date_mday
    | eval DayOfWeek=strftime(_time, "%u.%a")
    | eval WoY=strftime(_time, "%W")
    | stats count AS count8 by DayOfWeek, WoY ]
  | stats first(count1) AS LastWeek, values(count8), avg(count8) as Last8WeekAvg by DayOfWeek

The result seems to show that less than 8 weeks were searched in the subsearch:

    DayOfWeek   LastWeek    values(count8)  Last8WeekAvg
1   1.Mon       50041       46582           48230.500000
                            49879
2   2.Tue       49985       49710           49876.000000
                            50042
3   3.Wed       50387       50088           50140.000000
                            50192
4   4.Thu       50026       50159           50293.666667
                            50186
                            50536
5   5.Fri       49990       49572           49750.666667
                            49724
                            49956
6   6.Sat       38166       38088           38134.000000
                            38114
                            38200
7   7.Sun                   36878           36887.500000
                            36897

ww9rivers · ‎06-10-2013

Found this in Splunk Answers: http://splunk-base.splunk.com/answers/11949/auto-finalized-after-time-limit-reached

Now this seems to work as expected:

sourcetype=usertracking earliest=-7d@d latest=-1d@d
  | bucket _time span=1d
  | dedup umac date_mday
  | eval DayOfWeek=strftime(_time, "%u.%a")
  | eval WoY=strftime(_time, "%W")
  | stats count AS count1 by DayOfWeek, WoY
  | append maxtime=600 [ search sourcetype=usertracking earliest=-9w@w latest=-1w@w
    | bucket _time span=1d
    | dedup umac date_mday
    | eval DayOfWeek=strftime(_time, "%u.%a")
    | eval WoY=strftime(_time, "%W")
    | stats count AS count8 by DayOfWeek, WoY ]
  | stats first(count1) AS LastWeek, avg(count8) as Last8WeekAvg by DayOfWeek

Of course, bmacias84 is right about summary indexing when efficiency is needed.

bmacias84 · ‎06-10-2013

Use the search job inspector to debug your searches. Also it possible that you are hitting an events limit which will cause events to be truncated. By default splunk limits maxresultrows to 50k and subsearch results to 10.5k. Take a look at the limits.conf searchresults and subsearch stanzas. Be careful when modifying your limits.conf files as this can cause instabilities.

Also consider using summary indexing. This will speed up your searches and allowing you to retain data longer.

Additional Reading:

Hope this help or gets you started. Don’t forget to accept and vote up answers that help.

Cheers,

bmacias84 · ‎06-10-2013

Subsearches can get you in trouble with large data sets. Thats why I perfer Summary indexing things like this.

ww9rivers · ‎06-10-2013

I guess it may be because of this: "[subsearch]: Search auto-finalized after time limit (60 seconds) reached." -- somehow I missed that message earlier.

ww9rivers · ‎06-10-2013

Thanks. I did a bit of debugging and edited my post above. Will try more.

Subsearch results differ from execution by itself

AppDynamics Summer Webinars

SOCin’ it to you at Splunk University

Credit Card Data Protection & PCI Compliance with Splunk Edge Processor

Are you a member of the Splunk Community?

Subsearch results differ from execution by itself

AppDynamics Summer Webinars

SOCin’ it to you at Splunk University

Credit Card Data Protection & PCI Compliance with Splunk Edge Processor