Getting Data In

Splunk 7.0.0 - Metrics Index - Filter on hour and weekday

Explorer

Hi
We are trying out the new Metrics Index in Splunk 7.0 and ran into issues when filtering on the data.
We want to only report the values inside business hours (Mon-Fri 7-18) for monthly reporting.

Our old search against the events-type index looked like this:
index="response" (date_hour>=7 AND date_hour<=18 AND NOT date_wday=sunday AND NOT date_wday=saturday) | stats count, avg(time), perc95(time) by app

We can query the data, but we can't filter on hour or weekday. Is there any way to do this in the Metrics Index with mstats?
Or can you only see all data in a selected time period?
| mstats avg(_value), perc95(_value) as p95, count WHERE index=mcg_apm_response_metrics AND metric_name=apm.response by app

One workaround could be to add a dimension indicating inside/outside business hours, but then it would be impossible to change afterwards.
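For illustration, such a flag could be computed before the data is indexed. This is a minimal Python sketch of the idea; the function name is invented, and the Mon-Fri 07-18 window is taken from the question above:

```python
from datetime import datetime, timezone

def in_business_hours(epoch: float) -> int:
    """Return 1 if the timestamp falls Mon-Fri between 07:00 and 18:59 UTC, else 0.

    weekday() < 5 covers Monday (0) through Friday (4);
    hours 7..18 inclusive match the question's business-hours window.
    """
    dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return int(dt.weekday() < 5 and 7 <= dt.hour <= 18)

# Monday 2017-10-23 11:05:58 UTC -> inside business hours
print(in_business_hours(1508756758.0))  # prints 1
```

As the poster notes, the downside is that the definition of "business hours" is frozen at index time.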

1 Solution

Champion

As usual, everything @guilmxm said is on point. Still, I wanted to offer something else I figured out.

Basically, you don't have things like date_hour and date_wday in the default Metrics Indexes because you are likely using the default Metrics sourcetypes (stats or collectd) or you have a custom sourcetype that has ADD_EXTRA_TIME_FIELDS = false in props.conf.

If you create a new sourcetype (a custom one) where you define the metrics fields and you set ADD_EXTRA_TIME_FIELDS = true in props.conf, you will see the additional time fields.

From a licensing standpoint, this shouldn't matter because Metrics Indexes charge only by event count, not event size. Still, there may be some performance hit as @guilmxm suggested (i.e., you are increasing the cardinality of the Metrics Index), so it's worth experimenting with or getting feedback from Splunk before moving forward.


0 Karma


Explorer

@rjthibod that would be really cool if it's possible to do.

I have tried to create my own custom sourcetype, but how can I search the metrics index for date_hour and date_wday? All my searches return empty result sets when I include the hour.

For example: | mstats avg(_value), perc95(_value), count WHERE index=mcg_apm_response_metrics AND metric_name=apm.* AND date_hour>2 by app

| mstats avg(_value), perc95(_value), count WHERE index=mcg_apm_response_metrics AND metric_name=apm.* by app, date_hour

My sourcetype:
[mcg_apm_metrics]
ADD_EXTRA_TIME_FIELDS = true
DATETIME_CONFIG =
NO_BINARY_CHECK = true
category = Metrics
description = Metrics Index Data with Time Fields
pulldown_type = 1

0 Karma

Explorer

Got it to work. It only works when you index CSV data, not when you use the HTTP Event Collector; I will create another question about that.

For those interested: using ADD_EXTRA_TIME_FIELDS adds the fields as dimensions in the Metrics Index, so it's probably not a good idea, as it adds overhead to the number of dimensions/cardinality. I'll have to test it with a large number of results with different values to see the performance/memory overhead.

CSV file:
metric_timestamp,metric_name,_value
1508756758.000,Test,0.50
1508756758.000,Test2,1.50
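Rows in that shape can be generated with a short script. This is only a sketch (the helper name is made up); the column order and the epoch-with-milliseconds timestamp match the CSV sample above:

```python
import csv
import io

def metrics_csv(rows):
    """Render (epoch, metric_name, value) tuples as a metrics CSV string.

    The timestamp is formatted as seconds.milliseconds to match a
    TIME_FORMAT of %s.%Q.
    """
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator="\n")
    w.writerow(["metric_timestamp", "metric_name", "_value"])
    for ts, name, value in rows:
        w.writerow([f"{ts:.3f}", name, f"{value:.2f}"])
    return buf.getvalue()

print(metrics_csv([(1508756758.0, "Test", 0.5), (1508756758.0, "Test2", 1.5)]))
```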

Props:
[mcg_apm_metrics_csv]
ADD_EXTRA_TIME_FIELDS = true
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = metric_timestamp
TIME_FORMAT = %s.%Q
category = Metrics
description = Comma-separated value format for metrics. Must have metric_timestamp, metric_name, and _value fields.
disabled = false
pulldown_type = 1

Used this command to see what was stored in the Splunk index: c:\Program Files\Splunk\bin>splunk cmd walklex "C:\Program Files\Splunk\var\lib\splunk\mcg_apm_response_metrics_hour\db\hot_v1_1\1507908525-1507598431-3077548847437235074.tsidx" ""

my needle:
0 2 host::rt-laptop
1 1 metric_name::Test
2 1 metric_name::Test2
3 2 source::metrics_data.txt
4 2 sourcetype::mcg_apm_metrics_csv
5 1 _catalog::Test2|date_hour|date_mday|date_minute|date_month|date_second|date_wday|date_year|date_zone
6 1 _catalog::Test|date_hour|date_mday|date_minute|date_month|date_second|date_wday|date_year|date_zone
7 2 _dims::date_hour
8 2 _dims::date_mday
9 2 _dims::date_minute
10 2 _dims::date_month
11 2 _dims::date_second
12 2 _dims::date_wday
13 2 _dims::date_year
14 2 _dims::date_zone
15 2 _subsecond::.000
16 2 date_hour::11
17 2 date_mday::23
18 2 date_minute::5
19 2 date_month::october
20 2 date_second::58
21 2 date_wday::monday
22 2 date_year::2017
23 2 date_zone::0
24 2 host::rt-laptop
25 1 metric_name::test
26 1 metric_name::test2
27 2 source::metrics_data.txt
28 2 sourcetype::mcg_apm_metrics_csv

0 Karma

Path Finder

For me, this doesn't work.
I'm using a summary search with | mcollect.

I even tried to explicitly specify the fields:

index=_internal source=*license_usage.log type="Usage" 
| eval h=if(len(h)=0 OR isnull(h),"(SQUASHED)",h) 
| eval s=if(len(s)=0 OR isnull(s),"(SQUASHED)",s) 
| eval idx=if(len(idx)=0 OR isnull(idx),"(UNKNOWN)",idx) 
| bin _time span=1h 
| stats sum(b) as _value by _time, pool, s, st, h, idx, date_wday, date_hour 
| eval metric_name="license_usage.used_bytes" 
| mcollect index=summary_license_metric

Any ideas?

0 Karma

Champion

I had added them in while I was doing something else (that's how I knew they would show up), but I have not tried using them. Let me go spin that up.

0 Karma

Champion

It works fine for me. Did you re-index your data correctly once you set up the new props.conf settings?

Here is my props.conf

[metrics_app_ux_layer8insight]
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT             = %s%3N
TZ                      = UTC
ADD_EXTRA_TIME_FIELDS   = true
SHOULD_LINEMERGE        = false
NO_BINARY_CHECK         = true
ANNOTATE_PUNCT          = false
pulldown_type           = 1
category                = Metrics
TRANSFORMS-app-ux-metrics      = app-ux-metrics
TRANSFORMS-app-ux-metrics-host = app-ux-metrics-host

The data is pipe-delimited with the timestamp at the front. The transforms (not shown) just pull out the fields for each segment between the pipes.

I re-indexed some data and then ran the following search, and everything worked fine:

| mstats sum(_value) WHERE metric_name=layer8insight.app_ux.activity AND date_wday="friday" AND date_hour > 20 by dest date_wday date_hour

The data came out like this.

dest date_wday date_hour sum(_value)
XXX-xa-1 friday 22 568.780000
XXX-xa-4 friday 23 906.570000
XXX-xa-3 friday 22 488.120000
XXX-xa-2 friday 23 5757.760000

0 Karma

SplunkTrust
SplunkTrust

Thanks @rjthibod, you are the man 😉

0 Karma

SplunkTrust
SplunkTrust

Hello,

I actually had to check that as well for the release under development of my apps.

What about:

| mstats avg(_value) as value WHERE index=mcg_apm_response_metrics AND metric_name=apm.response by app span=1s
| eval date_hour=tonumber(strftime(_time, "%H")), date_wday=lower(strftime(_time, "%A"))
| where (date_hour>=7 AND date_hour<=18 AND date_wday!="sunday" AND date_wday!="saturday")
| stats avg(value) as value, perc95(value) as p95, count as count by app

Or some variation of this. (Not sure about the count, but it should work.)

Basically it does not change much, but you first retrieve the metrics at the lowest time scale, then rely on the power of SPL to filter out the periods you don't want, and finally do the calculation.
Thanks to the metric store's performance, this should perform well.
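Outside SPL, the same filter-then-aggregate pattern can be sketched like this (a Python illustration with a naive nearest-rank percentile; the function and variable names are invented):

```python
from datetime import datetime, timezone

def business_hours_stats(points):
    """points: iterable of (epoch_seconds, value) samples at the finest span.

    Mirrors the SPL approach: keep only Mon-Fri 07:00-18:59 UTC samples,
    then compute count, average, and a naive 95th percentile over them.
    """
    kept = []
    for ts, v in points:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        if dt.weekday() < 5 and 7 <= dt.hour <= 18:
            kept.append(v)
    if not kept:
        return None
    kept.sort()
    p95 = kept[min(len(kept) - 1, int(round(0.95 * len(kept))))]
    return {"count": len(kept), "avg": sum(kept) / len(kept), "p95": p95}
```

The key design point is the order of operations: filtering happens on the raw (finest-span) samples, so the aggregates are computed only over the business-hours population.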

I doubt that ingesting date_hour and date_wday as dimensions would be the best way to go (especially for storage, licensing costs, reliability and maintainability), but a performance comparison of the two would be an interesting test.

Cheers,

Guilhem

Explorer

I will have to test it on a larger dataset. I wouldn't add date_hour and date_wday as dimensions;
I would add a dimension called "in_business_hours" set to 0 or 1, and do my hour/day check before indexing the data. The big limitation is that I can't change it dynamically afterwards.

One big issue with your query is that we are doing averages and percentiles on averages (already aggregated data). It's probably OK when the span is 1s, but if we increased it to span=1h for performance reasons, the results would be incorrect. (sistats fixes this on the Events Index when doing rollups.)
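The pitfall is easy to demonstrate with made-up numbers: averaging per-bucket averages weights every bucket equally, regardless of how many samples each bucket contains.

```python
# Hour 1: 100 fast responses; hour 2: a single slow one.
hour1 = [1.0] * 100          # per-hour average = 1.0
hour2 = [101.0]              # per-hour average = 101.0

# Average of the two hourly averages (what aggregating aggregates gives you).
avg_of_avgs = (sum(hour1) / len(hour1) + sum(hour2) / len(hour2)) / 2

# True average over all raw samples.
true_avg = (sum(hour1) + sum(hour2)) / (len(hour1) + len(hour2))

print(avg_of_avgs)  # prints 51.0
print(true_avg)     # ~1.99
```

Percentiles are even worse: a perc95 of hourly averages says nothing about the perc95 of the underlying response times, which is why filtering and aggregating raw values (or using summary-safe statistics like sistats) matters.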

0 Karma