Hi,
I have a requirement to create dashboards around user activity. Best practice suggests I use summary indexes but I am having no joy.
The requirement is to view user activity by day and month.
I need to set up the summary index so it contains all past data and indexes the data moving forward.
I thought I could just do the below once and schedule it moving forward.
index=iis
| sistats count by cs_username date
| collect index=sumuserdate
and run all the searches off that index but the results are odd and also I lose the ability to use the time range picker.
For example I can use the below search against the normal index but this results in Splunk going over 60M events
index=iis | stats count as Hits dc(date) by cs_username date_month date_year | rangemap field=dc(date) "1"=1-1 "2-11"=2-11 "12-19"=12-19 "20"=20-50 | stats count dc(cs_username) by range date_month date_year
Once complete, I plan to enrich the users with extra information from a lookup table: geo-location, user rights, business area, etc.
Do I need to add a bucket & span? strptime the date? timechart & span the query? Have separate indexes for the day and month?
I know I'm well off track so any help would be great.
Hello,
Don't include the time anywhere while doing the summary. When you start aggregating data, the search automatically becomes faster. The collect command will populate all the values, but it has a limitation: it records the local machine's name rather than the actual host. So do something like this.
Bucket the time so the summary has some granularity:
index=iis|bucket _time span=30m| stats count dc(cs_username) by cs_username
Create a saved search and schedule it for a one-time load to the target summary index. Define the earliest and latest parameters according to your requirement. Avoid running it repeatedly, or you will get duplicate entries.
With the above you preserve the internal event timing, and you can use the time range picker because the summary data keeps the time trend.
Do the calculation in your dashboard. You should then be fine with users selecting time ranges.
Thanks,
L
I don't know if this is an error on my part but would the below bug be causing a problem?
I know it causes issues with REST as well as the UI.
(From another post) - To set the time for summary index events, Splunk uses the following information, in this order of precedence:
Could this bug be causing _time to be skipped over?
Thanks for the advice, so I use stats instead of sistats?
You can use anything as long as you keep meaningful data. It is not mandatory to use sistats for a summary. Build your summary set and run the searches from there. Choose your desired bucket span.
Thanks for the advice, but I can't get the above to work.
Searching against the summary only works over all time, as there is only one timestamp on each event after adding it to the summary index. All events default to 01/06/2014.
In my test env I used the following query
index=iis | bucket _time span=30m | stats count dc(cs_username) by cs_username
Start time -3mon@mon Finish Time -20d@d
Scheduled the search via cron 15 17 * * *
Enabled summary indexing and assigned it to "summarytest2"
Any ideas?
Delete the summary index completely and index the data afresh; it should work.
Same issue as before: all the timestamps are the start time of the time range. I thought I had the query wrong, but it was as above.
I don't know if it makes any difference, but for the index / source type I query I had to adjust props.conf to get the correct timestamp when indexing. Would this cause any issues?
[iis-prod]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
Sample event in the summary index:
05/01/2014 00:00:00 +0100, search_name="TEST-BL2", search_now=1411493100.000, info_min_time=1398898800.000, info_max_time=1410130800.000, info_search_time=1411493101.076, count=190, cs_username=PSI8xxxxxxxx, dc(cs_username)=1
host = NLDxxxxxxP source = TEST-BL2 sourcetype = stash
It seems like your raw events don't have any other _time. Did you delete the existing index and try to re-index it?
index=iis | bucket _time span=1d | timechart span=1d count dc(cs_username)
The above works and the correct time is passed to the summary index.
I can query on _time and the time range picker works. There is no date field, but there are date_zone, date_year, date_wday, date_second, date_month, date_minute, date_mday and date_hour.
The below query still will not work, but at least I've got something. I'll try to adjust it so it's by cs_username.
index=iis | bucket _time span=30m | stats count dc(cs_username) by cs_username
Yeah, timechart should also do. Yesterday I asked you to check with:
| stats count by _time, cs_username
My main intention was to make the time range option available in the summary. Happy that you worked it out 🙂
That works as well. 🙂 thanks
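For anyone finding this later, here is a minimal sketch of a monthly report run against the summary rather than the raw index. It assumes the summary was populated with one row per day and user (bucket _time span=1d, then stats count as Hits by _time, cs_username). The index name sumuserdate is taken from the original question; Hits, month, ActiveDays, Users and the range labels are illustrative names, not anything confirmed in this thread.

```
index=sumuserdate
| eval month=strftime(_time, "%Y-%m")
| stats sum(Hits) as Hits dc(_time) as ActiveDays by cs_username month
| rangemap field=ActiveDays one=1-1 low=2-11 medium=12-19 high=20-31
| stats dc(cs_username) as Users by range month
```

Because each summary row keeps its own daily _time, the time range picker can still narrow the report, and dc(_time) counts the distinct days on which a user was active in that month.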
Yes, I deleted the index and created a new index, new reports, etc. I'll keep plugging away at this. Thanks so much for your help so far. I'll drop an update if there is any meaningful progress.
When you run the search manually with the time range selected, don't you see the trend? The test env search should give you a count for every 30 minutes. If you want daily distinct values you need a different summary index; a distinct count cannot be recomputed correctly from already-summarized data, as the underlying values are no longer there.
I do in Verbose Mode in the Events tab: the time is grouped into the 30 min buckets with the correct date stamp. In the Statistics tab I only see cs_username, count and dc(cs_username).
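That observation fits how stats behaves: its output keeps only the by fields and the aggregates, so the bucketed _time is dropped unless it is listed in the by clause. With no _time in the results, the summary events appear to fall back to the start of the search range, which matches the info_min_time in the sample event. A minimal sketch of a populating search that keeps the timestamp, assuming daily granularity is enough (Hits is an illustrative field name):

```
index=iis
| bucket _time span=1d
| stats count as Hits by _time, cs_username
| collect index=sumuserdate
```

If the search is saved and scheduled with summary indexing enabled, as in the test described above, the final collect line would normally be left off, since the summary indexing action writes the results itself.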