Accelerating searches over data with millions of logs

anthonycopus
Path Finder

Hi,

Each day I receive millions of events of type "video_view", covering hundreds of thousands of different videos.

index="main" action="video" | stats count as views by video_name | sort -views | head 1000

index="main" action="video" | stats dc(user) as views by video_name | sort -views | head 1000

As you can see, we need a table of video names showing both total views and distinct-user views. What is the best way to make these searches load quickly?
We previously tried accelerated searches, but because of the sheer volume of data (both the number of events and the number of distinct video names), they still sometimes take minutes to load.
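Both searches group the same events by video_name, so the two aggregations could probably be folded into one stats call (for example `stats count as views, dc(user) as users by video_name`, where `users` is a hypothetical field name), scanning the raw data once instead of twice. The Python sketch below only models what such a combined search computes; it assumes events are dicts with hypothetical "user" and "video_name" keys and is not how Splunk evaluates searches internally.

```python
from collections import defaultdict

# Hypothetical event layout: a dict with "user" and "video_name" keys,
# standing in for the fields the Splunk searches group on.
Event = dict[str, str]


def top_videos(events: list[Event], n: int) -> list[tuple[str, int, int]]:
    """Model of `stats count as views, dc(user) as users by video_name
    | sort -views | head n`, computed in a single pass over the events."""
    views: dict[str, int] = defaultdict(int)
    users: dict[str, set[str]] = defaultdict(set)
    for e in events:
        views[e["video_name"]] += 1             # count
        users[e["video_name"]].add(e["user"])   # dc(user)
    rows = [(name, views[name], len(users[name])) for name in views]
    rows.sort(key=lambda r: r[1], reverse=True)  # sort -views
    return rows[:n]                              # head n
```

For example, three views of video "a" by alice, alice and bob plus one view of "b" by bob give `[("a", 3, 2), ("b", 1, 1)]`: one row per video carrying both metrics.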

lukejadamec
Super Champion

You should use summary indexing.

Here is a good document:

http://docs.splunk.com/Documentation/Splunk/6.0/Knowledge/Usesummaryindexing

Here is a video explanation:

http://www.splunk.com/view/SP-CAAACZW

somesoni2
Revered Legend

I think summary indexing will solve both of your problems, with or without including _time. Suppose you create a summary-index search, "index=main action=video | stats count by user, video_name", which runs hourly and stores its results in a main_1h index. Then

index=main_1h source="your summary index search name" | stats sum(count) as views by video_name | sort -views | head 1000

will serve your first requirement, and

index=main_1h source="your summary index search name" | stats dc(user) as views by video_name | sort -views | head 1000

will serve the second.
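What makes this work is that the hourly summary keeps user as a grouping field, so each summary row is one (hour, user, video_name) combination with its count. Summing count recovers total views, and a distinct count of user over all rows in the reporting window still counts each user once per video, however many hours they appear in. A minimal Python model of that arithmetic, assuming raw events are reduced to hypothetical (hour, user, video_name) tuples; it illustrates the rollup logic only, not Splunk's storage format.

```python
from collections import Counter, defaultdict

# Hypothetical raw event: (hour_bucket, user, video_name).
RawEvent = tuple[int, str, str]
# One summary row per (hour, user, video_name) with its event count,
# mirroring `stats count by user, video_name` run once per hour.
SummaryRow = tuple[int, str, str, int]


def build_hourly_summary(events: list[RawEvent]) -> list[SummaryRow]:
    counts = Counter(events)  # groups identical (hour, user, video) triples
    return [(h, u, v, c) for (h, u, v), c in counts.items()]


def report(summary: list[SummaryRow]) -> dict[str, tuple[int, int]]:
    """Per video: (sum(count), dc(user)) over every summary row in the window."""
    views: dict[str, int] = defaultdict(int)
    users: dict[str, set[str]] = defaultdict(set)
    for _hour, user, video, count in summary:
        views[video] += count
        users[video].add(user)  # the set de-duplicates users across hours
    return {v: (views[v], len(users[v])) for v in views}
```

For instance, alice viewing video "a" twice in hour 0 and once in hour 1, plus bob once in hour 1, yields three summary rows and a report of `{"a": (4, 2)}`: four views, two distinct users.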

lukejadamec
Super Champion

I would populate the summary index with table rather than stats:

index="main" action="video" | table _time, action, user, video_name

It will be far faster to search the subset of data in the summary index than to search the raw data.

anthonycopus
Path Finder

What do you mean by including _time? I don't see how that would solve the problem.

lukejadamec
Super Champion

Include _time.
I had the same problem with failed logon events across more than a hundred million Windows security events. It turned searches that were basically impossible into ones that complete in seconds.

anthonycopus
Path Finder

Summary indexing is certainly one solution to the first query.

However, if I use it for the second query, a distinct user's view is counted once per run of the summary search. If the summary search runs hourly, the same user could be counted up to 24 times in a report that is meant to show distinct video views over the last 24 hours. Each user should be counted at most once per video, or the count is not distinct.
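The overcount described here happens when each hourly run stores an already-computed dc(user) per video and the report then adds those hourly values together. A minimal Python model of that pitfall next to a true distinct count, assuming hypothetical (hour, user, video_name) tuples as input; it only mirrors the arithmetic, not Splunk itself.

```python
from collections import defaultdict

# Hypothetical raw event: (hour_bucket, user, video_name).
RawEvent = tuple[int, str, str]


def summed_hourly_dc(events: list[RawEvent]) -> dict[str, int]:
    """Compute dc(user) by video_name per hour, then add the hourly values."""
    per_hour: dict[tuple[int, str], set[str]] = defaultdict(set)
    for hour, user, video in events:
        per_hour[(hour, video)].add(user)
    totals: dict[str, int] = defaultdict(int)
    for (_hour, video), users in per_hour.items():
        totals[video] += len(users)  # a returning user is added once per hour
    return dict(totals)


def true_dc(events: list[RawEvent]) -> dict[str, int]:
    """dc(user) by video_name over the whole window."""
    users: dict[str, set[str]] = defaultdict(set)
    for _hour, user, video in events:
        users[video].add(user)
    return {v: len(u) for v, u in users.items()}
```

With one user watching video "a" once in each of 24 hours (`[(h, "alice", "a") for h in range(24)]`), `summed_hourly_dc` returns `{"a": 24}` while `true_dc` returns `{"a": 1}`, which is exactly the 24x inflation described above.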

Any idea how to get around this? I've used summary indexes in the past, but the same issue always arises when distinct user counts are involved.
