Splunk export has a limitation of 10000 rows. How do i get 5 top pages (either by visit or pageviews) per day? Currently I'm using a data model. For example sake, lets say i there's more than 10000 visited pages for any given day. If i ran the query below with a pipe of |head 5 and ran the search for a full year, it wouldn't give me top 5 pages per day for a full year. Instead it would give me top 5 rows from all pages and days.
|pivot MainSites HomePages dc(User) AS "DistinctUser" dc(VisitSession) AS "DistinctVisits" sum(wtPageView) AS "TotalPageViews" sum(Click) AS "TotalClicks" SPLITROW TimePeriod_Daily SPLITROW eventURL | sort - "TotalPageViews"
For anyone who is looking to do something similiar - get top x per y when dealing with a data model/pivot, you'll want to run a subsearch similiar to the one below. You'll get an extra column called sum(HomePages) which is the equivalent of total hits for the current time period.
|pivot MainSites HomePages SPLITROW TimePeriod_Daily | appendcols [pivot MainSites HomePages dc(User) AS "DistinctUser" dc(VisitSession) AS "DistinctVisits" sum(wtPageView) AS "TotalPageViews" sum(Click) AS "TotalClicks" SPLITROW TimePeriod_Daily SPLITROW eventURL | sort - "TotalPageViews"| head 5]
I don't think that is going to do what you think it does. Using a subsearch with 2 pivot commands is not a good practice to start, since using datamodel
is the preferred method any way.
Without knowing the specifics of your Data Model, I can give you a "normal search" that gives you what you want, and then perhaps you can apply it to your data model.
<find_your_web_logs> | bin span=1d _time | top limit=5 VisitSession by _time
This will split your data in 1 day chunks, and then do a top
with the split by of _time
, which then gives the top 5 per day.
Sticking it into datamodel
land (using some assumptions):
|datamodel <datamodel> <object> search | bin span=1d _time | top limit=5 VisitSession by _time
If i used | top limit=5 VisitSession by _time, my result will look something like
_time VisitSession count percent
This tends to be completely useless if i want the top 5 eventURLs per month for the span of 12 months.
Time Period eventURL DistinctUser DistinctVisits TotalPageViews TotalClicks
month1 URL1 10 50 100 0
month1 URL2 9 40 100 0
month1 URL3 7 31 100 30
month1 URL4 6 30 100 20
month1 URL5 5 20 100 10
month2 URL1 10 50 100 0
month2 URL2 9 40 100 0
month2 URL3 7 31 100 30
month2 URL4 6 30 100 20
month3 URL5 5 20 100 10