This question was asked in the interview. Index is splunk's _internal, fields are host and date_month. I want to create an output in such a way that columns will be "Month", "Host", "Count". In month column each row will represent each day of month, in host column name of the 1st host will appear for all days of that month and count in column count. After the last day of month, in month column I want to display month name, in host column name of the 2nd host for all the days of month and count. How to write SPL query for this?
This is a stupid question on many levels. You should NEVER use the "date_*" fields for anything because they are pre-TZ-normalized values and do not take into account your user's personal TZ value and so almost always will be incorrect. You should instead calculate your own "date_*" values, if you even need them (which in this case, you do not). The question is phrased EXTREMELY poorly and is very confusing to understand. I spent 5 times as much time reading and re-reading the question as I did coming up with the answer. I suspect that part of the answer is that you should know (and explain in your answer) that by default, only the last 30 days are stored in the internal index data so this question is even dumber because you will never have a complete month in your data. In any case, if I understand it correctly, here is one solution:
|tstats count AS Count
WHERE index="_internal" AND date_month="*" earliest=0 latest=now
BY host _time span=1d
| rename _time AS Month, host AS Host
| eval _mon=strftime(Month, "%Y%B")
| streamstats current=f last(_mon) AS prevmon BY Host
| eval neweventhere = if(_mon!=prevmon, mvappend("FIXME","KEEPME"), "NO")
| fields - prevmon
| mvexpand neweventhere
| rename neweventhere AS _neweventhere
| eval Month = if(_neweventhere="FIXME", (Month - 1), Month)
| eval _mon=strftime(Month, "%Y%B")
| eventstats sum(Count) AS _TotalCount BY _mon Host
| eval Count = if(_neweventhere="FIXME", _TotalCount, Count)
| fieldformat Month = if(_neweventhere="FIXME", strftime(Month, "%B"), strftime(Month, "%B %d"))
| table Month Host Count _*
Hi @splunkuser1
The interview question is poorly worded and a little hard to understand. If I break it down there looks to be 5 key requirements
Requirement 4 and 5 are hardest to make sense of but as an exercise in writing SPL, I guess it has merit as it would test your knowledge.
Anyway, based on the above, this is how I've interpreted the question and what I believe would provide a method to produce a result set with the answer...
index=_internal date_month=*
| bin span=1d _time
| stats
count
BY _time date_month host
| eval day=strftime(_time, "%d")
| streamstats
count AS order
first(host) AS first_host
BY date_month
| appendpipe [ where order=2 | eval second_host=host ]
| eventstats
max(order) as last_month_day
values(*_host) AS *_host
BY date_month
| eval host=if(order=last_month_day, second_host, first_host)
,day=if(order=last_month_day, date_month, day)
| sort _time order
| rename day AS Month host AS Host count AS Count
| table Month Host Count
Hint: if testing, then I suggest using a high event sampling ratio to make the query fast and efficient.
Would be interested in how others in the Splunk Community would use SPL to answer this problem too
Hope it helps
What have you tried so far?
index=_internal, |stats count by date_month, host| rename date_month AS Month
If you want counts by day, you could try by date_day
Can you provide a mock-up of what you want the result to look like?
Month | Host | Count |
1 | xyz | 3 |
2 | xyz | 7 |
3 | xyz | 7 |
April | xyz | 17 |
4 | abc | 4 |
5 | abc | 6 |
6 | abc | 8 |
7 | abc | 3 |
April | abc | 21 |
Try something like this
index=_internal
| bin _time span=1d
| stats count by _time, host
| eval month=strftime(_time,"%b")
| eventstats sum(count) as total max(_time) as lastday by host month
| eval row=if(_time==lastday,mvrange(0,2),0)
| mvexpand row
| eval Day=if(row==1,Month,strftime(_time,"%d"))
| eval count=if(row==1,total,count)
| table Day, host, count
Your query gave me almost the correct answer, but as I mentioned in the above table, the month name is not displayed in the answer
Is it just that you are getting the abbreviated month and not the full month - if so, try this change
| eval month=strftime(_time,"%B")
If not, please provide more information about what is and is not working