I have the search string below, which lists the files with the most bugs associated with them.
index = git | rename Data.payload.head_commit.modified{} as FilesModified
| rex field=Data.payload.head_commit.message max_match=10 "(?<BugID>[bB]ug.*[^@\w\w\d{4}]\d{3,6})"
| rex field=BugID max_match=10 "(?<BugIDs>\d{3,6})" | eval BugIDs = ltrim(BugIDs,"0")
| stats values(BugIDs) by FilesModified delim=", " | rename values(BugIDs) as BugIDs
| eval BugCountPerFile = mvcount(BugIDs) | nomv BugIDs | search BugIDs =* | sort by -BugCountPerFile
And the result gives a list that looks like this:
Files Modified      BugIDs         BugCountPerFile
RequestHandler.cs   889,555,333    3
Response.cs         963,548        2
File.cs             874            1
What I want is a time chart with 5 lines, one for each of the top five Files Modified with the highest BugCountPerFile. The x axis is time in months, and the y axis is the BugCountPerFile for that month. I want to monitor the trend of bug fixes per file over time, and view it in this line graph for the top 5 files with the highest overall BugCountPerFile.
Give this a try
index = git | rename Data.payload.head_commit.modified{} as FilesModified
| rex field=Data.payload.head_commit.message max_match=10 "(?<BugID>[bB]ug.*[^@\w\w\d{4}]\d{3,6})"
|rex field=BugID max_match=10 "(?<BugIDs>\d{3,6})"|eval BugIDs = ltrim(BugIDs,"0")
| bucket span=1mon _time
| stats dc(BugIDs) as BugCountPerFile by _time, FilesModified ,delim=", "
| sort _time,-BugCountPerFile | streamstats count as sno by _time
|where sno < 6 | timechart span=1mon max(BugCountPerFile) by FilesModified
Actually, the query gives you a graph of the top 5 Files Modified for each month, and the top 5 files can differ between months (e.g. A,B,C,D,E in month 1 and A,B,G,E,F in the next), hence it gives more than 5 fields. Also, timechart by default shows the top 10 values and puts the remaining smaller values in one group called OTHER. You can disable the OTHER series by adding "useother=f" to the timechart command.
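If the goal is exactly five lines for the five files with the most bugs overall, one possible variation (an untested sketch, not something verified against your data) is to drop the per-month ranking and let timechart choose the series itself. It reuses your field extractions unchanged and assumes timechart's limit and useother options behave as documented. Note that limit=5 ranks the files by the sum of their monthly distinct counts, which only approximates the overall distinct bug count per file:

```
index = git | rename Data.payload.head_commit.modified{} as FilesModified
| rex field=Data.payload.head_commit.message max_match=10 "(?<BugID>[bB]ug.*[^@\w\w\d{4}]\d{3,6})"
| rex field=BugID max_match=10 "(?<BugIDs>\d{3,6})" | eval BugIDs = ltrim(BugIDs,"0")
| timechart span=1mon limit=5 useother=f dc(BugIDs) by FilesModified
```

Here limit=5 keeps only five series and useother=f suppresses the OTHER line, so the chart should show at most five lines.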
One more question on the sno part. It gives the top 5 Files Modified? So it's supposed to give 5 lines, basically? I ask because I have 11, one of them being an "OTHER" line. Any thoughts on what that may be?
Thanks so much!
More details on the different search commands are available here:
http://docs.splunk.com/Documentation/Splunk/6.0/SearchReference/WhatsInThisManual
See section "Search Command Reference"
The bucket command puts events into evenly sized bins: 'span=1mon _time' modifies _time so that all _time values in the same month get the same value (e.g. 01/01/2014 01:01:02 and 01/23/2014 05:03:33 will both get the _time value 01/01/2014 00:00:00). In the stats command, dc stands for distinct count (your original search took values(BugIDs) and then mvcount, which amounts to a distinct count of BugIDs). The streamstats command computes a running statistic for each event, here a serial number (sno) that restarts for each _time value, i.e. for each 1-month bucket. Because the results are sorted by descending BugCountPerFile within each month, the condition sno<6 keeps the top 5 files of each month.
That seems to do the trick. I had to change the timechart span to 1month instead of 1m because it exceeded 50,000 rows. Would you mind explaining what the part of the search string you added is actually doing? I'm pretty new to Splunk and have never seen bucket, dc, or streamstats. And why is it sno < 6?
Thank you so much