Hi all,
I need your help.
I retrieve a log from Sharepoint which contains the list of all published document with its information, in CSV format
Title, Author, Contributor, Last_Update_date, Version
Driver_policy, Albert, John, 07/03/2017, 4.0
Security_policy, Denise, MArie, 01/01/2017, 2.0
I would like to detect if a document has been reviewed at least 1 time a year.
Each day, I'll send the logs to Splunk. This logs could be the same if the document has been not modified
My guess is to play with the last update and the version in a lookup.
But I don't how to start.
Do you have any idea?
Thanks in advance.
Updated Query based on field names/details provided. You had provided Author and Contributor as extracted fields earlier but in the new details you have not mentioned the same. Please confirm. In any case basically you have to take two things in account.
Since Document ID Is unique for each document that is the key, so your stats will be by ID.
Since you can get multiple entries for the same document everyday with same version, you should perform a distinct_count() or dc() based on the Versions i.e. dc(Version). min(), max(), last() and values() are other statistical functions that you can take advantage of to pull the data/stats as per your need.
<YourBaseSearch>| stats dc(Version) as VersionCount last(Title) LatestTitle last(Accountable_Owner) Publisher values(support_Contributor) as Contributors min(Last_Update) as FileCreateDate max(Last_Update) as FinalUpdateDate values(Version) as Versions by ID
| search VersionCount>1
| table ID Title Publisher Versions Contributors FileCreateDate FinalUpdateDate
Files with VersionCount=1
will imply that they have not been modified since creation.
Refer to the list of statistical functions available in Splunk: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/CommonStatsFunctions
If you have something like a Document ID it would be better to correlate, however, based on data you provided you can try with Title and Author of the document as the correlation key. There can be several options, but I am using stats as the query would run for really long duration of time. It would be better to provide some filters on your dashboard for Searching patterns of Title, Author, Contributors and include them in your base search.
Updated Query based on field names/details provided. You had provided Author and Contributor as extracted fields earlier but in the new details you have not mentioned the same. Please confirm. In any case basically you have to take two things in account.
Since Document ID Is unique for each document that is the key, so your stats will be by ID.
Since you can get multiple entries for the same document everyday with same version, you should perform a distinct_count() or dc() based on the Versions i.e. dc(Version). min(), max(), last() and values() are other statistical functions that you can take advantage of to pull the data/stats as per your need.
<YourBaseSearch>| stats dc(Version) as VersionCount last(Title) LatestTitle last(Accountable_Owner) Publisher values(support_Contributor) as Contributors min(Last_Update) as FileCreateDate max(Last_Update) as FinalUpdateDate values(Version) as Versions by ID
| search VersionCount>1
| table ID Title Publisher Versions Contributors FileCreateDate FinalUpdateDate
Files with VersionCount=1
will imply that they have not been modified since creation.
Refer to the list of statistical functions available in Splunk: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/CommonStatsFunctions
If you have something like a Document ID it would be better to correlate, however, based on data you provided you can try with Title and Author of the document as the correlation key. There can be several options, but I am using stats as the query would run for really long duration of time. It would be better to provide some filters on your dashboard for Searching patterns of Title, Author, Contributors and include them in your base search.
Thanks niketnilay,
However I don't understand all the query.
I'll give you here the exact fields of the CSV file:
"File_Name","Title","ID","Doc_Category_BMS","Confidentiality","Last_Update","Version","Responsible_A_Manager","Responsible_B_Manager","Accountable_Owner","Support_Contributor","Informed_Users","Approval_Status"
"Car_Policy.docx","Car_Policy","ITMS-4-13","Policy",Confidential,"9/11/2015 11:39:57","7.0","John","Mick","Robert","Miles","All Users","0"
"Mobile_Policy","Mobile_Policy","ITMS-4-16","Policy",Confidential,"31/01/2017 16:38:40","9.0","Allan","Mick","Robert","Miles","All Users","0"
Filname, il the real filename of the document
Title; is the title of the document
ID: id of the document
Doc_Category_BMS, is the category of the document: could be Policy, Procedure, Design,...
Confidentiality, could be Confidential, Limited, Private
Last_Update, is the last time update of the document
Version, the version of the document
Responsible_A_Manager, Person could edit the document
Responsible_B_Manager, Person could edit the document
Accountable_Owner, Owner of the document, which could publish the document (Approve it)
support_Contributor, People can edit the document
Informed_Users, People which can read the document when it's published
Approval_Status, Status of the document, could be Approved, Rejected, Pending, Draft
As explained before, I'll receive the log every day.
This log will contain the information of the document.
A document could be re-uploaded without change in the version. but the last update date could change.
A document could pass from a version to another: 4 ==> 4.1 ==> 4.2 ==> 5
I would like to detect when a document changed: new version and/or new last update date.
Sorry if the explanation is too long.
Thanks again for your help.
@danje57 I have updated my answer based on the details you have provided... I have not printed all the required values but given you hints which you can use to do in case you need the same.
Many thanks!!!
It works like charm!!!!
@danje57... Glad it worked... You might have to consider summary indexing as well... So that older versions of same file are summarized as single row. That way search will perform better for longer duration!