Hi!
I have spent a couple of days now hunting for an answer with similar issues, but I am no closer to finding a solution. The gist of it is that each org has several case files, and multiple documents within each case. I am trying to track the user accounts to see the volume of documents they are accessing.
The information is kept in two separate log files: the user log output in userlog; and the request log accessible through access_combined. The request log doesn't store the user ID and the userlog doesn't store the document access, only the reference page with links. I have managed to get all the data I need, I just cannot get it to display correctly. The code I have so far to get the data is:
sourcetype=userlog page='/cases/case_docs' userid=$userid$ earliest=-24h
| rex field=url "(?<matchORG>J\d{6}J),(?<caseID>\d{5}\w{1,2})," max_match=0 
| rex field=WebServiceHost "host=\'(?<host>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\'"
| lookup orgName.csv ORGCODE as matchORG OUTPUT NAME 
| rename NAME as orgName 
| fields host userid orgName matchORG caseID
| eval clientip = host
| join clientip max=0
[search sourcetype=access_combined earliest=-24h
| fields file clientip CASE_ID ORG_ID accessTime uri_path
| where like(file,"%.pdf")
| where like(uri_path,"%/service/document/%") 
 ]
| where match(CASE_ID,caseID)
The fields...
file
uri_path
CASE_ID
ORG_ID
...are already predefined, extracted, and captured by Splunk.
Where I am hitting a dead end is attempting to show the number of documents access by case file, and the overall documents accessed by organization. My desire is to have it show up as the Org Name, total document count, case ID, document count per case ID. Something like the image below:
(Note: There is a typo in the table. JxxxxxxJ should instead be OrgA, OrgB, etc.)
I have looked at options from other similar posts, but I just can't seem to get them work. And the one close to working, modifying the answer on a different similar question, only gives me partial results (but they all appear in the events tab):
| stats count by orgName  caseID
| stats values(caseID) as Cases values(count) as CaseCount sum(count) as Total by orgName
If anyone can point me in the right direction, it would be greatly appreciated. Also, any tips on streamlining this code would be appreciated as well.
Addendum: Stuck at Splunk Enterprise 6.1.2 for...reasons.
You can use the below stats commnd after your query
| stats count  by CaseID OrgName|appendpipe[ |stats sum(count) as total by OrgName]|eval count=if(isnull(count),total,count)| fields - total | sort by TransactionID ,count desc
You can use the below stats commnd after your query
| stats count  by CaseID OrgName|appendpipe[ |stats sum(count) as total by OrgName]|eval count=if(isnull(count),total,count)| fields - total | sort by TransactionID ,count desc
Thanks for the stats command. It needed a bit of tweaking to get a result - namely since I was failing to retrieve the orgName from the join, and that was one of the keys in the stats. Basically, I added the following into the joined search:
| lookup orgName.csv ORGCODE as ORG_ID OUTPUT NAME 
| rename NAME as foundOrgName 
and added that to the fields within the join.
All I had to do then was modify your provided stats command to reflect the foundOrgName (yeah, not very original when it comes to field names):
| stats count by caseID foundOrgName
| appendpipe[ |stats sum(count) as total by foundOrgName]
| eval count=if(isnull(count),total,count)
| fields - total 
| sort by TransactionID ,count desc
Thank you for your help!
If I could trouble you further, as a Splunk newbie, can you give me a breakdown of how and why and the what of the stats flow here to make it work? For instance, I am not sure where you are getting "fields" or "TransactionID" from.
TransactionID is not right in the context of your query. I was querying on my system that is why it got pasted here. You can sort by OrgName instead.
what is the output you are getting now?
The stats output from my test data is this (having issues formatting it on here as a table...)
OrgA        28
20181F  3
20182F      4
20183F      5
20184F 
20185F
20186F
20187F
And I just noticed that the displayed results for the document count per case is incorrect. The values should be:
20181F     4
20182F     4
20183F     4
20184F     3
20185F     4
20186F     4
20187F     5
Why are you joining only on host and not on host,caseid ? Also send the result only of the join without applying any stats.
The first part of your question is...um...I didn't know I could honestly. All the join examples that I saw in the documentation show only one field used for the join. That said, testing with adding:
| eval CASE_ID = caseID
| join clientip CASE_ID max=0
and removing the
| where match(CASE_ID,caseID)
Seemed to speed it up. Thank you for that.
As for the second part of your question, I am not sure I understand exactly what you are asking. The way the join search is written, I do believe I am simply returning results without any stats. I did try to do some stat processing in the join search before, but that was tanking everything, and I got zero results back.
I think I see what you were asking for on rereading. The output from the join search (using 2018F7 as the example)
1   
2   
3   
4   
5   
