I am working on an auditing report for Splunk searches.
My initial goal is to see which searches a user made in a session. I want to add more information to the session report later.
It is quite easy to find searches:
index=_audit search_id=* action=search info=granted | regex search="^'search.*" | table timestamp, search_id, search
And then I can manually find the sessionid in the _internal index:
index=_internal sid=1427186389.188 | rex field=other "^- (?<sessionid>\w+)" | table _time, sid, other, sessionid
But when I try to join the two so that I can correlate the sessionid with the _audit events, I get no results:
index=_audit search_id=* action=search | join search_id [ search index=_internal | eval search_id=sid ]
I can remove action=search from the join query, but then I don't get the search terms.
Can you provide a sample of what you see from the second search result? I'm not able to find any logs in my environment that seem to include a sessionid that would be matched by that regex. As a general rule, you are likely to benefit from searching index=_audit OR index=_internal and then using stats values() by search_id for whatever information you want -- joins aren't the most efficient way to approach this challenge in Splunk, and you're likely to run into scalability issues.
That said, in general, I would recommend you try an app I built -- Search Activity (https://splunkbase.splunk.com/app/2632/). It doesn't have the exact report you're referring to (though if I can figure out how to replicate it, I will), but it does allow you to explore a user's activity, seeing when they first log in and then what dashboards they go to, what searches they run, etc., in addition to a lot (a lot) of other metrics and an accelerated datastore.
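A minimal sketch of the combined-index approach described above. It assumes the _audit events carry search_id wrapped in single quotes and that the _internal web access events expose the search ID in a sid field; both are assumptions about your data, so adjust the field names to whatever you actually see:

```
(index=_audit action=search info=granted search_id=*) OR (index=_internal sourcetype=splunk_web_access sid=*)
| eval search_id=coalesce(sid, trim(search_id, "'"))
| stats values(search) AS search values(user) AS user values(index) AS source_index by search_id
```

The eval normalizes both event types onto one search_id key, so stats can group them without a join. The source_index column is only there to show which indexes contributed to each row.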
Thank you David. The second search gives me results like this. I am only guessing that the sessionid is the second-to-last word. It doesn't match the cookie I can see in my browser tools, though.
127.0.0.1 - admin [24/Mar/2015:09:40:02.594 +0100] "GET /en-US/api/shelper?snippet=true&snippetEmbedJS=false&namespace=search&search=search+index%3D_aud&useTypeahead=true&useAssistant=true&showCommandHelp=true&showCommandHistory=true&showFieldInfo=false&_=1427183824768 HTTP/1.1" 200 847 "http://localhost:8000/en-US/app/search/search?q=search%20index%3D%22swipp_dci_utst%22%20qwerty12345&earliest=1136070000&latest=1451602800&display.prefs.events.count=50&display.page.search.tab=events&display.general.type=events&sid=1427186389.188" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36" - 551122e298aa08c88 13ms
I'll take a look at your app. It might do most of what I want.
I am looking for a way to let the user enter a reason for doing the searches. This is to help everybody figure out why a search was made during a later audit session.
Gotcha. Specifically for your search, here's what I was able to make work in my environment:
(index=_audit info=granted action=search search=*) OR (index=_internal sourcetype=splunk_web_access search jobs) | rex "search='(?<actualsearch>.*?[^\\\])('|\]$|$)" | rex "\s\-\s*(?<session_id>[a-f0-9]{10,30})\s" | rex "search_id=[\"'](?<searchid>.*?)[\"']"| rex "\s\/.*?\/jobs\/(?<searchid>.*?[\._].*?)(\/|\])" | stats values(actualsearch) values(session_id) by searchid
What's the rationale for the audit piece? Is this so that you can do a spot audit and ask someone to validate, or do you want users to provide the reason for searches on an ongoing basis?
I think I made a false assumption :$ that the "551122e298aa08c88" value in the event above is the sessionid. I don't think it is, since it changes. I discovered that (a little late) by:
index=_internal search | rex field=other "^- (?<sessionid>\w+)" | transaction sid | table sid, sessionid
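A rough sketch of a more direct check, reusing the same other field and regex: count the distinct tokens seen for each sid. If the token were a session ID, I would expect mostly one value per sid, so many rows coming back here suggests it is something else:

```
index=_internal sid=*
| rex field=other "^- (?<sessionid>\w+)"
| stats dc(sessionid) AS distinct_tokens values(sessionid) AS tokens by sid
| where distinct_tokens > 1
```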
I am not sure what I set out to do is even possible. If the Splunk sessionid is not in the log files, I think what I want to do will be very hard to achieve.
The overall business requirement is that "there must be a good reason to access log data on production systems" and there must be a way to audit that.
We narrowed that down to a "reason" per session being acceptable, so I set out to build a searches-per-session report. But there might be other ways to approach the problem. Any ideas on that are most welcome.
Thanks again, David
So.. if you want to do this with traditional tools, you could use the transaction command. This would probably look like:
index=_audit info=granted action=search search=* | rex "search='(?<actualsearch>.*?[^\\\])('|\]$|$)" | rex "search_id=[\"'](?<searchid>.*?)[\"']" | fields _time user actualsearch searchid | transaction maxpause=25m user | table _time user actualsearch searchid
Transaction isn't particularly quick, but the above wouldn't be too bad. It will collect all the events into a single event, so you could then do whatever you want with it.
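As a sketch of what you could do with the grouped result: transaction adds a duration field (in seconds) and an eventcount field to each combined event, so a per-session summary might look like the search below. The session_minutes name is purely illustrative, and the 25 minute pause is the same guess at a session boundary as above:

```
index=_audit info=granted action=search search=*
| rex "search_id=[\"'](?<searchid>.*?)[\"']"
| transaction maxpause=25m user
| eval session_minutes=round(duration/60, 1)
| table _time user session_minutes eventcount searchid
```

Each row is then one inferred "session" per user, with searchid as a multivalue field listing the searches that fell inside it.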
Looking broader, though, you might want to reconsider the approach, and request a reason when someone initiates a new session. This would likely require some custom JavaScript coding (Professional Services can help here), but it might more elegantly meet your needs.
Looking even broader, there are a ton of valid reasons for searching production logs. If you discourage people from doing so, you won't be getting the most out of your Splunk investment. There are numerous examples of a limited security or IT use case that is valuable to the project sponsor, who then allows other teams (security, IT, marketing, product management, etc.) to access the data and create insane ROIs. That data-driven culture is a really powerful tool for improving organizations. If I were you, I would try to look at controlling access to really sensitive data and allowing access to the rest, or trying to avoid limitations through other means.
I very much agree with your analysis 🙂