I have a list of accounts that I wish to monitor in a csv file, say accounts.csv.
The file looks like:
userid,username
joeuser,Joe User
jimuser,Jim User
Janeuser,Jane User
And the logs are in the format of:
operation=openedfile
UserId=joeuser@domain.com
so i need to compare the CSVs userid field to the UserId (stripping away the @ so they are the same).
I would like to produce a table showing all the accounts in the csv file and when they performed an activity (or what the activity was).
The trick is to create a table that shows all users in the CSV (not all that were found in the search - just the CSV), whether or not they performed an action. If they never did anything, then the last_activity field would be blank (or some message like "not logged in"). I can get close by doing something like:
index=unix_security
| dedup userid
| rename _time as last_login
| append [inputlookup accounts.csv]
| table userid name department last_login
| sort -userid
I believe I need to do a join somehow, but I can't seem to get the syntax right, I'm stilling learning this.
If anyone can provide some pointers that would be greatly appreciated.
Thanks!
EDIT
I found this but can't seem to make the answer work as my userid from my events & csv are a bit different.
https://answers.splunk.com/answers/589551/get-last-login-time-based-upon-a-list-of-accounts-1.html
You don't mention where the field department
comes from (the csv file or the events), so I'm going to assume it's in the csv file. This will need some slight adjustment if department
is in the base events.
index=unix_security
[| inputlookup accounts.csv
| fields userid
| eval userid=userid."@*"]
| stats latest(_time) AS last_login BY userid
| rex field=userid "^(?<userid>[^@]+)"
| append [| inputlookup accounts.csv | eval last_login=0 ]
| stats max(last_login) AS last_login, latest(username) AS name, latest(department) AS department BY userid
| sort - userid
As you mentioned that you're still learning, I'll walk through it, chunk by chunk. First, we gather events relating only to the users in the csv by using a subsearch to gather the userid
values and adding "@*"
to the end of each one, as you mentioned that the raw events contain the full email address but the csv file contains the username (portion of the email before the @ sign). If all the userids belong to the same domain, you might replace "@*"
with "@yourdomain.com"
or whatnot.
index=unix_security
[| inputlookup accounts.csv
| fields userid
| eval userid=userid."@*"]
After that, I replaced your dedup
command with this stats call:
| stats latest(_time) AS last_login BY userid
This more efficiently returns the latest value of _time for each userid, which also dedups the results. I next truncate the userid values from the raw events:
| rex field=userid "^(?<userid>[^@]+)"
Then I append in the whole list from the csv, and add last_login=0
to each user so that this call:
| stats max(last_login) AS last_login, latest(username) AS name, latest(department) AS department BY userid
can either preserve the 0 (if the user was not seen in the raw events) or carry through the last_login value from the events. This also preserves the username and department if they were in the csv. If you need to extract them from the raw events, you'll need to add latest(username) AS name, latest(department) AS department
to the first stats call, as well. Finally, I preserve the sorting you intended:
| sort - userid
I hope this is helpful!
I'm trying to run this and it's not returning any results, i think the problem lies here (?)
index=unix_security
[| inputlookup accounts.csv
| fields userid
| eval userid=userid."@*"]
my csv contains userids but my events actually contain full email address (userid@domain)
This line should be accounting for that:
| eval userid=userid."@*"
That should transform a userid from the csv file, such as jsmith
and turn it into jsmith@*
. So then the result of this:
index=unix_security
[| inputlookup accounts.csv
| fields userid
| eval userid=userid."@*"]
would be a search like this:
index=unix_security (userid="jsmith@*" OR userid="dburns@*" OR userid="alews@*" ...)
But of course I believe you when you say it's not working, so let's start with this:
| inputlookup accounts.csv
| fields userid
| eval userid=userid."@*"
| format
Does that return something like userid="jsmith@*" OR userid="dburns@*" OR userid="alews@*" ...
?
If so, then you could manually troubleshoot by running this:
index=unix_security
| stats latest(_time) AS last_login BY userid
See if the list of userid
values contains users from the csv - and, in particular, users whose userid
values match that pattern of jsmith@*
, etc. Let me know if you find the discrepancy, and I can help you adjust the search as needed.
Any luck on this? Do you still want some help?
You're the man - thanks so much. This got me exactly where I wanted to be!
Great! Glad it's working. 🙂