Getting Data In

Input CSV and run search against subset of users

cewing082
New Member

I have a list of accounts that I wish to monitor in a csv file, say accounts.csv.

The file looks like:
userid,username
joeuser,Joe User
jimuser,Jim User
Janeuser,Jane User

And the logs are in the format of:

operation=openedfile
UserId=joeuser@domain.com


so i need to compare the CSVs userid field to the UserId (stripping away the @ so they are the same).

I would like to produce a table showing all the accounts in the csv file and when they performed an activity (or what the activity was).

The trick is to create a table that shows all users in the CSV (not all that were found in the search - just the CSV), whether or not they performed an action. If they never did anything, then the last_activity field would be blank (or some message like "not logged in"). I can get close by doing something like:

index=unix_security
| dedup userid
| rename _time as last_login
| append [inputlookup accounts.csv]
| table userid name department last_login
| sort -userid

I believe I need to do a join somehow, but I can't seem to get the syntax right, I'm stilling learning this.

If anyone can provide some pointers that would be greatly appreciated.

Thanks!

EDIT
I found this but can't seem to make the answer work as my userid from my events & csv are a bit different.

https://answers.splunk.com/answers/589551/get-last-login-time-based-upon-a-list-of-accounts-1.html

Tags (2)
0 Karma

elliotproebstel
Champion

You don't mention where the field department comes from (the csv file or the events), so I'm going to assume it's in the csv file. This will need some slight adjustment if department is in the base events.

index=unix_security 
    [| inputlookup accounts.csv 
    | fields userid 
    | eval userid=userid."@*"] 
| stats latest(_time) AS last_login BY userid 
| rex field=userid "^(?<userid>[^@]+)"
| append [| inputlookup accounts.csv | eval last_login=0 ]
| stats max(last_login) AS last_login, latest(username) AS name, latest(department) AS department BY userid
| sort - userid

As you mentioned that you're still learning, I'll walk through it, chunk by chunk. First, we gather events relating only to the users in the csv by using a subsearch to gather the userid values and adding "@*" to the end of each one, as you mentioned that the raw events contain the full email address but the csv file contains the username (portion of the email before the @ sign). If all the userids belong to the same domain, you might replace "@*" with "@yourdomain.com" or whatnot.

index=unix_security 
    [| inputlookup accounts.csv 
    | fields userid 
    | eval userid=userid."@*"] 

After that, I replaced your dedup command with this stats call:

| stats latest(_time) AS last_login BY userid 

This more efficiently returns the latest value of _time for each userid, which also dedups the results. I next truncate the userid values from the raw events:

| rex field=userid "^(?<userid>[^@]+)"

Then I append in the whole list from the csv, and add last_login=0 to each user so that this call:

| stats max(last_login) AS last_login, latest(username) AS name, latest(department) AS department BY userid

can either preserve the 0 (if the user was not seen in the raw events) or carry through the last_login value from the events. This also preserves the username and department if they were in the csv. If you need to extract them from the raw events, you'll need to add latest(username) AS name, latest(department) AS department to the first stats call, as well. Finally, I preserve the sorting you intended:

| sort - userid

I hope this is helpful!

cewing082
New Member

I'm trying to run this and it's not returning any results, i think the problem lies here (?)

 index=unix_security 
     [| inputlookup accounts.csv 
     | fields userid 
     | eval userid=userid."@*"] 

my csv contains userids but my events actually contain full email address (userid@domain)

0 Karma

elliotproebstel
Champion

This line should be accounting for that:

| eval userid=userid."@*"

That should transform a userid from the csv file, such as jsmith and turn it into jsmith@*. So then the result of this:

index=unix_security 
    [| inputlookup accounts.csv 
    | fields userid 
    | eval userid=userid."@*"] 

would be a search like this:

index=unix_security (userid="jsmith@*" OR userid="dburns@*" OR userid="alews@*" ...)

But of course I believe you when you say it's not working, so let's start with this:

| inputlookup accounts.csv 
| fields userid 
| eval userid=userid."@*"
| format

Does that return something like userid="jsmith@*" OR userid="dburns@*" OR userid="alews@*" ...?
If so, then you could manually troubleshoot by running this:

 index=unix_security 
| stats latest(_time) AS last_login BY userid 

See if the list of userid values contains users from the csv - and, in particular, users whose userid values match that pattern of jsmith@*, etc. Let me know if you find the discrepancy, and I can help you adjust the search as needed.

0 Karma

elliotproebstel
Champion

Any luck on this? Do you still want some help?

0 Karma

cewing082
New Member

You're the man - thanks so much. This got me exactly where I wanted to be!

0 Karma

elliotproebstel
Champion

Great! Glad it's working. 🙂

0 Karma
Get Updates on the Splunk Community!

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud  In today’s fast-paced digital ...

Observability protocols to know about

Observability protocols define the specifications or formats for collecting, encoding, transporting, and ...

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...