Splunk Search

How to create a report that reads all the data from a csv file?

szrobag
Explorer

Hello Splunkers,

Help me please. I need a search to generate a daily report looking for users' traffic in internal logs. I have a csv file, generated daily by an external system, which contains a username and a start-end time period, like this:

report.csv
user,start_time,end_time
user1,8,16
user2,8,20

I have to insert these three fields per user into my search. I am using inputlookup to catch the "user" field this way:

[base search] 
| search user=*[|inputlookup "report.csv" |fields user ]*
| table x,y,z,user

It works, showing only the user-related logs; there could be one or more users in the csv. The problem I cannot handle yet is the additional fields. I had an idea to add an extra field with the "eval" command, but it doesn't work. So how can I read the rest of the data from an external csv file?

thanks 

1 Solution

gcusello
SplunkTrust

Hi @szrobag,

do you want to filter results using the usernames in the lookup, or add the start_time and end_time information to the results?

If the first, you have to use a subsearch like the following:

[base search] [ | inputlookup "report.csv" | fields user ]
| table x,y,z,user

paying attention that the field "user" in the subsearch is the same as in the main search (it's case sensitive).

If the second, you have to use the lookup command, which works something like an inner join; see something like this:

[base search] 
| lookup "report.csv" user OUTPUT start_time end_time
| table x y z user start_time end_time
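The looked-up fields can then also be used to keep only the events inside each user's window. A sketch, assuming start_time and end_time are whole hours of the day, as in the sample csv (the field name event_hour is hypothetical):

```spl
[base search]
| lookup "report.csv" user OUTPUT start_time end_time
| eval event_hour = tonumber(strftime(_time, "%H"))   ``` hour of day of the event ```
| where event_hour >= start_time AND event_hour < end_time
| table x y z user start_time end_time
```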

Ciao.

Giuseppe


PickleRick
SplunkTrust

If you simply want to use an externally sourced lookup to generate a set of conditions for your search, you can just use a subsearch. Remember that the subsearch is executed _before_ the main search and as a result renders a string which is substituted into the main search. So your search

|inputlookup "report.csv" |fields user 

will return a set of results with just the user field, which in turn will get rendered into

user="user1" OR user="user2" OR user="user3" ...

So by using clever renames you can reformat your search to include earliest/latest conditions (probably using some evals in the middle to render proper timestamps from your lookup fields). If you make your subsearch return three fields - user, earliest, latest - it will get rendered as:

(user="user1" earliest=earliest1 latest=latest1) OR (user="user2" earliest=earliest2 latest=latest2) ...
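Such a subsearch might look like the sketch below, assuming the lookup's start_time/end_time are hours of the current day (relative_time(now(), "@d") gives today's midnight as an epoch value; earliest/latest accept epoch numbers):

```spl
[base search]
    [ | inputlookup "report.csv"
      | eval midnight = relative_time(now(), "@d")
      | eval earliest = midnight + start_time * 3600
      | eval latest   = midnight + end_time * 3600
      | fields user earliest latest ]
| table x y z user
```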

So you can just insert your subsearch into the main search - don't use the "search | search [ subsearch ]" construction. Splunk might be able to optimize it anyway, but generally what you're asking it to do is to search for all events and then filter them out, which might be much less efficient than selecting just a subset from the indexes.

Having said that - generating too many conditions might also be detrimental to the performance of the search so it _might_ (but doesn't have to) be better to consider other ways to filter those events.



szrobag
Explorer

Hello Giuseppe,

Thank you for reply. 
My index contains logs about a lot of users, so my idea was to filter it first to the users added in the csv file.
So, I chose inputlookup to filter to 3-4-5 (usually up to ten) users first. The "user" field is a common field in the csv and the indexed data, so it is like an inner join. User filtering is okay (as in your first query), but I couldn't work with the two added start_time and end_time fields, even with them added in the OUTPUT part. They are needed to filter the users' logs to the specified time range.

Using inputlookup to filter users first, and then using your query

| lookup "report.csv" user OUTPUT start_time end_time

to extract data from the two fields, seems to be working now.
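Put together, the whole search might look like this sketch (assuming, as above, that start_time and end_time are numeric hours of the day, and that event_hour is a hypothetical helper field):

```spl
[base search] [ | inputlookup "report.csv" | fields user ]
| lookup "report.csv" user OUTPUT start_time end_time
| eval event_hour = tonumber(strftime(_time, "%H"))
| where event_hour >= start_time AND event_hour < end_time
| table x y z user start_time end_time
```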

 


gcusello
SplunkTrust

Hi @szrobag,

good for you, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉
