Hello Splunkers,
Help me please. I need a search to generate a daily report on users' traffic in our internal logs. I have a CSV file, generated daily by an external system, that contains a username and a start-end time period, like this:
report.csv
user,start_time,end_time
user1,8,16
user2,8,20
I have to insert these three fields per user into my search. I am using inputlookup to pick up the "user" field this way:
[base search]
| search user=*[|inputlookup "report.csv" |fields user ]*
| table x,y,z,user
It works and shows only the selected users' logs (there can be one or more users in the CSV). The problem I cannot handle yet is the additional fields. I had an idea to add an extra field with the "eval" command, but it doesn't work. So how can I read the rest of the data from an external CSV file?
thanks
Hi @szrobag,
do you want to filter results using the usernames in the lookup, or add the start_time and end_time information to the results?
If the former, you have to use a subsearch like the following:
[base search] [ | inputlookup "report.csv" | fields user ]
| table x,y,z,user
paying attention that the field "user" in the subsearch is the same as in the main search (it's case sensitive).
If the latter, you have to use the lookup command, which works something like an inner join; see something like this:
[base search]
| lookup "report.csv" user OUTPUT start_time end_time
| table x y z user start_time end_time
Ciao.
Giuseppe
If you simply want to use an externally sourced lookup to generate a set of conditions for your search, you can just use a subsearch. Remember that the subsearch is executed _before_ the main search and as a result renders a string which is substituted into the main search. So your subsearch
|inputlookup "report.csv" |fields user
will return a set of results with just the user field, which in turn will get rendered into
user="user1" OR user="user2" OR user="user3" ...
So by using clever renames you can reformat your subsearch to include earliest/latest conditions (probably using some evals in the middle to render proper timestamps from your lookup fields). If you make your subsearch return three fields - user, earliest, latest - it will get rendered to:
(user="user1" earliest=earliest1 latest=latest1) OR (user="user2" earliest=earliest2 latest=latest2) ...
So you can just insert your subsearch into the main search - don't use the "search | search [ subsearch ]" construction. Splunk might be able to optimize it out anyway, but generally what you're asking it to do is to search for all events and then filter them out, which can be much less efficient than selecting just a subset from the indexes.
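For example, assuming the start_time and end_time values in the lookup are hours of the current day (so 8 means 08:00 today - that conversion is an assumption on my part, adjust to your actual data), such a subsearch could be sketched like this:

[base search]
    [ | inputlookup "report.csv"
      | eval earliest=relative_time(now(), "@d") + start_time*3600
      | eval latest=relative_time(now(), "@d") + end_time*3600
      | fields user earliest latest ]
| table x y z user

relative_time(now(), "@d") snaps to today's midnight, so adding start_time*3600 turns the hour values into epoch timestamps, which is what earliest/latest expect.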
Having said that - generating too many conditions might also be detrimental to the performance of the search so it _might_ (but doesn't have to) be better to consider other ways to filter those events.
Hello Giuseppe,
Thank you for reply.
My index contains logs about a lot of users, so my idea was to filter it first to the users listed in the CSV file.
So I chose inputlookup to filter to 3-5 (usually up to ten) users first. The "user" field is a common field in both the CSV and the indexed data, so it is like an inner join. User filtering is OK (as in your first query), but I couldn't get the two added start_time and end_time fields to work, even when they were added in the OUTPUT part. They are needed to filter the users' logs to the specified time range.
Using inputlookup to filter users first, and then using your query
| lookup "report.csv" user OUTPUT start_time end_time
to extract the data from the two fields seems to be working now.
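Putting it together, a sketch of the combined search (assuming start_time and end_time are hours of the day, and using Splunk's default date_hour field for the comparison - adjust if your times mean something else):

[base search]
    [ | inputlookup "report.csv" | fields user ]
| lookup "report.csv" user OUTPUT start_time end_time
| where date_hour>=start_time AND date_hour<end_time
| table x y z user start_time end_time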
Hi @szrobag,
good for you, see you next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉