I have 3 data sets that I need to combine with 1 data set not having a field to perform a compare. I initially started with a join but moved away from that given the limited results returned and also to save on processing time of running the query. My data sets are:
Users Index
index=users sourcetype=userlist
This has the following fields
user_id
user_title
user_name
Workstations Index
index=workstations sourcetype=machines
This has the following fields
pc_id
pc_type
user_name
I can then do a stats instead of join on this data using user_name as the "join". So my query is as follows (note a user can have more than 1 PC hence the mvexpand to break into individual entries)
(index=users sourcetype=userlist) OR (index=workstations sourcetype=machines)
| stats values(*) as * by user_name
| mvexpand pc_id
| table pc_id pc_type user_name
I now want to take the information I have from this search with various other filters from another source (for example
index=windows sourcetype=connections which has lots of fields but I am only interested in the following
device_name
last_activity_time
I want to be able to match the results from my combined search and extend this to check against the 3rd index, matching on device_name against pc_id. I am thinking that my query might add another statement into the first line of my SPL for example
(i.e. index=users ...) OR (index=workstations .....) OR (index=windows | fields device_name last_activity_time)
| stats values(*) as * by user_name
however I don't think this will work as user_name is not a field in the "windows" index. I am not sure how to now extend this to then look into another index without then doing a join.
hi @willadams,
Use coalesce function on fields pc_id and device_name, then do stats on pc_id and user_name.
(index=users sourcetype=userlist) OR (index=workstations sourcetype=machines) OR index=windows sourcetype=connections
| fields user_id, user_title, user_name, pc_id, pc_type, device_name, last_activity_time
| eval pc_id=coalesce(pc_id, device_name)
| stats values(*) as * by pc_id, user_name
| mvexpand pc_id
hi @willadams,
Use coalesce function on fields pc_id and device_name, then do stats on pc_id and user_name.
(index=users sourcetype=userlist) OR (index=workstations sourcetype=machines) OR index=windows sourcetype=connections
| fields user_id, user_title, user_name, pc_id, pc_type, device_name, last_activity_time
| eval pc_id=coalesce(pc_id, device_name)
| stats values(*) as * by pc_id, user_name
| mvexpand pc_id
@manjunathmeti thanks. I have run into a problem though and realised that 2 of my indexers have the same field "names" in the index but different values. Do you know if there is anyway to do a rename of the fields before stats based on the source of the data and then to do values as post processing? For example
index=users sourcetype=userlist) OR (index=workstations sourcetype=machines) OR (index=windows sourcetype=connections | rename last_activity_time AS win_last_activity)
This will generate a error in search (unable to parse the search: unbalanced parentheses). I have tried adjusting to something similar such as
index=users sourcetype=userlist) OR (index=workstations sourcetype=machines) OR (index=windows sourcetype=connections [| rename last_activity_time AS win_last_activity])
or even
index=users sourcetype=userlist) OR (index=workstations sourcetype=machines) OR (index=windows sourcetype=connections (| rename last_activity_time AS win_last_activity))
but have no luck so far. The only solution so far would suggest a change to my data source which would be a massive undertaking. My only other thoughts may be to have a lookup table done from my last data set and change the values and then update this query using something like
index=users sourcetype=userlist) OR (index=workstations sourcetype=machines) OR (lookup pc_activity_log.csv)
but would much prefer to keep it in a single query if at all possible.
I found a "dirty" way to then use the existing query and then run a join after to the additional data source
You can create a new field based on the field name and assign value only if a particular index name matches.
Try this,
index=users sourcetype=userlist) OR (index=workstations sourcetype=machines) OR (index=windows sourcetype=connections) | eval win_last_activity=if(index="windows", last_activity_time, ""), work_last_activity=if(index="workstations", last_activity_time, "")
And use win_last_activity and work_last_activity fields for the next operations.
I am testing using source instead, but trying to figure out an issue wherein it doesn't recognise the field name. I confirmed this with the eval and instead of "" did a "0" and this returns the "0" not the actual values of that field.
Will feedback once I work my way around it.
Thanks again @manjunathmeti. Did consider the eval but was trying to do it in line with the multi-index search, however never considered to use eval with the if statement. Will give it a crack and see what differences I have when my query has run.