Looked at join and append. Tried both, couldn't get them working. I need your eyes to help me here!
This is my current search where I'd like to actually hold onto some of the subsearch's data to toss them into the table in the outer search to add context. Outer search has hosts and the hashes that were seen on them, and the subsearch sourcetype "fileinfo" has the juicy file data I want for context. Each index is a different work site, full of sourcetypes that are also in every other index/site (but with their own unique data/events):
[search index=site_* sourcetype=fileinfo | dedup hash | fields hash index]
| stats values(host) by index host hash
What I really want to do is simply put fields from the fileinfo sourctype in the inner search onto the end of the stats section. I cannot figure out how to do it though. When I try an inner join, it returns way too many events that don't even have the same fields that I'm trying to match on. I want to join data right?
Example of current output:
host hash timesseen
abc1host 12389hu4t223eg732327gfr2367 12-25-2015
Example of what I want:
host hash timesseen fileinfo_mime_type file_info_date_created
abc1host 12389hu4t223eg732327gfr2367 12-25-2015 gif 03-01-2009
Anyone?
Thanks for the answers. I did end up figuring it out myself:
Orignal:
index=site_* sourcetype=host_hashes [search index=site_* sourcetype=fileinfo mime_type="gif" | dedup hash | fields hash index]
| stats values(host) by index host hash
Working one with join:
index=site_* sourcetype=host_hashes [search index=site_* sourcetype=fileinfo mime_type="gif" | dedup hash | fields hash index]
| join hash [search index=*_fn sourcetype=fileinfo]
| stats count by index host hash fileinfo_mime_type fileinfo_date_created
The trick was understanding that my initial subsearch is part of the "outer search" of the join. The "inner search" is the subsearch after the join command. So yeah, two subsearches made it tricky.
So yeah - what I'm doing is asking "give me every hash that is a gif via the fileinfo sourcetype, now tell me if any of those hashes have been seen on our hosts via our host_hashes sourcetype, then finally append useful data right back from fileinfo on the results.
Thanks for the answers. I did end up figuring it out myself:
Orignal:
index=site_* sourcetype=host_hashes [search index=site_* sourcetype=fileinfo mime_type="gif" | dedup hash | fields hash index]
| stats values(host) by index host hash
Working one with join:
index=site_* sourcetype=host_hashes [search index=site_* sourcetype=fileinfo mime_type="gif" | dedup hash | fields hash index]
| join hash [search index=*_fn sourcetype=fileinfo]
| stats count by index host hash fileinfo_mime_type fileinfo_date_created
The trick was understanding that my initial subsearch is part of the "outer search" of the join. The "inner search" is the subsearch after the join command. So yeah, two subsearches made it tricky.
So yeah - what I'm doing is asking "give me every hash that is a gif via the fileinfo sourcetype, now tell me if any of those hashes have been seen on our hosts via our host_hashes sourcetype, then finally append useful data right back from fileinfo on the results.
You may want to throw a stats in the 2nd subsearch (join). This will improve your search performance. Something like this
... | join hash [search index=*_fn sourcetype=fileinfo | stats count by hash fileinfo_mime_type fileinfo_date_created] ....
The information I have to go on is limited, but I think this will work.
index=site_* (sourcetype=hashes_on_hosts OR sourcetype=fileinfo) | stats values(fileinfo_mime_type) AS fileinfo_mime_type values(file_info_date_created) AS file_info_date_created by host hash
Thanks! I tried that but it didn't work. I need to filter on mime_type, but if I do that then I lose all of the events from the hashes_hosts sourcetype since they don't have a mime_type, losing all of the context.
That said, I found my answer.
Can you provide the query that is generating your "Current output" in the question? How many hashes are available in both sourcetypes?
The current output is produced by the query above it. Well, I rearranged what's in stats, but it's the same besides that.
My join search that isn't working is (bring back tons of events without a hash field in them):
index=site_* sourcetype=hashes_on_hosts | join hash [search index=site_* sourcetype=fileinfo]