I tried many alternatives but no good results. Please help if possible.
I have a multi search with two sets of results:
31/12 9:00 cat sofa red
31/12 9:00 dog bed blue
30/12 9:00 dog bed red
29/12 9:00 dog door orange
29/12 9:00 cow sofa blue
31/12 10:00 dog
31/12 3:00 cat
31/12 1:00 cow
30/12 11:00 dog
30/12 2:00 dog
30/12 1:00 cow
29/12 4:00 cat
29/12 11:00 cat
29:00 5:00 dog
I need to use multisearch because both tables can be huge. I need to merge the results following:
Where was the animal with which color on each event.
The place should look for the last 24 hours info only from the location table.
The events result without match stay. The locations info without match need to be excluded.
Your help is much appreciated,
Happy New Year.
Okay, as far as logic is concerned,
multisearch is exactly the same as an
OR, so I'll pseudocode this with an
OR. Try it both ways and see what actual performance is on your systems.
I'm going to assume that color is a detail not directly modifying the animal or the location. That is, dog bed red is not a red dog in the bed or a dog in a red bed. Let's say it's the color of a balloon that happens to be in the location. If you want to differentiate between the location of a red dog and a blue dog, then we'll need more discussion.
Given your 24 hour stricture, and the fact that your location info seem to exist once a day, I'm going to use streamstats to copy the most recent location data to the animal, subject to a 24 hour window. I'm also going to sort to make sure the location data appears before the animal data in the sort order for a given hour. You'll have to revisit that assumption for your real, confidential use case.
I pseudocode this query thus:
( your search that gets hour animal location color ) OR ( your search that gets hour and animal ) | fields _time animal location color | rename COMMENT as "the above selects the records and limits to only data we need" | rename COMMENT as "roll the location data (within 24 hours) over to the animal record" | eval sortlocationfirst=if(isnotnull(location), "1","2") | sort 0 _time sortlocationfirst | rename COMMENT as "roll the location data (within 24 hours) over to the animal record" | streamstats time_window=24h last(location) as location last(color) as color by animal | rename COMMENT as "Now throw away ALL location records, since the data we needed has been copied. " | where (it is an animal record)
If there is some other data you need to retain on the location records, then there would be various strategies to identify which ones have been used or not use. Let us know, and we can help you with that mod.
Thanks a lot it worked very well but I am having some issues.
Is there any replacement for the command time_window? This command is limited to 10000 events but I have a bigger number.
I would use one search to construct a
time-based lookup table (https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Configureatime-boundedlookup) from one dataset. Then use this lookup in another search.