I'm trying to join 3 types of data, we have recordings which belong to a shower which belongs to a user. For some users the join works perfectly but for others, it isn't returning anything.
A simplified version of the join looks something like this:
index="iot-production" trigger="NewShowerEvent" | spath output=serial_number path=object.serial_number | join serial_number
max=0 [ search index="iot-production" trigger="NewRecordingEvent" | spath output=serial_number path=object.serial_number ] |
search serial_number=1004285
But I am getting 0 results for this. If I run the 2 queries separate:
index="iot-production" trigger="NewShowerEvent" | search object.serial_number=1004285
index="iot-production" trigger="NewRecordingEvent" | search object.serial_number=1004285
I get results in both with the serial numbers matching.
What could be the reasoning behind this?
In such cases, I try to make a RegEx-based field extraction that ignores the XML/JSON and pulls out the fields/values that I need for this sourcetype
to do a proper base search with filters to keep only the stuff that I know that I need and then pass the relevant subset of events to spath
to do the fine-detail work that requires the other pile of fields.
Try this (you missed object.
in your search condition)
index="iot-production" trigger="NewShowerEvent" | spath output=serial_number path=object.serial_number | join serial_number
max=0 [ search index="iot-production" trigger="NewRecordingEvent" | spath output=serial_number path=object.serial_number ] | search object.serial_number=1004285
Also, join command is expensive to think about changing your query to use stats, similar to this
index="iot-production" trigger="NewShowerEvent" OR trigger="NewRecordingEvent"
| spath output=serial_number path=object.serial_number | search object.serial_number=1004285
| stats values(fieldyouwant1) as fieldyouwant1 values(fieldyouwant2) as fieldyouwant2.. by object.serial_number
Try each of these
index="iot-production" trigger="NewShowerEvent"
| spath output=serial_number path=object.serial_number
| search serial_number=1004285
| index="iot-production" trigger="NewRecordingEvent"
| spath output=serial_number path=object.serial_number
| search serial_number=1004285
If you get valid results from both of those, then the issue is probably that you have more than 50K results in the right side of the join, or you have run into a timeout situation. In other words, that right side of the join does not i practice include every result that it should, in theory.
This is one reason that we prefer to avoid join
when other verbs will do the job.
If you are only looking for a single returned value, then restrict each side of the search., so the number of results for the subsearch becomes irrelevant.
index="iot-production" trigger="NewShowerEvent" "1004285"
| spath output=serial_number path=object.serial_number
| join serial_number max=0 [ search
index="iot-production" trigger="NewRecordingEvent" "1004285"
| spath output=serial_number path=object.serial_number ]
| search serial_number=1004285
Note - the final search
may not be redundant, based on your data. If there are any other serial numbers that happen to have the number "1004285" as a single unit somewhere in their data fields, they would be allowed back from the initial search(es).
The stats
version of your join is going to look something like this...
index="iot-production" (trigger="NewShowerEvent" OR trigger="NewRecordingEvent" )
| fields (... list every field you need to retain from either kind of event ...)
| spath output=serial_number path=object.serial_number
| stats values(foo) as foo values(bar) as bar values(baz) as baz by serial_number
if you want literally every field that is remaining on the events on both sides, then the last command can be simplified as ...
| stats values(*) as * by serial_number
I do not know precisely how Splunk handles JSONs in that stats clause, so you will need to experiment to verify you are getting everything you need. Also, this will lump both sides together, so if there is a variable named "foo" on each side, you may need to use one of these constructs...
index="iot-production" (trigger="NewShowerEvent" OR trigger="NewRecordingEvent" )
| fields (... list every field you need to retain from either kind of event ...)
| spath output=serial_number path=object.serial_number
| eval foo_shower = case(trigger="NewShowerEvent",foo)
| eval foo_record = case(trigger="NewRecordingEvent",foo)
| stats values(foo*) as foo* values(bar) as bar values(baz) as baz by serial_number
Yes, I have both of those in my question, they are both returning valid results. What can be used instead of a join then?
So I think I'm not conveying my issue good enough here. We're doing our initial search on the email address of the user, this is tied to a NewUserEvent
then we're trying to get a value from the last recording for this user's showers.
So our support team types in an email address and we search for all of the showers for that email.
This is a join between NewUserEvent
and NewShowerEvent
but then we need information for the recording.
So we have an email address that gives us a user, this user has an ID. We join where the ID of the user matches the user_id
of the shower. The shower has a serial_number
we use this to then join the shower with the recording on serial_number
. I hope this makes sense and that I'm explaining it well enough
in psuedocode I basically need SELECT * FROM Users JOIN Showers on Users.id = Showers.user_id JOIN Recordings on Showers.serial_number = Recordings.serial_number where Users.email = "blah@blah.com"
@bshega - the code you posted for each side didn't include the spath, so I wanted to be doubly sure. You should be able to adapt the code I've given you to your use case, based on what information you want to keep from each side of the "join".
Hmm, you might be on to something with the timeout, the spath
really hurts us for the recordings because we have millions of them so I think it's scanning them sequentially with the spath
in there. There has to be a way to search/join without doing spath
on each object, right?