Problem using Join function

JHFRDANALYSIS

I'm a novice working in fraud prevention and would appreciate your help. When I run the following search, it fails, and the job inspector shows excessive time (106.46) on dispatch.evaluate.join. Can you help identify what needs to change to output a chart of Condition_Attrib_17 by Treatment Group?

index=TEST sourcetype="TEST:user_activity" application_id=ABC123 policy_id=" UPDATE" "*"
| dedup data.condition_attrib_22
| rename data. condition_attrib_22 AS data.params.policy
| fields data.params.policy
| eval join_key=data.params.policy
| fields join_key, data.treatment_group
| join type=inner join_key
[search index=TEST sourcetype="TEST:user_activity" application_id=ABC123 policy_id="UPDATE" "*"]
| stats latest(data.request.condition_attrib_17) as Condition_Attrib_17 by data. request.condition_attrib_22
| rename data.request.condition_attrib_22 as join_key
| fields join_key, Condition_Attrib_17
| chart count by Condition_Attrib_17 by data.treatment_group

MuS

If possible share sanitised sample events otherwise we will not be able to actually help 😉

Cheers, MuS

livehybrid

Hi @JHFRDANALYSIS

I would avoid join unless it is absolutely necessary; you can get the chart in a single pass with stats followed by chart. The chart syntax also looks wrong: it should be “chart count over X by Y”, not “chart count by X by Y”.

Something like this should work:

index=TEST sourcetype="TEST:user_activity" application_id=ABC123 policy_id=UPDATE 
| stats latest(data.request.condition_attrib_17) as Condition_Attrib_17 latest(data.treatment_group) as treatment_group by data.request.condition_attrib_22 
| where isnotnull(Condition_Attrib_17) AND isnotnull(treatment_group) 
| chart count over Condition_Attrib_17 by treatment_group

If your key is data.condition_attrib_22 (not data.request.condition_attrib_22), change the stats “by” field accordingly. Also, if multiple treatment_group values can exist per key and you want to count each of them, replace latest(...) with values(...) and then mvexpand treatment_group before charting.
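As a rough sketch of that values(...) plus mvexpand variant, assuming the same field paths as the search above (adjust them if your events differ):

```
index=TEST sourcetype="TEST:user_activity" application_id=ABC123 policy_id=UPDATE
| stats latest(data.request.condition_attrib_17) as Condition_Attrib_17 values(data.treatment_group) as treatment_group by data.request.condition_attrib_22
| where isnotnull(Condition_Attrib_17) AND isnotnull(treatment_group)
| mvexpand treatment_group
| chart count over Condition_Attrib_17 by treatment_group
```

Here values(...) keeps every distinct treatment_group seen for a key as a multivalue field, and mvexpand turns each value into its own row, so a key with two treatment groups is counted once under each.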


JHFRDANALYSIS

This was helpful and gave me new techniques, but it didn't return any data, even though the data is there in the JSON. One thing I noticed: the policy_id for Condition_Attrib_17 is UPDATE, but the policy_id for Treatment_Group is SRF. I modified the search to policy_id IN (UPDATE,SRF), but it still returned nothing, although I can see the data in the JSON. Thank you for volunteering your thoughts to help me learn.
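One way to narrow down an empty result like this is to check which fields are actually extracted for each policy_id before combining them. The search below is only a diagnostic sketch; it assumes the field paths used earlier in this thread, which may not match the real events:

```
index=TEST sourcetype="TEST:user_activity" application_id=ABC123 policy_id IN (UPDATE, SRF)
| stats count as events
        count(data.request.condition_attrib_17) as has_attrib_17
        count(data.treatment_group) as has_treatment_group
        count(data.request.condition_attrib_22) as has_request_key
        count(data.condition_attrib_22) as has_data_key
  by policy_id
```

Because count(field) only counts events where that field is present, a zero in any column points to a field path that does not exist for that policy_id, which would explain why the stats "by" clause or the isnotnull filter drops everything.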

yuanliu

Everybody has already told you that you shouldn't use join in the first place. @MuS asked you to illustrate your data, which is always the best recommendation. Now that you mention your dataset is in JSON, you really have to share or mock the data. Sanitize any sensitive information, but make sure to maintain the structures that matter.

Also, instead of telling volunteers "error when I run this complex SPL snippet", follow these golden rules; nay, call them the four commandments:

Illustrate data input (in raw text, anonymize as needed), whether they are raw events or output from a search (SPL that volunteers here do not have to look at).
Illustrate the desired output from illustrated data.
Explain the logic between illustrated data and desired output without SPL.
If you also illustrate attempted SPL, illustrate actual output and compare with desired output, explain why they look different to you if that is not painfully obvious.
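As an illustration of the first rule, mock events can be built directly in SPL with makeresults so that volunteers can run them without access to your index. The JSON structure and the values K1, A, and T1 below are entirely hypothetical; they are only a guess based on the field names mentioned in this thread, and the real events should be substituted (sanitized) in their place:

```
| makeresults count=2
| streamstats count as n
| eval _raw=case(
    n=1, "{\"policy_id\":\"UPDATE\",\"data\":{\"request\":{\"condition_attrib_22\":\"K1\",\"condition_attrib_17\":\"A\"}}}",
    n=2, "{\"policy_id\":\"SRF\",\"data\":{\"condition_attrib_22\":\"K1\",\"treatment_group\":\"T1\"}}")
| spath
| table policy_id data.request.condition_attrib_22 data.request.condition_attrib_17 data.condition_attrib_22 data.treatment_group
```

Posting something like this together with the desired chart output lets others see exactly where each field lives in the JSON and test a suggested search against the same input.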
