Solved: How to join with rex not working?

harryhcg · ‎01-25-2023

I have 2 index, abc and bcz

index abc data is in raw format like below.

I want to pull 87675678vf6x_ as field1 at-abc.com as field2 and 500 as field3.

index bcz got formatted data.

I now want to compare both indexes with field 1 of index abc with another field7 in bcz where bcz field5="name"
and return field1 field2 and field3.

It looks simple but not working.

gcusello · ‎01-26-2023

Hi @harryhcg,

this is the regex to extract the fields you need:

| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"

that you can test at https://regex101.com/r/EsHXQv/1

About the join, you have two solutions that depends on the number of results of the second search:

if in the second search (index=bcz field5="name") you have more than 50,000 results, you can use the stats command, applying something like this:

index=abc OR (index=bcz field5="name")
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| eval key=coalesce(field1,field7)
| stats 
   dc(index) AS index_count 
   values(field2) AS field2
   values(field3) AS field3
   BY key
| where index_count=2

if instead you have less than 50,000 results, you can use a subsearch:

index=abc 
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

If you create a field extraction for field 1, you can also use a simpler search:

index=abc [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

Ciao.

Giuseppe

View solution in original post

gcusello · ‎01-26-2023

Hi @harryhcg,

this is the regex to extract the fields you need:

| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"

that you can test at https://regex101.com/r/EsHXQv/1

About the join, you have two solutions that depends on the number of results of the second search:

if in the second search (index=bcz field5="name") you have more than 50,000 results, you can use the stats command, applying something like this:

index=abc OR (index=bcz field5="name")
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| eval key=coalesce(field1,field7)
| stats 
   dc(index) AS index_count 
   values(field2) AS field2
   values(field3) AS field3
   BY key
| where index_count=2

if instead you have less than 50,000 results, you can use a subsearch:

index=abc 
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

If you create a field extraction for field 1, you can also use a simpler search:

index=abc [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

Ciao.

Giuseppe

gcusello · ‎02-02-2023

Hi @harryhcg,

good for you, see next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

harryhcg · ‎01-26-2023

@gcusello Thank you for giving time.

Can you help to guide why used search twice? Apologies that not shared earlier but I had similar query with search keyword once and was not giving result.

index=abc | rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]| table field1 field2 field3index=abc 
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

I am getting the result now but have some issue with count which is not corelating with another query result. Hope not duplicating something. The other query is similar and based on index bcz and bda.

For current query output I used

index=abc
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]

| dedup field1, field3
| stats count by field3

@Anonymous field2, 4 and 6 I kept like that and will use on actual result. Trying to fix my errors with expert guidance, get correct logic of my silly mistakes.

gcusello · ‎01-26-2023

Hi @harryhcg,

if you use dedup before a stats command, you'll never have correct results!

Try without dedup.

Ciao.

Giuseppe

yuanliu · ‎01-25-2023

The first thing you need to do is to explain "not working". (Really, try to avoid this meaningless phrase at all cost.) What have you tried? What is the output? What is the desired or "correct" output? How do the two output differ? You mentioned field1, field2, field3, field5, field7. Is there any significance in field4, field6? (Also, even though I can understand the desire to anonymize raw data, spelling a URL as http:\\ is bad practice.)

How to join with rex not working?

join

rex

Tech Talk Recap | Mastering Threat Hunting

Observability for AI Applications: Troubleshooting Latency

Splunk AI Assistant for SPL vs. ChatGPT: Which One is Better?

Are you a member of the Splunk Community?

How to join with rex not working?

join

rex

Tech Talk Recap | Mastering Threat Hunting

Observability for AI Applications: Troubleshooting Latency

Splunk AI Assistant for SPL vs. ChatGPT: Which One is Better?