Splunk Search

How to join with rex not working?

harryhcg
Explorer

I have 2 index, abc and bcz

index abc data is in raw format like below.

<random ip address>|-NA\CAPITA|5xxhxh545|jljdjhsdhj78987|hkjhkdjfkjfd5672v2hg7|87675678vf6x_
<random date time> "GET http:\\at-abc.com http/1.1" 500 <random values>

I want to pull 87675678vf6x_ as field1 at-abc.com as field2 and 500 as field3.

index bcz got formatted data.

I now want to compare both indexes with field 1 of index abc with another field7 in bcz where bcz field5="name"
and return field1 field2 and field3.

It looks simple but not working.

 

Labels (2)
Tags (1)
0 Karma
1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @harryhcg,

this is the regex to extract the fields you need:

| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"

that you can test at https://regex101.com/r/EsHXQv/1

About the join, you have two solutions that depends on the number of results of the second search:

if in the second search (index=bcz field5="name") you have more than 50,000 results, you can use the stats command, applying something like this:

index=abc OR (index=bcz field5="name")
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| eval key=coalesce(field1,field7)
| stats 
   dc(index) AS index_count 
   values(field2) AS field2
   values(field3) AS field3
   BY key
| where index_count=2

if instead you have less than 50,000 results, you can use a subsearch:

index=abc 
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

If you create a field extraction for field 1, you can also use a simpler search:

index=abc [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

Ciao.

Giuseppe

View solution in original post

gcusello
SplunkTrust
SplunkTrust

Hi @harryhcg,

this is the regex to extract the fields you need:

| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"

that you can test at https://regex101.com/r/EsHXQv/1

About the join, you have two solutions that depends on the number of results of the second search:

if in the second search (index=bcz field5="name") you have more than 50,000 results, you can use the stats command, applying something like this:

index=abc OR (index=bcz field5="name")
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| eval key=coalesce(field1,field7)
| stats 
   dc(index) AS index_count 
   values(field2) AS field2
   values(field3) AS field3
   BY key
| where index_count=2

if instead you have less than 50,000 results, you can use a subsearch:

index=abc 
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

If you create a field extraction for field 1, you can also use a simpler search:

index=abc [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

Ciao.

Giuseppe

gcusello
SplunkTrust
SplunkTrust

Hi @harryhcg,

good for you, see next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

0 Karma

harryhcg
Explorer

@gcusello Thank you for giving time. 

Can you help to guide why used search twice? Apologies that not shared earlier but I had similar query with search keyword once and was not giving result. 

index=abc | rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]| table field1 field2 field3index=abc 
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

 

I am getting the result now but have some issue with count which is not corelating with another query result. Hope not duplicating something.  The other query is similar and based on index bcz and bda.

For current query output I used 

index=abc
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]

| dedup field1, field3
| stats count by field3

@yuanliuleth  field2, 4 and 6 I kept like that and will use on actual result. Trying to fix my errors with expert guidance, get correct logic of my silly mistakes. 

Tags (3)
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @harryhcg,

if you use dedup before a stats command, you'll never have correct results!

Try without dedup.

Ciao.

Giuseppe

0 Karma

yuanliu
SplunkTrust
SplunkTrust

The first thing you need to do is to explain "not working". (Really, try to avoid this meaningless phrase at all cost.)  What have you tried?  What is the output?  What is the desired or "correct" output?  How do the two output differ? You mentioned field1, field2, field3, field5, field7.  Is there any significance in field4, field6? (Also, even though I can understand the desire to anonymize raw data, spelling a URL as http:\\ is bad practice.)

0 Karma
Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...