Splunk Search

How to join with rex not working?

harryhcg
Explorer

I have 2 index, abc and bcz

index abc data is in raw format like below.

<random ip address>|-NA\CAPITA|5xxhxh545|jljdjhsdhj78987|hkjhkdjfkjfd5672v2hg7|87675678vf6x_
<random date time> "GET http:\\at-abc.com http/1.1" 500 <random values>

I want to pull 87675678vf6x_ as field1 at-abc.com as field2 and 500 as field3.

index bcz got formatted data.

I now want to compare both indexes with field 1 of index abc with another field7 in bcz where bcz field5="name"
and return field1 field2 and field3.

It looks simple but not working.

 

Labels (2)
Tags (1)
0 Karma
1 Solution

gcusello
Esteemed Legend

Hi @harryhcg,

this is the regex to extract the fields you need:

| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"

that you can test at https://regex101.com/r/EsHXQv/1

About the join, you have two solutions that depends on the number of results of the second search:

if in the second search (index=bcz field5="name") you have more than 50,000 results, you can use the stats command, applying something like this:

index=abc OR (index=bcz field5="name")
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| eval key=coalesce(field1,field7)
| stats 
   dc(index) AS index_count 
   values(field2) AS field2
   values(field3) AS field3
   BY key
| where index_count=2

if instead you have less than 50,000 results, you can use a subsearch:

index=abc 
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

If you create a field extraction for field 1, you can also use a simpler search:

index=abc [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

Ciao.

Giuseppe

View solution in original post

gcusello
Esteemed Legend

Hi @harryhcg,

this is the regex to extract the fields you need:

| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"

that you can test at https://regex101.com/r/EsHXQv/1

About the join, you have two solutions that depends on the number of results of the second search:

if in the second search (index=bcz field5="name") you have more than 50,000 results, you can use the stats command, applying something like this:

index=abc OR (index=bcz field5="name")
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| eval key=coalesce(field1,field7)
| stats 
   dc(index) AS index_count 
   values(field2) AS field2
   values(field3) AS field3
   BY key
| where index_count=2

if instead you have less than 50,000 results, you can use a subsearch:

index=abc 
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

If you create a field extraction for field 1, you can also use a simpler search:

index=abc [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

Ciao.

Giuseppe

gcusello
Esteemed Legend

Hi @harryhcg,

good for you, see next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉

0 Karma

harryhcg
Explorer

@gcusello Thank you for giving time. 

Can you help to guide why used search twice? Apologies that not shared earlier but I had similar query with search keyword once and was not giving result. 

index=abc | rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]| table field1 field2 field3index=abc 
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3

 

I am getting the result now but have some issue with count which is not corelating with another query result. Hope not duplicating something.  The other query is similar and based on index bcz and bda.

For current query output I used 

index=abc
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]

| dedup field1, field3
| stats count by field3

@yuanliuleth  field2, 4 and 6 I kept like that and will use on actual result. Trying to fix my errors with expert guidance, get correct logic of my silly mistakes. 

Tags (3)
0 Karma

gcusello
Esteemed Legend

Hi @harryhcg,

if you use dedup before a stats command, you'll never have correct results!

Try without dedup.

Ciao.

Giuseppe

0 Karma

yuanliu
SplunkTrust
SplunkTrust

The first thing you need to do is to explain "not working". (Really, try to avoid this meaningless phrase at all cost.)  What have you tried?  What is the output?  What is the desired or "correct" output?  How do the two output differ? You mentioned field1, field2, field3, field5, field7.  Is there any significance in field4, field6? (Also, even though I can understand the desire to anonymize raw data, spelling a URL as http:\\ is bad practice.)

0 Karma
Get Updates on the Splunk Community!

Dashboard Studio Challenge - Learn New Tricks, Showcase Your Skills, and Win Prizes!

Reimagine what you can do with your dashboards. Dashboard Studio is Splunk’s newest dashboard builder to ...

Introducing Edge Processor: Next Gen Data Transformation

We get it - not only can it take a lot of time, money and resources to get data into Splunk, but it also takes ...

Take the 2021 Splunk Career Survey for $50 in Amazon Cash

Help us learn about how Splunk has impacted your career by taking the 2021 Splunk Career Survey. Last year’s ...