I have 2 index, abc and bcz
index abc data is in raw format like below.
<random ip address>|-NA\CAPITA|5xxhxh545|jljdjhsdhj78987|hkjhkdjfkjfd5672v2hg7|87675678vf6x_
<random date time> "GET http:\\at-abc.com http/1.1" 500 <random values>
I want to pull 87675678vf6x_ as field1 at-abc.com as field2 and 500 as field3.
index bcz got formatted data.
I now want to compare both indexes with field 1 of index abc with another field7 in bcz where bcz field5="name"
and return field1 field2 and field3.
It looks simple but not working.
Hi @harryhcg,
this is the regex to extract the fields you need:
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
that you can test at https://regex101.com/r/EsHXQv/1
About the join, you have two solutions that depends on the number of results of the second search:
if in the second search (index=bcz field5="name") you have more than 50,000 results, you can use the stats command, applying something like this:
index=abc OR (index=bcz field5="name")
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| eval key=coalesce(field1,field7)
| stats
dc(index) AS index_count
values(field2) AS field2
values(field3) AS field3
BY key
| where index_count=2
if instead you have less than 50,000 results, you can use a subsearch:
index=abc
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3
If you create a field extraction for field 1, you can also use a simpler search:
index=abc [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3
Ciao.
Giuseppe
Hi @harryhcg,
this is the regex to extract the fields you need:
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
that you can test at https://regex101.com/r/EsHXQv/1
About the join, you have two solutions that depends on the number of results of the second search:
if in the second search (index=bcz field5="name") you have more than 50,000 results, you can use the stats command, applying something like this:
index=abc OR (index=bcz field5="name")
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| eval key=coalesce(field1,field7)
| stats
dc(index) AS index_count
values(field2) AS field2
values(field3) AS field3
BY key
| where index_count=2
if instead you have less than 50,000 results, you can use a subsearch:
index=abc
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3
If you create a field extraction for field 1, you can also use a simpler search:
index=abc [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| table field1 field2 field3
Ciao.
Giuseppe
Hi @harryhcg,
good for you, see next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉
@gcusello Thank you for giving time.
Can you help to guide why used search twice? Apologies that not shared earlier but I had similar query with search keyword once and was not giving result.
index=abc | rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]| table field1 field2 field3index=abc | rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)" | search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ] | table field1 field2 field3
I am getting the result now but have some issue with count which is not corelating with another query result. Hope not duplicating something. The other query is similar and based on index bcz and bda.
For current query output I used
index=abc
| rex "(?ms)^([^\|]+\|){5}(?<field1>\w+).*\"GET\s+(http|https):\\\\(?<field2>[^\/]+)[^\"]+\"\s+(?<field3>\d+)"
| search [ search index=bcz field5="name" | rename field7 AS field1 | fields field1 ]
| dedup field1, field3
| stats count by field3
@yuanliuleth field2, 4 and 6 I kept like that and will use on actual result. Trying to fix my errors with expert guidance, get correct logic of my silly mistakes.
Hi @harryhcg,
if you use dedup before a stats command, you'll never have correct results!
Try without dedup.
Ciao.
Giuseppe
The first thing you need to do is to explain "not working". (Really, try to avoid this meaningless phrase at all cost.) What have you tried? What is the output? What is the desired or "correct" output? How do the two output differ? You mentioned field1, field2, field3, field5, field7. Is there any significance in field4, field6? (Also, even though I can understand the desire to anonymize raw data, spelling a URL as http:\\ is bad practice.)