Hello Everyone!
I have a scenario to get a Date column from index1 in search1 and remove the rows with null values in Date column in search2 on index2. I have column id common in both the indexes
Example :
Search1:
index="index1" sourcetype="st1" field1="abc"
|table id1 Date1
Search2:
index="index2" source="xyz"
|?????????
| eval Date2 =Date1 where id2 =id1
|.........
Output:
|table id2 Date2 where Date2 NOT NULL
What's the best way to go about it? Including both indexes at the start of the search is not feasible given the absurd size of the second index.
Can anyone please help me here?
Thank you in advance.
Hi @kiru2992,
if id field is the same in both the searches, you don't need of eval and coalesce, so try something like this:
(index="index1" sourcetype="st1" field1="abc") OR (index="index2" source="xyz")
| stats values(fielda) AS fielda values(fieldb) AS fieldb values(DateField) AS DateField BY id
eventually for DateField you can use earliest or lates instead values.
For your information coalesce takes an arbitrary number of arguments and returns the first value that is not NULL and it's useful when you want to take the values from two or more columns in a new one (for more infos see at https://docs.splunk.com/Documentation/Splunk/8.0.5/SearchReference/CommonEvalFunctions ).
Ciao.
Giuseppe
Hi @kiru2992,
when you say that search2 has an "absurd size", you mean that it's very large, I think.
This means that you cannot put the second search in a subsearch, because there's the limit of 50,000 results in subsearches.
I don't like join because it's very slow and I always prefer to use a main search with both the searches correlating the values with a stats command BY the common key.
In other words something like this:
(index="index1" sourcetype="st1" field1="abc") OR (index="index2" source="xyz")
| eval id=coalesce(id2, id1)
| stats values(Date2) AS Date2 BY id
obviously I cannot test your search, but I'd like to show you the approach to this kind of use cases, Splunk isn't a DB where join is the best approach.
Ciao.
Giuseppe
Hi @gcusello,
Yes, by "absurd" I meant large. Thank you for reply.
I was not sure about how coalesce work, nevertheless I tried you suggestion.
And found few things mentioned below:
- In both the indexes the id is stored as 'id' so I tried 'rename' within parentheses of search and received the below error even when they were balanced. From what I understood, introducing a "|" symbol leads to the below error.Error: Error in 'search' command: Unable to parse the search: unbalanced parentheses.
- id1 is a subset of id2 and index2 has duplicate id's and I require them for my calculation. According to my knowledge using coalesce removes duplicate values.
To make things clear, in index1 I have to extract set of ids(id1) and their respective Date(Date1) and then I have to find all(even duplicates) the ids(id2) in index2 which is equivalent to id1 and map the respective Date.
For example:
Sample Data
index1 : id field1 field2 DateField
100 abc xyz 10-06-2020
101 def www 08-06-2020
102 abc uuu 25-05-2020
index2 : id fielda fieldb
100 open ""
101 closed Solved
100 Progress ""
101 closed Solved
102 Progress ""
100 Closed Invalid
102 open ""
Sample Output:
id fielda fieldb DateField
100 open "" 10-06-2020
101 closed Solved 08-06-2020
100 Progress "" 10-06-2020
101 closed Solved 08-06-2020
102 Progress "" 25-05-2020
101 closed Solved 08-06-2020
100 Closed Invalid 10-06-2020
102 open "" 25-05-2020
I hope this helps to understand the scenario better. To proceed further, please let me know how to use 'rename' within parentheses and how to map date to even duplicate ids.
Any help or lead is much appreciated. Kindly let me know in case any further queries.
Thanks,
Hi @kiru2992,
if id field is the same in both the searches, you don't need of eval and coalesce, so try something like this:
(index="index1" sourcetype="st1" field1="abc") OR (index="index2" source="xyz")
| stats values(fielda) AS fielda values(fieldb) AS fieldb values(DateField) AS DateField BY id
eventually for DateField you can use earliest or lates instead values.
For your information coalesce takes an arbitrary number of arguments and returns the first value that is not NULL and it's useful when you want to take the values from two or more columns in a new one (for more infos see at https://docs.splunk.com/Documentation/Splunk/8.0.5/SearchReference/CommonEvalFunctions ).
Ciao.
Giuseppe
Hi @gcusello,
Thank you for your swift reply. I would have a look at linked page.
When I used your method, I am not able to see duplicate ids..
For Example:
Actual Output:id fielda fieldb DateField
101 closed Solved 08-06-2020
100 Closed Invalid 10-06-2020
102 open "" 25-05-2020
Required Output:
id fielda fieldb DateField
100 open "" 10-06-2020
101 closed Solved 08-06-2020
100 Progress "" 10-06-2020
101 closed Solved 08-06-2020
102 Progress "" 25-05-2020
101 closed Solved 08-06-2020
100 Closed Invalid 10-06-2020
102 open "" 25-05-2020
Can you please let me know how to retain the duplicate ids as I require them to calculate 'fielda' transitions?
Thank you in advance.
hi @kiru2992,
ok, so try this please:
(index="index1" sourcetype="st1" field1="abc") OR (index="index2" source="xyz")
| fillnull fielda value="-"
| fillnull fieldb value="-"
| stats values(DateField) AS DateField BY id fielda fieldb
Ciao.
Giuseppe
hi @gcusello,
I am sorry, still I am not able to get duplicates.
From my understanding, the below line itself removes duplicates. May I please know whether we have a way to retain duplicates here.
(index="index1" sourcetype="st1" field1="abc") OR (index="index2" source="xyz")Thank you in advance.
Hi @kiru2992,
no the main search doesn't remove duplicates, it gives you all the events.
You remove duplicates with the stats command indicating (with BY clause) which are the keys to consider.
Ciao.
Giuseppe
Hi @gcusello,
Yes... You are right.. But for me stats with both id and fields resulted in only distinct values. Luckily, streamstats worked for me.. Thanks a lot for your help:)
You should go with "inner Join" with subsearch...
Something like this
Index=index1 sourcetype="st1" field1="abc"
| join type=inner ID [search index=index2 ... remaining query}
| table your fields
Note : Joining field names should be same, so if you have different names then rename them using "rename" command.
Hope it gives you some clue. Thanks.