I have tried stats dc(sourcetype) as count by commonfield | where count > 1. I assume this search is fetching all records to SH and doing statistics as a results its consuming space in Gigs in search head. its very time consuming. if you have done some thing to two compare two big data sets please guys advice me.
Note: These two data sets don't have a timestamp. The size of the two data sets is 300 million records x 1 field and 20 million records x 3 fields.
Sir, I have been told that we can use coalesce to join two big data sets.
I have seen that you have used coalesce in post like below,
index=abc OR index=def | eval commonfield=coalesce(field1,field4) | makemv commonfield delim="," | mvexpand commonfield | stats list() as * by commonfield | where isnotnull(field4) | eval temp=mvzip(field4,field5,"#") | mvexpand temp | rex field=temp "(?.)#(?.*)" | fields - temp commonfield
but, I am not sure how to use that.
please help me.
field3 and field4 are same. but there is not timestamp in those fields.
First of all, no sir.
You can do something like this (for inner join, data available in both indexes). It would be better if you can also provide your expected output.
index=a OR index=b | fields index field1 field2 field3 field4 | eval commonfield=coalesce(field3,field4) | stats dc(index) as indexes values(field1) as field1 values(field2) as field2 by commonfield | where indexes=2
first of all thanks a lot for your swift response..
but, my concern is its taking more storage space while calculating. I have not used coalesce.. i used stats after renaming the field4 to field3.
this is what i have done.
index=a OR index=b | rename field4 as field3 | stats dc(index) as count values(field1) values(field2) by field3 | where count>1
its killing my storage. is there any difference using rename and coaleasce.
I'm afraid it's because of too many rows. Sometime including a fields or table command to limit the number of fields helps.
index=a OR index=b | table field1 field2 field3 field3 | eval field3=coalesce(field3,field4) | stats dc(index) as count values(field1) values(field2) by field3 | where count>1
fields command is the important one for reducing storage. If you don't put that one in, then your
| stats list(*) is keeping literally EVERY field in both files.
Try this -
index=a OR index=b | fields field1 field2 field3 field4 | rename field4 as field3 | stats dc(index) as indexcount, values(field1) as field1, values(field2) as field2 by field3 | where indexcount>1
Thanks for your answers..
I have tried renaming field and i dont have more than 3 fields in one dataset and one field in another dataset.
I am much interested to know if there is any way to achieve this .easily
Thanks for your answers. fields and coalesce did good job.
@thambisetty - you are welcome! We've moved the comment to answer. Please accept somesoni2's answer so that the question will show as solved.