Hi There,
we have two inputlook kv (File1 and File2) files and I want to compare 3 columns (AvsA, BvsB, CvsC) between each file and display the result where we clouldn't find any value in all the 3 columns( like non of the comparison written any value I mean blank).
Eg. File1 has Column A which contains roll numbers, Column B contains the Name and Column C contains the Registration number. File2 has the similar columns therefore we need to compare the information (AvsA, BvsB, CvsC) and find out where we have blanks (like there were no results for all the comparison AvsA, BvsB and CvsC) and report them in a table unique once.
The query is giving false results ( I mean, the query is reporting the data which is present in File 2)
let me explain the problem one more time.
File 1
Row# | roll numbers | Name | Registration # |
1 | 5 | ||
2 | 7 | Ajay | 999 |
3 | 123 | ||
4 | 10 | ||
5 | Vijay |
File2
Row# | Class | roll numbers | Section | Name | Registration # |
1 | V | 2 | A | Aaron | 565 |
2 | VI | 4 | B | Michel | 321 |
3 | IV | 3 | D | Jeff | 678 |
4 | VIII | 7 | E | Ajay | 999 |
5 | X | 8 | H | Kumar | 767 |
6 | XII | 10 | F | 098 | |
7 | XI | 12 | N | Evan | 345 |
now, I want to compare the following columns:
Roll numbers(File1) against Roll numbers(File2)
Name(File1) against the Name(File2)
Registration #(File1) against the Registration#(File2)
if we find any one of the record File1 in File2 then we will not report that, I shall only report the record which we couldn't find by roll number/name/registration#
Note - We have to compare FILE1 against FILE 2 and report the FILE1 missing records in File2
Desired OUTPUT
Row # | roll numbers | Name | Registration # |
1 | 5 | ||
2 | 123 | ||
3 | Vijay |
Total 3 records not found (present in File1 and not present in File2)
This should be a lesson for anyone asking question in this forum: Explain the raw data, describe important features such as blank or null values, which fields accompany which type of events, etc. We could have saved a ton of time and energy had you
What may seem "obvious" to you cannot be obvious to volunteers who lack intimate knowledge about your data and use case.
If I take data from your last post, A -> Name, B -> Registration #, and C -> roll numbers, I see lots of them have a single blank character (" ") as value in both File 1 and File 2. Do you count them as "found"? Based on your desired output, you seem to deem them "not found" in File 2.
If this interpretation is correct, and if your special repeating values are single blank character " ", the following should give you desired output:
| inputlookup File1
| eval kv = "File1"
| append
[| inputlookup File2
| eval kv = "File2"]
| eventstats dc(kv) as lookupcount values(kv) as lookup by Name
| where lookupcount == 1 OR Name == " "
| eventstats dc(kv) as lookupcount values(kv) as lookup by Registration__
| where lookupcount == 1 OR Registration__ == " "
| eventstats dc(kv) as lookupcount values(kv) as lookup by roll_numbers
| where (lookupcount == 1 OR roll_numbers == " ") AND lookup == "File1"
To help visualize, the following is an emulation of the two inputlookup appends based on sample data you posted:
| makeresults
| eval _raw = "
Row# roll numbers Name Registration #
1 5
2 7 Ajay 999
3 123
4 10
5 Vijay "
| multikv
| rename Name_ as Name
| eval kv = "File1"
| fields - _time _raw linecount
| append
[| makeresults
| eval _raw ="
Row# Class roll numbers Section Name Registration #
1 V 2 A Aaron 565
2 VI 4 B Michel 321
3 IV 3 D Jeff 678
4 VIII 7 E Ajay 999
5 X 8 H Kumar 767
6 XII 10 F 098
7 XI 12 N Evan 345"
| multikv
| fields - _time _raw linecount
| eval kv = "File2"]
``` the above simulates two inputlookups appended together ```
You can play with this emulation and compare with your actual inputlookups. Put this emulation with cascaded eventstats-where from the above code, i.e.,
| eventstats dc(kv) as lookupcount values(kv) as lookup by Name
| where lookupcount == 1 OR Name == " "
| eventstats dc(kv) as lookupcount values(kv) as lookup by Registration__
| where lookupcount == 1 OR Registration__ == " "
| eventstats dc(kv) as lookupcount values(kv) as lookup by roll_numbers
| where (lookupcount == 1 OR roll_numbers == " ") AND lookup == "File1"
I get
Class | Name | Registration__ | Row_ | Section | kv | lookup | lookupcount | roll_numbers |
1 | File1 | File1 | 1 | 5 | ||||
123 | 3 | File1 | File1 | 1 | ||||
Vijay | 5 | File1 | File1 | 1 |
Hope this helps
By "comparison" I assume you mean an equality test, where a false value is considered "blank". If this is correct, distinct_count, or dc, is your answer. This is something you can try
| inputlookup File1 ``` OUTPUT A B C ```
| eval kv = File1
| append
[| inputlookup File2 ``` OUTPUT A B C ```
| eval kv = File2]
| stats dc(kv) as lookupcount values(kv) as lookup by A B C
| where lookupcount = 1
Hope this helps.
Hi There,
Below are the issues I am facing while trying the solution
1. dc(kv) isn't working in lookupfile. Tried multiple ways and looks like it isn't supporting the lookupfile.
2. We need to compare File 1 data(it has 10K records) with File2(it has 30K) records and figure out the data which is present in File1 and not present in File 2 based on lookup by 3 columns (A,B,C) and report the unique only records. We need to report File1 missing data in File2.
My mistake. File1 and File2 needs to be quoted.
| inputlookup File1 ``` OUTPUT A B C ```
| eval kv = "File1"
| append
[| inputlookup File2 ``` OUTPUT A B C ```
| eval kv = "File2"]
| stats dc(kv) as lookupcount values(kv) as lookup by A B C
| where lookupcount = 1 AND kv == "File1"
Hi @yuanliu
when we run the query, the second part of the where condition( kv == "File1") doesn't return any results.
| where lookupcount = 1 AND kv == "File1"
where condition works for lookupcount =1 and shows the File2 records but we need to find the results of File1 (unique values only). Kindly help.
You are correct that inputlookup command doesn't have an "OUTPUT" option; the tree back ticks ("`") opens a comment that is closed by three back ticks. These comments are meant to highlight that I expect both output from File1 and from File2 to contain A, B, and C.
Now, if the condition lookupcount = 1 AND kv == "File1" gives you no results but lookupcount = 1 AND kv == "File2" returns some thing, that logically means that every combination of A, B, and C that appears in File1 is found in File2, whereas there are some unique combinations in File2. One way to examine data quickly is to run
| inputlookup File1 ``` OUTPUT A B C ```
| eval kv = "File1"
| append
[| inputlookup File2 ``` OUTPUT A B C ```
| eval kv = "File2"]
| stats dc(kv) as lookupcount values(kv) as lookup by A B C
| sort lookupcount
You can see which lookup table have count of one for which combinations of A, B, C.
I also want to understand what you mean by ```OUTPUT A B C`` because inputlookup doesn't support OUTPUT.